Friday, Sep 22, 2023 – siddharthjain

Update regarding the project: –

To begin with, I fixed the issues in my linear regression code. The statistical parameters and the graph are as follows:

Linear regression graph: –

The visualizations show the relationship between the independent variables (“% INACTIVE” and “% OBESE”) and the dependent variable (“% DIABETIC”) for the test data. In each plot:

The blue points represent the actual “% DIABETIC” values.
The red points represent the predicted “% DIABETIC” values based on the linear regression model.
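
For reference, here is a minimal sketch of the kind of fit behind these plots: ordinary least squares with two predictors, evaluated on a held-out test split. The data below is synthetic stand-in data, not the project's dataset, and the names `fit_ols`, `predict`, `inactive`, `obese`, and `diabetic` are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients to new rows of predictors."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

# Synthetic stand-in data (an assumption, NOT the real county data).
rng = np.random.default_rng(0)
inactive = rng.uniform(10, 25, size=200)
obese = rng.uniform(12, 30, size=200)
diabetic = 2.0 + 0.25 * inactive + 0.10 * obese + rng.normal(0, 0.6, size=200)

X = np.column_stack([inactive, obese])
X_train, X_test = X[:150], X[150:]
y_train, y_test = diabetic[:150], diabetic[150:]

coef = fit_ols(X_train, y_train)   # intercept, slope for inactive, slope for obese
y_pred = predict(coef, X_test)     # the "red points" in a plot like the one above
```

Plotting `y_test` and `y_pred` against each predictor would give an actual-versus-predicted picture of the same kind.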

Key Metrics for the Model:

Mean Squared Error (MSE). Value: 0.400063

This represents the average of the squared differences between the predicted and actual values. Lower values are better, but the scale depends on the units of the dependent variable.
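
As a small worked illustration of that definition, the sketch below computes the MSE by hand on four made-up values. The function name `mean_squared_error` and the numbers are hypothetical and not taken from the project.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("need at least one observation")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up numbers for illustration only.
actual = [7.0, 8.5, 9.0, 6.5]
predicted = [7.5, 8.0, 9.5, 6.0]

# Every prediction is off by 0.5, so each squared error is 0.25
# and their average is also 0.25. Squared terms are never negative,
# so an MSE cannot be below zero.
mse = mean_squared_error(actual, predicted)
```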

R-squared. Value: 0.395

This represents the proportion of the variance in the dependent variable that is explained by the independent variables in the model. The \( R^2 \) value usually falls between 0 and 1, with higher values indicating a better fit (on test data it can drop below 0 if the model does worse than simply predicting the mean). An \( R^2 \) value of 0.395 means that the model explains approximately 39.5% of the variability in “% DIABETIC”.
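
To make the definition concrete, here is a rough sketch of \( R^2 = 1 - SS_{res} / SS_{tot} \) computed by hand. The function name `r_squared` and the numbers are made up for illustration.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares) / (total sum of squares)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant, so R^2 is undefined")
    return 1.0 - ss_res / ss_tot

# Made-up numbers for illustration only.
actual = [6.0, 8.0, 10.0, 12.0]          # mean is 9, SS_tot = 20

good = r_squared(actual, [7.0, 8.0, 10.0, 11.0])    # SS_res = 2  -> about 0.9
baseline = r_squared(actual, [9.0, 9.0, 9.0, 9.0])  # predicting the mean -> 0
bad = r_squared(actual, [12.0, 10.0, 8.0, 6.0])     # SS_res = 80 -> negative
```

The last case shows why a negative value is possible on held-out data: the predictions are further from the actual values than the plain mean would be.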

Interpretation:

The \( R^2 \) value of 0.395 suggests that the model explains about 39.5% of the variance in the “% DIABETIC” variable, which is a moderate level of explanation.
The MSE of 0.400 measures the model’s prediction error. Its square root is about 0.63, so if “% DIABETIC” is in percentage points, a typical prediction is off by roughly 0.63 points. Lower values are generally better.
An \( R^2 \) of 39.5% is not great in my view, but it is what I could achieve with the available data points.
To improve the fit, we could try weighted least squares (WLS), but I am still not sure how to implement it. I am going to ask in Monday’s class.
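
One possible way to set up WLS, sketched here under stated assumptions: each observation gets a weight (commonly the inverse of its error variance), and the fit minimizes the weighted sum of squared residuals. The function `fit_wls`, the synthetic data, and the choice of weights are all hypothetical. In particular, nothing in this project yet says what the right weights would be.

```python
import numpy as np

def fit_wls(X, y, weights):
    """Weighted least squares with an intercept; returns [b0, b1, ...].

    Minimizes sum(w_i * (y_i - b0 - x_i . b)^2) by scaling each row
    by sqrt(w_i) and solving an ordinary least squares problem.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")
    A = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef

# Synthetic example (an assumption, NOT the project's data):
# the noise grows with x, so noisier points get smaller weights.
rng = np.random.default_rng(0)
x = rng.uniform(10, 30, size=200)
noise_sd = 0.05 * x
y = 2.0 + 0.3 * x + rng.normal(0, noise_sd)

weights = 1.0 / noise_sd**2       # inverse-variance weights
coef = fit_wls(x, y, weights)     # coef[0] is the intercept, coef[1] the slope
```

With all weights equal, this reduces to ordinary least squares. The `statsmodels` library also offers a `WLS` model that takes a `weights` argument, which may be the more convenient route in practice.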

Next, I will try to find relationships with other parameters that are available on the website. I have picked housing cost burden as a parameter to test against obesity.
