Update regarding the project: –
To begin with, I fixed the issues in the linear regression code. The statistical parameters and the graph are as follows:
Linear regression graph: –
The visualizations show the relationship between the independent variables (“% INACTIVE” and “% OBESE”) and the dependent variable (“% DIABETIC”) for the test data. In each plot:
- The blue points represent the actual “% DIABETIC” values.
- The red points represent the predicted “% DIABETIC” values based on the linear regression model.
Key Metrics for the Model:
- Mean Squared Error (MSE). Value: 0.400063
This represents the average of the squares of the errors between the predicted and actual values. Lower values are better, but the scale depends on the dependent variable.
- R-squared. Value: 0.395
This represents the proportion of the variance in the dependent variable that is explained by the independent variables in the model. The \( R^2 \) value usually falls between 0 and 1, with higher values indicating a better fit; on test data it can drop below 0 if the model predicts worse than simply using the mean. An \( R^2 \) value of 0.395 means that the model explains approximately 39.5% of the variability in “% DIABETIC”.
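To make the two definitions above concrete, here is a minimal sketch of how both metrics can be computed by hand. The function names and the four data points are made up for illustration; they are not the project's code or data.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot; this can be negative on held-out data."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up illustrative values, not the project's "% DIABETIC" data.
actual = [7.0, 8.0, 9.0, 10.0]
predicted = [7.5, 7.5, 9.5, 9.5]

mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(f"MSE = {mse:.3f}, R^2 = {r2:.3f}")
```

Taking the square root of the MSE gives the error in the same units as the dependent variable, which is often easier to interpret.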
Interpretation:
- The \( R^2 \) value of 0.395 suggests that the model explains about 39.5% of the variance in the “% DIABETIC” variable, which is a moderate level of explanation.
- The MSE of 0.400 is a measure of the model’s prediction error. Its square root is about 0.63, so a typical prediction is off by roughly 0.63 percentage points of “% DIABETIC”. Lower values are generally better.
- The model's explanatory power (its \( R^2 \)) is 39.5%, which I feel is not too great, but this is what could be achieved with these data points.
- To improve the fit, we could try weighted least squares (WLS), but I am still not sure how to implement it. I am going to ask in Monday's class.
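As a starting point for the WLS idea, here is a rough sketch using NumPy. WLS minimizes the weighted sum of squared residuals, which is equivalent to scaling each row by the square root of its weight and then running ordinary least squares. Everything in the example is an assumption made for illustration: the function name `fit_wls`, the synthetic data, and especially the choice of weights (inverse of an assumed noise variance). How to pick weights for the real data is still an open question.

```python
import numpy as np


def fit_wls(X, y, weights):
    """Weighted least squares: minimize sum_i w_i * (y_i - b0 - x_i . b)^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    sqrt_w = np.sqrt(w)
    # Scaling each row by sqrt(w_i) turns WLS into an ordinary least-squares problem.
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef  # [intercept, one coefficient per column of X]


# Synthetic stand-ins for "% INACTIVE" and "% OBESE"; not the project's data.
rng = np.random.default_rng(0)
inactive = rng.uniform(10, 20, size=200)
obese = rng.uniform(15, 25, size=200)
X = np.column_stack([inactive, obese])

# Assumption for illustration: the noise grows with "% INACTIVE".
noise_sd = 0.05 * inactive
y = 1.0 + 0.2 * inactive + 0.1 * obese + rng.normal(0, noise_sd)

# Common choice: weight each point by the inverse of its noise variance.
weights = 1.0 / noise_sd**2
coef = fit_wls(X, y, weights)
print("intercept and slopes:", coef)
```

Note that WLS mainly helps when the error variance differs across observations; it does not by itself guarantee a higher \( R^2 \). The statsmodels library also provides a `WLS` class that takes a `weights` argument, which may be more convenient than the hand-rolled version above.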
Next, I will try to find relationships with other parameters that are available on the website. I have picked housing cost burden as a parameter to test against obesity.