Hello,
In this section, I conducted a predictive analysis to forecast median housing prices using a Linear Regression model. Here’s a breakdown of the steps and results:
I defined the independent variables (predictors) and the dependent variable (target) for the regression model. The predictors included “Total Jobs,” “Unemployment Rate,” “Hotel Occupancy Rate,” and “Logan International Flights,” while the target variable was “Median Housing Price.”
The dataset was split into training and testing sets to evaluate the model’s performance. In this case, 70% of the data was allocated for training, and 30% for testing. The `random_state` parameter was set to 42 for reproducibility.
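A minimal sketch of the variable definitions and the 70/30 split might look like the following. The DataFrame here is filled with random placeholder numbers (the real dataset is not reproduced in this post), and the names `df`, `predictors`, and `target` are assumptions made for illustration; only the column names, the split ratio, and `random_state=42` come from the description above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: random placeholder values, NOT the project's dataset.
rng = np.random.default_rng(0)
n_rows = 100
df = pd.DataFrame({
    "Total Jobs": rng.normal(350_000, 10_000, n_rows),
    "Unemployment Rate": rng.normal(4.0, 0.5, n_rows),
    "Hotel Occupancy Rate": rng.normal(0.8, 0.05, n_rows),
    "Logan International Flights": rng.normal(30_000, 2_000, n_rows),
    "Median Housing Price": rng.normal(500_000, 50_000, n_rows),
})

# Independent variables (predictors) and dependent variable (target).
predictors = [
    "Total Jobs",
    "Unemployment Rate",
    "Hotel Occupancy Rate",
    "Logan International Flights",
]
target = "Median Housing Price"
X = df[predictors]
y = df[target]

# 70% training, 30% testing; a fixed random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)
```

Because `random_state` is fixed, rerunning the split returns the same rows in each set, which keeps the reported metrics comparable between runs.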
I created a Linear Regression model using the `LinearRegression` class from `sklearn.linear_model`. This model is used to predict the target variable based on the predictor variables.
The model was fitted with the training data using the `fit` method. Fitting estimates an intercept and one coefficient per predictor by ordinary least squares, that is, by minimizing the sum of squared differences between actual and predicted values on the training set.
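To make the fitting step concrete, here is a small sketch on synthetic data. The four predictor columns and the "true" coefficients are invented for the example (they are not the housing data), so the point is only to show that `fit` recovers a known linear relationship and exposes it through `coef_` and `intercept_`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with a known linear relationship (placeholder, not the housing data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                 # four placeholder predictors
true_coefs = np.array([3.0, -2.0, 0.5, 1.0])  # assumed for illustration
y = X @ true_coefs + 10.0 + rng.normal(scale=0.1, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create the model and fit it on the training portion only.
model = LinearRegression()
model.fit(X_train, y_train)

# The learned relationship: one coefficient per predictor plus an intercept.
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
```

On real data the coefficients are worth inspecting alongside the metrics, since each one is the predicted change in the target for a one-unit change in that predictor with the others held fixed.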
To assess the model’s performance, I made predictions on the test set using the trained model. I calculated two key performance metrics:
– Mean Squared Error (MSE): A measure of the average squared difference between actual and predicted values. A lower MSE indicates better model performance.
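– R-squared (R²) Score: A measure of how much of the variance in the target variable the model explains. The maximum is 1 (a perfect fit), and higher values indicate a better fit. On a test set, R² can be negative when the model predicts worse than simply using the mean of the target.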
The calculated performance metrics are as follows:
– MSE: [MSE Value]
– R² Score: [R² Value]
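Both metrics can be computed with `mean_squared_error` and `r2_score` from `sklearn.metrics`. The sketch below uses a handful of made-up prices (not results from this analysis) and repeats the calculation by hand so the definitions are visible. The final lines show that R² on held-out data can drop below 0 when predictions are worse than predicting the mean.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted values, for illustration only.
y_test = np.array([500.0, 520.0, 540.0, 560.0])
y_pred = np.array([505.0, 515.0, 545.0, 550.0])

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# The same quantities from their definitions.
ss_res = np.sum((y_test - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)  # total sum of squares
mse_manual = ss_res / len(y_test)
r2_manual = 1.0 - ss_res / ss_tot

print(f"MSE: {mse:.2f}  (manual: {mse_manual:.2f})")
print(f"R^2: {r2:.4f}  (manual: {r2_manual:.4f})")

# Predictions that are worse than always guessing the mean give a negative R^2.
y_bad = np.array([560.0, 540.0, 520.0, 500.0])
r2_bad = r2_score(y_test, y_bad)
print(f"R^2 for poor predictions: {r2_bad:.2f}")
```

Note that MSE is expressed in squared units of the target, so taking its square root (RMSE) gives an error on the same scale as the prices themselves.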
To visually assess the model’s performance, I created two plots:
– “Actual vs Predicted Median Housing Prices”: This scatter plot compares the actual median housing prices (y-axis) with the predicted prices (x-axis) for the test set. The dashed diagonal line marks perfect predictions (actual equals predicted), so points closer to it are predicted more accurately.
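– “Residuals of Predicted Median Housing Prices”: This scatter plot shows the residuals (differences between actual and predicted prices, y-axis) against the predicted prices (x-axis). The red dashed line at y=0 represents zero residuals. Residuals scattered evenly around this line with no visible pattern suggest a linear model is adequate, while a curve or funnel shape suggests it is missing some structure.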
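The two plots described above could be produced with matplotlib roughly as follows. This is a sketch rather than the exact plotting code used for the post: the helper name `plot_diagnostics`, the figure size, and the sample values are assumptions, while the titles, axes, and dashed reference lines follow the descriptions.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_diagnostics(y_test, y_pred):
    """Draw the actual-vs-predicted and residual plots side by side."""
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_test - y_pred  # actual minus predicted

    fig, (ax_fit, ax_res) = plt.subplots(1, 2, figsize=(12, 5))

    # Left: actual (y-axis) against predicted (x-axis), with the y = x line.
    ax_fit.scatter(y_pred, y_test, alpha=0.7)
    lo = min(y_test.min(), y_pred.min())
    hi = max(y_test.max(), y_pred.max())
    ax_fit.plot([lo, hi], [lo, hi], linestyle="--", color="black")
    ax_fit.set_xlabel("Predicted Median Housing Price")
    ax_fit.set_ylabel("Actual Median Housing Price")
    ax_fit.set_title("Actual vs Predicted Median Housing Prices")

    # Right: residuals (y-axis) against predicted (x-axis), with a line at 0.
    ax_res.scatter(y_pred, residuals, alpha=0.7)
    ax_res.axhline(0, color="red", linestyle="--")
    ax_res.set_xlabel("Predicted Median Housing Price")
    ax_res.set_ylabel("Residual (actual - predicted)")
    ax_res.set_title("Residuals of Predicted Median Housing Prices")

    fig.tight_layout()
    return fig, residuals


# Made-up values for illustration; in practice pass y_test and model.predict(X_test).
y_test = np.array([500.0, 520.0, 540.0, 560.0])
y_pred = np.array([505.0, 515.0, 545.0, 550.0])
fig, residuals = plot_diagnostics(y_test, y_pred)
# Call plt.show() in an interactive session, or fig.savefig("diagnostics.png") to save.
```

Putting both plots in one figure makes it easy to see whether points that sit far from the diagonal on the left also show up as large residuals on the right.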
These visualizations and performance metrics help evaluate the accuracy of the Linear Regression model in predicting median housing prices based on the selected economic indicators.