Cross Validation, Sep 25-2023

What is Cross Validation?

Cross-validation is a resampling method that trains and tests a model on different portions of the data in each iteration, so that every observation is used for testing exactly once.
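
To make the idea concrete, here is a minimal pure-Python sketch of how a k-fold split assigns rows to training and test sets. The helper name `k_fold_indices` and the 10-row example are illustrative assumptions, not the code used in the project.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# Hypothetical example: 10 rows split into 5 folds of 2 test rows each.
folds = list(k_fold_indices(n_samples=10, k=5))
for fold, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {fold}: test={test_idx} train={train_idx}")
```

Each row lands in the test set of exactly one fold and in the training set of the other four. In practice the rows are usually shuffled before splitting.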

I applied cross-validation in our project using Python.

Detailed analysis based on the 5-fold cross-validation results

Variability in R² values:

  • The R² scores for the 5-fold cross-validation range from -0.0598 to 0.4617.
  • Only two of the folds produced an R² above 0.4, which indicates moderate explanatory power (roughly 40% of the variance in the target explained). The other three folds had values close to zero or slightly negative.
  • Negative R² values in two of the folds mean that, on those particular splits, the model’s predictions were worse than simply predicting the mean of the target variable.
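
The link between a negative R² and the "predict the mean" baseline follows from the definition R² = 1 - SS_res / SS_tot. Here is a small sketch on made-up numbers (not the project data). The function name `r_squared` is an illustrative assumption.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # mean-baseline error
    return 1.0 - ss_res / ss_tot

# Made-up values chosen only to illustrate the sign of R².
y_true = [1.0, 2.0, 3.0, 4.0]
mean_baseline = [2.5, 2.5, 2.5, 2.5]  # always predict the mean of y_true
poor_model = [4.0, 3.0, 2.0, 1.0]     # predictions that run the wrong way

r2_baseline = r_squared(y_true, mean_baseline)
r2_poor = r_squared(y_true, poor_model)
print(r2_baseline, r2_poor)  # 0.0 -3.0
```

Predicting the mean gives exactly 0, so any model whose squared error exceeds that of the mean baseline ends up below zero.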

Mean Absolute Error (MAE):

  • The MAE values range from 0.1641 to 0.5881 (ignoring the negative sign, which comes from the scoring convention of reporting errors as negated scores so that higher is always better).

In other words, depending on the fold, the model’s predictions deviate from the actual values by between 0.1641 and 0.5881 on average. The error is noticeably higher in some folds than in others.
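
The negative sign can be reproduced with a short sketch. It assumes scikit-learn was the library used, which the post does not state, and it runs on synthetic data rather than the project dataset. The model, the random seed, and the variable names are all illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption): two features, noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.3, size=100)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# scikit-learn scorers follow "higher is better", so MAE is returned negated.
neg_mae = cross_val_score(
    LinearRegression(), X, y, cv=cv, scoring="neg_mean_absolute_error"
)
mae_per_fold = -neg_mae  # flip the sign to get the usual non-negative MAE

print("raw scores:  ", neg_mae)
print("MAE per fold:", mae_per_fold)
```

Passing `scoring="r2"` or `scoring="neg_mean_squared_error"` to the same call would give the other per-fold metrics discussed here.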

 

Mean Squared Error (MSE) and Root Mean Squared Error (RMSE):

  • The RMSE values for the 5-fold cross-validation range from 0.2342 to 0.7728.
  • The RMSE is particularly useful because it expresses the size of the error in the same units as the target variable. Since the target is the diabetic percentage, an RMSE of 0.7728 means that in the worst-performing fold the model’s predictions are typically off by about 0.7728 percentage points. Because errors are squared before averaging, RMSE weights large errors more heavily than MAE does.
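
As a quick worked example of how MSE, RMSE, and MAE relate within a single fold, here is a sketch on made-up numbers. The values and variable names are hypothetical and are not taken from the project.

```python
import math

# Made-up actual and predicted values for one hypothetical fold.
y_true = [8.0, 9.5, 7.2, 10.1]
y_pred = [8.5, 9.0, 7.0, 11.0]

errors = [t - p for t, p in zip(y_true, y_pred)]

fold_mse = sum(e ** 2 for e in errors) / len(errors)  # in squared units
fold_rmse = math.sqrt(fold_mse)                       # back in target units
fold_mae = sum(abs(e) for e in errors) / len(errors)  # in target units

print(f"MSE={fold_mse:.4f} RMSE={fold_rmse:.4f} MAE={fold_mae:.4f}")
```

RMSE is never smaller than MAE on the same predictions, and the gap widens when a few errors are much larger than the rest. This is consistent with the RMSE range above sitting higher than the MAE range.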

The variability in performance across the 5 folds suggests that the dataset may contain regions where the linear relationship between the features and the target variable is weak. The negative R² values in two of the folds point to data splits where the linear model does not fit the data well.

 
