Today, I learned about regression analysis involving two variables.
The multiple regression formula is represented as
Y = β0 + β1X1 + β2X2…
In this context, Y represents the percentage of diabetics, X1 stands for the percentage of inactivity, and X2 indicates the percentage of obesity.
How did I use Linear regression in the project?
I started off by importing data from an Excel file, using the panda’s library in visual studio, I was able to efficiently process and analyze the data.
I then used linear regression to see the potential relationship between %diabetes and %obesity, as well as between %diabetes and %inactivity. I also generated smooth histograms for both %diabetes and %obesity. These were accompanied by some key statistical metrics, such as mean, median, skewness, and kurtosis, offering a deeper understanding of the data’s distribution and features.