In this analysis, I explored a dataset of fatal police shootings in the United States to predict the gender of the individuals involved from various features. I used a logistic regression model and drew insights from the dataset.
I focused on categorical variables such as threat_type, flee_status, armed_with, and race as predictors, with gender as the target variable. I divided the data into training and testing sets, using 80% of the data for training and 20% for testing.
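The preprocessing and split described above could look roughly like the following sketch. It assumes pandas and scikit-learn. The toy DataFrame and its category values are hypothetical stand-ins for the real dataset, and the use of `OneHotEncoder`, `stratify=y`, `random_state=42`, and `max_iter=1000` are my own assumptions, not a record of the exact code used in this analysis.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy rows standing in for the real dataset; category values are placeholders.
df = pd.DataFrame({
    "threat_type": ["shoot", "point", "attack", "threat", "move"] * 10,
    "flee_status": ["not", "car", "foot", "not", "other"] * 10,
    "armed_with": ["gun", "knife", "unarmed", "gun", "vehicle"] * 10,
    "race": ["W", "B", "H", "A", "N"] * 10,
    "gender": ["male"] * 45 + ["female"] * 5,
})

features = ["threat_type", "flee_status", "armed_with", "race"]
X, y = df[features], df["gender"]

# 80% train / 20% test; stratify=y (an assumption) keeps both classes in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = Pipeline([
    # One-hot encode each categorical column; ignore categories unseen during training.
    ("encode", ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), features)]
    )),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # fraction of test rows predicted correctly
print(f"test accuracy: {accuracy:.4f}")
```

Stratifying is worth considering when one class is as rare as the female class is here, since an unstratified split can leave very few (or no) minority samples in one of the sets.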
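The model achieved an accuracy of approximately 95.19% on the test set. This looks strong, but it has to be read alongside the confusion matrix, because the large majority of the test samples are male. I also created a bar chart to visualize the coefficients of the logistic regression model used in this analysis.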
This visualization shows how each feature influences the predicted gender in the context of fatal police shootings. I then generated a confusion matrix, a table that describes the performance of a classification algorithm by summarizing the number of correct and incorrect predictions for each class.
My reading of the matrix, treating male as the positive class, is as follows:
- Top-left cell (True Negative): The number of actual females correctly predicted as females. In this case, the count is 0, indicating that the model failed to correctly predict any of the female samples.
- Top-right cell (False Positive): The number of actual females incorrectly predicted as males. The count is 61, showing that all the females in the test set were misclassified as males.
- Bottom-left cell (False Negative): The number of actual males incorrectly predicted as females. The count is 0, indicating that there were no males misclassified as females.
- Bottom-right cell (True Positive): The number of actual males correctly predicted as males. The count is 1207, showing that the model was very effective at identifying the male samples.
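- Overall, the model is heavily biased toward the male class: it predicted "male" for every test sample, so its 95.19% accuracy reflects the class imbalance rather than an ability to tell females and males apart.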
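The counts in the matrix are enough to check this conclusion directly. The short calculation below (plain Python; the variable names are my own) compares the model's accuracy with a baseline that always predicts "male", and computes per-class recall and balanced accuracy from the same four cells.

```python
# Counts read off the confusion matrix above (positive class = male).
tn, fp = 0, 61    # actual female: predicted female, predicted male
fn, tp = 0, 1207  # actual male: predicted female, predicted male

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
majority_baseline = (fn + tp) / total  # score of always answering "male"
recall_male = tp / (tp + fn)
recall_female = tn / (tn + fp)
balanced_accuracy = (recall_male + recall_female) / 2

print(f"accuracy          {accuracy:.4f}")
print(f"majority baseline {majority_baseline:.4f}")
print(f"recall (male)     {recall_male:.4f}")
print(f"recall (female)   {recall_female:.4f}")
print(f"balanced accuracy {balanced_accuracy:.4f}")
```

Accuracy and the always-male baseline both come out to 1207/1268 ≈ 0.9519, recall for females is 0, and balanced accuracy is 0.5, which is exactly what any classifier that always predicts the same class would score.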
I plan to raise my remaining questions and concerns with the professor/TA in the next class.