Today I did an in-depth EDA of the Washington shooting dataset, which contains 8,770 records and 19 columns.
A lot of data is missing in several variables, particularly “County” (55.38%) and demographic variables such as “Race” (16.18%). Other variables, such as “Flee Status” and the location fields “Longitude” and “Latitude”, also have missing entries, which will affect geographical analyses.
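Missing-value percentages like these can be computed per column with pandas. The following is only a minimal sketch: the helper name `missing_percentages`, the lowercase column names, and the tiny sample frame are assumptions for illustration, not the real dataset.

```python
import pandas as pd


def missing_percentages(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, largest first."""
    # isna() marks missing cells as True; the column mean of booleans
    # is the fraction of missing values in that column.
    return (df.isna().mean() * 100).round(2).sort_values(ascending=False)


# Made-up stand-in for the real data (hypothetical column names and values).
sample = pd.DataFrame({
    "county": [None, "King", None, None],
    "race": ["W", None, "B", "H"],
    "state": ["CA", "TX", "FL", "CA"],
})

missing = missing_percentages(sample)
print(missing)
```

On the real data, the same function would be applied to the full DataFrame after loading it, for example with `pd.read_csv`.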
I then moved on to demographic analysis. Most of the individuals involved in the incidents were male and primarily between the ages of 20 and 40, with a slight skew towards younger ages. Regarding race, White individuals were most frequently involved, followed by Black and Hispanic individuals.
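A demographic summary of this kind could be produced roughly as follows. This is a sketch under assumptions: the column names `gender`, `age`, and `race`, the age-bin edges, and the sample rows are all hypothetical.

```python
import pandas as pd

# Made-up rows standing in for the real records (hypothetical columns).
sample = pd.DataFrame({
    "gender": ["M", "M", "F", "M"],
    "age": [23, 35, 41, 29],
    "race": ["W", "B", "W", "H"],
})

# Share of each gender, as a percentage of non-missing entries.
gender_share = sample["gender"].value_counts(normalize=True) * 100

# Group ages into left-closed bins: [0, 20), [20, 40), [40, 60), [60, 120).
age_bins = pd.cut(sample["age"], bins=[0, 20, 40, 60, 120], right=False)
age_counts = age_bins.value_counts().sort_index()

# Number of incidents per race category, most frequent first.
race_counts = sample["race"].value_counts()

print(gender_share)
print(age_counts)
print(race_counts)
```

Note that `value_counts` skips missing values by default, so columns with many gaps, such as race, are summarized over the known entries only.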
While analyzing the geography, I plotted a bar chart of the number of incidents in each state and observed that the incidents are not uniformly distributed across states.
California (1,235 incidents), Texas (807 incidents), and Florida (559 incidents) have notably more incidents than other states.
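Per-state counts like these can be obtained with a single `value_counts` call. The sketch below assumes a column named `state` holding state abbreviations; the helper name `incidents_by_state` and the sample rows are made up for illustration.

```python
import pandas as pd


def incidents_by_state(df: pd.DataFrame, state_col: str = "state") -> pd.Series:
    """Count incidents per state, sorted from most to fewest."""
    return df[state_col].value_counts()


# Made-up rows standing in for the real records (hypothetical column name).
sample = pd.DataFrame({"state": ["CA", "CA", "CA", "TX", "TX", "FL"]})

counts = incidents_by_state(sample)
top_three = counts.head(3)
print(top_three)
```

With matplotlib installed, `counts.plot(kind="bar")` would draw the corresponding bar chart.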
I plan to conduct further detailed analysis and identify any correlations between these variables. I also plan to bring my questions to the professor in the upcoming class.