Nov 29, 2023

Hello,

In this section, I conducted a predictive analysis to forecast median housing prices using a Linear Regression model. Here’s a breakdown of the steps and results:

I defined the independent variables (predictors) and the dependent variable (target) for the regression model. The predictors included “Total Jobs,” “Unemployment Rate,” “Hotel Occupancy Rate,” and “Logan International Flights,” while the target variable was “Median Housing Price.”

The dataset was split into training and testing sets to evaluate the model’s performance. In this case, 70% of the data was allocated for training, and 30% for testing. The random_state parameter was set to 42 for reproducibility.

I created a Linear Regression model using the `LinearRegression` class from `sklearn.linear_model`. This model is used to predict the target variable based on the predictor variables.

The model was fitted with the training data using the `fit` method. This process involved learning the relationships between the predictor variables and the target variable.

To assess the model’s performance, I made predictions on the test set using the trained model. I calculated two key performance metrics:
– Mean Squared Error (MSE): A measure of the average squared difference between actual and predicted values. A lower MSE indicates better model performance.
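– R-squared (R²) Score: A measure of how much of the variance in the target variable the model explains. On the training data R² lies between 0 and 1, with higher values indicating a better fit; on a held-out test set it can be negative when the model predicts worse than simply using the mean.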

The calculated performance metrics are as follows:
– MSE: [MSE Value]
– R² Score: [R² Value]
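The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact notebook code: the data is synthetic, the column labels are taken from the indicator names in this post and are assumptions about how the real columns are named, and the printed numbers say nothing about the actual results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Boston dataset); column labels are assumed.
rng = np.random.default_rng(0)
n = 84  # e.g. seven years of monthly rows
df = pd.DataFrame({
    "Total Jobs": rng.normal(350_000, 10_000, n),
    "Unemployment Rate": rng.normal(0.04, 0.01, n),
    "Hotel Occupancy Rate": rng.uniform(0.6, 0.95, n),
    "Logan International Flights": rng.normal(3_500, 400, n),
})
# Invented linear relationship plus noise, only so the model has something to learn.
df["Median Housing Price"] = (
    1.2 * df["Total Jobs"]
    - 2_000_000 * df["Unemployment Rate"]
    + rng.normal(0, 5_000, n)
)

predictors = [
    "Total Jobs",
    "Unemployment Rate",
    "Hotel Occupancy Rate",
    "Logan International Flights",
]
X = df[predictors]
y = df["Median Housing Price"]

# 70% training / 30% testing, with a fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)      # learn coefficients from the training set
y_pred = model.predict(X_test)   # predict on data the model has not seen

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.2f}")
print(f"R^2: {r2:.3f}")
```

Swapping the synthetic frame for the real DataFrame (and the real column names) should be the only change needed.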

To visually assess the model’s performance, I created two plots:
– “Actual vs Predicted Median Housing Prices”: This scatter plot compares the actual median housing prices (y-axis) with the predicted prices (x-axis) for the test set. The dashed line represents perfect predictions.
– “Residuals of Predicted Median Housing Prices”: This scatter plot shows the residuals (differences between actual and predicted prices, y-axis) against the predicted prices (x-axis). The red dashed line at y=0 represents zero residuals.
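A rough sketch of how two such plots can be drawn with matplotlib is shown here. The actual and predicted prices are made-up numbers standing in for the test-set results, and the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up actual and predicted prices (placeholders for the real test-set values).
rng = np.random.default_rng(1)
y_test = rng.uniform(400_000, 650_000, 25)
y_pred = y_test + rng.normal(0, 15_000, 25)
residuals = y_test - y_pred  # actual minus predicted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))

# Actual (y-axis) vs predicted (x-axis); the dashed diagonal marks perfect predictions.
ax1.scatter(y_pred, y_test)
lo = min(y_test.min(), y_pred.min())
hi = max(y_test.max(), y_pred.max())
ax1.plot([lo, hi], [lo, hi], "k--")
ax1.set_xlabel("Predicted price")
ax1.set_ylabel("Actual price")
ax1.set_title("Actual vs Predicted Median Housing Prices")

# Residuals (y-axis) vs predicted (x-axis); the red dashed line marks zero residual.
ax2.scatter(y_pred, residuals)
ax2.axhline(0, color="red", linestyle="--")
ax2.set_xlabel("Predicted price")
ax2.set_ylabel("Residual (actual - predicted)")
ax2.set_title("Residuals of Predicted Median Housing Prices")

fig.tight_layout()
fig.savefig("regression_diagnostics.png")  # hypothetical output file name
```

Points close to the diagonal in the first plot, and residuals scattered evenly around zero in the second, are what a well-behaved linear fit tends to look like.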

These visualizations and performance metrics help evaluate the accuracy of the Linear Regression model in predicting median housing prices based on the selected economic indicators.

Nov 27, 2023

Hello,

To enhance the analysis, I converted the “Year” and “Month” columns into a single datetime column named “Date.” This conversion simplifies the time-based analysis and visualization of data trends.

I then plotted the trend of the unemployment rate in Boston from January 2013 to December 2019. The line graph provides a visual representation of how the unemployment rate changed over this period. It is evident that the unemployment rate experienced fluctuations during these years.

Additionally, I calculated the correlation between the unemployment rate and median housing prices. Correlation analysis helps us understand the relationship between these two variables. In this case, the correlation value quantifies the degree to which changes in the unemployment rate are associated with changes in median housing prices. This statistical measure provides valuable insights into the potential connections between economic indicators.
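To make the correlation step concrete, here is a small sketch that computes the Pearson coefficient from its definition and checks it against pandas. The numbers are illustrative only and the column labels are assumptions; I am also assuming the Pearson coefficient, which is the pandas default.

```python
import numpy as np
import pandas as pd

# Illustrative values only, not taken from the Boston dataset.
df = pd.DataFrame({
    "Unemployment Rate": [0.065, 0.060, 0.052, 0.045, 0.040, 0.036, 0.030],
    "Median Housing Price": [380_000, 400_000, 430_000, 470_000,
                             500_000, 540_000, 580_000],
})

x = df["Unemployment Rate"].to_numpy()
y = df["Median Housing Price"].to_numpy()

# Pearson r: co-variation of x and y divided by the product of their spreads.
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas computes the same quantity (Pearson is the default method).
r_pandas = df["Unemployment Rate"].corr(df["Median Housing Price"])

print(f"manual: {r_manual:.4f}, pandas: {r_pandas:.4f}")
```

A value near -1 means the two series move in opposite directions, near +1 that they move together, and near 0 that there is little linear association. It does not by itself say that one causes the other.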

Nov 24, 2023

Hello,

In this phase of my analysis, I delved into correlation analysis and predictive modeling, aiming to uncover relationships between economic indicators and forecast median housing prices. Let’s break down what I’ve accomplished:

Correlation Analysis: I began by selecting a subset of economic indicators, including “Logan Passengers,” “Hotel Occupancy Rate,” “Unemployment Rate,” “Median Housing Price,” and “Housing Sales Volume.” I then created a correlation matrix to visualize the relationships between these indicators. The correlation heatmap displayed the strength and direction of these relationships, showing which indicators tend to move together and which move in opposite directions.
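As a sketch of the matrix behind such a heatmap, the snippet below builds a pairwise correlation table with pandas. The data is randomly generated (with one relationship deliberately built in), and the column labels are assumed from the indicator names above.

```python
import numpy as np
import pandas as pd

# Randomly generated placeholder data; only the column labels mirror the post.
rng = np.random.default_rng(0)
n = 84
unemployment = rng.uniform(0.03, 0.07, n)
df = pd.DataFrame({
    "Logan Passengers": rng.normal(2.8e6, 3e5, n),
    "Hotel Occupancy Rate": rng.uniform(0.6, 0.95, n),
    "Unemployment Rate": unemployment,
    # A negative link to unemployment is built in on purpose for the demo.
    "Median Housing Price": 700_000 - 4_000_000 * unemployment
                            + rng.normal(0, 20_000, n),
    "Housing Sales Volume": rng.normal(600, 80, n),
})

# Pairwise Pearson correlations: a symmetric table with 1.0 on the diagonal.
corr = df.corr()
print(corr.round(2))
```

The resulting table can be handed to a heatmap function for display; seaborn's `heatmap` is one common choice.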

Predictive Modeling: My analysis also involved predictive modeling, specifically focused on forecasting median housing prices. I identified predictor variables, which included “Unemployment Rate,” “Logan Passengers,” and “Total Jobs,” while the target variable was “Median Housing Price.” The dataset was split into training and testing sets to evaluate the model’s performance.

I applied a Linear Regression model to predict median housing prices based on the selected predictor variables. The model was trained on the training data, and predictions were made on the testing data. I assessed the model’s performance using Mean Squared Error (MSE) and R-squared (R²) as key metrics. These metrics provide insights into how well the model predicts median housing prices based on the chosen economic indicators.

In summary, this phase of the analysis involved correlation analysis to understand the relationships between economic indicators and predictive modeling to forecast median housing prices. These insights can be invaluable for decision-making in economic planning and housing market assessments.

Nov 22, 2023

Hello,

In my analysis, I started by preparing the data for time series exploration. I created a new date column by combining the existing “Year” and “Month” columns and set it as the index of the DataFrame. This step was crucial for organizing the data in a time series format, making it possible to analyze how economic indicators change over time.
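A minimal sketch of this preparation step, assuming the columns are literally named “Year” and “Month” and using a few made-up rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2013, 2013, 2013, 2014],
    "Month": [1, 2, 3, 1],
    "Unemployment Rate": [0.066, 0.065, 0.063, 0.058],  # made-up values
})

# pd.to_datetime can assemble dates from year/month/day columns;
# the day is fixed to 1 because the data is monthly.
df["Date"] = pd.to_datetime(
    pd.DataFrame({"year": df["Year"], "month": df["Month"], "day": 1})
)
df = df.set_index("Date").sort_index()

# With a DatetimeIndex, rows can be selected with a partial date string.
rows_2013 = df.loc["2013"]
print(rows_2013)
```

Setting the date as the index is what lets later steps slice by year or plot each indicator directly against time.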

After the data preparation, I focused on trend analysis for several key economic indicators. These indicators included “Logan Passengers,” “Logan International Flights,” “Hotel Occupancy Rate,” “Hotel Average Daily Rate,” “Total Jobs,” “Unemployment Rate,” “Median Housing Price,” and “Housing Sales Volume.” My objective was to visualize the trends in these economic factors over time.

I presented the trend analysis results through a series of line plots. Each plot represented one economic indicator, with time on the x-axis and the indicator's values on the y-axis. This visualization made it possible to observe how these economic variables evolved over the years.

My analysis provided valuable insights into the long-term trends of these economic factors, which can be instrumental in making informed decisions related to economic planning, tourism strategies, and housing market assessments.

Nov 20, 2023

This dataset, called “the economic indicator,” has information about different economic factors, sorted by year and month. Here’s an easy explanation of what each part means:

– Year and Month: When the data was collected.
– Logan Passengers: How many people used Logan Airport.
– Logan Intl Flights: Number of international flights at Logan Airport.
– Hotel Occupancy Rate: How full hotels were.
– Hotel Average Daily Rate: Average cost per day to stay in a hotel.
– Total Jobs: The total number of jobs available.
– Unemployment Rate: The percentage of people without jobs.
– Labor Force Participation Rate: Percentage of people working or looking for work.
– Pipeline Unit: Number of units in housing or building projects that are in the development pipeline.
– Pipeline Total Development Cost: How much it costs to build these projects.
– Pipeline Square Footage: The total size of these building projects.
– Pipeline Construction Jobs: How many jobs are created for building these projects.
– Foreclosure Petitions: Number of foreclosure petitions filed, which is the first step a lender takes to start a foreclosure.
– Foreclosure Deeds: Number of foreclosure deeds recorded, meaning foreclosures that were completed.
– Median Housing Price: The middle price for homes.
– Housing Sales Volume: How many houses were sold.
– New Housing Construction Permits: Number of permissions given to build new houses.
– New Affordable Housing Permits: Number of permissions for building affordable houses.

This dataset gives a good overview of different parts of the economy like air travel, hotels, jobs, real estate, and the housing market. It helps understand the financial health and trends of a specific area.

Nov 17, 2023

Hi,

In today’s analysis, I compared time series models to help pick the best one for predicting future data. Different models work better for different kinds of data. Here’s a simple comparison:

1. ARIMA: A good general-purpose choice for data with a trend, which it handles through differencing, but it does not model seasonal patterns (like sales that rise and fall with the seasons). Best for non-seasonal data.

2. SARIMA: Like ARIMA, but better for data with seasonal patterns (like higher ice cream sales in summer). It’s a bit complicated to use.

3. Exponential Smoothing: Easy to use and good for data that has trends and patterns that repeat every year. Not great if the data changes unexpectedly.

4. VAR (Vector Autoregression): Great for when you have several related series and want to see how they affect each other. Needs all of the series to be stationary and can take a lot of computer power.

5. LSTM (Long Short-Term Memory): Really good for big datasets and can understand complicated patterns. Needs a lot of data and computer power to work well.

6. Prophet (by Facebook): Made for business data that’s recorded every day. It’s good at handling special days like holidays. Not as good for data that’s not daily or very messy.

In the end, the best model often comes from trying a few and seeing which one predicts the best for your specific data. By Monday we will start analyzing our dataset.
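The "try a few and compare" idea can be sketched with a simple holdout test. To keep the example dependency-free, it uses two hand-written stand-ins rather than the library models listed above: a level-only simple exponential smoothing (not the trend/seasonal variant) and a seasonal naive forecast. The series is synthetic, and the `alpha` value is an arbitrary assumption.

```python
import numpy as np

# Synthetic monthly series: linear trend plus a 12-month seasonal cycle (no noise).
t = np.arange(84)
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)

# Hold out the final 12 months for evaluation.
train, test = y[:-12], y[-12:]

def ses_forecast(series, alpha, horizon):
    """Level-only exponential smoothing: smooth a level, then forecast it flat."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return np.full(horizon, level)

def seasonal_naive_forecast(series, season, horizon):
    """Repeat the last observed season for as long as the horizon requires."""
    return np.resize(series[-season:], horizon)

def mae(actual, predicted):
    """Mean absolute error between two equal-length arrays."""
    return float(np.mean(np.abs(actual - predicted)))

scores = {
    "simple exponential smoothing": mae(test, ses_forecast(train, 0.3, 12)),
    "seasonal naive": mae(test, seasonal_naive_forecast(train, 12, 12)),
}
best = min(scores, key=scores.get)
print(scores)
print("lowest holdout error:", best)
```

On this strongly seasonal series the seasonal method wins, which matches the point in the list: the right model depends on whether the data has seasonality. The same holdout loop would work with ARIMA, SARIMA, or Prophet forecasts in place of the stand-ins.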

Nov 15, 2023

Hello,

Today we searched the website “data.boston.gov” and plan to use its dataset of economic indicators for Boston.

The primary objective of this project is to delve into the intricacies of Boston’s economy, using the economic indicators dataset spanning from January 2013 to December 2019. The goal is to gain a thorough understanding of Boston’s economic pulse, essential for informed decision-making and strategic planning.

Nov 13, 2023

Hi,

In today’s class I learned about time series analysis. It’s commonly used to forecast future events based on past trends, identify patterns, and analyze the effects of certain decisions or events. The key components and methods covered in class were:

  1. Trend Analysis: This involves identifying the underlying trend in the data, which could be increasing, decreasing, or constant over time.
  2. Seasonality: This refers to patterns that repeat at regular intervals, such as weekly, monthly, or yearly. Seasonality analysis helps in understanding and adjusting for these regular patterns.
  3. Stationarity: A time series is stationary if its statistical properties like mean, variance, and autocorrelation are constant over time. Many time series models require the data to be stationary.
  4. Models for Time Series Analysis: Common models include ARIMA (Autoregressive Integrated Moving Average), SARIMA (Seasonal ARIMA), and more advanced machine learning models like LSTM (Long Short-Term Memory) networks.
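To illustrate trend and stationarity from the list above, here is a rough sketch on a synthetic series. It compares the mean of the first and second halves before and after first-order differencing. This is only an informal check under made-up data, not a formal stationarity test such as the augmented Dickey-Fuller test.

```python
import numpy as np

# Synthetic series: a steady upward trend plus random noise.
rng = np.random.default_rng(0)
t = np.arange(200)
series = 0.5 * t + rng.normal(0, 1, t.size)

def half_means(values):
    """Return the mean of the first half and of the second half."""
    mid = len(values) // 2
    return values[:mid].mean(), values[mid:].mean()

# A trending series has a mean that drifts over time, so it is not stationary.
raw_first, raw_second = half_means(series)

# First-order differencing (value minus previous value) removes a linear trend.
diffed = np.diff(series)
diff_first, diff_second = half_means(diffed)

print(f"raw halves:         {raw_first:.2f} vs {raw_second:.2f}")
print(f"differenced halves: {diff_first:.2f} vs {diff_second:.2f}")
```

The raw halves differ by a wide margin while the differenced halves are nearly equal, which is the idea behind the "Integrated" part of ARIMA.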

We also had a chance to look at the economic indicators data and learn about time series analysis in practice. By the next class, I will select a dataset from data.boston.gov to discuss.

Report Making

Hello,

We have started working on the project report. As of today, we have finalized the issues, discussion, and results sections, and we plan to complete the remaining sections by tomorrow.

Analyzing Trends in Flee Statuses in Fatal Police Encounters

Hello!!

Today, I analyzed flee statuses in fatal police encounters. I felt it was important to understand how people act in these situations, especially whether they try to run away.

Monthly Analysis of 2022: In 2022, the flee statuses in fatal police encounters showed varied patterns. The analysis categorizes flee statuses into four types: car, foot, not fleeing, and other. For example, in January, there were 5 incidents of fleeing by car, 19 by foot, 40 where the individual did not flee, and 1 ‘other’. Throughout the year, the ‘not fleeing’ category consistently had the highest number of incidents each month, although the counts varied. The occurrences of fleeing by car and by foot also fluctuated, while the ‘other’ category, though the least frequent, varied between 1 and 6 incidents per month. A detailed line chart provides a visual representation of these monthly trends, highlighting the fluctuating nature of flee statuses throughout the year.

Yearly Analysis: The yearly trend offers a broader perspective. A line chart plotting the annual data from 2015 onwards indicates that the majority of individuals in these fatal encounters did not attempt to flee, with this trend showing a slight decrease over the years. The incidents of fleeing by car and foot exhibit slight variations but remain relatively consistent annually. The ‘other’ category of fleeing remains the least common across the years.
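The monthly and yearly counts can be tabulated along these lines. This is a sketch on a tiny made-up sample: the column names `date` and `flee_status` and the category labels are assumptions, and the counts have nothing to do with the real figures quoted above.

```python
import pandas as pd

# Tiny made-up sample; column names and category labels are assumptions.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-01-03", "2022-01-15", "2022-01-20", "2022-02-02",
        "2022-02-11", "2022-02-25", "2021-07-04", "2021-11-30",
    ]),
    "flee_status": ["not", "foot", "not", "car", "not", "other", "not", "foot"],
})

# Monthly table for 2022: one row per month, one column per flee status.
in_2022 = df[df["date"].dt.year == 2022]
monthly = pd.crosstab(in_2022["date"].dt.strftime("%Y-%m"), in_2022["flee_status"])

# Yearly table over all years: one row per year.
yearly = pd.crosstab(df["date"].dt.year, df["flee_status"])

print(monthly)
print(yearly)
```

Either table can be turned into a line chart with one line per flee status, for example with the DataFrame's `plot` method.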

The analysis of flee statuses in fatal police encounters highlights crucial aspects of these incidents. While most individuals did not flee, those who did mostly fled by car or on foot, with slight year-to-year changes. These insights are valuable for understanding the nature of fatal police encounters and can be used in the project. I have a few questions about the findings, which I intend to ask the professor in the next class.

 

An Analytical view into Fatal Police Shootings by Agency

Hello,

Today I performed analysis on police shooting data to determine the frequency of fatal shootings involving specific police agencies.

A detailed analysis of the data from 2015 to the present provides a clearer picture of how these tragic events are distributed across different law enforcement agencies. For instance, the Los Angeles Police Department (LAPD) stands out with the highest number of fatal shootings, a statistic that prompts a deeper examination of the protocols and community engagement specific to the LAPD. A bar chart shows the top 10 agencies by total fatal police shootings.

 

Another important aspect of police shootings is their intersection with mental health. The analysis reveals significant variation among agencies in the proportion of incidents related to mental illness. The Las Vegas Metropolitan Police Department, for example, shows a notably higher rate of mental illness-related incidents, which could reflect the challenges police face when encountering individuals in mental health crises. The second graph is a bar chart of the top 10 police agencies by the percentage of mental illness-related shootings.

For the third visualization, I created a scatter plot that explores the relationship between the use of body cameras and the total number of fatal police shootings by agency. It highlights how the deployment of body cameras might relate to police interactions and the transparency of such critical incidents.
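The per-agency figures behind these three charts could be computed roughly as follows. The agency names and rows are invented, and the `agency` column name is an assumption (the real data may identify agencies differently); `was_mental_illness_related` and `body_camera` are assumed to be boolean columns.

```python
import pandas as pd

# Invented rows; "agency" is a hypothetical column name.
df = pd.DataFrame({
    "agency": ["Agency A"] * 4 + ["Agency B"] * 2 + ["Agency C"] * 2,
    "was_mental_illness_related": [True, False, False, False,
                                   True, True, False, False],
    "body_camera": [True, True, False, False,
                    False, False, True, False],
})

# One row per agency: incident count, plus the share of incidents flagged
# as mental illness-related and the share recorded by a body camera.
summary = df.groupby("agency").agg(
    shootings=("body_camera", "size"),
    pct_mental_illness=("was_mental_illness_related", "mean"),
    pct_body_camera=("body_camera", "mean"),
)
summary["pct_mental_illness"] = summary["pct_mental_illness"] * 100
summary["pct_body_camera"] = summary["pct_body_camera"] * 100

# Rankings for the two bar charts.
top_by_count = summary.sort_values("shootings", ascending=False).head(10)
top_by_mental = summary.sort_values("pct_mental_illness", ascending=False).head(10)

print(top_by_count)
```

The `pct_body_camera` and `shootings` columns are the two axes a body camera scatter plot would need. Percentages for agencies with very few incidents are unstable, so a minimum-count filter may be worth adding before ranking.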

Together, the analysis and graphs provide a data-driven narrative about police practices and the factors influencing fatal police encounters.

I have a few questions about the findings, which I intend to ask the professor in the next class.

Exploring Racial Differences in Fatal Police Shootings

Hello,

In today’s analysis I performed a correlation heat map analysis for each racial group to gain insights into how various factors are interrelated in incidents of fatal police shootings, with a particular focus on how these relationships vary across groups. My findings are as follows:

Asian (A)
– There is a positive correlation between latitude and longitude, suggesting that incidents involving Asian individuals tend to occur in specific geographical regions.
– The age variable shows very little correlation with other variables, indicating that age does not play a significant role in these incidents for Asian individuals.
– The was_mental_illness_related variable shows a slight negative correlation with body_camera, suggesting that incidents involving mental illness are less likely to be recorded by body cameras.

White (W)
– Similar to the Asian group, there is a positive correlation between latitude and longitude.
– The age variable shows a slight negative correlation with latitude and longitude, suggesting that incidents involving older white individuals might occur in different geographical areas compared to younger individuals.
– The was_mental_illness_related variable has very little correlation with other variables.

Hispanic (H)
– Again, there is a positive correlation between latitude and longitude.
– The age variable shows a slight negative correlation with latitude, longitude, and was_mental_illness_related.
– The body_camera variable shows very little correlation with other variables.

Black (B)
– The positive correlation between latitude and longitude is present, though slightly weaker compared to other racial groups.
– The age variable shows a slight negative correlation with latitude, longitude, and was_mental_illness_related.
– The body_camera variable shows a slight positive correlation with was_mental_illness_related.

Other (O)
– The positive correlation between latitude and longitude is weaker compared to other racial groups.
– The age variable shows very little correlation with other variables.
– The was_mental_illness_related variable shows a slight negative correlation with body_camera.

Native American (N)
– The correlation between latitude and longitude is weaker compared to other racial groups.
– The age variable shows very little correlation with other variables.
– The body_camera variable shows a slight negative correlation with was_mental_illness_related.

Black and Hispanic (B;H)
– This group has a very limited number of data points, and as such, the correlations should be interpreted with caution.
– The age variable shows a slight negative correlation with latitude and longitude.
– The was_mental_illness_related variable shows a slight positive correlation with body_camera.

General Observations
– Across all racial groups, there is a consistent positive correlation between latitude and longitude.
– The age variable generally shows little to no correlation with other variables.
– The relationship between was_mental_illness_related and body_camera varies across racial groups, indicating potential areas for further investigation.

These observations provide a starting point for further analysis and discussion. It is important to approach these findings with a critical eye and consider additional factors and context that might influence these relationships.
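For reference, the per-group correlation matrices behind the heat maps can be computed along these lines. The rows below are invented and far too few to mean anything, and the `race` column name is an assumption; the other column names follow the variables discussed above.

```python
import pandas as pd

# Invented rows for two groups; real analysis needs far more data per group.
df = pd.DataFrame({
    "race": ["A"] * 4 + ["W"] * 4,
    "age": [25, 40, 31, 52, 60, 22, 35, 47],
    "latitude": [34.0, 36.5, 40.7, 47.6, 29.8, 33.4, 39.1, 41.9],
    "longitude": [-118.2, -115.1, -74.0, -122.3, -95.4, -112.1, -84.5, -87.6],
    "was_mental_illness_related": [True, False, False, True,
                                   False, True, False, False],
    "body_camera": [False, False, True, True,
                    False, True, True, False],
})

cols = ["age", "latitude", "longitude",
        "was_mental_illness_related", "body_camera"]

# Booleans are cast to 0/1 so they can enter a Pearson correlation.
numeric = df[cols].astype(float)
numeric["race"] = df["race"]

# One correlation matrix per racial group.
corr_by_race = {
    race: group[cols].corr()
    for race, group in numeric.groupby("race")
}

print(corr_by_race["A"].round(2))
```

Each matrix in `corr_by_race` can then be drawn as its own heat map. Groups with very few rows, like B;H above, will produce unstable or undefined correlations, which is why those results should be read with caution.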