Daily post 11-10-2023

In our decision tree analysis, we aimed to predict individual behavior across a range of incident scenarios, including cases in which people did not flee and others in which they fled by vehicle, on foot, or by some other means. With an accuracy of roughly 67%, the model predicted the correct outcome in most instances. This level of accuracy reflects the model’s overall effectiveness at distinguishing among behavioral responses across various incident settings.

A closer examination of the confusion matrix, however, revealed a number of misclassifications. The model correctly predicted 676 fleeing episodes and correctly identified 37 cases in which people did not flee, but it also misclassified several cases. Notably, in 125 cases the model predicted fleeing when it did not actually occur, 33 cases were mistakenly predicted as fleeing on foot, and 136 cases were inaccurately labeled as fleeing by car. These misclassifications highlight specific parts of the model that need improvement in order to increase its reliability and predictive accuracy.
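For reference, below is a minimal sketch of how such a decision tree and its confusion matrix could be produced with scikit-learn; the feature and target column names ("flee_status", "armed_with", "age", "race") are assumptions about the dataset rather than the exact ones used.

```python
# Minimal sketch: decision-tree classification of flee status plus a confusion matrix.
# Column names are assumptions about the Washington Post dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

df = pd.read_csv("fatal-police-shootings-data.csv")
df = df.dropna(subset=["flee_status", "armed_with", "age", "race"])

# Encode categorical features and the target as integers for the tree.
features = df[["armed_with", "age", "race"]].copy()
for col in ["armed_with", "race"]:
    features[col] = LabelEncoder().fit_transform(features[col])
target = LabelEncoder().fit_transform(df["flee_status"])

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(max_depth=5, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))
```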

October 27th 2023

The task for today was to write a Python script to analyze an Excel dataset, with the main goal of counting unique words within designated columns. The procedure started by importing the necessary libraries: Pandas for data manipulation and the Counter class for word-frequency computations. The file path of the Excel document was supplied, and the columns to be examined were specified in a list to keep the analysis flexible. The data from the Excel file was then loaded into a Pandas DataFrame for further processing. An empty dictionary was initialized to track word counts, and the code looped over the designated columns, converting their contents into strings.

Each column’s text was then tokenized into words, and each word’s frequency was tallied and entered into the dictionary. The last step was printing the word counts for each column, displaying the column name, the unique words, and their frequencies. This script is a flexible tool for text analysis of selected Excel dataset columns, producing an organized, detailed output that can feed further analytical insights.
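A sketch of the script described above might look like the following; the file path and column names are placeholders.

```python
# Sketch of the word-count script; the Excel path and column names are placeholders.
import pandas as pd
from collections import Counter

excel_path = "dataset.xlsx"              # assumed path to the Excel file
columns_to_analyze = ["col_a", "col_b"]  # assumed column names

df = pd.read_excel(excel_path)

# One Counter per column, keyed by column name.
word_counts = {}
for col in columns_to_analyze:
    text = " ".join(df[col].dropna().astype(str))
    words = text.lower().split()
    word_counts[col] = Counter(words)

# Print each column's unique words and their frequencies.
for col, counts in word_counts.items():
    print(f"Column: {col}")
    for word, freq in counts.most_common():
        print(f"  {word}: {freq}")
```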

October 25th 2023

Large-scale data on fatal police shootings from The Washington Post’s database is available for analysis using statistical methods like p-tests. In this piece, I’ll go into how p-tests could be used to draw conclusions from this dataset. P-tests (hypothesis tests that produce a p-value) are used to establish whether an observed difference between groups is statistically significant or more likely the result of chance. P-tests could be used in the following situations:

- Evaluating racial differences in per-capita shooting rates, for example between Black and White victims. A significant p-value would support the existence of a true difference.
- Analyzing the armed status of victims in relation to situational factors such as location, mental illness, and flight status. Significant interactions can be detected with p-tests.
- Examining patterns across time: based on p-values, are quarterly or year-over-year changes in shooting rates significant?
- Evaluating racial variation in the mean age of victims. A low p-value would indicate that the age difference is significant.

A significance level (often 0.05) and a computed p-value allow researchers to draw statistical inferences about observed differences. A significant p-value leads to rejecting the null hypothesis that there is no true difference between the groups. P-testing offers a straightforward way to perform thorough statistical analyses of the Washington Post data. It goes beyond simple description to explicitly test hypotheses about variables such as race, mental health, armed status, age, and geography, and to draw conclusions grounded in the data. This enables a clearer understanding of the patterns behind police shootings.
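As an illustration of one of these tests, the sketch below runs a two-sample t-test on mean victim age by race with SciPy; the column names and race codes are assumptions about the dataset.

```python
# Sketch: two-sample t-test on mean victim age by race (column names and codes assumed).
import pandas as pd
from scipy import stats

df = pd.read_csv("fatal-police-shootings-data.csv")

black_ages = df.loc[df["race"] == "B", "age"].dropna()
white_ages = df.loc[df["race"] == "W", "age"].dropna()

# Welch's t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(black_ages, white_ages, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Reject the null hypothesis of equal mean ages if p < 0.05.
if p_value < 0.05:
    print("The age difference between groups is statistically significant.")
```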

October 23rd 2023

My work in thoroughly analyzing crime and statistical data to understand the influence of an individual’s environment on their propensity to engage in criminal activity, along with my research on race-related data in the context of policing and criminal interactions, is highly relevant to addressing social justice and equity issues. Below is a summary of the main points of this research:

Analysis of the Environment and Criminal Behavior:

Examining living situations, community dynamics, and socioeconomic factors: These elements have a large impact on a person’s propensity to commit crimes. By looking into these areas, I hope to find the underlying reasons for criminal conduct and perhaps develop intervention and prevention plans.

Analysis of Data Concerning Race:

Disproportionate impacts on racial groups in law enforcement incidents: The goal of my research is to identify patterns and trends pertaining to racial differences in interactions with police, specifically in shootings and use-of-force incidents.

Factors that are responsible for these incidents: In order to address and resolve these problems, it is essential to comprehend the fundamental causes of these differences. Examining elements like racism, ties within the community, and policing tactics may fall under this category.

Understanding Police-Community Relations:

Examining the reactions of people from various racial backgrounds to interactions with the police: This component of the research is key to understanding the dynamics of police encounters. It may help determine whether individuals from different racial backgrounds perceive and respond to law enforcement differently.

Contribution to Equity and Social Justice:

My study attempts to shed more light on why members of particular racial groups are more likely to be shot by police. This knowledge can make a substantial contribution to the broader conversation about fairness and social justice.

This work may help reduce racial inequities and improve interactions between law enforcement and communities by bringing these concerns to light and informing policy reforms, community initiatives, and law enforcement tactics.

It is essential to keep the data collection and analysis procedures transparent and rigorous in order to conduct this research successfully. Furthermore, disseminating the results to a larger audience, such as the public, community organizations, and policymakers, can encourage constructive change and advance a just and equitable society. This type of research is valuable for addressing pressing societal issues and promoting positive change in law enforcement and community dynamics.

October 20th 2023

The comprehensive dataset on fatal police shootings from The Washington Post provides an invaluable chance to delve deeply into the characteristics of the victims and their relationship to these instances.

To understand the age distribution, I first compute summary statistics and make histograms. The histogram shows an age distribution that is skewed to the right, indicating that the majority of victims are between the ages of 20 and 40, with fewer older victims. Given that the median and mean ages are in the 30s, it is clear that many victims are young adults. Compared with census data, the age distribution of those who have been killed skews disproportionately young.

When we break age down by race, we find some noteworthy variations. Black victims are, on average, nearly five years younger than White victims. A density curve fit to the age distribution by race shows White victims peaking in their 30s, while Black victims more typically fall in their 20s. Formal statistical tests can help evaluate the significance of this age disparity.
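A sketch of how these summary statistics and histograms could be produced is shown below; the column names and race codes are assumptions about the dataset.

```python
# Sketch: summary statistics and age histograms, overall and by race (column names assumed).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("fatal-police-shootings-data.csv")
ages = df["age"].dropna()

print(ages.describe())                   # mean, median (50%), and quartiles
print(df.groupby("race")["age"].mean())  # mean age per racial group

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(ages, bins=30)
axes[0].set_title("Age distribution (all victims)")

for race_code in ["B", "W"]:             # assumed codes for Black and White victims
    axes[1].hist(df.loc[df["race"] == race_code, "age"].dropna(),
                 bins=30, alpha=0.5, label=race_code)
axes[1].set_title("Age distribution by race")
axes[1].legend()
plt.tight_layout()
plt.show()
```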

Turning to race, the data show that roughly a quarter of the victims are Black, despite Black Americans making up only about 13% of the population. Furthermore, more than half of unarmed victims are people of color. This points to a concerning racial disparity that warrants further, rigorous statistical testing; even without those additional tests, the descriptive statistics are concerning on their own.
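The proportions mentioned above could be computed along these lines; the column names and the "unarmed" label are assumptions about the dataset.

```python
# Sketch: share of victims by race, overall and among unarmed victims (labels assumed).
import pandas as pd

df = pd.read_csv("fatal-police-shootings-data.csv")

print(df["race"].value_counts(normalize=True))        # share of all victims by race

unarmed = df[df["armed_with"] == "unarmed"]            # "unarmed" label is an assumption
print(unarmed["race"].value_counts(normalize=True))    # share of unarmed victims by race
```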

This kind of exploration of the Washington Post dataset gives us important insights into initial relationships and demographic patterns. The descriptive analysis lays the groundwork for more sophisticated analyses that formally assess correlations and causal factors using statistical techniques such as regression, predictive modeling, and hypothesis testing.

October 18th 2023

In the next phase of my research, I used geospatial data and relevant libraries to create a visual representation of police shooting incidents across the US. To do this, I took the dataset’s “latitude” and “longitude” attributes and carefully filtered out any null values in these columns. To create a geospatial scatter plot, an accurate geographic map of the United States was generated and then overlaid with recognizable red markers. The resulting visualization presents a spatially accurate depiction of where these incidents occurred and provides important insight into their distribution across the nation. Plotting the incidents on a map makes it easy to identify clusters of police shootings, which aids in understanding local patterns.
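One way to produce such a plot is sketched below using Plotly Express; the column names are assumed, and other mapping libraries would work equally well.

```python
# Sketch: geospatial scatter plot of incident locations (latitude/longitude columns assumed).
import pandas as pd
import plotly.express as px

df = pd.read_csv("fatal-police-shootings-data.csv")
df = df.dropna(subset=["latitude", "longitude"])   # drop rows with missing coordinates

fig = px.scatter_geo(
    df,
    lat="latitude",
    lon="longitude",
    scope="usa",                                   # restrict the basemap to the United States
    title="Fatal police shootings across the US",
)
fig.update_traces(marker=dict(color="red", size=4))  # red markers, as described above
fig.show()
```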

Policymakers, scholars, and the general public can now view the scatter plot for each state in the US, giving them important new insight into the geographic scope of police shootings. This could lead to better-informed conversations and motivate action on this important problem. As an example, I created a comparable geospatial scatter plot for the state of Massachusetts, shown below.

Following our professor’s advice from the previous session, I plan to continue my analysis by investigating clustering methods and diving into GeoHistograms. In particular, our professor presented two different clustering techniques, K-Means and DBSCAN. It’s important to remember that K-Means requires predefining the value of K, which may be a constraint. My objective is to apply both methods to the geographic locations in the shooting data in Python and assess whether they produce meaningful clusters. This stage is expected to reveal additional layers of information and trends in the dataset, advancing our understanding of this important problem.

October 16th 2023

In my current data science project, I’ve successfully combined the advantages of clustering with GeoPy to extract a wealth of information about the geospatial features of my dataset. GeoPy, a robust Python package, has proven invaluable for accurately geocoding large datasets by translating addresses into precise latitude and longitude coordinates. This geocoding step is essential because it makes data visualization on geographic plots possible and provides a spatial context for the patterns and trends that are observed. I’ve applied clustering methods to this geocoded data using Python’s extensive libraries; specifically, I’ve used scikit-learn’s K-Means clustering to group related data points based on their geospatial features. The results have been immensely illuminating.

GeoPy’s Contribution: By employing GeoPy for accurate geocoding of my datasets, I was able to precisely plot data points on maps, such as a map of the United States of America.
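A minimal example of the kind of geocoding GeoPy performs is sketched below; the address string is a placeholder.

```python
# Minimal GeoPy geocoding example; the address string is a placeholder.
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="shooting-analysis")   # user_agent is required by Nominatim
location = geolocator.geocode("Boston, MA")

if location is not None:
    print(location.latitude, location.longitude)
```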

Cluster Analysis: I found different clusters by combining GeoPy with DBSCAN, a density-based clustering algorithm, and using K-Means clustering. This allowed me to gain important geospatial insights.
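A sketch of how K-Means and DBSCAN could be applied to the incident coordinates is shown below; the parameter values are illustrative rather than the ones actually used.

```python
# Sketch: K-Means and DBSCAN clustering of incident coordinates (parameter values illustrative).
import pandas as pd
from sklearn.cluster import KMeans, DBSCAN

df = pd.read_csv("fatal-police-shootings-data.csv")
coords = df[["latitude", "longitude"]].dropna().copy()

# K-Means requires choosing the number of clusters K up front.
kmeans = KMeans(n_clusters=5, random_state=42, n_init=10)
coords["kmeans_cluster"] = kmeans.fit_predict(coords[["latitude", "longitude"]])

# DBSCAN infers the number of clusters from density; eps is in degrees here.
dbscan = DBSCAN(eps=0.5, min_samples=10)
coords["dbscan_cluster"] = dbscan.fit_predict(coords[["latitude", "longitude"]])

print(coords["kmeans_cluster"].value_counts())
print(coords["dbscan_cluster"].value_counts())   # label -1 marks noise points
```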

Project Results:

Geospatial Customer Segmentation: By grouping customer information, I was able to identify distinct customer segments according to where they were located. This provided important information about local preferences and habits, which in turn informs focused marketing tactics.

Trend Identification: By identifying regions of increased activity or interest, clustering provides insight into geospatial trends. Such realizations are essential for making well-informed decisions that direct the distribution of resources and plans for growth.

Our lecturer recently introduced us to GeoPy in class and demonstrated its geocoding capabilities by plotting data points on a map of the United States. We looked specifically at California, where we investigated the relationship between crime rates and shootings. Through this activity we learned about clustering algorithms, with a focus on the DBSCAN approach. The conversation then turned to questions such as the relationship between shootings and crime rates, taking into account variables like the level of criminality in areas with different crime rates. This class session deepened our understanding of GeoPy and of clustering’s potential applications, generating thought-provoking questions and lively debate.

October 13th 2023

Today, my focus was on studying Analysis of Variance (ANOVA), a potent statistical instrument that is widely used to compare the means of two or more groups within a dataset. Finding out if there are any notable variations between these group means is its main goal. ANOVA’s technique entails examining the variance within each group and comparing it to the variance across groups. An ANOVA appropriately indicates that the mean of at least one group differs significantly from the others if the variation between groups noticeably exceeds the variation within groups.

This statistical test is essential in many fields, including the social sciences, quality assurance, and scientific research. ANOVA gives researchers the ability to determine the statistical significance of observed differences by providing a p-value. When the p-value is less than a predefined cutoff, usually 0.05, it indicates that the observed differences are unlikely to be the result of chance and calls for additional investigation.

ANOVA can take various forms. Two of the most common are one-way ANOVA, which compares groups within a single factor, and two-way ANOVA, which evaluates the effects of two independent factors. The results obtained using ANOVA serve as decision-making benchmarks, enabling analysts and researchers to draw meaningful inferences and make informed decisions in their domains.
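As a sketch of how a one-way ANOVA could be run on this project’s data, the example below compares mean victim age across racial groups with SciPy; the column names are assumptions about the dataset.

```python
# Sketch: one-way ANOVA comparing mean victim age across racial groups (column names assumed).
import pandas as pd
from scipy import stats

df = pd.read_csv("fatal-police-shootings-data.csv")

# Build one age sample per racial group, keeping groups with at least 30 observations.
groups = [
    grp["age"].dropna()
    for _, grp in df.groupby("race")
    if grp["age"].dropna().size >= 30
]

f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# p < 0.05 suggests at least one group's mean age differs from the others.
```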

October 11th 2023

Project 2 thoroughly examines two separate datasets, “fatal-police-shootings-data” and “fatal-police-shootings-agencies,” each created to meet particular analytical goals. The first dataset, “fatal-police-shootings-data,” has 8770 rows and 19 columns covering the period from January 2, 2015, to October 7, 2023. The presence of missing values in critical columns, including threat category, flee status, and location information, must be noted. Notwithstanding these gaps, this dataset provides a wealth of information about fatal police shootings, covering important details such as threat levels, types of weapons used, demographic data, and more.

The “fatal-police-shootings-agencies” dataset, on the other hand, consists of 3322 rows and six columns. As with the first dataset, there are several missing data points, particularly in the column labeled “oricodes.” This dataset aims to provide information about law enforcement agencies, including their names, identifiers, types, locations, and their involvement in fatal shootings by police officers.
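A sketch of how both files could be loaded and their missing values inspected is shown below; the file names are assumed from the dataset names above.

```python
# Sketch: load both datasets and inspect their shape and missing values (file names assumed).
import pandas as pd

shootings = pd.read_csv("fatal-police-shootings-data.csv")
agencies = pd.read_csv("fatal-police-shootings-agencies.csv")

print(shootings.shape)   # expected: rows x 19 columns
print(agencies.shape)    # expected: rows x 6 columns

# Count missing values per column to see where the gaps are.
print(shootings.isnull().sum().sort_values(ascending=False).head(10))
print(agencies.isnull().sum())
```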

To glean insightful information and reach well-informed conclusions, it is essential to take the context into account and formulate targeted questions aligned with the analysis’s goals. These datasets offer an invaluable opportunity to investigate fatal police shootings, the law enforcement agencies involved, and the complex interactions between the variables.