# Load in the datasets
fatality <- read.csv("data/PoliceKillingsUS.csv", na.strings = "")
median_income <- read.csv("data/MedianHouseholdIncome2015.csv")
poverty_perc <- read.csv("data/PercentagePeopleBelowPovertyLevel.csv")
hs_perc <- read.csv("data/PercentOver25CompletedHighSchool.csv")
race_perc <- read.csv("data/ShareRaceByCity.csv")
city_pop <- read.csv("data/us-cities-top-1k-multi-year.csv")
state_pop <- read.csv("data/Population_Estimate_data_Statewise_2010-2023.csv")
RoboCops
Analysis of Fatal Police Shootings
Introduction
The police shooting data set is sourced from a Kaggle dataset originally created by Karolina Wullum. The data, consisting of shootings for all fifty states and DC, was originally logged by The Washington Post after the Black Lives Matter movement was born from the shooting of Michael Brown in 2014. The dataset contains information on fatal shootings that occurred from 2015 to 2017 and highlights information on fatal cases as well as the demographics of the city the individuals were shot in. The goal of this project is to contextualize police shootings to further understand how and why these unfortunate situations occur.
Why the Police Killings dataset?
Police shootings are a sad reality of policing practices that date back decades. While many are justified to protect the lives of innocent civilians, there has been an uptick in recent years of police shooting people whom they perceive to be a threat in the heat of the moment, but who are proven innocent after the fact. These shootings can often be hard to analyze because many psychological factors come into play when an officer decides to discharge their weapon, such as their training, the perceived threat, and the officer’s internalized prejudice. Without further information on the psyche of a police officer, the reason why they discharge their weapon cannot be determined. However, their actions and surroundings can be analyzed to provide insight into how and why a shooting occurs.
Analysis Goals
Our analysis aims to understand the dataset on police killings, extending its scope beyond the original focus tracked by The Washington Post. We plan to include socio-economic factors such as household income, poverty rates, and population demographics. While the dataset was initially associated with the Black Lives Matter movement, our goal is to explore how socio-economic factors intersect with incidents of police violence.
By incorporating these additional factors, we hope to uncover correlations and patterns that provide insight into the broader societal context surrounding police use of force. Through data visualization and analysis, we aim to contribute to a deeper understanding of these issues and support evidence-based policy-making for promoting equity and justice.
Dataset
Main Dataset
The PoliceKillingsUS.csv file is a dataset sourced from Kaggle: Fatal Police Shootings in the US (https://www.kaggle.com/datasets/kwullum/fatal-police-shootings-in-the-us/data).
The .csv contains data from 2015 to 2017 and was compiled by The Washington Post. The dataset comprises 14 variables and 2535 shootings. An in-depth description of the variables is given in the following table:
Variable | Description | Data Type | Values |
---|---|---|---|
id | Data ID | numeric | 2535 entries |
name | Name of deceased | character | Names |
date | Date of shooting | date | dd/mm/yy from 2015-2017 |
manner_of_death | Means of death | categorical | shot, shot and tasered |
armed | Weapon/Tool | categorical | 69 categories |
age | Age of deceased | numeric | 6-91 years old |
gender | Gender of deceased | categorical | F/M |
race | Race of deceased | categorical | A, B, H, N, O, W, Unknown |
city | City | categorical | 1417 cities |
state | State | categorical | State 2 letter abbreviations and DC |
signs_of_mental_illness | Sign of mental illness during incident | logical | True/False |
threat_level | Deceased's threat level | categorical | attack, other, undetermined |
flee | Method to evade police | categorical | Car, Foot, Not fleeing, Other |
body_camera | Reports of police with body camera | logical | True/False |
Data Cleaning
According to another Kaggle repository on the same topic that draws on the same sources (https://www.kaggle.com/datasets/mrmorj/data-police-shootings), unknown values of race are coded as "". To avoid confusion, these values are recoded as Unknown using the mutate function.
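As a rough illustration of this recoding step, the sketch below uses base R on a small invented data frame rather than the project's actual mutate call. The name fatality_toy and its values are hypothetical. Because the loading code reads the file with na.strings = "", blank races may arrive as NA, so the sketch treats both NA and empty strings as unknown.

```r
# Toy stand-in for the `fatality` data frame (values invented for illustration).
fatality_toy <- data.frame(
  id   = 1:5,
  race = c("W", NA, "B", "", "H"),
  stringsAsFactors = FALSE
)

# Flag rows whose race is missing, whether stored as NA or as an empty string.
is_unknown <- is.na(fatality_toy$race) | fatality_toy$race == ""

# Recode the flagged rows to the explicit label "Unknown".
fatality_toy$race[is_unknown] <- "Unknown"

# Inspect the resulting categories.
print(table(fatality_toy$race))
```

With dplyr, the same logic could presumably be written inside mutate using ifelse on the same condition.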
Additionally, a secondary dataframe was generated by grouping the dataset by city and race and counting the shootings in each group. This allows multiple avenues of analysis, such as ascertaining whether the proportions of shootings by race in each city differ from the proportions of each race in the census datasets.
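The grouping described above might look like the following base R sketch. The data frame shootings_toy and its rows are invented for illustration, and the prop column is an assumed extra step added to show how counts could be turned into within-city proportions for comparison with census shares. On the real data, grouping by state as well may be needed, since city names can repeat across states.

```r
# Toy shootings table (values invented for illustration).
shootings_toy <- data.frame(
  city = c("Phoenix", "Phoenix", "Phoenix", "Tucson", "Tucson"),
  race = c("W", "H", "H", "W", "Unknown"),
  stringsAsFactors = FALSE
)

# Count shootings for each city/race pair by summing a column of ones.
city_race_counts <- aggregate(
  list(n = rep(1L, nrow(shootings_toy))),
  by  = list(city = shootings_toy$city, race = shootings_toy$race),
  FUN = sum
)

# Convert counts to within-city proportions.
city_totals <- ave(city_race_counts$n, city_race_counts$city, FUN = sum)
city_race_counts$prop <- city_race_counts$n / city_totals

print(city_race_counts)
```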
City Census Datasets
In addition to the dataset of fatal police shootings, the repository contains four 2015 census datasets, each with over 29000 observations:
MedianHouseholdIncome2015.csv
PercentagePeopleBelowPovertyLevel.csv
PercentOver25CompletedHighSchool.csv
ShareRaceByCity.csv
Data | Description | Number of Cities |
---|---|---|
PercentOver25CompletedHighSchool.csv | Contains the percentage of individuals 25 years and older who are high school graduates for cities | 29329 |
MedianHouseholdIncome2015.csv | Contains the median income for cities | 29322 |
PercentagePeopleBelowPovertyLevel.csv | Contains the percentage of the population below poverty for cities | 29329 |
ShareRaceByCity.csv | Contains the demographics for cities | 29268 |
Each census dataset contains two common columns: Geographic.Area and City. Each City value is the name of the city followed by a classifier (e.g., Alexander City city).
In addition to these two columns, each dataset contains information regarding the city’s demographics: percent of high school graduates, median income, poverty rate, and proportion of races (White, Black, Native American, Asian, Hispanic). Each of these values is numeric.
Data Cleaning
For ease of analysis, the four census datasets are joined by city into one merged dataset.
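One possible way to perform this join is sketched below in base R. The four toy data frames stand in for the census files. Only Geographic.Area and City are column names given in this proposal, so the measure columns (Median.Income, poverty_rate, percent_completed_hs, share_black) and all values are assumptions for illustration. Joining on both key columns, rather than City alone, is also an assumption, made because city names can repeat across states.

```r
# Shared key columns for the toy census tables (values invented).
keys <- data.frame(
  Geographic.Area = c("AL", "AL"),
  City            = c("Abbeville city", "Alexander City city"),
  stringsAsFactors = FALSE
)

# One toy table per census file; measure column names are assumed.
income_toy  <- cbind(keys, Median.Income        = c(30000, 40000))
poverty_toy <- cbind(keys, poverty_rate         = c(20, 15))
hs_toy      <- cbind(keys, percent_completed_hs = c(80, 85))
race_toy    <- cbind(keys, share_black          = c(40, 30))

# Fold the list of tables into one, keeping rows that lack a match.
census_toy <- Reduce(
  function(left, right) {
    merge(left, right, by = c("Geographic.Area", "City"), all = TRUE)
  },
  list(income_toy, poverty_toy, hs_toy, race_toy)
)

print(census_toy)
```

Using all = TRUE keeps cities that appear in only some of the files, which matters here because the four datasets list slightly different numbers of cities.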
Additionally, the City value is edited to remove the classifier from each label so that city names in the main dataset and in the census datasets match, which allows the two to be joined directly.
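A minimal sketch of the classifier removal, assuming the classifier is always the final word of the label. Only the "city" suffix appears in the example above, so the other suffixes in the pattern (town, village, CDP, borough) are guesses that would need checking against the real files.

```r
# Example census labels; only the first comes from the proposal text.
census_city <- c("Alexander City city", "Addison town", "Akron CDP")

# Trailing classifier preceded by a space; the suffix list is an assumption.
classifier_pattern <- " (city|town|village|CDP|borough)$"

# Remove the trailing classifier. Matching is case-sensitive and anchored
# to the end, so the "City" inside "Alexander City" is left intact.
clean_city <- sub(classifier_pattern, "", census_city)

print(clean_city)
```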
City Population Data
In addition to city level census data, city populations were collected from the plotly dataset repository (https://github.com/plotly/datasets/blob/master/us-cities-top-1k-multi-year.csv). The .csv contains populations of the 1000 largest US cities from 2014 to 2018. The dataset contains 4000 observations and six variables detailed below:
Variable | Description | Data Type | Values |
---|---|---|---|
City | City | character | 925 entries |
State | State | character | 50 States and District of Columbia |
Population | Population | numeric | 10048-8405837 |
lat | Latitude | numeric | 21.31-61.22 |
lon | Longitude | numeric | -157.86 to -70.26 |
year | Year | numeric | 2014-2018 |
State Population Data
In addition to city-level population data, a statewise population dataset was collected. Population_Estimate_data_Statewise_2010-2023.csv contains population estimates from 2010 to 2023 for all fifty states, the District of Columbia, and Puerto Rico. The dataset has 15 variables: the state and a population estimate for each year from 2010 to 2023.
Questions
How do socioeconomic factors such as median income, poverty rates, and educational attainment relate to incidents of police use of force across different geographic areas?
To what extent do racial demographics, including the proportions of White, Black, Hispanic, Asian, and Native American residents, correlate with incidents of police use of force within communities?
Analysis plan
Question 1
How do socioeconomic factors such as median income, poverty rates, and educational attainment relate to incidents of police use of force across different geographic areas?
Approach
In this part of our analysis, we aim to investigate whether police use of force is influenced by fundamental societal inequalities. This investigation is divided into two parts. Firstly, we will explore the relationship between the severity of police use of force and the level of poverty in communities. To do this, we will utilize the PercentagePeopleBelowPovertyLevel.csv dataset, which provides insights into poverty rates across different areas. Secondly, we will examine any potential correlation between socio-economic status, particularly income and educational levels, and incidents of police killings. This approach will offer valuable insights into the broader relationship between socio-economic factors and police use of force.
Visualizations
Plot 1
To visually represent our findings, we propose using bar plots or column plots. These plots will display the distribution of threat levels perceived by police officers, categorized by poverty rate brackets or educational attainment levels. Annotations will be included where applicable to provide additional context and highlight significant observations.
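The bracketing behind such a plot could be sketched as follows. The data frame incidents_toy, the column name poverty_rate, the bracket boundaries, and all values are hypothetical and chosen only to show the mechanics of binning a rate and cross-tabulating it with threat_level.

```r
# Toy incidents with the poverty rate of the city already joined on
# (values invented for illustration).
incidents_toy <- data.frame(
  threat_level = c("attack", "other", "attack", "undetermined", "attack", "other"),
  poverty_rate = c(8, 12, 27, 31, 45, 19),
  stringsAsFactors = FALSE
)

# Bin poverty rates into brackets; the cut points are an assumption.
incidents_toy$poverty_bracket <- cut(
  incidents_toy$poverty_rate,
  breaks = c(0, 10, 20, 30, 100),
  labels = c("0-10%", "10-20%", "20-30%", "30%+"),
  include.lowest = TRUE
)

# Cross-tabulate threat level against poverty bracket.
threat_by_bracket <- table(incidents_toy$threat_level, incidents_toy$poverty_bracket)

print(threat_by_bracket)
```

Passing threat_by_bracket to barplot with beside = TRUE would draw one group of columns per bracket.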
Plot 2
For the second plot, we will employ regression analysis and half-eye plots for regressed variables. This approach will allow us to visualize the relationship between income levels and incidents of police killings more comprehensively. This plot will be compared to plot 1 to provide a clearer understanding of any potential correlations between socio-economic factors and police use of force.
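As a hedged sketch of the regression step only (the half-eye plotting is omitted), the code below fits a simple linear model on invented city-level values. The data frame city_toy, the outcome shootings_per_100k, and every number in it are hypothetical and carry no empirical meaning. The actual model specification is not given in this proposal.

```r
# Toy city-level data; all values are invented purely to make the code run.
city_toy <- data.frame(
  median_income      = c(30000, 40000, 50000, 60000, 70000),
  shootings_per_100k = c(2.4, 2.1, 1.5, 1.4, 0.9)
)

# Regress the shooting rate on income, expressed in thousands of dollars
# so that the slope reads as "change per $1,000 of median income".
fit <- lm(shootings_per_100k ~ I(median_income / 1000), data = city_toy)

# Intercept and slope of the fitted line.
print(coef(fit))
```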
Interpretation of Plots
Overall, these visualizations will help us gain insights into the complex interplay between socio-economic factors and incidents of police use of force. By exploring these relationships, we aim to contribute to a deeper understanding of the underlying societal dynamics driving police interactions and identify potential avenues for addressing systemic inequalities.
Datasets and variables to be used
Primary Dataset (Fatal Police Shooting)
Used Variables
manner_of_death: Manner of death of victims (shot, or shot and tasered)
threat_level: The level of threat the victim posed (Attack, Other, Undetermined)
Supplementary Datasets
PercentagePeopleBelowPovertyLevel.csv: Contains the percentage of the population below the poverty level for cities
ShareRaceByCity: Contains the demographics for cities
Variables to be created
To be determined as the analysis progresses
Question 2
To what extent do racial demographics, including the proportions of White, Black, Hispanic, Asian, and Native American residents, correlate with incidents of police use of force within communities?
Approach
For this segment of our analysis, we aim to investigate whether police have a higher propensity to use force against specific racial groups. This investigation is divided into two parts to comprehensively understand the dynamics at play. Firstly, we will examine the severity of police actions relative to population changes over time. This analysis will be overlaid with racial population composition to discern any patterns or disparities. In the second part, we will explore whether the use of specific instruments by police varies across racial groups and age demographics. Where specific data are unavailable, we will prorate incidents of police killings according to the racial population distribution to aid interpretation.
Visualizations
Plot 1
To visually represent our findings, we propose using leaflet or ggmap plots alongside column plots. These plots will introduce a new variable, Shooting Intensity, calculated as the ratio of Police Shootings to Population. Further, we will zoom into select major states, incorporating racial distribution as an additional factor.
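The Shooting Intensity ratio could be computed along these lines. The state-level toy tables and their values are invented, and the per-100,000 scaling is an added convenience for readability rather than part of the definition above.

```r
# Toy shooting counts and populations per state (values invented).
state_shootings_toy <- data.frame(
  state     = c("AZ", "CA", "TX"),
  shootings = c(10, 30, 25),
  stringsAsFactors = FALSE
)
state_pop_toy <- data.frame(
  state      = c("AZ", "CA", "TX"),
  population = c(5e6, 2e7, 1.25e7),
  stringsAsFactors = FALSE
)

# Join counts to populations, then take the ratio shootings / population.
intensity <- merge(state_shootings_toy, state_pop_toy, by = "state")
intensity$shooting_intensity <- intensity$shootings / intensity$population

# Rescale to shootings per 100,000 residents for easier reading.
intensity$per_100k <- intensity$shooting_intensity * 1e5

print(intensity)
```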
Plot 2
For the second plot, we will utilize heatmaps and stacked bar plots to illustrate the impact of specific instruments concerning Shooting Intensity across racial groups and age distributions. This approach will provide a comprehensive visualization of the relationship between racial and age demographics in police killings.
Interpretation of Plots
Overall, these visualizations will provide valuable insights into whether police use of force is driven by deliberate choices or influenced by inherent biases or chance factors related to crime.
Datasets and Variables to be used
Primary Dataset (Fatal Police Shootings):
Used variables
race: The race of the victim of the police shooting
manner_of_death: The manner in which victims lost their lives (shot, or shot and tasered)
city: The city or community where the shooting occurred
Supplementary Datasets
ShareRaceByCity: Contains the demographics for cities
Variables to be created
Shooting Intensity: Police shootings / Population
Timeline
Task Name | Status | Assignee | Due | Priority | Summary |
---|---|---|---|---|---|
Proposal-For Peer Review | Complete | All | April 1 | High | Write proposal for peer review |
Peer Review | Complete | All | April 1 | High | Write up peer reviews for other groups |
Proposal-Updated | Complete | All | April 8 | Medium | Update proposal using peer feedback |
Proposal-Final | Complete | All | April 15 | Low | Update proposal using instructor feedback |
Question 1: Bar Plots | Complete | Surajit | April 22 | Medium | Generate plot/slides for plot |
Question 1: Density and Regression Plots | Complete | Peter | April 22 | Medium | Generate plot/slides for plot |
Question 1: Interpret | Complete | Christian | April 22 | Medium | Interpret plots |
Question 2: Shooting Intensity 1 | Complete | Arun | April 22 | Medium | Generate plot/slides for plot |
Question 2: Shooting Intensity 2 | Complete | Sachin | April 22 | Medium | Generate plot/slides for plot |
Question 2: Interpretation | Complete | Valerie | April 22 | Medium | Interpret plots |
Presentation Structure | Complete | Christian | | Medium | Create presentation order |
Write-up Abstract/Introduction | Complete | Christian | | Medium | Write up introduction and abstract |
Disclaimer
There is no evident ethical concern because the data is publicly available, but in consideration of the sensitivity of the topic, names of the deceased will not be included in the analysis.
Potential inherent biases in the dataset include how police shootings are classified, the fact that the dataset only consists of fatal shootings, and the breadth of reported incidents. While the dataset itself does not appear to have variables to address these biases, these concerns can be addressed in future analyses.
References
https://www.kaggle.com/datasets/kwullum/fatal-police-shootings-in-the-us/data
https://github.com/plotly/datasets/blob/master/us-cities-top-1k-multi-year.csv