RoboCops

Analysis of Fatal Police Shootings

This project aims to comprehensively analyze fatal police shootings in the United States, delving into the underlying factors and motivations driving these tragic events.

Authors

RoboCops - Peter, Sachin, Arun, Surajit, Christian, Valerie

School of Information, University of Arizona

Introduction

The police shooting dataset is sourced from a Kaggle dataset originally created by Karolina Wullum. The data, covering shootings in all fifty states and DC, was originally logged by The Washington Post, which began tracking fatal police shootings after the 2014 shooting of Michael Brown brought national attention to the Black Lives Matter movement. The dataset contains information on fatal shootings that occurred from 2015 to 2017, including details of each case and the demographics of the city in which each individual was shot. The goal of this project is to contextualize police shootings to further understand how and why these unfortunate situations occur.

Why the Police Killings dataset?

Police shootings are a sad reality of policing practices that date back decades. While many are justified to protect the lives of innocent civilians, there has been an uptick in recent years of police shooting people whom they perceive to be a threat in the heat of the moment, but who are proven innocent after the fact. These shootings can often be hard to analyze because many psychological factors come into play when an officer decides to discharge their weapon, such as their training, the perceived threat, and an officer's internalized prejudice. Without further information on the psyche of a police officer, the reason why they discharge their weapon cannot be determined. However, their actions and surroundings can be analyzed to provide insight into how and why a shooting occurs.

Analysis Goals

Our analysis aims to understand the dataset on police killings, extending its scope beyond the original focus tracked by The Washington Post. We plan to include socio-economic factors like household income, poverty rates, and population demographics. While the dataset was initially associated with the Black Lives Matter movement, our goal is to explore how socio-economic factors intersect with incidents of police violence.

By incorporating these additional factors, we hope to uncover correlations and patterns that provide insight into the broader societal context surrounding police use of force. Through data visualization and analysis, we aim to contribute to a deeper understanding of these issues and support evidence-based policy-making for promoting equity and justice.

Dataset

Main Dataset

# Load in the datasets
fatality <- read.csv("data/PoliceKillingsUS.csv", na.strings = "")
median_income <- read.csv("data/MedianHouseholdIncome2015.csv")
poverty_perc <- read.csv("data/PercentagePeopleBelowPovertyLevel.csv")
hs_perc <- read.csv("data/PercentOver25CompletedHighSchool.csv")
race_perc <- read.csv("data/ShareRaceByCity.csv")
city_pop <- read.csv("data/us-cities-top-1k-multi-year.csv")
state_pop <- read.csv("data/Population_Estimate_data_Statewise_2010-2023.csv")

PoliceKillingsUS.csv is a dataset sourced from Kaggle using the following link:

Fatal Police Shootings in the US

The .csv contains data from 2015 to 2017 and was compiled by The Washington Post. The dataset comprises 14 variables and 2535 shootings. An in-depth description of the variables is given in the following table:

| Variable | Description | Data Type | Values |
| --- | --- | --- | --- |
| id | Data ID | numeric | 2535 entries |
| name | Name of deceased | character | Names |
| date | Date of shooting | date | dd/mm/yy from 2015-2017 |
| manner_of_death | Means of death | categorical | shot, shot and tasered |
| armed | Weapon/Tool | categorical | 69 categories |
| age | Age of deceased | numeric | 6-91 years old |
| gender | Gender of deceased | categorical | F/M |
| race | Race of deceased | categorical | A, B, H, N, O, W, Unknown |
| city | City | categorical | 1417 cities |
| state | State | categorical | State 2 letter abbreviations and DC |
| signs_of_mental_illness | Sign of mental illness during incident | logical | True/False |
| threat_level | Deceased's threat level | categorical | attack, other, undetermined |
| flee | Method to evade police | categorical | Car, Foot, Not fleeing, Other |
| body_camera | Reports of police with body camera | logical | True/False |
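Because the date column is described as dd/mm/yy text, it has to be parsed before any grouping by year. The following is a minimal sketch of one way to do that in base R; the data frame `fatality_demo` and its rows are made up for illustration and are not taken from the real file.

```r
# Toy rows in the dd/mm/yy layout described in the variable table.
# These values are invented for illustration only.
fatality_demo <- data.frame(
  id = 1:3,
  date = c("02/01/15", "15/06/16", "31/12/17"),
  stringsAsFactors = FALSE
)

# %d = day of month, %m = month, %y = two-digit year
fatality_demo$date <- as.Date(fatality_demo$date, format = "%d/%m/%y")

# Derive the calendar year (hypothetical helper column) for later grouping
fatality_demo$year <- as.integer(format(fatality_demo$date, "%Y"))

print(fatality_demo)
```

Stating the format explicitly avoids day and month being swapped silently on ambiguous dates such as 02/01/15.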

Data Cleaning

According to another Kaggle repository on the same topic that draws on the same source (https://www.kaggle.com/datasets/mrmorj/data-police-shootings), the race variable codes unknown values as empty strings (""). To avoid confusion, these values are recoded as Unknown using the mutate function.
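The recode described above could look roughly like the sketch below. It assumes the dplyr package is available and uses a made-up `fatality_demo` data frame; because the loading code passes `na.strings = ""`, empty race values arrive as `NA`, so the sketch handles both `NA` and `""`.

```r
library(dplyr)

# Toy data, invented for illustration. NA mimics what
# read.csv(..., na.strings = "") produces for an empty race field.
fatality_demo <- data.frame(
  id = 1:4,
  race = c("W", NA, "B", ""),
  stringsAsFactors = FALSE
)

# Replace missing or empty race codes with the label "Unknown"
fatality_demo <- fatality_demo %>%
  mutate(race = if_else(is.na(race) | race == "", "Unknown", race))

print(table(fatality_demo$race))
```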

Additionally, a secondary dataframe was generated by grouping the dataset by city and race and computing the counts. This allows multiple avenues of analysis, such as ascertaining whether the proportions of shootings by race in each city differ from the proportions of each race in the cities in the census datasets.
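A minimal sketch of that secondary dataframe, assuming dplyr and a made-up `fatality_demo` data frame with placeholder city names; the column names `shootings` and `share` are hypothetical.

```r
library(dplyr)

# Toy data with placeholder cities, invented for illustration
fatality_demo <- data.frame(
  city = c("City A", "City A", "City A", "City B", "City B"),
  race = c("W", "H", "H", "W", "Unknown"),
  stringsAsFactors = FALSE
)

city_race_counts <- fatality_demo %>%
  count(city, race, name = "shootings") %>%      # one row per city/race pair
  group_by(city) %>%
  mutate(share = shootings / sum(shootings)) %>% # proportion within each city
  ungroup()

print(city_race_counts)
```

The `share` column is what would be compared against the census race proportions. Since city names repeat across states, the state would likely need to be part of the grouping on the real data.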

City Census Datasets

In addition to the fatal police shootings dataset, the repository contains four additional 2015 census datasets, each with over 29,000 observations:

  • MedianHouseholdIncome2015.csv

  • PercentagePeopleBelowPovertyLevel.csv

  • PercentOver25CompletedHighSchool.csv

  • ShareRaceByCity.csv

| Data | Description | Number of Cities |
| --- | --- | --- |
| PercentOver25CompletedHighSchool.csv | Contains the percentage of individuals 25 years and older who are high school graduates for cities | 29329 |
| MedianHouseholdIncome2015.csv | Contains the median income for cities | 29322 |
| PercentagePeopleBelowPovertyLevel.csv | Contains the percentage of the population below poverty for cities | 29329 |
| ShareRaceByCity.csv | Contains the demographics for cities | 29268 |

Each census dataset contains two common columns: Geographic.Area and City. Each value of City is the name of the city followed by a classifier (e.g., Alexander City city).

In addition to these two columns, each dataset contains demographic information about the city: percent of high school graduates, median income, poverty rate, or proportion of each race (White, Black, Native American, Asian, Hispanic). Each of these values is numeric.

Data Cleaning

For ease of analysis, the four census datasets are merged by city into one dataset.
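One way to sketch such a merge in base R is to join on the two common columns, Geographic.Area and City. The toy data frames and the value column names (`median_income`, `poverty_rate`, `percent_completed_hs`) are assumptions for illustration; the real files may name these columns differently.

```r
# Toy stand-ins for three of the census files; all values are invented.
# "Example Town town" is a hypothetical place name.
median_income_demo <- data.frame(
  Geographic.Area = c("AL", "AL"),
  City = c("Alexander City city", "Example Town town"),
  median_income = c(40000, 35000)
)
poverty_demo <- data.frame(
  Geographic.Area = c("AL", "AL"),
  City = c("Alexander City city", "Example Town town"),
  poverty_rate = c(20, 25)
)
hs_demo <- data.frame(
  Geographic.Area = "AL",
  City = "Alexander City city",
  percent_completed_hs = 80
)

# Full outer join on the shared key columns, applied pairwise.
# all = TRUE keeps cities that are missing from one of the files.
census_demo <- Reduce(
  function(x, y) merge(x, y, by = c("Geographic.Area", "City"), all = TRUE),
  list(median_income_demo, poverty_demo, hs_demo)
)

print(census_demo)
```

A full join is used here because the four files list slightly different numbers of cities, so an inner join would drop any city absent from one of them.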

Additionally, the classifier is removed from each City value so that city names in the main dataset and the census datasets match.
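Removing the classifier can be sketched with a regular expression that strips a trailing word. The helper `strip_classifier` and the list of classifiers it recognizes (city, town, village, borough, CDP) are assumptions; the real data may contain other suffixes that would need to be added.

```r
# Hypothetical helper: drop a trailing place-type classifier from a city label.
# The classifier list is an assumption and may be incomplete.
strip_classifier <- function(city) {
  sub("\\s+(city|town|village|borough|CDP)$", "", city)
}

cleaned <- strip_classifier(c("Alexander City city", "Example Town town"))
print(cleaned)
```

The pattern is anchored to the end of the string and is case sensitive, so the capitalized "City" inside "Alexander City" is left intact.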

City Population Data

In addition to city level census data, city populations were collected from the plotly dataset repository (https://github.com/plotly/datasets/blob/master/us-cities-top-1k-multi-year.csv). The .csv contains populations of the 1000 largest US cities from 2014 to 2018. The dataset contains 4000 observations and six variables detailed below:

| Variable | Description | Data Type | Values |
| --- | --- | --- | --- |
| City | City | character | 925 entries |
| State | State | character | 50 States and District of Columbia |
| Population | Population | numeric | 10048-8405837 |
| lat | Latitude | numeric | 21.31-61.22 |
| lon | Longitude | numeric | -157.86 to -70.26 |
| year | Year | numeric | 2014-2018 |

State Population Data

In addition to city-level population data, a statewise population dataset was collected. Population_Estimate_data_Statewise_2010-2023.csv contains population estimates from 2010-2023 for all fifty states, the District of Columbia, and Puerto Rico. The dataset has 15 variables: the state and a population estimate for each year from 2010-2023.

Questions

  1. How do socioeconomic factors such as median income, poverty rates, and educational attainment relate to incidents of police use of force across different geographic areas?

  2. To what extent do racial demographics, including the proportions of White, Black, Hispanic, Asian, and Native American residents, correlate with incidents of police use of force within communities?

Analysis plan

Question 1

How do socioeconomic factors such as median income, poverty rates, and educational attainment relate to incidents of police use of force across different geographic areas?

Approach

In this part of our analysis, we aim to investigate whether police use of force is influenced by fundamental societal inequalities. This investigation is divided into two parts. First, we will explore the relationship between the severity of police use of force and the level of poverty in communities. To do this, we will use the PercentagePeopleBelowPovertyLevel.csv dataset, which provides poverty rates across different areas. Second, we will examine any potential correlation between socio-economic status, particularly income and education levels, and incidents of police killings. Together, these two parts address the broader relationship between socio-economic factors and police use of force.

Visualizations

Plot 1

To visually represent our findings, we propose using bar plots or column plots. These plots will display the distribution of threat levels perceived by police officers, categorized by poverty rate brackets or educational attainment levels. Annotations will be included where applicable to provide additional context and highlight significant observations.
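The bracketing step behind such a bar plot might be sketched as follows in base R. The data frame `shootings_demo`, its values, and the bracket cut points (10, 20, 30 percent) are all assumptions chosen for illustration.

```r
# Toy data, invented for illustration: one row per shooting with the
# poverty rate of the city where it occurred
shootings_demo <- data.frame(
  threat_level = c("attack", "other", "attack", "undetermined", "attack", "other"),
  poverty_rate = c(8.2, 14.5, 22.0, 31.7, 12.3, 27.9)
)

# Bin the continuous poverty rate into brackets (cut points are an assumption)
shootings_demo$poverty_bracket <- cut(
  shootings_demo$poverty_rate,
  breaks = c(0, 10, 20, 30, 100),
  labels = c("0-10%", "10-20%", "20-30%", "30%+"),
  include.lowest = TRUE
)

# Rows are threat levels, columns are poverty brackets
threat_by_bracket <- table(shootings_demo$threat_level,
                           shootings_demo$poverty_bracket)
print(threat_by_bracket)

# Grouped bar plot: one cluster of bars per poverty bracket
barplot(threat_by_bracket, beside = TRUE, legend.text = TRUE,
        xlab = "Poverty rate bracket", ylab = "Number of shootings")
```

The same pattern would apply to educational attainment by swapping the binned variable.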

Plot 2

For the second plot, we will employ regression analysis and half-eye plots for regressed variables. This approach will allow us to visualize the relationship between income levels and incidents of police killings more comprehensively. This plot will be compared to plot 1 to provide a clearer understanding of any potential correlations between socio-economic factors and police use of force.
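As a rough illustration of the regression step only, the sketch below fits a simple linear model in base R. The data frame `city_demo`, the outcome name `shootings_per_100k`, and every value in it are invented; the model form is an assumption and not necessarily the one the analysis will use. Half-eye plots are not shown here; they could be drawn with, for example, `stat_halfeye()` from the ggdist package, which is likewise an assumption about tooling.

```r
# Toy city-level data, invented for illustration
city_demo <- data.frame(
  median_income = c(30000, 40000, 50000, 60000, 70000),
  shootings_per_100k = c(1.0, 0.8, 0.7, 0.5, 0.4)
)

# Ordinary least squares: shooting rate as a linear function of income
fit <- lm(shootings_per_100k ~ median_income, data = city_demo)

# The slope is the estimated change in rate per additional dollar of income
income_slope <- unname(coef(fit)["median_income"])
print(summary(fit)$coefficients)
```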

Interpretation of Plots

Overall, these visualizations will help us gain insights into the complex interplay between socio-economic factors and incidents of police use of force. By exploring these relationships, we aim to contribute to a deeper understanding of the underlying societal dynamics driving police interactions and identify potential avenues for addressing systemic inequalities.

Datasets and variables to be used

Primary Dataset (Fatal Police Shooting)

Used Variables

manner_of_death: Manner of death of victims (shot, or shot and tasered)

threat_level: The level of threat the victim posed (Attack, Other, Undetermined)

Supplementary Datasets

PercentagePeopleBelowPovertyLevel.csv: The percentage of the population below the poverty level for cities.

ShareRaceByCity: Contains the demographics for cities

Variables to be created

To be determined as the analysis progresses

Question 2

To what extent do racial demographics, including the proportions of White, Black, Hispanic, Asian, and Native American residents, correlate with incidents of police use of force within communities?

Approach

For this segment of our analysis, we aim to investigate whether police have a higher propensity to use force against specific racial groups. This investigation is divided into two parts to comprehensively understand the dynamics at play. Firstly, we will examine the severity of police actions relative to population changes over time. This analysis will be overlaid with racial population composition to discern any patterns or disparities. In the second part, we will explore whether the use of specific instruments by police varies across racial groups and age demographics. In the absence of specific data, we will prorate the racial population distribution even for incidents of police killings to facilitate comprehension.

Visualizations

Plot 1

To visually represent our findings, we propose using leaflet or ggmap plots alongside column plots. These plots will introduce a new variable, Shooting Intensity, calculated as the ratio of Police Shootings to Population. Further, we will zoom into select major states, incorporating racial distribution as an additional factor.

Plot 2

For the second plot, we will utilize heatmaps and stacked bar plots to illustrate the impact of specific instruments concerning Shooting Intensity across racial groups and age distributions. This approach will provide a comprehensive visualization of the relationship between racial and age demographics in police killings.
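The matrix underlying a heatmap of this kind could be built as in the sketch below, shown here with raw counts rather than Shooting Intensity. The data frame `shootings_demo`, its values, and the age bracket boundaries are assumptions for illustration.

```r
# Toy data, invented for illustration
shootings_demo <- data.frame(
  race = c("W", "B", "H", "W", "B", "W"),
  age = c(23, 35, 41, 58, 19, 37)
)

# Bin ages into brackets (boundaries are an assumption)
shootings_demo$age_group <- cut(
  shootings_demo$age,
  breaks = c(0, 24, 44, 64, Inf),
  labels = c("<25", "25-44", "45-64", "65+")
)

# Rows are races, columns are age groups; each cell is a count of shootings
race_by_age <- table(shootings_demo$race, shootings_demo$age_group)
print(race_by_age)
```

Each cell would then map to one tile of the heatmap or one segment of a stacked bar.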

Interpretation of Plots

Overall, these visualizations will provide valuable insights into whether police use of force is driven by deliberate choices or influenced by inherent biases or chance factors related to crime.

Datasets and Variables to be used

Primary Dataset (Fatal Police Shootings):

Used variables

race: The race of the victim of the police shooting

manner_of_death: The manner in which victims lost their lives (shot, or shot and tasered)

city: The city or community where the shooting occurred

Supplementary Datasets

ShareRaceByCity: Contains the demographics for cities

Variables to be created

Shooting Intensity: Police shootings / Population
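The ratio above can be sketched in base R as follows. The data frames, the placeholder state labels, and all counts are invented for illustration; the per-million rescaling is an added convenience and not part of the definition.

```r
# Toy inputs with placeholder states; every number is invented
state_shootings_demo <- data.frame(
  state = c("State A", "State B", "State C"),
  shootings = c(10, 40, 20)
)
state_pop_demo <- data.frame(
  state = c("State A", "State B", "State C"),
  population = c(1e6, 8e6, 2e6)
)

# Attach the population to each state's shooting count
intensity <- merge(state_shootings_demo, state_pop_demo, by = "state")

# Shooting Intensity = police shootings / population
intensity$shooting_intensity <- intensity$shootings / intensity$population

# Rescaled to shootings per million residents for readability (an assumption)
intensity$per_million <- intensity$shooting_intensity * 1e6

print(intensity)
```

Normalizing by population keeps large states from dominating the map purely because more people live there.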

Timeline

| Task Name | Status | Assignee | Due | Priority | Summary |
| --- | --- | --- | --- | --- | --- |
| Proposal-For Peer Review | Complete | All | April 1 | High | Write proposal for peer review |
| Peer Review | Complete | All | April 1 | High | Write up peer reviews for other groups |
| Proposal-Updated | Complete | All | April 8 | Medium | Update proposal using peer feedback |
| Proposal-Final | Complete | All | April 15 | Low | Update proposal using instructor feedback |
| Question 1: Bar Plots | Complete | Surajit | April 22 | Medium | Generate plot/slides for plot |
| Question 1: Density and Regression Plots | Complete | Peter | April 22 | Medium | Generate plot/slides for plot |
| Question 1: Interpret | Complete | Christian | April 22 | Medium | Interpret plots |
| Question 2: Shooting Intensity 1 | Complete | Arun | April 22 | Medium | Generate plot/slides for plot |
| Question 2: Shooting Intensity 2 | Complete | Sachin | April 22 | Medium | Generate plot/slides for plot |
| Question 2: Interpretation | Complete | Valerie | April 22 | Medium | Interpret plots |
| Presentation Structure | Complete | Christian | | Medium | Create presentation order |
| Write-up Abstract/Introduction | Complete | Christian | | Medium | Write up introduction and abstract |

Disclaimer

There is no evident ethical concern because the data is publicly available, but in consideration of the sensitivity of the topic, names of the deceased will not be included in the analysis.

Potential inherent biases in the dataset include how police shootings are classified, the fact that the dataset only consists of fatal shootings, and the breadth of reported incidents. While the dataset does not appear to have variables to address these biases, these concerns can be addressed in future analyses.

References

https://www.kaggle.com/datasets/kwullum/fatal-police-shootings-in-the-us/data

https://github.com/plotly/datasets/blob/master/us-cities-top-1k-multi-year.csv