Analysis and Visualization of Police Shootings in the United States

INFO 526 - Final Project

This project aims to comprehensively analyze fatal police shootings in the United States, delving into the underlying factors and motivations driving these tragic events.

Author: RoboCops - Peter, Sachin, Arun, Surajit, Christian, Valerie

Affiliation: School of Information, University of Arizona

Abstract

The objective of this project is to analyze and contextualize police shootings, aiming to understand the underlying reasons and factors contributing to these unfortunate events. While police interventions are necessary for public safety, recent years have seen a concerning uptick in documented unjustified shootings. We seek to expand the analysis beyond individual cases to explore socioeconomic factors such as household income, poverty rates, and population demographics, and to uncover correlations and patterns that shed light on the broader societal context.

We pursue this through data visualizations that contribute to a deeper understanding of police shootings.

Introduction

Police shootings are a sad reality of policing practices that date back decades. While many are justified to protect the lives of innocent civilians, recent years have seen an uptick in police shooting people whom they perceive as a threat in the heat of the moment but who are later shown to be innocent. These shootings can be hard to analyze because many psychological factors come into play when an officer decides to discharge their weapon, such as their training, the perceived threat, and the officer’s internalized prejudice. Without further information on an officer’s state of mind, the reason they discharged their weapon cannot be determined. However, their actions and surroundings can be analyzed to provide context on the communities and situations that may be more prone to police shootings.

The police shooting dataset is sourced from a Kaggle dataset originally created by Karolina Wullum. It covers shootings in all fifty states and DC and is supplemented with demographic information such as median household income, racial distribution, city population, the percentage of people below the poverty line, and the percentage of people over 25 who completed high school. The data spans 2015 to 2017.

This analysis aims to understand the dataset on police killings and to extend its scope beyond the individual incidents originally tracked by the Washington Post. Socioeconomic factors such as household income, poverty rates, and population demographics are used to understand and contextualize any patterns that appear, with the goal of exploring how these factors relate to incidents of police violence. The following table lists the variables relevant to the analysis.

| Variable | Description | Data Type | Values |
|---|---|---|---|
| id | Data ID | numeric | 2535 entries |
| name | Name of deceased | character | Names |
| date | Date of shooting | date | dd/mm/yy, 2015-2017 |
| manner_of_death | Manner of death | categorical | shot, shot and tasered |
| armed | Weapon/tool | categorical | 69 categories |
| age | Age of deceased | numeric | 6-91 years old |
| gender | Gender of deceased | categorical | F, M |
| race | Race of deceased | categorical | A, B, H, N, O, W, Unknown |
| city | City | categorical | 1417 cities |
| state | State | categorical | Two-letter state abbreviations and DC |
| signs_of_mental_illness | Signs of mental illness during incident | logical | True/False |
| threat_level | Threat level attributed to the deceased | categorical | attack, other, undetermined |
| flee | Method of fleeing police | categorical | Car, Foot, Not fleeing, Other |
| body_camera | Whether police wore a body camera | logical | True/False |

Additionally, six .csv files are utilized to provide supplemental demographic data such as poverty levels, percent high school graduates, city and state population, race percentage by city, and median household income.

| File | Description | Number of Rows |
|---|---|---|
| PercentOver25CompletedHighSchool.csv | Percentage of residents aged 25 and older who are high school graduates, by city | 29329 |
| MedianHouseholdIncome2015.csv | Median household income, by city | 27385 |
| PercentagePeopleBelowPovertyLevel.csv | Percentage of the population below the poverty line, by city | 29329 |
| ShareRaceByCity.csv | Racial demographics, by city | 29268 |
| Population_Estimate_data_Statewise_2010-2023.csv | State population estimates | 52 |
| us-cities-top-1k-multi-year.csv | Populations of the 1,000 largest US cities | 4000 |

By incorporating these additional factors, we can uncover correlations and patterns that provide insight into the broader societal context surrounding police use of force. Through data visualization and analysis, we aim to contribute to a deeper understanding of these issues and to support evidence-based policymaking that promotes equity and justice.

Question 1

  1. How do socioeconomic factors such as median income, poverty rates, and education relate to incidents of police use of force?

Description of Analysis: Bar Plots

Plot 1a

The bar plots in Plots 1a and 1b break down fatal police encounters by the poverty rate and the high school completion rate of the city where each incident occurred. The underlying data is first standardized for consistency and converted into numeric poverty and high school completion rates, which are then binned into five brackets. Each bracket spans an equal range of the data, so together the brackets cover the full spectrum of socioeconomic status. The brackets are labeled with their percentage ranges, and two bar charts are then built with ggplot2 to compare the distribution of threat levels across the socioeconomic brackets, showing how socioeconomic factors intersect with lethal police interventions.
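The binning step can be sketched as follows. This is a minimal Python sketch, not the project's R code: it assumes the five brackets are equal-width intervals between the smallest and largest observed rate, and the function names and the `(poverty_rate, threat_level)` record shape are hypothetical stand-ins for the merged data.

```python
from collections import Counter


def equal_width_bins(values, n_bins=5):
    """Split the observed range of `values` into n_bins brackets of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The final upper edge is set to the maximum exactly to avoid float drift.
    return [
        (lo + i * width, hi if i == n_bins - 1 else lo + (i + 1) * width)
        for i in range(n_bins)
    ]


def bracket_label(value, edges):
    """Return a percentage-range label such as '20-30%' for the bracket holding value."""
    last = len(edges) - 1
    for i, (lower, upper) in enumerate(edges):
        # Brackets are half-open [lower, upper); the last one also includes its upper edge.
        if lower <= value < upper or (i == last and value == upper):
            return f"{lower:.0f}-{upper:.0f}%"
    raise ValueError(f"{value} is outside the binned range")


def threat_counts_by_bracket(incidents, n_bins=5):
    """Count threat levels per bracket; incidents are hypothetical (rate, threat_level) pairs."""
    edges = equal_width_bins([rate for rate, _ in incidents], n_bins)
    counts = {}
    for rate, threat in incidents:
        counts.setdefault(bracket_label(rate, edges), Counter())[threat] += 1
    return counts


# Illustrative records only; these are not values from the dataset.
sample = [(0, "attack"), (10, "other"), (25, "attack"), (49, "attack"), (50, "undetermined")]
print(threat_counts_by_bracket(sample))
```

Equal-width brackets keep each label's range the same size, so a bracket with few incidents simply shows a short bar. Equal-count quintiles would instead give every bracket the same number of incidents.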

Plot 1b

Description of Analysis: Density Plots

Plot 1c

For the density ridge plots, the population of each race was calculated from the supplemental datasets by multiplying the 2015 city population estimate by the city's estimated racial share (in percent) and dividing by 100. The police shooting dataset, the median income dataset, and the city race estimates were then merged on city and state, and the merged data was filtered to keep only incidents that have both a median income and race estimates.
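The population estimate and the merge can be sketched in Python as below. This is a minimal sketch, not the project's R code. It assumes race shares are stored as percentages and treats the merge as an inner join on `(city, state)`, which is what keeping only incidents with both income and race estimates amounts to. The function names, record fields, and sample values are hypothetical.

```python
def race_populations(city_population, race_shares):
    """Estimate each race's population in one city.

    race_shares maps a race code to its share of residents in percent (0-100),
    so each estimate is city_population * share / 100.
    """
    return {race: city_population * share / 100 for race, share in race_shares.items()}


def merge_on_city_state(shootings, median_income, race_shares):
    """Inner-join shootings with income and race tables keyed by (city, state).

    Incidents whose city lacks a median income or race estimates are dropped.
    """
    merged = []
    for row in shootings:
        key = (row["city"], row["state"])
        if key in median_income and key in race_shares:
            merged.append({**row, "median_income": median_income[key], "race_shares": race_shares[key]})
    return merged


# Hypothetical inputs for illustration only.
shootings = [
    {"id": 1, "city": "City A", "state": "ST", "race": "B"},
    {"id": 2, "city": "City B", "state": "ST", "race": "W"},
]
income = {("City A", "ST"): 45000}
shares = {("City A", "ST"): {"W": 60.0, "B": 25.0}}
print(merge_on_city_state(shootings, income, shares))
print(race_populations(200_000, shares[("City A", "ST")]))
```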

The first density plot is ordered by the total count of fatalities for each race. It shows white victims with the largest number of incidents, followed by Black victims. This ordering does not line up with known racial population proportions.

Plot 1d

A regression plot was generated to compare each race's share of the population across cities' median incomes.

As expected, white residents make up most of the population, while Black residents are a clear minority. Because of this discrepancy between the plots, the metric used to order the density plot was changed to reflect differences in population size.
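The source does not say which model the regression plot uses. The sketch below assumes an ordinary least-squares line of each race's population share against city median income, which is what ggplot2's `geom_smooth(method = "lm")` would draw. It is a minimal standard-library Python illustration, and the function names and the `(median_income, share_percent)` input shape are hypothetical.

```python
def least_squares_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_share_vs_income(points_by_race):
    """Fit one line per race; values are hypothetical (median_income, share_percent) pairs."""
    fits = {}
    for race, points in points_by_race.items():
        incomes = [income for income, _ in points]
        shares = [share for _, share in points]
        fits[race] = least_squares_line(incomes, shares)
    return fits


# Illustrative points only; not taken from the dataset.
print(fit_share_vs_income({
    "W": [(30000, 70.0), (50000, 60.0)],
    "B": [(30000, 20.0), (50000, 30.0)],
}))
```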

Plot 1e

The second density ridge plot is ordered by each race's relative proportion of fatalities (i.e., the number of fatalities divided by the race's estimated total population).

After reordering by these proportions, the plot indicates that the Black population is the most likely to be involved in a fatal shooting, while the white population drops significantly in the ranking.
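The switch from raw counts to relative proportions can be illustrated with a small Python sketch. The numbers are made up purely to show how a ranking by count can flip once each race's population is taken into account. The function names are hypothetical, and this is not the project's R code.

```python
def fatality_rates(fatalities, populations):
    """Relative proportion of fatalities: deaths divided by estimated race population."""
    return {
        race: fatalities.get(race, 0) / population
        for race, population in populations.items()
        if population > 0  # skip races with no estimated population
    }


def rank_races(values):
    """Order race codes from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


# Illustrative numbers only, chosen to show how a count ranking can flip.
deaths = {"W": 30, "B": 20}
population = {"W": 1_000_000, "B": 100_000}
print(rank_races(deaths))                              # ['W', 'B'] by raw count
print(rank_races(fatality_rates(deaths, population)))  # ['B', 'W'] by proportion
```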

Discussion

The above visualizations focus on the distribution of shootings across three demographic categories: median income, percent high school graduates, and poverty level. Plots 1a and 1b go further by breaking down the shootings by threat level across poverty level and percent high school graduates. Two major observations can be made from these plots. First, cities with more than 75% high school graduates and poverty levels between 18% and 25% experience the most shootings, with roughly half of all incidents showing a threat level of “Attack”, meaning the officer believed the suspect posed a direct threat to their safety. Second, most shootings had a threat level of “Attack” or “Other”, and only a minority are undetermined, meaning most officers recorded a reason for discharging their weapon.

These observations suggest that cities with a larger share of high school graduates and a poverty level roughly twice the US average experience more shootings. One possible explanation is very large US cities such as New York City and Chicago. These cities have large populations and a wide income gap between classes, so the high school graduate percentage is raised by higher-income residents while the poverty level is driven up by lower-income residents. Because of the sheer size of these cities, they see more shootings even if their per-capita rate is the same as in other cities. Additionally, cities with lower high school graduate percentages are likely outliers in the dataset.

Plot 1c breaks down these incidents by the raw count of victims of each race, and Plot 1e converts these raw counts into the percentage of each race's total population that was shot. Both plots show these distributions across the median income of the city where each shooting occurred. A few insights emerge. First, most shootings occur in cities where the median income is roughly $40k to $50k per year. Second, white people are shot the most in raw counts. However, relative to each race's population, Black people are shot at the highest rate: 0.0014% of the Black population was shot, the equivalent of 1 in every 71,429 Black people. In contrast, 0.0003% of the white population was shot, or about 1 in every 333,333 white people. Black people are therefore about 4.7 times as likely to be shot by the police. Finally, Native American victims are concentrated in cities with a median income of roughly $45k per year, as shown by the large spike at the $45k mark.
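The "1 in N" figures follow from taking the reciprocal of the reported percentages: a share of p percent corresponds to 1 in 100/p people. The short Python check below reproduces that arithmetic from the two percentages stated above. The helper name is hypothetical.

```python
def one_in_n(percent):
    """Convert a percentage of a population into a '1 in N' figure."""
    return round(100 / percent)


black_pct = 0.0014  # percent of the Black population shot, as reported above
white_pct = 0.0003  # percent of the white population shot, as reported above

print(one_in_n(black_pct))              # 71429
print(one_in_n(white_pct))              # 333333
print(round(black_pct / white_pct, 1))  # 4.7, the relative likelihood
```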

Question 2

  1. In what ways do the analyses of hotspot states, hotspot cities, and racial distribution contribute to understanding incidents of police use of force across diverse geographic areas?

Description of Analysis

Plot 1

The plot provides an overview of state hotspots in 2015, 2016, and 2017. Hotspots are identified using population data and a metric called “shooting intensity”, calculated by dividing the fatality count by the population and multiplying by 1,000,000 to obtain a rate per million residents. States whose shooting intensity exceeds a threshold of 3 are labeled using geom_label_repel, drawing attention to regions with elevated incident rates. Annotations also identify the top three hotspot states, ranked by shooting intensity and by total fatalities. To build the plot, fatalities are first summarized by year, and a separate dataset is created for each year so that shooting intensity can be computed for that year specifically.
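The per-year hotspot logic can be sketched in Python as follows. This is a minimal sketch, not the project's R code. It assumes one dict of fatalities and one dict of populations per year, keyed by state code. The state codes and numbers in the example are placeholders, the threshold of 3 comes from the text, and ties in the rankings are broken arbitrarily.

```python
def shooting_intensity(fatalities, population):
    """Fatalities per million residents."""
    return fatalities * 1_000_000 / population


def state_hotspots(fatalities_by_state, population_by_state, threshold=3.0, top_n=3):
    """Summarize one year's hotspots from hypothetical per-year dicts keyed by state."""
    intensity = {
        state: shooting_intensity(count, population_by_state[state])
        for state, count in fatalities_by_state.items()
    }
    return {
        "intensity": intensity,
        # States above the threshold are the ones that get a repelled label.
        "labeled": sorted(s for s, v in intensity.items() if v > threshold),
        "top_by_intensity": sorted(intensity, key=intensity.get, reverse=True)[:top_n],
        "top_by_fatalities": sorted(fatalities_by_state, key=fatalities_by_state.get, reverse=True)[:top_n],
    }


# Placeholder states and values for illustration only.
counts = {"S1": 10, "S2": 60, "S3": 2, "S4": 5}
pops = {"S1": 1_000_000, "S2": 30_000_000, "S3": 500_000, "S4": 4_000_000}
print(state_hotspots(counts, pops))
```

Because intensity and total fatalities are ranked separately, a populous state can lead in fatalities while falling below the labeling threshold. This matches the pattern described for California, Texas, and Florida.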

Plot 2

The second animated plot shows major city hotspots in 2015, 2016, and 2017. These hotspots are identified using the same “shooting intensity” metric as in Plot 1. Cities are marked using geom_point, and those with a shooting intensity above 28 are additionally labeled using geom_label_repel. This layering reveals cities with elevated incident rates inside states with relatively low overall intensity. As with Plot 1, fatalities are summarized by year, a separate dataset is created for each year, and shooting intensities are calculated in the same way.
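The layering idea, cities that stand out inside states that do not, can be expressed as a simple filter. This Python sketch is illustrative only. It assumes a state with "relatively lower overall intensity" is one at or below the state labeling threshold of 3, and it uses the city threshold of 28 from the text. The tuple layout, function names, and example values are hypothetical.

```python
def city_intensity(fatalities, population):
    """Fatalities per million residents, the same metric used for states."""
    return fatalities * 1_000_000 / population


def cities_in_low_intensity_states(city_rows, state_intensity, city_threshold=28.0, state_threshold=3.0):
    """Find cities above city_threshold whose state is at or below state_threshold.

    city_rows holds hypothetical (city, state, fatalities, population) tuples.
    """
    hits = []
    for city, state, fatalities, population in city_rows:
        value = city_intensity(fatalities, population)
        if value > city_threshold and state_intensity.get(state, 0.0) <= state_threshold:
            hits.append((city, state, value))
    return hits


# Placeholder cities and values for illustration only.
rows = [("City A", "S1", 3, 100_000), ("City B", "S2", 1, 1_000_000), ("City C", "S2", 5, 100_000)]
print(cities_in_low_intensity_states(rows, {"S1": 2.0, "S2": 5.0}))
```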

Plot 3

The final animated heat map shows racial intensity across states from 2015 to 2017, highlighting disparities in racial dynamics across different regions of the United States. It combines police fatality data, categorized by race, state, and year, with demographic information on the racial composition of cities and population estimates for each state. Racial intensity metrics are normalized so that states with very different population sizes can be compared consistently. Each frame of the animation corresponds to one year, with states on the y-axis and racial groups on the x-axis. A color gradient encodes the intensity for each racial group within a state, showing how racial disparities evolve over time.
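The exact normalization is not spelled out above, so the following Python sketch rests on stated assumptions. Racial intensity is taken as fatalities per million residents of that race in the state. A race's state population is estimated as the state population times a state-level race share in percent, an assumed input that could, for example, be aggregated from the city shares. Each year's cells are then min-max scaled to [0, 1] to drive the color gradient. All names and values are hypothetical.

```python
def racial_intensity_grid(fatalities, state_population, race_share):
    """Build {year: {(state, race): fatalities per million residents of that race}}.

    fatalities: {(year, state, race): count}
    state_population: {state: population}
    race_share: {(state, race): percent of the state's residents}  (assumed input)
    """
    grid = {}
    for (year, state, race), count in fatalities.items():
        race_population = state_population[state] * race_share[(state, race)] / 100
        grid.setdefault(year, {})[(state, race)] = count * 1_000_000 / race_population
    return grid


def min_max_scale(cells):
    """Rescale one year's cell values to [0, 1] for a color gradient."""
    lo, hi = min(cells.values()), max(cells.values())
    span = hi - lo
    return {key: (value - lo) / span if span else 0.0 for key, value in cells.items()}


# Placeholder inputs for illustration only.
deaths = {(2015, "S1", "B"): 2, (2015, "S1", "W"): 3, (2015, "S2", "B"): 1}
state_pop = {"S1": 1_000_000, "S2": 500_000}
shares = {("S1", "B"): 10.0, ("S1", "W"): 80.0, ("S2", "B"): 20.0}
grid = racial_intensity_grid(deaths, state_pop, shares)
print(min_max_scale(grid[2015]))
```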

Plot 1

Plot 2

Plot 3

Discussion

Plots 1 and 2: General Overview

The map shows shooting intensity across all 50 states and the District of Columbia from 2015 to 2017. Each state is color-coded by its number of shootings per million residents: states with higher rates are shaded darker, and states with lower rates are shaded lighter. States with no fatalities are shown in gray.
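The color encoding described above can be sketched as a linear blend between a light and a dark color, with gray reserved for states without fatalities. This Python sketch only illustrates the idea. The RGB endpoints and the gray are hypothetical choices, and the project's actual ggplot2 color scale is not specified.

```python
LIGHT = (255, 237, 160)  # hypothetical color for the lowest intensity
DARK = (189, 0, 38)      # hypothetical color for the highest intensity
GRAY = (190, 190, 190)   # states with no recorded fatalities


def state_fill(intensity, max_intensity):
    """Map shootings per million to an RGB fill; darker means higher intensity."""
    if not intensity:  # None (missing) or 0 fatalities
        return GRAY
    t = min(intensity / max_intensity, 1.0)
    return tuple(round(light + (dark - light) * t) for light, dark in zip(LIGHT, DARK))


print(state_fill(2, 10), state_fill(8, 10), state_fill(0, 10))
```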

Plot 1

In 2015, Wyoming, New Mexico, and Oklahoma stood out for their high shooting intensities, while California, Texas, and Florida led in total fatalities. In 2016, the states with the highest shooting intensities shifted to New Mexico, Alaska, and the District of Columbia, while California, Texas, and Florida maintained the highest fatalities. By 2017, Maine, Alaska, and Oklahoma had the highest shooting intensities, while California, Texas, and Florida continued to lead in fatalities.

Plot 2

This map visualizes the intensity of police shootings, measured per million residents, in major US cities for each year. Each dot represents a city: larger dots indicate higher shooting intensity, and smaller dots indicate lower intensity. Labels highlight cities whose shooting intensity exceeds 28 in that year. Notably, Chicago and Los Angeles exceed this threshold in two of the three years.

Plot 3

Shooting intensities are consistently high for Black individuals and Native Americans. In 2015, Wisconsin had the highest intensity for Native Americans, and Nebraska also showed high intensity for Black individuals. In 2016, spikes appeared for Native Americans in Wisconsin, Black individuals in Nebraska, and Asian individuals in Tennessee. In 2017, Asian individuals faced heightened intensity in South Dakota, while Black individuals in Alaska and Native Americans in Wisconsin also experienced high rates. Overall, Nebraska, Alaska, and Iowa were consistent hotspots for shootings of Black individuals, and Wisconsin stood out for Native Americans. In contrast, white and Hispanic individuals had lower intensities, with white individuals having the lowest.