Tucson Car Collision Analysis

Proposal

Tucson Car Collision Analysis
Author
Affiliation

Data Dazzlers:
Sanja Dmitrovic, Jiayue He, Vidhyananth Sivashanmugam,
Naitik Shah, Varun Soni, Mohammad Ali Farmani

School of Information, University of Arizona

Introduction

In response to the critical issue of road safety in Tucson, we propose a project that utilizes the comprehensive accident crash data set provided by the Tucson Police Department. Our aim is to develop an interactive spatiotemporal visualization to highlight patterns, trends, and hot-spots of traffic accidents in the city. This initiative seeks to leverage data for societal benefit, offering insights that can contribute to improved safety measures and awareness within the Tucson community.

Dataset

This dataset is taken from the GIS Data from the City of Tucson, which displays Tucson Police’s publicly-available records of vehicle collisions from March 2022 to the present. Variables in this dataset include date of collision, injury severity, manner of collision, if the collision was fatal, etc. A full list of these variables is provided below.

Column Name Data Type Unique 5 Values
AccidentID numeric 1704080490.00, 1707080430.00, 1710020646.00, 1710110121.00, 1711030505.00
Dupe integer 1, NA, NA, NA, NA
Agency character TPD , TPD, NA, NA, NA
CollisionDate character 2017/04/08 00:00:00+00, 2017/07/08 00:00:00+00, 2017/10/02 00:00:00+00, 2017/10/11 00:00:00+00, 2017/11/03 00:00:00+00
YearOccu integer 2017, 2018, 2021, 2019, 2020
Month character April, July, October, November, January
MonthOccu integer 4, 7, 10, 11, 1
MonthAbbr character Apr, Jul, Oct, Nov, Jan
TimeOccu integer 1900, 1600, 2000, 700, 1700
Day character Saturday, Monday, Wednesday, Friday, Tuesday
YeartoDate character No, Yes, NA, NA, NA
YtdDupl integer 0, 1, NA, NA, NA
TrafficOffense numeric 3001.00, 2101.00, 3002.00, 3112.00, 3202.00
OffenseDesc character Fatal Motor Vehicle Accident: With Pedestrian, Driving Under the Influence: Fatal Accident, Fatal Motor Vehicle Accident: With Another Motor Vehicle, Personal Injury Motor Vehicle Traffic Accident: With Another Motor Vehicle Leaving the Scene, Property Damage Motor Vehicle Traffic Accident: With Another Motor Vehicle
InjurySeverity character Fatal Injury, Possible Injury, No Injury, Non-Incapacitating Injury, Incapacitating Injury
CollisionType character , Vehicle / Vehicle, Vehicle / Motorcycle, Single Vehicle, Vehicle / Pedestrian
CollisionManner character Unknown, Angle (Front to Side), Rear-End (Front to Back), Single Vehicle, Head-On (Front to Front)
Division character South, East, West, Midtown, Not Recorded
Squad character , 2, 6, 1, 5
Ward character 5, 2, 3, 6, 4
CollisionNeighborhd character T103, T401, T204, T304, T408
AccidentLocation character , PRINCE RD & FLOWING WELLS RD, SPEEDWAY BL & BRYANT AV, STONE AV & ROGER RD, CAMP LOWELL DR & SWAN RD
CollisionDirection character , Unknown At PRINCE RD , West From STONE AV , Unknown At CAMP LOWELL DR , West From JESSICA AV N
Distance character , Approximately 50 Feet, Approximately 150 Feet, Approximately 30 Feet, Approximately 100 Feet
HitandRun character No, Yes, , NA, NA
Pedestrian character No, Yes, NA, NA, NA
FatalCollision character Yes, No, NA, NA, NA
IntersectionRelated character No, Yes, NA, NA, NA
TrafficControlDevice integer 1, 0, NA, NA, NA
WorkZone integer 0, 1, NA, NA, NA
ViolationSpeed character No, Yes, NA, NA, NA
ViolationTooClose integer 0, 1, NA, NA, NA
ViolationTrafficControlDevice integer 1, 0, NA, NA, NA
ViolationTurning integer 0, 1, NA, NA, NA
ViolationWrongWay integer 0, 1, NA, NA, NA
ViolationLane integer 0, 1, NA, NA, NA
ViolationCrosswalk integer 0, 1, NA, NA, NA
ViolationFailureToYield integer 0, 1, NA, NA, NA
ViolationAggressiveDriving integer 0, 1, NA, NA, NA
OperatorImpaired character No, Yes, NA, NA, NA
OperatorDistracted character No, Yes, NA, NA, NA
XCoordinate numeric 993484.31, 1021945.49, 1000255.62, 1016600.14, 1054047.37
YCoordinate numeric 426065.59, 453977.15, 464006.13, 453793.27, 422146.31
TotalFatalities integer 1, 2, 0, 3, NA
TotalInjuries integer 1, 2, 0, 3, 6
TotalUnits integer 3, 2, 1, 5, 4
Latitude numeric 32.16779, 32.24378, 32.27190, 32.24341, 32.15539
Longitude numeric -110.9682, -110.8753, -110.9452, -110.8926, -110.7726
LatitudePublic numeric 32.16779, 32.24350, 32.27213, 32.24341, 32.15526
LongitudePublic numeric -110.9682, -110.8756, -110.9458, -110.8926, -110.7727
ESRI_OID integer 1, 2, 3, 4, 5

Why this dataset?

Choosing the Tucson accident crash dataset for project is a decision rooted in both relevance and practicality. As students at the University of Arizona, located in the heart of Tucson, this dataset offers a unique opportunity to study a subject that directly impacts the community. Analyzing traffic collision data within the city not only allows us to apply the theoretical knowledge we’ve gained in our data analysis courses but also provides insights into urban safety and transportation issues that affect peers and fellow residents. This project is not just an academic exercise; it’s a chance to contribute to a safer, more informed Tucson.

Objective

Our project’s primary goal is to develop a dynamic and user-friendly tool that offers detailed insights into the frequency, severity, and causes of traffic accidents across different times and locations in Tucson. By enabling various stakeholders, including city planners, policy-makers, local residents, and fellow students, to access and interact with our findings, we aspire to contribute to informed decision-making processes and the implementation of effective road safety measures.

Overall Project Plan

  1. Data Preparation and Analysis :
    • Begin with a thorough cleaning and pre-processing of the Tucson accident crash dataset to ensure accuracy and usability.
    • Conduct a comprehensive analysis focusing on temporal trends (time of day, day of the week, months), spatial distribution (accident hotspots), and accident severity (fatalities, injuries).
  2. Development of Interactive Visualizations :
    • Utilize R Shiny for the creation of an interactive dashboard, allowing users to filter and explore the data based on specific criteria (e.g., time period, accident severity).
    • Implement Leaflet for interactive maps to display spatial data and identify accident hotspots within Tucson using the latitude and longitude coordinates in the dataset.
    • Use ggplot2 and Plotly for creating dynamic and static visualizations that can be adjusted according to user input.
  3. Accessibility and User Experience :
    • Design the dashboard with a focus on accessibility, ensuring that our visualizations are informative and usable for all.
    • Incorporate user-friendly features, such as slicer to highlight key insights and facilitate understanding.
  4. Outreach and Impact :
    • Plan to present our findings to local authorities and community groups to foster discussions on improving road safety.
    • Publish our interactive dashboard online, making it accessible to the public for educational and awareness purposes.

Analysis Plan

The primary objective of our project is to delve deep into the Tucson accident crash dataset to uncover underlying patterns and factors contributing to road accidents within Tucson. By leveraging the extensive data provided by the Tucson Police Department, we aim to analyze how various dimensions such as day, time, injury severity, number of fatalities, and type of violation are all related to each other.

The two specific questions we aim to answer are:

Question 1

Members: Sanja Dmitrovic, Jiayue He, Vidhyananth Sivashanmugam

Does day of the week and/or time of day affect severity and the number of accidents?

Variables to use:

  • TimeOccu (time at which accident occurred)

  • Day (day of week accident occurred)

  • InjurySeverity (how serious the injury caused by accident was)

Some additional variables may be used such as:

  • OperatorImpaired (if the operator was impaired due to substances or other reasons)

  • OperatorDistracted (if the operator was distracted)

These variables may help us explore why certain days of the weeks or times of day lead to more accidents. For example, our hypothesis is that weekend nights would lead to more accidents due to more usage of drugs and alcohol by the driver. This factors could also explain why some accidents are more severe than others.

Plots to create: To answer this question, we plan to create daily and hourly time series plots to easily observe trends in accident frequency and injury severity. In the Shiny app, there can be a time series bar where the user drags the button and the plot shows the different numbers of accidents in different hours in a day and days in the week.

Question 2

Members: Naitik Shah, Varun Soni, Mohammad Ali Farmani

What is the relationship between the type of violation (e.g., failure to yield, aggressive driving) and if the accident resulted in a fatality?

Variables to use:

  • FatalCollision (yes or no if accident resulted in fatality or not)

  • TotalFatalities (number of fatalities reported as an integer)

  • All Violation variables which report yes or no (i.e., ViolationSpeed, ViolationTooClose, ViolationTrafficControlDevice, ViolationTurning, ViolationWrongWay, ViolationLane, ViolationFailuretoYield, ViolationCrosswalk, ViolationAggressiveDriving).

Plots to create: For the Shiny app, we will create a series of stacked bar plots that show the number of accidents based the type of violation. Pie charts may be included to provide a high-level summary of the percentage of crashes per violation type.

In the end, we aim for the user to hover over various areas in Tucson and see the time series plots and bar plots that give insights to the questions posed above.

Team Roles

Our team will leverage its diverse skill sets to ensure the success of the project. Although groups have been assigned to each analysis question, all members will perform data analysis, visualization design, dashboard development, and project management. Regular meetings will be scheduled to review progress, address challenges, and ensure that the project remains on track to meet its objectives.

Expected Outcomes

We anticipate that our project will not only enhance our understanding and application of data analysis techniques but also provide valuable insights into traffic accident trends in Tucson. By highlighting areas and times of high risk, our work aims to support efforts to reduce accidents and improve road safety for all city residents.

Conclusion

With a deep commitment to applying our analytical skills for the betterment of our community, we believe that this project represents a meaningful opportunity to make a tangible difference in the area of road safety in Tucson. We are excited about the potential of our interactive spatiotemporal visualization tool to illuminate insights that can lead to positive changes in our community.

Timeline to Completion

  • Week of 01 April
    • Have proposal prepared for peer review on 03 April.
    • Provide peer review for other groups on 03 April.
  • Week of 08 April
    • Make changes to proposal based on peer review for instructor review and resubmit by end of day 08 April.
  • Week of 15 April
    • Split group members into Q1 and Q2 teams and come up with plans to answer each question.
    • Do individual exploration of the data and start first drafts of visualizations to compare with other team members in/between each group.
  • Week of 22 April, 29 April
    • Create presentation-ready visualizations.
    • Implement these visualizations into the Shiny app and finalize dashboard app.
    • Complete write up.
    • Finalize presentation slide deck.
    • Compile website.
    • Practice presentation using timer.
  • Week of 06 May
    • Submit all materials by this deadline.
    • Give presentation in class.