Unveiling Trends in Data Breaches and Data Hacks

INFO 526 - Final Project

Datavista:
Akash Srinivasan, Abhishek Kumar, Divya Dhole, Noureen Mithaigar, Gowtham Theeda, Lakshmi Neharika Anchula

Introduction

  • Data Breach Definition: A data breach is the unauthorized access, disclosure, or acquisition of sensitive or confidential information. It typically occurs when cybercriminals gain access to a system or network and extract valuable data without authorization.

  • Impact: Data breaches can cause financial losses, reputational damage, legal liability, and privacy violations for individuals, businesses, and organizations. They also erode trust among stakeholders, with consequences that can last long after the breach itself.

Data set

  • Data_Breaches_LATEST.CSV: the “World’s Biggest Data Breaches & Hacks” dataset from informationisbeautiful.net, spanning 2003 to 2023. It records each breach's year, date, sector, method, and data sensitivity, supporting analysis of breach trends and impacts across sectors.
  • Key variables include Organisation, Records Lost, Year, Date, Sector, Method, and Data Sensitivity.

Project Approach

  • Data Collection: Obtain the “World’s Biggest Data Breaches & Hacks” dataset from informationisbeautiful.net (2003-2023), with variables for organisation, records lost, year, date, sector, method, and data sensitivity.

  • Data Cleaning: Ensure integrity by handling missing values, standardizing formats, and removing duplicates.

  • Exploratory Data Analysis (EDA): Uncover breach patterns and trends using statistical methods and visualizations.
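
A minimal Python sketch of the cleaning step, using only the standard library. The column names (organisation, records lost, year, sector, method) and the sample rows are assumptions for illustration, not the actual layout of Data_Breaches_LATEST.CSV, so adjust them to the real header before use. The function names are hypothetical. The sketch drops rows with no record count, standardizes text fields, and then removes exact duplicates.

```python
import csv
import io
from typing import Optional

# Toy rows for illustration only; column names are assumed, not taken from the real file.
RAW_CSV = """organisation,records lost,year,sector,method
Acme Bank,"1,200,000",2019,finance,hacked
Acme Bank,"1,200,000",2019,finance,hacked
City Clinic,,2020,Healthcare,poor security
ShopCo,350000,2021,retail ,inside job
"""


def parse_records(value: str) -> Optional[int]:
    """Turn strings like '1,200,000' into an int; return None if missing or non-numeric."""
    cleaned = value.replace(",", "").strip()
    return int(cleaned) if cleaned.isdigit() else None


def clean_rows(text: str) -> list:
    """Parse CSV text, drop rows without a record count, normalize text, drop duplicates."""
    seen = set()
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        records = parse_records(row["records lost"])
        if records is None:
            continue  # handle missing values by dropping the row
        record = {
            "organisation": row["organisation"].strip(),
            "records_lost": records,
            "year": int(row["year"]),
            "sector": row["sector"].strip().lower(),  # standardize case and whitespace
            "method": row["method"].strip().lower(),
        }
        key = tuple(record.values())
        if key in seen:
            continue  # remove exact duplicates after standardizing
        seen.add(key)
        rows.append(record)
    return rows


cleaned = clean_rows(RAW_CSV)
print(len(cleaned))  # → 2
```

Dropping rows is only one way to handle missing values; keeping them under an explicit "unknown" value may suit analyses that count breaches rather than sum records lost.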

Question 1

(General Assessment)

  • How have data breaches evolved over the past decade (2013-2023), and what patterns emerge in their frequency, severity, and impact across different sectors?

Visualizing Data Breaches: A Multifaceted Approach (2013-2023)

  • Diverse visualizations: 3D scatter plots, time series analysis, and animations.

  • 3D Scatter Plot: Interactive, detailed visualization built with Plotly.

  • Time Series Analysis: ggplot2 plots of breach trends over time.

  • Vulnerable Sectors: Sectors with consistently higher breach counts or sudden spikes in the faceted time series plot may be more susceptible and require additional security measures.

  • Year-to-Year Fluctuations: Identifying sectors with significant year-to-year variations in breaches can help focus resources on those sectors during periods of heightened risk.

  • Overall Security Landscape: The overall trend in total breaches can indicate a broader improvement or decline in data security practices over time.
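
ggplot2 is an R package, so the following Python sketch covers only the aggregation behind a faceted time series, not the plotting. On hypothetical toy data (not figures from the dataset), it counts breaches per sector per year, computes each sector's year-to-year change, and totals breaches across sectors for the overall trend. The function names are illustrative.

```python
from collections import Counter, defaultdict

# Toy data: one (year, sector) pair per breach. Not real figures.
breaches = [
    (2019, "finance"), (2019, "healthcare"),
    (2020, "finance"), (2020, "healthcare"), (2020, "healthcare"),
    (2021, "finance"), (2021, "healthcare"),
]


def counts_by_sector_year(rows):
    """sector -> Counter(year -> number of breaches); this is what each facet plots."""
    counts = defaultdict(Counter)
    for year, sector in rows:
        counts[sector][year] += 1
    return counts


def year_over_year_changes(year_counts, years):
    """Change in breach count between consecutive years (missing years count as 0)."""
    return {curr: year_counts[curr] - year_counts[prev] for prev, curr in zip(years, years[1:])}


years = sorted({year for year, _ in breaches})
counts = counts_by_sector_year(breaches)
totals = Counter(year for year, _ in breaches)  # overall trend across all sectors

for sector in sorted(counts):
    print(sector, year_over_year_changes(counts[sector], years))
```

Because the dataset, as its title indicates, focuses on the biggest breaches, counts like these describe notable incidents rather than every breach, which is worth keeping in mind when reading the overall trend as a measure of security practices.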

Question 2

(Vulnerability Assessment)

Which sectors or types of data (e.g., personal, financial) are particularly susceptible to different breach methods, such as hacking or insider jobs, and what are the resulting impacts on businesses and individuals?

  • Dominant Breach Methods: By observing the height of each stacked bar segment within a sector, you can identify the methods responsible for the most significant data loss in that sector.
  • Vulnerability Across Sectors: By comparing the total heights of stacked bars across different sectors, you can see which sectors experienced the most significant data loss overall, potentially suggesting higher vulnerability.
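
Reading the stacked bars as described above amounts to two aggregations, sketched here in Python on hypothetical toy data (sector names, methods, and figures are invented). Summing records lost per sector and method gives the bar segments; the largest segment gives a sector's dominant method; the full bar height gives the ranking across sectors. Function names are illustrative.

```python
from collections import defaultdict

# Toy data: (sector, method, records lost in millions). Illustrative values only.
rows = [
    ("web", "hacked", 500),
    ("web", "poor security", 120),
    ("finance", "hacked", 80),
    ("finance", "inside job", 10),
    ("healthcare", "hacked", 78),
    ("healthcare", "lost device", 4),
]


def stacked_totals(rows):
    """sector -> method -> total records lost; each inner value is one bar segment."""
    totals = defaultdict(lambda: defaultdict(int))
    for sector, method, records in rows:
        totals[sector][method] += records
    return totals


def dominant_method(method_totals):
    """Method with the tallest segment in a sector's bar."""
    return max(method_totals, key=method_totals.get)


def sectors_by_total_loss(totals):
    """Sectors ordered by full bar height (sum of all segments), largest first."""
    return sorted(totals, key=lambda s: sum(totals[s].values()), reverse=True)


totals = stacked_totals(rows)
for sector in sectors_by_total_loss(totals):
    print(sector, sum(totals[sector].values()), dominant_method(totals[sector]))
```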

  • Year-to-Year Comparison: By observing the lines for each year, you can identify sectors where breach methods and the number of records lost fluctuate significantly.
  • Evolving Trends: You can see if specific methods become more or less prominent over time within a sector, potentially indicating a shift in attacker tactics or changing vulnerabilities.
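
Raw totals can hide shifts in tactics when a sector's overall losses change sharply from one year to the next. One option, assumed here rather than taken from the project, is to compare each method's share of the year's records lost. This Python sketch uses toy data for one hypothetical sector; the function names are illustrative.

```python
from collections import defaultdict

# Toy data for a single sector: (year, method, records lost). Illustrative values only.
rows = [
    (2019, "hacked", 60), (2019, "poor security", 40),
    (2021, "hacked", 30), (2021, "poor security", 70),
]


def method_shares_by_year(rows):
    """year -> method -> fraction of that year's records lost attributed to the method."""
    totals = defaultdict(lambda: defaultdict(int))
    for year, method, records in rows:
        totals[year][method] += records
    shares = {}
    for year, methods in totals.items():
        year_total = sum(methods.values())
        if year_total == 0:
            continue  # avoid dividing by zero when no records were lost
        shares[year] = {m: r / year_total for m, r in methods.items()}
    return shares


def share_shift(shares, method, start, end):
    """Change in a method's share between two years; positive means more prominent."""
    return shares[end].get(method, 0.0) - shares[start].get(method, 0.0)


shares = method_shares_by_year(rows)
print(round(share_shift(shares, "poor security", 2019, 2021), 2))  # → 0.3
```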

  • Focus: The chart highlights the sectors that suffered the largest data breaches in 2023, ranked by total records lost.
  • Animation: The animation shows how the ranking of sectors by records lost changes from year to year across the dataset.
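
One way to drive an animated ranking is to build one sorted frame per year. This Python sketch, on hypothetical toy data, sums records lost per sector for each year and keeps the top `top_n` sectors (an illustrative parameter); the 2023 frame corresponds to the ranking described above. Rendering the animation is left to the plotting library.

```python
from collections import defaultdict

# Toy data: (year, sector, records lost in millions). Illustrative values only.
rows = [
    (2022, "web", 300), (2022, "finance", 50), (2022, "retail", 20),
    (2023, "finance", 200), (2023, "web", 150), (2023, "telecoms", 90),
]


def ranking_frames(rows, top_n=3):
    """year -> [(sector, records), ...] sorted descending; one frame per animation step."""
    totals = defaultdict(lambda: defaultdict(int))
    for year, sector, records in rows:
        totals[year][sector] += records
    return {
        year: sorted(sectors.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
        for year, sectors in sorted(totals.items())
    }


frames = ranking_frames(rows)
for year, ranking in frames.items():
    print(year, [sector for sector, _ in ranking])
```

Keeping a fixed `top_n` across frames makes bar positions comparable from one animation step to the next.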

Conclusion

  • Identifying Sector Weaknesses: We pinpointed the industries most vulnerable to cyberattacks, such as healthcare and finance. These sectors require immediate security upgrades.

  • Prioritizing Breach Severity: We distinguished between minor leaks and massive breaches. Prioritizing breaches that expose the most sensitive or critical data is crucial.

  • Balancing Security & Privacy: We explored the delicate balance between robust security and customer access.
