Analyzing Trends in Cricket Games

Proposal

Author: PlotWizards

High-level Goal

In this project, we will build a dashboard for a live cricket match whose plots refresh automatically at regular intervals.

Goal Details

Motivation:

For this project, we wanted to showcase our data visualization skills on the challenge of a live dataset. Sports are a natural source of live data, so we chose our favorite sport: cricket. For demonstration purposes we won’t use actual real-time data, since we found that it would incur unwanted costs. Instead, we will use a large historical dataset, and our back-end server will automatically update the data table for the front-end at regular intervals, essentially mocking a live match.

Description of datasets:

We’ll use two datasets:

  1. ODI_Match_Data.csv: Provides facts about the location and season of the cricket matches, along with team information and the play results from each team member. We’ll need this one to investigate partnerships between batsmen. Its dimensions are 155432 rows by 23 columns.

    Variables that we’ll use:

    Variable Name | Data type | Description
    ------------- | --------- | -----------
    match_id | double | A unique identifier for each ODI cricket match.
    season | character | The season in which the match took place.
    start_date | character | The date on which the match started.
    innings | double | The innings number (1st innings or 2nd innings).
    ball | double | A numeric representation of the ball number bowled in the innings.
    batting_team | character | The name of the batting team for the current innings.
    bowling_team | character | The name of the bowling team for the current innings.
    striker | character | The batsman who is currently facing the ball.
    non_striker | character | The batsman at the non-striker’s end.
    bowler | character | The bowler who is delivering the ball.
    runs_off_bat | double | The number of runs scored off the bat (excluding extras).
    extras | double | The total number of extra runs (wides, no-balls, byes, leg-byes, penalty) in the current ball.
    wides | double | The number of wide deliveries bowled in the current ball.
    noballs | double | The number of no-ball deliveries bowled in the current ball.
    byes | double | The number of byes scored in the current ball.
    legbyes | double | The number of leg-byes scored in the current ball.
  2. ODI_Match_info.csv: Overlaps in data with the above but provides information on the umpire, performance, and the city where the match took place. We’ll need this one to analyze the batting and bowling performance of each player. Its dimensions are 2380 rows by 18 columns.

    Variables that we’ll use:

    Variable Name | Data type | Description
    ------------- | --------- | -----------
    id | double | A unique identifier for each cricket match.
    season | character | The season in which the match took place.
    date | date | The date on which the match was played.
    team1 | character | The name of the first cricket team participating in the match.
    team2 | character | The name of the second cricket team participating in the match.
    result | character | The result of the match (e.g., “normal,” “tie,” “no result”).
    winner | character | The winning team of the match.
    win_by_runs | double | The margin of victory in runs (0 if the match was won by wickets or the margin does not apply).
    win_by_wickets | double | The margin of victory in wickets (0 if the match was won by runs or the margin does not apply).
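
Because the two files overlap, they will need to be joined at some point. The sketch below shows one way this could look, assuming that match_id in ODI_Match_Data.csv and id in ODI_Match_info.csv refer to the same match. The inline CSV text, its values, and the function name join_on_match are made up for illustration; the project itself plans to do this step in MySQL.

```python
import csv
import io

# Tiny inline stand-ins for the two CSV files (values are invented).
BALLS_CSV = """match_id,innings,ball,striker,runs_off_bat,extras
1,1,0.1,A Batter,4,0
1,1,0.2,A Batter,1,0
2,1,0.1,C Batter,0,1
"""

INFO_CSV = """id,season,team1,team2,winner
1,2023,Team X,Team Y,Team X
2,2023,Team Y,Team Z,Team Z
"""


def join_on_match(balls_text, info_text):
    """Attach match-level fields to every ball-by-ball row.

    Assumption: match_id (ball-by-ball file) equals id (match info file).
    """
    info_by_id = {row["id"]: row for row in csv.DictReader(io.StringIO(info_text))}
    joined = []
    for ball in csv.DictReader(io.StringIO(balls_text)):
        info = info_by_id.get(ball["match_id"])
        if info is None:
            continue  # skip deliveries whose match has no info row
        merged = dict(ball)
        merged.update({key: value for key, value in info.items() if key != "id"})
        joined.append(merged)
    return joined


rows = join_on_match(BALLS_CSV, INFO_CSV)
```

Each element of rows is one delivery with the season, teams, and winner of its match attached.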

Research questions:

Below are the questions we would like to answer with our dashboard, along with what we need to answer them: variables, plot representations, and the insights we can extract from the visuals.

  1. How did the (a) scoring rate, (b) player batting performances, and (c) bowlers’ economy rates evolve throughout the match?

    We chose this question because it encompasses critical aspects of a cricket match, including scoring dynamics, standout individual performances, bowling efficiency, partnerships’ influence on runs, and overall player contributions, providing a holistic view of the match progression.

    a. Scorecard summary:

    Variables: Overs (X-axis), Runs Scored per Over (Y-axis) - derived from the ball-by-ball columns (ball, runs_off_bat, extras) in ODI_Match_Data.csv

    Plot Representation: Line chart or bar chart

    Insights: Visualizes the scoring rate throughout the innings, showing periods of acceleration or deceleration in runs scored.
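
As a rough illustration of the wrangling behind this plot, the sketch below sums runs per over from ball-by-ball rows. It assumes that the ball column is encoded as completed overs followed by the ball within the over (so 0.1 is the first ball of the first over); the function name runs_per_over and the sample rows are hypothetical.

```python
from collections import defaultdict


def runs_per_over(deliveries):
    """Sum runs_off_bat + extras for each over of one innings.

    Assumption: `ball` looks like <completed overs>.<ball in over>,
    e.g. 0.1 is the first ball of the first over.
    """
    totals = defaultdict(int)
    for d in deliveries:
        over = int(float(d["ball"])) + 1  # 0.x belongs to over 1
        totals[over] += int(d["runs_off_bat"]) + int(d["extras"])
    return dict(sorted(totals.items()))


# Invented sample rows, not taken from the dataset.
sample = [
    {"ball": "0.1", "runs_off_bat": "4", "extras": "0"},
    {"ball": "0.2", "runs_off_bat": "0", "extras": "1"},
    {"ball": "1.1", "runs_off_bat": "6", "extras": "0"},
]
per_over = runs_per_over(sample)
```

The resulting mapping from over number to runs can feed the line or bar chart directly.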

    b. Batting Performance:

    Variables: Batsman Name (X-axis), Runs Scored (Y-axis) - derived from the striker and runs_off_bat columns in ODI_Match_Data.csv

    Plot Representation: Bar chart or stacked bar chart

    Insights: Compares runs scored by each batsman, highlighting top performers.

    For data wrangling: season, batsman, runs scored.

    c. Bowlers’ economy rates:

    Variables: Bowler Name (X-axis), Economy Rate (Y-axis)

    Plot Representation: Box plot or violin plot

    Insights: Shows the distribution of economy rates among bowlers, indicating variations in performance.
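
Economy rate is runs conceded per over bowled. A minimal sketch of the calculation follows, using the usual scoring conventions: byes and leg-byes are not charged to the bowler, and wides and no-balls do not count as legal balls. The helper names and the sample over are hypothetical, and empty cells are assumed to mean zero.

```python
def _num(value):
    """Treat missing or empty CSV cells as 0 (an assumption about the data)."""
    return int(float(value)) if value not in (None, "", "NA") else 0


def economy_rate(deliveries, bowler):
    """Runs conceded per over (six legal balls) for one bowler."""
    runs = legal_balls = 0
    for d in deliveries:
        if d["bowler"] != bowler:
            continue
        wides, noballs = _num(d.get("wides")), _num(d.get("noballs"))
        # Byes and leg-byes are not charged to the bowler.
        runs += _num(d.get("runs_off_bat")) + wides + noballs
        if wides == 0 and noballs == 0:
            legal_balls += 1
    return runs / (legal_balls / 6) if legal_balls else None


# One invented over: six legal balls plus one wide.
over = [
    {"bowler": "B", "runs_off_bat": "4"},
    {"bowler": "B", "runs_off_bat": "0", "wides": "1"},
    {"bowler": "B", "runs_off_bat": "1"},
    {"bowler": "B", "runs_off_bat": "0", "legbyes": "1"},
    {"bowler": "B", "runs_off_bat": "0"},
    {"bowler": "B", "runs_off_bat": "2"},
    {"bowler": "B", "runs_off_bat": "0"},
]
rate = economy_rate(over, "B")
```

Computing this value per bowler per match would give the distribution that a box or violin plot needs.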

  2. (a) How did partnerships influence total runs, and (b) what insights do team performances, player contributions, and live updates provide about the match dynamics?

    We selected this question to highlight the significance of partnerships in contributing to total runs, while also delving into team performances and individual player contributions, offering insights into strategic game play and the match’s overall dynamics.

    a. Partnerships Overview:

    Variables: Batsman Pair (X-axis), Partnership Runs (Y-axis)

    Plot Representation: Network graph or line chart

    Insights: Highlights significant partnerships and their impact on total runs.
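
One possible way to derive partnership runs from the ball-by-ball data is to group deliveries by the pair of batsmen at the crease. The sketch below is a simplification: it credits all runs on a delivery (bat plus extras) to the pair, and it would merge two separate stands by the same pair. The function name and sample rows are hypothetical.

```python
from collections import defaultdict


def partnership_runs(deliveries):
    """Total runs scored while each pair of batsmen was at the crease."""
    totals = defaultdict(int)
    for d in deliveries:
        # Sort the names so (A, B) and (B, A) count as the same pair.
        pair = tuple(sorted((d["striker"], d["non_striker"])))
        totals[pair] += int(d["runs_off_bat"]) + int(d["extras"])
    return dict(totals)


# Invented sample rows, not taken from the dataset.
sample = [
    {"striker": "A", "non_striker": "B", "runs_off_bat": "1", "extras": "0"},
    {"striker": "B", "non_striker": "A", "runs_off_bat": "4", "extras": "0"},
    {"striker": "B", "non_striker": "A", "runs_off_bat": "0", "extras": "1"},
    {"striker": "C", "non_striker": "A", "runs_off_bat": "2", "extras": "0"},
]
stands = partnership_runs(sample)
```

The pair keys map naturally to edges in a network graph, with the run total as the edge weight.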

    b. Team performances, player contributions, and live updates:

    • Player Comparison (merged from other questions):

      Variables: Player Name (X-axis), Batting/Bowling Statistics (Y-axis)

      Plot Representation: Radar chart or spider chart

      Insights: Provides a comprehensive comparison of players’ performances across multiple metrics.

    • Team Comparison (merged from other questions):

      Variables: Team Name (X-axis), Runs Scored/Wickets Taken (Y-axis)

      Plot Representation: Pie chart or half-eye plot

      Insights: Visualizes the proportion of runs scored or wickets taken by each team, offering a quick comparison view.

    • Dynamic Updates (merged from other questions):

      Variables: Live Data (e.g., Score, Commentary, Player Stats)

      Plot Representation: Live dashboard with real-time updates

      Insights: Provides access to the latest match information for enhanced engagement.

The data visualization methods are subject to change as we analyze the data.

Plan of Attack

Mocking Live Data:

To avoid unnecessary costs associated with real-time data, we will split the data into two parts: past data and live data.

The past data will cover the years 2002 to 2022, while the live data will consist of the 2023 entries. Each 2023 entry will be read from the CSV file and inserted into a database table, with a pause of 10 to 20 seconds between consecutive entries. These entries will be treated as live data and served to API callers.
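
The replay loop described above could be sketched as follows. This is only an illustration in Python, not the planned Java back-end: the function name replay, its parameters, and the in-memory list standing in for the database table are all hypothetical. The sleep function is injectable so the example runs instantly.

```python
import random
import time


def replay(rows, min_delay=10.0, max_delay=20.0, sleep=time.sleep, rng=random):
    """Yield rows one at a time, pausing 10 to 20 seconds between entries."""
    for index, row in enumerate(rows):
        if index > 0:
            sleep(rng.uniform(min_delay, max_delay))
        yield row


live_table = []  # stand-in for the database table the APIs would read
delays = []      # record the pauses instead of actually waiting

entries_2023 = [{"ball": "0.1"}, {"ball": "0.2"}, {"ball": "0.3"}]
for entry in replay(entries_2023, sleep=delays.append):
    live_table.append(entry)
```

In a real run, the default time.sleep would be used and each yielded entry would be inserted into the database instead of a list.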

Diagram of plan:

Figure 1: Diagram of the methodology we’ll implement in this project.

Back-end APIs:

1) Batting Performance API:

Endpoint: /batting-performance

Parameters:

  • match_id: Filter data for a specific match.

  • player_name: Filter data for a specific player.

Response:

  • Returns data for batting performance including runs scored by each batsman.

API:

  • GET /batting-performance?match_id={id}&player_name={pname}
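
To make the request shape concrete, here is a hedged sketch of how such a query might be parsed and answered. It is written in Python for illustration only. The response fields (batsman, runs), the treatment of both parameters as optional filters, and the sample rows are assumptions, since the proposal does not fix them.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


def batting_performance(url, deliveries):
    """Answer GET /batting-performance from a list of ball-by-ball rows."""
    query = parse_qs(urlsplit(url).query)
    match_id = query.get("match_id", [None])[0]
    player = query.get("player_name", [None])[0]

    runs = defaultdict(int)
    for d in deliveries:
        if match_id is not None and d["match_id"] != match_id:
            continue
        if player is not None and d["striker"] != player:
            continue
        runs[d["striker"]] += int(d["runs_off_bat"])
    # Hypothetical response shape: one record per batsman.
    return [{"batsman": name, "runs": total} for name, total in sorted(runs.items())]


# Invented sample rows, not taken from the dataset.
sample = [
    {"match_id": "1", "striker": "A Batter", "runs_off_bat": "4"},
    {"match_id": "1", "striker": "A Batter", "runs_off_bat": "1"},
    {"match_id": "1", "striker": "B Batter", "runs_off_bat": "2"},
    {"match_id": "2", "striker": "A Batter", "runs_off_bat": "6"},
]
one_player = batting_performance(
    "/batting-performance?match_id=1&player_name=A%20Batter", sample
)
whole_match = batting_performance("/batting-performance?match_id=1", sample)
```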

2) Bowling Performance API:

Endpoint: /bowling-performance

Parameters:

  • match_id: Filter data for a specific match.

  • player_name: Filter data for a specific player.

Response:

  • Returns data for bowling performance including wickets taken by each bowler and their economy rate.

API:

  • GET /bowling-performance?match_id={id}&player_name={pname}

3) Partnerships API:

Endpoint: /partnerships

Parameters:

  • match_id: Filter data for a specific match.

Response:

  • Returns data for partnerships between batsmen.

API:

  • GET /partnerships?match_id={id}

4) Player Performance Tracking API:

Endpoint: /player-performance

Parameters:

  • match_id: The ongoing match ID.

  • player_name: The name of the player.

Response:

  • Returns real-time updates for the player’s performance including runs scored, wickets taken, strike rate, economy rate, etc.

API:

  • GET /player-performance?match_id={ongoing-id}&player_name={pname}
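
As an example of one metric this endpoint would report, strike rate is runs scored per 100 balls faced. The sketch below assumes the usual convention that a wide does not count as a ball faced; the function name, the returned field names, and the sample rows are hypothetical.

```python
def batting_snapshot(deliveries, player):
    """Runs, balls faced, and strike rate for one batsman so far."""
    runs = balls = 0
    for d in deliveries:
        if d["striker"] != player:
            continue
        runs += int(d["runs_off_bat"])
        if not int(d.get("wides") or 0):  # a wide is not a ball faced
            balls += 1
    strike_rate = round(100 * runs / balls, 2) if balls else None
    return {
        "player": player,
        "runs": runs,
        "balls_faced": balls,
        "strike_rate": strike_rate,
    }


# Invented sample rows, not taken from the dataset.
sample = [
    {"striker": "A", "runs_off_bat": "4"},
    {"striker": "A", "runs_off_bat": "0", "wides": "1"},
    {"striker": "A", "runs_off_bat": "1"},
    {"striker": "A", "runs_off_bat": "0"},
]
snapshot = batting_snapshot(sample, "A")
```

Recomputing the snapshot each time a new mocked delivery arrives would give the dashboard its updating player statistics.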

These APIs are subject to change, additions, or removals as we analyze the data.

Task Name | Status | Assigned to | Due | Priority | Task Summary
--------- | ------ | ----------- | --- | -------- | ------------
Back-end: JDBC Connectivity | | Srinivasan, Mohit | April 5th | High | Connect the MySQL database to Java using JDBC.
Back-end: API Creation | | Srinivasan, Mohit | April 12th | High | Create all the necessary APIs mentioned in the proposal. Additions or changes may be made.
Converting Dataset into database entries | | Tejas, Mohit | April 7th | High | Merge both .csv files and load the data into MySQL.
Mocking live data | | Mohit, Tejas | April 12th | Low | Mock the 2023 data as live data.
Data Wrangling | | Anjani, Nandini, Alex | April 5th | High | Merge the datasets and clean them.
Designing the initial draft of the plots (Question 1) | | Alex, Navya | April 10th | High | Design basic plots for Question 1 (complex plots will be done later).
Designing the initial draft of the plots (Question 2) | | Anjani, Nandini | April 10th | High | Design basic plots for Question 2 (complex plots will be done later).
Deciding on more complex plots | | All | April 15th-22nd | High | Plan and craft complex plots for both questions.
Integrating frontend with the backend through API calls | | Srinivasan, Mohit | April 15th-17th | Low | Call the APIs hosted on the backend from the frontend.
Quarto Dashboards for the initial draft plots | | Tejas, Anjani, Nandini, Navya, Alex | April 17th | High | Create dashboards for all the plots (simple and complex) using Quarto dashboards.

Repository Organization

Our GitHub repository will remain mostly unchanged from the default. We will place our data files in the data/ folder along with a corresponding readme.md file. The frontend of the project will be saved under index.qmd. The final repository will be deployed on GitHub Pages.

References

  1. Utkarsh Tomar. (2023). ODI Men’s Cricket Match Data (2002-2023) [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DS/3780212