Tracing the COVID-19 Trajectory
Using Shiny
Introduction
Our project uses a comprehensive COVID-19 dataset from Kaggle to examine the pandemic’s dynamics through interactive data visualization built with the Shiny package. By analyzing case counts, recovery rates, mortality, and vaccination data, we aim to reveal trends and correlations across regions and periods. The work is intended to reach beyond the classroom and provide actionable insights for policymakers and the public. Through dynamic, interactive visualizations in Shiny, we aim to turn complex data into clear narratives that support an informed response to this global health crisis.
Dataset
# A tibble: 6 × 5
Admin2 Date Case `Country/Region` `Province/State`
<chr> <date> <dbl> <chr> <chr>
1 Autauga 2020-01-22 0 US Alabama
2 Autauga 2020-01-23 0 US Alabama
3 Autauga 2020-01-24 0 US Alabama
4 Autauga 2020-01-25 0 US Alabama
5 Autauga 2020-01-26 0 US Alabama
6 Autauga 2020-01-27 0 US Alabama
# A tibble: 6 × 2
Country Population
<chr> <dbl>
1 Afghanistan 39835428
2 Albania 2872934
3 Algeria 44616626
4 Andorra 77354
5 Angola 33933611
6 Antigua and Barbuda 98728
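library(readr)  # provides read_csv()
# Load vaccination totals by manufacturer and preview the first rows
country_vaccinations_manufacture <- read_csv('./data/country_vaccinations_by_manufacturer.csv')
head(country_vaccinations_manufacture)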
# A tibble: 6 × 4
location date vaccine total_vaccinations
<chr> <date> <chr> <dbl>
1 Argentina 2020-12-29 Moderna 2
2 Argentina 2020-12-29 Oxford/AstraZeneca 3
3 Argentina 2020-12-29 Sinopharm/Beijing 1
4 Argentina 2020-12-29 Sputnik V 20481
5 Argentina 2020-12-30 Moderna 2
6 Argentina 2020-12-30 Oxford/AstraZeneca 3
# A tibble: 6 × 5
Date Country Confirmed Recovered Deaths
<date> <chr> <dbl> <dbl> <dbl>
1 2020-01-22 Afghanistan 0 0 0
2 2020-01-23 Afghanistan 0 0 0
3 2020-01-24 Afghanistan 0 0 0
4 2020-01-25 Afghanistan 0 0 0
5 2020-01-26 Afghanistan 0 0 0
6 2020-01-27 Afghanistan 0 0 0
Description
The datasets for this project come from a Kaggle collection curated by “imdevskp,” which compiles COVID-19 data from multiple global sources. The collection contains several datasets, each suited to a different analytical need: day-wise global statistics give a temporal view of the pandemic’s progression, country-wise latest data give a snapshot of the current status in each country, and data grouped by country and date support comparative analysis over time.
Additionally, there is a dataset focused on the USA that breaks statistics down to the county level, allowing a more granular examination within the United States. Together, these datasets cover critical metrics such as case counts, deaths, recoveries, and vaccination figures, which supports a comprehensive analysis of the pandemic’s impacts and trends.
To enhance the interactive exploration and visualization of this data, we are utilizing the shiny package in R, a powerful tool for creating interactive web applications. The Shiny app will allow users to dynamically filter, explore, and visualize the data, offering an immersive and user-driven analysis experience that highlights the depth and complexity of the COVID-19 pandemic data.
The choice of these datasets was motivated by their depth, regular updates, and the potential insights they offer into the global and localized impacts of COVID-19.
COVID-19 Dataset Source Link and Vaccination Dataset Source Link
Variable | Description |
---|---|
Date | Date of the record, crucial for any time-series analysis. |
Country/Region | Name of the country or region, essential for geographic comparisons. |
Confirmed | Total confirmed COVID-19 cases, a primary metric for pandemic assessment. |
Deaths | Total deaths due to COVID-19, important for understanding the fatality rate. |
Recovered | Total recoveries from COVID-19, indicative of the recovery rate. |
Active | Total active COVID-19 cases, key for current pandemic status assessment. |
New cases | New cases on a given date, vital for identifying trends over time. |
New deaths | New deaths on a given date, important for tracking fatality trends. |
New recovered | New recoveries on a given date, useful for monitoring recovery trends. |
Deaths / 100 Cases | Deaths per 100 confirmed cases, provides context for the death toll. |
Recovered / 100 Cases | Recoveries per 100 confirmed cases, gives insight into the recovery trend. |
Deaths / 100 Recovered | Deaths per 100 recovered cases, compares fatal outcomes with recoveries. |
total_vaccinations | Total vaccinations administered, crucial for vaccination trend analysis. |
people_vaccinated | Total individuals vaccinated, key for understanding the vaccination reach. |
people_fully_vaccinated | Total individuals fully vaccinated, important for assessing herd immunity potential. |
daily_vaccinations | Number of vaccinations administered per day, essential for daily trend analysis. |
total_vaccinations_per_hundred | Total vaccinations per 100 people, allows for normalized comparison between regions. |
people_vaccinated_per_hundred | People vaccinated per 100 people, useful for comparing vaccination rates. |
people_fully_vaccinated_per_hundred | People fully vaccinated per 100 people, important for understanding comprehensive vaccination progress. |
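The ratio columns in the table above can be derived from the cumulative counts. The following is a minimal sketch in base R, assuming a small made-up data frame (the values are illustrative and not taken from the Kaggle data) and assuming that active cases are defined as confirmed cases minus deaths and recoveries. The derived column names are hypothetical.

```r
# Illustrative values only; not taken from the Kaggle data.
covid <- data.frame(
  Country   = c("A", "B"),
  Confirmed = c(1000, 250),
  Deaths    = c(20, 5),
  Recovered = c(800, 100)
)

# Assumption: active cases are confirmed cases not yet resolved either way.
covid$Active <- covid$Confirmed - covid$Deaths - covid$Recovered

# Ratios per 100, rounded to two decimals (hypothetical column names).
covid$deaths_per_100_cases     <- round(100 * covid$Deaths / covid$Confirmed, 2)
covid$recovered_per_100_cases  <- round(100 * covid$Recovered / covid$Confirmed, 2)
covid$deaths_per_100_recovered <- round(100 * covid$Deaths / covid$Recovered, 2)

print(covid)
```

Rows with zero confirmed or recovered cases would produce `NaN` or `Inf` in these ratios, so real data would likely need a guard for early dates.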
Questions
Question 1
How has the COVID-19 pandemic trended globally over time, and what correlations can be observed between different pandemic metrics (like cases and recoveries)?
Question 2
What is the relationship between total vaccinations and the reduction in active COVID-19 cases across different counties in the USA?
Analysis plan
For the first question:
Dataset: We will utilize the day_wise_data dataset, which provides daily global statistics on COVID-19.
Focus: Our refined analysis will focus on daily total cases, recoveries, and deaths to observe the pandemic’s evolution over time without losing granularity in the data.
Visualizations:
Time-Series Plots: We’ll employ time-series plots to illustrate the daily evolution of total cases, recoveries, and deaths, with separate line graphs for each metric and a combined graph for direct comparison.
Daily Percentage Changes: To capture the pandemic’s dynamics, we’ll calculate and plot daily percentage changes for each metric. Additionally, we’ll compute and visualize correlation coefficients to explore the relationships between these metrics over time.
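As a rough illustration of the daily percentage change and correlation steps, here is a minimal base R sketch. The data frame `day_wise` and its values are made up for the example, and `pct_change` is a hypothetical helper, not a function from the project.

```r
# Illustrative cumulative counts for six consecutive days (made-up values).
day_wise <- data.frame(
  Date      = as.Date("2020-03-01") + 0:5,
  Confirmed = c(100, 110, 132, 165, 198, 218),
  Recovered = c(10, 12, 15, 21, 30, 36)
)

# Daily percentage change: 100 * (today - yesterday) / yesterday.
# The first day has no predecessor, so it is NA.
pct_change <- function(x) c(NA, 100 * diff(x) / head(x, -1))

day_wise$confirmed_pct <- pct_change(day_wise$Confirmed)
day_wise$recovered_pct <- pct_change(day_wise$Recovered)

# Pearson correlation between the two cumulative series.
r <- cor(day_wise$Confirmed, day_wise$Recovered)

print(day_wise)
print(r)
```

Cumulative series that both grow over time tend to correlate strongly by construction, so correlating the daily changes instead may be more informative. Days where the previous value is zero need special handling, since the percentage change is undefined there.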
Additional Analysis:
Rolling Average Analysis: While our primary focus will be on daily data, we’ll also use rolling averages to identify broader trends, ensuring we capture both immediate changes and more extended patterns.
Anomaly Detection: We plan to identify and analyze any unusual data points or trends, which could indicate significant pandemic events or turning points.
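The rolling average and anomaly detection steps could be sketched as follows in base R. The counts are made up, and the z-score threshold of 2 is an assumption chosen for illustration; the plan above does not specify which anomaly detection method will be used.

```r
# Made-up daily new-case counts with one obvious spike.
new_cases <- c(10, 12, 11, 13, 12, 60, 13, 12, 14, 13)

# 3-day centered rolling mean via stats::filter (NA at both ends).
roll_mean <- as.numeric(stats::filter(new_cases, rep(1 / 3, 3), sides = 2))

# Assumption: flag points whose absolute z-score exceeds 2 as candidates.
z <- (new_cases - mean(new_cases)) / sd(new_cases)
anomalies <- which(abs(z) > 2)

print(roll_mean)
print(anomalies)
```

A global z-score is a blunt tool for a series with a strong trend; comparing each point against its own rolling mean would be one possible refinement.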
Interactive Visualization with Shiny:
To make our analysis more interactive and user-friendly, we will utilize the shiny package in R to develop a web application. This application will allow users to dynamically interact with the data through filters such as year, country name, and regions. Users can select specific parameters to refine the visualizations, providing a customized view that can highlight trends and patterns of interest within the global scope of the COVID-19 pandemic.
This dynamic approach will enable users to explore the data in a more granular and personalized manner, enhancing the analytical value of the visualizations.
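To make the filtering idea concrete, here is a minimal Shiny sketch with a single country filter. It assumes the `shiny` package is installed; the data frame, input ID `country`, and output ID `trend` are hypothetical and do not reflect the project's actual app.R.

```r
library(shiny)

# Tiny made-up dataset standing in for the country-level data.
covid <- data.frame(
  Date      = rep(as.Date("2020-03-01") + 0:2, times = 2),
  Country   = rep(c("A", "B"), each = 3),
  Confirmed = c(1, 3, 6, 2, 2, 5)
)

ui <- fluidPage(
  titlePanel("COVID-19 confirmed cases"),
  selectInput("country", "Country", choices = unique(covid$Country)),
  plotOutput("trend")
)

server <- function(input, output, session) {
  # Reactive subset: recomputed whenever the selected country changes.
  selected <- reactive(covid[covid$Country == input$country, ])

  output$trend <- renderPlot({
    d <- selected()
    plot(d$Date, d$Confirmed, type = "l",
         xlab = "Date", ylab = "Confirmed cases")
  })
}

app <- shinyApp(ui, server)
# In an interactive session, runApp(app) launches the app.
```

Keeping the filter in a `reactive()` means further outputs (tables, additional plots) can reuse the same subset without repeating the filtering logic.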
For the second question:
Dataset Utilization: We will employ the usa_county_wise_data dataset, which is rich in localized COVID-19 data for the United States, encompassing each county’s case and vaccination figures. The vaccination dataset provides details of vaccinations all over the world.
Primary Focus: We aim to analyze and visualize how COVID-19 cases and vaccination rates have evolved over time across U.S. counties and globally, identifying patterns and correlations.
Visualizations
Choropleth Maps: We will craft choropleth maps to visually represent the density or intensity of COVID-19 cases and vaccination rates across U.S. counties. This will enable a straightforward interpretation of regional variances and trends.
Bubble Maps: To enhance our spatial analysis, bubble maps will be constructed where each bubble’s size indicates either the number of cases or vaccination rates in counties, offering a dual perspective on the pandemic’s magnitude and the vaccination effort.
Time-Lapse Visualizations: Implementing time-lapse maps will allow us to observe and analyze the progression of the pandemic and vaccination rollout over time, providing dynamic insights into how the situation has evolved across different regions.
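Before any map is drawn, the county-level rows need to be shaped into per-date frames and, optionally, coarser regional totals. The sketch below shows one way to do that in base R, using made-up rows that mimic the `Admin2`, `Date`, `Case`, and `Province/State` columns shown in the Dataset section. The object names `frames` and `state_totals` are hypothetical, and the mapping step itself is left out.

```r
# Made-up rows mimicking the county-level columns shown in the Dataset section.
county <- data.frame(
  Admin2 = c("Autauga", "Baldwin", "Autauga", "Baldwin", "Apache"),
  Date   = as.Date(c("2020-04-01", "2020-04-01", "2020-04-02",
                     "2020-04-02", "2020-04-02")),
  Case   = c(5, 8, 7, 11, 4),
  `Province/State` = c("Alabama", "Alabama", "Alabama", "Alabama", "Arizona"),
  check.names = FALSE
)

# One time-lapse frame = all counties on a single date.
frames <- split(county, county$Date)

# State totals per date, e.g. for a coarser state-level map.
state_totals <- aggregate(
  county["Case"],
  by  = list(Date = county$Date, State = county[["Province/State"]]),
  FUN = sum
)

print(names(frames))
print(state_totals)
```

Each element of `frames` could then be passed to whatever mapping function is chosen for the choropleth or bubble map, one frame per animation step.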
Plan of Action
Task Name | Status | Assignee | Due | Priority | Summary |
---|---|---|---|---|---|
Proposal Description | Completed | Rajitha Reddy, Omid Zandi | 03 Apr 2024 | Moderate | Concise summary outlining the main idea of a proposal |
Dataset | Completed | Devendran Vemula, Gowtham GopalaKrishnan | 03 Apr 2024 | Moderate | Uploading and loading the dataset |
Questions | Completed | Everyone | 03 Apr 2024 | Moderate | Team consolidates findings to generate comprehensive questions aimed at exploring deeper insights |
Analysis for Question 1 | Completed | Devendran Vemula, G Sai Laasya, Rajitha Reddy | 03 Apr 2024 | Moderate | The team works together to analyze the data, combining their skills and viewpoints to gain a thorough understanding of the information. |
Analysis for Question 2 | Completed | Mrunali Yadav, Omid Zandi, Gowtham GopalaKrishnan | 03 Apr 2024 | Moderate | The team conducts analysis for the project through a collaborative approach, utilizing various analytical methods, tools, and expertise to thoroughly examine the data and derive meaningful insights. |
Timeline and Workflow | Completed | G Sai Laasya, Mrunali Yadav | 03 Apr 2024 | Moderate | The workflow of the project |
Proposal Peer Review | Completed | Everyone | 03 Apr 2024 | Moderate | Reviewing Other Teams |
Working on the Bubble Map | Completed | G Sai Laasya, Rajitha Reddy | 06 Apr 2024 | Moderate | Making a bubble map on the COVID dataset |
Spatial Map | Completed | Omid Zandi | 10 Apr 2024 | Moderate | Creating the spatial maps for the number of cases |
Time Lapse Visualization | Completed | Devendran Vemula | 12 Apr 2024 | Moderate | The progress on the COVID cases or Vaccination |
Working on the Index.qmd file | Incomplete | G Sai Laasya, Rajitha Reddy, Mrunali Yadav | 15 Apr 2024 | High | Writing the analysis, summary part |
Adding aesthetics to ShinyApp | Incomplete | Devendran Vemula, Omid Zandi | 17 Apr 2024 | Moderate | Adding the aesthetics |
Presentation | Incomplete | Devendran Vemula, Gowtham GopalaKrishnan | 20 Apr 2024 | High | Creating the presentation |
Repo Organization
The following folders and files make up the project repository:
- .github/: This folder contains GitHub-related assets such as action workflows and issue templates, which are used to automate tasks and standardize the issue-reporting process.
- _extra/: This folder stores “frozen” environment configuration files, detailing the project’s dependencies and setup specifics, enabling consistent reproduction of the development environment.
- _freeze/: This directory holds frozen environment files with detailed information on the project’s dependencies and environment setup.
- data/: Dedicated to project data storage, this folder contains essential files like input data, datasets, and other critical information that the project relies on.
- images/: This is the repository for all visual content, including charts, diagrams, and screenshots, used throughout the project’s documentation and presentations.
- .gitignore: A configuration file that instructs Git on which files and directories to exclude from version control, ensuring efficient management of the repository.
- README.md: This is the project’s main information document, providing a comprehensive guide to setup, usage, and a general overview of the project’s goals and scope.
- _quarto.yml: This Quarto configuration file controls how Quarto documents are generated and displayed, allowing for customized output and various document settings.
- about.qmd: A Quarto Markdown file offering further project context, detailing the project’s objectives, background information, contributor details, and other related content.
- index.qmd: The main documentation page for the project, created with Quarto. It contains a detailed presentation of the project, including code snippets and visual elements.
- app.R: Because our project includes a Shiny app, this file contains the R code needed to build and run it, including UI elements, server logic, and reactive components.