Tracing the COVID-19 Trajectory

Using Shiny

The project analyzes COVID-19 trends and vaccination impacts globally and in the USA, using visualizations to reveal key patterns and insights.

Authors

GraphGeeks - Devendran, Omid, G Sai Laasya, Rajitha, Gowtham, Mrunali

Affiliation

School of Information, University of Arizona

library(tidyverse)

Introduction

Our project uses a COVID-19 dataset from Kaggle to study the pandemic’s dynamics through data visualization, with the Shiny package providing the interactive layer. By analyzing case counts, recovery rates, mortality, and vaccination data, we aim to reveal trends and correlations across regions and time periods. The goal goes beyond a classroom exercise: we want to give policymakers and the public insights they can act on. Through dynamic, interactive visualizations built with Shiny, we aim to turn complex data into clear narratives that support an informed response to this global health crisis.

Dataset

# Load each dataset and preview its first rows
us_confirmed <- read_csv("./data/us_confirmed.csv")
print(head(us_confirmed))

# A tibble: 6 × 5
  Admin2  Date        Case `Country/Region` `Province/State`
  <chr>   <date>     <dbl> <chr>            <chr>           
1 Autauga 2020-01-22     0 US               Alabama         
2 Autauga 2020-01-23     0 US               Alabama         
3 Autauga 2020-01-24     0 US               Alabama         
4 Autauga 2020-01-25     0 US               Alabama         
5 Autauga 2020-01-26     0 US               Alabama         
6 Autauga 2020-01-27     0 US               Alabama

us_deaths <- read_csv("./data/us_deaths.csv")
print(head(us_deaths))

# A tibble: 6 × 5
  Admin2  Date        Case `Country/Region` `Province/State`
  <chr>   <date>     <dbl> <chr>            <chr>           
1 Autauga 2020-01-22     0 US               Alabama         
2 Autauga 2020-01-23     0 US               Alabama         
3 Autauga 2020-01-24     0 US               Alabama         
4 Autauga 2020-01-25     0 US               Alabama         
5 Autauga 2020-01-26     0 US               Alabama         
6 Autauga 2020-01-27     0 US               Alabama

populations <- read_csv("./data/populations.csv")
head(populations)

# A tibble: 6 × 2
  Country             Population
  <chr>                    <dbl>
1 Afghanistan           39835428
2 Albania                2872934
3 Algeria               44616626
4 Andorra                  77354
5 Angola                33933611
6 Antigua and Barbuda      98728

country_vaccinations_manufacture <- read_csv("./data/country_vaccinations_by_manufacturer.csv")
head(country_vaccinations_manufacture)

# A tibble: 6 × 4
  location  date       vaccine            total_vaccinations
  <chr>     <date>     <chr>                           <dbl>
1 Argentina 2020-12-29 Moderna                             2
2 Argentina 2020-12-29 Oxford/AstraZeneca                  3
3 Argentina 2020-12-29 Sinopharm/Beijing                   1
4 Argentina 2020-12-29 Sputnik V                       20481
5 Argentina 2020-12-30 Moderna                             2
6 Argentina 2020-12-30 Oxford/AstraZeneca                  3

countries <- read_csv("./data/countries.csv")
head(countries)

# A tibble: 6 × 5
  Date       Country     Confirmed Recovered Deaths
  <date>     <chr>           <dbl>     <dbl>  <dbl>
1 2020-01-22 Afghanistan         0         0      0
2 2020-01-23 Afghanistan         0         0      0
3 2020-01-24 Afghanistan         0         0      0
4 2020-01-25 Afghanistan         0         0      0
5 2020-01-26 Afghanistan         0         0      0
6 2020-01-27 Afghanistan         0         0      0

Description

The datasets for this project come from a Kaggle collection curated by “imdevskp,” which compiles COVID-19 data from multiple global sources. The collection contains several datasets suited to different analyses: day-wise global statistics give a temporal view of the pandemic’s progression, country-wise latest data give a snapshot of each country’s current status, and data grouped by country and date support comparisons over time.
A separate dataset covers the USA at the county level, allowing a more granular examination within the United States. Together, these datasets cover case counts, deaths, recoveries, and vaccination figures, which is enough to analyze the pandemic’s impacts and trends from several angles.
To support interactive exploration, we use the shiny package in R, a framework for building interactive web applications. The Shiny app will let users dynamically filter, explore, and visualize the data.
We chose these datasets for their depth, their regular updates, and the insight they offer into both the global and the localized impacts of COVID-19.

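COVID-19 dataset source: https://www.kaggle.com/datasets/imdevskp/corona-virus-report?resource=download&select=country_wise_latest.csv
Vaccination dataset source: https://www.kaggle.com/datasets/gpreda/covid-world-vaccination-progress?select=country_vaccinations.csv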

Variable	Description
Date	Date of the record, crucial for any time-series analysis.
Country/Region	Name of the country or region, essential for geographic comparisons.
Confirmed	Total confirmed COVID-19 cases, a primary metric for pandemic assessment.
Deaths	Total deaths due to COVID-19, important for understanding the fatality rate.
Recovered	Total recoveries from COVID-19, indicative of the recovery rate.
Active	Total active COVID-19 cases, key for current pandemic status assessment.
New cases	New cases on a given date, vital for identifying trends over time.
New deaths	New deaths on a given date, important for tracking fatality trends.
New recovered	New recoveries on a given date, useful for monitoring recovery trends.
Deaths / 100 Cases	Deaths per 100 confirmed cases, provides context to the death toll.
Recovered / 100 Cases	Recoveries per 100 confirmed cases, gives insight into the recovery trend.
Deaths / 100 Recovered	Deaths per 100 recoveries, compares fatal outcomes with recovered outcomes.
total_vaccinations	Total vaccinations administered, crucial for vaccination trend analysis.
people_vaccinated	Total individuals vaccinated, key for understanding the vaccination reach.
people_fully_vaccinated	Total individuals fully vaccinated, important for assessing herd immunity potential.
daily_vaccinations	Number of vaccinations administered per day, essential for daily trend analysis.
total_vaccinations_per_hundred	Total vaccinations per 100 people, allows for normalized comparison between regions.
people_vaccinated_per_hundred	People vaccinated per 100 people, useful for comparing vaccination rates.
people_fully_vaccinated_per_hundred	People fully vaccinated per 100 people, important for understanding comprehensive vaccination progress.
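
The ratio columns in the table above can be reproduced from the cumulative counts. The sketch below is a minimal illustration on made-up numbers. It assumes that Active is defined as Confirmed minus Deaths minus Recovered, which the table does not state explicitly, and the data frame name `snapshot` and the new column names are hypothetical.

```r
# Hypothetical cumulative counts for two places (made-up numbers, not from the dataset).
snapshot <- data.frame(
  Country   = c("A", "B"),
  Confirmed = c(1000, 200),
  Deaths    = c(50, 4),
  Recovered = c(800, 100)
)

# Assumption: active cases are confirmed cases that have neither died nor recovered.
snapshot$Active <- snapshot$Confirmed - snapshot$Deaths - snapshot$Recovered

# Ratios expressed per 100, matching the "/ 100" columns in the table.
snapshot$deaths_per_100_cases     <- 100 * snapshot$Deaths / snapshot$Confirmed
snapshot$recovered_per_100_cases  <- 100 * snapshot$Recovered / snapshot$Confirmed
snapshot$deaths_per_100_recovered <- 100 * snapshot$Deaths / snapshot$Recovered

print(snapshot)
```

Rows with zero confirmed cases or zero recoveries would produce NaN or Inf in these ratios, so real data would likely need a guard for the early dates.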

Questions

Question 1

How has the COVID-19 pandemic trended globally over time, and what correlations can be observed between different pandemic metrics (like cases and recoveries)?

Question 2

What is the relationship between total vaccinations and the reduction in active COVID-19 cases across different counties in the USA?

Analysis plan

For the first question:

Dataset: We will utilize the day_wise_data dataset, which provides daily global statistics on COVID-19.

Focus: Our refined analysis will focus on daily total cases, recoveries, and deaths to observe the pandemic’s evolution over time without losing granularity in the data.

Visualizations:

Time-Series Plots: We’ll employ time-series plots to illustrate the daily evolution of total cases, recoveries, and deaths, with separate line graphs for each metric and a combined graph for direct comparison.
Daily Percentage Changes: To capture the pandemic’s dynamics, we’ll calculate and plot daily percentage changes for each metric. Additionally, we’ll compute and visualize correlation coefficients to explore the relationships between these metrics over time.
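
The daily percentage change and correlation steps described above could be sketched as follows. This is a minimal base R illustration on made-up numbers, not the project's final code; the data frame `day_wise`, the helper `pct_change()`, and the new column names are hypothetical.

```r
# Hypothetical daily cumulative totals (made-up numbers for illustration only).
day_wise <- data.frame(
  Date      = as.Date("2020-03-01") + 0:5,
  Confirmed = c(100, 120, 150, 180, 225, 270),
  Recovered = c(10, 12, 18, 27, 36, 45),
  Deaths    = c(2, 2, 3, 4, 5, 6)
)

# Percentage change from the previous day; the first day has no predecessor, so it is NA.
pct_change <- function(x) {
  c(NA, 100 * diff(x) / head(x, -1))
}

day_wise$confirmed_pct <- pct_change(day_wise$Confirmed)
day_wise$recovered_pct <- pct_change(day_wise$Recovered)

# Pearson correlation matrix between the three cumulative series.
metric_cor <- cor(day_wise[, c("Confirmed", "Recovered", "Deaths")])

print(day_wise)
print(round(metric_cor, 3))
```

Because the previous day's value is the denominator, days that follow a zero count yield NaN or Inf and would need to be filtered or handled before plotting. Cumulative series also tend to correlate strongly simply because they all grow over time, so correlating the daily changes instead may be worth considering.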

Additional Analysis:

Rolling Average Analysis: While our primary focus will be on daily data, we’ll also use rolling averages to identify broader trends, ensuring we capture both immediate changes and more extended patterns.
Anomaly Detection: We plan to identify and analyze any unusual data points or trends, which could indicate significant pandemic events or turning points.
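
One possible way to implement the rolling average and a first-pass anomaly check is sketched below. The counts are made up, the vector names are hypothetical, and the two-standard-deviation rule is an illustrative assumption rather than the method the project commits to.

```r
# Hypothetical daily new-case counts (made-up numbers; day 10 is an artificial spike).
new_cases <- c(10, 12, 11, 13, 12, 14, 13, 15, 14, 60, 15, 16, 15, 17)

# Trailing 7-day mean: each value averages the current day and the six days before it.
# stats::filter() with sides = 1 returns NA until a full window is available.
rolling_7 <- as.numeric(stats::filter(new_cases, rep(1 / 7, 7), sides = 1))

# Simple anomaly rule (an assumption, not the project's method): flag days whose
# count lies more than 2 standard deviations from the overall mean.
z_scores  <- (new_cases - mean(new_cases)) / sd(new_cases)
anomalies <- which(abs(z_scores) > 2)

print(round(rolling_7, 2))
print(anomalies)
```

A global z-score is easy to explain but ignores the trend, so on real pandemic data a rule based on deviation from the rolling average may flag turning points more reliably.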

Interactive Visualization with Shiny:

To make our analysis more interactive and user-friendly, we will utilize the shiny package in R to develop a web application. This application will allow users to dynamically interact with the data through filters such as year, country name, and regions. Users can select specific parameters to refine the visualizations, providing a customized view that can highlight trends and patterns of interest within the global scope of the COVID-19 pandemic.
This dynamic approach will enable users to explore the data in a more granular and personalized manner, enhancing the analytical value of the visualizations.
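
A minimal sketch of the kind of filter-driven Shiny app described above is shown here. It uses a small made-up data frame in place of the project data, and the input ID `country`, the output ID `trend`, and the object names are hypothetical choices for illustration.

```r
library(shiny)

# Hypothetical cumulative case counts (made-up numbers; the real app would use
# the project's own data instead).
covid <- data.frame(
  Date      = rep(as.Date("2020-03-01") + 0:2, times = 2),
  Country   = rep(c("A", "B"), each = 3),
  Confirmed = c(1, 3, 6, 2, 2, 5)
)

ui <- fluidPage(
  titlePanel("Confirmed cases by country"),
  selectInput("country", "Country", choices = unique(covid$Country)),
  plotOutput("trend")
)

server <- function(input, output, session) {
  # Reactive subset: recomputed whenever the selected country changes.
  selected <- reactive(covid[covid$Country == input$country, ])

  output$trend <- renderPlot({
    plot(selected()$Date, selected()$Confirmed, type = "l",
         xlab = "Date", ylab = "Confirmed cases")
  })
}

# Build the app object; launch it interactively with runApp(app).
app <- shinyApp(ui, server)
```

Further filters such as year or region would follow the same pattern: one input control in the UI and one extra condition inside the reactive subset.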

For the second question:

Dataset Utilization: We will employ the usa_county_wise_data dataset, which contains localized COVID-19 data for the United States, including each county’s case and vaccination figures. The vaccination dataset provides details of vaccinations worldwide.

Primary Focus: We aim to analyze and visualize how COVID-19 cases and vaccination rates have evolved over time across U.S. counties and globally, identifying patterns and correlations.

Visualizations

Choropleth Maps: We will craft choropleth maps to visually represent the density or intensity of COVID-19 cases and vaccination rates across U.S. counties. This will enable a straightforward interpretation of regional variances and trends.
Bubble Maps: To enhance our spatial analysis, bubble maps will be constructed where each bubble’s size indicates either the number of cases or vaccination rates in counties, offering a dual perspective on the pandemic’s magnitude and the vaccination effort.
Time-Lapse Visualizations: Implementing time-lapse maps will allow us to observe and analyze the progression of the pandemic and vaccination rollout over time, providing dynamic insights into how the situation has evolved across different regions.
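
Before any of these maps can be drawn, the county rows have to be reduced to one value per region. The sketch below shows one way to do that in base R on made-up rows shaped like the county-level case data. The data frame `us_cases`, the simplified column name `State`, and the assumption that `Case` is a cumulative count are all illustrative.

```r
# Hypothetical rows shaped like the county-level case data (made-up numbers).
# Assumption: Case is a cumulative count, so the latest date holds the running total.
us_cases <- data.frame(
  Admin2 = rep(c("Autauga", "Baldwin", "Pima"), each = 2),
  Date   = rep(as.Date(c("2020-04-01", "2020-04-02")), times = 3),
  Case   = c(10, 12, 20, 25, 40, 44),
  State  = rep(c("Alabama", "Alabama", "Arizona"), each = 2)
)

# Keep the most recent date, then sum the counties within each state.
latest       <- us_cases[us_cases$Date == max(us_cases$Date), ]
state_totals <- aggregate(Case ~ State, data = latest, FUN = sum)

print(state_totals)
```

A table like `state_totals` can then be joined to region outlines for a choropleth, or to centroid coordinates for a bubble map; repeating the aggregation per date instead of only the latest one gives the frames for a time-lapse.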

Plan of Action

Task Name	Status	Assignee	Due	Priority	Summary
Proposal Description	Completed	Rajitha Reddy, Omid Zandi	03 Apr 2024	Moderate	Concise summary outlining the main idea of a proposal
Dataset	Completed	Devendran Vemula, Gowtham GopalaKrishnan	03 Apr 2024	Moderate	Uploading and loading the dataset
Questions	Completed	Everyone	03 Apr 2024	Moderate	Team consolidates findings to generate comprehensive questions aimed at exploring deeper insights
Analysis for Question 1	Completed	Devendran Vemula, G Sai Laasya, Rajitha Reddy	03 Apr 2024	Moderate	The team works together to analyze the data, combining their skills and viewpoints to gain a thorough understanding of the information.
Analysis for Question 2	Completed	Mrunali Yadav, Omid Zandi, Gowtham GopalaKrishnan	03 Apr 2024	Moderate	The team conducts analysis for the project through a collaborative approach, utilizing various analytical methods, tools, and expertise to thoroughly examine the data and derive meaningful insights.
Timeline and Workflow	Completed	G Sai Laasya, Mrunali Yadav	03 Apr 2024	Moderate	The workflow of the project
Proposal Peer Review	Completed	Everyone	03 Apr 2024	Moderate	Reviewing Other Teams
Working on the Bubble Map	Completed	G Sai Laasya, Rajitha Reddy	06 Apr 2024	Moderate	Making a bubble map on the COVID dataset
Spatial Map	Completed	Omid Zandi	10 Apr 2024	Moderate	Creating the spatial maps for the number of cases
Time Lapse Visualization	Completed	Devendran Vemula	12 Apr 2024	Moderate	The progress on the COVID cases or Vaccination
Working on the index.qmd file	Incomplete	G Sai Laasya, Rajitha Reddy, Mrunali Yadav	15 Apr 2024	High	Writing the analysis and summary sections
Adding aesthetics to the Shiny app	Incomplete	Devendran Vemula, Omid Zandi	17 Apr 2024	Moderate	Adding the aesthetics
Presentation	Incomplete	Devendran Vemula, Gowtham GopalaKrishnan	20 Apr 2024	High	Creating the presentation

Repo Organization

The project repository contains the following folders and files:

.github/: This folder contains GitHub-related assets such as action workflows and issue templates, which are used to automate tasks and standardize the issue-reporting process.
_extra/: This folder stores “frozen” environment configuration files, detailing the project’s dependencies and setup specifics, enabling consistent reproduction of the development environment.
_freeze/: Quarto’s freeze directory, which stores the cached results of executed code so that documents can be re-rendered without re-running every computation.
data/: Dedicated to project data storage, this folder contains essential files like input data, datasets, and other critical information that the project relies on.
images/: This folder holds all visual content, including charts, diagrams, and screenshots, used throughout the project’s documentation and presentations.
.gitignore: A configuration file that instructs Git on which files and directories to exclude from version control, ensuring efficient management of the repository.
README.md: This is the project’s main information document, providing a comprehensive guide to setup, usage, and a general overview of the project’s goals and scope.
_quarto.yml: This Quarto configuration file controls how Quarto documents are generated and displayed, allowing for customized output and various document settings.
about.qmd: A Quarto Markdown file offering further project context, detailing the project’s objectives, background information, contributor details, and other related content.
index.qmd: The main documentation page for the project, created with Quarto. It contains a detailed presentation of the project, including code snippets and visual elements.
app.R: Contains the R code needed to build and run the Shiny app, including UI elements, server logic, and reactive components.
