Tracing the COVID-19 Trajectory

Using Shiny

The project analyzes COVID-19 trends and vaccination impacts globally and in the USA, using visualizations to reveal key patterns and insights.
Authors
Affiliation

GraphGeeks - Devendran, Omid, G Sai Laasya

Rajitha, Gowtham, Mrunali

library(tidyverse)

Introduction

Our project uses a comprehensive COVID-19 dataset from Kaggle to examine the pandemic’s dynamics through data visualization, with the Shiny package providing an interactive user experience. By analyzing case counts, recovery rates, mortality, and vaccination data, we aim to reveal trends and correlations across regions and time periods. Beyond its academic purpose, the project aims to provide actionable insights for policymakers and the public. Through dynamic, interactive Shiny visualizations, we aim to turn complex data into clear narratives that support an informed response to this global health crisis.

Dataset

# Dataset Head
us_confirmed <- read_csv("./data/us_confirmed.csv")
print(head(us_confirmed))
# A tibble: 6 × 5
  Admin2  Date        Case `Country/Region` `Province/State`
  <chr>   <date>     <dbl> <chr>            <chr>           
1 Autauga 2020-01-22     0 US               Alabama         
2 Autauga 2020-01-23     0 US               Alabama         
3 Autauga 2020-01-24     0 US               Alabama         
4 Autauga 2020-01-25     0 US               Alabama         
5 Autauga 2020-01-26     0 US               Alabama         
6 Autauga 2020-01-27     0 US               Alabama         
us_deaths <- read_csv("./data/us_deaths.csv")
print(head(us_deaths))
# A tibble: 6 × 5
  Admin2  Date        Case `Country/Region` `Province/State`
  <chr>   <date>     <dbl> <chr>            <chr>           
1 Autauga 2020-01-22     0 US               Alabama         
2 Autauga 2020-01-23     0 US               Alabama         
3 Autauga 2020-01-24     0 US               Alabama         
4 Autauga 2020-01-25     0 US               Alabama         
5 Autauga 2020-01-26     0 US               Alabama         
6 Autauga 2020-01-27     0 US               Alabama         
populations <- read_csv('./data/populations.csv')
head(populations)
# A tibble: 6 × 2
  Country             Population
  <chr>                    <dbl>
1 Afghanistan           39835428
2 Albania                2872934
3 Algeria               44616626
4 Andorra                  77354
5 Angola                33933611
6 Antigua and Barbuda      98728
country_vaccinations_manufacture <-  read_csv('./data/country_vaccinations_by_manufacturer.csv')
head(country_vaccinations_manufacture)
# A tibble: 6 × 4
  location  date       vaccine            total_vaccinations
  <chr>     <date>     <chr>                           <dbl>
1 Argentina 2020-12-29 Moderna                             2
2 Argentina 2020-12-29 Oxford/AstraZeneca                  3
3 Argentina 2020-12-29 Sinopharm/Beijing                   1
4 Argentina 2020-12-29 Sputnik V                       20481
5 Argentina 2020-12-30 Moderna                             2
6 Argentina 2020-12-30 Oxford/AstraZeneca                  3
countries <- read_csv('./data/countries.csv')
head(countries)
# A tibble: 6 × 5
  Date       Country     Confirmed Recovered Deaths
  <date>     <chr>           <dbl>     <dbl>  <dbl>
1 2020-01-22 Afghanistan         0         0      0
2 2020-01-23 Afghanistan         0         0      0
3 2020-01-24 Afghanistan         0         0      0
4 2020-01-25 Afghanistan         0         0      0
5 2020-01-26 Afghanistan         0         0      0
6 2020-01-27 Afghanistan         0         0      0

Description

  • The datasets for this project come from an extensive Kaggle collection curated by “imdevskp,” which compiles COVID-19 data from multiple global sources. The collection includes several datasets suited to different analytical needs: day-wise global statistics give a temporal view of the pandemic’s progression, country-wise latest data give a snapshot of each country’s current status, and data grouped by country and date support comparisons over time.

  • A separate dataset covers the USA at the county level, allowing a more granular examination within the United States. Together, these datasets cover key metrics such as case counts, deaths, recoveries, and vaccination figures, supporting a comprehensive analysis of the pandemic’s impacts and trends.

  • To support interactive exploration and visualization of this data, we use the shiny package in R, a tool for building interactive web applications. The Shiny app will let users dynamically filter, explore, and visualize the data.

  • The choice of these datasets was motivated by their depth, regular updates, and the potential insights they offer into the global and localized impacts of COVID-19.

COVID-19 Dataset Source Link and Vaccination Dataset Source Link

| Variable | Description |
|---|---|
| Date | Date of the record, crucial for any time-series analysis. |
| Country/Region | Name of the country or region, essential for geographic comparisons. |
| Confirmed | Total confirmed COVID-19 cases, a primary metric for pandemic assessment. |
| Deaths | Total deaths due to COVID-19, important for understanding the fatality rate. |
| Recovered | Total recoveries from COVID-19, indicative of the recovery rate. |
| Active | Total active COVID-19 cases, key for assessing the current status of the pandemic. |
| New cases | New cases on a given date, vital for identifying trends over time. |
| New deaths | New deaths on a given date, important for tracking fatality trends. |
| New recovered | New recoveries on a given date, useful for monitoring recovery trends. |
| Deaths / 100 Cases | Deaths per 100 confirmed cases, which puts the death toll in context. |
| Recovered / 100 Cases | Recoveries per 100 confirmed cases, which gives insight into the recovery trend. |
| Deaths / 100 Recovered | Deaths per 100 recovered cases, which relates fatalities to recoveries. |
| total_vaccinations | Total vaccinations administered, crucial for vaccination trend analysis. |
| people_vaccinated | Total individuals vaccinated, key for understanding the vaccination reach. |
| people_fully_vaccinated | Total individuals fully vaccinated, important for assessing herd immunity potential. |
| daily_vaccinations | Number of vaccinations administered per day, essential for daily trend analysis. |
| total_vaccinations_per_hundred | Total vaccinations per 100 people, which allows normalized comparison between regions. |
| people_vaccinated_per_hundred | People vaccinated per 100 people, useful for comparing vaccination rates. |
| people_fully_vaccinated_per_hundred | People fully vaccinated per 100 people, important for understanding overall vaccination progress. |
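The ratio variables in this list can be derived from the raw totals. The following is a minimal sketch, not the dataset author's own computation: the rows are made-up numbers, the snake_case column names are hypothetical, and the definition of Active as Confirmed minus Deaths minus Recovered is an assumption about how the source computes it.

```r
library(dplyr)

# Made-up rows shaped like the `countries` table (values are illustrative only).
toy <- tibble(
  Country   = c("A", "B"),
  Confirmed = c(200, 1000),
  Recovered = c(100, 800),
  Deaths    = c(10, 20)
)

ratios <- toy %>%
  mutate(
    # Assumed definition: cases that are neither recovered nor fatal.
    Active                   = Confirmed - Deaths - Recovered,
    deaths_per_100_cases     = 100 * Deaths / Confirmed,
    recovered_per_100_cases  = 100 * Recovered / Confirmed,
    deaths_per_100_recovered = 100 * Deaths / Recovered
  )

print(ratios)
```

Rows with zero confirmed or recovered cases would produce `NaN` or `Inf` here, so a real pipeline would probably need to guard against division by zero.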

Questions

Question 1

How has the COVID-19 pandemic trended globally over time, and what correlations can be observed between different pandemic metrics (like cases and recoveries)?

Question 2

What is the relationship between total vaccinations and the reduction in active COVID-19 cases across different counties in the USA?

Analysis plan

For the first question:

Dataset: We will utilize the day_wise_data dataset, which provides daily global statistics on COVID-19.

Focus: Our refined analysis will focus on daily total cases, recoveries, and deaths to observe the pandemic’s evolution over time without losing granularity in the data.

Visualizations:

  • Time-Series Plots: We’ll employ time-series plots to illustrate the daily evolution of total cases, recoveries, and deaths, with separate line graphs for each metric and a combined graph for direct comparison.

  • Daily Percentage Changes: To capture the pandemic’s dynamics, we’ll calculate and plot daily percentage changes for each metric. Additionally, we’ll compute and visualize correlation coefficients to explore the relationships between these metrics over time.
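The daily percentage change and correlation steps described above could be sketched as follows. This is an illustration under stated assumptions: the data frame is made up, the column names `Confirmed` and `Deaths` follow the variable table, and `confirmed_pct_change` and `deaths_pct_change` are hypothetical names.

```r
library(dplyr)

# Made-up cumulative totals standing in for the day-wise global table.
day_wise <- tibble(
  Date      = as.Date("2020-03-01") + 0:4,
  Confirmed = c(100, 110, 132, 165, 198),
  Deaths    = c(2, 2, 3, 4, 5)
)

changes <- day_wise %>%
  arrange(Date) %>%
  mutate(
    # Percentage change relative to the previous day; the first day has no predecessor, so it is NA.
    confirmed_pct_change = 100 * (Confirmed - lag(Confirmed)) / lag(Confirmed),
    deaths_pct_change    = 100 * (Deaths - lag(Deaths)) / lag(Deaths)
  )

# Pearson correlation between the two cumulative series.
cor_confirmed_deaths <- cor(changes$Confirmed, changes$Deaths)

print(changes)
print(cor_confirmed_deaths)
```

Cumulative series tend to correlate strongly simply because both grow over time, so correlating the daily changes instead may give a more informative picture.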

Additional Analysis:

  • Rolling Average Analysis: While our primary focus will be on daily data, we’ll also use rolling averages to identify broader trends, ensuring we capture both immediate changes and more extended patterns.

  • Anomaly Detection: We plan to identify and analyze any unusual data points or trends, which could indicate significant pandemic events or turning points.
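A rolling average and a simple anomaly flag could look like the sketch below. The three-day trailing window and the two-standard-deviation threshold are assumptions chosen for illustration, not choices stated in the plan, and the daily counts are made up.

```r
library(dplyr)

# Made-up daily new-case counts with one obvious spike on day 8.
daily <- tibble(
  Date      = as.Date("2020-04-01") + 0:9,
  new_cases = c(10, 12, 11, 13, 12, 11, 12, 60, 13, 12)
)

window_days <- 3  # assumed window length

smoothed <- daily %>%
  mutate(
    # Trailing mean over the current day and the previous window_days - 1 days.
    # stats::filter is called explicitly because dplyr masks filter().
    rolling_avg = as.numeric(
      stats::filter(new_cases, rep(1 / window_days, window_days), sides = 1)
    ),
    # Flag days more than 2 standard deviations from the overall mean (assumed rule).
    z_score    = (new_cases - mean(new_cases)) / sd(new_cases),
    is_anomaly = abs(z_score) > 2
  )

print(smoothed)
```

The first `window_days - 1` rows of `rolling_avg` are `NA` because a trailing window has no complete history there. A z-score rule is only one option; a real analysis might compare each day against its own rolling average instead.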

Interactive Visualization with Shiny:

  • To make our analysis more interactive and user-friendly, we will utilize the shiny package in R to develop a web application. This application will allow users to dynamically interact with the data through filters such as year, country name, and regions. Users can select specific parameters to refine the visualizations, providing a customized view that can highlight trends and patterns of interest within the global scope of the COVID-19 pandemic.

  • This dynamic approach will enable users to explore the data in a more granular and personalized manner, enhancing the analytical value of the visualizations.
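The filtering behaviour described above can be illustrated with a minimal Shiny sketch. It is not the project's app.R: the data is made up, and the input ID `country`, the output ID `trend`, and the reactive `selected` are hypothetical names.

```r
library(shiny)
library(dplyr)
library(ggplot2)

# Made-up data shaped like the `countries` table.
covid <- tibble(
  Date      = rep(as.Date("2020-03-01") + 0:2, times = 2),
  Country   = rep(c("A", "B"), each = 3),
  Confirmed = c(1, 3, 6, 2, 4, 9)
)

ui <- fluidPage(
  titlePanel("Confirmed cases by country"),
  selectInput("country", "Country", choices = unique(covid$Country)),
  plotOutput("trend")
)

server <- function(input, output, session) {
  # Reactive subset that is recomputed whenever the selected country changes.
  selected <- reactive({
    dplyr::filter(covid, Country == input$country)
  })

  output$trend <- renderPlot({
    ggplot(selected(), aes(Date, Confirmed)) +
      geom_line() +
      labs(x = "Date", y = "Cumulative confirmed cases")
  })
}

# Build the app object; run it interactively with runApp(app).
app <- shinyApp(ui, server)
```

Keeping the filter in a `reactive()` means additional outputs, such as a summary table, could reuse the same subset without repeating the filtering logic.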

For the second question:

Dataset Utilization: We will employ the usa_county_wise_data dataset, which is rich in localized COVID-19 data for the United States, encompassing each county’s case and vaccination figures. The vaccination dataset provides vaccination details for countries around the world.

Primary Focus: We aim to analyze and visualize how COVID-19 cases and vaccination rates have evolved over time across U.S. counties and globally, identifying patterns and correlations.
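One way to line up vaccination totals with case counts before looking for correlations is sketched below. It assumes the column layouts shown in the Dataset section for the manufacturer-level vaccination table and the `countries` table; the rows are made up, and computing Active as Confirmed minus Recovered minus Deaths is an assumption.

```r
library(dplyr)

# Made-up rows shaped like country_vaccinations_by_manufacturer.csv.
vacc_toy <- tibble(
  location           = rep("Argentina", 4),
  date               = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02")),
  vaccine            = c("Moderna", "Sputnik V", "Moderna", "Sputnik V"),
  total_vaccinations = c(2, 100, 4, 150)
)

# Made-up rows shaped like countries.csv.
cases_toy <- tibble(
  Date      = as.Date(c("2021-01-01", "2021-01-02")),
  Country   = "Argentina",
  Confirmed = c(1000, 1100),
  Recovered = c(800, 850),
  Deaths    = c(50, 52)
)

# Sum across manufacturers to get one vaccination total per country and day.
vacc_by_day <- vacc_toy %>%
  group_by(location, date) %>%
  summarise(total_vaccinations = sum(total_vaccinations), .groups = "drop")

# Join on country and date so each row pairs active cases with vaccinations.
combined <- cases_toy %>%
  mutate(Active = Confirmed - Recovered - Deaths) %>%
  inner_join(vacc_by_day, by = c("Country" = "location", "Date" = "date"))

print(combined)
```

An inner join silently drops country names that are spelled differently in the two tables, so checking for unmatched names would be a sensible step with the real data.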

Visualizations

  • Choropleth Maps: We will craft choropleth maps to visually represent the density or intensity of COVID-19 cases and vaccination rates across U.S. counties. This will enable a straightforward interpretation of regional variances and trends.

  • Bubble Maps: To enhance our spatial analysis, bubble maps will be constructed where each bubble’s size indicates either the number of cases or vaccination rates in counties, offering a dual perspective on the pandemic’s magnitude and the vaccination effort.

  • Time-Lapse Visualizations: Implementing time-lapse maps will allow us to observe and analyze the progression of the pandemic and vaccination rollout over time, providing dynamic insights into how the situation has evolved across different regions.
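As a rough illustration of the bubble map idea, the sketch below takes each county's count on the latest date and sizes a point by it. The rows are made up, the `Case` column is assumed to be cumulative, and the `centroids` table with its approximate coordinates is hypothetical, since the `us_confirmed` columns shown earlier do not include latitude or longitude.

```r
library(dplyr)
library(ggplot2)

# Made-up rows using the column names from us_confirmed.
us_toy <- tibble(
  Admin2 = c("Autauga", "Baldwin", "Autauga", "Baldwin", "Apache", "Apache"),
  Date   = as.Date(c("2020-04-01", "2020-04-01", "2020-04-02",
                     "2020-04-02", "2020-04-01", "2020-04-02")),
  Case   = c(5, 7, 8, 9, 3, 4),
  `Province/State` = c(rep("Alabama", 4), rep("Arizona", 2))
)

# Hypothetical county centroids (approximate, for illustration only).
centroids <- tibble(
  Admin2 = c("Autauga", "Baldwin", "Apache"),
  `Province/State` = c("Alabama", "Alabama", "Arizona"),
  lon = c(-86.6, -87.7, -109.5),
  lat = c(32.5, 30.7, 35.4)
)

# Keep the latest date only, assuming Case holds cumulative counts.
latest <- us_toy %>%
  dplyr::filter(Date == max(Date)) %>%
  left_join(centroids, by = c("Admin2", "Province/State"))

# One bubble per county, with area proportional to the case count.
bubble_map <- ggplot(latest, aes(lon, lat, size = Case)) +
  geom_point(alpha = 0.6) +
  scale_size_area(max_size = 15) +
  labs(x = "Longitude", y = "Latitude", size = "Cases")
```

`scale_size_area()` maps the value to bubble area rather than radius, which keeps a count of zero at zero size. A real map would add a base layer of county or state outlines beneath the points.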

Plan of Action

| Task Name | Status | Assignee | Due | Priority | Summary |
|---|---|---|---|---|---|
| Proposal Description | Completed | Rajitha Reddy, Omid Zandi | 03 Apr 2024 | Moderate | Concise summary outlining the main idea of the proposal |
| Dataset | Completed | Devendran Vemula, Gowtham GopalaKrishnan | 03 Apr 2024 | Moderate | Uploading and loading the dataset |
| Questions | Completed | Everyone | 03 Apr 2024 | Moderate | Team consolidates findings to generate comprehensive questions aimed at exploring deeper insights |
| Analysis for Question 1 | Completed | Devendran Vemula, G Sai Laasya, Rajitha Reddy | 03 Apr 2024 | Moderate | The team works together to analyze the data, combining their skills and viewpoints to gain a thorough understanding of the information. |
| Analysis for Question 2 | Completed | Mrunali Yadav, Omid Zandi, Gowtham GopalaKrishnan | 03 Apr 2024 | Moderate | The team conducts the analysis collaboratively, using various analytical methods, tools, and expertise to examine the data and derive meaningful insights. |
| Timeline and Workflow | Completed | G Sai Laasya, Mrunali Yadav | 03 Apr 2024 | Moderate | The workflow of the project |
| Proposal Peer Review | Completed | Everyone | 03 Apr 2024 | Moderate | Reviewing other teams |
| Working on the Bubble Map | Completed | G Sai Laasya, Rajitha Reddy | 06 Apr 2024 | Moderate | Making a bubble map on the COVID dataset |
| Spatial Map | Completed | Omid Zandi | 10 Apr 2024 | Moderate | Creating the spatial maps for the number of cases |
| Time Lapse Visualization | Completed | Devendran Vemula | 12 Apr 2024 | Moderate | The progress of COVID cases or vaccination over time |
| Working on the index.qmd file | Incomplete | G Sai Laasya, Rajitha Reddy, Mrunali Yadav | 15 Apr 2024 | High | Writing the analysis and summary parts |
| Adding aesthetics to ShinyApp | Incomplete | Devendran Vemula, Omid Zandi | 17 Apr 2024 | Moderate | Adding the aesthetics |
| Presentation | Incomplete | Devendran Vemula, Gowtham GopalaKrishnan | 20 Apr 2024 | High | Creating the presentation |

Repo Organization

The following folders and files comprise the project repository:

  • .github/: This folder contains GitHub-related assets such as action workflows and issue templates, which are used to automate tasks and standardize the issue-reporting process.
  • _extra/: This folder stores “frozen” environment configuration files, detailing the project’s dependencies and setup specifics, enabling consistent reproduction of the development environment.
  • _freeze/: This directory holds Quarto’s frozen execution results, that is, the stored output of previously executed code, so that documents can be re-rendered without running the code again.
  • data/: Dedicated to project data storage, this folder contains essential files like input data, datasets, and other critical information that the project relies on.
  • images/: This is the repository for all visual content, including charts, diagrams, and screenshots, used throughout the project’s documentation and presentations.
  • .gitignore: A configuration file that instructs Git on which files and directories to exclude from version control, ensuring efficient management of the repository.
  • README.md: This is the project’s main information document, providing a comprehensive guide to setup, usage, and a general overview of the project’s goals and scope.
  • _quarto.yml: This Quarto configuration file controls how Quarto documents are generated and displayed, allowing for customized output and various document settings.
  • about.qmd: A Quarto Markdown file offering further project context, detailing the project’s objectives, background information, contributor details, and other related content.
  • index.qmd: The main documentation page for the project, created with Quarto. It contains a detailed presentation of the project, including code snippets and visual elements.
  • app.R: Because the project includes a Shiny app, this file contains the R code needed to build and run it, including UI elements, server logic, and reactive components.
