Cancer continues to be a significant challenge for global health, impacting mortality rates and quality of life across the world. In 2022 alone, an estimated 9.7 million cancer-related deaths occurred, accompanied by 20 million new cases worldwide. Despite remarkable strides in understanding and managing cancer, the relentless rise in global mortality rates persists unabated. Our project endeavors to tap into the extensive datasets available on cancer from https://ourworldindata.org/cancer, curated by Max Roser and Hannah Ritchie. By integrating diverse datasets, we seek to unravel the intricate patterns underlying cancer distribution and its multifaceted impacts on diverse populations.
Reasons for Choosing the Dataset
The datasets sourced from https://ourworldindata.org/cancer offer comprehensive and reliable information on various aspects of cancer, including mortality rates, types, demographics, and geographic variations.
Analyzing these datasets can provide valuable insights into the prevalence, distribution, and risk factors associated with different types of cancer.
The availability of multiple datasets allows for a holistic analysis of cancer-related trends and patterns, contributing to a better understanding of the disease’s global burden.
Datasets:
The project utilizes the following CSV files obtained from the provided source, each containing information related to cancer statistics. Below is an overview of the data files:
01_annual-number-of-deaths-by-cause: This file contains information on the annual number of deaths attributed to various causes, including cancer and encompasses data from 6,840 records across 35 different health conditions, including neoplasms. It likely includes data on the total number of deaths per year due to cancer, as well as deaths attributed to specific cancer types.
02_total-cancer-deaths-by-type: With 6841 records, This file provides details on the total number of cancer-related deaths categorized by cancer type. It may include information such as the number of deaths caused by lung cancer, breast cancer, prostate cancer, etc.
04_cancer-death-rates-by-age: Datasets, with 5,472 records, This dataset presents cancer death rates categorized by age groups. It offers insights into how cancer mortality rates vary across different age demographics.
05_share-of-population-with-cancer-crude: This file contains data on the crude prevalence of cancer within the population. It likely includes the percentage or proportion of the population diagnosed with cancer, without adjusting for age or other factors.
06_share-of-population-with-cancer-types: This dataset provides information on the prevalence of different cancer types within the population. It may include the percentage of individuals diagnosed with specific cancer types such as lung cancer, colorectal cancer, etc.
07_number-of-people-with-cancer-by-age: This file contains data on the number of individuals diagnosed with cancer categorized by age groups. It offers insights into the distribution of cancer cases across different age demographics.
11_cancer-deaths-rate-and-age-standardized-rate-index: This dataset likely includes information on cancer death rates and age-standardized rate indexes. It may provide insights into trends in cancer mortality rates over time and across different regions, adjusted for age variations in the population.
Chosen for their depth and breadth, these datasets collectively provide a detailed picture of cancer’s impact globally. They serve as a valuable resource for public health analysis, informing policy-making, and guiding research into effective cancer prevention and treatment strategies. The data likely originates from reputable health data collection entities, including national health ministries and global health organizations, reflecting a standardized and authoritative approach to health data compilation across countries and years.
What is the contribution of cancer in global mortality?
Question 2
which age groups are most affected by cancer
Question 3
which types of cancers are most prevalent worldwide.
Analysis plan
Question 1
we will sumarize the total number of deaths from other causes and the total number of deaths from cancers over years and Calculate the percentage contribution of cancer-related deaths to total deaths. We will use ggplot2 and plotly to create interactive plot to users where they can select years and countries
Question 2
We will combine the data over years to find the total or average death rate for each age group. Compare these rates across age groups to identify which ones have higher rates of cancer-related deaths. We will use bar charts to show cancer deaths by age groups and we will use ggplot2 and plotly to create interactive plot to users where they can select type of cancer they can select and years
Question 3
we will summarize the number of cases per 100 people for each cancer type. Rank these cancer types based on prevalence to identify the top ones and we will use horizontal bar chart to show different types of cancers and we will use ggplot and plotly to create interactive plot to the users where they can select different years and also types of cancer to compare
Timeline
Task
Deadline
Ownership
Status
Submit project name
20-Mar-2024
All
Complete
Choose project 2 dataset
27-Mar-2024
All
Complete
Proposal section -Introduction
02-Apr-2024
Monica
Complete
Proposal section -Why this data
02-Apr-2024
Rohit V K
Complete
Proposal section -Analysis Goal
02-Apr-2024
SRK
Complete
Proposal section - Data Strucure
02-Apr-2024
Ajay
Complete
Proposal section - Data Wrangling
02-Apr-2024
Surya
Complete
Proposal section - Project schedule / Time line
02-Apr-2024
Miki
Complete
Submit proposal draft for peer review
03-Apr-2024
All
Implement Peer review comments
06-Apr-2024
All
Resubmit Project Proposal for Final Review by Professor
---title: "Global Trends in Cancer Mortality: A Shiny App Exploration (1990-2019)"subtitle: "Proposal"author: - name: "VizMasters - Siva Rohit, Rohit Vatsava, Surya Vardhan, Monica Kommareddy, Miki, Ajay" affiliations: - name: "School of Information, University of Arizona"description: "Project description"format: html: code-tools: true code-overflow: wrap code-line-numbers: true embed-resources: trueeditor: visualcode-annotations: hoverexecute: warning: false---```{r, echo = FALSE}#| label: load-pkgs#| message: falselibrary(tidyverse)library(janitor)library(kableExtra)```# Project OverviewCancer continues to be a significant challenge for global health, impacting mortality rates and quality of life across the world. In 2022 alone, an estimated 9.7 million cancer-related deaths occurred, accompanied by 20 million new cases worldwide. Despite remarkable strides in understanding and managing cancer, the relentless rise in global mortality rates persists unabated. Our project endeavors to tap into the extensive datasets available on cancer from <https://ourworldindata.org/cancer>, curated by Max Roser and Hannah Ritchie. By integrating diverse datasets, we seek to unravel the intricate patterns underlying cancer distribution and its multifaceted impacts on diverse populations.# Reasons for Choosing the Dataset- The datasets sourced from <https://ourworldindata.org/cancer> offer comprehensive and reliable information on various aspects of cancer, including mortality rates, types, demographics, and geographic variations.- Analyzing these datasets can provide valuable insights into the prevalence, distribution, and risk factors associated with different types of cancer.- The availability of multiple datasets allows for a holistic analysis of cancer-related trends and patterns, contributing to a better understanding of the disease's global burden.# Datasets:The project utilizes the following CSV files obtained from the provided source, each containing information related to cancer statistics. Below is an overview of the data files:**01_annual-number-of-deaths-by-cause:** This file contains information on the annual number of deaths attributed to various causes, including cancer and encompasses data from 6,840 records across 35 different health conditions, including neoplasms. It likely includes data on the total number of deaths per year due to cancer, as well as deaths attributed to specific cancer types.**02_total-cancer-deaths-by-type:** With 6841 records, This file provides details on the total number of cancer-related deaths categorized by cancer type. It may include information such as the number of deaths caused by lung cancer, breast cancer, prostate cancer, etc.**04_cancer-death-rates-by-age:** Datasets, with 5,472 records, This dataset presents cancer death rates categorized by age groups. It offers insights into how cancer mortality rates vary across different age demographics.**05_share-of-population-with-cancer-crude**: This file contains data on the crude prevalence of cancer within the population. It likely includes the percentage or proportion of the population diagnosed with cancer, without adjusting for age or other factors.**06_share-of-population-with-cancer-types:** This dataset provides information on the prevalence of different cancer types within the population. It may include the percentage of individuals diagnosed with specific cancer types such as lung cancer, colorectal cancer, etc.**07_number-of-people-with-cancer-by-age:** This file contains data on the number of individuals diagnosed with cancer categorized by age groups. It offers insights into the distribution of cancer cases across different age demographics.**11_cancer-deaths-rate-and-age-standardized-rate-index:** This dataset likely includes information on cancer death rates and age-standardized rate indexes. It may provide insights into trends in cancer mortality rates over time and across different regions, adjusted for age variations in the population.```{r}#| label: load-dataset#| message: false```Chosen for their depth and breadth, these datasets collectively provide a detailed picture of cancer's impact globally. They serve as a valuable resource for public health analysis, informing policy-making, and guiding research into effective cancer prevention and treatment strategies. The data likely originates from reputable health data collection entities, including national health ministries and global health organizations, reflecting a standardized and authoritative approach to health data compilation across countries and years.# Data Wrangling and Displaying the Data```{r, echo = FALSE}#| label: Data wrangling#| message: false# Loading the datasetsannual_deaths_by_cause <- read_csv("./data/01_annual-number-of-deaths-by-cause.csv")cancer_deaths_by_type <- read_csv("./data/02_total-cancer-deaths-by-type.csv")cancer_deaths_by_age <- read_csv("./data/04_cancer-death-rates-by-age.csv")population_with_cancer_crude <- read_csv("./data/05_share-of-population-with-cancer-crude.csv")population_with_cancer_types <- read_csv("./data/06_share-of-population-with-cancer-types.csv")population_with_cancer_by_age <- read_csv("./data/07_number-of-people-with-cancer-by-age.csv")standardized_death_age_rate_index <- read_csv("./data/11_cancer-deaths-rate-and-age-standardized-rate-index.csv")# Cleaning them using janitor packageannual_deaths_by_cause <- annual_deaths_by_cause |> clean_names()cancer_deaths_by_type <- cancer_deaths_by_type |> clean_names()cancer_deaths_by_age <- cancer_deaths_by_age |> clean_names()population_with_cancer_crude <- population_with_cancer_crude |> clean_names()population_with_cancer_types <- population_with_cancer_types |> clean_names()population_with_cancer_by_age <- population_with_cancer_by_age |> clean_names()standardized_death_age_rate_index <- standardized_death_age_rate_index |> clean_names()# Displaying the datasetsglimpse(annual_deaths_by_cause)glimpse(cancer_deaths_by_type)glimpse(cancer_deaths_by_age)glimpse(population_with_cancer_crude)glimpse(population_with_cancer_types)glimpse(population_with_cancer_by_age)glimpse(standardized_death_age_rate_index)```# Questions to Answer### Question 1What is the contribution of cancer in global mortality?### Question 2which age groups are most affected by cancer### Question 3which types of cancers are most prevalent worldwide.# Analysis plan### Question 1we will sumarize the total number of deaths from other causes and the total number of deaths from cancers over years and Calculate the percentage contribution of cancer-related deaths to total deaths. We will use `ggplot2` and `plotly` to create interactive plot to users where they can select years and countries### Question 2We will combine the data over years to find the total or average death rate for each age group. Compare these rates across age groups to identify which ones have higher rates of cancer-related deaths. We will use bar charts to show cancer deaths by age groups and we will use `ggplot2` and `plotly` to create interactive plot to users where they can select type of cancer they can select and years### Question 3we will summarize the number of cases per 100 people for each cancer type. Rank these cancer types based on prevalence to identify the top ones and we will use horizontal bar chart to show different types of cancers and we will use `ggplot` and `plotly` to create interactive plot to the users where they can select different years and also types of cancer to compare# Timeline```{r}#| label: timeline-table#| echo: false#| message: false# Create vectors to represent the taskstasks <-c("Submit project name","Choose project 2 dataset","Proposal section -Introduction","Proposal section -Why this data","Proposal section -Analysis Goal","Proposal section - Data Strucure","Proposal section - Data Wrangling","Proposal section - Project schedule / Time line","Submit proposal draft for peer review ","Implement Peer review comments ","Resubmit Project Proposal for Final Review by Professor","Update About section of Project website","Incorporate Feedback from Professor","Publish project 2 website","Code - Data cleasing","Code - Data Wrangling","Code - Plot creation ","Code - Interpretation","Internal project review","Peer Code review","First draft of presentation","Final draft of presentation","Final submission")# Create vector to represent the deadlineturn_in <-c("20-Mar-2024","27-Mar-2024","02-Apr-2024","02-Apr-2024","02-Apr-2024","02-Apr-2024","02-Apr-2024","02-Apr-2024","03-Apr-2024","06-Apr-2024","08-Apr-2024","14-Apr-2024","14-Apr-2024","15-Apr-2024","23-Apr-2024","23-Apr-2024","23-Apr-2024","23-Apr-2024","26-Apr-2024","29-Apr-2024","03-May-2024","06-May-2024","06-May-2024" )# Create vector to represent ownershipmember <-c("All","All","Monica","Rohit V K","SRK","Ajay","Surya","Miki","All","All","All","All","All","SRK","Surya & Ajay","Rohit V K & Monica","Miki & SRK","All","All","Peer Team","All","All","All")# Create vector to represent statusstatus <-c("Complete","Complete","Complete","Complete","Complete","Complete","Complete","Complete","","","","","","","","","","","","","","","")# Bind the columnstable_3 <-as.table(cbind(tasks, turn_in, member, status))colnames(table_3) <-c("Task", "Deadline", "Ownership", "Status")row.names(table_3) <-seq(1:nrow(table_3))# Generate a nice table with kabletimeline_table <-kable(table_3, "html") %>%kable_styling(full_width =FALSE)# Display the tabletimeline_table```<!-- # Implications and Recommendations --><!-- # Conclusion --><!-- ## Additional Points : --># Citations### Source of Data :Global cancer burden growing, amidst mounting need for services. (2024, February 1). "<https://ourworldindata.org/cancer>"