Meteoric Fall: a comet-ment to data

Proposal

Data visualization

Developing a Shiny app to explore and analyze meteorite landings data, unraveling patterns and insights into meteor impact distribution and historical events

Author

Viz Wizards: Nick Ferrante, Jeremiah Gaiser, Tanya Evita George,
Mrunal Jadhav, Jasdeep Singh Jhajj, Gillian McGinnis, Agastya Deshraju

Affiliation

School of Information, University of Arizona

Project Goal

The goal of the project is to build a Shiny app on the Meteorite Landings dataset: an interactive platform that lets users explore and analyze meteorite landings data through dynamic visualizations.

Introduction

The proposed project aims to create an interactive Shiny app for exploring and analyzing meteorite landings data. The platform will let users delve into the data through dynamic visualizations, improving their understanding of meteorite landings on Earth. The project seeks to answer key questions about the distribution of meteor impacts across the Earth, the continents with the greatest accumulated mass of meteors, and the relationship between historical events and the observations and discoveries of meteors. By analyzing this data, we hope to explain the reasons behind certain historical events and predict the places most vulnerable to future meteorite crashes. Such insights could contribute to scientific research in fields such as astronomy, data analysis, and geology.

Dataset

Load the Meteorite Landings data

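# Load the packages used below (tibble() comes with the tidyverse)
library(tidyverse)

# Loading the data
meteorite_data <- read.csv("data/Meteorite_Landings.csv")

# Build a metadata table: column name, R class, and first non-missing value
metadata_meteorite <- tibble(
  Column = names(meteorite_data),
  DataType = sapply(meteorite_data, class),
  SampleData = sapply(meteorite_data, function(column) {
    first_non_na <- column[!is.na(column)][1]
    if (is.numeric(first_non_na)) {
      # nsmall = 2 pads doubles to at least two decimals; integers are unaffected
      format(first_non_na, nsmall = 2)
    } else {
      as.character(first_non_na)
    }
  })
)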

Dataset Description and Motivation

The data set Meteorite_Landings.csv, used for this project, is sourced from NASA’s Open Data Portal (https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh/about_data). It contains information on all known meteorite landings on Earth. The data set has 10 variables across 45716 observations; as read into R, 5 variables are numeric (integer or double) and 5 are character. We chose this data set for the opportunity to explore scientifically the factors involved in how and why meteorites land on specific parts of the Earth. Using it, the project could explain the reasons behind certain historical events and predict the places most vulnerable to meteorite crashes, which could support scientific study and research in fields such as astronomy, data analysis, and geology.

Metadata

The data includes 10 variables: 5 numeric and 5 character. The metadata table below lists each variable with its data type and a sample value. A description of what each variable represents follows the table.

Column	DataType	SampleData
name	character	Aachen
id	integer	1
nametype	character	Valid
recclass	character	L5
mass..g.	numeric	21.00
fall	character	Fell
year	integer	1880
reclat	numeric	50.775
reclong	numeric	6.08333
GeoLocation	character	(50.775, 6.08333)

Description of the metadata

name: The name of the meteorite (usually the location of the meteorite landing).
id: The unique identifier number assigned to the meteorite.
nametype: One of two categories:
- Valid: A regular meteorite.
- Relict: A meteorite that has been degraded over the years due to weather.
recclass: The recommended class of the meteorite, assigned from characteristics such as its chemical, isotopic, and mineralogical properties.
mass (g): The mass of the meteorite, given in grams (read into R as mass..g.).
fall: One of two categories:
- Fell: The fall of the meteorite was observed.
- Found: The fall of the meteorite was not observed, but the meteorite was found later.
year: The year the meteorite fell or was found.
reclat: The latitude of the meteorite’s landing.
reclong: The longitude of the meteorite’s landing.
GeoLocation: Combination of the latitude and longitude of the meteorite’s landing.

Questions

Question 1- What does the distribution of meteor impacts look like across the earth?

Which continents have accumulated the most total mass of meteors?

Question 2- How do historical events relate to the observations and discoveries of meteors?

Question 3- Are observed trends in the frequency, type, or location of meteors related to known celestial events?

Data Cleaning

Variables	Data Type	Number of missing values	Percentage of missing values	Number of unique values	Rate of unique values
name	character	0	0.0000000	45716	1.0000000
id	integer	0	0.0000000	45716	1.0000000
nametype	character	0	0.0000000	2	0.0000437
recclass	character	0	0.0000000	466	0.0101934
mass..g.	numeric	131	0.2865518	12577	0.2751116
fall	character	0	0.0000000	2	0.0000437
year	integer	291	0.6365386	266	0.0058185
reclat	numeric	7315	16.0009625	12739	0.2786552
reclong	numeric	7315	16.0009625	14641	0.3202599
GeoLocation	character	0	0.0000000	17101	0.3740703

The above table shows the number of missing values from each column, as well as the number of unique values present in the dataset.
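
As a rough cross-check, the same kind of per-column summary can be sketched in base R. The source computes the table with dlookr::diagnose; the helper below, summarise_columns, and the toy data frame are hypothetical stand-ins for illustration, and the results may differ from dlookr in edge cases (for example, this version counts NA as one of the unique values).

```r
# Toy stand-in for meteorite_data; the values are illustrative only.
toy <- data.frame(
  name     = c("Aachen", "Aarhus", "Abee", "Acapulco"),
  mass..g. = c(21, 720, NA, 1914),
  year     = c(1880L, 1951L, 1952L, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per column with its type,
# missing count and percentage, and unique count and rate.
summarise_columns <- function(df) {
  data.frame(
    variable      = names(df),
    type          = vapply(df, function(x) class(x)[1], character(1)),
    missing_count = vapply(df, function(x) sum(is.na(x)), integer(1)),
    missing_pct   = vapply(df, function(x) 100 * mean(is.na(x)), numeric(1)),
    unique_count  = vapply(df, function(x) length(unique(x)), integer(1)),
    unique_rate   = vapply(df, function(x) length(unique(x)) / length(x), numeric(1)),
    row.names     = NULL
  )
}

column_summary <- summarise_columns(toy)
print(column_summary)
```

Running the helper on the full data frame instead of the toy one would give a table with the same columns as the one above.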

The first cleaning step is to filter the year column: years before 860 CE or after 2016 CE are incorrect and can be discarded.
Then, latitude values in the reclat column outside -90° to 90° and longitude values in the reclong column outside -180° to 180° can be filtered out.
All reports at (0°N, 0°E) can be treated as ‘NULL’ values, since these reports either had no exact location when they were sighted or were found in areas such as Antarctica, where recording exact locations presents some challenges.
All ‘NULL’ values will be ignored for Question 1, which plots the location of each report on an interactive map.
These values could, however, prove valuable for Questions 2 & 3 and will therefore be left in the dataset for them.
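
The cleaning steps above could be sketched with dplyr roughly as follows. This is a minimal sketch, not the project's final code: clean_meteorites and the toy rows are hypothetical, rows with a missing year are kept (an assumption), and latitude is limited to ±90° because that is the valid range for latitude.

```r
library(dplyr)

# Toy rows standing in for meteorite_data; values are illustrative only.
toy <- tibble(
  name    = c("Aachen", "B", "C", "D", "E"),
  year    = c(1880L, 2101L, 1990L, 1999L, NA),
  reclat  = c(50.775, 10, 0, -72, -84),
  reclong = c(6.08333, 20, 0, 354.47, 168)
)

# Hypothetical helper applying the steps in order.
clean_meteorites <- function(df) {
  df |>
    # Discard years outside 860-2016; rows with no year are kept (assumption)
    filter(is.na(year) | (year >= 860 & year <= 2016)) |>
    # Discard out-of-range coordinates, keeping rows with missing coordinates
    filter(is.na(reclat) | abs(reclat) <= 90) |>
    filter(is.na(reclong) | abs(reclong) <= 180) |>
    # Treat reports at exactly (0, 0) as missing locations
    mutate(
      at_origin = !is.na(reclat) & !is.na(reclong) & reclat == 0 & reclong == 0,
      reclat    = if_else(at_origin, NA_real_, reclat),
      reclong   = if_else(at_origin, NA_real_, reclong)
    ) |>
    select(-at_origin)
}

cleaned <- clean_meteorites(toy)

# Question 1 only uses rows that still have a location
map_ready <- cleaned |> filter(!is.na(reclat), !is.na(reclong))
```

Keeping cleaned and map_ready separate mirrors the plan: rows without a usable location are dropped only for the map, and remain available for Questions 2 & 3.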

Analysis plan

Approach for Question 1

To display the distribution of meteor impacts on Earth, we will implement an interactive leaflet map that lets the user navigate specific regions of the world and explore the meteors recorded there. The reclat and reclong variables will supply the latitude and longitude of each plotted location. Users will be able to inspect individual observations, including the year the meteor fell or was found, its mass, and potentially the classification of its characteristics. These interactive details will draw on the fall, year, mass..g., and recclass variables.
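
A minimal sketch of such a map, using the leaflet package, might look like the following. The two toy rows, the popup layout, and the marker styling are illustrative assumptions rather than the final design.

```r
library(leaflet)

# Toy rows standing in for the cleaned data; values are illustrative only.
toy <- data.frame(
  name     = c("Aachen", "Aarhus"),
  recclass = c("L5", "H6"),
  mass..g. = c(21, 720),
  fall     = c("Fell", "Fell"),
  year     = c(1880L, 1951L),
  reclat   = c(50.775, 56.18333),
  reclong  = c(6.08333, 10.23333)
)

# One HTML popup string per meteorite: name, fall type and year, mass, class
popup_text <- sprintf(
  "<b>%s</b><br/>%s in %d<br/>Mass: %s g<br/>Class: %s",
  toy$name, toy$fall, toy$year,
  format(toy$mass..g., big.mark = ",", trim = TRUE),
  toy$recclass
)

# Base map tiles plus one clickable circle marker per meteorite
meteorite_map <- leaflet(toy) |>
  addTiles() |>
  addCircleMarkers(lng = ~reclong, lat = ~reclat, radius = 4, popup = popup_text)
```

Printing meteorite_map in an interactive session, or returning it from a Shiny render function, would display the widget.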

To answer the question of which continents have accumulated the most total mass of meteors, we will sum the mass..g. variable and count the meteor observations in each continent, then create a density map displaying these totals for each continent. Because the dataset has no continent variable, each landing will first need to be assigned to a continent from its coordinates.
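
Once each landing carries a continent label, the aggregation itself is short. The sketch below assumes a hypothetical continent column already exists (the dataset does not provide one, so it would have to be derived from reclat and reclong), and the toy values are illustrative only.

```r
library(dplyr)

# Toy data with a hypothetical, pre-computed continent column.
toy <- tibble(
  continent = c("Europe", "Europe", "Africa", "Antarctica", "Africa"),
  mass..g.  = c(21, 720, 1000, NA, 250)
)

# Count and total mass per continent, heaviest first.
# Missing masses are skipped rather than treated as zero.
mass_by_continent <- toy |>
  group_by(continent) |>
  summarise(
    n_meteorites  = n(),
    total_mass_kg = sum(mass..g., na.rm = TRUE) / 1000,
    .groups = "drop"
  ) |>
  arrange(desc(total_mass_kg))
```

The resulting table holds one row per continent and could be joined to continent polygons for the density map.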

Approach for Question 2

To answer the question, “How do historical events relate to the observations and discoveries of meteors?”, we will create an animated line graph of the number of meteor observations per year, annotated with major historical events. The animation will show meteor observations progressing over time, with each relevant historical event appearing as it is reached on the timeline (x axis). This will let us see whether any historical events sparked increased activity in the study of meteors, such as an increase in meteor findings after the space race. The main inputs will be the year variable, a meteor frequency variable created through data wrangling, and an external reference for historical events.
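
The data wrangling and annotation for this plot could be sketched as below. The toy years and the single entry in events are placeholders, not the project's chosen event list, and the plot is static; one option for the animation (an assumption, not shown here) would be adding gganimate::transition_reveal(year).

```r
library(dplyr)
library(ggplot2)

# Toy years standing in for meteorite_data$year; values are illustrative only.
toy <- tibble(year = c(1955L, 1955L, 1957L, 1958L, 1958L, 1958L, NA))

# Meteor frequency variable: number of records per year
yearly_counts <- toy |>
  filter(!is.na(year)) |>
  count(year, name = "n_meteorites")

# Placeholder for the external historical-events reference
events <- tibble(year = 1957L, label = "Sputnik 1 launch")

# Line of yearly counts with a dashed marker and label per event
timeline_plot <- ggplot(yearly_counts, aes(x = year, y = n_meteorites)) +
  geom_line() +
  geom_vline(data = events, aes(xintercept = year), linetype = "dashed") +
  geom_text(
    data = events,
    aes(x = year, y = Inf, label = label),
    inherit.aes = FALSE, vjust = 1.5, hjust = -0.05
  ) +
  labs(x = "Year", y = "Meteorites recorded")
```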

Approach for Question 3

Meteors are stray particles and debris ejected from neighboring celestial bodies within our solar system. We anticipate that patterns in meteor falls will align with Earth’s proximity to astronomical objects like comets, planets, and the moon.

Our goal is to pinpoint statistically significant patterns in meteor data and link these to specific celestial occurrences. For instance, the annual Lyrids meteor shower is the consequence of Earth’s passing through the debris path of Comet Thatcher. As a result, we observe more frequent meteor impacts on the surface of the moon.

To identify such events, we will analyze meteor data over time to spot any activity clusters that significantly vary from average observations. Next, we will consult external data sources, such as NASA.gov, to check if these periods of heightened activity coincide with known astronomical events, such as annual meteor showers or close encounters with other planets.
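
One simple way to spot periods that vary markedly from average is a z-score on the yearly counts. This is only a sketch of one possible method, not a committed analysis: flag_unusual_years, the threshold of 2 standard deviations, and the toy counts are all assumptions made for illustration.

```r
# Hypothetical helper: flag years whose count lies more than `threshold`
# standard deviations from the mean yearly count.
flag_unusual_years <- function(counts, threshold = 2) {
  z <- (counts$n - mean(counts$n)) / sd(counts$n)
  counts$z_score <- z
  counts$unusual <- abs(z) > threshold
  counts
}

# Toy yearly counts with one obvious spike; values are illustrative only.
toy_counts <- data.frame(
  year = 2000:2009,
  n    = c(10, 12, 9, 11, 10, 60, 11, 9, 10, 12)
)

flagged <- flag_unusual_years(toy_counts)

# Years to look up against external records of celestial events
flagged$year[flagged$unusual]
```

Flagged years would then be the candidates to compare with external sources. Because the dataset records only the year of each landing, this check cannot by itself separate events that recur within every year, such as annual meteor showers.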

We plan to incorporate these findings into our meteor impact visualizations. This could involve color-coding individual meteors to show which celestial event triggered them, or adding a basic map of the solar system to show Earth’s position relative to these events.

By embedding this information into our visualizations, we aim to provide a deeper understanding of Earth’s role in the solar system and illustrate how meteor falls are interconnected with the dynamic cycles of space.

Timeline - Weekly Plan of Attack

Task Name	Status	Assignee	Due	Priority	Summary
Proposal
Dataset exploration	Done	Everyone	04/03/24	High	Explore Ideas along with relevant Dataset
Introduction & Goals description	Done	Mrunal	04/03/24	High	Explain Goals of the Project
Dataset Description & Motivation	Done	Agastya / Tanya	04/03/24	Medium	Describe variables & columns in dataset
Research & Analysis Plan	Done	Nick / Jeremiah	04/03/24	High	Analysis & Plan for Implementation
Timeline & workflow	Done	Jasdeep	04/03/24	Medium	Create Plan of Attack template
Repository organization	Done	Gillian	04/03/24	Medium	Explain Repository structure
Peer Review
Peer Review1	Done	Everyone	04/08/24	High	Fix suggestions by Peer-Review1
Peer Review2	Done	Everyone	04/08/24	High	Fix suggestions by Peer-Review2
Finalize Proposal	Done	Everyone	04/08/24	High	Proposal Final Changes
Implementation & Write-up
Initial Data cleaning	Done	Agastya	04/11/24	Medium	Clean values from dataset
Shiny app Setup	Done	Gillian	04/15/24	Medium	Create Layout for Shiny
Plot Development for Q1	Done	Agastya / Jasdeep / Tanya / Mrunal	04/18/24	High	Code writeup for Plot1
Plot Development for Q2	Done	Nick	04/20/24	High	Code writeup for Plot2
Plot Development for Q3	Done	Jeremiah	04/22/24	High	Code writeup for Plot3
Shiny app Frontend	Done	Gillian / Jeremiah	04/24/24	Medium	Create interactive features on shiny
Writeup content for index	Done	Everyone	04/26/24	Medium	Explanation for each Question
Presentation
Presentation Writeup	Done	Tanya / Jasdeep / Mrunal	04/30/24	Medium	Create Presentation layout
Add Plot 1 to presentation	Done	Tanya / Mrunal	05/02/24	High	Add plot1 to Presentation
Add Plot 2 to presentation	Done	Agastya / Jasdeep	05/04/24	High	Add plot2 to Presentation
Add Plot 3 to presentation	Done	Jeremiah	05/04/24	High	Add plot3 to Presentation

Repository organization

Main/root: All relevant Quarto markdown code, YAML settings, and folders.
- .gitignore: List of files or file types to not track via version control.
- README.md: Brief project description, viewable on the repository landing page (https://github.com/INFO-526-S24/project-final-VizWizards).
- *.qmd: Quarto markdown files for the publication of the GitHub website, including the main landing page to contain the final report (index.qmd), the project proposal (proposal.qmd), a list of project members (about.qmd), and the slide deck for the final presentation (presentation.qmd).
/data folder:
- Meteorite_Landings.csv: Raw data file
- README.md: Data dictionary for Meteorite_Landings.csv
/images folder: Any images to be added for the presentation.qmd or GitHub website.
/shiny folder:
- app.R: All R code for both the Shiny UI and server. Kept independent of the main folder to allow for publication on a separate Shiny server.
- README.md: Description of the folder
Other:
- /.github folder: For automated workflows.
- /_extra folder: For any drafting work or notes not to be included in the final project.
- /_freeze folder: Automatically-generated GitHub website publication code.
