Meteoric Fall: a comet-ment to data
Proposal
Developing a Shiny app to explore and analyze meteorite landings data, unraveling patterns and insights into meteor impact distribution and historical events
Authors: Viz Wizards (Nick Ferrante, Jeremiah Gaiser, Tanya Evita George, Mrunal Jadhav, Jasdeep Singh Jhajj, Gillian McGinnis, Agastya Deshraju)
Affiliation: School of Information, University of Arizona
Project Goal
The goal of the project is to build a Shiny app on the Meteorite Landings dataset: an interactive platform that lets users explore and analyze meteorite landings data through dynamic visualizations.
Introduction
The proposed project aims to create an interactive platform, built as a Shiny app, for exploring and analyzing meteorite landings data. The platform will let users work with the data through dynamic visualizations, improving their understanding of meteorite landings on Earth. The project seeks to answer key questions about the distribution of meteor impacts across the Earth, the continents with the most accumulated total mass of meteors, and the relationship between historical events and the observations and discoveries of meteors. By analyzing this data, we hope to explain the reasons behind certain historical events and predict the most vulnerable places for future meteorite crashes. Such insights could contribute to scientific research in fields such as astronomy, data analysis, and geology.
The data set Meteorite_Landings.csv used for this project is sourced from NASA's Open Data Portal. It contains information on all known meteorite landings on Earth. The data set has 10 variables (6 numerical and 4 categorical) across 45716 observations. We chose it for the opportunity to explore scientifically the factors involved in how and why meteorites land on specific parts of the Earth. Using this data set, the project could explain the reasons behind certain historical events and predict the most vulnerable places for meteorite crashes, which could support scientific study and research in fields such as astronomy, data analysis, and geology.
Metadata
The data includes 10 variables: 6 numerical and 4 categorical. The table below lists each variable with its data type and a sample value, followed by a description of what each variable represents.

| Column | DataType | SampleData |
|---|---|---|
| name | character | Aachen |
| id | integer | 1 |
| nametype | character | Valid |
| recclass | character | L5 |
| mass..g. | numeric | 21.00 |
| fall | character | Fell |
| year | integer | 1880 |
| reclat | numeric | 50.775 |
| reclong | numeric | 6.08333 |
| GeoLocation | character | (50.775, 6.08333) |
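A table like this can be generated directly from the data. The following is a minimal sketch in base R; the helper name `summarise_columns` is hypothetical, and the small inline data frame (including the invented "Example B" row) stands in for the real `Meteorite_Landings.csv` file.

```r
# Hypothetical helper: one row per column, with its class and first non-missing value.
summarise_columns <- function(df) {
  data.frame(
    Column = names(df),
    DataType = vapply(df, function(col) class(col)[1], character(1)),
    SampleData = vapply(df, function(col) {
      first_value <- col[!is.na(col)][1]
      # Doubles are shown with at least two decimals; everything else as text.
      if (is.numeric(first_value)) {
        format(first_value, nsmall = 2)
      } else {
        as.character(first_value)
      }
    }, character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Tiny illustrative sample (assumption); the second row is invented.
sample_data <- data.frame(
  name = c("Aachen", "Example B"),
  id = c(1L, 2L),
  mass..g. = c(21, 720),
  stringsAsFactors = FALSE
)

metadata <- summarise_columns(sample_data)
print(metadata)
```

With the real data, the same helper could be applied to the result of `read.csv("data/Meteorite_Landings.csv")`.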
Description of the metadata
- name: The name of the meteorite (usually the location of the meteorite landing).
- id: The unique identifier number assigned to the meteorite.
- nametype: Can be one of two categories:
  - valid: A regular meteorite.
  - relict: A meteorite that has been degraded over the years due to weather.
- recclass: The recommended class of the meteorite, assigned according to characteristics such as its chemical, isotopic, and mineralogical properties.
- mass(g): The mass of the meteorite, given in grams.
- fall: Can be one of two categories:
  - fell: The fall of the meteorite was observed.
  - found: The fall of the meteorite was not observed, but the meteorite was found later.
- year: The year the meteorite fell or was found.
- reclat: The latitude of the meteorite's landing.
- reclong: The longitude of the meteorite's landing.
- GeoLocation: The combined latitude and longitude of the meteorite's landing.
Questions
Question 1
- What does the distribution of meteor impacts look like across the Earth?
- Which continents have accumulated the most total mass of meteors?
Question 2
- How do historical events relate to the observations and discoveries of meteors?
Question 3
- Are trends observed in the frequency, type, or location of meteors related to known celestial events?
Data Cleaning
| Variables | Data Type | Number of missing values | Percentage of missing values | Number of unique values | Rate of unique values |
|---|---|---|---|---|---|
| name | character | 0 | 0.0000000 | 45716 | 1.0000000 |
| id | integer | 0 | 0.0000000 | 45716 | 1.0000000 |
| nametype | character | 0 | 0.0000000 | 2 | 0.0000437 |
| recclass | character | 0 | 0.0000000 | 466 | 0.0101934 |
| mass..g. | numeric | 131 | 0.2865518 | 12577 | 0.2751116 |
| fall | character | 0 | 0.0000000 | 2 | 0.0000437 |
| year | integer | 291 | 0.6365386 | 266 | 0.0058185 |
| reclat | numeric | 7315 | 16.0009625 | 12739 | 0.2786552 |
| reclong | numeric | 7315 | 16.0009625 | 14641 | 0.3202599 |
| GeoLocation | character | 0 | 0.0000000 | 17101 | 0.3740703 |
The above table shows the number of missing values from each column, as well as the number of unique values present in the dataset.
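The same kind of diagnostic can be approximated in base R. This is only a sketch: `diagnose_columns` is a hypothetical helper, the inline sample is invented, and counting `NA` as one distinct value is an assumption that may differ from the tool used to produce the table.

```r
# Hypothetical helper: missing and unique counts per column.
diagnose_columns <- function(df) {
  n <- nrow(df)
  missing <- vapply(df, function(col) sum(is.na(col)), integer(1))
  # Assumption: NA is counted as one distinct value here.
  unique_n <- vapply(df, function(col) length(unique(col)), integer(1))
  data.frame(
    variable = names(df),
    type = vapply(df, function(col) class(col)[1], character(1)),
    missing_count = missing,
    missing_percent = 100 * missing / n,
    unique_count = unique_n,
    unique_rate = unique_n / n,
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Invented sample rows for illustration only.
sample_data <- data.frame(
  year = c(1880L, NA, 1951L, 1951L),
  fall = c("Fell", "Found", "Fell", "Fell"),
  stringsAsFactors = FALSE
)

report <- diagnose_columns(sample_data)
print(report)
```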
- The first cleaning step is to filter the year column: years before 860 CE or after 2016 CE are incorrect and can be discarded.
- Next, coordinates outside the valid ranges can be filtered out: reclong values greater than 180° or less than -180°, and reclat values greater than 90° or less than -90°.
- All records at (0°N, 0°E) can be treated as NULL values, because these records had no exact location when they were sighted, or they were found in areas such as Antarctica where reporting an exact location is difficult.
- All NULL values will be ignored for Question 1, which plots the location of each record on an interactive map.
- These records could still prove valuable for Questions 2 and 3, so they will be left in the dataset for those questions.
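The cleaning steps above could be sketched as follows. The helper names `clean_meteorites` and `mappable` are hypothetical, and the sample rows are invented. Two choices here are assumptions and not part of the plan as written: rows with a missing year are kept, and out-of-range coordinates are set to `NA` instead of dropping the row, so the record stays available for Questions 2 and 3. The sketch bounds latitude at ±90° and longitude at ±180°.

```r
# Hypothetical cleaning helper following the listed steps.
clean_meteorites <- function(df) {
  # Keep years within 860-2016; rows with a missing year are kept (assumption).
  year_ok <- is.na(df$year) | (df$year >= 860 & df$year <= 2016)
  df <- df[year_ok, , drop = FALSE]

  # Mark out-of-range coordinates and the (0, 0) placeholder as missing.
  bad_coord <- !is.na(df$reclat) & !is.na(df$reclong) &
    (abs(df$reclat) > 90 | abs(df$reclong) > 180 |
       (df$reclat == 0 & df$reclong == 0))
  df$reclat[bad_coord] <- NA
  df$reclong[bad_coord] <- NA
  df
}

# Only rows with usable coordinates are passed to the map for Question 1.
mappable <- function(df) {
  df[!is.na(df$reclat) & !is.na(df$reclong), , drop = FALSE]
}

# Invented sample rows for illustration only.
raw <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  year = c(1880L, 2101L, 1951L, NA, 1999L),
  reclat = c(50.775, 10, 0, 20, -84),
  reclong = c(6.08333, 10, 0, 354.47, 160),
  stringsAsFactors = FALSE
)

cleaned <- clean_meteorites(raw)
map_rows <- mappable(cleaned)
print(map_rows)
```

Keeping the full `cleaned` table separate from `map_rows` means the time-based questions can still use records that lack a usable location.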
Analysis plan
Approach for Question 1
To display the distribution of meteor impacts on Earth, we will implement an interactive Leaflet map that lets users navigate to specific regions of the world and explore the meteorites recorded there. The reclat and reclong variables supply the latitude and longitude of each plotted point. Selecting an observation will show the year the meteorite fell or was found, its mass, and potentially its classification. These interactive details will be implemented using the fall, year, and recclass variables.
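A minimal sketch of such a map is shown below, assuming the `leaflet` R package is installed. The helper `make_popup`, the marker radius, and the sample rows (including the invented "Example B") are illustrative assumptions, not the project's final implementation.

```r
# Hypothetical helper: build the popup text shown when a marker is clicked.
make_popup <- function(df) {
  sprintf("<b>%s</b><br>Class: %s<br>%s in %s<br>Mass: %s g",
          df$name, df$recclass, df$fall, df$year,
          format(df$mass..g., big.mark = ",", trim = TRUE))
}

# Illustrative sample (assumption); the second row is invented.
sample_data <- data.frame(
  name = c("Aachen", "Example B"),
  recclass = c("L5", "H6"),
  fall = c("Fell", "Found"),
  year = c(1880L, 1951L),
  mass..g. = c(21, 720),
  reclat = c(50.775, 56.2),
  reclong = c(6.08333, 10.2),
  stringsAsFactors = FALSE
)
sample_data$popup <- make_popup(sample_data)

# Build the map only if the leaflet package is available.
if (requireNamespace("leaflet", quietly = TRUE)) {
  map <- leaflet::leaflet(sample_data)
  map <- leaflet::addTiles(map)
  map <- leaflet::addCircleMarkers(map, lng = ~reclong, lat = ~reclat,
                                   popup = ~popup, radius = 4)
}
```

In a Shiny app, an object like `map` would typically be returned from `leaflet::renderLeaflet()` on the server side and displayed with `leaflet::leafletOutput()` in the UI.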
To determine which continents have accumulated the most total mass of meteors, we will sum the mass..g. variable and count the meteorite observations within each continent, then display the results as a density map by continent.
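The aggregation step could look like the sketch below. It assumes a `continent` column has already been added to the data (for example through a spatial join, which is not shown); that column, the helper name `total_mass_by_continent`, and the sample values are all hypothetical.

```r
# Hypothetical helper: total mass and record count per continent.
# Rows with a missing mass or continent are dropped by aggregate().
total_mass_by_continent <- function(df) {
  totals <- aggregate(mass..g. ~ continent, data = df, FUN = sum)
  counts <- aggregate(mass..g. ~ continent, data = df, FUN = length)
  out <- data.frame(
    continent = as.character(totals$continent),
    total_mass_g = totals$mass..g.,
    n_meteorites = counts$mass..g.,
    stringsAsFactors = FALSE
  )
  # Largest accumulated mass first.
  out <- out[order(out$total_mass_g, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

# Invented sample rows; the continent column is an assumed, derived variable.
sample_data <- data.frame(
  continent = c("Europe", "Europe", "Africa", "Asia", "Asia"),
  mass..g. = c(21, 720, 1000, NA, 50),
  stringsAsFactors = FALSE
)

result <- total_mass_by_continent(sample_data)
print(result)
```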
Approach for Question 2
To answer the question, "How do historical events relate to the observations and discoveries of meteors?", we will create an animated line graph of the number of meteor observations per year, annotated with major historical events, so that the frequency of observations can be compared against those events. The animation will show the progression of meteor observations over time, with relevant historical events appearing as they are reached on the timeline (x axis). This investigation will show whether any historical events sparked increased activity in the study of meteors, such as an increase in meteor findings after the space race. The main variables are the year variable, a meteor frequency variable created through data wrangling, and an external reference list of historical events.
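The data wrangling for this plot could be sketched as below: count records per year, then attach event labels by year. The helper `count_by_year`, the sample years, and the small `events` table are illustrative assumptions; the plotting and animation layer is left out.

```r
# Hypothetical helper: number of meteorite records per year (missing years are dropped).
count_by_year <- function(df) {
  counts <- as.data.frame(table(year = df$year), stringsAsFactors = FALSE)
  names(counts) <- c("year", "n_meteorites")
  counts$year <- as.integer(counts$year)
  counts
}

# Invented sample years for illustration only.
sample_data <- data.frame(year = c(1957L, 1957L, 1969L, 1980L, NA, 1969L, 1969L))

# Illustrative external reference of historical events (assumption).
events <- data.frame(
  year = c(1957L, 1969L),
  event = c("Sputnik 1 launch", "Apollo 11 Moon landing"),
  stringsAsFactors = FALSE
)

# Left join: every year keeps its count; years without an event get NA.
timeline <- merge(count_by_year(sample_data), events, by = "year", all.x = TRUE)
print(timeline)
```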
Approach for Question 3
Meteors are stray particles and debris ejected from neighboring celestial bodies within our solar system. We anticipate that patterns in meteor falls will align with Earth’s proximity to astronomical objects like comets, planets, and the moon.
Our goal is to pinpoint statistically significant patterns in meteor data and link these to specific celestial occurrences. For instance, the annual Lyrids meteor shower is the consequence of Earth’s passing through the debris path of Comet Thatcher. As a result, we observe more frequent meteor impacts on the surface of the moon.
To identify such events, we will analyze meteor data over time to spot any activity clusters that significantly vary from average observations. Next, we will consult external data sources, such as NASA.gov, to check if these periods of heightened activity coincide with known astronomical events, such as annual meteor showers or close encounters with other planets.
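One simple way to flag periods that deviate from average activity is a z-score on yearly counts, sketched below. This is a stand-in technique chosen for illustration, not necessarily the method the project will use; the helper `flag_unusual_years`, the threshold of 2 standard deviations, and the sample counts are all assumptions. Note that the dataset records only the year of each observation, so this check works at yearly resolution.

```r
# Hypothetical helper: flag years whose count is far from the mean.
flag_unusual_years <- function(counts, threshold = 2) {
  z <- (counts$n_meteorites - mean(counts$n_meteorites)) / sd(counts$n_meteorites)
  counts$z_score <- z
  counts$unusual <- abs(z) > threshold
  counts
}

# Invented yearly counts for illustration only.
yearly <- data.frame(
  year = 2000:2009,
  n_meteorites = c(10, 12, 11, 9, 10, 40, 11, 10, 12, 9)
)

flagged <- flag_unusual_years(yearly)
print(flagged[flagged$unusual, ])
```

Years flagged this way would then be compared by hand against external records of known astronomical events.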
We plan to incorporate these findings into our meteor impact visualizations. This could involve color-coding individual meteors to show which celestial event triggered them, or adding a basic map of the solar system to show Earth’s position relative to these events.
By embedding this information into our visualizations, we aim to provide a deeper understanding of Earth’s role in the solar system and illustrate how meteor falls are interconnected with the dynamic cycles of space.
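Timeline - Weekly Plan of Attack

| Task Name | Status | Assignee | Due | Priority | Summary |
|---|---|---|---|---|---|
| **Proposal** | | | | | |
| Dataset exploration | Done | Everyone | 04/03/24 | High | Explore ideas along with relevant dataset |
| Introduction & Goals description | Done | Mrunal | 04/03/24 | High | Explain goals of the project |
| Dataset Description & Motivation | Done | Agastya / Tanya | 04/03/24 | Medium | Describe variables & columns in dataset |
| Research & Analysis Plan | Done | Nick / Jeremiah | 04/03/24 | High | Analysis & plan for implementation |
| Timeline & workflow | Done | Jasdeep | 04/03/24 | Medium | Create Plan of Attack template |
| Repository organization | Done | Gillian | 04/03/24 | Medium | Explain repository structure |
| **Peer Review** | | | | | |
| Peer Review 1 | Done | Everyone | 04/08/24 | High | Fix suggestions from Peer Review 1 |
| Peer Review 2 | Done | Everyone | 04/08/24 | High | Fix suggestions from Peer Review 2 |
| Finalize Proposal | Done | Everyone | 04/08/24 | High | Proposal final changes |
| **Implementation & Write-up** | | | | | |
| Initial Data cleaning | Done | Agastya | 04/11/24 | Medium | Clean values from dataset |
| Shiny app Setup | Done | Gillian | 04/15/24 | Medium | Create layout for Shiny |
| Plot Development for Q1 | Done | Agastya / Jasdeep / Tanya / Mrunal | 04/18/24 | High | Code writeup for Plot 1 |
| Plot Development for Q2 | Done | Nick | 04/20/24 | High | Code writeup for Plot 2 |
| Plot Development for Q3 | Done | Jeremiah | 04/22/24 | High | Code writeup for Plot 3 |
| Shiny app Frontend | Done | Gillian / Jeremiah | 04/24/24 | Medium | Create interactive features on Shiny |
| Writeup content for index | Done | Everyone | 04/26/24 | Medium | Explanation for each question |
| **Presentation** | | | | | |
| Presentation Writeup | Done | Tanya / Jasdeep / Mrunal | 04/30/24 | Medium | Create presentation layout |
| Add Plot 1 to presentation | Done | Tanya / Mrunal | 05/02/24 | High | Add Plot 1 to presentation |
| Add Plot 2 to presentation | Done | Agastya / Jasdeep | 05/04/24 | High | Add Plot 2 to presentation |
| Add Plot 3 to presentation | Done | Jeremiah | 05/04/24 | High | Add Plot 3 to presentation |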
Repository organization
- Main/root: All relevant Quarto markdown code, YAML settings, and folders.
  - .gitignore: List of files or file types not tracked via version control.
  - README.md: Brief project description, viewable on the repository landing page.
  - *.qmd: Quarto markdown files for publishing the GitHub website, including the main landing page that will contain the final report (index.qmd), the project proposal (proposal.qmd), a list of project members (about.qmd), and the slide deck for the final presentation (presentation.qmd).
- /data folder:
  - Meteorite_Landings.csv: Raw data file.
  - README.md: Data dictionary for Meteorite_Landings.csv.
- /images folder: Any images to be added to presentation.qmd or the GitHub website.
- /shiny folder:
  - app.R: All R code for both the Shiny UI and server. Kept independent from the main folder to allow publication on a separate Shiny server.
  - README.md: Description of the folder.
- Other:
  - /.github folder: For automated workflows.
  - /_extra folder: For any drafting work or notes not to be included in the final project.
  - /_freeze folder: Automatically generated GitHub website publication code.