Meteoric Fall: a comet-ment to data

Proposal

Data visualization
Developing a Shiny app to explore and analyze meteorite landings data, unraveling patterns and insights into meteor impact distribution and historical events
Author: Viz Wizards (Nick Ferrante, Jeremiah Gaiser, Tanya Evita George, Mrunal Jadhav, Jasdeep Singh Jhajj, Gillian McGinnis, Agastya Deshraju)
Affiliation: School of Information, University of Arizona

Project Goal

The goal of the project is to build a Shiny app on the Meteorite Landings dataset: an interactive platform that lets users explore and analyze meteorite landings data through dynamic visualizations.

Introduction

The proposed project aims to create an interactive platform, built as a Shiny app, to explore and analyze meteorite landings data. This platform will allow users to delve into the data through dynamic visualizations, enhancing their understanding of meteorite landings on Earth. The project seeks to answer key questions about the distribution of meteor impacts across the Earth, the continents with the greatest accumulated mass of meteors, and the relationship between historical events and the observations and discoveries of meteors. By analyzing this data, we hope to explain the reasons behind certain historical events and predict the places most vulnerable to future meteorite crashes. Such insights could contribute to scientific research in fields such as astronomy, data analysis, and geology.

Dataset

Load the Meteorite Landings data
# Load tibble for the tibble() constructor used below
library(tibble)

# Load the data
meteorite_data <- read.csv("data/Meteorite_Landings.csv")

# Build a metadata table: column name, data type, and first non-missing value
metadata_meteorite <- tibble(
  Column = names(meteorite_data),
  DataType = sapply(meteorite_data, class),
  SampleData = sapply(meteorite_data, function(column) {
    first_non_na <- column[!is.na(column)][1]
    if (is.numeric(first_non_na)) {
      # Doubles are shown with at least two decimal places
      return(format(first_non_na, nsmall = 2))
    } else {
      return(as.character(first_non_na))
    }
  })
)

Dataset Description and Motivation

The dataset used for this project, Meteorite_Landings.csv, is sourced from NASA’s Open Data Portal. It contains information on all known meteorite landings on Earth. The dataset has 45,716 observations of 10 variables, 6 numerical and 4 categorical. We chose this dataset for the opportunity to explore scientifically the factors behind how and why meteorites land on specific parts of the Earth. Using it, the project could explain the reasons behind certain historical events and predict the places most vulnerable to meteorite crashes, which could support scientific study and research in fields such as astronomy, data analysis, and geology.

Metadata

The data includes 10 variables: 6 numerical and 4 categorical. The metadata is shown below, with a sample value for each variable, followed by a description of what each variable represents.

Column DataType SampleData
name character Aachen
id integer 1
nametype character Valid
recclass character L5
mass..g. numeric 21.00
fall character Fell
year integer 1880
reclat numeric 50.775
reclong numeric 6.08333
GeoLocation character (50.775, 6.08333)

Description of the metadata

  • name: The name of the meteorite (usually the location of the meteorite landing)
  • id: The unique identifier number assigned to the meteorite.
  • nametype: Can be one of two categories:
    • Valid: A regular meteorite.
    • Relict: A meteorite that has been degraded over the years by weathering.
  • recclass: The recommended class of the meteorite, assigned based on characteristics such as its chemical, isotopic, and mineralogical properties.
  • mass(g): The mass of the meteorite, given in grams.
  • fall: Can be one of two categories:
    • Fell: The fall of the meteorite was observed.
    • Found: The fall of the meteorite was not observed, but the meteorite was found later.
  • year: The year the meteorite fell or the year it was found.
  • reclat: The latitude of the meteorite’s landing.
  • reclong: The longitude of the meteorite’s landing.
  • GeoLocation: Combination of the latitude and longitude of the meteorite’s landing.

Questions

Question 1: What does the distribution of meteor impacts look like across the Earth?

  • Which continents have accumulated the most total mass of meteors?

Question 2: How do historical events relate to the observations and discoveries of meteors?

Question 3: Are observed trends in the frequency, type, or location of meteors related to known celestial events?

Data Cleaning

Variables    Data Type  Number of missing values  Percentage of missing values  Number of unique values  Rate of unique values
name         character  0                         0.0000000                     45716                    1.0000000
id           integer    0                         0.0000000                     45716                    1.0000000
nametype     character  0                         0.0000000                     2                        0.0000437
recclass     character  0                         0.0000000                     466                      0.0101934
mass..g.     numeric    131                       0.2865518                     12577                    0.2751116
fall         character  0                         0.0000000                     2                        0.0000437
year         integer    291                       0.6365386                     266                      0.0058185
reclat       numeric    7315                      16.0009625                    12739                    0.2786552
reclong      numeric    7315                      16.0009625                    14641                    0.3202599
GeoLocation  character  0                         0.0000000                     17101                    0.3740703

The above table shows the number of missing values from each column, as well as the number of unique values present in the dataset.
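
A per-column summary of this kind can be computed in a few lines of base R. The sketch below is illustrative rather than the code that produced the table: `summarise_columns` and `sample_data` are hypothetical names, the sample rows are a small stand-in for `meteorite_data`, and it assumes that missing values are excluded when counting unique values.

```r
# Hypothetical stand-in for meteorite_data: a few rows with gaps.
sample_data <- data.frame(
  name = c("Aachen", "Example B", "Example C", "Example D"),
  mass..g. = c(21, 720, NA, 1914),
  year = c(1880L, 1951L, 1952L, NA),
  stringsAsFactors = FALSE
)

# Summarise each column: type, missing values, and unique values.
# Assumption: NA is not counted as a unique value.
summarise_columns <- function(df) {
  n_rows <- nrow(df)
  n_missing <- vapply(df, function(col) sum(is.na(col)), integer(1))
  n_unique <- vapply(
    df,
    function(col) length(unique(col[!is.na(col)])),
    integer(1)
  )
  data.frame(
    variable = names(df),
    data_type = vapply(df, function(col) class(col)[1], character(1)),
    n_missing = n_missing,
    pct_missing = 100 * n_missing / n_rows,
    n_unique = n_unique,
    rate_unique = n_unique / n_rows,
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

column_summary <- summarise_columns(sample_data)
print(column_summary)
```

Calling `summarise_columns(meteorite_data)` on the full dataset would give one row per variable. Note that `is.na()` does not treat empty strings as missing, so blank text fields would still count as present.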

  • The first cleaning step is to filter out invalid values in the year column. All years before ‘860 CE’ and after ‘2016 CE’ are incorrect and can be discarded.

  • Then, all out-of-range coordinates can be filtered out: longitude values (reclong) greater than ‘180°’ or less than ‘-180°’, and latitude values (reclat) greater than ‘90°’ or less than ‘-90°’, the valid range for latitude.

  • All records at (0°N, 0°E) can be treated as ‘NULL’ values, because these records had no exact location when they were sighted, or they were found in areas such as Antarctica where recording exact locations is difficult.

  • All ‘NULL’ values will be ignored for Question 1, which deals with plotting the location of each record on an interactive map.

  • These values could however prove valuable for Questions 2 & 3 and will therefore be left in the dataset for them.
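
The cleaning steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the project's final code: `clean_meteorites`, its parameters, and the sample rows are hypothetical; rows with a missing year or missing coordinates are kept rather than dropped; and latitude is bounded at ±90°, the valid range for latitude.

```r
# Hypothetical sample rows covering each cleaning case.
sample_data <- data.frame(
  name = c("A", "B", "C", "D", "E", "F"),
  year = c(1880L, 2101L, 1990L, 1990L, NA, 601L),
  reclat = c(50.775, 0, 0, 10, NA, 20),
  reclong = c(6.08333, 0, 0, 354.47, NA, 20),
  stringsAsFactors = FALSE
)

clean_meteorites <- function(df, min_year = 860, max_year = 2016,
                             max_abs_lat = 90, max_abs_long = 180) {
  # Step 1: drop rows whose recorded year is outside the plausible range.
  # Assumption: rows with a missing year are kept.
  year_ok <- is.na(df$year) | (df$year >= min_year & df$year <= max_year)
  df <- df[year_ok, , drop = FALSE]

  # Step 2: drop rows whose recorded coordinates are out of range.
  lat_ok <- is.na(df$reclat) | abs(df$reclat) <= max_abs_lat
  long_ok <- is.na(df$reclong) | abs(df$reclong) <= max_abs_long
  df <- df[lat_ok & long_ok, , drop = FALSE]

  # Step 3: treat (0, 0) as a missing location instead of a real landing site.
  at_origin <- !is.na(df$reclat) & !is.na(df$reclong) &
    df$reclat == 0 & df$reclong == 0
  df$reclat[at_origin] <- NA
  df$reclong[at_origin] <- NA
  df
}

cleaned <- clean_meteorites(sample_data)

# Question 1 only: keep rows that can be placed on the map.
map_ready <- cleaned[!is.na(cleaned$reclat) & !is.na(cleaned$reclong), ,
                     drop = FALSE]
print(cleaned)
print(map_ready)
```

Keeping `cleaned` and `map_ready` separate mirrors the plan above: rows without a usable location stay available for Questions 2 and 3 and are excluded only from the map.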

Analysis plan

Approach for Question 1

To display the distribution of meteor impacts on Earth, we will implement an interactive leaflet map that lets the user navigate to specific regions of the world and explore the meteors in those regions. We will use the reclat and reclong variables as the latitude and longitude of each meteor. Users will be able to inspect individual observations, including the year the meteor fell or was found, its mass, and potentially its classification. These interactive details will draw on the variables fall, year, mass..g., and recclass.
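
As a rough sketch of how such a map could be assembled, the example below builds popup text in base R and, if the leaflet package is installed, plots circle markers with those popups. The helper `build_popup`, the `popup` column, and the sample rows are hypothetical; the first row reuses the sample values from the metadata table and the second is made up.

```r
# Hypothetical sample rows; "Example B" is invented for illustration.
sample_data <- data.frame(
  name = c("Aachen", "Example B"),
  recclass = c("L5", "H6"),
  mass..g. = c(21, 1500),
  fall = c("Fell", "Found"),
  year = c(1880L, 1990L),
  reclat = c(50.775, -30.5),
  reclong = c(6.08333, 140.25),
  stringsAsFactors = FALSE
)

# Build one HTML popup string per meteorite from fall, year, recclass, and mass.
build_popup <- function(df) {
  paste0(
    "<b>", df$name, "</b><br>",
    df$fall, " in ", df$year, "<br>",
    "Class: ", df$recclass, "<br>",
    "Mass: ", format(df$mass..g., big.mark = ",", trim = TRUE), " g"
  )
}

sample_data$popup <- build_popup(sample_data)

# Draw the map only when leaflet is available, so the sketch runs either way.
if (requireNamespace("leaflet", quietly = TRUE)) {
  meteorite_map <- leaflet::addCircleMarkers(
    leaflet::addTiles(leaflet::leaflet(sample_data)),
    lng = ~reclong,
    lat = ~reclat,
    popup = ~popup,
    radius = 4
  )
}
```

Inside a Shiny app, a map object like this would typically be returned from `leaflet::renderLeaflet()` and displayed with `leaflet::leafletOutput()`.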

To answer the question of which continents have accumulated the most total mass of meteors, we will sum the mass..g. variable over the meteor observations in each continent, alongside the frequency of observations there, and display the resulting totals as a density map by continent.

Approach for Question 2

To answer the question, “How do historical events relate to the observations and discoveries of meteors?”, we will create an animated line graph of the number of meteor observations per year, annotated with major historical events, so that the frequency of meteor observations can be compared against those events. The animation will display the progression of meteor observations over time, with relevant historical events appearing as they are reached on the timeline (x-axis). This investigation will show whether any historical events sparked increased activity in the study of meteors, such as an increase in meteor findings after the Space Race. The main variables here will be the year variable, a meteor frequency variable created through data wrangling, and an external reference for historical events.
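
The data wrangling for the frequency variable could look like the sketch below, which counts observations per year and attaches event labels by year. The names `count_by_year`, `n_meteorites`, and `events`, as well as the sample years, are hypothetical; the single event row is a placeholder for the external reference on historical events.

```r
# Hypothetical stand-in for the year column of the meteorite data.
sample_years <- c(1880L, 1880L, 1952L, 1969L, 1969L, 1969L, NA)

# Count observations per year, ignoring missing years.
count_by_year <- function(years) {
  years <- years[!is.na(years)]
  counts <- table(years)
  data.frame(
    year = as.integer(names(counts)),
    n_meteorites = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

yearly_counts <- count_by_year(sample_years)

# Placeholder event table; the real one would come from an external source.
events <- data.frame(
  year = 1969L,
  event = "Apollo 11 Moon landing",
  stringsAsFactors = FALSE
)

# Left join: every year keeps its count, and years with an event gain a label.
annotated <- merge(yearly_counts, events, by = "year", all.x = TRUE)
print(annotated)
```

Years with no observations do not appear in `yearly_counts`, so a continuous timeline for the animation would need those years filled in with zero counts first.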

Approach for Question 3

Meteors are stray particles and debris ejected from neighboring celestial bodies within our solar system. We anticipate that patterns in meteor falls will align with Earth’s proximity to astronomical objects like comets, planets, and the moon.

Our goal is to pinpoint statistically significant patterns in meteor data and link these to specific celestial occurrences. For instance, the annual Lyrids meteor shower is the consequence of Earth’s passing through the debris path of Comet Thatcher. As a result, we observe more frequent meteor impacts on the surface of the moon.

To identify such events, we will analyze meteor data over time to spot clusters of activity that differ significantly from the average. Next, we will consult external data sources, such as NASA.gov, to check whether these periods of heightened activity coincide with known astronomical events, such as annual meteor showers or close encounters with other planets.
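
One simple way to flag periods that stand out from the average is a z-score on yearly counts, sketched below. This is only one possible choice and not a method the proposal commits to: the function `flag_unusual_years`, the threshold of 2 standard deviations, and the sample counts are all assumptions made for illustration.

```r
# Hypothetical yearly counts with one obvious spike.
sample_counts <- data.frame(
  year = 2000:2009,
  n_meteorites = c(10, 12, 11, 9, 10, 60, 11, 10, 12, 9)
)

# Flag years whose count lies more than `threshold` standard deviations
# from the mean count (a plain z-score rule, chosen here for simplicity).
flag_unusual_years <- function(counts, threshold = 2) {
  z <- (counts$n_meteorites - mean(counts$n_meteorites)) /
    sd(counts$n_meteorites)
  counts$z_score <- z
  counts$unusual <- abs(z) > threshold
  counts
}

flagged <- flag_unusual_years(sample_counts)
print(flagged[flagged$unusual, ])
```

The flagged years would then be the candidates to check against external records of meteor showers and other astronomical events. A rule based on the median or on a rolling window may be more robust if the counts trend strongly over time.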

We plan to incorporate these findings into our meteor impact visualizations. This could involve color-coding individual meteors to show which celestial event triggered them, or adding a basic map of the solar system to show Earth’s position relative to these events.

By embedding this information into our visualizations, we aim to provide a deeper understanding of Earth’s role in the solar system and illustrate how meteor falls are interconnected with the dynamic cycles of space.

Timeline - Weekly Plan of Attack

Repository organization

  • Main/root: All relevant Quarto markdown code, YAML settings, and folders.
    • .gitignore: List of files or file types to not track via version control.
    • README.md: Brief project description, viewable for the repository landing page.
    • *.qmd: Quarto markdown files for the publication of the GitHub website, including the main landing page to contain the final report (index.qmd), the project proposal (proposal.qmd), a list of project members (about.qmd), and the slide deck for the final presentation (presentation.qmd).
  • /data folder:
    • Meteorite_Landings.csv: Raw data file
    • README.md: Data dictionary for Meteorite_Landings.csv
  • /images folder: Any images to be added for the presentation.qmd or GitHub website.
  • /shiny folder:
    • app.R: All R code for both the Shiny UI and server. To be independent from the main folder to allow for publication on a separate Shiny server.
    • README.md: Description of the folder
  • Other:
    • /.github folder: For automated workflows.
    • /_extra folder: For any drafting work or notes not to be included in the final project.
    • /_freeze folder: Automatically-generated GitHub website publication code.