Revealing UFO Sighting Pattern

A Data-Driven Investigation into Global UFO Sightings

Author

Plot Crafters

Abstract

The project delves into the realm of UFO sightings by conducting a data-driven investigation into a comprehensive dataset comprising over 80,000 records dating back to 1949. Leveraging information such as latitude, longitude, date, and detailed descriptions of each sighting, the analysis aims to unravel temporal, spatial, and descriptive patterns associated with these unidentified flying objects. The dataset, originating from NUFORC, provides a robust foundation for exploring global UFO activities, and the project employs data wrangling, exploratory data analysis, and statistical approaches to answer key questions about the frequency and distribution of sightings worldwide.

Utilizing statistical approaches like geospatial clustering and spatial autocorrelation, the project aims to provide a comprehensive understanding of the worldwide prevalence of UFO sightings. Beyond the analytical aspects, the project emphasizes the broader impact on public perception, scientific inquiry, and the potential for future research. The dissemination of findings holds the promise of not only contributing to scientific discourse but also fostering public engagement and education, thereby enhancing awareness and knowledge about UFO sightings.

Introduction

The dataset under investigation is drawn from the TidyTuesday project, presenting a comprehensive compilation of records documenting encounters with unidentified flying objects (UFOs). With origins tracing back to diverse temporal, spatial, and descriptive dimensions, this dataset unfolds as a multifaceted exploration of the mysterious world of UFO sightings. Imbued with details such as precise date and time stamps, geographical coordinates, and intricate descriptions of each reported incident, this dataset forms a robust foundation for a thorough examination. As we embark on this data-driven journey, our objective is to discern latent patterns and unveil nuanced insights, contributing to a heightened comprehension of the intricate temporal and spatial dynamics characterizing UFO encounters on a global scale.

This repository of UFO sightings serves as a captivating reservoir for exploration, providing researchers, data enthusiasts, and the wider public with an opportunity to dissect the enigmatic phenomena surrounding unidentified flying objects. By delving into the layers of information embedded within this dataset, we aim to demystify the patterns and intricacies that define these sightings, fostering a deeper understanding of the uncharted territories within the UFO phenomenon.

Question 1 : How many UFO sightings have been reported worldwide, and are there particular regions where these sightings occur more frequently?

Introduction

For this question we want to know where the frequency and location, or the concentration, of UFO sightings. Under the ufo_sightings dataset we require the longitude and latitude of the sighting as well as the count. We’re interested in this question because it’s in our best interest to know where delusional people who claim ufo sightings are so we can avoid them.

Approach

We first ensure that the dataset is clean, with no missing or invalid values that could affect our analysis. We then calculate summary statistics such as the mean, median, and standard deviation of the number of sightings globally to give us an idea of the distribution of ufo sightings.

We then create a geographical visualization where we use latitude, longitude, and state data to create a heatmap and choropleth map to visualize the global distribution of UFO sightings. This will help identify geographical hotspots and trends. For our last visualization we do a spatial analysis via spatial clustering using algorithms like K-means to identify regions with a high concentration of sightings. Additionally, we’ll assess spatial autocorrelation to determine if there are spatial patterns in the distribution of sightings.

Analysis

suppressMessages({
library(tidyverse)
})
Warning: package 'ggplot2' was built under R version 4.3.3
Warning: package 'tidyr' was built under R version 4.3.3
#> Loading the Dataset 

#> UFO_Sightings


ufo_sightings <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-06-25/ufo_sightings.csv", show_col_types = FALSE)
suppressWarnings({
library(ggplot2)

# Load world map data
world_map <- map_data("world")

# Read the UFO sightings data
ufo_sightings <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-06-25/ufo_sightings.csv", show_col_types = FALSE)

# Data Cleaning
ufo_sightings_clean <- ufo_sightings %>%
  filter(!is.na(state))

# Create Heatmap with map outline
heatmap_with_map <- ggplot() +
  geom_polygon(data = world_map, aes(x = long, y = lat, group = group), color = "black", fill = NA) +  # Map outline
  geom_bin2d(data = ufo_sightings_clean, aes(x = longitude, y = latitude), bins = 50, color = "black") +  # Heatmap
  scale_fill_gradient(name = "Frequency", low = "#045275", high = "#D32F2F") +  # Color gradient
  labs(title = "Heatmap of UFO Sightings", 
       subtitle = "Heatmap Showing Frequency of Sightings",
       x = "Longitude", y = "Latitude") +  # Axis labels
  theme_minimal() +  # Minimal theme
  theme(plot.title = element_text(size = 20, face = "bold"),  # Title font size and style
        plot.subtitle = element_text(size = 14, color = "gray"),  # Subtitle font size and color
        axis.text = element_text(size = 10),  # Axis text size
        legend.title = element_text(size = 12),  # Legend title size
        legend.text = element_text(size = 10))  # Legend text size

# Print the heatmap with map outline
print(heatmap_with_map)
})

This heatmap of UFO sightings worldwide, where each bin represents a certain area on the map, and the color intensity represents the frequency of sightings in that area. This visualization will help us identify regions with higher concentrations of UFO sightings. In the above Heatmap, we can tell most of the sightings are around the USA region and little over East Europe, the frequency is not distributed around as expected, the entire map is below 2000 sightings and UAS having higher than 2000. This suggests that while UFO sightings are reported worldwide, they remain rare events. Factors such as population density, cultural beliefs, and reporting biases likely contribute to the observed distribution patterns. Further exploration of these factors is necessary to better understand the underlying reasons for the observed variations in UFO sighting frequencies.

suppressWarnings({
# Count sightings by country
sightings_by_country <- ufo_sightings_clean %>%
  count(country)

# Read world map data from ggplot2
world_map <- map_data("world")

# Merge with sightings data
world_map_with_sightings <- left_join(world_map, sightings_by_country, by = c("region" = "country"))

# Create Choropleth Map with individual sightings as points
choropleth_map <- ggplot() +
  geom_polygon(data = world_map_with_sightings, aes(x = long, y = lat, group = group, fill = n), color = "black") +
  geom_point(data = ufo_sightings_clean, aes(x = longitude, y = latitude), color = "red", alpha = 0.3) +
  labs(title = "Global Distribution of UFO Sightings by Country",
       subtitle = "Choropleth Map with Individual Sightings",
       x = "Longitude",
       y = "Latitude",
       fill = "Number of Sightings") +  # Add legend title
  theme_minimal() +  # Use minimal theme
  theme(plot.title = element_text(size = 20, face = "bold"),  # Adjust title font size and style
        plot.subtitle = element_text(size = 14, color = "gray"),  # Adjust subtitle font size and color
        axis.title.x = element_text(size = 12),  # Adjust x-axis title size
        axis.title.y = element_text(size = 12),  # Adjust y-axis title size
        legend.title = element_text(size = 12),  # Adjust legend title size
        legend.text = element_text(size = 10)) +  # Adjust legend text size
  scale_fill_gradient(name = "Number of Sightings", low = "#FFEDA0", high = "#F44336")  # Adjust color gradient

# Print the choropleth map
print(choropleth_map)

})

This graph overlays the individual UFO sightings as red points on top of the choropleth map, allowing you to see both the country-level distribution and the specific locations of sightings. This along side the heatmap will give a better understanding where the freaquency and distribution of UFO sightings take place.

Discussion

The heat map and choropleth map above show that most UFO sightings occur in the USA, especially in the West Bay area. While most sightings are in the West Bay, some are also seen in the northern and eastern USA.

We speculate that there are several reasons for such distribution of sightings. First, higher population densities in the USA contribute to more reported UFO sightings, as there are more observers to witness and report such phenomena. Second, the USA’s rich history of UFO-related media, including movies and TV shows, may influence public perception and reporting behavior, leading to increased awareness and reporting of UFO sightings. Third, the presence of military installations in certain regions, coupled with the testing of experimental aircraft or other technology, may result in misidentifications and increased UFO sightings.

Question 2: Is there any specific time of the day when UFO sighting are more prevalent?

Introduction

For this question we want to visualize any temporal trends in UFO sightings throughout the day. We utilize the hour_of_day and sighting count variable from the ufo_sightings_with_time_of_day dataset. We’re interested in this question because we want to see what time of day people are most delusional.

Approach

We loaded the dataset ufo_sightings from the “TidyTuesday” source using the read.csv function in R. We verified the date_time column to ensure if it is the correct data type. After that we created a new variable under the name time_of_day, and upon extracting time from date_time variable we created the following time bins: morning_hours (6:00 AM - 11:59 AM), afternoon_hours(12:00 PM - 17:59 PM), evening_hours(18:00 PM - 21:59 PM), and night_hours(22:00 PM - 5:59 AM). We analyzed sighting patterns within each bin.

Lastly, we grouped the data for each bin and counted the number of ufo sightings and saved it in dataset under the name ufo_sightings_with_time_of_day.csv. We then plotted the ufo_sightings_with_time_of_day.csv dataset using horizontal bar chart with different time bins at x-axis and count of ufo_sightings at y-axis.

Analysis

suppressWarnings({
# Load necessary libraries
library(ggplot2)
library(png)
library(tidyverse)
library(lubridate)
suppressMessages({  
library(viridis)
})
library(png)
suppressMessages({  
library(cowplot)
})
# Read the data
ufo_sightings <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-06-25/ufo_sightings.csv", show_col_types = FALSE)

# Convert character strings of date_time variable into POSIXct format
ufo_sightings$date_time <- as.POSIXct(ufo_sightings$date_time, format = "%m/%d/%Y %H:%M")

# Extract the hour component
ufo_sightings$hour_of_day <- hour(ufo_sightings$date_time)

# Create a summary table for counts of sightings per hour
hourly_counts <- table(ufo_sightings$hour_of_day)

# Convert the summary table to a data frame
hourly_counts_df <- data.frame(hour = as.numeric(names(hourly_counts)), count = as.numeric(hourly_counts))

# Plotting with polar coordinates
bar_plot_polar <- ggplot(hourly_counts_df, aes(x = factor(hour), y = count)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  scale_x_discrete(name = "Hour of Day") +
  scale_y_continuous(name = NULL) +
  labs(title = "UFO Sightings by Hour of Day") +
  theme_minimal() +
  theme(axis.text.y = element_blank(),
        axis.line.y = element_blank(),
        axis.ticks = element_blank(), 
        panel.grid.major.y = element_blank()) +
  coord_polar(start = -pi/1.45)

# Read the image
img <- readPNG("images/q2Clock.png")
ger <- grid::rasterGrob(img, interpolate=TRUE)

# Draw the plot
coord_plot <- ggdraw(bar_plot_polar)

# Combine the plot and the image
coord_plot + draw_grob(ger, 0.168, 0.18, 0.65, 0.65) + 
  draw_plot(bar_plot_polar)
})

  1. Optimal Visibility: Darker evening skies provide better visibility, aiding in the detection of unusual objects or lights.

  2. Leisure Activities: Evening leisure pursuits, like stargazing and outdoor activities, increase opportunities for people to spot UFOs while relaxing outdoors.

3.Cultural Influence: Cultural depictions of UFO sightings during nighttime and psychological factors may heighten vigilance and reporting of sightings during evening hours.

suppressWarnings({
# Load necessary libraries
library(tidyverse)

#> Loading the Dataset 

#> UFO_Sightings

ufo_sightings <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-06-25/ufo_sightings.csv", show_col_types = FALSE)

# Load the lubridate package
library(lubridate)


# converting character strings of date_time variable into a format that R can recognize and manipulate as date-time data.
ufo_sightings$date_time <- as.POSIXct(ufo_sightings$date_time, format = "%m/%d/%Y %H:%M")

# Extracting the hour component
hour_of_day <- hour(ufo_sightings$date_time)

# Define time ranges for each bin
morning_hours <- 6:11
afternoon_hours <- 12:17
evening_hours <- 18:21
night_hours <- c(22:23, 0:5)

# Function to assign time bins
assign_time_bin <- function(hour_of_day) {
  if (hour_of_day %in% morning_hours) {
    return("Morning")
  } else if (hour_of_day %in% afternoon_hours) {
    return("Afternoon")
  } else if (hour_of_day %in% evening_hours) {
    return("Evening")
  } else {
    return("Night")
  }
}


library(ggplot2)
library(dplyr)

# Read the CSV file and prepare the data
ufo_data <- read.csv("ufo_sightings_with_time_of_day.csv")

# Convert time_of_day to a factor with specified order
ufo_data$time_of_day <- factor(ufo_data$time_of_day, levels = c("Morning", "Afternoon", "Evening", "Night"))

# Group the data by time_of_day and summarize to calculate count of sightings
ufo_summary <- ufo_data %>%
  group_by(time_of_day) %>%
  summarize(count = n())

# Create the plot
ggplot(ufo_summary, aes(x = time_of_day, y = count, fill = count)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient(name = "Count", low = "#ADD8E6", high = "#000080") +  # Use a gradient scale from light blue to navy blue
  geom_text(aes(label = count), vjust = -0.5, color = "black", size = 3) +  
  theme_minimal() +
  labs(title = "UFO Sightings Count by Time of Day",
       x = "Time of Day",
       y = "Count") +
  theme(plot.background = element_rect(fill = "white", color = NA),  
        panel.background = element_rect(fill = "white", color = NA),  
        panel.grid.major = element_line(color = "gray", size = 0.2),  
        panel.grid.minor = element_blank(),
        axis.line = element_line(color = "black"),
        axis.text = element_text(color = "black", size = 10),  # Adjust size of axis text
        axis.title = element_text(color = "black", size = 12),  # Adjust size of axis titles
        plot.title = element_text(hjust = 0.5, size = 20, color = "black", face = "bold"), 
        legend.position = "right") +  # Position legend on the right side
  guides(fill = guide_colorbar(title.position = "top", title.vjust = 0.5))  })

Discussion

According to the visualization above, it appears, unsurprisingly, that UFO sightings were most frequent at night time, followed by evening, afternoon, and morning. We speculate that this is so because when it is dark out people are less able to distincly recognize what object in the sky they are looking at. Why there are fewer sightings in the morning than in the afternoon makes little sense as there would be enough light in the sky to identify, or not identify, the object.