INFO 526 - Fall 2023 - Project 1
Our target dataset comes from the {refugees} R package, which compiles extensive information on forcibly displaced populations from three main sources: UNHCR, UNRWA, and IDMC. We focus on the USA and use 'year', 'coo_name', and 'coa_name' from the 'refugees' data for plotting.
Our project aims to uncover and present patterns in refugee populations globally over time. This study delves into the patterns of refugee migration to the USA, tracing movements from various countries over time. By analyzing refugee data on an annual basis and adjusting for population figures, our goal is to identify trends and correlate them with major global occurrences such as wars, environmental crises, policy changes, and economic shifts.
Define Pre-processing Function:
We defined a function, processRefugees, to pre-process the refugee dataset. It selects the relevant columns, handles missing data, renames countries to standard names, creates a categorical variable for grouping, and returns the pre-processed dataset.
Process Data and Create Plots:
We split the population data into decades and applied the pre-processing function to each decade's data. We then defined a function, generateRefugeePlot, to generate a ggplot map plot, and used the pre-processed data to create plots for each year.
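The grouping step described above can also be sketched with base R's cut(), which assigns each count to a labelled interval in one call. This is only an illustrative alternative, not the approach used in the project code; the sample counts are made up, and unlike the project code it leaves missing values as real NA rather than a separate "NA" level.

```r
# Hypothetical refugee counts, including a missing value
refugees <- c(42, 100, 750, 1500, 25000, 250000, NA)

# Bin edges matching the categories used in this project
breaks <- c(-Inf, 100, 500, 1000, 2000, 3000, 4000, 5000,
            7000, 10000, 20000, 50000, 100000, Inf)
labels <- c("<100", "100 to 500", "500 to 1000", "1k to 2k", "2k to 3k",
            "3k to 4k", "4k to 5k", "5k to 7k", "7k to 10k", "10k to 20k",
            "20k to 50k", "50k to 100k", "100k+")

# right = FALSE makes every interval [lower, upper),
# i.e. the same rule as "refugees >= lower & refugees < upper"
refugees_m <- cut(refugees, breaks = breaks, labels = labels, right = FALSE)

print(data.frame(refugees, refugees_m))
```

Because the labels are passed in order, the resulting factor already has its levels sorted from the smallest to the largest bin, so no level can be left out by accident.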
Code
processRefugees <- function(dataset, unique_countries) {
  # Keep only the country of origin, year, and refugee count, then
  # rename countries so they match the region names used by the map
  renamed_data <- dataset |>
    select(coo_name, year, refugees) |>
    mutate(
      coo_name = case_when(
        coo_name == "United States of America" ~ "USA",
        coo_name == "United Kingdom of Great Britain and Northern Ireland" ~ "UK",
        coo_name == "Iran (Islamic Rep. of)" ~ "Iran",
        coo_name == "Palestinian" ~ "Palestine",
        coo_name == "Serbia and Kosovo: S/RES/1244 (1999)" ~ "Serbia",
        coo_name == "Türkiye" ~ "Turkey",
        coo_name == "Congo" ~ "Congo",
        coo_name == "Dem. Rep. of the Congo" ~ "Democratic Republic of the Congo",
        coo_name == "Cote d'Ivoire" ~ "Ivory Coast",
        coo_name == "Central African Rep." ~ "Central African Republic",
        coo_name == "United Rep. of Tanzania" ~ "Tanzania",
        coo_name == "Russian Federation" ~ "Russia",
        coo_name == "Syrian Arab Rep." ~ "Syria",
        coo_name == "Bolivia (Plurinational State of)" ~ "Bolivia",
        coo_name == "Dominican Rep." ~ "Dominican Republic",
        coo_name == "Venezuela (Bolivarian Republic of)" ~ "Venezuela",
        coo_name == "Czechia" ~ "Czech Republic",
        coo_name == "Rep. of Korea" ~ "South Korea",
        coo_name == "Dem. People's Rep. of Korea" ~ "North Korea",
        coo_name == "Lao People's Dem. Rep." ~ "Laos",
        coo_name == "Viet Nam" ~ "Vietnam",
        coo_name == "China, Hong Kong SAR" ~ "Hong Kong",
        coo_name == "Netherlands (Kingdom of the)" ~ "Netherlands",
        coo_name == "Cabo Verde" ~ "Cape Verde",
        coo_name == "China, Macao SAR" ~ "Macao",
        coo_name == "Holy See" ~ "Vatican City",
        TRUE ~ coo_name
      )
    )
  # anti_join() returns only the rows of the first dataset that have no
  # match in the second, i.e. the map regions absent from this year's data.
  # The join runs after renaming so that renamed countries are not added twice,
  # and region is stored as coo_name so the added rows keep their country name.
  missing_countries <- unique_countries |>
    anti_join(renamed_data, by = c("region" = "coo_name")) |>
    # add the year of this dataset and set the number of refugees to NA
    transmute(coo_name = region,
              year = as.integer(dataset$year[1]),
              refugees = NA)
  filtered_data <- bind_rows(renamed_data, missing_countries) |>
    # create a categorical variable, refugees_m, that groups countries by their number of refugees
    mutate(
      refugees_m = case_when(
        refugees < 100 ~ "<100",
        refugees >= 100 & refugees < 500 ~ "100 to 500",
        refugees >= 500 & refugees < 1000 ~ "500 to 1000",
        refugees >= 1000 & refugees < 2000 ~ "1k to 2k",
        refugees >= 2000 & refugees < 3000 ~ "2k to 3k",
        refugees >= 3000 & refugees < 4000 ~ "3k to 4k",
        refugees >= 4000 & refugees < 5000 ~ "4k to 5k",
        refugees >= 5000 & refugees < 7000 ~ "5k to 7k",
        refugees >= 7000 & refugees < 10000 ~ "7k to 10k",
        refugees >= 10000 & refugees < 20000 ~ "10k to 20k",
        refugees >= 20000 & refugees < 50000 ~ "20k to 50k",
        refugees >= 50000 & refugees < 100000 ~ "50k to 100k",
        refugees >= 100000 ~ "100k+",
        is.na(refugees) ~ "NA"
      )
    ) |>
    # order the categories from smallest to largest; every label produced
    # above must appear here, otherwise factor() silently turns it into NA
    mutate(
      refugees_m = factor(refugees_m,
                          levels = c("<100",
                                     "100 to 500",
                                     "500 to 1000",
                                     "1k to 2k",
                                     "2k to 3k",
                                     "3k to 4k",
                                     "4k to 5k",
                                     "5k to 7k",
                                     "7k to 10k",
                                     "10k to 20k",
                                     "20k to 50k",
                                     "50k to 100k",
                                     "100k+",
                                     "NA"))
    )
  return(filtered_data)
}
# Assuming filtered_data is a list of data frames, one for each year
# Drop rows that have no country name
filtered_data <- lapply(filtered_data, function(df) {
  df |>
    filter(!is.na(coo_name))
})
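The code above assumes that filtered_data already exists as a list with one data frame per year. A minimal sketch of how such a list might be built is shown here, using base R's split() and lapply(). The toy data frame population_usa and the stand-in function preprocess are hypothetical; in the project, processRefugees would take the place of preprocess.

```r
# Toy stand-in for the USA-bound subset of the refugees data (values are made up)
population_usa <- data.frame(
  coo_name = c("Iraq", "Cuba", "Iraq", "Cuba"),
  year     = c(2010L, 2010L, 2011L, 2011L),
  refugees = c(1200, 80, 950, 60)
)

# One data frame per year; the list elements are named "2010", "2011", ...
by_year <- split(population_usa, population_usa$year)

# Hypothetical stand-in for processRefugees(): it only drops rows without a country
preprocess <- function(df) df[!is.na(df$coo_name), ]

# Apply the pre-processing step to every year
filtered_data <- lapply(by_year, preprocess)

print(names(filtered_data))
```

Naming the list elements by year is what makes a lookup such as filtered_data[[as.character(year)]] work in the plotting function.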
# world (map polygons) and color_mapping (fill colours per category)
# are assumed to be defined elsewhere in the project
generateRefugeePlot <- function(year) {
  world_plot <- ggplot(filtered_data[[as.character(year)]],
                       aes(map_id = coo_name)) +
    geom_map(
      aes(fill = refugees_m),
      map = world,
      color = "#B2BEB5",
      linewidth = 0.25,
      linetype = "blank"
    ) +
    expand_limits(x = world$long, y = world$lat) +
    scale_fill_manual(values = color_mapping, na.value = "#F2F3F4") +
    coord_fixed(ratio = 1) +
    labs(
      title = paste("Number of Refugees by Country in",
                    year),
      subtitle = "Migrated to USA",
      caption = "Data source: TidyTuesday",
      fill = "Number of refugees"
    ) +
    theme_void() +
    theme(
      legend.position = "bottom",
      legend.direction = "horizontal",
      plot.title = element_text(size = 19,
                                face = "bold",
                                hjust = 0.5),
      plot.subtitle = element_text(size = 15,
                                   color = "azure4",
                                   hjust = 0.5),
      plot.caption = element_text(size = 12,
                                  color = "azure4",
                                  hjust = 0.95)
    ) +
    # single-row horizontal legend with labels placed below the keys
    guides(
      fill = guide_legend(
        nrow = 1,
        direction = "horizontal",
        title.position = "top",
        title.hjust = 0.5,
        label.position = "bottom",
        label.hjust = 1,
        label.vjust = 1,
        label.theme = element_text(lineheight = 0.25,
                                   size = 9),
        keywidth = 1,
        keyheight = 0.5
      )
    )
  return(world_plot)
}
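The plotting function relies on a color_mapping object that is not defined in this section. One way such a mapping could look is a named character vector with one colour per category, as sketched here. The palette choice ("YlOrRd" from base R's hcl.colors()) is an assumption made for illustration and is not taken from the project.

```r
# Category labels in the same order as the factor levels of refugees_m
bin_levels <- c("<100", "100 to 500", "500 to 1000", "1k to 2k", "2k to 3k",
                "3k to 4k", "4k to 5k", "5k to 7k", "7k to 10k", "10k to 20k",
                "20k to 50k", "50k to 100k", "100k+", "NA")

# Assumed palette: 13 sequential colours from light to dark for the numeric bins,
# plus a light grey for the "NA" category
bin_colors <- c(hcl.colors(13, palette = "YlOrRd", rev = TRUE), "#F2F3F4")

# scale_fill_manual(values = ...) matches the vector names to the factor levels
color_mapping <- setNames(bin_colors, bin_levels)

print(color_mapping)
```

Using a named vector keeps each colour tied to its label, so the legend stays correct even if a given year's data does not contain every category.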
By analyzing annual refugee data alongside global events, our study reveals how geopolitical conflicts, natural disasters, and economic changes drive global displacement patterns. This comprehensive examination underscores the urgency of addressing the root causes of forced migration and the importance of informed humanitarian responses.
[1] Refugees dataset, TidyTuesday. Link: https://github.com/rfordatascience/tidytuesday/blob/master/data/2023/2023-08-22/readme.md
[2] UNHCR Global Trends, consulted for global trends in refugee populations. Link: https://www.unhcr.org/us/global-trends
[3] Quarto, used for documentation and presentation.
[4] ggplot, consulted for understanding the different plot types.
[5] Presentation logo. Link: https://www.vectorstock.com/royalty-free-vector/family-people-and-earth-nature-logo-vector-21169176