if (!require("pacman")) install.packages("pacman")
# use this line for installing/loading
pacman::p_load(devtools)
pacman::p_load(tidyverse, openintro, gtable, ggrepel, patchwork, units, readr, gt)
Introduction
This project leverages data collected by the Open-Source Psychometrics Project to reveal the relationship between popular culture and psychology. Using unorthodox data analysis methods, the personalities of 890 characters from 100 different universes can be compared and contrasted. Each fictional universe denotes a different TV show or movie and the popular characters within it. While the characters used are fictional, the methods produced by this project will be reusable and, in theory, applicable to collections of real-world people.
The complete psychometric dataset [1] is a combination of about 890 characters from 100 different universes across pop culture, media and entertainment.
The dataset “character.csv” consists of all these characters’ names and IDs; the universes they are from (via a unique ID and name); their notability scores; and a link to a page that displays information about each character. Additionally, there is a link to each character’s picture to identify them.
The “psych_stats.csv” file is made up of: characters’ names and IDs; their universe names and IDs; and pairs of opposing personality traits (e.g. messy/neat or motivated/unmotivated). The next variable holds whichever of the two opposing traits is more dominant for that character. After this, three relevant numerical variables are present:
The average rating out of 100 of the dominant personality trait - based on survey responses
The rank for that personality trait, in comparison with all other characters
The standard deviation - calculated based on average rating
The “myers_briggs.csv” file consists of the characters’ IDs and names as well as their universe names and IDs. It also lists a Myers-Briggs personality type and that character’s percentage match with the type.
To give context: each personality type is a four-letter acronym, with one letter for each of four aspects of personality. ESTJ, for example, stands for Extroverted, Sensing (sometimes labelled Observant), Thinking and Judging.
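So, for instance, Monica Geller has a 66.8% match with the ESTJ type and a 49.4% match with the INFP type. Since these two types differ in every letter, this suggests that Monica leans towards the ESTJ side of each pair; for example, she is more extroverted than introverted.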
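As a minimal sketch of how a character's best-matching type could be picked out, the snippet below keeps the row with the highest match percentage per character. It uses only the two Monica Geller figures quoted above; the column name `char_name` is an assumption made for illustration, while `myers_briggs` and `avg_match_perc` follow the variable names used later in this proposal.

```r
library(dplyr)

# Match percentages for Monica Geller, as quoted in the text.
# `char_name` is a hypothetical column name used for illustration.
matches <- data.frame(
  char_name      = c("Monica Geller", "Monica Geller"),
  myers_briggs   = c("ESTJ", "INFP"),
  avg_match_perc = c(66.8, 49.4)
)

# Keep, for each character, the type with the highest match percentage.
best_match <- matches %>%
  group_by(char_name) %>%
  slice_max(avg_match_perc, n = 1, with_ties = FALSE) %>%
  ungroup()

print(best_match)
```

Here the retained row is the ESTJ one, since 66.8 is larger than 49.4.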
The purpose of using this dataset is to validate a person’s Myers-Briggs personality type by looking into their individual character traits and determining how each trait contributes to that person’s behavior and personality.
Choosing this dataset
We chose this dataset because it is more interesting to work on and visualize data that we can understand easily, are familiar with and can even partially relate to. The topic on which this dataset is based is also very engaging, and we reckon it will attract more people's interest. Since a large number of fictional characters from different universes are included, the dataset is broad and diverse, letting us make different types of plots to visualize the characters, their traits and the number of people that identify with them. Moreover, since fictional characters are not bound by real-world constraints, we have more freedom to explore unconventional or extreme personality traits in an engaging and accessible way. Using data on fictional characters that represent personality types and traits is a unique way to demonstrate what we have learnt about data visualization so far. Fusing our ‘after-study’ forms of entertainment with our academic subjects will make working on this project more interesting.
To exemplify the use of this dataset: many real-world people identify with the Game of Thrones character Jon Snow [See above image]. According to the personality analysis of Snow, he is more persistent than quitter; more diligent than lazy; more honourable than cunning; and so on. Combining this with the information from the Myers-Briggs analysis might highlight a correlation between these traits and his most dominant personality type: ISFJ. The real-world implication is that someone who identifies with the same personality type may then have statistical data to support the likelihood of being more diligent than lazy, more persistent than quitter, and so on.
Questions
How do Myers-Briggs personality types distribute across different universes, and how does the average match percentage vary within each universe? Additionally, is there any correlation between character notability scores and their Myers-Briggs types within each universe?
What is the frequency distribution of character personality traits across all characters, and how does it correlate with their average rating? Furthermore, do character notability scores vary significantly based on their personality traits?
Analysis plan
Question 1:
Introduction
Question one is primarily an exploration of the varying personality types that exist across cultural media. It investigates the differences in character personalities and motivations across universes, potentially identifying correlations between genres or settings. Additionally, we will attempt to uncover a relationship between popular opinion of characters and their in-universe personas: are anti-heroes liked more? Are villains viewed more negatively than heroes? Answering these questions will provide information about the prevalence and celebration of particular personality types, enhancing our comprehension of audience engagement and storytelling dynamics.
In conjunction, analysing the distribution of Myers-Briggs personality types in fictional worlds offers important insights about the variety and growth of characters. We will learn more about consistency in portrayal by examining the relationship between the match percentages and the identified categories of the characters.
Variables:
‘Myers-Briggs’ dataframe:
myers_briggs: Myers-Briggs Type
avg_match_perc: Average Match Percentage
‘Characters’ dataframe:
uni_name: Universe Name
notability: Notability Score
Planning:
Using the common column ‘char_id’ shared by the ‘Myers-Briggs’ dataframe and the ‘Characters’ dataframe, merge the two dataframes to get the Myers-Briggs type and average match percentage for each character, then drop all unnecessary columns.
Group the data by ‘uni_name’ to analyze the distribution across different universes.
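The two planning steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the final analysis code: the two small data frames are made-up placeholders standing in for the real files, and reducing each character to their single best-matching type before grouping is one possible choice that the plan itself does not fix.

```r
library(dplyr)

# Toy stand-ins for the real files; every value here is a made-up placeholder.
characters <- data.frame(
  char_id    = c("A1", "A2", "B1"),
  uni_name   = c("Universe A", "Universe A", "Universe B"),
  notability = c(80, 60, 70)
)
myers_briggs <- data.frame(
  char_id        = c("A1", "A1", "A2", "A2", "B1", "B1"),
  myers_briggs   = c("ESTJ", "INFP", "ESTJ", "INFP", "ESTJ", "INFP"),
  avg_match_perc = c(70, 40, 45, 65, 60, 50)
)

# Step 1: merge on char_id and keep only the columns needed for Question 1.
q1 <- myers_briggs %>%
  inner_join(characters, by = "char_id") %>%
  select(char_id, uni_name, notability, myers_briggs, avg_match_perc)

# Assumption: represent each character by their best-matching type only.
top_type <- q1 %>%
  group_by(char_id) %>%
  slice_max(avg_match_perc, n = 1, with_ties = FALSE) %>%
  ungroup()

# Step 2: group by universe (and type) to summarise the distribution.
by_universe <- top_type %>%
  group_by(uni_name, myers_briggs) %>%
  summarise(
    n_chars         = n(),
    mean_match      = mean(avg_match_perc),
    mean_notability = mean(notability),
    .groups = "drop"
  )

print(by_universe)
```

An inner join is used so that characters missing from either table are dropped; a left join would be the alternative if unmatched characters should be kept and inspected.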
Discussion
External context and understanding of the genres and settings of the various universes will be brought in intuitively to aid understanding of character motivations and subsequent personality types. Following the visualisation of data, clear trends or a lack thereof will be available to educate us on whether personality types have an impact on audience engagement and/or likability of certain characters, or universes as a whole.
Question 2:
Introduction
Intriguing insights into the dynamics of character perception and storytelling impact can be gained by examining the frequency distribution of personality traits in characters and how they relate to average ratings. One could explore whether a character that is commonly perceived as honourable or righteous resonates with the audience, or even has a more fulfilling character arc. Tied to the real world, this could be used to determine whether traits such as laziness correlate with an individual being perceived less favourably by their peers.
Through exploring the characteristics that characters share the most and analyzing user acceptance, we may learn a great deal about audience involvement and story resonance. Furthermore, investigating whether personality traits have an impact on character notability scores deepens our understanding of character portrayal and how it affects the potency of storytelling. Once again, the real-world implications are varied, with positive character portrayals potentially having tangible influence on the actions of people in everyday walks of life.
Variables:
‘Psychology Stats’ dataframe:
personality: Character Personality
avg_rating: Average Rating
‘Characters’ dataframe:
notability: Notability Score
Planning:
Group the data by ‘personality’ to analyze the frequency distribution of character personality traits.
Merge the ‘Psychology Stats’ dataframe with the ‘Characters’ dataframe using the common column ‘char_id’ to get the average rating for each character.
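As a rough sketch of the two steps above, the snippet below counts how often each dominant trait appears, then joins in the notability scores so they can be summarised per trait. The data frames are made-up placeholders, and summarising by mean is only one assumed way of comparing traits.

```r
library(dplyr)

# Toy stand-ins for the real files; every value here is a made-up placeholder.
psych_stats <- data.frame(
  char_id     = c("A1", "A1", "A2", "A2", "B1", "B1"),
  personality = c("neat", "motivated", "messy", "motivated", "neat", "unmotivated"),
  avg_rating  = c(80, 90, 70, 60, 75, 85)
)
characters <- data.frame(
  char_id    = c("A1", "A2", "B1"),
  notability = c(80, 60, 70)
)

# Step 1: frequency of each dominant trait, with its mean average rating.
trait_freq <- psych_stats %>%
  group_by(personality) %>%
  summarise(n_chars = n(), mean_rating = mean(avg_rating), .groups = "drop") %>%
  arrange(desc(n_chars))

# Step 2: merge with the characters table on char_id to bring in notability.
q2 <- psych_stats %>%
  inner_join(characters, by = "char_id")

# Mean notability per trait, as a first look at whether notability varies by trait.
notability_by_trait <- q2 %>%
  group_by(personality) %>%
  summarise(mean_notability = mean(notability), .groups = "drop")

print(trait_freq)
print(notability_by_trait)
```

With the real data, a formal comparison of notability across traits would need more than group means, but the grouped summaries give a starting point for the plots.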
Discussion
The data will provide insights into the interrelationships between differing personality traits. Would one’s honor counter their villainy? Does the kind-heartedness of a side character offset their evil actions? A number of relationships will be identified and scrutinized to conclude on how the traits alter public perception and if some hold more weight than others when deciding upon one’s opinions on a character - or person in general.
Potential Plots
Bar Charts and Histograms: these are the foundation, showing how scores or personality traits are distributed across different groups of characters.
Scatter Plots: these relate two numeric variables, such as the ratings of two personality traits, and reveal correlations and clusters within the data.
Radar Charts: these plot an individual profile across multiple dimensions of personality, giving a comprehensive overview of strengths and weaknesses.
Box Plots: these show the distribution of scores within specific categories or groups; together with violin plots, they help uncover outliers within the data.
Network Graphs: these show the connections between different personality types or traits, and between other elements of the dataset.
Heatmaps: these are used to detect patterns and correlations within large datasets, highlighting similarities and differences across different groups or individuals.
Tree Maps: these show the hierarchical organization within personality frameworks, much like the branches and leaves of a tree.
These data visualisation methods are subject to change during the analysis of the data: some may prove futile, and new ones may prove useful.
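To make two of the plot types listed above concrete, here is a minimal ggplot2 sketch of a bar chart and a box plot. The data frame is a made-up placeholder using the proposal's column names, and the aesthetic choices (dodged bars, one box per universe) are assumptions rather than final design decisions.

```r
library(ggplot2)

# Made-up placeholder data using the proposal's column names.
plot_data <- data.frame(
  uni_name       = rep(c("Universe A", "Universe B"), each = 4),
  myers_briggs   = c("ESTJ", "ESTJ", "INFP", "ISFJ", "INFP", "INFP", "ESTJ", "ISFJ"),
  avg_match_perc = c(70, 62, 55, 58, 66, 61, 52, 64)
)

# Bar chart: number of characters of each Myers-Briggs type, split by universe.
bar_chart <- ggplot(plot_data, aes(x = myers_briggs, fill = uni_name)) +
  geom_bar(position = "dodge") +
  labs(x = "Myers-Briggs type", y = "Number of characters", fill = "Universe")

# Box plot: spread of match percentages within each universe.
box_plot <- ggplot(plot_data, aes(x = uni_name, y = avg_match_perc)) +
  geom_boxplot() +
  labs(x = "Universe", y = "Average match percentage")
```

Calling `print(bar_chart)` or `print(box_plot)` renders each plot; since patchwork is already loaded in the setup, the two could also be placed side by side with `bar_chart + box_plot`.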