Psychometric Analysis

Proposal

Author

Byte Coders

Code
if (!require("pacman")) 
  install.packages("pacman")

# use this line for installing/loading
pacman::p_load(devtools) 

pacman::p_load(tidyverse,
           openintro,
           gtable,
           ggrepel,
           patchwork,
           units,
           readr,
           gt)

Introduction

This project leverages data collected by the Open-Source Psychometrics Project to explore the relationship between popular culture and psychology. Through non-orthodox data analysis methods, 890 characters from 100 different universes can be compared and contrasted by personality. Each fictional universe is a different TV show or movie, together with the popular characters within it. While the characters used are fictional, the methods produced by this project will be reusable and, in theory, applicable to collections of real-world people.

Dataset

Code
characters <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-08-16/characters.csv')

myers_briggs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-08-16/myers_briggs.csv')

psych_stats <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-08-16/psych_stats.csv')

The three datasets can be located on their host pages:

Code
charactersTop <- head(characters)
myers_briggsTop <- head(myers_briggs)
psych_statsTop <- head(psych_stats)



charactersTop %>%
  gt() %>%
  tab_header(title = "Characters Dataframe") %>%
  tab_style(
    style = list(
      cell_fill(color = "#b2f7ef"),
      cell_text(weight = "bold")
    ),
    locations = cells_body(columns = id)
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  )
Characters Dataframe
| id | name | uni_id | uni_name | notability | link | image_link |
|---|---|---|---|---|---|---|
| F2 | Monica Geller | F | Friends | 79.7 | https://openpsychometrics.org/tests/characters/stats/F/2 | https://openpsychometrics.org/tests/characters/test-resources/pics/F/2.jpg |
| F1 | Rachel Green | F | Friends | 76.7 | https://openpsychometrics.org/tests/characters/stats/F/1 | https://openpsychometrics.org/tests/characters/test-resources/pics/F/1.jpg |
| F5 | Chandler Bing | F | Friends | 74.4 | https://openpsychometrics.org/tests/characters/stats/F/5 | https://openpsychometrics.org/tests/characters/test-resources/pics/F/5.jpg |
| F4 | Joey Tribbiani | F | Friends | 74.3 | https://openpsychometrics.org/tests/characters/stats/F/4 | https://openpsychometrics.org/tests/characters/test-resources/pics/F/4.jpg |
| F3 | Phoebe Buffay | F | Friends | 72.6 | https://openpsychometrics.org/tests/characters/stats/F/3 | https://openpsychometrics.org/tests/characters/test-resources/pics/F/3.jpg |
| F6 | Ross Geller | F | Friends | 51.6 | https://openpsychometrics.org/tests/characters/stats/F/6 | https://openpsychometrics.org/tests/characters/test-resources/pics/F/6.jpg |
characters.csv
| variable | type | description |
|---|---|---|
| id | varchar | Character ID |
| name | varchar | Character Name |
| uni_id | varchar | Universe ID, e.g. GOT |
| uni_name | varchar | Universe Name, e.g. Game of Thrones |
| notability | num | Notability Score |
| link | varchar | Link to Character Page |
| image_link | varchar | Link to Character Image |
Code
myers_briggsTop %>%
  gt() %>%
  tab_header(title = "Myers-Briggs Dataframe") %>%
  tab_style(
    style = list(
      cell_fill(color = "#b2f7ef"),
      cell_text(weight = "bold")
    ),
    locations = cells_body(columns = char_id)
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  )
Myers-Briggs Dataframe
| char_id | char_name | uni_id | uni_name | myers_briggs | avg_match_perc | number_users |
|---|---|---|---|---|---|---|
| F2 | Monica Geller | F | Friends | ESTJ | 66.8 | 547 |
| F2 | Monica Geller | F | Friends | ISTJ | 63.3 | 1475 |
| F2 | Monica Geller | F | Friends | ENTJ | 63.1 | 2286 |
| F2 | Monica Geller | F | Friends | ESFJ | 62.8 | 592 |
| F2 | Monica Geller | F | Friends | ENFJ | 61.0 | 3842 |
| F2 | Monica Geller | F | Friends | ISFJ | 60.7 | 1602 |
myers_briggs.csv
| variable | type | description |
|---|---|---|
| char_id | varchar | Character ID |
| char_name | varchar | Character Name |
| uni_id | varchar | Universe ID, e.g. GOT |
| uni_name | varchar | Universe Name, e.g. Game of Thrones |
| myers_briggs | varchar | Myers-Briggs Type, e.g. ENFP |
| avg_match_perc | num | Percentage match |
| number_users | int | Number of user respondents |
Code
psych_statsTop %>%
  gt() %>%
  tab_header(title = "Psych Evaluation Dataframe") %>%
  tab_style(
    style = list(
      cell_fill(color = "#b2f7ef"),
      cell_text(weight = "bold")
    ),
    locations = cells_body(columns = char_id)
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  )
Psych Evaluation Dataframe
| char_id | char_name | uni_id | uni_name | question | personality | avg_rating | rank | rating_sd | number_ratings |
|---|---|---|---|---|---|---|---|---|---|
| F2 | Monica Geller | F | Friends | messy/neat | neat | 95.7 | 9 | 11.7 | 1079 |
| F2 | Monica Geller | F | Friends | disorganized/self-disciplined | self-disciplined | 95.2 | 27 | 11.2 | 1185 |
| F2 | Monica Geller | F | Friends | diligent/lazy | diligent | 93.9 | 87 | 10.4 | 1166 |
| F2 | Monica Geller | F | Friends | on-time/tardy | on-time | 93.8 | 34 | 14.3 | 236 |
| F2 | Monica Geller | F | Friends | competitive/cooperative | competitive | 93.6 | 56 | 13.4 | 1168 |
| F2 | Monica Geller | F | Friends | scheduled/spontaneous | scheduled | 93.4 | 23 | 14.5 | 1173 |
psych_stats.csv
| variable | type | description |
|---|---|---|
| char_id | varchar | Character ID |
| char_name | varchar | Character Name |
| uni_id | varchar | Universe ID, e.g. GOT |
| uni_name | varchar | Universe Name, e.g. Game of Thrones |
| question | varchar | Personality Question, e.g. messy/neat |
| personality | varchar | Character Personality, e.g. neat |
| avg_rating | num | Score out of 100 |
| rank | int | Rank |
| rating_sd | num | Rating Standard Deviation |
| number_ratings | int | Number of Ratings (Responses) |

A brief description of the dataset

The complete psychometric dataset [1] is a combination of about 890 characters from 100 different universes across pop culture, media and entertainment.

The “characters.csv” dataset lists each character’s name and ID, the universe they are from (as a unique ID and a name), their notability score, and a link to a page that displays information about the character. There is also a link to the character’s picture to help identify them.

The “psych_stats.csv” dataset contains the characters’ names and IDs, their universe names and IDs, and a question made up of two juxtaposed personality traits (e.g. messy/neat or motivated/unmotivated). The next variable holds whichever of the two opposing traits is more dominant for that character. After this, three relevant numerical variables are present:

  • The average rating out of 100 of the dominant personality trait - based on survey responses

  • The rank for that personality trait, in comparison with all other characters

  • The standard deviation of the individual ratings behind that average

The “myers_briggs.csv” file consists of the characters’ IDs and names as well as their universe names and IDs. It also gives a Myers-Briggs personality type, the character’s average percentage match with that type, and the number of users who responded.

To give context: each personality type is a four-letter code, with one letter per dimension of personality. ESTJ, for example, stands for Extraverted, Sensing (sometimes called Observant), Thinking and Judging.

So for instance, Monica Geller has a 66.8% match with the ESTJ type and a 49.4% match with the INFP type. Since these two types differ on every letter, this suggests that raters see Monica as closer to the ESTJ profile than to its opposite; for example, as more extraverted than introverted.

The purpose of using this dataset is to validate a person’s Myers-Briggs personality type by looking into their different character traits and determining how each trait contributes to and affects their behavior and personality.
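
As a minimal sketch of how these match percentages can be read, the block below rebuilds a few of Monica Geller's rows by hand (values copied from the preview table and the sentence above) instead of downloading the file. It splits each four-letter type into its letters and picks the best-matching type. The column names follow myers_briggs.csv; the object names `monica`, `monica_letters` and `top_type`, and the four dimension labels, are only illustrative.

```r
library(dplyr)

# Hand-copied rows for one character; the real analysis would use the
# full `myers_briggs` dataframe loaded earlier.
monica <- tibble::tibble(
  char_id        = "F2",
  char_name      = "Monica Geller",
  myers_briggs   = c("ESTJ", "ISTJ", "ENTJ", "INFP"),
  avg_match_perc = c(66.8, 63.3, 63.1, 49.4)
)

# One column per letter position of the type code
# (dimension names are illustrative labels, not dataset columns).
monica_letters <- monica %>%
  mutate(
    energy      = substr(myers_briggs, 1, 1),  # E / I
    information = substr(myers_briggs, 2, 2),  # S / N
    decisions   = substr(myers_briggs, 3, 3),  # T / F
    lifestyle   = substr(myers_briggs, 4, 4)   # J / P
  )

# The type with the highest average match percentage.
top_type <- monica_letters %>%
  slice_max(avg_match_perc, n = 1)

top_type$myers_briggs
```

Working with separate letter columns makes it possible to compare characters on a single dimension (for example E versus I) rather than only on the full type.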

Choosing this dataset

We chose this dataset because it is more interesting to work on and visualize data that we can understand easily, are familiar with and can even partially relate to. The topic is also very engaging, and we reckon it will attract wider interest. Because a large number of fictional characters from different universes are included, the dataset is diverse, letting us make different types of plots to visualize the characters, their traits and the number of people that identify with them. Moreover, since fictional characters are not bound by real-world constraints, we have more freedom to explore unconventional or extreme personality traits in an engaging and accessible way. Using data based on fictional characters that represent personality types and traits is a unique way to apply what we have learnt about data visualization so far. Fusing our ‘after-study’ forms of entertainment with our academic subjects will make working on this project more interesting.

To exemplify the use of this dataset: many real-world people identify with the Game of Thrones character Jon Snow [See above image]. According to the personality analysis of Snow, he is more persistent than quitter, more diligent than lazy, more honourable than cunning, and so on. Combining this with the information from the Myers-Briggs analysis might highlight a correlation between these traits and his most dominant personality type: ISFJ. The real-world implication is that someone who identifies with the same personality type may then have statistical data to support the likelihood of being more diligent than lazy, more persistent than quitter, and so on.

Questions

  1. How do Myers-Briggs personality types distribute across different universes, and how does the average match percentage vary within each universe? Additionally, is there any correlation between character notability scores and their Myers-Briggs types within each universe?

  2. What is the frequency distribution of character personality traits across all characters, and how does it correlate with their average rating? Furthermore, do character notability scores vary significantly based on their personality traits?

Analysis plan

Question 1:

Introduction

Question one is primarily an exploration of the varying personality types that exist across cultural media. It investigates the differences in character personalities and motivations across universes, potentially identifying correlations between genres or settings. Additionally, we will attempt to uncover a relationship between popular opinion of characters and their in-universe personas: are anti-heroes liked more? Are villains viewed more negatively than heroes? Answering these questions will provide information about the prevalence and celebration of particular personality types, which enhances our comprehension of audience engagement and storytelling dynamics.

In conjunction, analysing the distribution of Myers-Briggs personality types in fictional worlds offers important insights about the variety and growth of characters. We will learn more about consistency in portrayal by examining the relationship between the match percentages and the identified categories of the characters.

  • Variables:

    • ‘Myers-Briggs’ dataframe:

      • myers_briggs: Myers-Briggs Type

      • avg_match_perc: Average Match Percentage

    • ‘Characters’ dataframe:

      • uni_name: Universe Name

      • notability: Notability Score

  • Planning:

    • Using the shared character ID (‘char_id’ in the ‘Myers-Briggs’ dataframe, ‘id’ in the ‘Characters’ dataframe), merge the two dataframes to get the Myers-Briggs type and average match percentage for each character alongside their notability score, then drop all the unnecessary columns.

    • Group the data by ‘uni_name’ to analyze the distribution across different universes.
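
The planning steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the final pipeline: it uses tiny hand-built stand-ins for the two dataframes (Monica Geller's values come from the previews above; the character `X1` and "Example Universe" are hypothetical, with made-up numbers), and it assumes that each character's "type" is the one with the highest `avg_match_perc`. Note that the key is named `id` in Characters and `char_id` in Myers-Briggs, so the join maps one onto the other.

```r
library(dplyr)

# Stand-in for `characters` (X1 is a hypothetical character).
characters_demo <- tibble::tibble(
  id         = c("F2", "X1"),
  name       = c("Monica Geller", "Example Character"),
  uni_name   = c("Friends", "Example Universe"),
  notability = c(79.7, 50.0)
)

# Stand-in for `myers_briggs` (X1 rows are made-up values).
myers_briggs_demo <- tibble::tibble(
  char_id        = c("F2", "F2", "X1", "X1"),
  myers_briggs   = c("ESTJ", "ISTJ", "INTJ", "INTP"),
  avg_match_perc = c(66.8, 63.3, 70.0, 65.0)
)

# Assumption: keep only the best-matching type per character.
best_type <- myers_briggs_demo %>%
  group_by(char_id) %>%
  slice_max(avg_match_perc, n = 1, with_ties = FALSE) %>%
  ungroup()

# Join on the character ID and keep only the columns needed for Question 1.
merged <- best_type %>%
  inner_join(characters_demo, by = c("char_id" = "id")) %>%
  select(char_id, name, uni_name, notability, myers_briggs, avg_match_perc)

# Per-universe summary of match percentage and notability.
by_universe <- merged %>%
  group_by(uni_name) %>%
  summarise(
    n_characters    = n(),
    mean_match      = mean(avg_match_perc),
    mean_notability = mean(notability),
    .groups = "drop"
  )

by_universe
```

With the full data, `cor(merged$avg_match_perc, merged$notability)` within each universe would be one simple way to probe the notability question.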

Discussion

External context and understanding of the genres and settings of the various universes will be brought in intuitively to aid understanding of character motivations and subsequent personality types. Following the visualisation of data, clear trends or a lack thereof will be available to educate us on whether personality types have an impact on audience engagement and/or likability of certain characters, or universes as a whole.

Question 2:

Introduction

Intriguing insights into the dynamics of character perception and storytelling impact can be gained by examining the frequency distribution of personality traits in characters and how they relate to average ratings. One could explore whether a character that is commonly perceived as honourable or righteous resonates with the audience, or even has a more fulfilling character arc. Tied to the real world, this could be used to determine whether traits such as laziness correlate with a lower opinion of an individual among their peers.

Through exploring the characteristics that characters share the most and analyzing user acceptance, we may learn a great deal about audience involvement and story resonance. Furthermore, investigating whether personality traits have an impact on character notability scores deepens our understanding of character portrayal and how it affects the potency of storytelling. Once again, the real-world implications are varied, with positive character portrayals potentially having tangible influence on the actions of people in everyday walks of life.

  • Variables:

    • ‘Psychology Stats’ dataframe:

      • personality: Character Personality

      • avg_rating: Average Rating

    • ‘Characters’ dataframe:

      • notability: Notability Score

  • Planning:

    • Group the data by ‘personality’ to analyze the frequency distribution of character personality traits.

    • Merge the ‘Psychology Stats’ dataframe with the ‘Characters’ dataframe on the shared character ID (‘char_id’ in ‘Psychology Stats’, ‘id’ in ‘Characters’) to attach each character’s notability score to their trait ratings.
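
A rough sketch of these two steps is shown below, again on small hand-built stand-ins rather than the downloaded files. Monica Geller's rows are copied from the preview above; the character `X1` and all of its values are hypothetical and exist only so that a trait appears more than once. The object names are illustrative.

```r
library(dplyr)

# Stand-in for `psych_stats` (X1 rows are made-up values).
psych_demo <- tibble::tibble(
  char_id     = c("F2", "F2", "F2", "X1", "X1"),
  question    = c("messy/neat", "diligent/lazy", "on-time/tardy",
                  "messy/neat", "diligent/lazy"),
  personality = c("neat", "diligent", "on-time", "neat", "lazy"),
  avg_rating  = c(95.7, 93.9, 93.8, 60.0, 70.0)
)

# Stand-in for `characters` (X1 is hypothetical).
characters_demo <- tibble::tibble(
  id         = c("F2", "X1"),
  notability = c(79.7, 50.0)
)

# Frequency of each dominant trait and its mean rating.
trait_summary <- psych_demo %>%
  group_by(personality) %>%
  summarise(
    n_characters = n(),
    mean_rating  = mean(avg_rating),
    .groups = "drop"
  ) %>%
  arrange(desc(n_characters))

# Attach notability to every trait row, then average it per trait.
with_notability <- psych_demo %>%
  left_join(characters_demo, by = c("char_id" = "id"))

notability_by_trait <- with_notability %>%
  group_by(personality) %>%
  summarise(mean_notability = mean(notability), .groups = "drop")

trait_summary
notability_by_trait
```

A left join is used here so that trait rows are kept even if a character were missing from the Characters dataframe; an inner join would be the alternative if such rows should be dropped.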

Discussion

The data will provide insights into the interrelationships between differing personality traits. Would one’s honor counter their villainy? Does the kind-heartedness of a side character offset their evil actions? A number of relationships will be identified and scrutinized to conclude on how the traits alter public perception and if some hold more weight than others when deciding upon one’s opinions on a character - or person in general.

Potential Plots

  • Bar Charts and Histograms: these are the foundation, showing the distribution of ratings or personality traits across different groups such as universes.

  • Scatter Plots: these relate two numerical variables, such as the ratings of two personality traits, and help reveal correlations and clusters in the data.

  • Radar Charts: these plot an individual character's profile across multiple dimensions of personality, giving a comprehensive overview of strengths and weaknesses.

  • Box Plots: box plots (and violin plots) visualize the distribution of ratings within specific categories or groups and help uncover outliers in the data.

  • Network Graphs: these show the connections between different personality types or traits across the dataset.

  • Heatmaps: these are used to detect patterns and correlations within large datasets, highlighting similarities and differences across groups or individuals.

  • Tree Maps: these show the hierarchical organization within personality frameworks, much like the branches and leaves of a tree.

These data visualisation methods are subject to change during the analysis of the data: some may prove futile, and new ones may prove useful.
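
To make the first of these plot types concrete, here is a minimal ggplot2 sketch of a grouped bar chart of Myers-Briggs types per universe. The data is entirely hypothetical ("Universe A" and "Universe B" are placeholders with made-up types), and the column names simply mirror those in the dataframes above; the real plot would be built from the merged data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical best-matching type per character in two placeholder universes.
types_demo <- tibble::tibble(
  uni_name     = c("Universe A", "Universe A", "Universe A",
                   "Universe B", "Universe B"),
  myers_briggs = c("ESTJ", "ESTJ", "INFP", "ESTJ", "INTJ")
)

# Count characters per universe and type.
type_counts <- types_demo %>%
  count(uni_name, myers_briggs, name = "n_characters")

# Grouped bar chart: one bar per type, coloured by universe.
p <- ggplot(type_counts,
            aes(x = myers_briggs, y = n_characters, fill = uni_name)) +
  geom_col(position = "dodge") +
  labs(
    x = "Best-matching Myers-Briggs type",
    y = "Number of characters",
    fill = "Universe",
    title = "Myers-Briggs types by universe (hypothetical data)"
  )

p
```

With 100 universes, faceting (`facet_wrap(~ uni_name)`) or restricting the plot to a handful of universes would likely be more readable than a single dodged chart.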

References:

  1. Title: Open Psychometrics

    Author: jonthegeek

    Date: 2022-08-16

    Source: tidytuesday

    Link: https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-08-16

American Psychological Association. (n.d.). Definition of Psychometrics. American Psychological Association. https://dictionary.apa.org/psychometrics