Psychometric Analysis

Proposal

Author

Byte Coders


if (!require("pacman")) 
  install.packages("pacman")

# use this line for installing/loading
pacman::p_load(devtools) 

pacman::p_load(tidyverse,
           openintro,
           gtable,
           ggrepel,
           patchwork,
           units,
           readr,
           gt)

Introduction

This project uses data collected by the Open-Source Psychometrics Project to explore the relationship between popular culture and psychology. Using unorthodox data analysis methods, the personalities of 890 characters from 100 different universes can be compared and contrasted. Each fictional universe is a different TV show or movie and its popular characters. While the characters are fictional, the methods produced by this project will be reusable and, in theory, applicable to collections of real-world people.

Dataset


characters <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-08-16/characters.csv')

myers_briggs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-08-16/myers_briggs.csv')

psych_stats <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-08-16/psych_stats.csv')

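The three datasets are hosted in the tidytuesday repository listed under References. The first rows of each, together with their data dictionaries, are shown below.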


charactersTop <- head(characters)
myers_briggsTop <- head(myers_briggs)
psych_statsTop <- head(psych_stats)



# Render the first rows of `characters` as a gt table
charactersTop %>%
  gt() %>%
  tab_header(title = "Characters Dataframe") %>%
  # Highlight the id column
  tab_style(
    style = list(
      cell_fill(color = "#b2f7ef"),
      cell_text(weight = "bold")
    ),
    locations = cells_body(columns = id)
  ) %>%
  # Bold the column labels
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  )

Characters Dataframe
id	name	uni_id	uni_name	notability	link	image_link
F2	Monica Geller	F	Friends	79.7	https://openpsychometrics.org/tests/characters/stats/F/2	https://openpsychometrics.org/tests/characters/test-resources/pics/F/2.jpg
F1	Rachel Green	F	Friends	76.7	https://openpsychometrics.org/tests/characters/stats/F/1	https://openpsychometrics.org/tests/characters/test-resources/pics/F/1.jpg
F5	Chandler Bing	F	Friends	74.4	https://openpsychometrics.org/tests/characters/stats/F/5	https://openpsychometrics.org/tests/characters/test-resources/pics/F/5.jpg
F4	Joey Tribbiani	F	Friends	74.3	https://openpsychometrics.org/tests/characters/stats/F/4	https://openpsychometrics.org/tests/characters/test-resources/pics/F/4.jpg
F3	Phoebe Buffay	F	Friends	72.6	https://openpsychometrics.org/tests/characters/stats/F/3	https://openpsychometrics.org/tests/characters/test-resources/pics/F/3.jpg
F6	Ross Geller	F	Friends	51.6	https://openpsychometrics.org/tests/characters/stats/F/6	https://openpsychometrics.org/tests/characters/test-resources/pics/F/6.jpg

characters.csv
variable	type	description
id	varchar	Character ID
name	varchar	Character Name
uni_id	varchar	Universe ID, e.g. GOT
uni_name	varchar	Universe Name, e.g. Game of Thrones
notability	num	Notability Score
link	varchar	Link to Character Page
image_link	varchar	Link to Character Image


# Render the first rows of `myers_briggs` as a gt table
myers_briggsTop %>%
  gt() %>%
  tab_header(title = "Myers-Briggs Dataframe") %>%
  # Highlight the char_id column
  tab_style(
    style = list(
      cell_fill(color = "#b2f7ef"),
      cell_text(weight = "bold")
    ),
    locations = cells_body(columns = char_id)
  ) %>%
  # Bold the column labels
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  )

Myers-Briggs Dataframe
char_id	char_name	uni_id	uni_name	myers_briggs	avg_match_perc	number_users
F2	Monica Geller	F	Friends	ESTJ	66.8	547
F2	Monica Geller	F	Friends	ISTJ	63.3	1475
F2	Monica Geller	F	Friends	ENTJ	63.1	2286
F2	Monica Geller	F	Friends	ESFJ	62.8	592
F2	Monica Geller	F	Friends	ENFJ	61.0	3842
F2	Monica Geller	F	Friends	ISFJ	60.7	1602

myers_briggs.csv
variable	type	description
char_id	varchar	Character ID
char_name	varchar	Character Name
uni_id	varchar	Universe ID, e.g. GOT
uni_name	varchar	Universe Name, e.g. Game of Thrones
myers_briggs	varchar	Myers Briggs Type, e.g. ENFP
avg_match_perc	num	Percentage match
number_users	int	number of user respondents


# Render the first rows of `psych_stats` as a gt table
psych_statsTop %>%
  gt() %>%
  tab_header(title = "Psych Evaluation Dataframe") %>%
  # Highlight the char_id column
  tab_style(
    style = list(
      cell_fill(color = "#b2f7ef"),
      cell_text(weight = "bold")
    ),
    locations = cells_body(columns = char_id)
  ) %>%
  # Bold the column labels
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  )

Psych Evaluation Dataframe
char_id	char_name	uni_id	uni_name	question	personality	avg_rating	rank	rating_sd	number_ratings
F2	Monica Geller	F	Friends	messy/neat	neat	95.7	9	11.7	1079
F2	Monica Geller	F	Friends	disorganized/self-disciplined	self-disciplined	95.2	27	11.2	1185
F2	Monica Geller	F	Friends	diligent/lazy	diligent	93.9	87	10.4	1166
F2	Monica Geller	F	Friends	on-time/tardy	on-time	93.8	34	14.3	236
F2	Monica Geller	F	Friends	competitive/cooperative	competitive	93.6	56	13.4	1168
F2	Monica Geller	F	Friends	scheduled/spontaneous	scheduled	93.4	23	14.5	1173

psych_stats.csv
variable	type	description
char_id	varchar	Character ID
char_name	varchar	Character Name
uni_id	varchar	Universe ID, e.g. GOT
uni_name	varchar	Universe Name, e.g. Game of Thrones
question	varchar	Personality Question - e.g. messy/neat
personality	varchar	Character Personality, e.g. neat
avg_rating	num	Score out of 100
rank	int	Rank
rating_sd	num	Rating Standard Deviation
number_ratings	int	Number of Ratings (Responses)

A brief description of the dataset

The complete psychometric dataset [1] is a combination of about 890 characters from 100 different universes across pop culture, media and entertainment.

The “characters.csv” dataset holds each character’s name and ID; the universe they come from (as a unique ID and a name); their notability score; and a link to a page that displays information about the character. There is also a link to a picture of each character to identify them.

The “psych_stats.csv” dataset is made up of the characters’ names and IDs, their universe names and IDs, and a question that juxtaposes two opposing personality traits (e.g. messy/neat or motivated/unmotivated). The next variable, personality, holds whichever of the two opposing traits is more dominant for that character. After this, three relevant numerical variables are present:

The average rating out of 100 for the dominant personality trait, based on survey responses
The rank for that personality trait, in comparison with all other characters
The standard deviation of the ratings behind that average
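
As a rough illustration of how these numerical variables might be read together, the sketch below sorts a character's traits by how closely raters agree. The rows are copied from the Psych Evaluation table shown earlier. The name psych_sample is a stand-in for the full psych_stats data frame, and sorting by rating_sd is an assumption about what counts as "agreement", not part of the original plan.

```r
library(dplyr)

# Toy rows copied from the Psych Evaluation table above (Monica Geller only).
# `psych_sample` is a hypothetical stand-in for the full `psych_stats` data.
psych_sample <- tibble::tribble(
  ~char_id, ~question,       ~personality, ~avg_rating, ~rank, ~rating_sd, ~number_ratings,
  "F2",     "messy/neat",    "neat",       95.7,        9,     11.7,       1079,
  "F2",     "diligent/lazy", "diligent",   93.9,        87,    10.4,       1166,
  "F2",     "on-time/tardy", "on-time",    93.8,        34,    14.3,       236
)

# Assumption: a lower standard deviation means raters agree more,
# so the traits with the strongest consensus are listed first.
consensus <- psych_sample %>%
  select(question, personality, avg_rating, rank, rating_sd, number_ratings) %>%
  arrange(rating_sd)

print(consensus)
```

Reading avg_rating alongside rating_sd and number_ratings helps separate traits that are rated highly with broad agreement from those that rest on few or widely spread responses.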

The “myers_briggs.csv” file consists of the characters’ IDs and names as well as their universe names and IDs. Each row pairs a character with one Myers-Briggs personality type and gives that character’s average percentage match with the type, along with the number of users who responded.

To give context: each personality type is a four-letter code, with one letter for each of four dimensions of personality. ESTJ, for example, stands for Extraverted, Sensing, Thinking and Judging.

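So for instance, Monica Geller has a 66.8% match with the ESTJ type and a 49.4% match with the INFP type. Because these two types differ on all four letters, the gap suggests that Monica leans towards the ESTJ side of each pair, for example that she is more extroverted than introverted.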
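
To make the match percentages concrete, here is a minimal sketch that picks the best-matching type for each character. The rows are copied from the Myers-Briggs table shown earlier. The name mb_sample is a stand-in for the full myers_briggs data frame, and treating the highest avg_match_perc as the character's type is an assumption for illustration.

```r
library(dplyr)

# Toy rows copied from the Myers-Briggs table above (Monica Geller only).
# `mb_sample` is a hypothetical stand-in for the full `myers_briggs` data.
mb_sample <- tibble::tribble(
  ~char_id, ~char_name,      ~myers_briggs, ~avg_match_perc, ~number_users,
  "F2",     "Monica Geller", "ESTJ",        66.8,            547,
  "F2",     "Monica Geller", "ISTJ",        63.3,            1475,
  "F2",     "Monica Geller", "ENTJ",        63.1,            2286,
  "F2",     "Monica Geller", "ESFJ",        62.8,            592
)

# Assumption: the type with the highest average match is taken as the
# character's dominant type. Ties are broken by keeping the first row.
top_type <- mb_sample %>%
  group_by(char_id, char_name) %>%
  slice_max(avg_match_perc, n = 1, with_ties = FALSE) %>%
  ungroup()

print(top_type)
```

On the full data, the same pipeline would return one row per character, which is a convenient starting point for counting types per universe.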

The purpose of using this dataset is to validate a person’s Myers-Briggs personality type by examining their individual character traits and determining how each trait contributes to and affects that person’s behavior and personality.

Choosing this dataset

We chose this dataset because it is more interesting to work with and visualize data that we understand easily, are familiar with and can partially relate to. The topic is also very engaging, and we expect it to attract wider interest. Because many fictional characters from different universes are included, the dataset is large and diverse, letting us make different types of plots to visualize the characters, their traits and the number of people that identify with them. Moreover, since fictional characters are not bound by real-world constraints, we have more freedom to explore unconventional or extreme personality traits in an engaging and accessible way. Using data on fictional characters that represent personality types and traits is a distinctive way to demonstrate what we have learnt about data visualization so far. Fusing our ‘after-study’ forms of entertainment with our academic subjects will also make working on this project more interesting.

To exemplify the use of this dataset: many real-world people identify with the Game of Thrones character Jon Snow [see image above]. According to the personality analysis of Snow, he is more persistent than a quitter, more diligent than lazy, more honourable than cunning, and so on. Combining this with the Myers-Briggs analysis might highlight a correlation between these traits and his most dominant personality type: ISFJ. The real-world implication is that someone who identifies with the same personality type may then have statistical data to support the likelihood of being more diligent than lazy, more persistent than a quitter, and so on.

Questions

How do Myers-Briggs personality types distribute across different universes, and how does the average match percentage vary within each universe? Additionally, is there any correlation between character notability scores and their Myers-Briggs types within each universe?
What is the frequency distribution of character personality traits across all characters, and how does it correlate with their average rating? Furthermore, do character notability scores vary significantly based on their personality traits?

Analysis plan

Question 1:

Introduction

Question one is primarily an exploration of the varying personality types that exist across cultural media. It investigates the differences in character personalities and motivations across universes, potentially identifying correlations between genres or settings. It also attempts to uncover a relationship between popular opinion of characters and their in-universe personas: are anti-heroes liked more? Are villains viewed more negatively than heroes? Answering these questions will provide information about the prevalence and celebration of particular personality types, enhancing our comprehension of audience engagement and storytelling dynamics.

In conjunction, analysing the distribution of Myers-Briggs personality types in fictional worlds offers important insights about the variety and growth of characters. We will learn more about consistency in portrayal by examining the relationship between the match percentages and the identified categories of the characters.

Variables:
- ‘Myers-Briggs’ dataframe:
  - myers_briggs: Myers-Briggs Type
  - avg_match_perc: Average Match Percentage
- ‘Characters’ dataframe:
  - uni_name: Universe Name
  - notability: Notability Score
Planning:
- Merge the ‘Myers-Briggs’ dataframe with the ‘Characters’ dataframe by matching ‘char_id’ in the former to ‘id’ in the latter, so that each character’s Myers-Briggs type and average match percentage sit alongside their notability score, then drop all the unnecessary columns.
- Group the data by ‘uni_name’ to analyze the distribution across different universes.
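
The merge-and-group plan above could be sketched roughly as follows. The rows are copied from the tables shown earlier. The names characters_sample and mb_sample are stand-ins for the full data frames, and keeping only each character's best-matching type before grouping is an assumption rather than a settled part of the plan. Per the data dictionaries, the key is called id in the characters data and char_id in the Myers-Briggs data.

```r
library(dplyr)

# Toy rows copied from the tables above; hypothetical stand-ins for the
# full `characters` and `myers_briggs` data frames.
characters_sample <- tibble::tribble(
  ~id,  ~name,           ~uni_name, ~notability,
  "F2", "Monica Geller", "Friends", 79.7,
  "F1", "Rachel Green",  "Friends", 76.7
)
mb_sample <- tibble::tribble(
  ~char_id, ~myers_briggs, ~avg_match_perc,
  "F2",     "ESTJ",        66.8,
  "F2",     "ISTJ",        63.3,
  "F2",     "ENTJ",        63.1
)

# Assumption: reduce each character to their single best-matching type.
top_types <- mb_sample %>%
  group_by(char_id) %>%
  slice_max(avg_match_perc, n = 1, with_ties = FALSE) %>%
  ungroup()

# Join on char_id = id, keep only the needed columns, then summarise
# per universe and type. Rachel Green has no Myers-Briggs rows in this
# toy sample, so the inner join drops her.
mb_by_universe <- top_types %>%
  inner_join(characters_sample, by = c("char_id" = "id")) %>%
  select(uni_name, myers_briggs, avg_match_perc, notability) %>%
  group_by(uni_name, myers_briggs) %>%
  summarise(
    n_characters    = n(),
    mean_match      = mean(avg_match_perc),
    mean_notability = mean(notability),
    .groups = "drop"
  )

print(mb_by_universe)
```

The resulting table has one row per universe and type, which feeds directly into a distribution plot and into a comparison of notability across types.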

Discussion

External context about the genres and settings of the various universes will be brought in to aid understanding of character motivations and the resulting personality types. Once the data has been visualised, clear trends, or a lack thereof, will show whether personality types have an impact on audience engagement and/or the likability of certain characters, or of universes as a whole.

Question 2:

Introduction

Intriguing insights into the dynamics of character perception and storytelling impact can be gained by examining the frequency distribution of personality traits in characters and how they relate to average ratings. One could explore whether a character commonly perceived as honourable or righteous resonates with the audience, or even has a more fulfilling character arc. Tied to the real world, this could be used to determine whether traits such as laziness correlate with peers holding a lower opinion of an individual.

Through exploring the characteristics that characters share the most and analyzing user acceptance, we may learn a great deal about audience involvement and story resonance. Furthermore, investigating whether personality traits have an impact on character notability scores deepens our understanding of character portrayal and how it affects the potency of storytelling. Once again, the real-world implications are varied, with positive character portrayals potentially having tangible influence on the actions of people in everyday walks of life.

Variables:
- ‘Psychology Stats’ dataframe:
  - personality: Character Personality
  - avg_rating: Average Rating
- ‘Characters’ dataframe:
  - notability: Notability Score
Planning:
- Group the data by ‘personality’ to analyze the frequency distribution of character personality traits.
- Merge the ‘Psychology Stats’ dataframe with the ‘Characters’ dataframe, matching ‘char_id’ in the former to ‘id’ in the latter, to attach each character’s notability score to their trait ratings.
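
A minimal sketch of these two steps is given below. The rows are copied from the tables shown earlier. The names psych_sample and characters_sample are stand-ins for the full data frames, and summarising by mean is an assumption about how ratings and notability would be aggregated.

```r
library(dplyr)

# Toy rows copied from the tables above; hypothetical stand-ins for the
# full `psych_stats` and `characters` data frames.
psych_sample <- tibble::tribble(
  ~char_id, ~question,       ~personality, ~avg_rating,
  "F2",     "messy/neat",    "neat",       95.7,
  "F2",     "diligent/lazy", "diligent",   93.9,
  "F2",     "on-time/tardy", "on-time",    93.8
)
characters_sample <- tibble::tribble(
  ~id,  ~name,           ~notability,
  "F2", "Monica Geller", 79.7
)

# Frequency of each dominant trait and its mean rating across characters.
trait_freq <- psych_sample %>%
  group_by(personality) %>%
  summarise(
    n_characters = n_distinct(char_id),
    mean_rating  = mean(avg_rating),
    .groups = "drop"
  ) %>%
  arrange(desc(n_characters), desc(mean_rating))

# Attach notability (char_id = id) and average it per trait.
trait_notability <- psych_sample %>%
  inner_join(characters_sample, by = c("char_id" = "id")) %>%
  group_by(personality) %>%
  summarise(mean_notability = mean(notability), .groups = "drop")

print(trait_freq)
print(trait_notability)
```

With the full data, trait_freq gives the frequency distribution asked about in Question 2, and trait_notability gives a first view of whether notability differs between traits.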

Discussion

The data will provide insights into the interrelationships between differing personality traits. Would one’s honor counter their villainy? Does the kind-heartedness of a side character offset their evil actions? A number of relationships will be identified and scrutinized to conclude on how the traits alter public perception and if some hold more weight than others when deciding upon one’s opinions on a character - or person in general.

Potential Plots

Bar Charts and Histograms: These are the foundation, showing the distribution of scores or personality traits across different groups.
Scatter Plots: These relate different personality traits to one another, like plotting coordinates on a map, and reveal correlations and clusters within the data.
Radar Charts: These plot individual profiles across multiple dimensions of personality, giving a comprehensive overview of strengths and weaknesses.
Box Plots: These visualize the distribution of scores within specific categories or groups. Examining box plots and violin plots helps uncover outliers within the data.
Network Graphs: These show the connections between different personality types or traits, and between various elements of the dataset.
Heatmaps: These help detect patterns and correlations within large datasets, visualizing similarities and differences across different groups or individuals.
Tree Maps: These show the hierarchical organization within personality frameworks, much like the branches and leaves of a tree.

These data visualisation methods are subject to change during the analysis of the data: some may prove futile and new ones may prove useful.
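
As a hedged illustration of the simplest of these options, the sketch below draws a bar chart of match percentages using ggplot2. The values are copied from the Myers-Briggs table shown earlier. The names mb_sample and match_plot, the fill colour and the axis labels are illustrative choices, not decisions the project has made.

```r
library(ggplot2)

# Match percentages for Monica Geller, copied from the Myers-Briggs table above.
# `mb_sample` is a hypothetical stand-in for a filtered `myers_briggs` data frame.
mb_sample <- data.frame(
  myers_briggs   = c("ESTJ", "ISTJ", "ENTJ", "ESFJ", "ENFJ", "ISFJ"),
  avg_match_perc = c(66.8, 63.3, 63.1, 62.8, 61.0, 60.7)
)

# Horizontal bar chart, with types ordered by their match percentage.
match_plot <- ggplot(
  mb_sample,
  aes(x = reorder(myers_briggs, avg_match_perc), y = avg_match_perc)
) +
  geom_col(fill = "#b2f7ef") +
  coord_flip() +
  labs(
    title = "Monica Geller: match with Myers-Briggs types",
    x = "Myers-Briggs type",
    y = "Average match (%)"
  ) +
  theme_minimal()

print(match_plot)
```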

References:

[1] jonthegeek. (2022, August 16). Open Psychometrics. tidytuesday. https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-08-16

[2] American Psychological Association. (n.d.). Definition of Psychometrics. American Psychological Association. https://dictionary.apa.org/psychometrics