Big Tech Stock Prices Visualization

Proposal

Data visualization

TidyTuesday

Author

Team Crimson - Arun, Varun, Jasdeep, Mohit, Pradnya

Installed Packages

### GETTING THE LIBRARIES
#devtools::install_github("choonghyunryu/dlookr")

# Install pacman only if it is not already available (package name must be quoted)
if (!require(pacman))
  install.packages("pacman")

# Loading the libraries
pacman::p_load(tidyverse,
               tidytuesdayR,
               dlookr,
               formattable,
               here)

Dataset

The Big Tech Stock Prices dataset (https://github.com/rfordatascience/tidytuesday/tree/master/data/2023/2023-02-07), sourced from the TidyTuesday project, provides a comprehensive overview of the daily stock prices and trading volumes of 14 prominent technology companies: Apple, Adobe, Amazon.com, Salesforce, Cisco Systems, Alphabet, IBM, Intel, Meta, Microsoft, Netflix, NVIDIA, Oracle, and Tesla.

The dataset comprises two tables, big_tech_stock_prices.csv and big_tech_companies.csv, and offers ample information for in-depth analysis of various aspects of the stock market. The big_tech_stock_prices.csv file contains detailed records of daily stock prices and trading volumes for each company, while the big_tech_companies.csv file provides additional information about the included companies, such as their names and stock symbols.

The Big Tech Stock Prices dataset offers a comprehensive and reliable source of data for conducting insightful analyses and making informed decisions in the financial domain related to these specific companies.

# Loading the data
big_tech_stock_prices <- read_csv(here("data","big_tech_stock_prices.csv"))
big_tech_companies <- read_csv(here("data","big_tech_companies.csv"))

Stock Prices Dataset

The dataset contains the daily stock prices of 14 big tech companies, 45,088 observations in total. No observations have missing values, so every column in the dataset can be used.

Stock Prices Dataset Code

# Analyzing the data based on unique values for each variable.
big_tech_stock_prices |>
  diagnose() |>
  dplyr::select(-missing_percent) |>
  formattable()

variables	types	unique_count	unique_rate
stock_symbol	character	14	0.0003105039
date	Date	3287	0.0729018808
open	numeric	29798	0.6608853797
high	numeric	30056	0.6666075231
low	numeric	30028	0.6659865153
close	numeric	30965	0.6867680979
adj_close	numeric	41279	0.9155207594
volume	numeric	43550	0.9658889283

Variable description

variable	class	description
`stock_symbol`	character	stock_symbol
`date`	double	date
`open`	double	The price at market open.
`high`	double	The highest price for that day.
`low`	double	The lowest price for that day.
`close`	double	The price at market close, adjusted for splits.
`adj_close`	double	The closing price after adjustments for all applicable splits and dividend distributions. Data is adjusted using appropriate split and dividend multipliers, adhering to Center for Research in Security Prices (CRSP) standards.
`volume`	double	The number of shares traded on that day.

Summarizing the start and end dates per company shows that most stocks cover roughly 13 years, from January 2010 to December 2022. The exceptions are Tesla, which starts in June 2010, and Meta, which runs from May 2012 to January 2023.

Stock Prices Date Range Code

# Examining the data to know the date range.
big_tech_stock_prices |> 
  group_by(stock_symbol) |> 
  summarize(min_date = min(date), max_date = max(date)) |> 
  formattable()

stock_symbol	min_date	max_date
AAPL	2010-01-04	2022-12-29
ADBE	2010-01-04	2022-12-29
AMZN	2010-01-04	2022-12-29
CRM	2010-01-04	2022-12-29
CSCO	2010-01-04	2022-12-29
GOOGL	2010-01-04	2022-12-29
IBM	2010-01-04	2022-12-29
INTC	2010-01-04	2022-12-29
META	2012-05-18	2023-01-24
MSFT	2010-01-04	2022-12-29
NFLX	2010-01-04	2022-12-29
NVDA	2010-01-04	2022-12-29
ORCL	2010-01-04	2022-12-29
TSLA	2010-06-30	2022-12-29

Big Tech Companies Dataset

The Big Tech Companies dataset maps each stock symbol to its company name.

Big Tech Companies Dataset Code

# Viewing the data of big tech companies.
big_tech_companies |>
  formattable()

stock_symbol	company
AAPL	Apple Inc.
ADBE	Adobe Inc.
AMZN	Amazon.com, Inc.
CRM	Salesforce, Inc.
CSCO	Cisco Systems, Inc.
GOOGL	Alphabet Inc.
IBM	International Business Machines Corporation
INTC	Intel Corporation
META	Meta Platforms, Inc.
MSFT	Microsoft Corporation
NFLX	Netflix, Inc.
NVDA	NVIDIA Corporation
ORCL	Oracle Corporation
TSLA	Tesla, Inc.
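
Because both tables share the stock_symbol column, company names can be attached to the price records with a join. The snippet below is a minimal sketch of that step, assuming dplyr is available; the two small data frames are made-up stand-ins for the real tables, and the name prices_named is hypothetical.

```r
library(dplyr)

# Toy stand-ins for the two tables (values are made up for illustration).
prices <- data.frame(
  stock_symbol = c("AAPL", "AAPL", "MSFT"),
  date         = as.Date(c("2020-01-02", "2020-01-03", "2020-01-02")),
  adj_close    = c(73.0, 72.3, 155.0)
)
companies <- data.frame(
  stock_symbol = c("AAPL", "MSFT"),
  company      = c("Apple Inc.", "Microsoft Corporation")
)

# Attach the company name to every price row via the shared key.
prices_named <- prices |>
  left_join(companies, by = "stock_symbol")
```

A left join keeps every price row even if a symbol were missing from the companies table, which makes unmatched symbols easy to spot as NA values in the company column.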

Why did we choose this data set?

Although stock prices are a familiar kind of dataset, we chose this one after weighing both technical and analytical considerations. Stock prices reflect a combination of market sentiment and company performance, so the dataset promises valuable insights. Analyzing its trends and drivers can help investors make well-informed decisions.

The dataset itself is well-structured, comprising two tables. One table provides essential information such as date, open, and close, while the other details the associated company names.

Spanning roughly 13 years, this dataset offers a vantage point to observe trends from the post-financial crisis era to the third wave of the COVID-19 pandemic. Our objective is to create impactful visualizations for the 14 big tech companies, aiming for a comprehensive understanding of how the tech industry has been perceived in the market during this significant timeframe.

In short, we chose this dataset for both its technical convenience and its analytical potential, which lets us draw well-informed conclusions within the limits of the provided data.

Questions

The two questions to be answered are:

1. Which companies have experienced the highest/lowest impact on their stock prices due to the pandemic?
2. What patterns do we see in the positive gain days over the given period?

Analysis plan

We will be employing the following approaches to address the questions posed.

Approach for question 1

The first question examines the impact of the pandemic on stock prices within a specific time frame. We focus on two key features: “dates” (to identify the pandemic period) and “stock price” (using adj_close). The “Pandemic Period” is defined as starting from the year 2020. For this period we will calculate the average returns, as well as the 2- and 5-year returns along with their respective peak prices. The impact is measured by the variance in stock prices, considering the returns over 2 years and 5 years. This approach gives an overview of the overall impact on a company’s stock price.
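
The calculations described above could be sketched as follows. This is an illustration under stated assumptions, not the final implementation: the helper window_summary() is hypothetical, the prices are made up, the exact window boundaries are our assumption, and "return" is taken to mean the last adj_close divided by the first adj_close in the window, minus one.

```r
library(dplyr)

# Toy prices (made-up values); the real input would be big_tech_stock_prices.
prices <- data.frame(
  stock_symbol = rep(c("AAA", "BBB"), each = 4),
  date = rep(as.Date(c("2018-01-02", "2020-01-02", "2021-01-04", "2022-01-03")),
             times = 2),
  adj_close = c(50, 100, 150, 120, 40, 80, 60, 100)
)

# Hypothetical helper: per-company return statistics within [from, to].
window_summary <- function(data, from, to) {
  data |>
    filter(date >= from, date <= to) |>
    group_by(stock_symbol) |>
    arrange(date, .by_group = TRUE) |>
    summarise(
      # Assumed definition: last price over first price, minus one.
      total_return = last(adj_close) / first(adj_close) - 1,
      # Mean of period-over-period returns (daily returns on the real data).
      mean_return  = mean(adj_close / lag(adj_close) - 1, na.rm = TRUE),
      peak_price   = max(adj_close),
      .groups = "drop"
    )
}

# Pandemic period (assumed to run from 2020 to the end of the data).
pandemic  <- window_summary(prices, as.Date("2020-01-01"), as.Date("2022-12-31"))

# A 5-year window ending at the same point, for comparison.
five_year <- window_summary(prices, as.Date("2018-01-01"), as.Date("2022-12-31"))
```

Keeping the window as a pair of arguments means the same helper serves the pandemic period and the 2- and 5-year comparisons, so the resulting tables can be ranked by total_return to find the most and least affected companies.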

The visualization can be built with geom_line() and geom_bar(), with annotations added through annotate(). Facets and colors will differentiate the big tech companies with the highest and lowest impact in terms of returns.

Note: Because we only have access to stock prices, establishing that COVID-19 caused the observed changes in those prices may pose challenges. Instead, we will observe trends and returns.
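
As a rough sketch of the kind of chart described here, the snippet below draws one line per company, facets by company, and marks the assumed start of the pandemic period. The data, the placeholder symbols AAA and BBB, and the annotation text are all made up for illustration; only the ggplot2 functions themselves are real.

```r
library(ggplot2)

# Made-up adjusted closing prices for two placeholder companies.
prices <- data.frame(
  stock_symbol = rep(c("AAA", "BBB"), each = 4),
  date = rep(as.Date(c("2019-07-01", "2020-01-02", "2020-07-01", "2021-01-04")),
             times = 2),
  adj_close = c(90, 100, 130, 150, 85, 80, 60, 75)
)

# Assumed start of the pandemic period (beginning of 2020).
pandemic_start <- as.Date("2020-01-01")

p <- ggplot(prices, aes(x = date, y = adj_close, colour = stock_symbol)) +
  geom_line() +
  # Dashed vertical marker; the date is converted to its numeric position.
  geom_vline(xintercept = as.numeric(pandemic_start), linetype = "dashed") +
  annotate("text", x = pandemic_start, y = 155,
           label = "Pandemic period starts", hjust = -0.05) +
  facet_wrap(~ stock_symbol) +
  labs(x = NULL, y = "Adjusted close", colour = "Company")
```

Printing p renders the chart. Note that annotate() repeats the label in every facet, which is convenient here because the marker applies to all companies.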

Approach for question 2

To analyze gain days across the different companies, we will create a gaindays variable from the open and close values.

Using this variable, we plan to derive the day of the week with the weekdays() function from base R and visualize gain days by weekday. We will also add year and quarter columns to the dataset to allow more granular analysis.
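
The derived columns could look like the sketch below. It assumes a gain day is one where close is higher than open, which is our working definition rather than something fixed by the dataset; the prices and the names gainday, gain_days, and gain_share are illustrative.

```r
library(dplyr)

# Made-up open/close prices for two placeholder companies.
prices <- data.frame(
  stock_symbol = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  date  = as.Date(c("2020-01-06", "2020-01-07", "2020-04-01",
                    "2020-01-06", "2020-04-01")),
  open  = c(10, 11, 12, 20, 21),
  close = c(11, 10, 13, 19, 22)
)

gain_days <- prices |>
  mutate(
    gainday = close > open,                    # assumed definition of a gain day
    weekday = weekdays(date),                  # day name, depends on the locale
    year    = as.integer(format(date, "%Y")),
    quarter = quarters(date)                   # "Q1" to "Q4"
  )

# Share of gain days per company, year, and quarter.
gain_share <- gain_days |>
  group_by(stock_symbol, year, quarter) |>
  summarise(share_gain = mean(gainday), .groups = "drop")
```

Because weekdays() returns names in the session's locale, converting the result to a factor with explicit levels is worth considering before plotting, so that the days appear in calendar order.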

For the visualization we will try geom_density_ridges() (from the ggridges package) and geom_col(), and keep whichever gives the more effective representation. To enhance clarity, facets and colors will distinguish between companies and periods of interest.
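
One of the two options, geom_col(), could be sketched as follows. The shares of gain days per weekday are invented numbers for two placeholder companies, and the column names are assumptions; the point is only to show the faceted bar layout.

```r
library(ggplot2)

# Made-up shares of gain days per weekday for two placeholder companies.
days <- c("Mon", "Tue", "Wed", "Thu", "Fri")
gain_share <- data.frame(
  stock_symbol = rep(c("AAA", "BBB"), each = 5),
  weekday      = factor(rep(days, times = 2), levels = days),
  share_gain   = c(0.52, 0.49, 0.55, 0.51, 0.50,
                   0.47, 0.53, 0.50, 0.54, 0.48)
)

p <- ggplot(gain_share, aes(x = weekday, y = share_gain, fill = stock_symbol)) +
  geom_col(show.legend = FALSE) +   # the facet titles already name the company
  facet_wrap(~ stock_symbol) +
  labs(x = NULL, y = "Share of gain days")
```

Storing weekday as a factor with explicit levels keeps the bars in calendar order instead of alphabetical order.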

This approach is expected to yield valuable insights into stock price trends, allowing us to uncover patterns related to stock gain days.
