Big Tech Stock Prices Visualization

Proposal

Data visualization
TidyTuesday
Author

Team Crimson - Arun, Varun, Jasdeep, Mohit, Pradnya

Installed Packages
### GETTING THE LIBRARIES
#devtools::install_github("choonghyunryu/dlookr")

if (!require(pacman))
  install.packages(pacman)

# Loading the libraries
pacman::p_load(tidyverse,
               tidytuesdayR,
               dlookr,
               formattable,
               here)

Dataset

The Big Tech Stock Prices dataset, sourced from the TidyTuesday project, provides a comprehensive overview of the daily stock prices and trading volumes of 14 prominent technology companies. The list of industry giants that are studied for data analysis are Apple, Adobe, Amazon.com, Salesforce, Cisco Systems, Alphabet, IBM, Intel, Meta, Microsoft, Netflix, NVIDIA, Oracle and TESLA.

Comprising two tables, namely big_tech_stock_prices.csv and big_tech_companies.csv, this dataset offers a ample amount of information for conducting in-depth analysis and exploring various aspects of the stock market. The big_tech_stock_prices.csv file contains detailed records of daily stock prices and trading volumes for each company, while the big_tech_companies.csv file provides additional information about the included companies, such as their names and stock symbols.

The Big Tech Stock Prices dataset offers a comprehensive and reliable source of data for conducting insightful analyses and making informed decisions in the financial domain related to these specific companies.

# Loading the data
big_tech_stock_prices <- read_csv(here("data","big_tech_stock_prices.csv"))
big_tech_companies <- read_csv(here("data","big_tech_companies.csv"))

Stock Prices Dataset

The dataset contains 14 big tech companies daily stock prices with a total of approx. 45088 observations. There is no missing data in the observation which helps to use all the columns available in the data set.

Stock Prices Dataset Code
# Analyzing the data based on unique values for each variable.
big_tech_stock_prices |>
  diagnose() |>
  dplyr::select(-missing_percent) |>
  formattable()
variables types missing_count unique_count unique_rate
stock_symbol character 0 14 0.0003105039
date Date 0 3287 0.0729018808
open numeric 0 29798 0.6608853797
high numeric 0 30056 0.6666075231
low numeric 0 30028 0.6659865153
close numeric 0 30965 0.6867680979
adj_close numeric 0 41279 0.9155207594
volume numeric 0 43550 0.9658889283

Variable description

variable class description
stock_symbol character stock_symbol
date double date
open double The price at market open.
high double The highest price for that day.
low double The lowest price for that day.
close double The price at market close, adjusted for splits.
adj_close double The closing price after adjustments for all applicable splits and dividend distributions. Data is adjusted using appropriate split and dividend multipliers, adhering to Center for Research in Security Prices (CRSP) standards.
volume double The number of shares traded on that day.

When we summarize the data to observe the range for start and end date of the stock data available, we could notice the data is available for 12 years except for Meta which starts from year 2012.

Stock Prices Date Range Code
# Examining the data to know the date range.
big_tech_stock_prices |> 
  group_by(stock_symbol) |> 
  summarize(min_date = min(date), max_date = max(date)) |> 
  formattable()
stock_symbol min_date max_date
AAPL 2010-01-04 2022-12-29
ADBE 2010-01-04 2022-12-29
AMZN 2010-01-04 2022-12-29
CRM 2010-01-04 2022-12-29
CSCO 2010-01-04 2022-12-29
GOOGL 2010-01-04 2022-12-29
IBM 2010-01-04 2022-12-29
INTC 2010-01-04 2022-12-29
META 2012-05-18 2023-01-24
MSFT 2010-01-04 2022-12-29
NFLX 2010-01-04 2022-12-29
NVDA 2010-01-04 2022-12-29
ORCL 2010-01-04 2022-12-29
TSLA 2010-06-30 2022-12-29

Big Tech Companies Dataset

Big Tech Companies dataset provides the company names against the stock symbols.

Big Tech Companies Dataset Code
# Viewing the data of big tech companies.
big_tech_companies |>
  formattable()
stock_symbol company
AAPL Apple Inc. 
ADBE Adobe Inc. 
AMZN Amazon.com, Inc. 
CRM Salesforce, Inc. 
CSCO Cisco Systems, Inc. 
GOOGL Alphabet Inc. 
IBM International Business Machines Corporation
INTC Intel Corporation
META Meta Platforms, Inc. 
MSFT Microsoft Corporation
NFLX Netflix, Inc. 
NVDA NVIDIA Corporation
ORCL Oracle Corporation
TSLA Tesla, Inc. 

Why did we choose this data set?

While stock prices represent a familiar dataset, our decision to focus on it stems from a meticulous evaluation of both technical and analytical considerations. Given that stock prices embody a convergence of market sentiment and company performance, delving into this dataset promises valuable insights. Analyzing the trends and drivers within it empowers investors to make well-informed decisions.

The dataset itself is well-structured, comprising two tables. One table provides essential information such as date, open, and close, while the other details the associated company names.

Spanning nearly a decade, this dataset offers a unique vantage point to observe trends from the post-financial crisis era to the third wave of the COVID-19 pandemic. Our objective is to create impactful visualizations for the top 14 big tech companies, aiming for a comprehensive understanding of how the tech industry has been perceived in the market during this significant timeframe.

In essence, our engagement with this dataset transcends mere technical considerations; it extends to encompass analytical aspects, enabling us to make well-informed decisions within the confines of the provided data.

Questions

The two questions to be answered are:

  1. Which companies have experienced the highest/lowest impact on their stock prices due to the pandemic?
  2. What patterns do we see in the positive gain days over the given period?

Analysis plan

We will be employing the following approaches to address the questions posed.

Approach for question 1

The inquiry involves examining the impact of the pandemic on stock prices within a specific time-frame. To achieve this, we focus on key features: “dates” (to identify the pandemic period) and “stock price” (utilizing adj_close). The “Pandemic Period” is defined as starting from the year 2020, enabling us to assess stock prices during this particular time-frame. Calculations include determining the average returns within this period, as well as 2- and 5-year returns, along with their respective peak prices. The impact is measured by the variance in stock prices, considering the returns over 2 years, and 5 years. This approach offers a comprehensive overview of the overall impact on a company’s stock price.

This visualization can be achieved using geom_line() , geom_bar() and annotations can be added using annotate(). Facets and colors will be employed to differentiate between the top big tech companies with the highest and lowest impact in terms of returns.

Note: Given that we only have access to stock prices, establishing causality regarding the impact of stock prices in relation to COVID-19 may pose challenges. So, instead we will be observing trends and returns.

Approach for question 2

To analyze gaindays in stock prices across different companies, we will create this variable using open and close values.

Utilizing this data, we plan to visualize weekdays using the weekdays() function from base R. Additionally, we will augment our dataset with an additional column for year and quarter to facilitate more granular analysis.

Our visualization strategy may involve using geom_density_ridges() or geom_col() to determine the most effective representation. To enhance clarity, facets and colors will be employed to distinguish between companies and periods of interest.

This approach is expected to yield valuable insights into stock price trends, allowing us to uncover patterns related to stock gain days.