Horror Movie Profits

Graphing for EDA (Exploratory Data Analysis)

Published

October 22, 2018

Notable topics: Graphing for EDA (Exploratory Data Analysis)

Recorded on: 2018-10-22

Timestamps by: Alex Cookson


Screencast

Timestamps

parse_date
lubridate

Using parse_date function (from the readr package, not lubridate) to convert dates stored as character strings to the Date class (in hindsight, lubridate's mdy function would have been simpler)
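
A minimal sketch of both approaches. It assumes the dates are zero-padded month/day/year strings; the sample values are hypothetical and not taken from the dataset:

```r
library(readr)
library(lubridate)

# Hypothetical release dates stored as month/day/year character strings
release_date <- c("10/22/2018", "06/22/2007")

# readr::parse_date needs the format spelled out explicitly
parsed_readr <- parse_date(release_date, format = "%m/%d/%Y")

# lubridate::mdy only needs the order of the components (month, day, year)
parsed_lubridate <- mdy(release_date)

# Both are vectors of class Date holding the same days
print(parsed_readr)
print(parsed_lubridate)
```

mdy also copes with other separators, such as "10-22-2018", without needing a format string.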

fct_lump

Using fct_lump function to aggregate distributors into the top 6 (by number of movies) and an "Other" category
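
A small sketch of the lumping step on a hypothetical distributor column, using n = 3 instead of 6 so the toy data stays short. Newer forcats versions recommend fct_lump_n for this same "keep the n most frequent levels" behavior:

```r
library(forcats)

# Hypothetical distributor column: three large distributors and three one-off ones
distributor <- c(rep("Universal", 5), rep("Lionsgate", 4), rep("Sony", 3),
                 "Indie A", "Indie B", "Indie C")

# Keep the 3 most frequent levels and lump everything else into "Other"
lumped <- fct_lump(distributor, n = 3)

table(lumped)
```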

Investigating strange numbers in the data and discovering duplicated rows

problems

Using problems function to look at parsing errors when importing data
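
A minimal sketch of how a parsing failure surfaces. The CSV is hypothetical and written to a temporary file so the example is self-contained:

```r
library(readr)

# Write a small hypothetical CSV in which one budget value is not a number
path <- tempfile(fileext = ".csv")
writeLines(c("movie,production_budget",
             "Movie A,1000000",
             "Movie B,unknown"), path)

# Declaring the budget as a double makes "unknown" a parsing failure:
# read_csv warns and stores NA in that cell
movies <- read_csv(path, col_types = cols(movie = col_character(),
                                          production_budget = col_double()))

# One row per failure: row, column, what was expected, and the actual text
problems(movies)
```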

arrange
distinct

Using arrange and distinct functions, along with distinct's .keep_all argument, to de-duplicate observations
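
A sketch of finding and then removing duplicates on hypothetical data. Because distinct with .keep_all = TRUE keeps the first row of each group, arrange decides which copy survives; sorting by gross here is an assumption for illustration, not necessarily the sort key used in the screencast:

```r
library(dplyr)

# Hypothetical data: "Movie A" appears twice with slightly different grosses
movies <- tibble(
  movie = c("Movie A", "Movie A", "Movie B"),
  release_date = as.Date(c("2007-06-22", "2007-06-22", "2010-01-08")),
  worldwide_gross = c(5.0e7, 4.9e7, 2.0e6)
)

# Spot the duplicates: any movie/date pair that occurs more than once
movies %>%
  count(movie, release_date) %>%
  filter(n > 1)

# distinct() keeps the first row per movie/date pair (and all columns, via .keep_all),
# so sorting first controls which copy is kept (here: the larger gross)
deduped <- movies %>%
  arrange(desc(worldwide_gross)) %>%
  distinct(movie, release_date, .keep_all = TRUE)

deduped
```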

geom_boxplot

Using geom_boxplot function to create a boxplot of budget by distributor
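
A minimal boxplot sketch on hypothetical budgets. The log scale and dollar labels are assumptions that make widely spread budgets easier to compare; coord_flip puts the distributor names on the vertical axis:

```r
library(ggplot2)

# Hypothetical budgets (in dollars) for three distributor groups
movies <- data.frame(
  distributor = rep(c("Lionsgate", "Universal", "Other"), each = 4),
  production_budget = c(5, 10, 20, 15, 30, 50, 80, 40, 1, 3, 2, 8) * 1e6
)

# One box per distributor, budgets on a log10 axis labelled in dollars
p <- ggplot(movies, aes(distributor, production_budget)) +
  geom_boxplot() +
  scale_y_log10(labels = scales::dollar) +
  coord_flip()

p
```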

floor

Using floor function to bin release years into decades (e.g., "1970" and "1973" both become "1970")
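
The decade binning is plain arithmetic: divide by 10, drop the remainder with floor, then multiply back. A sketch on hypothetical years (if the years start out in a Date column, lubridate's year function would extract them first):

```r
# Hypothetical release years
release_year <- c(1970, 1973, 1988, 2005)

# 1973 / 10 = 197.3 -> floor() gives 197 -> times 10 gives 1970
decade <- 10 * floor(release_year / 10)

decade  # 1970 1970 1980 2000
```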

summarise_at

Using summarise_at function to apply the same function to multiple variables at the same time
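
A sketch of summarise_at on hypothetical per-decade data, taking the median of three money columns at once. Since dplyr 1.0.0, summarise_at is superseded by across() inside summarise, so the equivalent modern form is shown alongside:

```r
library(dplyr)

# Hypothetical movies with a decade column and three money columns
movies <- tibble(
  decade = c(1970, 1970, 1980, 1980),
  production_budget = c(1, 3, 10, 20) * 1e6,
  domestic_gross = c(5, 7, 15, 25) * 1e6,
  worldwide_gross = c(8, 12, 30, 50) * 1e6
)

# Apply median() to every column from production_budget to worldwide_gross
by_decade <- movies %>%
  group_by(decade) %>%
  summarise_at(vars(production_budget:worldwide_gross), median)

# The same summary written with across() (dplyr >= 1.0.0)
by_decade_across <- movies %>%
  group_by(decade) %>%
  summarise(across(production_budget:worldwide_gross, median))

by_decade
```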

geom_line

Using geom_line to visualize multiple metrics at the same time
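
To draw several metrics as separate lines, the usual approach is to reshape the data so there is one row per (x, metric) pair and map the metric to color. This sketch uses tidyr's pivot_longer (added in tidyr 1.0.0, after this screencast; gather filled that role at the time) on hypothetical data:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical per-decade medians, one column per metric (wide format)
by_decade <- tibble(
  decade = c(1970, 1980, 1990),
  production_budget = c(2, 15, 25) * 1e6,
  worldwide_gross = c(10, 40, 60) * 1e6
)

# Reshape to long format: one row per decade/metric pair
by_decade_long <- by_decade %>%
  pivot_longer(-decade, names_to = "metric", values_to = "value")

# Mapping metric to color draws one line per metric on shared axes
p <- ggplot(by_decade_long, aes(decade, value, color = metric)) +
  geom_line()

p
```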

facet_wrap

Using facet_wrap function to graph small multiples of genre-budget boxplots by distributor
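
A small-multiples sketch on hypothetical data: the genre/budget boxplot is defined once, and facet_wrap repeats it in one panel per distributor:

```r
library(ggplot2)

# Hypothetical budgets for two genres at two distributors
movies <- data.frame(
  distributor = rep(c("Lionsgate", "Universal"), each = 6),
  genre = rep(c("Horror", "Comedy"), times = 6),
  production_budget = c(5, 20, 8, 25, 6, 30, 40, 60, 35, 70, 45, 55) * 1e6
)

# Same genre-by-budget boxplot, split into one panel per distributor
p <- ggplot(movies, aes(genre, production_budget)) +
  geom_boxplot() +
  facet_wrap(~ distributor)

p
```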

Starting analysis of profit ratio of movies

paste0

Using paste0 function in a custom function to display labels as multiples (e.g., "4X" or "6X" to mean "4 times" or "6 times")
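
A sketch of a profit ratio histogram with multiple-style labels. The helper name multiple_format is hypothetical, and defining the profit ratio as worldwide gross divided by production budget is an assumption; ggplot2 accepts any function that turns break values into strings as a scale's labels argument:

```r
library(dplyr)
library(ggplot2)

# Hypothetical label helper: 4 becomes "4X", meaning "4 times the budget"
multiple_format <- function(x) paste0(x, "X")

# Hypothetical budgets and grosses; profit_ratio definition is an assumption
movies <- tibble(
  production_budget = c(1, 2, 5, 10, 20) * 1e6,
  worldwide_gross = c(60, 4, 20, 5, 200) * 1e6
) %>%
  mutate(profit_ratio = worldwide_gross / production_budget)

# Log scale for the ratio, with axis breaks labelled as multiples
p <- ggplot(movies, aes(profit_ratio)) +
  geom_histogram(bins = 10) +
  scale_x_log10(labels = multiple_format)

multiple_format(c(1, 4, 6))  # "1X" "4X" "6X"
```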

Starting analysis of the most common genres over time

Starting analysis of the most profitable individual horror movies

paste0

Using paste0 function to add each movie's release date to its label in a bar graph
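
A sketch of building "title (date)" labels on hypothetical movies and then ordering the bars by profit ratio. paste0 converts the Date to its ISO text form; base R's reorder is used here for the ordering, and forcats' fct_reorder would work the same way:

```r
library(dplyr)
library(ggplot2)

# Hypothetical movies; names and numbers are made up for illustration
movies <- tibble(
  movie = c("Movie A", "Movie B", "Movie C"),
  release_date = as.Date(c("1978-10-25", "1999-07-14", "2007-10-12")),
  profit_ratio = c(120, 400, 60)
)

labelled <- movies %>%
  # e.g. "Movie A (1978-10-25)"
  mutate(movie_label = paste0(movie, " (", release_date, ")"),
         # Order bars by profit ratio so the largest ends up at the top after flipping
         movie_label = reorder(movie_label, profit_ratio))

p <- ggplot(labelled, aes(movie_label, profit_ratio)) +
  geom_col() +
  coord_flip()

p
```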

geom_text

Using geom_text function, along with its check_overlap argument, to add labels to some points on a scatterplot
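
A sketch on randomly generated, hypothetical data. check_overlap is applied when the plot is drawn, in the order of the rows, so a label that would overlap one already drawn is skipped. Sorting so the most interesting rows come first (here, the highest grosses, an assumption) decides which labels win:

```r
library(dplyr)
library(ggplot2)

# Hypothetical scatterplot data with random budgets and grosses
set.seed(2018)
movies <- tibble(
  movie = paste("Movie", LETTERS[1:20]),
  production_budget = runif(20, 1e5, 1e8),
  worldwide_gross = runif(20, 1e5, 1e9)
) %>%
  # Labels are placed in data order, so put the rows we most want labelled first
  arrange(desc(worldwide_gross))

p <- ggplot(movies, aes(production_budget, worldwide_gross)) +
  geom_point() +
  # Skip any label that would overlap a label already drawn in this layer
  geom_text(aes(label = movie), vjust = 1, hjust = 1, check_overlap = TRUE) +
  scale_x_log10() +
  scale_y_log10()

p
```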

ggplotly
plotly

Using ggplotly function from plotly package to create an interactive scatterplot
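
A minimal sketch of turning a static ggplot into an interactive one on hypothetical data. Mapping label to the movie name, even though geom_point does not draw it, is a common way to get the name into ggplotly's hover tooltip:

```r
library(ggplot2)
library(plotly)

# Hypothetical movies
movies <- data.frame(
  movie = c("Movie A", "Movie B", "Movie C"),
  production_budget = c(1e6, 5e6, 2e7),
  worldwide_gross = c(1e8, 2e7, 3e7)
)

# label is not drawn by geom_point, but ggplotly shows it in the tooltip
p <- ggplot(movies, aes(production_budget, worldwide_gross, label = movie)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()

# Convert to an interactive htmlwidget
p_interactive <- ggplotly(p)
```

Printing p_interactive in RStudio shows it in the Viewer; htmlwidgets::saveWidget can write it to a standalone HTML file.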

Reviewing unexplored areas of investigation