IKEA Furniture

Linear model, Coefficient/TIE fighter plot, Boxplots, Log scale discussion, Calculating volume

Published: November 2, 2020

Recorded on: 2020-11-02

Timestamps by: Eric Fletcher

Timestamps

fct_reorder
forcats

Use fct_reorder from the forcats package to reorder the factor levels of category by n, the number of items in each category.
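As a rough illustration of this step, here is a minimal sketch on a small made-up data frame. The `ikea` tibble and its values are hypothetical stand-ins, not the screencast's data.

```r
library(dplyr)
library(forcats)

# Hypothetical stand-in for the IKEA data: one row per item.
ikea <- tibble(
  category = c("Beds", "Beds", "Beds", "Chairs",
               "Tables & desks", "Tables & desks")
)

category_counts <- ikea %>%
  count(category) %>%
  # Order the levels of category by ascending n instead of alphabetically.
  mutate(category = fct_reorder(category, n))

levels(category_counts$category)
```

With the levels ordered this way, a bar chart of n against category draws its bars in order of size.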

scale_x_log10, geom_boxplot
ggplot2

Brief explanation of why scale_x_log10 is needed given the distribution of category and price with geom_boxplot.

geom_jitter, geom_boxplot
ggplot2

Using geom_jitter with geom_boxplot to show how many items are within each category.

glue, add_count
glue, dplyr

Use add_count from the dplyr package and glue from the glue package to concatenate the category name with category_total on the geom_boxplot y-axis.
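A minimal sketch of how the label could be built, assuming a toy `ikea` tibble (hypothetical values) and the column name category_total mentioned above. The exact label format used in the screencast is not given here, so the "name (count)" layout is an assumption.

```r
library(dplyr)
library(glue)

# Hypothetical items; prices are made up for illustration.
ikea <- tibble(
  category = c("Beds", "Beds", "Chairs"),
  price = c(1200, 900, 150)
)

labelled <- ikea %>%
  # Add a per-category item count without collapsing rows.
  add_count(category, name = "category_total") %>%
  # Paste the count onto the category name, e.g. "Beds (2)".
  mutate(category = glue("{category} ({category_total})"))

labelled$category
```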

mutate
dplyr

Convert from Saudi Riyals to United States Dollars.
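The entry above does not state the exchange rate used. As a sketch only, the conversion might look like the following, assuming the riyal's fixed peg of 3.75 SAR per USD and a hypothetical price column in riyals. The screencast may have used a rounded multiplier instead.

```r
library(dplyr)

# Assumption: 3.75 Saudi riyals per US dollar (the official peg).
sar_per_usd <- 3.75

# Hypothetical prices in riyals.
ikea <- tibble(price = c(375, 750))

ikea_usd <- ikea %>%
  mutate(price_usd = price / sar_per_usd)

ikea_usd
```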

geom_density_ridges
ggridges

Create a ridgeplot (AKA joyplot) using the ggridges package, showing the distribution of price across category.

Discussion on distributions and when to use a log scale.

fct_lump
forcats

Use fct_lump from the forcats package to lump together all the levels in category except for the n most frequent.
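To make the lumping behaviour concrete, here is a small sketch on a made-up vector. The values and n = 2 are illustrative assumptions, not the settings from the screencast.

```r
library(forcats)

# Hypothetical category values.
category <- c("Chairs", "Chairs", "Chairs", "Beds", "Beds", "Sofas", "Trolleys")

# Keep the 2 most frequent levels; everything else becomes "Other".
lumped <- fct_lump(category, n = 2)

table(lumped)
```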

scale_fill_discrete
ggplot2

Use scale_fill_discrete from the ggplot2 package with guide = guide_legend(reverse = TRUE) to reverse the fill legend.

str_trim, str_replace_all
stringr

Use str_trim from the stringr package to remove leading and trailing whitespace from the short_description variable. David then decides to use str_replace_all instead, with the pattern "\\s+" and the replacement " ", so that every run of whitespace becomes a single space.
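The difference between the two approaches can be seen on a single made-up string (the value below is hypothetical):

```r
library(stringr)

# Hypothetical description with messy whitespace.
short_description <- "  Bar table,   \n 140x80 cm  "

# str_trim only strips the ends; the internal run of whitespace survives.
trimmed <- str_trim(short_description)

# str_replace_all collapses every run of whitespace to one space,
# but leaves a single space at each end.
collapsed <- str_replace_all(short_description, "\\s+", " ")

trimmed
collapsed
```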

separate
tidyr

Use separate from the tidyr package with extra = "merge" and fill = "right" to separate item description from item dimension.
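A minimal sketch of what those two arguments do, assuming a comma-and-space separator and a hypothetical output column called rest (main_description is the name used later in these notes). The sample strings are made up.

```r
library(dplyr)
library(tidyr)

# Hypothetical descriptions: one with extra commas, one with none.
ikea <- tibble(
  short_description = c("Bar table, in/outdoor, 51x51 cm", "Armchair")
)

separated <- ikea %>%
  separate(short_description,
           into = c("main_description", "rest"),
           sep = ", ",
           extra = "merge",  # keep everything after the first split in `rest`
           fill = "right")   # pad with NA on the right when there is no split

separated
```

With extra = "merge" the second row of commas is not dropped, and fill = "right" avoids a warning for descriptions that have nothing after the item name.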

extract
tidyr

Use extract from the tidyr package with the regular expression "([\\d\\-xX]+) cm" to extract the dimensions (digits, hyphens, and x separators) that come before cm.
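As a sketch of the extraction, assuming hypothetical column names rest and dimensions and made-up sample strings:

```r
library(dplyr)
library(tidyr)

# Hypothetical text that may or may not end in a size in cm.
ikea <- tibble(
  rest = c("in/outdoor, 51x51 cm", "3-seat sofa", "60x50x236 cm")
)

extracted <- ikea %>%
  # The capture group (in parentheses) becomes the new column;
  # rows without a match get NA.
  extract(rest, "dimensions", "([\\d\\-xX]+) cm", remove = FALSE)

extracted
```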

unite
tidyr

Use unite from the tidyr package to paste together the category and main_description columns into a new column named category_and_description.
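A small sketch of this step on made-up rows. The separator " - " and remove = FALSE are assumptions for illustration; the notes above do not say which settings were used.

```r
library(dplyr)
library(tidyr)

# Hypothetical rows.
ikea <- tibble(
  category = c("Beds", "Chairs"),
  main_description = c("Bed frame", "Armchair")
)

united <- ikea %>%
  unite(category_and_description, category, main_description,
        sep = " - ",     # assumed separator
        remove = FALSE)  # keep the original columns as well

united$category_and_description
```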

mutate
dplyr

Calculate the volume of each item in the dataset from its depth, height, and width (in centimeters), first in liters using depth * height * width / 1000. At 36:15, David decides to switch to cubic meters instead, using depth * height * width / 1000000.
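The two unit conversions can be checked on a couple of made-up items. The column names volume_liters and volume_m3 below are hypothetical; dimensions are assumed to be in centimeters.

```r
library(dplyr)

# Hypothetical dimensions in cm.
ikea <- tibble(
  depth = c(50, 100),
  height = c(100, 200),
  width = c(40, 50)
)

ikea_volume <- ikea %>%
  mutate(
    # 1 liter = 1,000 cubic centimeters
    volume_liters = depth * height * width / 1000,
    # 1 cubic meter = 1,000,000 cubic centimeters
    volume_m3 = depth * height * width / 1000000
  )

ikea_volume
```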

str_squish
stringr

Use str_squish from the stringr package to remove whitespace from the start and end of the short_description variable and to collapse repeated internal whitespace to a single space.

lm
stats

Use lm from the stats package to fit a linear model on a log-log scale that predicts the price of an item from volume + category. David then uses fct_relevel to reorder the factor levels of category so that tables & desks comes first (the baseline level), since it is the most frequent category and its price distribution is in the middle.
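A minimal sketch of this kind of model on simulated data. Everything in the data-generating step (column names volume_m3 and price_usd, the category effects, the true slope of 0.8) is an assumption made up for the example, and natural logs are used here although the screencast may have used a different base.

```r
library(dplyr)
library(forcats)

set.seed(2020)

# Simulated stand-in for the IKEA data (all values are made up).
n_items <- 90
category <- rep(c("Beds", "Chairs", "Tables & desks"), each = 30)
category_effect <- c("Beds" = 0.5, "Chairs" = -0.3, "Tables & desks" = 0)
volume_m3 <- exp(rnorm(n_items, mean = -1, sd = 1))
price_usd <- exp(5 + 0.8 * log(volume_m3) +
                   unname(category_effect[category]) +
                   rnorm(n_items, sd = 0.2))
ikea <- tibble(category, volume_m3, price_usd)

model <- ikea %>%
  # Make "Tables & desks" the first level, so it is the baseline
  # that the other category coefficients are compared against.
  mutate(category = fct_relevel(category, "Tables & desks")) %>%
  lm(log(price_usd) ~ log(volume_m3) + category, data = .)

coef(model)
```

On a log-log scale the volume coefficient reads as an elasticity: a 1% increase in volume goes with roughly that many percent change in price, holding category fixed.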

tidy, geom_point, geom_errorbarh, geom_vline
broom

Use the broom package to turn the model output into a coefficient / TIE fighter plot.

str_remove
stringr

Use str_remove from the stringr package to remove the "category" prefix from the start of the strings on the y-axis, using the regular expression "^category".
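The last two steps (tidying the model and cleaning the term labels) could be sketched as follows. The toy data and the log_price column are hypothetical, and the plot styling is an assumption rather than a copy of the screencast's code.

```r
library(dplyr)
library(stringr)
library(broom)
library(ggplot2)

set.seed(1)

# Hypothetical data: log prices that differ by category.
toy <- tibble(
  category = rep(c("Beds", "Chairs", "Sofas"), each = 20),
  log_price = rnorm(60, mean = rep(c(5, 4, 6), each = 20))
)

coefs <- lm(log_price ~ category, data = toy) %>%
  tidy(conf.int = TRUE) %>%            # adds conf.low and conf.high
  filter(term != "(Intercept)") %>%
  # "categoryChairs" -> "Chairs"
  mutate(term = str_remove(term, "^category"))

# Point estimate with a horizontal confidence interval per term,
# plus a dashed reference line at zero.
# Note: geom_errorbarh is the function named in these notes; newer
# ggplot2 releases may suggest geom_errorbar with a horizontal
# orientation instead.
p <- ggplot(coefs, aes(estimate, term)) +
  geom_point() +
  geom_errorbarh(aes(xmin = conf.low, xmax = conf.high)) +
  geom_vline(xintercept = 0, lty = 2)
```

Printing p draws the plot; each point with its error bar resembles a TIE fighter, which is where the nickname comes from.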

Summary of screencast.