IKEA Furniture
Notable topics: Linear model, Coefficient/TIE fighter plot, Boxplots, Log scale discussion, Calculating volume
Recorded on: 2020-11-02
Timestamps by: Eric Fletcher
Screencast
Timestamps
Use fct_reorder from the forcats package to reorder the levels of the category factor by n (the number of items in each category).
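A minimal sketch of reordering factor levels by a count column. The tibble and its counts are hypothetical, not the real IKEA data. In the screencast the n column would typically come from count(category).

```r
library(dplyr)
library(forcats)

# Hypothetical per-category counts (illustrative only)
category_counts <- tibble(
  category = c("Chairs", "Tables & desks", "Beds"),
  n = c(30, 50, 10)
)

# Reorder the levels of category by n (ascending), so a plot with
# category on the y-axis shows the largest category at the top
category_counts <- category_counts %>%
  mutate(category = fct_reorder(category, n))

levels(category_counts$category)
```

With one value per level, fct_reorder's default summary (the median) is simply that value, so the levels end up sorted by n.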
Brief explanation of why scale_x_log10 is needed when plotting the distribution of price by category with geom_boxplot.
Use geom_jitter with geom_boxplot to show how many items are in each category.
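A sketch of a horizontal boxplot on a log-scaled x-axis with jittered points on top. The simulated log-normal prices are an assumption chosen to mimic values that span several orders of magnitude. The columns price and category are hypothetical stand-ins.

```r
library(ggplot2)

# Hypothetical data: right-skewed prices within each category
set.seed(2020)
items <- data.frame(
  category = rep(c("Chairs", "Beds", "Sofas"), each = 50),
  price = exp(rnorm(150, mean = 6, sd = 1.2))
)

p <- ggplot(items, aes(price, category)) +
  geom_boxplot(outlier.shape = NA) +                   # hide outliers; the jitter layer draws every point
  geom_jitter(width = 0, height = 0.1, alpha = 0.3) +  # one dot per item, jittered only vertically
  scale_x_log10()                                      # spread out prices that span orders of magnitude
```

Setting width = 0 keeps each point at its true price, so only the vertical position is jittered.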
Use add_count from the dplyr package and glue from the glue package to concatenate the category name with category_total on the geom_boxplot y-axis.
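A minimal sketch of building "name (count)" labels with add_count and glue. The toy tibble is hypothetical, and the label format is an illustrative choice.

```r
library(dplyr)
library(glue)

# Hypothetical items, one row per item
items <- tibble(category = c("Beds", "Beds", "Chairs", "Beds", "Chairs", "Sofas"))

items <- items %>%
  add_count(category, name = "category_total") %>%              # per-row count of its category
  mutate(category = glue("{category} ({category_total})"))      # e.g. "Beds (3)"

unique(items$category)
```

Unlike count, add_count keeps every row, so the labelled category column can still be used for a per-item boxplot.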
Convert prices from Saudi riyals to US dollars.
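A sketch of the currency conversion. It assumes the official peg of 3.75 riyals per US dollar. The rate used in the screencast may have been rounded differently, and the prices below are hypothetical.

```r
library(dplyr)

# Assumption: fixed peg of 3.75 SAR per 1 USD
sar_per_usd <- 3.75

items <- tibble(price = c(75, 375, 1500))  # hypothetical prices in SAR

items <- items %>%
  mutate(price_usd = price / sar_per_usd)  # SAR -> USD

items$price_usd
```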
Create a ridgeline plot (also known as a joyplot) using the ggridges package to show the distribution of price across category.
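A minimal ridgeline sketch with ggridges, using simulated prices. The data and the price_usd column name are assumptions for illustration.

```r
library(ggplot2)
library(ggridges)

# Hypothetical log-normal prices per category
set.seed(1)
items <- data.frame(
  category = rep(c("Chairs", "Beds", "Sofas"), each = 40),
  price_usd = exp(rnorm(120, mean = 5, sd = 1))
)

p <- ggplot(items, aes(price_usd, category)) +
  geom_density_ridges() +  # one density curve per category, stacked vertically
  scale_x_log10()          # log scale, as with the boxplots
```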
Discussion on distributions and when to use a log scale.
Use fct_lump from the forcats package to lump together all the levels in category except for the n most frequent.
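A small sketch of fct_lump with n = 2 on a hypothetical factor. Newer forcats versions also provide fct_lump_n, which does the same thing when n is given.

```r
library(forcats)

# Hypothetical categories: Beds x3, Chairs x2, Sofas x1, Lamps x1
category <- factor(c("Beds", "Beds", "Beds", "Chairs", "Chairs", "Sofas", "Lamps"))

# Keep the 2 most frequent levels and lump the rest into "Other"
lumped <- fct_lump(category, n = 2)

table(lumped)
```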
Use scale_fill_discrete from the ggplot2 package with guide = guide_legend(reverse = TRUE) to reverse the fill legend.
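A sketch of reversing the order of the fill legend keys on a stacked bar chart. The data frame and its group column are hypothetical.

```r
library(ggplot2)

# Hypothetical counts split by a two-level group
df <- data.frame(
  category = c("Chairs", "Chairs", "Beds", "Beds"),
  group = c("A", "B", "A", "B"),
  n = c(30, 20, 15, 25)
)

p <- ggplot(df, aes(n, category, fill = group)) +
  geom_col() +
  scale_fill_discrete(guide = guide_legend(reverse = TRUE))  # list legend keys in reverse level order
```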
Use str_trim from the stringr package to remove leading and trailing whitespace from the short_description variable. David then switches to str_replace_all with the pattern "\\s+" and the replacement " " to collapse every run of whitespace into a single space.
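A sketch contrasting the two approaches on a hypothetical messy string. str_trim touches only the ends. str_replace_all collapses internal runs but leaves one space at each end.

```r
library(stringr)

x <- "  Chair,   70x 80 cm  "  # hypothetical short_description value

trimmed <- str_trim(x)                       # ends only: "Chair,   70x 80 cm"
collapsed <- str_replace_all(x, "\\s+", " ") # every run -> one space: " Chair, 70x 80 cm "
```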
Use separate from the tidyr package with extra = "merge" and fill = "right" to separate the item description from the item dimensions.
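A sketch of what extra = "merge" and fill = "right" do. It assumes the description and dimensions are separated by ", ". The strings and the rest column name are hypothetical. separate still works, although newer tidyr versions recommend separate_wider_delim.

```r
library(dplyr)
library(tidyr)

items <- tibble(short_description = c("Chair, 45x47 cm", "Shelf unit, white, 80x30 cm", "Mirror"))

res <- items %>%
  separate(short_description, c("main_description", "rest"),
           sep = ", ",
           extra = "merge",   # 3+ pieces: keep everything after the first split in `rest`
           fill = "right")    # 1 piece: fill `rest` with NA instead of warning
```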
Use extract from the tidyr package with the regular expression "([\\d\\-xX]+) cm" to extract the dimensions (digits, hyphens, and "x" separators) that appear before "cm".
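A sketch of extract on hypothetical strings. The parentheses form the capture group that becomes the new column. Rows with no match get NA. The dimensions column name is an assumption.

```r
library(dplyr)
library(tidyr)

items <- tibble(rest = c("45x47 cm", "white, 80x30x120 cm", "120-180x90 cm", "Lamp"))

res <- items %>%
  extract(rest, "dimensions", "([\\d\\-xX]+) cm", remove = FALSE)  # keep the source column
```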
Use unite from the tidyr package to paste together the category and main_description columns into a new column named category_and_description.
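A minimal unite sketch. The rows are hypothetical and the " - " separator is an illustrative choice (unite defaults to "_").

```r
library(dplyr)
library(tidyr)

items <- tibble(category = c("Chairs", "Beds"), main_description = c("Chair", "Bed frame"))

items <- items %>%
  unite(category_and_description, category, main_description, sep = " - ")  # drops the inputs by default

items$category_and_description
```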
Calculate each item's volume in liters from its depth, height, and width (in centimeters) using depth * height * width / 1000. At 36:15, David switches to cubic meters using depth * height * width / 1000000.
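Both unit conversions side by side, applied to hypothetical dimensions in centimeters.

```r
library(dplyr)

items <- tibble(depth = c(50, 40), height = c(100, 180), width = c(60, 80))  # hypothetical, in cm

items <- items %>%
  mutate(volume_liters = depth * height * width / 1000,     # 1 L = 1,000 cm^3
         volume_m3     = depth * height * width / 1000000)  # 1 m^3 = 1,000,000 cm^3
```

For example, 50 x 100 x 60 cm is 300,000 cm^3, which is 300 L or 0.3 m^3.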
Use str_squish from the stringr package to remove whitespace from the start and end of short_description and collapse each internal run of whitespace into a single space.
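A sketch showing that str_squish is equivalent to collapsing whitespace runs and then trimming the ends. The input string is hypothetical.

```r
library(stringr)

x <- "  Chair,   70x 80 cm  "  # hypothetical messy value

squished <- str_squish(x)                                 # "Chair, 70x 80 cm"
manual <- str_trim(str_replace_all(x, "\\s+", " "))       # same result in two steps
```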
Use lm from the stats package to fit a linear model on a log-log scale (log price against log volume) that predicts an item's price from its volume plus its category. David then uses fct_relevel to reorder the levels of category so that tables & desks comes first, making it the reference level (baseline) of the model, since it is the most frequent value of the category variable and its price distribution is in the middle.
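A sketch of a log-log model with a releveled category factor, fitted to simulated data. The data, the price_usd column, and the choice of log2 are assumptions. Using log2 on both sides gives the same volume slope as natural logs.

```r
library(dplyr)
library(forcats)

set.seed(42)
# Hypothetical data: price grows roughly as a power of volume (in m^3)
items <- tibble(
  category = rep(c("Tables & desks", "Chairs", "Sofas & armchairs"), each = 30),
  volume = exp(runif(90, log(0.01), log(2))),
  price_usd = 500 * volume^0.7 * exp(rnorm(90, sd = 0.3))
) %>%
  mutate(category = fct_relevel(category, "Tables & desks"))  # first level = reference level

model <- lm(log2(price_usd) ~ log2(volume) + category, data = items)

coef(model)
```

Each category coefficient is then a log2 price difference relative to "Tables & desks" at the same volume.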
Use the broom package to turn the model output into a tidy data frame of coefficient estimates with confidence intervals, then plot it as a coefficient (TIE fighter) plot.
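A sketch of a TIE fighter plot built from broom's tidy output. The model is refitted on simulated data so the block is self-contained. The horizontal geom_errorbar relies on ggplot2's automatic orientation (ggplot2 3.3.0 or later).

```r
library(dplyr)
library(ggplot2)
library(broom)

set.seed(42)
# Hypothetical data, as in a log-log price ~ volume + category model
items <- data.frame(
  category = factor(rep(c("Tables & desks", "Chairs", "Beds"), each = 30)),
  volume = exp(runif(90, log(0.01), log(2)))
)
items$price_usd <- 500 * items$volume^0.7 * exp(rnorm(90, sd = 0.3))
model <- lm(log2(price_usd) ~ log2(volume) + category, data = items)

coefs <- tidy(model, conf.int = TRUE) %>%   # adds conf.low / conf.high columns
  filter(term != "(Intercept)")

p <- ggplot(coefs, aes(estimate, term)) +
  geom_point() +
  geom_errorbar(aes(xmin = conf.low, xmax = conf.high), width = 0.2) +  # the "wings"
  geom_vline(xintercept = 0, linetype = "dashed")                       # no-effect line
```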
Use str_remove from the stringr package with the regular expression "^category" to strip the "category" prefix that lm adds to factor-level coefficient names (for example, "categoryChairs"), cleaning up the strings on the y-axis.
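A sketch on hypothetical term names of the kind lm produces for a factor named category. The "^" anchor ensures only a leading prefix is removed.

```r
library(stringr)

terms <- c("log2(volume)", "categoryChairs", "categoryTables & desks")  # hypothetical terms

clean_terms <- str_remove(terms, "^category")
```

In a plotting pipeline, the same call would typically go inside mutate(term = str_remove(term, "^category")) before ggplot.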
Summary of screencast.