IKEA Furniture

Linear model, Coefficient/TIE fighter plot, Boxplots, Log scale discussion, Calculating volume

Recorded on: 2020-11-02

Timestamps by: Eric Fletcher

## Screencast

## Timestamps

Use `fct_reorder` from the `forcats` package to reorder the factor levels for `category` sorted along `n`.
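A minimal sketch of this reordering step. The `ikea_toy` data frame is a hypothetical stand-in for the real dataset, not data from the screencast.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data standing in for the IKEA dataset
ikea_toy <- tibble(
  category = c("Beds", "Chairs", "Chairs",
               "Tables & desks", "Tables & desks", "Tables & desks")
)

# Count items per category, then order the factor levels by that count
category_counts <- ikea_toy %>%
  count(category) %>%
  mutate(category = fct_reorder(category, n))

levels(category_counts$category)
```

With the levels ordered by `n`, a bar or box plot of `category` is drawn from least to most frequent instead of alphabetically.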

Brief explanation of why `scale_x_log10` is needed given the distribution of `category` and `price` with `geom_boxplot`.

Using `geom_jitter` with `geom_boxplot` to show how many items are within each `category`.
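The boxplot, jitter, and log-scale steps above can be sketched together as follows. The simulated `toy` data, the jitter settings, and the choice to hide boxplot outliers are assumptions for illustration only.

```r
library(ggplot2)

# Simulated, right-skewed prices (hypothetical, not the real IKEA data)
set.seed(1)
toy <- data.frame(
  category = rep(c("Beds", "Chairs", "Lighting"), times = c(30, 20, 10)),
  price = exp(rnorm(60, mean = 6, sd = 1))
)

p <- ggplot(toy, aes(price, category)) +
  geom_boxplot(outlier.shape = NA) +                    # hide outliers; the jittered points show them
  geom_jitter(width = 0, height = 0.1, alpha = 0.5) +  # one point per item
  scale_x_log10()                                       # spread out the skewed prices

p
```

Jittering only vertically (`width = 0`) keeps each point at its true price while still separating overlapping items.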

Use `add_count` from the `dplyr` package and `glue` from the `glue` package to concatenate the `category` name with `category_total` on the `geom_boxplot` y-axis.
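A small sketch of the labelling step. The label format `"{category} ({category_total})"` and the `category_label` column name are assumptions, not necessarily what the screencast uses.

```r
library(dplyr)
library(glue)

# Hypothetical toy data
toy <- tibble(category = c("Beds", "Chairs", "Chairs"))

labelled <- toy %>%
  add_count(category, name = "category_total") %>%   # per-category item count
  mutate(category_label = glue("{category} ({category_total})"))

labelled
```

Mapping `category_label` to the y-axis instead of `category` then shows the item count next to each category name.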

Convert from Saudi Riyals to United States Dollars.
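As an illustration, the conversion might look like the sketch below. It assumes the riyal's long-standing peg of 3.75 SAR per USD. The conversion factor used in the screencast is not stated here, and the `price_usd` column name is hypothetical.

```r
library(dplyr)

# Assumption: 3.75 Saudi riyals per US dollar (the official peg)
sar_per_usd <- 3.75

toy <- tibble(price = c(375, 75, 1500))   # hypothetical prices in SAR

converted <- toy %>%
  mutate(price_usd = price / sar_per_usd)

converted
```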

Create a ridgeplot (AKA joyplot) using the `ggridges` package showing the distribution of `price` across `category`.
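A minimal ridgeline sketch, assuming `geom_density_ridges` from `ggridges` with its default settings and simulated prices in place of the real data.

```r
library(ggplot2)
library(ggridges)

# Simulated prices per category (hypothetical)
set.seed(1)
toy <- data.frame(
  category = rep(c("Beds", "Chairs", "Lighting"), each = 40),
  price = exp(rnorm(120, mean = rep(c(7, 6, 5), each = 40), sd = 0.8))
)

p <- ggplot(toy, aes(price, category)) +
  geom_density_ridges() +   # one density curve per category
  scale_x_log10()

p
```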

Discussion on distributions and when to use a log scale.

Use `fct_lump` from the `forcats` package to lump together all the levels in `category` except for the `n` most frequent.
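A short sketch of lumping with `n = 2`, using a hypothetical vector of categories.

```r
library(forcats)

# Hypothetical categories: Chairs x3, Beds x2, Lighting x1, Rugs x1
category <- c("Chairs", "Chairs", "Chairs", "Beds", "Beds", "Lighting", "Rugs")

# Keep the 2 most frequent levels and lump the rest into "Other"
lumped <- fct_lump(category, n = 2)

levels(lumped)
table(lumped)
```

The lumped levels are collected under `"Other"`, which `fct_lump` places last.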

Use `scale_fill_discrete` from the `ggplot2` package with `guide = guide_legend(reverse = TRUE)` to reverse the fill legend.
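A sketch of reversing the fill legend on a stacked histogram. The histogram geometry and the simulated data are assumptions chosen only to give the legend something to describe.

```r
library(ggplot2)

# Simulated data (hypothetical)
set.seed(1)
toy <- data.frame(
  category = rep(c("Beds", "Chairs", "Other"), each = 30),
  price = exp(rnorm(90, mean = 6, sd = 1))
)

p <- ggplot(toy, aes(price, fill = category)) +
  geom_histogram(bins = 10) +
  scale_x_log10() +
  scale_fill_discrete(guide = guide_legend(reverse = TRUE))  # flip legend order

p
```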

Use `str_trim` from the `stringr` package to remove leading and trailing whitespace from the `short_description` variable. David then decides to use `str_replace_all` instead, with the regular expression `"\\s+"` and the replacement `" "`, to replace each run of whitespace with a single space.
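The difference between the two approaches can be seen on a single string. The `raw` value is a made-up example.

```r
library(stringr)

raw <- "  Table,   140x80 cm  "   # hypothetical messy description

# Removes whitespace at the ends only; the internal run of spaces stays
trimmed <- str_trim(raw)

# Collapses every run of whitespace to one space, including at the ends
collapsed <- str_replace_all(raw, "\\s+", " ")

trimmed     # "Table,   140x80 cm"
collapsed   # " Table, 140x80 cm "
```

Note that `str_replace_all` leaves a single space at each end, since the leading and trailing runs are replaced rather than removed.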

Use `separate` from the `tidyr` package with `extra = "merge"` and `fill = "right"` to separate item description from item dimension.
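A sketch of how the two arguments behave. The separator `", "`, the `rest` column name, and the sample descriptions are assumptions for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical descriptions with one, two, and zero separators
toy <- tibble(short_description = c(
  "Bar table, 140x80 cm",
  "Sofa bed, 3-seat, 200 cm",
  "Mirror"
))

separated <- toy %>%
  separate(short_description,
           into = c("main_description", "rest"),
           sep = ", ",
           extra = "merge",   # extra pieces stay together in the last column
           fill = "right",    # missing pieces become NA on the right
           remove = FALSE)

separated
```

With `extra = "merge"`, the second row keeps `"3-seat, 200 cm"` intact in `rest`. With `fill = "right"`, the third row gets `NA` in `rest` without a warning.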

Use `extract` from the `tidyr` package with the regular expression `"([\\d\\-xX]+) cm"` to extract the numbers before `cm`.
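A sketch of the extraction on a few made-up values. The `rest` and `dimensions` column names are hypothetical.

```r
library(dplyr)
library(tidyr)

toy <- tibble(rest = c("140x80 cm", "200 cm", "3-seat"))   # hypothetical

dims <- toy %>%
  tidyr::extract(rest,
                 into = "dimensions",
                 regex = "([\\d\\-xX]+) cm",   # capture digits, dashes, x/X before " cm"
                 remove = FALSE)

dims
```

Rows that do not match the pattern, such as `"3-seat"`, get `NA` in the new column.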

Use `unite` from the `tidyr` package to paste together the `category` and `main_description` columns into a new column named `category_and_description`.
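A minimal sketch of the `unite` call. The separator `" - "` and the sample rows are assumptions.

```r
library(dplyr)
library(tidyr)

toy <- tibble(
  category = c("Beds", "Chairs"),
  main_description = c("Bed frame", "Armchair")
)

united <- toy %>%
  unite("category_and_description", category, main_description, sep = " - ")

united
```

By default `unite` drops the input columns. Pass `remove = FALSE` to keep them.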

Calculate the volume of each item in the dataset in liters from its `depth`, `height`, and `width` using `depth * height * width / 1000`. At 36:15, David decides to change to cubic meters instead using `depth * height * width / 1000000`.
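Both formulas can be checked on a single made-up item, assuming the dimensions are in centimeters (1 liter = 1,000 cm³ and 1 m³ = 1,000,000 cm³). The output column names are hypothetical.

```r
library(dplyr)

# Hypothetical item: 50 cm x 100 cm x 200 cm
toy <- tibble(depth = 50, height = 100, width = 200)

vol <- toy %>%
  mutate(
    volume_liters = depth * height * width / 1000,     # cm^3 -> liters
    volume_m3     = depth * height * width / 1000000   # cm^3 -> cubic meters
  )

vol
```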

Use `str_squish` from the `stringr` package to remove whitespace from the start and end of the `short_description` variable and collapse repeated internal whitespace into single spaces.
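On a made-up string, `str_squish` does both jobs in one call:

```r
library(stringr)

raw <- "  Table,   140x80 cm  "   # hypothetical messy description

squished <- str_squish(raw)
squished   # "Table, 140x80 cm"
```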

Use `lm` from the `stats` package to create a linear model on a log-log scale to predict the price of an item based on volume + category. David then uses `fct_relevel` to reorder the factor levels for `category` such that `tables & desks` is first (the reference level), since it's the most frequent item in the `category` variable and its price distribution is in the middle.
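A sketch of the modelling step on simulated data. The data-generating values, the natural-log base, and the capitalized level name `"Tables & desks"` are assumptions. The screencast may use a different log base.

```r
library(dplyr)
library(forcats)

# Simulated items (hypothetical): price depends on volume on a log-log scale
set.seed(2020)
toy <- tibble(
  category = rep(c("Beds", "Chairs", "Tables & desks"), each = 30),
  volume = exp(rnorm(90, mean = -1, sd = 1)),
  price = exp(5 + 0.7 * log(volume) + rnorm(90, sd = 0.3))
)

# Put "Tables & desks" first so it becomes the reference level
toy <- toy %>%
  mutate(category = fct_relevel(category, "Tables & desks"))

fit <- lm(log(price) ~ log(volume) + category, data = toy)

coef(fit)
```

Because `Tables & desks` is the reference level, it has no coefficient of its own. The other category coefficients are differences from it at a fixed volume.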

Use the `broom` package to turn the model output into a coefficient / TIE fighter plot.
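A sketch of a coefficient plot built from `broom::tidy`. To stay self-contained it fits a stand-in model on the built-in `mtcars` data and not the IKEA model. The plot styling is also an assumption.

```r
library(dplyr)
library(broom)
library(ggplot2)

# Stand-in model on built-in data (not the IKEA model)
fit <- lm(log(mpg) ~ log(wt) + factor(cyl), data = mtcars)

# One row per term, with confidence intervals; drop the intercept
coefs <- tidy(fit, conf.int = TRUE) %>%
  filter(term != "(Intercept)")

# Point estimate with a horizontal interval: the "TIE fighter" shape
p <- ggplot(coefs, aes(estimate, term)) +
  geom_point() +
  geom_errorbar(aes(xmin = conf.low, xmax = conf.high), width = 0.2) +
  geom_vline(xintercept = 0, linetype = "dashed")

p
```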

Use `str_remove` from the `stringr` package to remove `category` from the start of the strings on the y-axis using the regular expression `"^category"`.
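A short sketch on hypothetical term names of the kind `lm` produces for a factor named `category`:

```r
library(stringr)

terms <- c("categoryBeds", "categoryChairs", "log(volume)")   # hypothetical

# "^" anchors the match to the start, so only the prefix is removed
cleaned <- str_remove(terms, "^category")
cleaned
```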

Summary of screencast.