Cocktails

Notable topics: Pairwise correlation, Network diagram, Principal component analysis (PCA)

Recorded on: 2020-05-25

Timestamps by: Eric Fletcher

## Screencast

## Timestamps

Use `fct_reorder` from the `forcats` package to reorder the `ingredient` factor levels along `n`.
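As a minimal sketch of this step, the snippet below reorders a factor by a count column. The `ingredient_counts` data frame and its values are invented for illustration and are not the screencast's data.

```r
library(forcats)

# Hypothetical ingredient counts (not the real dataset)
ingredient_counts <- data.frame(
  ingredient = c("Gin", "Lemon Juice", "Vodka"),
  n = c(12, 30, 7),
  stringsAsFactors = FALSE
)

# Levels are sorted by ascending n, which controls plotting order in ggplot2
ingredient_counts$ingredient <- fct_reorder(
  ingredient_counts$ingredient,
  ingredient_counts$n
)

levels(ingredient_counts$ingredient)
```

Ordering the levels this way is what makes a bar chart of `n` by `ingredient` appear sorted.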

Use `fct_lump` from the `forcats` package to lump together all the levels except the `n` most frequent in the `category` and `ingredient` variables.
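A small sketch of lumping, assuming a made-up `category` vector; the choice of `n = 2` is only for illustration.

```r
library(forcats)

# Hypothetical drink categories (invented values)
category <- c("Cocktail", "Cocktail", "Cocktail", "Shot", "Shot", "Punch", "Beer")

# Keep the 2 most frequent levels; everything else becomes "Other"
lumped <- fct_lump(category, n = 2)

table(lumped)
```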

Use `pairwise_cor` from the `widyr` package to find the correlation between the `ingredients`.
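The following sketch shows the shape of a `pairwise_cor` call on long-format data with one row per drink and ingredient. The `drinks` table and its column names are assumptions made for this example.

```r
library(dplyr)
library(widyr)

# Hypothetical long-format data: one row per drink-ingredient pair
drinks <- tibble(
  name = c("A", "A", "B", "B", "C", "C", "D"),
  ingredient = c("Gin", "Vermouth", "Gin", "Vermouth", "Vodka", "Lime", "Gin")
)

# Correlate ingredients based on which drinks they appear in together
ingredient_cors <- pairwise_cor(drinks, ingredient, name, sort = TRUE)

ingredient_cors
```

The result has columns `item1`, `item2`, and `correlation`, with one row per ordered pair of ingredients.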

Use `reorder_within` from the `tidytext` package with `scale_x_reordered` to reorder the columns in each `facet`.
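A sketch of per-facet ordering, assuming a small invented table of correlations named `top_pairs`; the faceting variable and plot layout are illustrative choices, not necessarily those used in the screencast.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Hypothetical correlations between ingredient pairs
top_pairs <- tibble(
  item1 = c("Gin", "Gin", "Vodka", "Vodka"),
  item2 = c("Vermouth", "Lime", "Lime", "Vermouth"),
  correlation = c(0.4, 0.1, 0.3, -0.1)
)

p <- top_pairs %>%
  # Order item2 by correlation separately within each item1 facet
  mutate(item2 = reorder_within(item2, correlation, item1)) %>%
  ggplot(aes(item2, correlation)) +
  geom_col() +
  # Strip the suffix that reorder_within appends to the labels
  scale_x_reordered() +
  facet_wrap(~ item1, scales = "free_x")
```

Note that `scales = "free_x"` is needed so each facet shows only its own reordered levels.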

Use the `ggraph` and `igraph` packages to create a `network diagram`.
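A minimal sketch of building a network diagram from an edge list. The `cors` table is invented, and the layout and geoms shown are one reasonable combination rather than a record of the screencast's exact code.

```r
library(dplyr)
library(igraph)
library(ggraph)

# Hypothetical edge list: each row links two ingredients
cors <- tibble(
  item1 = c("Gin", "Gin", "Vermouth"),
  item2 = c("Vermouth", "Lime", "Lime"),
  correlation = c(0.4, 0.2, 0.1)
)

# The first two columns are treated as edge endpoints
g <- graph_from_data_frame(cors)

p <- ggraph(g, layout = "fr") +
  geom_edge_link(aes(edge_alpha = correlation)) +
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE)
```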

Use `extract` from the `tidyr` package with `regex = "(.*) oz"` to create a new variable `amount` which doesn't include the `oz`.
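A sketch of this extraction on a few invented `measure` strings. Rows that do not match the pattern get `NA`.

```r
library(tidyr)

# Hypothetical measure strings (invented values)
measures <- data.frame(
  measure = c("1 1/2 oz", "2 oz", "3/4 oz", "1 dash"),
  stringsAsFactors = FALSE
)

# Capture everything before " oz" into a new column, keeping the original
with_amount <- extract(measures, measure, "amount",
                       regex = "(.*) oz", remove = FALSE)

with_amount
```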

Use `extract` with `regex` to turn the strings in the new `amount` variable into separate columns for the `ones`, `numerator`, and `denominator`.
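One possible way to split such strings is sketched below. Both regular expressions are assumptions written for this example (the source does not give the patterns), and the `amounts` data is invented.

```r
library(tidyr)

# Hypothetical amounts with optional whole and fractional parts
amounts <- data.frame(
  amount = c("1 1/2", "2", "3/4"),
  stringsAsFactors = FALSE
)

# Whole-number part: the entire string, or the digits before a space
# (assumed pattern)
step1 <- extract(amounts, amount, "ones",
                 regex = "(^\\d+$|^\\d+ )", convert = TRUE, remove = FALSE)

# Fraction part: digits on either side of a slash (assumed pattern)
parsed <- extract(step1, amount, c("numerator", "denominator"),
                  regex = "(\\d+)/(\\d+)", convert = TRUE, remove = FALSE)

parsed
```

Strings without a whole-number part or without a fraction produce `NA` in the corresponding columns, which is why a missing-value step is needed afterwards.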

Use `replace_na` from the `tidyr` package to replace `NA` with zeros in the `ones`, `numerator`, and `denominator` columns. David ends up replacing the `zero` in the `denominator` column with ones so that the calculation works without dividing by zero.
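The arithmetic can be sketched as follows, assuming the three columns are already numeric. The `parsed` data frame and the `oz` column name are hypothetical, and this version sets the missing denominators to one directly rather than going through zero first.

```r
library(tidyr)

# Hypothetical parsed amounts: "1 1/2", "2", "3/4"
parsed <- data.frame(
  ones = c(1, 2, NA),
  numerator = c(1, NA, 3),
  denominator = c(2, NA, 4)
)

# Missing whole or fractional parts count as zero; a missing denominator
# becomes 1 so that 0 / 1 = 0 instead of 0 / 0 = NaN
filled <- replace_na(parsed, list(ones = 0, numerator = 0, denominator = 1))

filled$oz <- filled$ones + filled$numerator / filled$denominator

filled$oz
```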

Use `geom_text_repel` from the `ggrepel` package to add `ingredient` labels to the `geom_point` plot.
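A brief sketch of labelled points; the `ingredient_summary` data frame and its `n` and `avg_oz` columns are invented for illustration.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical per-ingredient summary
ingredient_summary <- data.frame(
  ingredient = c("Gin", "Vodka", "Lime Juice"),
  n = c(30, 25, 18),
  avg_oz = c(1.5, 1.4, 0.8)
)

p <- ggplot(ingredient_summary, aes(n, avg_oz, label = ingredient)) +
  geom_point() +
  # Labels are nudged apart so they do not overlap points or each other
  geom_text_repel()
```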

Use `na_if` from the `dplyr` package to replace `zeros` with `NA`.
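A minimal sketch on an invented numeric vector:

```r
library(dplyr)

# Hypothetical amounts where 0 stands for "not parsed"
oz <- c(0, 1.5, 0, 2)

# Every value equal to 0 becomes NA; other values are unchanged
oz_clean <- na_if(oz, 0)

oz_clean
```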

Use `scale_size_continuous` with `labels = percent_format()` to convert size legend values to percent.
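A sketch of a size legend formatted as percentages, assuming a made-up `share` column holding proportions between 0 and 1.

```r
library(ggplot2)
library(scales)

# Hypothetical data: share is a proportion, not a percentage
points <- data.frame(
  x = c(1, 2, 3),
  y = c(2, 1, 3),
  share = c(0.10, 0.25, 0.50)
)

p <- ggplot(points, aes(x, y, size = share)) +
  geom_point() +
  # Legend breaks are displayed as percentages instead of raw proportions
  scale_size_continuous(labels = percent_format())

# The formatter can also be called directly on a value
percent_format()(0.25)
```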

Make the size of the points in the `network diagram` proportional to `n` using `vertices = ingredient_info` within `graph_from_data_frame` and `aes(size = n)` within `geom_node_point`.
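A sketch of attaching vertex attributes to the graph. The contents of `cors` and `ingredient_info` are invented; only the `vertices` argument and `aes(size = n)` come from the description above.

```r
library(dplyr)
library(igraph)
library(ggraph)

# Hypothetical edge list
cors <- tibble(
  item1 = c("Gin", "Gin", "Vermouth"),
  item2 = c("Vermouth", "Lime", "Lime"),
  correlation = c(0.4, 0.2, 0.1)
)

# Hypothetical vertex table: the first column must hold the vertex names
# and must cover every name used in the edge list
ingredient_info <- tibble(
  ingredient = c("Gin", "Vermouth", "Lime"),
  n = c(30, 12, 20)
)

g <- graph_from_data_frame(cors, vertices = ingredient_info)

p <- ggraph(g, layout = "fr") +
  geom_edge_link() +
  # n is now a vertex attribute, so it can be mapped to point size
  geom_node_point(aes(size = n)) +
  geom_node_text(aes(label = name), repel = TRUE)
```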

Use `widely_svd` from the `widyr` package to perform principal component analysis on the `ingredients`.
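A sketch of the call shape, assuming invented long-format data with an indicator column `value` set to 1 for every drink and ingredient pair. The column names are hypothetical.

```r
library(dplyr)
library(widyr)

# Hypothetical long-format data with a presence indicator
drinks <- tibble(
  name = c("A", "A", "B", "B", "C", "C", "D"),
  ingredient = c("Gin", "Vermouth", "Gin", "Vermouth", "Vodka", "Lime", "Gin")
) %>%
  mutate(value = 1)

# One row per ingredient and dimension, with the loading in `value`
ingredient_pcs <- widely_svd(drinks, ingredient, name, value)

ingredient_pcs
```

The tidy output can then be filtered to the first few dimensions and plotted per component.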

Use `paste0` to concatenate `PC` and `dimension` in the facet panel titles.
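For example, with a hypothetical vector of dimension numbers:

```r
# Hypothetical dimension indices from a decomposition
dimension <- 1:3

# paste0 joins without a separator and is vectorized over dimension
panel_titles <- paste0("PC", dimension)

panel_titles
```

Using the result as the faceting variable gives panels titled by component rather than by a bare number.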

Summary of screencast.