Cocktails
Notable topics: Pairwise correlation, Network diagram, Principal component analysis (PCA)
Recorded on: 2020-05-25
Timestamps by: Eric Fletcher
Screencast
Timestamps
Use fct_reorder from the forcats package to reorder the ingredient factor levels along n.
Use fct_lump from the forcats package to lump together all the levels except the n most frequent in the category and ingredient variables.
Use pairwise_cor from the widyr package to find the correlation between the ingredients.
Use reorder_within from the tidytext package with scale_x_reordered to reorder the columns in each facet.
Use the ggraph and igraph packages to create a network diagram.
Use extract from the tidyr package with regex = (.*) oz to create a new variable amount which doesn't include the oz.
Use extract with regex to turn the strings in the new amount variable into separate columns for the ones, numerator, and denominator.
Use replace_na from the tidyr package to replace NA with zeros in the ones, numerator, and denominator columns. David ends up replacing the zeros in the denominator column with ones so that the calculation works.
Use geom_text_repel from the ggrepel package to add ingredient labels to the geom_point plot.
Use na_if from the dplyr package to replace zeros with NA.
Use scale_size_continuous with labels = percent_format() to convert size legend values to percent.
Make the size of the points in the network diagram proportional to n using vertices = ingredient_info within graph_from_data_frame and aes(size = n) within geom_node_point.
Use widely_svd from the widyr package to perform principal component analysis on the ingredients.
Use paste0 to concatenate PC and dimension in the facet panel titles.
Summary of screencast.
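The extract and replace_na steps above can be sketched roughly as follows. This is a minimal illustration and not the code from the screencast: the sample measure strings, the exact regular expressions, and the final oz column are assumptions made for the example. It also fills missing denominators with 1 directly, rather than filling with zeros first and fixing the denominator afterwards.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data; the real dataset has one measure per ingredient row
measures <- tibble(measure = c("1 1/2 oz", "3/4 oz", "2 oz"))

parsed <- measures %>%
  # Keep everything before " oz" as a new column, amount
  extract(measure, "amount", regex = "(.*) oz", remove = FALSE) %>%
  # Whole-number part: the entire string, or digits followed by a space
  extract(amount, "ones", regex = "(^\\d+$|^\\d+ )", remove = FALSE) %>%
  # Fraction part: digits on either side of a slash
  extract(amount, c("numerator", "denominator"),
          regex = "(\\d+)/(\\d+)", remove = FALSE) %>%
  # Extracted pieces are strings; convert before filling and doing arithmetic
  mutate(across(c(ones, numerator, denominator), as.numeric)) %>%
  # A missing denominator becomes 1 so the division below is defined
  replace_na(list(ones = 0, numerator = 0, denominator = 1)) %>%
  mutate(oz = ones + numerator / denominator)

parsed
```

A denominator of 0 would turn 0 / 0 into NaN for whole-number amounts such as "2 oz", which is why the denominator needs to be 1 wherever no fraction was found.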