Coffee Ratings

Ridgeline plot, Pairwise correlation, Network plot, Singular value decomposition, Linear model

Published: July 6, 2020

Recorded on: 2020-07-06

Timestamps by: Eric Fletcher

Timestamps

count, mutate, fct_lump
forcats

Using fct_lump within count, and then within mutate, to lump all coffee varieties except the most frequent ones into a single level
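
A minimal sketch of this lumping step. The toy rows and the choice to keep the top 2 varieties are assumptions for illustration, not values from the screencast.

```r
library(dplyr)
library(forcats)

# Made-up rows standing in for the coffee ratings data.
coffee <- tibble(
  variety = c("Bourbon", "Bourbon", "Bourbon", "Caturra", "Caturra", "Typica", "Gesha")
)

# Inside count(): tally the lumped factor directly.
variety_counts <- coffee %>%
  count(variety = fct_lump(variety, 2), sort = TRUE)

# Inside mutate(): overwrite the column so later steps see the lumped levels.
lumped <- coffee %>%
  mutate(variety = fct_lump(variety, 2))

levels(lumped$variety)
```

Varieties outside the two most frequent end up in the level "Other", which is the default label used by fct_lump.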

geom_boxplot
ggplot2

Create a geom_boxplot to visualize the distribution of total_cup_points for each variety

geom_histogram
ggplot2

Create a geom_histogram to visualize the distribution of total_cup_points for each variety

fct_reorder
forcats

Using fct_reorder to reorder variety by sorting it along total_cup_points in ascending order
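
A small sketch of the reordering, using invented varieties and scores. fct_reorder sorts levels by the median of the second argument unless another summary function is supplied.

```r
library(forcats)

# Hypothetical varieties and scores.
variety <- factor(c("A", "A", "B", "B", "C", "C"))
total_cup_points <- c(85, 86, 80, 81, 83, 84)

# Levels are sorted by the median score within each variety, lowest first.
reordered <- fct_reorder(variety, total_cup_points)

levels(reordered)
```

With these numbers the medians are 80.5 (B), 83.5 (C) and 85.5 (A), so the levels come out as B, C, A.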

summarize, across
dplyr

Using summarize with across to calculate the percent of missing data (NA) for each rating variable
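
One way to sketch the missing-data check, assuming two made-up rating columns. Taking the mean of is.na() gives the share of missing values per column.

```r
library(dplyr)

# Toy ratings with some missing values (not real data).
ratings <- tibble(
  aroma  = c(7.5, NA, 8, 7),
  flavor = c(7, 7.5, NA, NA)
)

# Apply the same summary to every column: the fraction of NA values.
na_share <- ratings %>%
  summarize(across(everything(), ~ mean(is.na(.x))))

na_share
```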

geom_col, fct_lump
ggplot2, forcats

Create a bar chart using geom_col with fct_lump to visualize the frequency of top countries

pivot_longer
tidyr

Using pivot_longer to pivot the rating metrics from wide format to long format
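
A minimal sketch of the reshape. The coffee_id column and the two metric columns are assumptions chosen to keep the example short.

```r
library(dplyr)
library(tidyr)

# Wide format: one row per coffee, one column per rating metric (toy values).
coffee <- tibble(
  coffee_id = 1:2,
  aroma     = c(7.5, 8),
  flavor    = c(7, 7.75)
)

# Long format: one row per coffee and metric.
long <- coffee %>%
  pivot_longer(c(aroma, flavor), names_to = "metric", values_to = "value")

long
```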

geom_line
ggplot2

Create a geom_line chart to check whether the sum of the rating categories equals the total_cup_points column
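
The comparison behind that chart could be computed roughly as follows. The rows are invented so that the metrics add up exactly, and the column name metric_sum is hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy data in which the two metrics sum to total_cup_points.
coffee <- tibble(
  coffee_id        = 1:3,
  aroma            = c(8, 7.5, 7),
  flavor           = c(8, 7, 7.5),
  total_cup_points = c(16, 14.5, 14.5)
)

# Sum the metrics per coffee and keep the reported total alongside.
check <- coffee %>%
  pivot_longer(c(aroma, flavor), names_to = "metric", values_to = "value") %>%
  group_by(coffee_id, total_cup_points) %>%
  summarize(metric_sum = sum(value), .groups = "drop")

check
```

Plotting metric_sum against total_cup_points should give a straight diagonal line if the totals really are the sum of the categories.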

geom_density_ridges
ggridges

Create a geom_density_ridges chart to show the distribution of ratings across each rating metric

summarize
dplyr

Using summarize with mean and sd to show the average rating per metric with its standard deviation

pairwise_cor
widyr

Using pairwise_cor to find correlations amongst the rating metrics
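
A sketch of the pairwise correlation step on made-up long-format ratings. The column names coffee_id, metric and value are assumptions about how the data is laid out.

```r
library(dplyr)
library(widyr)

# Toy long-format ratings: 4 coffees scored on 3 metrics.
long <- tibble(
  coffee_id = rep(1:4, times = 3),
  metric    = rep(c("aroma", "flavor", "acidity"), each = 4),
  value     = c(7.0, 7.5, 8.0, 8.5,   # aroma
                7.1, 7.4, 8.2, 8.4,   # flavor
                8.0, 7.6, 7.5, 7.0)   # acidity
)

# Correlate every pair of metrics across coffees.
correlations <- long %>%
  pairwise_cor(metric, coffee_id, value, sort = TRUE)

correlations
```

The result has one row per ordered pair of metrics, with columns item1, item2 and correlation.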

graph_from_data_frame, ggraph, geom_edge_link, geom_node_point, geom_node_text
ggraph, igraph

Create a network plot to show the clustering of the rating metrics

widely_svd, geom_col, reorder_within, scale_y_reordered
widyr, tidytext

Using widely_svd (singular value decomposition) to visualize the biggest sources of variation among the rating metrics
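
A rough sketch of the decomposition step only, without the plot. Centering each metric before the SVD is an assumption made here, and the data is invented.

```r
library(dplyr)
library(widyr)

# Toy long-format ratings: 4 coffees scored on 3 metrics.
long <- tibble(
  coffee_id = rep(1:4, times = 3),
  metric    = rep(c("aroma", "flavor", "acidity"), each = 4),
  value     = c(7.0, 7.5, 8.0, 8.5,
                7.1, 7.4, 8.2, 8.4,
                8.0, 7.6, 7.5, 7.0)
)

# Center each metric, then decompose the metric-by-coffee matrix.
components <- long %>%
  group_by(metric) %>%
  mutate(centered = value - mean(value)) %>%
  ungroup() %>%
  widely_svd(metric, coffee_id, centered)

components
```

Each row gives the loading of one metric on one dimension, which is the shape needed for a faceted geom_col chart.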

geom_histogram
ggplot2

Create a geom_histogram to visualize the distribution of altitude

pmin
base

Using pmin to cap altitude at a maximum value of 3000
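
A short illustration of the cap, with made-up altitude values including one implausible outlier and one missing value.

```r
# Hypothetical altitudes; the third is an obvious data-entry outlier.
altitude <- c(1200, 1850, 190164, NA)

# pmin compares element-wise, so anything above 3000 becomes 3000.
capped <- pmin(altitude, 3000)

capped
# 1200 1850 3000 NA
```

Missing values stay missing because pmin does not drop NA by default.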

geom_point, geom_smooth
ggplot2

Create a geom_point chart to visualize the correlation between altitude and quality (total_cup_points)

summarize, cor
dplyr, stats

Using summarize with cor to show the correlation between altitude and each rating metric
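
A minimal sketch of the per-metric correlation, assuming long-format data with altitude, metric and value columns. All numbers are invented.

```r
library(dplyr)

# Toy data: 4 coffees at different altitudes, scored on 2 metrics.
long <- tibble(
  metric   = rep(c("aroma", "flavor"), each = 4),
  altitude = rep(c(1000, 1500, 2000, 2500), times = 2),
  value    = c(7.0, 7.4, 7.5, 8.0,   # aroma rises with altitude
               7.7, 7.6, 7.1, 7.2)   # flavor falls with altitude
)

# One correlation coefficient per metric.
altitude_cor <- long %>%
  group_by(metric) %>%
  summarize(correlation = cor(altitude, value))

altitude_cor
```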

lm, geom_point, tidy, map, geom_errorbarh
stats, broom, purrr, ggplot2

Fit a linear model (lm) for each rating metric, then visualize the results using geom_point with geom_errorbarh to show how each kilometer of altitude contributes to the score
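
The model-fitting part could be sketched as below. The data is invented, the km column is a hypothetical helper, and the plotting step is left out.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Toy data: 4 coffees at different altitudes, scored on 2 metrics.
long <- tibble(
  metric   = rep(c("aroma", "flavor"), each = 4),
  altitude = rep(c(1000, 1500, 2000, 2500), times = 2),
  value    = c(7.0, 7.4, 7.5, 8.0,
               7.2, 7.1, 7.6, 7.7)
)

slopes <- long %>%
  mutate(km = altitude / 1000) %>%                        # express altitude in kilometers
  group_by(metric) %>%
  summarize(model = list(lm(value ~ km)), .groups = "drop") %>%
  mutate(tidied = map(model, tidy, conf.int = TRUE)) %>%  # coefficients with intervals
  unnest(tidied) %>%
  filter(term == "km")                                    # keep only the altitude slope

slopes
```

The estimate, conf.low and conf.high columns are what a point-and-error-bar chart of the slopes would use.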

Summary of screencast