NCAA Women’s Basketball

Published: October 5, 2020

Notable topics: Heatmap, Correlation analysis

Recorded on: 2020-10-05

Timestamps by: Eric Fletcher

Timestamps

fct_relevel
forcats

Use fct_relevel from the forcats package to order the factor levels for the tourney_finish variable.
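
As a minimal sketch of the releveling step (the stage labels below are hypothetical stand-ins, and the real tourney_finish values may differ), fct_relevel moves the named levels to the front in the order given:

```r
library(forcats)

# Hypothetical stage labels, not taken from the screencast data.
finish <- factor(c("Champ", "1st", "N2nd", "2nd", "1st"))

# By default, factor levels are sorted alphabetically.
default_levels <- levels(finish)

# Put the levels in tournament order, earliest exit first.
finish <- fct_relevel(finish, "1st", "2nd", "N2nd", "Champ")
ordered_levels <- levels(finish)
```

With the levels in tournament order, any axis or legend built from the factor follows the progression of rounds rather than the alphabet.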

geom_tile, scale_fill_gradient2
ggplot2

Use geom_tile from the ggplot2 package to create a heatmap showing how far each seed tends to go in the tournament, with scale_fill_gradient2 controlling the fill colors.
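
A rough sketch of such a heatmap follows. The data frame, its pct values, and the colors and midpoint passed to scale_fill_gradient2 are all invented for illustration and are not the settings used in the screencast:

```r
library(ggplot2)

# Invented shares: pct is the fraction of teams with a given seed
# whose tournament ended at a given stage (each seed sums to 1).
toy <- expand.grid(seed = 1:4, tourney_finish = c("1st", "2nd", "Champ"))
toy$pct <- c(0.1, 0.3, 0.5, 0.7,   # "1st" for seeds 1 to 4
             0.4, 0.5, 0.4, 0.3,   # "2nd"
             0.5, 0.2, 0.1, 0.0)   # "Champ"

# One tile per seed/finish pair, colored by pct on a diverging scale.
p <- ggplot(toy, aes(tourney_finish, seed, fill = pct)) +
  geom_tile() +
  scale_fill_gradient2(low = "white", mid = "pink", high = "red",
                       midpoint = 0.35)
```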

scale_y_continuous
ggplot2

Use scale_y_continuous from the ggplot2 package with breaks = seq(1, 16) in order to include all 16 seeds.
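
A small sketch of the breaks argument, using an invented data frame with one row per seed; without explicit breaks, ggplot2 would pick only a handful of tick positions on its own:

```r
library(ggplot2)

# Invented counts, one row per seed.
toy <- data.frame(seed = 1:16, n = 16:1)

# breaks = seq(1, 16) asks for a tick at every seed from 1 to 16.
seed_breaks <- seq(1, 16)

p <- ggplot(toy, aes(n, seed)) +
  geom_point() +
  scale_y_continuous(breaks = seed_breaks)
```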

geom_text
ggplot2, scales

Use geom_text from the ggplot2 package with label = percent(pct) to label each tile in the heatmap with its percentage. The percent function comes from the scales package.
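
The labeling step might look like the sketch below. The four-row data frame is invented, and only the label = percent(pct) mapping is taken from the note above:

```r
library(ggplot2)
library(scales)

# Invented shares for two seeds and two finishes.
toy <- data.frame(
  seed = c(1, 1, 2, 2),
  tourney_finish = c("1st", "Champ", "1st", "Champ"),
  pct = c(0.25, 0.75, 0.6, 0.4)
)

# percent() turns a proportion into a formatted string.
example_label <- percent(0.25, accuracy = 1)

# geom_text draws one label per row, centered on the matching tile.
p <- ggplot(toy, aes(tourney_finish, seed, fill = pct)) +
  geom_tile() +
  geom_text(aes(label = percent(pct)))
```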

scale_x_discrete, scale_y_continuous
ggplot2

Use scale_x_discrete and scale_y_continuous, both with expand = c(0, 0), to remove the padding between the axes and the heatmap tiles. David calls this flattening.
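
A sketch of the expand setting on a tiny invented heatmap; by default both scales add a little padding around the data, and c(0, 0) sets that padding to zero:

```r
library(ggplot2)

# Invented shares for two seeds and two finishes.
toy <- data.frame(
  seed = c(1, 1, 2, 2),
  tourney_finish = c("1st", "Champ", "1st", "Champ"),
  pct = c(0.25, 0.75, 0.6, 0.4)
)

# expand = c(0, 0) removes the default padding on each axis,
# so the tiles run right up to the axis lines.
p <- ggplot(toy, aes(tourney_finish, seed, fill = pct)) +
  geom_tile() +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0))
```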

scale_y_reverse
ggplot2

Use scale_y_reverse to flip the order of the y-axis from 1-16 to 16-1.
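
A sketch of the reversed axis on invented data. A plot has a single y scale, so scale_y_reverse takes the place of scale_y_continuous, and arguments such as breaks and expand are passed to it directly:

```r
library(ggplot2)

# Invented shares, one row per seed for a single finish stage.
toy <- data.frame(seed = 1:16, tourney_finish = "1st", pct = (1:16) / 20)

# scale_y_reverse flips the direction of the y-axis; breaks and
# expand are forwarded to the underlying continuous scale.
p <- ggplot(toy, aes(tourney_finish, seed, fill = pct)) +
  geom_tile() +
  scale_y_reverse(breaks = seq(1, 16), expand = c(0, 0))
```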

cor, geom_line
stats, ggplot2

Use cor from the stats package to calculate the correlation between seed and tourney_finish. The correlation is then plotted with geom_line to see whether it changes over time.
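
One way to sketch the per-year correlation, assuming tourney_finish has first been turned into a number. The column finish_round below is a hypothetical numeric stand-in, and all values are invented:

```r
library(dplyr)
library(ggplot2)

# Invented data: finish_round is a made-up numeric version of
# tourney_finish (higher means the team went further).
toy <- data.frame(
  year = rep(c(2016, 2017), each = 4),
  seed = c(1, 2, 3, 4, 1, 2, 3, 4),
  finish_round = c(4, 3, 2, 1, 3, 4, 1, 2)
)

# One correlation per year between seed and how far the team went.
by_year <- toy %>%
  group_by(year) %>%
  summarize(correlation = cor(seed, finish_round))

# Plot the yearly correlations as a line over time.
p <- ggplot(by_year, aes(year, correlation)) +
  geom_line()
```

A negative value means lower-numbered (better) seeds tend to go further, so a line drifting toward zero would suggest seeding has become less predictive.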

geom_smooth
ggplot2

Use geom_smooth with method = "loess" to add a smoothing line with a confidence band, which makes the trend between seed and reg_percent easier to see.
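
A sketch of the loess smoother on simulated data. The relationship between seed and reg_percent below is made up purely so there is a trend to draw:

```r
library(ggplot2)

# Simulated data: regular-season win percentage falls with seed.
set.seed(2020)
toy <- data.frame(seed = rep(1:16, each = 5))
toy$reg_percent <- 95 - 2.5 * toy$seed + rnorm(nrow(toy), sd = 5)

# method = "loess" fits a local regression curve; the shaded
# confidence band is drawn by default (se = TRUE).
p <- ggplot(toy, aes(seed, reg_percent)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "loess", formula = y ~ x)
```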

fct_lump
forcats

Use fct_lump from the forcats package to lump together all conferences except the n most frequent.
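
A minimal sketch of the lumping step on an invented vector of conference names, keeping the two most frequent and collapsing the rest into "Other":

```r
library(forcats)

# Invented conference entries, not real tournament counts.
conference <- c("SEC", "SEC", "SEC", "ACC", "ACC", "Big Ten", "Ivy")

# Keep the 2 most frequent levels; everything else becomes "Other".
lumped <- fct_lump(conference, n = 2)
lumped_counts <- table(lumped)
```

Fewer levels keep downstream plots readable, since rare conferences no longer each get their own category.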

geom_jitter
ggplot2

Use geom_jitter from the ggplot2 package instead of geom_boxplot. Jittering avoids overplotting, which makes it easier to see the individual points that make up the distribution of the seed variable.
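
A sketch of the jittered plot with randomly generated seeds. The conference names and the width, height and alpha settings are illustrative choices, not values from the screencast:

```r
library(ggplot2)

# Randomly generated seeds for three made-up conference groups.
set.seed(1)
toy <- data.frame(
  conference = rep(c("SEC", "ACC", "Other"), each = 20),
  seed = sample(1:16, 60, replace = TRUE)
)

# Jitter horizontally only, so each point keeps its true seed value
# while points that share a seed are spread apart.
p <- ggplot(toy, aes(conference, seed)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.6)
```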

geom_smooth
ggplot2

Use geom_smooth with method = "lm" to aid in seeing the trend between reg_percent and tourney_w.

. %>%

Create a dot pipe function using . and %>% so that the same summarize call does not have to be written out for each grouping.
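
A sketch of the dot pipe pattern. The helper name summarize_entries and the columns and values in the toy data frame are hypothetical:

```r
library(dplyr)

# Invented data: two tournament entries per school.
toy <- data.frame(
  school = c("A", "A", "B", "B"),
  conference = c("X", "X", "Y", "Y"),
  tourney_w = c(3, 1, 0, 2)
)

# Starting a pipeline with . creates a reusable function
# instead of running the pipeline immediately.
summarize_entries <- . %>%
  summarize(n_entries = n(), avg_wins = mean(tourney_w))

# The same summary, applied to two different groupings.
by_school <- toy %>% group_by(school) %>% summarize_entries()
by_conference <- toy %>% group_by(conference) %>% summarize_entries()
```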

glue
glue

Use glue from the glue package to combine school and n_entries into the y-axis labels of the geom_col plot.
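
A sketch of the labeling step with placeholder schools and invented numbers. The label format "{school} ({n_entries})" is an assumption about how the two fields are combined:

```r
library(glue)
library(ggplot2)

# Placeholder schools with invented entry counts and win averages.
school <- c("School A", "School B")
n_entries <- c(30, 35)

# glue() fills each {name} with the matching value, element by element.
label <- as.character(glue("{school} ({n_entries})"))

toy <- data.frame(label = label, avg_wins = c(4, 3))

# Put the combined label on the y-axis of a horizontal bar chart.
p <- ggplot(toy, aes(avg_wins, label)) +
  geom_col()
```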

Summary of screencast.