NCAA Women’s Basketball

Notable topics: Heatmap, Correlation analysis

Recorded on: 2020-10-05

Timestamps by: Eric Fletcher

## Screencast

## Timestamps

Use `fct_relevel` from the `forcats` package to order the factor levels for the `tourney_finish` variable.
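
A minimal sketch of the releveling step, using a toy factor rather than the screencast's data. The finish codes (`1st`, `2nd`, `RSF`, `Champ`) and the object names are assumptions made for illustration.

```r
library(forcats)

# Toy data: these finish codes are assumed for illustration only.
finish <- factor(c("Champ", "1st", "RSF", "2nd", "1st"))

# By default the levels are sorted alphabetically, not in tournament order.
levels(finish)

# List the levels in the order a team advances through the rounds.
finish_ordered <- fct_relevel(finish, "1st", "2nd", "RSF", "Champ")
levels(finish_ordered)
```

With the levels in tournament order, any plot that maps this factor to an axis shows the rounds in sequence instead of alphabetically.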

Use `geom_tile` from the `ggplot2` package to create a heatmap showing how far a particular seed ends up going in the tournament.

Use `scale_y_continuous` from the `ggplot2` package with `breaks = seq(1, 16)` so that the axis includes all 16 seeds.

Use `geom_text` from the `ggplot2` package with `label = percent(pct)` to print the percentage on each tile in the heatmap.
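
The heatmap steps can be sketched together as follows. This is an illustration on invented data, not the screencast's code: the `heat` data frame and its `pct` values are made up, and `percent` is assumed to come from the `scales` package.

```r
library(ggplot2)
library(scales)

# Toy data: one row per seed/finish pair, with an invented share in `pct`.
heat <- expand.grid(
  seed = 1:16,
  tourney_finish = factor(c("1st", "2nd", "RSF"), levels = c("1st", "2nd", "RSF"))
)
heat$pct <- round(1 / (heat$seed + as.integer(heat$tourney_finish)), 2)

p <- ggplot(heat, aes(tourney_finish, seed, fill = pct)) +
  geom_tile() +                              # one tile per seed/finish pair
  geom_text(aes(label = percent(pct))) +     # percentage label on each tile
  scale_y_continuous(breaks = seq(1, 16))    # a tick for every seed

p
```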

Use `scale_x_discrete` and `scale_y_continuous`, both with `expand = c(0, 0)`, to remove the space between the x and y axes and the heatmap tiles. David calls this flattening.

Use `scale_y_reverse` to flip the order of the y-axis from 1-16 to 16-1, so that seed 1 sits at the top.
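
A sketch of the flattening and reversal on invented data. A plot can hold only one y scale, so this example passes `breaks` and `expand` directly to `scale_y_reverse`; whether the screencast combines them the same way is an assumption.

```r
library(ggplot2)

# Toy data with invented values.
heat <- expand.grid(seed = 1:16, tourney_finish = c("1st", "2nd", "RSF"))
heat$pct <- 1 / (heat$seed + 1)

p <- ggplot(heat, aes(tourney_finish, seed, fill = pct)) +
  geom_tile() +
  # expand = c(0, 0) removes the padding between the tiles and the axis.
  scale_x_discrete(expand = c(0, 0)) +
  # Reversed y scale: seed 1 at the top, with a tick for every seed.
  scale_y_reverse(breaks = seq(1, 16), expand = c(0, 0))

p
```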

Use `cor` from the `stats` package to calculate the correlation between `seed` and `tourney_finish`, then plot the result to see whether the correlation changes over time.
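
One way to sketch the per-year correlation, assuming `tourney_finish` is an ordered factor that is converted to its integer codes before calling `cor`. The toy `tourney` table, its values, and the `by_year` name are invented for illustration.

```r
library(dplyr)

# Toy data: two seasons with invented seeds and finishes.
tourney <- tibble(
  year = rep(c(2017, 2018), each = 4),
  seed = c(1, 4, 8, 16, 1, 4, 8, 16),
  tourney_finish = factor(
    c("Champ", "RSF", "2nd", "1st", "RSF", "Champ", "1st", "2nd"),
    levels = c("1st", "2nd", "RSF", "Champ")
  )
)

# cor() needs numeric input, so use the integer position of each factor level.
by_year <- tourney %>%
  group_by(year) %>%
  summarize(correlation = cor(seed, as.integer(tourney_finish)))

by_year
```

Plotting `correlation` against `year` then shows whether the relationship between seed and finish has strengthened or weakened across seasons.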

Use `geom_smooth` with `method = "loess"` to add a smoothing line with a confidence band, which helps show the trend between `seed` and `reg_percent`.
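
A small sketch of the smoothing layer on simulated data. The relationship between `seed` and `reg_percent` below is invented so that the example runs on its own.

```r
library(ggplot2)

set.seed(2020)

# Simulated teams: four per seed, with an invented downward trend plus noise.
teams <- data.frame(seed = rep(1:16, each = 4))
teams$reg_percent <- 95 - 2.5 * teams$seed + rnorm(nrow(teams), sd = 5)

p <- ggplot(teams, aes(seed, reg_percent)) +
  geom_point() +
  # se = TRUE is the default, so the confidence band is drawn automatically.
  geom_smooth(method = "loess")

p
```

Swapping in `method = "lm"` fits a straight line instead of a local regression curve.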

Use `fct_lump` from the `forcats` package to lump together all the conferences except for the `n` most frequent.
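
A minimal sketch of the lumping behaviour on an invented vector of conference names. Newer `forcats` documentation recommends `fct_lump_n` for this case, but `fct_lump` with `n` behaves the same way here.

```r
library(forcats)

# Invented conference entries: SEC appears 3 times, ACC twice, the rest once.
conference <- factor(c("SEC", "SEC", "SEC", "ACC", "ACC", "Big Ten", "Ivy", "MAAC"))

# Keep the 2 most frequent levels; everything else becomes "Other".
lumped <- fct_lump(conference, n = 2)

table(lumped)
```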

Use `geom_jitter` from the `ggplot2` package instead of `geom_boxplot` to show the individual points that make up the distribution of the `seed` variable; the jitter keeps those points from overplotting.

Use `geom_smooth` with `method = "lm"` to help show the trend between `reg_percent` and `tourney_w`.

Create a dot pipe function using `.` and `%>%` to avoid duplicating the `summarize` code that computes the summary statistics.
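
A sketch of the dot pipe pattern. A pipeline that starts with `.` is stored as a function, so the same summary can be applied to different groupings. The function name `summarize_entries`, the toy table, and the `avg_wins` statistic are all assumptions for illustration.

```r
library(dplyr)

# A pipeline beginning with `.` becomes a reusable function.
summarize_entries <- . %>%
  summarize(
    n_entries = n(),
    avg_wins = mean(tourney_w)
  )

# Toy data with placeholder school and conference names.
tourney <- tibble(
  school = c("A", "A", "B", "C"),
  conference = c("X", "X", "Y", "Y"),
  tourney_w = c(6, 4, 5, 2)
)

# The same statistics, computed per school and per conference.
by_school <- tourney %>% group_by(school) %>% summarize_entries()
by_conference <- tourney %>% group_by(conference) %>% summarize_entries()
```

Changing a statistic in `summarize_entries` updates every summary that uses it.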

Use `glue` from the `glue` package to concatenate `school` and `n_entries` into the labels on the y-axis of the `geom_col` chart.
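
A sketch of building the axis labels with `glue`. The label format `"{school} ({n_entries})"`, the `school_label` column, and the toy values are assumptions, not taken from the screencast.

```r
library(dplyr)
library(ggplot2)
library(glue)

# Toy summary table with placeholder school names and invented values.
by_school <- tibble(
  school = c("A", "B", "C"),
  n_entries = c(12, 7, 3),
  avg_wins = c(3.1, 1.8, 0.7)
)

# Expressions inside braces are evaluated and pasted into the string.
labeled <- by_school %>%
  mutate(school_label = glue("{school} ({n_entries})"))

# The combined label goes on the y-axis of the bar chart.
p <- ggplot(labeled, aes(avg_wins, school_label)) +
  geom_col()
```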

Summary of screencast.