Great American Beer Festival

Notable topics: Log odds ratio, Logistic regression, TIE Fighter plot

Recorded on: 2020-10-19

Timestamps by: Eric Fletcher

## Screencast

## Timestamps

Use `pivot_wider` with `values_fill = list(value = 0)` from the `tidyr` package, along with `mutate(value = 1)`, to pivot the `medal` variable from `long` to `wide`, adding a 1 for the medal type awarded and a 0 for the remaining medal types in the row.
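As a rough illustration of this pivot, the sketch below uses a small made-up table (`awards` and its beer names are hypothetical, not from the screencast data) and assumes `dplyr` and `tidyr` are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per medal awarded
awards <- tibble(
  beer_name = c("Alpha Ale", "Bravo Bock", "Charlie Cream"),
  medal     = c("Gold", "Silver", "Bronze")
)

# Mark each awarded medal with a 1, then spread medal types into columns;
# combinations that never occurred are filled with 0
awards_wide <- awards %>%
  mutate(value = 1) %>%
  pivot_wider(
    names_from  = medal,
    values_from = value,
    values_fill = list(value = 0)
  )

awards_wide
```

Each beer ends up with one row and one indicator column per medal type.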

Use `fct_lump` from the `forcats` package to lump together all the beers except for the N most frequent.
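A minimal sketch of the lumping behaviour, using a hypothetical factor of single-letter names in place of the real beer data:

```r
library(forcats)

# Hypothetical factor: "A" appears 3 times, "B" twice, "C" and "D" once each
beers <- factor(c("A", "A", "A", "B", "B", "C", "D"))

# Keep the 2 most frequent levels and collapse the rest into "Other"
lumped <- fct_lump(beers, n = 2)

levels(lumped)
```

Here "C" and "D" are collapsed into a single "Other" level while "A" and "B" are kept.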

Use `str_to_upper` from the `stringr` package to convert the case of the `state` variable to uppercase.

Use `fct_relevel` from the `forcats` package to reorder the `medal` factor levels.

Use `fct_reorder` from the `forcats` package to sort the `beer_name` factor levels by `n`.

Use `glue` from the `glue` package to concatenate `beer_name` and `brewery` on the y-axis.

Use `ties.method = "first"` within `fct_lump` to show only the first `brewery` when a tie exists between them.

Use `setdiff` from the `dplyr` package and the built-in `state.abb` vector from the `datasets` package to check which states are missing from the dataset.
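The check can be sketched as follows; `states_in_data` is a hypothetical stand-in for the distinct `state` values in the medal data.

```r
library(dplyr)

# Hypothetical vector of uppercase state abbreviations seen in the data
states_in_data <- c("CA", "CO", "OR", "TX")

# Abbreviations in the built-in state.abb vector that never appear in the data
missing_states <- setdiff(state.abb, states_in_data)

length(missing_states)
```

Because `state.abb` holds the 50 US state abbreviations, anything outside that set (such as "DC") would need to be handled separately.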

Use `summarize` from the `dplyr` package to calculate the `number of medals` with `n_medals = n()`, `number of beers` with `n_distinct`, `number of gold medals` with `sum()`, and `weighted medal totals` using `sum(as.integer())`. Because `medal` is an ordered factor, this counts 1 for each bronze, 2 for each silver, and 3 for each gold.
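The summary described here might look like the sketch below. The toy data and the column names `n_beers`, `n_gold`, and `weighted_medals` are assumptions for illustration; only `n_medals = n()` appears in the notes.

```r
library(dplyr)

# Hypothetical toy data; medal is an ordered factor Bronze < Silver < Gold
medals <- tibble(
  state     = c("CO", "CO", "CO", "CA", "CA"),
  beer_name = c("A", "A", "B", "C", "D"),
  medal     = factor(
    c("Gold", "Bronze", "Silver", "Gold", "Silver"),
    levels = c("Bronze", "Silver", "Gold"),
    ordered = TRUE
  )
)

by_state <- medals %>%
  group_by(state) %>%
  summarize(
    n_medals        = n(),                     # total medals
    n_beers         = n_distinct(beer_name),   # distinct beers
    n_gold          = sum(medal == "Gold"),    # gold medals only
    weighted_medals = sum(as.integer(medal))   # Bronze = 1, Silver = 2, Gold = 3
  )

by_state
```

`as.integer()` on a factor returns the position of each level, which is why the level order determines the weights.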

Import the `Craft Beers Dataset` from `Kaggle` using `read_csv` from the `readr` package.

Use `inner_join` from the `dplyr` package to join together the 2 datasets from `kaggle`.

Use `semi_join` from the `dplyr` package to check whether the beer names match those in the `kaggle` dataset. This ends up at a dead end, with not enough matches between the datasets.

Use `bind_log_odds` from the `tidylo` package to show the representation of each beer category for each state compared to the categories across the other states.
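A minimal sketch of the weighted log odds step, assuming the `tidylo` package is installed and using made-up counts (`category_counts` is hypothetical):

```r
library(dplyr)
library(tidylo)

# Hypothetical counts of medals per state and beer category
category_counts <- tibble(
  state    = c("CO", "CO", "CA", "CA", "OR", "OR"),
  category = c("IPA", "Stout", "IPA", "Stout", "IPA", "Stout"),
  n        = c(10L, 2L, 4L, 8L, 6L, 6L)
)

# set = the groups being compared, feature = the item counted within each group
category_log_odds <- category_counts %>%
  bind_log_odds(set = state, feature = category, n = n) %>%
  arrange(desc(log_odds_weighted))

category_log_odds
```

Positive `log_odds_weighted` values mark categories that are over-represented in a state relative to the others, and negative values mark under-represented ones.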

Use `complete` from the `tidyr` package to turn implicit missing values into explicit missing values.
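To illustrate, the sketch below completes a hypothetical count table so that every state and category combination has a row. Filling the new rows with 0 is an assumption here; without `fill`, they would be `NA`.

```r
library(dplyr)
library(tidyr)

# Hypothetical counts: CA has no row for "Stout"
counts <- tibble(
  state    = c("CO", "CO", "CA"),
  category = c("IPA", "Stout", "IPA"),
  n        = c(3L, 1L, 2L)
)

# Add a row for every state x category combination, filling missing counts with 0
counts_full <- counts %>%
  complete(state, category, fill = list(n = 0L))

counts_full
```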

Use `reorder_within` and `scale_y_reordered` from the `tidytext` package to reorder the bars within each facet panel.

Use `fct_reorder` from the `forcats` package to reorder the `facet panels` in descending order.

For the previous plot, use `fill = log_odds_weighted > 0` in the `ggplot` `aes` argument to highlight the positive and negative values.

Use `add_count` from the `dplyr` package to add a `year_total` variable, which shows the total awards for each year. Then use this to calculate each state's percentage of that year's total medals using `mutate(pct_year = n / year_total)`.

Use `glm` from the `stats` package to create a `logistic regression` model to find out if there is a statistical trend in the probability of award success over time.
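A single-state version of such a model might look like the sketch below. The data are simulated, and the success/failure formulation `cbind(n, year_total - n) ~ year` is an assumption about how the response was set up, not something stated in the notes.

```r
set.seed(1)

# Simulated data for one hypothetical state: medals won (n) out of all
# medals awarded that year (year_total)
co <- data.frame(year = 1990:2019, year_total = 200L)
co$n <- rbinom(nrow(co), size = co$year_total, prob = 0.1)

# Binomial (logistic) regression: does the share of medals change with year?
fit <- glm(cbind(n, year_total - n) ~ year, data = co, family = "binomial")

summary(fit)$coefficients
```

The `year` coefficient is on the log odds scale: a positive estimate suggests the state's share of medals grows over time.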

Expand on the previous model by using the `broom` package to fit multiple `logistic regressions` across multiple states instead of doing it for an individual state at a time.

Use `conf.int = TRUE` to add `confidence bounds` to the `logistic regression` output, then use it to create a `TIE Fighter` plot to show which states become more or less frequent medal winners over time.
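One way to sketch the per-state models and the TIE Fighter plot is shown below. The data are simulated, and using `group_modify` to fit one model per state is an assumption; the screencast may have used a nest-and-map workflow instead.

```r
library(dplyr)
library(broom)
library(ggplot2)

set.seed(2020)

# Simulated yearly medal counts for three hypothetical states
by_year_state <- expand.grid(
  state = c("CA", "CO", "OR"),
  year  = 1990:2019,
  stringsAsFactors = FALSE
)
by_year_state$year_total <- 200L
by_year_state$n <- rbinom(nrow(by_year_state), size = 200L, prob = 0.1)

# Fit one logistic regression per state and keep the year term with its bounds
models <- by_year_state %>%
  group_by(state) %>%
  group_modify(~ tidy(
    glm(cbind(n, year_total - n) ~ year, data = .x, family = "binomial"),
    conf.int = TRUE
  )) %>%
  ungroup() %>%
  filter(term == "year")

# TIE Fighter plot: point estimate with a horizontal confidence interval
tie_fighter <- ggplot(models, aes(estimate, state)) +
  geom_point() +
  geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) +
  geom_vline(xintercept = 0, lty = 2)
```

Intervals that sit entirely to one side of the dashed zero line point to states whose medal share is trending up or down.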

Use the `state.name` dataset with `match` from `base R` to change each state abbreviation to the state name.
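The lookup can be sketched with the built-in `state.abb` and `state.name` vectors; `abbrs` is a hypothetical input.

```r
# Hypothetical vector of state abbreviations
abbrs <- c("CO", "CA", "OR")

# match() finds each abbreviation's position in state.abb;
# that position indexes the corresponding full name in state.name
full_names <- state.name[match(abbrs, state.abb)]

full_names
```

Abbreviations that are not in `state.abb` come back as `NA`.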

Summary of screencast.