Great American Beer Festival
Notable topics: Log odds ratio, Logistic regression, TIE Fighter plot
Recorded on: 2020-10-19
Timestamps by: Eric Fletcher
Screencast
Timestamps
Use mutate(value = 1) together with pivot_wider from the tidyr package, setting values_fill = list(value = 0), to pivot the medal variable from long to wide, adding a 1 for the medal type awarded and a 0 for the remaining medal types in each row.
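As a rough illustration of the pivot described above, here is a minimal sketch. The beer_awards tibble and its beer names are made up for the example and are not the screencast's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per awarded medal
beer_awards <- tibble(
  beer_name = c("Alpha Ale", "Beta Bock", "Gamma Gose"),
  medal     = c("Gold", "Silver", "Bronze")
)

medals_wide <- beer_awards %>%
  mutate(value = 1) %>%                # marker for the medal that was awarded
  pivot_wider(
    names_from  = medal,               # one column per medal type
    values_from = value,
    values_fill = list(value = 0)      # medal types not awarded become 0
  )

medals_wide
```

Each row ends up with a single 1 in the column of the medal it won and 0 in the other medal columns.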
Use fct_lump from the forcats package to lump together all the beers except for the N most frequent.
Use str_to_upper from the stringr package to convert the case of the state variable to uppercase.
Use fct_relevel from the forcats package to reorder the medal factor levels.
Use fct_reorder from the forcats package to order the beer_name factor levels by n.
Use glue from the glue package to concatenate beer_name and brewery on the y-axis.
Use ties.method = "first" within fct_lump to show only the first brewery when a tie exists between them.
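To make the effect of the tie-breaking argument concrete, here is a small sketch on a made-up factor of brewery names (hypothetical values, not from the dataset). Two breweries are tied for most frequent, so asking for the top 1 behaves differently depending on how ties are ranked.

```r
library(forcats)

# Hypothetical breweries: "A Brewing" and "B Brewing" are tied with 2 rows each
brewery <- factor(c("A Brewing", "A Brewing", "B Brewing", "B Brewing", "C Brewing"))

# With the default tie handling, both tied breweries rank first, so both are kept
lumped_default <- fct_lump(brewery, n = 1)

# With ties.method = "first", only the first tied brewery is kept;
# everything else is lumped into "Other"
lumped_first <- fct_lump(brewery, n = 1, ties.method = "first")

levels(lumped_default)
levels(lumped_first)
```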
Use setdiff from the dplyr package and the built-in state.abb vector from the datasets package to check which states are missing from the dataset.
Use summarize from the dplyr package to calculate the number of medals with n_medals = n(), the number of beers with n_distinct, the number of gold medals with sum(), and a weighted medal total with sum(as.integer(medal)). Because medal is an ordered factor, this counts 1 for each bronze, 2 for each silver, and 3 for each gold.
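The summary described above might look roughly like the following. The data are invented for illustration, and the column names n_beers, n_gold, and weighted_medals are assumptions; only n_medals is named in the notes.

```r
library(dplyr)

# Hypothetical toy data; medal is an ordered factor Bronze < Silver < Gold
beer_awards <- tibble(
  state     = c("CO", "CO", "CO", "CA"),
  beer_name = c("Alpha Ale", "Alpha Ale", "Beta Bock", "Gamma Gose"),
  medal     = factor(c("Gold", "Bronze", "Silver", "Gold"),
                     levels = c("Bronze", "Silver", "Gold"),
                     ordered = TRUE)
)

by_state <- beer_awards %>%
  group_by(state) %>%
  summarize(
    n_medals        = n(),                    # total medals
    n_beers         = n_distinct(beer_name),  # distinct beers that won
    n_gold          = sum(medal == "Gold"),   # gold medals only
    weighted_medals = sum(as.integer(medal))  # Bronze = 1, Silver = 2, Gold = 3
  )

by_state
```

The as.integer() call works here because it returns each level's position in the ordered factor, so the weights follow directly from the level order.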
Import Craft Beers Dataset from Kaggle using read_csv from the readr package.
Use inner_join from the dplyr package to join the two datasets from Kaggle.
Use semi_join from the dplyr package to check whether the beer names match those in the Kaggle dataset. This ends up at a dead end, with not enough matches between the datasets.
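A quick way to see how semi_join can be used as a match check is sketched below. Both tibbles and the name and abv columns are hypothetical stand-ins for the festival data and the Kaggle data.

```r
library(dplyr)

# Hypothetical stand-ins for the two datasets
beer_awards  <- tibble(beer_name = c("Alpha Ale", "Beta Bock", "Gamma Gose"))
kaggle_beers <- tibble(name = c("Alpha Ale", "Delta Dubbel"),
                       abv  = c(0.05, 0.08))

# semi_join keeps rows of beer_awards that have a match in kaggle_beers,
# without adding any columns from kaggle_beers
matched <- beer_awards %>%
  semi_join(kaggle_beers, by = c("beer_name" = "name"))

# Share of awarded beers found in the other dataset
nrow(matched) / nrow(beer_awards)
```

A low share here is the kind of signal that suggests the join is not worth pursuing.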
Use bind_log_odds from the tidylo package to show the representation of each beer category for each state compared to the categories across the other states.
Use complete from the tidyr package to turn implicit missing values into explicit missing values.
Use reorder_within and scale_y_reordered from the tidytext package to reorder the bars within each facet panel.
Use fct_reorder from the forcats package to reorder the facet panels in descending order.
For the previous plot, use fill = log_odds_weighted > 0 in the ggplot aes argument to highlight the positive and negative values.
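The log odds steps above can be sketched end to end as follows. The counts are invented, and filling the completed combinations with n = 0 is an assumption about how the gaps were handled.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(tidylo)
library(tidytext)

# Hypothetical medal counts per state and beer category
category_counts <- tibble(
  state    = c("CA", "CA", "CO", "CO", "OR"),
  category = c("IPA", "Stout", "IPA", "Lager", "Stout"),
  n        = c(10, 2, 3, 8, 6)
)

log_odds <- category_counts %>%
  # make every state/category pair explicit (assumed fill of 0 medals)
  complete(state, category, fill = list(n = 0)) %>%
  # adds a log_odds_weighted column comparing each state with the others
  bind_log_odds(state, category, n)

p <- log_odds %>%
  # order the bars separately inside each facet
  mutate(category = reorder_within(category, log_odds_weighted, state)) %>%
  ggplot(aes(log_odds_weighted, category, fill = log_odds_weighted > 0)) +
  geom_col() +
  scale_y_reordered() +
  facet_wrap(~ state, scales = "free_y") +
  theme(legend.position = "none")
```

Printing p draws one panel per state, with over-represented categories in one fill color and under-represented categories in the other.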
Use add_count from the dplyr package to add a year_total variable showing the total awards for each year. Then use it to calculate each state's percentage of that year's total medals with mutate(pct_year = n / year_total).
Use glm from the stats package to create a logistic regression model to find out whether there is a statistical trend in the probability of award success over time.
Expand on the previous model by using the broom package to fit logistic regressions across multiple states at once instead of one state at a time.
Use conf.int = TRUE to add confidence bounds to the logistic regression output then use it to create a TIE Fighter plot to show which states become more or less frequent medal winners over time.
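One possible way to put the modeling steps above together is sketched here. The counts are made up, and the nest-and-map pattern, the cbind(successes, failures) response, and the object names are assumptions rather than a record of the screencast's exact code.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(ggplot2)

# Hypothetical medal counts per state and year
by_year_state <- tibble(
  state = rep(c("CA", "CO", "OR"), each = 6),
  year  = rep(2015:2020, times = 3),
  n     = c(10, 12, 14, 15, 18, 20,
            20, 18, 17, 15, 13, 12,
             8,  9,  8, 10,  9,  8)
) %>%
  add_count(year, wt = n, name = "year_total")  # total medals awarded each year

# Fit one logistic regression per state: medals won vs. medals not won, by year
state_trends <- by_year_state %>%
  nest(data = -state) %>%
  mutate(
    model = map(data, function(d) {
      glm(cbind(n, year_total - n) ~ year, data = d, family = "binomial")
    }),
    tidied = map(model, tidy, conf.int = TRUE)  # adds conf.low and conf.high
  ) %>%
  unnest(tidied) %>%
  filter(term == "year")                        # keep only the time trend

# TIE Fighter plot: point estimate with a horizontal confidence interval
# (newer ggplot2 versions may suggest geom_errorbar(orientation = "y") instead)
p <- ggplot(state_trends, aes(estimate, reorder(state, estimate))) +
  geom_point() +
  geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  labs(x = "Estimated change in log odds per year", y = NULL)
```

States whose interval sits entirely to the right of zero are winning a growing share of medals over time, and those entirely to the left a shrinking share.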
Use the state.name dataset with match from base R to change state abbreviations to state names.
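The lookup above relies on state.abb and state.name being parallel built-in vectors. A minimal sketch, with a made-up vector of abbreviations:

```r
# Hypothetical abbreviations to translate
abbreviations <- c("CA", "CO", "OR")

# match() finds each abbreviation's position in state.abb;
# indexing state.name with those positions returns the full names
full_names <- state.name[match(abbreviations, state.abb)]

full_names
```

Abbreviations that are not in state.abb, such as "DC", come back as NA with this approach.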
Summary of screencast.