Board Game Reviews
Notable topics: LASSO regression (glmnet package)
Recorded on: 2019-03-14
Timestamps by: Alex Cookson
Screencast
Timestamps
Starting EDA (exploratory data analysis) with counts of categorical variables
Specifying scale_x_log10 function's breaks argument to get sensible tick marks for time on the histogram
Tweaking geom_histogram function's binwidth argument to get bins that make sense on a log scale (with scale_x_log10, binwidth is measured in log10 units)
Using separate_rows to break down comma-separated values for three different categorical variables
Using top_n to get the top 20 observations from each of several categories (not quite right, fixed at 17:47)
Troubleshooting various issues with the faceted graph (e.g., ordering, values appearing in multiple categories)
Starting prediction of average rating with a linear model
Splitting data into train/test sets (training/holdout)
Investigating the relationship between max number of players and average rating (to determine whether it should be in the linear model)
Exploring average rating over time ("Do newer games tend to be rated higher/lower?")
Discussing the need to control for the year a game was published in the linear model
Non-model approach to exploring the effect of game features (e.g., card game, made in Germany) on average rating
Using geom_boxplot function to create boxplot of average ratings for most common game features
Using unite function to combine multiple variables into one
Introducing Lasso regression as a good option when you have many features that are likely to be correlated with one another
Writing code to set up Lasso regression using the glmnet and tidytext packages
Adding average rating to the feature matrix (warning: method is messy)
Using setdiff function to find games that are in one set, but not in another (while setting up matrix for Lasso regression)
Spotting the error stemming from the step above (calling row names from the wrong data)
Explaining what a Lasso regression does, including the penalty parameter lambda
Using a cross-validated Lasso model to choose the level of the penalty parameter (lambda)
Adding non-categorical variables to the Lasso model to control for them (e.g., max number of players)
Using unite function to combine multiple variables into one, separated by a colon
Graphing the top 20 coefficients in the Lasso model that have the biggest effect on predicted average rating
Mentioning the yardstick package as a way to evaluate the model's performance
Discussing drawbacks of linear models like Lasso (they can't capture non-linear relationships or interaction effects)
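For reference, here is the penalized objective that glmnet minimizes for a Gaussian response with alpha = 1 (its default, which gives the Lasso). The penalty parameter lambda in this objective is the value chosen by cross-validation in the screencast. The notation is ours. Note also that glmnet standardizes predictors by default before fitting.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
\;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The intercept is not penalized. A larger lambda pushes more of the coefficients exactly to zero, and this is what makes the Lasso usable with many correlated game features.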
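In the screencast, the data-shaping steps are done in R with tidyr, tidytext and base R. These are separate_rows, unite with a colon separator, and the setdiff check while building the Lasso matrix.

The Python sketch below only mirrors the idea with plain dictionaries so that it runs on its own. Several pieces are hypothetical and are not the tidyr or tidytext APIs:
- the toy games and ratings
- the helper names separate_rows, to_features and cast_matrix

The last lines show why the outcome should be taken in the matrix's own row order.

```python
# Hypothetical toy data standing in for the board game table: each game has
# comma-separated strings for two categorical variables.
games = [
    {"game_id": 1, "category": "Card Game,Fantasy", "mechanic": "Hand Management"},
    {"game_id": 2, "category": "Wargame", "mechanic": "Dice Rolling,Hex-and-Counter"},
    {"game_id": 3, "category": "Card Game", "mechanic": ""},
]
# Hypothetical outcome table; game 4 has no feature rows at all.
ratings = {1: 6.5, 2: 7.1, 3: 5.9, 4: 6.0}


def separate_rows(rows, column, sep=","):
    """One output row per delimited value in `column` (the idea behind separate_rows).

    Empty values are dropped in this sketch.
    """
    out = []
    for row in rows:
        for value in row[column].split(sep):
            value = value.strip()
            if value:
                out.append({**row, column: value})
    return out


def to_features(rows, columns):
    """Long (game_id, feature) pairs with feature = 'column:value' (like unite(..., sep = ":"))."""
    pairs = set()
    for col in columns:
        for row in separate_rows(rows, col):
            pairs.add((row["game_id"], f"{col}:{row[col]}"))
    return pairs


def cast_matrix(pairs):
    """Dense 0/1 game-by-feature matrix; row order is given by the returned `ids`."""
    ids = sorted({g for g, _ in pairs})
    feats = sorted({f for _, f in pairs})
    col_index = {f: j for j, f in enumerate(feats)}
    matrix = {g: [0] * len(feats) for g in ids}
    for g, f in pairs:
        matrix[g][col_index[f]] = 1
    return ids, feats, matrix


ids, feats, matrix = cast_matrix(to_features(games, ["category", "mechanic"]))

# setdiff-style check: games that have a rating but no row in the feature matrix.
missing = sorted(set(ratings) - set(ids))

# Take the outcome in the matrix's own row order so each rating lines up
# with the feature row of the same game.
y = [ratings[g] for g in ids]

print(feats)
print(missing)  # games to drop before fitting
```

A real analysis would use a sparse matrix rather than dense lists. The dense form is used here only to keep the row alignment easy to see.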
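To make the penalty parameter lambda and its cross-validated choice concrete, here is a minimal Python sketch. It fits the Lasso by cyclical coordinate descent with soft-thresholding. It then runs a simple k-fold search over a lambda grid, in the spirit of cv.glmnet, whose fitted object reports lambda.min and lambda.1se.

This is not glmnet's implementation, and it differs in several ways:
- It does not standardize predictors.
- It uses fixed interleaved folds rather than random ones.
- It uses a small hand-picked lambda grid.
- It runs on synthetic data in which only the first two of five features matter.

```python
import random


def soft_threshold(z, gamma):
    """sign(z) * max(|z| - gamma, 0): shrink toward zero, or set exactly to zero."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Cyclical coordinate descent for
    (1/(2n)) * sum_i (y_i - b0 - x_i . beta)^2 + lam * sum_j |beta_j|,
    with an unpenalized intercept b0. No standardization (unlike glmnet's default).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    b0 = 0.0
    resid = list(y)  # residuals y - b0 - X beta for the current estimates
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        # Intercept: the mean residual minimizes the squared-error term exactly.
        shift = sum(resid) / n
        b0 += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            if col_sq[j] == 0:
                continue  # all-zero column: coefficient stays 0
            old = beta[j]
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                for i in range(n):
                    resid[i] -= X[i][j] * (new - old)
                beta[j] = new
    return b0, beta


def predict(b0, beta, X):
    return [b0 + sum(x * b for x, b in zip(row, beta)) for row in X]


def cv_lasso(X, y, lambdas, k=5):
    """Mean held-out squared error per lambda over k interleaved folds.

    Returns the lambda with the lowest error (similar in spirit to cv.glmnet's
    lambda.min) and the error for every lambda.
    """
    n = len(X)
    folds = [list(range(f, n, k)) for f in range(k)]
    errors = {}
    for lam in lambdas:
        sq_err = 0.0
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            b0, beta = lasso_fit([X[i] for i in train_idx], [y[i] for i in train_idx], lam)
            preds = predict(b0, beta, [X[i] for i in test_idx])
            sq_err += sum((y[i] - yhat) ** 2 for i, yhat in zip(test_idx, preds))
        errors[lam] = sq_err / n
    best = min(lambdas, key=lambda lam: errors[lam])
    return best, errors


# Synthetic data: only the first two of five features matter.
rng = random.Random(42)
n, p = 60, 5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 + 2 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5) for row in X]

lambdas = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
best_lam, cv_errors = cv_lasso(X, y, lambdas)
b0, beta = lasso_fit(X, y, best_lam)
print("chosen lambda:", best_lam)
print("coefficients:", [round(b, 3) for b in beta])
print("with lambda = 10:", lasso_fit(X, y, 10.0)[1])  # every coefficient shrunk to 0
```

Moving up the lambda grid zeroes out more coefficients. The cross-validation step trades that sparsity off against held-out error.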