# Wine Ratings

Notable topics: Text mining (tidytext package), LASSO regression (glmnet package)

Recorded on: 2019-05-30

Timestamps by: Alex Cookson

## Screencast

## Timestamps

Using extract function from tidyr package to pull out year from text field

Changing extract function to pull out year column more accurately
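As a rough illustration of the two steps above, here is a minimal sketch of pulling a year out of a text field with `extract`. The `wines` data frame, its titles, and the exact regular expression are assumptions for illustration, not necessarily what the screencast uses.

```r
library(tidyr)

# Hypothetical titles; the real data has one title per wine review.
wines <- data.frame(
  title = c("Quinta dos Avidagos 2011 Avidagos Red (Douro)",
            "Rainstorm 2013 Pinot Gris (Willamette Valley)",
            "Nicosia NV Brut (Sicilia)"),
  stringsAsFactors = FALSE
)

# Capture a four-digit number starting with 19 or 20.
# Titles with no such number get NA; convert = TRUE turns the text into integers.
wines <- extract(wines, title, into = "year",
                 regex = "(19\\d\\d|20\\d\\d)",
                 remove = FALSE, convert = TRUE)

wines$year
```

Restricting the pattern to plausible years (rather than any four digits) is one way to make the extraction more accurate, since titles can contain other numbers.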

Starting to explore prediction of points

Using fct_lump on country variable to collapse countries into an "Other" category, then fct_relevel to set the baseline category for a linear model
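A minimal sketch of that lump-then-relevel pattern, using a made-up `country` vector and an arbitrary cutoff of two countries (both are assumptions for illustration):

```r
library(forcats)

# Hypothetical country values
country <- c("US", "US", "US", "France", "France", "Italy", "Chile", "Peru")

# Keep the 2 most common countries; everything else becomes "Other"
lumped <- fct_lump(country, n = 2)

# Move "US" to the first level so a linear model treats it as the baseline
lumped <- fct_relevel(lumped, "US")

levels(lumped)
```

With the baseline set this way, each country coefficient in `lm()` is read as a difference from the first level.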

Investigating year as a potential confounding variable

Investigating "taster_name" as a potential confounding variable

Coefficient (TIE fighter) plot to see effect size of terms in a linear model, using tidy function from broom package
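The coefficient plot can be sketched roughly as follows. The data here are simulated and the model formula is an assumption, so only the shape of the code is meant to carry over.

```r
library(broom)
library(ggplot2)

# Simulated stand-in for the wine data (hypothetical values)
set.seed(1)
wines <- data.frame(
  price   = exp(rnorm(200, 3, 0.6)),
  country = sample(c("US", "France", "Italy"), 200, replace = TRUE)
)
wines$points <- 80 + 2.5 * log2(wines$price) + rnorm(200)

model <- lm(points ~ log2(price) + country, data = wines)

# One row per term, with estimate and confidence interval
coefs <- tidy(model, conf.int = TRUE)
coefs <- coefs[coefs$term != "(Intercept)", ]

# Point = estimate, horizontal segment = confidence interval ("TIE fighter" shape)
p <- ggplot(coefs, aes(estimate, term)) +
  geom_point() +
  geom_segment(aes(x = conf.low, xend = conf.high, y = term, yend = term)) +
  geom_vline(xintercept = 0, lty = 2)
```

Printing `p` draws the plot; terms whose interval crosses the dashed line at zero are not clearly distinguishable from no effect.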

Polishing category names for presentation in graph using str_replace function

Using augment function to add predictions of linear model to original data

Plotting predicted points vs. actual points

Using ANOVA to determine the amount of variation that is explained by different terms
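A sketch of the augment and ANOVA steps, again on simulated data with an assumed model formula. The `share` column is a name introduced here for illustration.

```r
library(broom)

# Simulated stand-in for the wine data (hypothetical values)
set.seed(1)
wines <- data.frame(
  price   = exp(rnorm(200, 3, 0.6)),
  country = sample(c("US", "France", "Italy"), 200, replace = TRUE)
)
wines$points <- 80 + 2.5 * log2(wines$price) + rnorm(200)

model <- lm(points ~ log2(price) + country, data = wines)

# Original columns plus .fitted (predicted points), .resid, and other diagnostics
preds <- augment(model, data = wines)

# Sequential sums of squares per term, expressed as a share of the total
aov_tbl <- tidy(anova(model))
aov_tbl$share <- aov_tbl$sumsq / sum(aov_tbl$sumsq)

aov_tbl[, c("term", "sumsq", "share")]
```

Plotting `preds$.fitted` against `preds$points` gives the predicted vs. actual comparison; the `Residuals` row in `aov_tbl` is the share of variation the model leaves unexplained.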

Using tidytext package to set up wine review text for Lasso regression

Setting up and using pairwise_cor function to look at words that appear in reviews together

Creating a sparse matrix with the cast_sparse function from the tidytext package, to be used in a regression on positive/negative words

Checking if rownames of sparse matrix correspond to the wine_id values they represent

Setting up the sparse matrix for the glmnet package to do sparse regression using the Lasso method
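The sparse-matrix setup and the rowname check might look something like the sketch below. The `wine_words` and `wines` tables are tiny made-up examples, assuming one row per wine and word pair as `unnest_tokens` would produce.

```r
library(tidytext)

# Hypothetical one-row-per-(wine, word) table
wine_words <- data.frame(
  wine_id = c(1, 1, 2, 3, 3, 3),
  word    = c("cherry", "oak", "cherry", "oak", "tannins", "cherry"),
  stringsAsFactors = FALSE
)

# Hypothetical review scores; wines 4 and 5 have no words in wine_words
wines <- data.frame(wine_id = 1:5, points = c(88, 91, 85, 93, 90))

# Rows = wines, columns = words, value 1 where the word appears
wine_word_matrix <- cast_sparse(wine_words, wine_id, word)

# Row names are the wine_id values as text; use them to line up the response
wine_ids <- as.integer(rownames(wine_word_matrix))
points   <- wines$points[match(wine_ids, wines$wine_id)]
```

Matching on the row names rather than assuming row order keeps the response vector aligned with the matrix even when some wines drop out during tokenization.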

Writing the code to fit the Lasso regression

Basic explanation of Lasso regression

Putting Lasso model into tidy format

Explaining how the number of terms increases as lambda (penalty parameter) decreases

Answering how we choose a lambda value (penalty parameter) for Lasso regression
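To make the lambda discussion concrete, here is a minimal sketch on simulated data. The word columns and true effects are invented, and a dense matrix is used for brevity where the screencast uses a sparse one.

```r
library(glmnet)

set.seed(2019)
n <- 200
p <- 30

# Hypothetical word-indicator matrix: 1 if the word appears in the review
x <- matrix(as.numeric(rbinom(n * p, 1, 0.2)), n, p,
            dimnames = list(NULL, paste0("word", seq_len(p))))

# Only the first three hypothetical words truly affect the score
y <- 88 + 2 * x[, 1] - 1.5 * x[, 2] + 1 * x[, 3] + rnorm(n)

# alpha = 1 (the default) gives the Lasso penalty
fit <- glmnet(x, y)

# Number of non-zero coefficients at each lambda along the path
path <- data.frame(lambda = fit$lambda, n_terms = fit$df)

# 10-fold cross-validation (the default) to choose lambda
cv <- cv.glmnet(x, y)
cv$lambda.min   # lambda with the lowest cross-validated error
cv$lambda.1se   # largest lambda within one standard error of that minimum
```

In `path`, the number of terms generally grows as lambda shrinks. `lambda.1se` is the more conservative choice, trading a little accuracy for a smaller model. The full path fitted inside the cross-validation object is available as `cv$glmnet.fit`.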

Using parallelization for intensive computations

Adding price (from original linear model) to Lasso regression

Showing the glmnet.fit piece of a Lasso (glmnet) model

Picking a lambda value (penalty parameter) and explaining which one to pick

Taking the most extreme coefficients (positive and negative) by grouping them by direction

Demonstrating tidytext package's sentiment lexicon, then looking at individual reviews to demonstrate the model

Visualizing each coefficient's effect on a single review

Using str_trunc to truncate character strings
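For reference, a small sketch of `str_trunc` on a made-up review string (the text and the width of 30 are assumptions for illustration):

```r
library(stringr)

# Hypothetical review text
review <- "Aromas include tropical fruit, broom, brimstone and dried herb."

# Shorten to at most 30 characters, including the trailing "..."
short <- str_trunc(review, 30)
short
```

This is handy for keeping long review text readable in plot labels or printed tables.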