Animal Crossing

Topic modelling (stm package)

Published

May 4, 2020

Notable topics: Topic modelling (stm package)

Recorded on: 2020-05-04

Timestamps by: Alex Cookson

Timestamps

Starting text analysis of critic reviews of Animal Crossing

floor_date
lubridate

Using floor_date function from lubridate package to round dates down to nearest month (then week)
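As a minimal sketch of this rounding step (the dates below are made up for illustration and are not taken from the review data), floor_date can bucket dates by month or by week:

```r
library(lubridate)

# Hypothetical review dates, for illustration only
review_dates <- as.Date(c("2020-03-20", "2020-04-02", "2020-04-15"))

# Round each date down to the first day of its month
by_month <- floor_date(review_dates, unit = "month")

# Round each date down to the start of its week
# (week_start = 7 means weeks begin on Sunday, lubridate's default)
by_week <- floor_date(review_dates, unit = "week", week_start = 7)
```

Rounding to the week gives more points on a time-series plot than rounding to the month, which is presumably why the screencast switches units.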

unnest_tokens, anti_join
tidytext, dplyr

Using unnest_tokens function from tidytext package and anti_join function from dplyr package to break reviews into individual words and remove stop words
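A rough sketch of the tokenizing step, assuming a data frame with grade and text columns (both the column names and the two reviews are hypothetical). The stop_words dataset ships with tidytext, while anti_join itself comes from dplyr:

```r
library(dplyr)
library(tidytext)

# Hypothetical reviews, for illustration only
reviews <- tibble(
  grade = c(90, 40),
  text = c("A charming and relaxing game",
           "The multiplayer is a frustrating mess")
)

review_words <- reviews %>%
  unnest_tokens(word, text) %>%        # one row per lowercased word
  anti_join(stop_words, by = "word")   # drop common stop words such as "the"
```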

Taking the average rating associated with individual words (simple approach to gauge sentiment)
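One way to sketch the per-word average, assuming word-level data with one row per review-word pair (the words and grades below are invented):

```r
library(dplyr)

# Hypothetical word-level data: one row per (review, word) pair
review_words <- tibble(
  grade = c(90, 80, 40, 30, 85),
  word  = c("charming", "charming", "frustrating", "frustrating", "relaxing")
)

word_ratings <- review_words %>%
  group_by(word) %>%
  summarize(avg_grade = mean(grade),   # average grade of reviews using the word
            n_reviews = n()) %>%       # how many rows back that average
  arrange(desc(avg_grade))
```

Keeping the count alongside the average makes it easy to filter out words that appear in only a handful of reviews.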

geom_line

Using geom_line and geom_point to graph ratings over time
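A small sketch of this kind of plot, using invented weekly averages rather than the real review data:

```r
library(ggplot2)

# Hypothetical weekly averages, for illustration only
ratings_by_week <- data.frame(
  week = as.Date(c("2020-03-15", "2020-03-22", "2020-03-29", "2020-04-05")),
  avg_grade = c(8.1, 6.4, 5.2, 4.9)
)

ratings_plot <- ggplot(ratings_by_week, aes(week, avg_grade)) +
  geom_line() +    # connect the weekly averages
  geom_point() +   # mark each individual week
  labs(x = "Week", y = "Average grade")
```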

mean

Using mean function and logical statement to calculate percentages that meet a certain condition
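The trick works because R treats TRUE as 1 and FALSE as 0, so the mean of a logical vector is the share of TRUE values. A tiny sketch with made-up grades and an arbitrary threshold of 75:

```r
# Hypothetical grades, for illustration only
grades <- c(100, 90, 40, 75, 20)

# Share of reviews with a grade of at least 75 (3 of the 5 values)
pct_positive <- mean(grades >= 75)
```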

geom_text

Using geom_text to visualize what words are associated with positive/negative reviews
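A sketch of a word-level plot along these lines, assuming a summary table with one row per word (the values are invented and the axis choice is only one plausible layout):

```r
library(ggplot2)

# Hypothetical per-word summary, for illustration only
word_ratings <- data.frame(
  word      = c("charming", "relaxing", "multiplayer", "frustrating"),
  n_reviews = c(120, 80, 200, 60),
  avg_grade = c(8.5, 8.2, 3.1, 2.4)
)

word_plot <- ggplot(word_ratings, aes(n_reviews, avg_grade)) +
  geom_text(aes(label = word), check_overlap = TRUE) +  # draw the words themselves
  labs(x = "Number of reviews containing the word", y = "Average grade")
```

Setting check_overlap = TRUE drops labels that would be drawn on top of earlier ones, which helps once there are many words.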

Disclaimer that this exploration is not text regression -- wine ratings screencast is a good resource for that

Starting to do topic modelling

stm
stm

Explanation of stm function from stm package

stm
stm

Explanation of stm function's output (topic modelling output)

Changing the number of topics from 4 to 6

Explanation of how topic modelling works conceptually

tidy
broom

Using tidy function from broom package to find which "documents" (reviews) were the "strongest" representation of each topic
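The fit-then-tidy workflow can be sketched on simulated data. Everything below is an assumption made for illustration: the vocabularies, the 20 simulated "reviews", and the choice of K = 2 are invented, and this is not the screencast's actual code. The sketch loads tidytext, which as far as I know supplies the tidy() method for stm models:

```r
library(dplyr)
library(tidytext)
library(stm)

set.seed(2020)

# Hypothetical vocabularies standing in for two themes in the reviews
vocab_a <- c("island", "villager", "relaxing", "charming", "decorate")
vocab_b <- c("multiplayer", "console", "save", "frustrating", "limit")

# Simulate 20 "reviews": each draws 25 words from one theme and 5 from the other
word_counts <- bind_rows(lapply(1:20, function(i) {
  main  <- if (i <= 10) vocab_a else vocab_b
  other <- if (i <= 10) vocab_b else vocab_a
  tibble(id = i,
         word = c(sample(main, 25, replace = TRUE),
                  sample(other, 5, replace = TRUE)))
})) %>%
  count(id, word)

# stm() accepts a sparse document-term matrix (rows = documents, columns = words)
review_matrix <- cast_sparse(word_counts, id, word, n)

topic_model <- stm(review_matrix, K = 2, init.type = "Spectral", verbose = FALSE)

# gamma = estimated share of each document that belongs to each topic;
# the highest-gamma documents are the "strongest" examples of a topic
strongest_docs <- tidy(topic_model, matrix = "gamma") %>%
  group_by(topic) %>%
  slice_max(gamma, n = 3) %>%
  ungroup()
```

Reading the original text of the highest-gamma documents is a quick sanity check on what each topic actually captures.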

Noting that there might be a scraping issue resulting in review text being repeated

str_sub

(Unsuccessfully) Using str_sub function to help fix repeated review text by locating where in the review the text starts being repeated

str_replace, map2

(Unsuccessfully) Using str_replace and map2_chr functions, as well as regex capturing groups to fix repeated text

Looking at the association between review grade and gamma of the topic model (how strongly a review represents a topic)

cor

Using cor function with method = "spearman" to calculate correlation based on rank instead of actual values
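A brief sketch of rank-based correlation on invented values (the grade and gamma vectors are hypothetical). Spearman correlation is the Pearson correlation of the ranks, so it measures whether the two variables rise and fall together without assuming a linear relationship:

```r
# Hypothetical grades and topic gammas for five reviews
grade <- c(100, 90, 80, 40, 20)
gamma <- c(0.90, 0.95, 0.50, 0.30, 0.05)

# Rank-based correlation
rho <- cor(grade, gamma, method = "spearman")

# Equivalent calculation: Pearson correlation of the ranks
rho_from_ranks <- cor(rank(grade), rank(gamma))
```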

Summary of screencast