Chopped
Notable topics: Data manipulation, Modeling (Linear Regression, Random Forest, and Natural Spline)
Recorded on: 2020-08-24
Timestamps by: Eric Fletcher
Screencast
Timestamps
Use geom_histogram to visualize the distribution of episode ratings.
Use geom_point and geom_line with color = factor(season) to visualize the episode rating for every episode.
Use group_by and summarize to show the average rating for each season and the number of episodes in each season.
Continuing from previous row:
Use geom_line and geom_point with size = n_episodes to visualize the average rating for each season with point size indicating the total number of episodes (larger = more episodes, smaller = fewer episodes).
Use fct_reorder to reorder the episode_name factor levels by sorting along the episode_rating variable.
Use geom_point to visualize the top episodes by rating.
Use the glue package to place the season number and episode number before the episode name on the y-axis.
Use pivot_longer to combine ingredients into one single column.
Use separate_rows with sep = ", " to separate out the ingredients with each ingredient getting its own row.
Use fct_lump to lump ingredients together except for the 10 most frequent.
Use fct_reorder to reorder ingredient factor levels by sorting against n.
Use geom_col to create a stacked bar plot to visualize the most common ingredients by course.
Use fct_relevel to reorder course factor levels to appetizer, entree, dessert.
Use fct_rev and scale_fill_discrete with guide = guide_legend(reverse = TRUE) to reorder the segments within the stacked bar plot.
Use the widyr package and pairwise_cor to find out which ingredients tend to appear together.
Mentioned: David Robinson - The {widyr} Package YouTube Talk at 2020 R Conference
Use ggraph with geom_edge_link, geom_node_point, and geom_node_text to create an ingredient network diagram showing how the ingredients are connected to one another.
Use pairwise_count from widyr to count the number of times each pair of items appears together within a group defined by feature.
Use unite from the tidyr package in order to paste together the episode_course and series_episode columns into one column to figure out if any pairs of ingredients appear together in the same course across episodes.
Use summarize with min, mean, max, and n() to create the first_season, avg_season, last_season, and n_appearances variables.
Use slice with tail to get the n ingredients that appear in early and late seasons.
Use geom_boxplot to visualize the distribution of each ingredient across all seasons.
Fit predictive models (linear regression, random forest, and natural spline) to determine whether episode rating is explained by the ingredients or the season.
Use pivot_wider with values_fill = list(value = 0) to spread the ingredients into columns, with 1 indicating the ingredient was used and 0 indicating it wasn't.
Summary of screencast.
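The ingredient reshaping steps listed above (pivot_longer, separate_rows, fct_lump, fct_relevel, fct_reorder) can be sketched roughly as follows. This is a minimal illustration, not the code from the screencast: the chopped_toy tibble, its column names, and its values are made up for the example, and fct_lump keeps only the 2 most frequent ingredients here (rather than 10) because the toy data is so small.

```r
library(dplyr)
library(tidyr)
library(forcats)

# Hypothetical stand-in for the Chopped data; names and values are invented.
chopped_toy <- tibble(
  series_episode = 1:3,
  appetizer = c("watermelon, sardines, pepper jack", "bacon, sardines", "bacon, kale"),
  entree    = c("bacon, duck", "duck, kale", "bacon, tofu"),
  dessert   = c("bacon, figs", "figs, kale", "bacon, honey")
)

ingredients <- chopped_toy %>%
  # Combine the three course columns into a single ingredient column
  pivot_longer(c(appetizer, entree, dessert),
               names_to = "course", values_to = "ingredient") %>%
  # Give each comma-separated ingredient its own row
  separate_rows(ingredient, sep = ", ")

ingredient_counts <- ingredients %>%
  # Keep the most frequent ingredients and lump the rest into "Other"
  mutate(ingredient = fct_lump(ingredient, n = 2)) %>%
  count(course, ingredient) %>%
  mutate(
    # Put the courses in serving order rather than alphabetical order
    course = fct_relevel(course, "appetizer", "entree", "dessert"),
    # Order ingredient levels by their total count across courses
    ingredient = fct_reorder(ingredient, n, .fun = sum)
  )

print(ingredient_counts)
```

A table shaped like ingredient_counts could then be passed to ggplot with geom_col and fill = course to draw the stacked bar plot described in the timestamps.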