Tour de France

Survival analysis, Animated bar graph (gganimate package)

Published: April 6, 2020

Recorded on: 2020-04-06

Timestamps by: Alex Cookson

Timestamps

Getting an overview of the data

%/%

Aggregating data into decades using the integer division operator %/% (e.g., 1987 %/% 10 gives 198)
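A minimal sketch of the decade trick in base R. The years below are made up for illustration and are not taken from the screencast's dataset.

```r
# Hypothetical years (assumption, not the screencast's data)
years <- c(1903, 1919, 1987, 2019)

# %/% divides and drops the remainder, so multiplying back by 10
# snaps each year to the start of its decade
decades <- 10 * (years %/% 10)
decades
```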

Noting that death data is right-censored (i.e., some winners are still alive)

transmute

Using transmute function, which combines functionality of mutate (to create new variables) and select (to choose variables to keep)
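A small sketch of transmute from dplyr, assuming a hypothetical winners table whose column names (winner_name, born, died) are invented here for illustration.

```r
library(dplyr)

# Hypothetical winners table (names and values are assumptions)
winners <- tibble(
  winner_name = c("Rider A", "Rider B"),
  born = c(1900, 1950),
  died = c(1970, NA)  # NA: still alive
)

# transmute() creates new columns like mutate(), but keeps only
# the columns named in the call, like select()
lifespans <- winners %>%
  transmute(winner_name, lifespan = died - born)
lifespans
```

The result has only winner_name and lifespan; born and died are dropped.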

survfit (survival package)

Using survfit function from survival package to conduct survival analysis
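A rough sketch of a Kaplan-Meier fit with survfit, assuming a hypothetical data frame in which age is the age at death (or current age for riders still alive) and dead flags whether the death was observed. This is not the screencast's actual code or data.

```r
library(survival)

# Hypothetical riders (assumption): dead = 0 means still alive,
# so that observation is right-censored
riders <- data.frame(
  age  = c(60, 72, 85, 45, 90, 38),
  dead = c(1, 1, 1, 0, 1, 0)
)

# Surv() pairs each time with its event indicator;
# "~ 1" fits a single survival curve with no covariates
fit <- survfit(Surv(age, dead) ~ 1, data = riders)
summary(fit)
```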

glance (broom package)

Using glance function from broom package to get a one-row model summary of the survival model
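As an illustration, glance can be applied to a survfit object to get a single summary row. The riders data frame below is hypothetical and stands in for the screencast's data.

```r
library(survival)
library(broom)

# Hypothetical riders (assumption): dead = 0 means right-censored
riders <- data.frame(
  age  = c(60, 72, 85, 45, 90, 38),
  dead = c(1, 1, 1, 0, 1, 0)
)

fit <- survfit(Surv(age, dead) ~ 1, data = riders)

# glance() collapses the fitted model into a one-row data frame
# of model-level statistics
model_summary <- glance(fit)
model_summary
```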

extract

Using extract function to pull out a string matching a regular expression from a variable (stage number in this case)
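A minimal sketch, assuming the function meant is extract from the tidyr package and that stage labels look like "Stage 12". Both the labels and the stage_number column name are assumptions.

```r
library(tidyr)

# Hypothetical stage labels (assumption)
stages <- data.frame(stage = c("Stage 1", "Stage 12", "Stage 3a"))

# The capture group (\\d+) grabs the first run of digits;
# convert = TRUE turns the captured text into a number,
# remove = FALSE keeps the original column alongside the new one
stage_numbers <- extract(
  stages, stage,
  into = "stage_number",
  regex = "(\\d+)",
  convert = TRUE,
  remove = FALSE
)
stage_numbers
```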

Theorizing that there is a parsing issue with the original data's time field

group_by

Using group_by function's built-in "peeling" feature, where a summarise call will "peel away" the last grouping variable but leave the other groupings intact
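The peeling behaviour can be sketched with dplyr on a hypothetical results table (the year, stage, and time columns are invented for illustration). The .groups argument is set explicitly here to spell out what summarise does by default.

```r
library(dplyr)

# Hypothetical stage results (assumption)
results <- tibble(
  year  = c(2018, 2018, 2018, 2019, 2019),
  stage = c(1, 1, 2, 1, 2),
  time  = c(10, 12, 11, 9, 13)
)

by_stage <- results %>%
  group_by(year, stage) %>%
  # Drops the last grouping variable (stage) and keeps year
  summarise(fastest = min(time), .groups = "drop_last")

# Only "year" remains as a grouping variable
group_vars(by_stage)
```

Because the result is still grouped by year, a following mutate or summarise operates within each year.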

rank, percent_rank

Using rank function, then upgrading to percent_rank function to give percentile rankings (between 0 and 1)
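To make the difference concrete, here is a short comparison on made-up finishing times. percent_rank comes from dplyr and rescales the minimum rank to the 0 to 1 range.

```r
library(dplyr)

# Hypothetical finishing times (assumption)
times <- c(50, 20, 40, 30, 10)

# rank(): position in sorted order, from 1 to n
ranks <- rank(times)

# percent_rank(): (rank - 1) / (n - 1), so the smallest value
# gets 0 and the largest gets 1
pct <- percent_rank(times)

ranks
pct
```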

geom_smooth

Using geom_smooth function with method argument as "lm" to plot a linear regression
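A sketch of a scatter plot with a fitted regression line, assuming two hypothetical percentile columns. The data is synthetic and the column names are not from the screencast.

```r
library(ggplot2)

# Synthetic percentiles (assumption): a noisy linear relationship
riders <- data.frame(first_stage_percentile = seq(0, 1, by = 0.1))
riders$overall_percentile <- 0.2 + 0.6 * riders$first_stage_percentile +
  rep(c(0.05, -0.05), length.out = 11)

p <- ggplot(riders, aes(first_stage_percentile, overall_percentile)) +
  geom_point() +
  # method = "lm" fits a straight line (with a confidence band)
  # instead of the default smoother
  geom_smooth(method = "lm", formula = y ~ x)
p
```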

cut

Using cut function to bin numbers (percentiles in this case) into categories
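A minimal base R sketch of binning percentiles into quartile-style categories. The values and break points are chosen for illustration only.

```r
# Hypothetical percentiles (assumption)
percentiles <- c(0.05, 0.30, 0.55, 0.80, 1.00)

# breaks define the bin edges; include.lowest = TRUE makes the
# first bin closed on the left so that 0 would not become NA
bins <- cut(
  percentiles,
  breaks = c(0, 0.25, 0.5, 0.75, 1),
  include.lowest = TRUE
)
table(bins)
```

The result is a factor, which is convenient as the grouping variable for boxplots.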

Reviewing boxplots exploring relationship between first-stage performance and overall Tour performance

gganimate

Starting to create an animation using gganimate package

Actually writing the code to create the animation
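A rough sketch of what an animated bar graph with gganimate might look like. The data, column names, and the choice of transition_time are assumptions and may differ from what the screencast actually builds.

```r
library(ggplot2)
library(gganimate)

# Hypothetical cumulative times per rider and stage (assumption)
stage_totals <- data.frame(
  rider = rep(c("Rider A", "Rider B"), times = 3),
  stage = rep(1:3, each = 2),
  cumulative_hours = c(4, 5, 9, 9.5, 13, 15)
)

anim <- ggplot(stage_totals, aes(rider, cumulative_hours)) +
  geom_col() +
  coord_flip() +
  # One frame sequence per stage value; {frame_time} is filled in
  # by gganimate for each frame
  transition_time(stage) +
  labs(title = "Stage: {frame_time}")

# Rendering is a separate step, e.g. animate(anim)
```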

reorder_within (tidytext package)

Using reorder_within function from tidytext package to re-order factors that have the same name across multiple groups
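A sketch of reorder_within with faceted bars, assuming a hypothetical table in which the same rider names appear in more than one year. scale_y_reordered, also from tidytext, strips the suffix that reorder_within appends to the labels.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Hypothetical times (assumption): the same riders in two years
stage_times <- tibble(
  year  = c(2018, 2018, 2019, 2019),
  rider = c("Rider A", "Rider B", "Rider A", "Rider B"),
  time  = c(10, 12, 14, 9)
)

# Order riders by time separately within each year
reordered <- stage_times %>%
  mutate(rider = reorder_within(rider, time, year))

p <- ggplot(reordered, aes(time, rider)) +
  geom_col() +
  scale_y_reordered() +
  facet_wrap(~ year, scales = "free_y")
p
```

Without reorder_within, a single factor ordering would be shared by every facet, so the bars could only be sorted correctly in one of them.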

Summary of screencast