Tennis Tournaments

Published

April 8, 2019

Recorded on: 2019-04-08

Timestamps by: Alex Cookson

Timestamps

Identifying duplicated rows and fixing them
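A minimal sketch of one way to find and drop duplicated rows with dplyr. The `finals` table, its column names, and the choice to keep the first row per key are assumptions for illustration, not the screencast's actual data or fix.

```r
library(dplyr)

# Hypothetical data: one row per final, with an accidental duplicate
finals <- tibble(
  year = c(2017, 2017, 2018),
  tournament = c("Tournament X", "Tournament X", "Tournament X"),
  winner = c("Player A", "Player A", "Player B")
)

# Key combinations that appear more than once
duplicates <- finals %>%
  count(year, tournament) %>%
  filter(n > 1)

# Keep only the first row for each year/tournament pair
finals_clean <- finals %>%
  distinct(year, tournament, .keep_all = TRUE)
```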

add_count, fct_reorder

Using add_count and fct_reorder functions to order categories that are broken down into sub-categories for graphing
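The ordering idea can be sketched as follows: `add_count` attaches each category's total to every row, and `fct_reorder` then orders the factor by that total. The `wins` table and the `total_wins` column name are made up for this example.

```r
library(dplyr)
library(forcats)

# Hypothetical data: one row per win, broken down by tournament
wins <- tibble(
  player = c("A", "A", "A", "B", "B", "C"),
  tournament = c("t1", "t2", "t1", "t1", "t2", "t2")
)

wins_ordered <- wins %>%
  # Total wins per player, repeated on every row for that player
  add_count(player, name = "total_wins") %>%
  # Order the player factor by that total (ascending)
  mutate(player = fct_reorder(player, total_wins))

levels(wins_ordered$player)
```

Because the total is stored per row, the sub-category (`tournament`) is still available for a fill or facet when graphing.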

str_to_title, str_replace

Tidying graph titles (e.g., replacing underscores with spaces) using str_to_title and str_replace functions
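A small sketch of the title clean-up, using made-up snake_case names as input. Note that `str_replace` only replaces the first match; `str_replace_all` would be needed for names with several underscores.

```r
library(stringr)

# Hypothetical raw labels
raw_names <- c("australian_open", "french_open")

# Replace the underscore with a space, then capitalize each word
titles <- str_to_title(str_replace(raw_names, "_", " "))
```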

inner_join

Using inner_join function to merge datasets
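As an illustration, an inner join keeps only rows whose key appears in both tables. The two tables and the `player` key below are assumptions for the example.

```r
library(dplyr)

# Hypothetical tables: tournament winners and player details
winners <- tibble(
  player = c("Player A", "Player B", "Player C"),
  year = c(2001, 2002, 2003)
)
players <- tibble(
  player = c("Player A", "Player B"),
  birth_date = as.Date(c("1980-01-15", "1981-06-30"))
)

# Player C has no match in `players`, so that row is dropped
joined <- inner_join(winners, players, by = "player")
```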

difftime, as.numeric

Calculating age from date of birth using difftime and as.numeric functions
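One way to sketch the age calculation: take the difference in days, convert it to a plain number, and divide by an average year length. The dates and the 365.25 divisor are assumptions for illustration.

```r
# Hypothetical dates
birth_date <- as.Date("1980-01-01")
win_date <- as.Date("2000-01-01")

# difftime returns a "difftime" object; as.numeric strips the units
age_days <- as.numeric(difftime(win_date, birth_date, units = "days"))

# Approximate age in years
age_years <- age_days / 365.25
```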

Adding simple calculations like mean and median to the text portion of a Markdown document

Looking at distribution of wins by sex using overlapping histograms

%/%

Binning years into decades using integer division %/%
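A short worked example with made-up years: integer division by 10 drops the last digit, and multiplying by 10 restores the decade.

```r
years <- c(1968, 1975, 1999, 2004)

# 1968 %/% 10 is 196, so 10 * 196 gives 1960
decades <- 10 * (years %/% 10)
```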

interaction

Splitting up boxplots so that they are separated into pairs (M/F) across a different group (decade) using interaction function
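A rough sketch of the grouping, assuming a hypothetical `winners` data frame with `decade`, `gender`, and `age` columns. `interaction` builds one factor level per decade/gender combination, so each combination gets its own box.

```r
library(ggplot2)

# Hypothetical data: winner ages by decade and gender
winners <- data.frame(
  decade = rep(c(1990, 2000), each = 4),
  gender = rep(c("F", "M"), times = 4),
  age = c(22, 25, 24, 27, 26, 24, 28, 29)
)

# One group per decade/gender pair
winners$group <- interaction(winners$decade, winners$gender)

p <- ggplot(winners, aes(x = factor(decade), y = age,
                         fill = gender, group = group)) +
  geom_boxplot()
```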

Analyzing distribution of ages across decades, looking specifically at the effect of Serena Williams (one individual having a disproportionate effect on the data, making it look like there's a trend)

Avoiding double-counting of individuals by counting their average age instead of their age at each win
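To illustrate why this matters, the sketch below uses made-up data in which one player has three wins. Averaging per player first gives every individual equal weight.

```r
library(dplyr)

# Hypothetical data: player A wins three times, player B once
wins <- tibble(
  player = c("A", "A", "A", "B"),
  age = c(20, 25, 30, 24)
)

# Per-win average: player A counts three times
mean_per_win <- mean(wins$age)

# Per-player average: each individual counts once
per_player <- wins %>%
  group_by(player) %>%
  summarize(avg_age = mean(age), n_wins = n())
mean_per_player <- mean(per_player$avg_age)
```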

Starting analysis to predict winner of Grand Slam tournaments

row_number

Creating running count using row_number function to count each player's previous tournament experience

cumsum

Creating running (cumulative) win count using cumsum function

lag

Lagging running win count using lag function (otherwise, for prediction purposes, each row would include the result of the tournament being predicted)
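The three steps above (row_number, cumsum, lag) can be sketched together. The `finals` table, its columns, and the `row_number() - 1` convention for counting prior appearances are assumptions for this example.

```r
library(dplyr)

# Hypothetical data: one row per player per final, won = 1 if they won
finals <- tibble(
  player = c("A", "A", "A", "B", "B"),
  year = c(2001, 2002, 2003, 2002, 2003),
  won = c(1, 0, 1, 1, 1)
)

history <- finals %>%
  arrange(player, year) %>%
  group_by(player) %>%
  mutate(
    # Appearances before this one
    previous_appearances = row_number() - 1,
    # Wins up to and including this tournament
    cumulative_wins = cumsum(won),
    # Wins strictly before this tournament, safe to use as a predictor
    previous_wins = lag(cumulative_wins, default = 0)
  ) %>%
  ungroup()
```

Without the `lag`, a player's first win would already be counted in the row for the tournament where it happened.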

Asking, "When someone is a finalist, what is their probability of winning as a function of previous tournaments won?"

Asking, "How does the number of wins a finalist has affect their chance of winning?"

Backtesting simple classifier where the finalist with more previous tournament wins is predicted to win the given tournament
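A minimal sketch of such a backtest, assuming a hypothetical table with one row per final and each finalist's prior win count. Dropping ties is a choice made for this example, since the rule makes no prediction when both finalists have the same count.

```r
library(dplyr)

# Hypothetical data: prior wins of the actual winner and loser of each final
finals <- tibble(
  winner_prev_wins = c(3, 0, 5, 2),
  loser_prev_wins = c(1, 2, 5, 0)
)

backtest <- finals %>%
  # Ties give no prediction, so exclude them
  filter(winner_prev_wins != loser_prev_wins) %>%
  summarize(
    matches = n(),
    # The rule is right when the actual winner had more prior wins
    accuracy = mean(winner_prev_wins > loser_prev_wins)
  )
```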

Creating classifier that gives points based on how far a player got in previous tournaments

match

Using match function to turn name of round reached (1st round, 2nd round, …) into a number score (1, 2, …)
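For illustration, `match` returns the position of each value in a reference vector, which serves as the score when that vector is in round order. The round names in `round_levels` are assumed labels, not necessarily those in the dataset.

```r
# Hypothetical round labels, ordered from earliest to latest
round_levels <- c("1st Round", "2nd Round", "3rd Round", "4th Round",
                  "Quarterfinalist", "Semifinalist", "Finalist", "Won")

outcomes <- c("2nd Round", "Finalist", "1st Round")

# Position in round_levels becomes the numeric score
scores <- match(outcomes, round_levels)
```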

cummean

Using cummean function to get score of average past performance (instead of cumsum function)
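A short sketch with made-up round scores for one player. Lagging the result is an assumption added here so that each value reflects only earlier tournaments.

```r
library(dplyr)

# Hypothetical round scores for one player, in chronological order
scores <- c(2, 4, 6)

# Running average including the current tournament
running_avg <- cummean(scores)

# Average of past tournaments only (NA for the first one)
past_avg <- lag(running_avg)
```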

Pulling names of rounds (1st round, 2nd round, …) based on the rounded numeric score of previous performance
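The reverse lookup can be sketched by rounding the average score and using it as an index into the ordered vector of round names. The labels in `round_levels` and the scores are assumptions for this example.

```r
# Hypothetical round labels, ordered from earliest to latest
round_levels <- c("1st Round", "2nd Round", "3rd Round", "4th Round",
                  "Quarterfinalist", "Semifinalist", "Finalist", "Won")

# Hypothetical average past-performance scores
avg_scores <- c(1.2, 6.6)

# Round to the nearest whole score, then index into the labels
typical_round <- round_levels[round(avg_scores)]
```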