Tennis Tournaments



April 8, 2019


Recorded on: 2019-04-08

Timestamps by: Alex Cookson




Identifying duplicated rows and fixing them
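A minimal Python sketch of the idea (the screencast works in R; this is not its code). The rows and player names are hypothetical. It counts identical rows to find duplicates, then keeps only the first occurrence of each.

```python
from collections import Counter

# Hypothetical rows: (year, tournament, winner); one row is accidentally duplicated
rows = [
    (2018, "Wimbledon", "Player A"),
    (2018, "US Open", "Player B"),
    (2018, "US Open", "Player B"),
]

# Identify rows that appear more than once
counts = Counter(rows)
duplicated = [row for row, n in counts.items() if n > 1]

# Fix by keeping only the first occurrence of each row, preserving the original order
deduped = list(dict.fromkeys(rows))

print(duplicated)    # [(2018, 'US Open', 'Player B')]
print(len(deduped))  # 2
```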


Using the add_count and fct_reorder functions to order categories that are broken down into sub-categories for graphing
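The two steps can be sketched in plain Python, assuming hypothetical (player, tournament) rows where the player is the category and the tournament the sub-category. The first step attaches each category's total to every row, as add_count() does; the second orders the categories by that total, as fct_reorder() does. fct_reorder sorts ascending by default; this sketch sorts from most to fewest for readability.

```python
from collections import Counter

# Hypothetical wins: (player, tournament)
wins = [
    ("Player A", "Wimbledon"), ("Player A", "US Open"), ("Player A", "Wimbledon"),
    ("Player B", "French Open"),
    ("Player C", "Australian Open"), ("Player C", "US Open"),
]

# Like add_count(player): attach each player's total to every row
totals = Counter(player for player, _ in wins)
with_n = [(player, tournament, totals[player]) for player, tournament in wins]

# Like fct_reorder(player, n): order the player categories by their total
player_order = sorted(totals, key=totals.get, reverse=True)

print(player_order)  # ['Player A', 'Player C', 'Player B']
```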


Tidying graph titles (e.g., replacing underscores with spaces) using the str_to_title and str_replace functions
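The same tidying in Python, as a sketch with a hypothetical helper name (`tidy_title`). Python's `str.replace` replaces every match, which corresponds to stringr's str_replace_all; str_replace changes only the first match, so it is only equivalent when a name has a single underscore.

```python
def tidy_title(column_name: str) -> str:
    # Replace underscores with spaces, then capitalize each word (like str_to_title)
    return column_name.replace("_", " ").title()

print(tidy_title("grand_slam_wins"))  # Grand Slam Wins
```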


Using the inner_join function to merge datasets
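An inner join keeps only rows whose key appears in both tables. A minimal Python sketch, assuming two hypothetical tables keyed by player name (the names and dates are made up). Player D has no birth date and Player C has no wins, so both drop out.

```python
# Hypothetical tables keyed by player name
wins = [("Player A", 2015), ("Player B", 2016), ("Player D", 2017)]
birth_dates = {"Player A": "1990-05-01", "Player B": "1994-11-20", "Player C": "1988-02-14"}

# Inner join: keep a win only if its player also appears in birth_dates
joined = [(name, year, birth_dates[name]) for name, year in wins if name in birth_dates]

print(joined)  # [('Player A', 2015, '1990-05-01'), ('Player B', 2016, '1994-11-20')]
```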


Calculating age from date of birth using the difftime and as.numeric functions
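A Python sketch of the same calculation: take the difference in days, as difftime(..., units = "days") would, convert it to a plain number, and divide by the length of a year. The helper name and the 365.25-day year are assumptions; the divisor is an approximation that ignores exactly where the leap days fall.

```python
from datetime import date

def age_in_years(date_of_birth: date, on_date: date) -> float:
    # Difference in whole days, then converted to (approximate) years
    days = (on_date - date_of_birth).days
    return days / 365.25

# Hypothetical player born 1990-01-01, winning on 2020-01-01
print(round(age_in_years(date(1990, 1, 1), date(2020, 1, 1)), 2))  # 30.0
```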

Embedding simple calculations such as the mean and median in the text of an R Markdown document

Looking at the distribution of wins by sex using overlapping histograms


Binning years into decades using integer division (the %/% operator)
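In R this is `10 * (year %/% 10)`. Python's `//` is also floor division, so the same expression carries over directly, as in this sketch (the helper name is hypothetical):

```python
def decade(year: int) -> int:
    # Integer division drops the last digit; multiplying by 10 gives the decade's first year
    return 10 * (year // 10)

print([decade(y) for y in (1968, 1979, 1980, 2019)])  # [1960, 1970, 1980, 2010]
```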


Using the interaction function to split boxplots into M/F pairs within each decade

Analyzing the distribution of ages across decades, looking specifically at the effect of Serena Williams (a single individual with a disproportionate effect on the data, which can make it look like there is a trend)

Avoiding double-counting of individuals by using each player's average age instead of their age at each win
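A small Python illustration of why this matters, using hypothetical (player, age at win) rows. A player with many wins pulls a per-win average toward their own ages; averaging within each player first gives everyone one value.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: Player A has three wins, Player B has one
wins = [("Player A", 20.0), ("Player A", 25.0), ("Player A", 30.0), ("Player B", 22.0)]

# Per-win average: Player A counts three times
per_win = mean(age for _, age in wins)

# Per-player average: collapse each player to their mean age first
ages_by_player = defaultdict(list)
for player, age in wins:
    ages_by_player[player].append(age)
avg_age = {player: mean(ages) for player, ages in ages_by_player.items()}
per_player = mean(list(avg_age.values()))

print(per_win)     # 24.25
print(per_player)  # 23.5
```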

Starting an analysis to predict the winners of Grand Slam tournaments


Creating a rolling count with the row_number function to track each player's previous tournament experience
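A Python sketch of a per-player rolling count, assuming hypothetical rows already sorted by date. It plays the role of `row_number() - 1` within each player's group; whether the count should start at 0 or 1 is an assumption here.

```python
from collections import defaultdict

# Hypothetical (year, player) appearances, sorted by date
appearances = [(2015, "Player A"), (2015, "Player B"), (2016, "Player A"), (2017, "Player A")]

seen = defaultdict(int)
rolling = []
for year, player in appearances:
    # Number of tournaments this player had played before this one
    rolling.append((year, player, seen[player]))
    seen[player] += 1

print(rolling)
```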


Creating a rolling win count with the cumsum function


Lagging the rolling win count with the lag function (otherwise, for prediction purposes, a player's count would already include the win being predicted)
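The cumulative count and its lag can be sketched in Python for one hypothetical player whose results are in date order. dplyr's lag() fills the first position with NA unless a default is supplied; this sketch assumes a default of 0.

```python
from itertools import accumulate

# Hypothetical results in date order: 1 = won the tournament, 0 = did not
won = [1, 0, 1, 1]

# Like cumsum(won): wins up to and including each tournament
cumulative_wins = list(accumulate(won))

# Like lag(cumsum(won), default = 0): wins strictly before each tournament,
# so no row "knows" the result it is being used to predict
previous_wins = [0] + cumulative_wins[:-1]

print(cumulative_wins)  # [1, 1, 2, 3]
print(previous_wins)    # [0, 1, 1, 2]
```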

Asking, "When someone is a finalist, what is their probability of winning as a function of previous tournaments won?"

Asking, "How does the number of wins a finalist has affect their chance of winning?"
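Both questions come down to grouping finalists by their previous win count and taking the share who won. A Python sketch with three hypothetical finals (two finalists each, one winner per final):

```python
from collections import defaultdict

# Hypothetical finalists: (previous_wins, won_this_final), two per final
finalists = [
    (0, False), (2, True),   # final 1
    (0, True), (0, False),   # final 2
    (2, True), (5, False),   # final 3
]

tally = defaultdict(lambda: [0, 0])  # previous_wins -> [finals won, finals played]
for prev, won in finalists:
    tally[prev][0] += int(won)
    tally[prev][1] += 1

# Probability of winning a final, by number of previous wins
win_rate = {prev: w / n for prev, (w, n) in sorted(tally.items())}
print(win_rate)
```

With so few rows the rates are meaningless; the point is only the group-and-divide shape of the calculation.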

Backtesting a simple classifier that predicts the player with more previous tournament wins will win the given tournament
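A backtest of this rule can be sketched in Python over hypothetical past finals: predict the finalist with more previous wins, then compare against who actually won. Skipping ties is an assumption of this sketch, not necessarily how the screencast handled them.

```python
# Hypothetical finals: (finalist_1, previous_wins_1, finalist_2, previous_wins_2, actual_winner)
finals = [
    ("Player A", 3, "Player B", 0, "Player A"),
    ("Player C", 1, "Player D", 4, "Player C"),
    ("Player E", 2, "Player F", 5, "Player F"),
    ("Player G", 1, "Player H", 1, "Player H"),
]

def predict(p1, wins1, p2, wins2):
    # Predict whichever finalist has more previous wins; return None on a tie
    if wins1 == wins2:
        return None
    return p1 if wins1 > wins2 else p2

decided = [(predict(*f[:4]), f[4]) for f in finals]
scored = [(pred, actual) for pred, actual in decided if pred is not None]
accuracy = sum(pred == actual for pred, actual in scored) / len(scored)

print(f"{accuracy:.2f}")  # 0.67
```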

Creating a classifier that awards points based on how far a player got in previous tournaments


Using the match function to turn the name of the round reached (1st round, 2nd round, …) into a numeric score (1, 2, …)
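R's match(x, table) returns the 1-based position of x in table. A Python sketch of the same lookup follows. The round labels in `ROUNDS` are an assumption about how the data encodes outcomes. Unlike match(), which returns NA for an unknown name, `list.index` raises ValueError.

```python
# Assumed round labels, in order from earliest exit to winning
ROUNDS = ["1st Round", "2nd Round", "3rd Round", "4th Round",
          "Quarterfinalist", "Semi-finalist", "Finalist", "Won"]

def round_score(round_name: str) -> int:
    # 1-based position, like match(round_name, ROUNDS)
    return ROUNDS.index(round_name) + 1

print(round_score("Quarterfinalist"))  # 5
```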


Using the cummean function (instead of cumsum) to score average past performance
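A running mean divides each running total by the number of values seen so far. A Python sketch for one hypothetical player's round scores in date order:

```python
from itertools import accumulate

# Hypothetical round scores for one player's tournaments, in date order
scores = [1, 3, 5, 8]

# Like cummean(scores): average of all scores up to each position
running_totals = list(accumulate(scores))
cummean = [total / (i + 1) for i, total in enumerate(running_totals)]

print(cummean)  # [1.0, 2.0, 3.0, 4.25]
```

Unlike a running sum, the running mean does not grow simply because a player has entered more tournaments.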

Pulling round names (1st round, 2nd round, …) from the rounded numeric score of previous performance
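The reverse lookup can be sketched in Python by rounding the average score and indexing into the ordered list of round names. `ROUNDS` is an assumed labelling, and `round_name` is a hypothetical helper. Python's round() rounds halves to the nearest even integer; R's round() is documented to do the same, although floating-point representation can affect edge cases in either language.

```python
# Assumed round labels, in order from earliest exit to winning
ROUNDS = ["1st Round", "2nd Round", "3rd Round", "4th Round",
          "Quarterfinalist", "Semi-finalist", "Finalist", "Won"]

def round_name(score: float) -> str:
    # Round the average score to an integer, then convert the 1-based score to a list index
    return ROUNDS[round(score) - 1]

print(round_name(4.25))  # 4th Round
```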