Nobel Prize Winners

Published

May 23, 2019

Notable topics: Data manipulation, Graphing for EDA (Exploratory Data Analysis)

Recorded on: 2019-05-23

Timestamps by: Alex Cookson

Timestamps

geom_col
%/%

Creating a stacked bar plot using geom_col and the fill aesthetic in aes (also bins years into decades with the integer division operator %/%)
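
A minimal sketch of the decade binning and stacked bars, assuming the dplyr and ggplot2 packages. The toy data and the column names prize_year and category are hypothetical and not taken from the screencast.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; the column names are assumptions
nobel <- tibble(
  prize_year = c(1901, 1905, 1911, 1919, 1923, 1923),
  category   = c("Physics", "Peace", "Physics", "Peace", "Physics", "Peace")
)

by_decade <- nobel %>%
  # %/% is integer division: 1905 %/% 10 is 190, and 10 * 190 is 1900
  mutate(decade = 10 * (prize_year %/% 10)) %>%
  count(decade, category)

# fill = category draws one coloured segment per category,
# and geom_col stacks the segments within each decade by default
p <- ggplot(by_decade, aes(decade, n, fill = category)) +
  geom_col()
```

Printing p draws the chart; by_decade holds one row per decade and category.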

n_distinct

Using n_distinct function to quickly count unique years in a group
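
As a rough illustration, n_distinct can sit inside summarize next to n() to compare the number of rows with the number of unique years. The data and column names below are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: several laureates can share one prize year
nobel <- tibble(
  category   = c("Physics", "Physics", "Physics", "Peace", "Peace"),
  prize_year = c(1901, 1901, 1902, 1901, 1903)
)

years_per_category <- nobel %>%
  group_by(category) %>%
  summarize(
    n_rows  = n(),                    # rows (laureates) in the group
    n_years = n_distinct(prize_year)  # unique years in the group
  )
```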

distinct

Using distinct function and its .keep_all argument to de-duplicate data
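
A small sketch of de-duplication with distinct, assuming the duplicates come from one laureate being listed once per affiliation. That cause, the data, and the column names are all hypothetical.

```r
library(dplyr)

# Hypothetical toy data: "Laureate A" appears twice for the same prize
nobel <- tibble(
  full_name   = c("Laureate A", "Laureate A", "Laureate B"),
  prize_year  = c(1950, 1950, 1960),
  affiliation = c("University X", "Institute Y", "University Z")
)

# Keep the first row for each name/year pair.
# .keep_all = TRUE retains the other columns (here, affiliation);
# without it, only full_name and prize_year would be returned.
deduped <- nobel %>%
  distinct(full_name, prize_year, .keep_all = TRUE)
```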

coalesce

Using coalesce function to replace NAs in a variable (similar to the SQL COALESCE function)
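
For illustration, coalesce returns the first non-missing value at each position, so passing a single fallback value fills the NAs. The column and the fallback label here are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with one missing value
nobel <- tibble(country = c("France", NA, "Sweden"))

filled <- nobel %>%
  # Wherever country is NA, use "Unknown" instead
  mutate(country = coalesce(country, "Unknown"))
```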

year
lubridate

Using year function from lubridate package to calculate (approx.) age of laureates at time of award
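
One way this might look, assuming a birth date column and a numeric prize year (both names are hypothetical). The age is approximate because only the years are subtracted and the month and day are ignored.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data; dates are invented for illustration
nobel <- tibble(
  birth_date = as.Date(c("1950-06-15", "1960-12-31")),
  prize_year = c(2000, 2001)
)

with_age <- nobel %>%
  # year() extracts the calendar year from a date
  mutate(age = prize_year - year(birth_date))
```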

fct_reorder

Using fct_reorder function to arrange boxplot graph by the median age of winners
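
A sketch of the reordering, assuming the forcats and ggplot2 packages and hypothetical data. fct_reorder uses the median as its default summary function, which is why no function is passed here.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical toy data: three ages per category
ages <- tibble(
  category = rep(c("Physics", "Peace", "Literature"), each = 3),
  age      = c(50, 55, 60, 60, 62, 70, 64, 65, 66)
)

ordered <- ages %>%
  # Reorder the category levels by the median age within each category
  mutate(category = fct_reorder(category, age))

# Boxes now appear in order of increasing median age
p <- ggplot(ordered, aes(category, age)) +
  geom_boxplot() +
  coord_flip()
```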

count

Defining a new variable within the count function (like doing a mutate in the count function)
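
To illustrate the shortcut, the two pipelines below should produce the same counts. The decade variable is only an example of a derived column; which variable the screencast defines at this point is not stated here, and the data is hypothetical.

```r
library(dplyr)

# Hypothetical toy data
nobel <- tibble(prize_year = c(1901, 1905, 1911, 1919, 1923))

# Two steps: create the variable, then count it
with_mutate <- nobel %>%
  mutate(decade = 10 * (prize_year %/% 10)) %>%
  count(decade)

# One step: name the new variable directly inside count()
in_count <- nobel %>%
  count(decade = 10 * (prize_year %/% 10))
```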

geom_col
facet_wrap

Creating a small multiples bar plot using geom_col and facet_wrap functions
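
A minimal small-multiples sketch, assuming pre-counted data with hypothetical columns decade, category, and n. facet_wrap draws one panel per category.

```r
library(dplyr)
library(ggplot2)

# Hypothetical pre-counted data
counts <- tibble(
  decade   = rep(c(1900, 1910), times = 2),
  category = rep(c("Physics", "Peace"), each = 2),
  n        = c(3, 5, 2, 4)
)

# One bar chart per category, laid out as a grid of panels
p <- ggplot(counts, aes(decade, n)) +
  geom_col() +
  facet_wrap(~ category)
```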

WDIsearch
WDI

Importing income data from WDI package to explore relationship between high/low income countries and winners

fct_relevel

Using fct_relevel to change the levels of a categorical income variable (e.g., "Upper middle income") so that the ordering makes sense
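
A sketch of the releveling, assuming four income labels in the World Bank style. Only "Upper middle income" appears in the text above; the other three labels are assumptions.

```r
library(forcats)

# Hypothetical income groups; factor() sorts the levels alphabetically by default
income <- factor(c("Low income", "High income",
                   "Upper middle income", "Lower middle income"))

# Put the levels in a meaningful low-to-high order
income <- fct_relevel(
  income,
  "Low income", "Lower middle income", "Upper middle income", "High income"
)
```

Plots and tables that follow the factor's level order will then list the groups from lowest to highest income.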

Starting to explore new dataset of Nobel laureate publications

mean

Taking the mean of a subset of data without needing to fully filter the data beforehand
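
One common way to do this is to subset the vector inside summarize, so the other rows stay available for other summaries. This is a sketch of that idea, not necessarily the exact code in the screencast; the data and column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: one row per paper
papers <- tibble(
  laureate         = c("A", "A", "A", "B", "B"),
  is_prize_winning = c(FALSE, TRUE, FALSE, TRUE, FALSE),
  pub_year         = c(1950, 1955, 1960, 1970, 1980)
)

summary_tbl <- papers %>%
  group_by(laureate) %>%
  summarize(
    n_papers         = n(),                              # uses every row
    avg_year_all     = mean(pub_year),                   # uses every row
    prize_paper_year = mean(pub_year[is_prize_winning])  # uses only the subset
  )
```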

rank

Using rank function and its ties.method argument to add the ordinal number of a laureate's publication (e.g., 1st paper, 2nd paper)
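
A sketch of numbering papers within each laureate, with hypothetical data and column names. With ties.method = "first", papers from the same year get consecutive numbers in row order instead of a shared average rank.

```r
library(dplyr)

# Hypothetical toy data: laureate A has two papers from 1950
papers <- tibble(
  laureate = c("A", "A", "A", "B", "B"),
  pub_year = c(1960, 1950, 1950, 1980, 1970)
)

papers_ranked <- papers %>%
  group_by(laureate) %>%
  # Rank publication years within each laureate; ties are broken by row order
  mutate(paper_number = rank(pub_year, ties.method = "first")) %>%
  ungroup()
```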

geom_histogram

Lots of playing around with exploratory histograms (geom_histogram)

Discussion of right-censoring as an issue (people winning the Nobel Prize while their careers are still active, so their later publications are not yet in the data)

Summary of screencast