Maryland Bridges

Published: November 26, 2018

Notable topics: Data manipulation, Map visualization

Recorded on: 2018-11-26

Timestamps by: Alex Cookson

Timestamps

geom_line

Using geom_line to create an exploratory line graph

%/%

Using %/% operator (truncated division) to bin years into decades (e.g., 1980, 1984, and 1987 would all become "1980")

Converting two-digit year to four-digit year (e.g., "16" becomes "2016") by adding 2000 to each one

percent_format (scales)

Using percent_format function from scales package to get nice-looking axis labels

geom_col

Using geom_col to create a nicely ordered bar/column graph

replace_na

Using replace_na to replace NA values with "Other"

Starting exploration of average daily traffic

comma_format (scales)

Using comma_format function from scales package to get more readable axis labels (e.g., "1e+05" becomes "100,000")

cut

Using the cut function to bin a continuous variable into custom breaks (also does a mutate within a group_by!)

Starting to make a map

scale_colour_gradient2

Encoding a continuous variable to colour, then using scale_colour_gradient2 function to specify colours and midpoint

scale_colour_gradient2

Specifying the trans argument (transformation) of the scale_colour_gradient2 function to get a logarithmic scale

str_to_title

Using str_to_title function to get values to Title Case (first letter of each word capitalized)

glm

Predicting whether bridges are in "Good" condition using logistic regression (remember to specify the family argument! Dave fixes this at 52:54)

Explanation of why we should NOT be using an OLS linear regression

augment (broom)

Using the augment function from the broom package to illustrate why a linear model is not a good fit

augment (broom)

Specifying the type.predict argument in the augment function so that we get the actual predicted probability

Explanation of why the sigmoidal shape of logistic regression can be a drawback

Using a cubic spline model (a type of GAM, Generalized Additive Model) as an alternative to logistic regression

Explanation of the shape that a cubic spline model can take (which logistic regression cannot)

Visualizing the model in a different way, using a coefficient plot

geom_vline

Using geom_vline function to add a red reference line to a graph

tidy, geom_errorbarh

Adding confidence intervals to the coefficient plot by specifying the conf.int argument of the tidy function and graphing them with the geom_errorbarh function

Brief explanation of log-odds coefficients

Summary of screencast
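
A few of the steps listed above can be sketched in base R. This is a minimal illustration, not the code from the screencast: the `bridges` data frame, its column names, and all of its values are made up for the example, and base R assignments stand in for the tidyverse verbs (`mutate`, `group_by`) used in the recording.

```r
# Hypothetical toy data; the screencast works with the real Maryland bridges dataset.
bridges <- data.frame(
  yr_built          = c(1950, 1962, 1971, 1980, 1984, 1987, 1992, 1998, 2005, 2016),
  avg_daily_traffic = c(50, 900, 4500, 12000, 80000, 150000, 300, 2500, 9999, 20000),
  good              = c(0, 0, 1, 0, 0, 1, 1, 0, 1, 1)
)

# %/% is truncated (integer) division: 1984 %/% 10 is 198, so multiplying
# by 10 again bins 1980, 1984, and 1987 into the same decade, 1980.
bridges$decade <- 10 * (bridges$yr_built %/% 10)

# Two-digit years stored as text become four-digit years by adding 2000.
two_digit  <- "16"
four_digit <- as.numeric(two_digit) + 2000

# cut() bins a continuous variable into custom breaks. Intervals are
# right-closed by default, so 1000 itself would fall in the first bin.
bridges$traffic_bin <- cut(
  bridges$avg_daily_traffic,
  breaks = c(0, 1000, 10000, Inf),
  labels = c("up to 1,000", "1,000 to 10,000", "over 10,000")
)

# Logistic regression: without family = "binomial", glm() defaults to a
# gaussian family, which is an ordinary linear regression.
model <- glm(good ~ yr_built, data = bridges, family = "binomial")

# type = "response" returns predicted probabilities instead of log-odds.
probs <- predict(model, type = "response")
```

The `type = "response"` argument in `predict` plays the same role here as the `type.predict` argument of `augment` mentioned in the timestamps: both ask for predictions on the probability scale rather than the log-odds scale.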