Car Fuel Efficiency

Natural splines for regression

Published: October 14, 2019

Notable topics: Natural splines for regression

Recorded on: 2019-10-14

Timestamps by: Alex Cookson

Timestamps

select, sort, colnames

Using the select, sort, and colnames functions to put variables in alphabetical order

geom_abline

Adding geom_abline for y = x to a scatter plot for comparison

geom_boxplot

Visualising using geom_boxplot for mpg by vehicle class (size of car)

Start of explanation of prediction goals

sample_frac

Creating train and test sets, along with trick using sample_frac function to randomly re-arrange all rows in a dataset

geom_smooth

First step of developing linear model: visually adding geom_smooth

augment

Using augment function to add extra variables from model to original dataset (fitted values and residuals, especially)

Creating residuals plot and explaining what you want and don't want to see

Explanation of splines

Visualising effect of regressing using natural splines

Creating a tibble to test different degrees of freedom (1:10) for natural splines

unnest

Using unnest function to get tidy versions of different models

Visualising fitted values of all 6 different models at the same time

glance

Investigating whether the model got "better" as we added degrees of freedom to the natural splines, using the glance function

Using ANOVA to perform a statistical test on whether natural splines as a group explain variation in MPG

Exploring collinearity of predictor variables (displacement and cylinders)

floor

Binning years into every two years using floor function

summarise_at

Using summarise_at function to do quick averaging of multiple variables
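
The modelling steps listed above (shuffling rows with sample_frac, fitting natural splines at several degrees of freedom, then using augment, glance, unnest, and ANOVA) can be sketched in R roughly as follows. This is a minimal illustration, not the code from the screencast. It uses the built-in mtcars data as a stand-in for the fuel-efficiency dataset. The names cars, shuffled, spline_df, models, fitted_values, and model_quality, the 80/20 split, and the choice of 1 to 6 degrees of freedom are all assumptions made for the example.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(splines)

set.seed(2019)

# Stand-in data (assumption): mtcars, not the dataset used in the screencast.
cars <- as_tibble(mtcars)

# sample_frac(1) returns every row in random order, so flagging the
# first 80% of row positions yields a random train/test split.
shuffled <- cars %>%
  sample_frac(1) %>%
  mutate(is_train = row_number() <= 0.8 * n())

train <- filter(shuffled, is_train)
test <- filter(shuffled, !is_train)

# One linear model per candidate degrees of freedom for the natural spline.
# The column is called spline_df because glance() returns its own `df` column.
models <- tibble(spline_df = 1:6) %>%
  mutate(
    fit = map(spline_df, ~ lm(mpg ~ ns(disp, df = .x), data = train)),
    fitted = map(fit, ~ augment(.x, data = train)),  # adds .fitted, .resid, ...
    quality = map(fit, glance)                       # one-row model summary
  )

# Tidy versions: fitted values for plotting, and fit statistics per model.
fitted_values <- models %>%
  select(spline_df, fitted) %>%
  unnest(fitted)

model_quality <- models %>%
  select(spline_df, quality) %>%
  unnest(quality)

# F-test of whether the spline terms, as a group, explain variation in mpg.
spline_fit <- lm(mpg ~ ns(disp, df = 4), data = train)
spline_test <- anova(spline_fit)

# Out-of-sample check on the held-out rows.
test_rmse <- sqrt(mean((test$mpg - predict(spline_fit, newdata = test))^2))
```

Plotting .fitted against disp from fitted_values, coloured by spline_df, shows all the fits on one chart. Comparing adj.r.squared or AIC across the rows of model_quality gives a rough sense of whether extra degrees of freedom are paying off.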