HBCU Enrollment

Data Cleaning

Published: February 1, 2021

Notable topics: Data Cleaning

Recorded on: 2021-02-01

Timestamps by: Eric Fletcher


Timestamps

str_detect
stringr

Detect the presence or absence of a pattern in a string.
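A minimal sketch of str_detect() on a few column names. The names are hypothetical and only loosely modeled on the enrollment tables. It returns one logical per string, and negate = TRUE flips the test to "absence".

```r
library(stringr)

# Hypothetical column names, loosely modeled on the enrollment tables
col_names <- c("Total", "Standard Errors - Total", "White1", "Black1")

# TRUE wherever the pattern occurs anywhere in the string
has_se <- str_detect(col_names, "Standard Errors")

# negate = TRUE returns TRUE where the pattern is absent
no_se <- str_detect(col_names, "Standard Errors", negate = TRUE)
print(has_se)
```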

separate
tidyr

Separate a character column into multiple columns using a regular expression or numeric positions.
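A small sketch of both ways separate() can split a column. The first call splits on a separator pattern and the second splits at a character position. The column names and values are made up for illustration.

```r
library(tibble)
library(tidyr)

# Hypothetical data: one column packs two pieces of information
df <- tibble(key = c("total_white", "total_black"), value = c(10, 20))

# Split on a regular expression (here a literal underscore)
by_regex <- separate(df, key, into = c("stat", "race"), sep = "_")

# Split at a numeric position: after the 4th character
years <- tibble(code = c("1976a", "1980b"))
by_position <- separate(years, code, into = c("year", "suffix"), sep = 4)
```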

rename
dplyr

Rename columns.

distinct
dplyr

Select only unique/distinct rows from a data frame.
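A quick sketch of distinct() on hypothetical rows. With no column arguments it deduplicates whole rows. Naming a column keeps only that column's unique values, in order of first appearance.

```r
library(tibble)
library(dplyr)

# Hypothetical rows with one exact duplicate
df <- tibble(year = c(1990, 1990, 2000), race = c("White", "White", "Black"))

unique_rows  <- distinct(df)        # drops the repeated (1990, "White") row
unique_years <- distinct(df, year)  # only the year column, deduplicated
```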

expand_limits
ggplot2

Expand the y-axis limits so that the plot starts at 0.
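A minimal sketch of expand_limits(y = 0) on made-up enrollment counts. Without it, ggplot2 would zoom the y axis to the data range. expand_limits() works by adding an invisible blank layer that contains y = 0.

```r
library(ggplot2)

# Hypothetical enrollment counts that never get close to zero
df <- data.frame(year = c(1980, 1990, 2000),
                 enrollment = c(200000, 230000, 260000))

p <- ggplot(df, aes(year, enrollment)) +
  geom_line() +
  expand_limits(y = 0)  # adds a blank layer so the y axis includes 0
```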

full_join
dplyr

Join two datasets, keeping all rows from both x and y.
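A sketch of full_join() with two hypothetical yearly tables that only partly overlap. Every key from either side is kept, and values missing on one side become NA.

```r
library(tibble)
library(dplyr)

# Hypothetical yearly tables that only partly overlap
males   <- tibble(year = c(1990, 2000), male = c(100, 110))
females <- tibble(year = c(2000, 2010), female = c(150, 160))

# Every year from either table is kept; gaps are filled with NA
combined <- full_join(males, females, by = "year")
```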

percent
scales

Format y-axis labels as percentages (e.g. 2.5%, 50%).
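A small sketch of scales::percent(). It multiplies by 100 and appends "%". The accuracy argument is fixed here so the output is predictable. In a plot, the same function is passed as the labels of a continuous scale.

```r
library(scales)

# Proportions become percentage strings with one decimal place
labels <- percent(c(0.025, 0.5), accuracy = 0.1)

# In a ggplot: + scale_y_continuous(labels = percent)
print(labels)
```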

bind_rows
dplyr

Bind multiple data frames by row, with an explanation of why this is not the best approach here compared with the other options for joining.

rbind, bind_rows
base, dplyr

Brief discussion on the differences between rbind and bind_rows.
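One commonly cited difference, sketched on hypothetical data frames. This is not necessarily the point the screencast itself makes. bind_rows() matches columns by name and fills missing ones with NA. Base rbind() on data frames requires the same set of columns and errors otherwise.

```r
library(dplyr)

a <- data.frame(year = 1990, total = 100)
b <- data.frame(year = 2000, total = 120, black = 30)  # extra column

# bind_rows matches columns by name and fills missing ones with NA
stacked <- bind_rows(a, b)

# base rbind needs matching columns, so this call errors
rbind_result <- tryCatch(rbind(a, b), error = function(e) "error")
```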

str_remove
stringr

Remove matched patterns in a string.
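A sketch of str_remove() dropping a trailing footnote digit from labels. The labels are assumed for illustration. The pattern "1$" is anchored to the end of the string, so only a final 1 is removed. str_remove() removes the first match only, while str_remove_all() removes every match.

```r
library(stringr)

# Hypothetical labels with a trailing footnote marker
labels <- c("White1", "Black1", "Hispanic")

# "1$" only matches a 1 at the very end of the string
cleaned <- str_remove(labels, "1$")
```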

clean_names
janitor

Convert variable names to snake case (e.g. "Standard Error" becomes "standard_error").
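A minimal sketch of janitor::clean_names() on a data frame whose names contain spaces and capitals. The column names are hypothetical. check.names = FALSE keeps the raw names intact until clean_names() converts them.

```r
library(janitor)

df <- data.frame(`Standard Error` = 1.2, `Total Enrollment` = 300,
                 check.names = FALSE)

# Lower-cases names and replaces spaces with underscores
df <- clean_names(df)
names(df)
```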

mutate_if, is.character, parse_number
dplyr, base, readr

Convert multiple character columns to numeric at once, parsing out the numbers and dropping the other characters in the dataset.
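A sketch of the mutate_if() plus parse_number() pattern on an invented raw import. The numbers arrive as text with thousands separators and a stray footnote mark. parse_number() keeps the first number it finds and drops the surrounding characters.

```r
library(tibble)
library(dplyr)
library(readr)

# Hypothetical raw import: numbers stored as text with commas and a footnote mark
raw <- tibble(year  = c("1990", "2000"),
              total = c("1,234", "2,345*"))

# Apply parse_number to every character column
parsed <- mutate_if(raw, is.character, parse_number)
```

In current dplyr the same step is usually written mutate(raw, across(where(is.character), parse_number)), since mutate_if() is superseded.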

slice
dplyr

Subset rows using their positions.
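A short sketch of slice() with negative positions. It drops two leading rows, as one might do with title lines picked up by a spreadsheet import. The data are hypothetical.

```r
library(tibble)
library(dplyr)

df <- tibble(row_label = c("header", "note", "1990", "2000"))

# Negative positions drop rows; keep row 3 onward
body <- slice(df, -(1:2))
```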

gather, mutate, ifelse, str_remove, spread
tidyr, dplyr, stringr, base

Reshape the data from wide to long such that there is one row for each year and race.
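A minimal sketch of the wide-to-long reshape with gather(), followed by spread() to go back. The toy table has one column per race; after gather() there is one row per year and race. Both functions are superseded by pivot_longer() and pivot_wider() but still work.

```r
library(tibble)
library(tidyr)

# Hypothetical wide table: one column per race
wide <- tibble(year  = c(1990, 2000),
               white = c(100, 110),
               black = c(30, 35))

# Wide -> long: key column "race", value column "enrollment", keep year as an id
long <- gather(wide, race, enrollment, -year)

# Long -> wide again
back <- spread(long, race, enrollment)
```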

abs
base

Compute the absolute value of x.

str_remove
stringr

Remove matched patterns in a string (e.g. black1, black & white1, white).

fct_reorder
forcats

Reorder factor levels in a geom_line plot by sorting along another variable.
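A minimal sketch of fct_reorder() with its default summary function, median. Levels are sorted by the median of the second vector within each level. The values are invented so that the result differs from alphabetical order.

```r
library(forcats)

race <- factor(c("Asian", "Black", "White"))
enrollment <- c(50, 10, 100)

# Levels ordered by (median) enrollment, ascending
reordered <- fct_reorder(race, enrollment)
levels(reordered)
```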

bind_rows
dplyr

Bind multiple data frames by row.

fct_relevel
forcats

Reorder factor levels by hand.
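A small sketch of fct_relevel() moving one level to the front by hand. The other levels keep their existing relative order.

```r
library(forcats)

race <- factor(c("Black", "White", "Hispanic"))  # levels: Black, Hispanic, White

# Put "White" first; remaining levels stay in their original order
race2 <- fct_relevel(race, "White")
levels(race2)
```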

str_remove
stringr

Detect and remove a pattern in a string to eliminate duplicate entries in the geom_line plot legend.

fct_reorder
forcats

Reorder factor levels in a geom_line plot by sorting along another variable, ordering by the last value so that the lines match the order in which they appear in the legend: `fct_reorder(race_ethnicity, percent, last, .desc = TRUE)`
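A sketch of the fct_reorder(race_ethnicity, percent, last, .desc = TRUE) call on a hypothetical data frame. Here `last` is assumed to be dplyr::last(), which the screencast's loaded tidyverse would provide. Because last() takes the final value within each group, the rows must already be sorted by year.

```r
library(forcats)

# Hypothetical shares, sorted by year within each group
df <- data.frame(
  year = rep(c(1990, 2000), times = 3),
  race_ethnicity = factor(rep(c("Asian", "Black", "White"), each = 2)),
  percent = c(0.02, 0.06, 0.10, 0.08, 0.80, 0.70)
)

# Order levels by each group's final-year value, largest first
df$race_ethnicity <- fct_reorder(df$race_ethnicity, df$percent,
                                 dplyr::last, .desc = TRUE)
levels(df$race_ethnicity)
```

With .desc = TRUE, the legend reads top to bottom in the same order as the line endpoints on the right edge of the plot.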

read_excel
readxl

Import an external Excel dataset from Data.World.

starts_with
tidyselect

Select variables whose names start with a given prefix, in order to remove them.
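A short sketch of negating starts_with() inside select() to drop every column with a given prefix. The column names are hypothetical stand-ins for cleaned standard-error columns.

```r
library(tibble)
library(dplyr)

df <- tibble(total = 300,
             standard_errors_total = 5,
             standard_errors_white = 3,
             white = 200)

# The minus sign drops every column whose name begins with the prefix
trimmed <- select(df, -starts_with("standard_errors"))
```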

str_remove, group_by, first, ifelse, cumsum
stringr, dplyr, base

Unpack data in one column (field_gender) into two separate columns (field, gender).
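A sketch of how the listed functions can unpack a combined column. It rests on an assumption about the layout, not the screencast's actual data: each field name row is followed by its "Males" and "Females" rows. cumsum() over the non-gender rows numbers each block, and first() copies the block's field name down. ifelse() with str_remove() then turns "Males" and "Females" into gender labels, marking the header row as "Total".

```r
library(tibble)
library(dplyr)
library(stringr)

# Hypothetical layout (an assumption): each field name is followed by its gender rows
df <- tibble(field_gender = c("Engineering", "Males", "Females",
                              "Education", "Males", "Females"),
             students = c(300, 200, 100, 250, 50, 200))

unpacked <- df %>%
  mutate(is_gender = field_gender %in% c("Males", "Females"),
         # each non-gender row starts a new block, so cumsum numbers the blocks
         block = cumsum(!is_gender)) %>%
  group_by(block) %>%
  mutate(field = first(field_gender),
         gender = ifelse(is_gender, str_remove(field_gender, "s$"), "Total")) %>%
  ungroup() %>%
  select(field, gender, students)
```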
