Government Spending on Kids

Data Manipulation, Functions, Embracing, Reading in Many .csv Files, Pairwise Correlation

Published: September 14, 2020

Notable topics: Data Manipulation, Functions, Embracing, Reading in Many .csv Files, Pairwise Correlation

Recorded on: 2020-09-14

Timestamps by: Eric Fletcher


Timestamps

geom_line, summarize, unique, sample, facet_wrap, fct_reorder, theme_tufte, geom_vline
ggplot2, dplyr, base, ggthemes, forcats

Using geom_line and summarize to visualize education spending over time: first for all states, then for individual states, then for small groups of states using %in%, and finally for random groups of n states using %in% together with sample and unique. fct_reorder is used to reorder the state factor levels by the inf_adj variable.

geom_vline is used to add a reference line marking the 2009 financial crisis.
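The steps above can be sketched on toy data. This is a minimal illustration, not the code from the screencast: the kids_toy table and its values are made up, and only the column names state, year and inf_adj follow the text.

```r
library(dplyr)
library(ggplot2)
library(forcats)

# Toy stand-in for the spending data; every value here is made up.
set.seed(2020)
kids_toy <- expand.grid(
  state = c("Alabama", "Alaska", "Arizona", "Arkansas", "California"),
  year = 1997:2016,
  stringsAsFactors = FALSE
) %>%
  as_tibble() %>%
  mutate(inf_adj = runif(n(), min = 1e6, max = 5e6))

# All states together: one total per year
totals <- kids_toy %>%
  group_by(year) %>%
  summarize(total = sum(inf_adj), .groups = "drop")

# A random group of n states: sample from the unique names, then filter with %in%
n_states <- 3
picked <- sample(unique(kids_toy$state), n_states)

p <- kids_toy %>%
  filter(state %in% picked) %>%
  # Reorder the facets by inf_adj (fct_reorder uses the median by default)
  mutate(state = fct_reorder(state, inf_adj)) %>%
  ggplot(aes(year, inf_adj)) +
  geom_line() +
  # Dashed reference line at 2009
  geom_vline(xintercept = 2009, color = "red", linetype = "dashed") +
  facet_wrap(~ state)
```

Printing p draws one panel per sampled state. theme_tufte from ggthemes could be added as a further layer, but it is left out here to keep the dependencies small.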

geom_line, summarize, unique, sample, facet_wrap, fct_reorder, theme_tufte, geom_vline, geom_hline
ggplot2, dplyr, base, ggthemes, forcats

Take the previous chart and set inf_adj_perchild for the first year, 1997, to 100%, so that each state's line shows change relative to 1997 rather than the absolute value. geom_hline is used to add a reference line at the 100% starting point. David ends up changing the starting point from 100% to 0%.

fct_reorder with max is used to reorder the plots in descending order of each state's highest peak value.

David briefly mentions the small multiples approach to analyzing data.
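The indexing step described above might look roughly like the following. It is a sketch under assumptions: the toy table and the change column name are hypothetical, and the screencast's own code may differ.

```r
library(dplyr)
library(ggplot2)
library(forcats)

# Made-up per-child spending for two states; only the column names follow the text.
toy <- tibble(
  state = rep(c("A", "B"), each = 3),
  year = rep(c(1997, 1998, 1999), times = 2),
  inf_adj_perchild = c(10, 11, 12, 2, 3, 4)
)

indexed <- toy %>%
  group_by(state) %>%
  arrange(year, .by_group = TRUE) %>%
  # Divide by the first year's value; subtracting 1 moves the baseline from 100% to 0%
  mutate(change = inf_adj_perchild / first(inf_adj_perchild) - 1) %>%
  ungroup() %>%
  # Order the facets by each state's peak change, largest first
  mutate(state = fct_reorder(state, change, .fun = max, .desc = TRUE))

p <- ggplot(indexed, aes(year, change)) +
  geom_line() +
  # Reference line at the 0% starting point
  geom_hline(yintercept = 0, linetype = "dashed") +
  facet_wrap(~ state)
```

Dropping the subtraction of 1 gives the earlier variant with a 100% baseline, in which case the reference line would sit at 1 instead of 0.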

function

Create a function named plot_changed_faceted to make it easier to visualize the many other variables included in the dataset.

function

Create a function named plot_faceted with a {{ y_axis }} embracing argument. Adding this function creates two stages: one for data transformation and another for plotting.
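As an illustration of embracing, here is a hypothetical pair of helpers that splits the work into a transformation stage and a plotting stage. The function names, bodies and toy data are assumptions made for this sketch, not the definitions from the screencast.

```r
library(dplyr)
library(ggplot2)

# Made-up data; only the column names follow the text.
toy <- tibble(
  state = c("A", "A", "B", "B"),
  year = c(1997, 1998, 1997, 1998),
  inf_adj = c(1, 2, 3, 4)
)

# Stage 1 (hypothetical): data transformation.
# {{ value }} forwards the bare column name the caller passed in.
summarize_by_year <- function(tbl, value) {
  tbl %>%
    group_by(year) %>%
    summarize(total = sum({{ value }}), .groups = "drop")
}

# Stage 2 (hypothetical): plotting, with an embraced y_axis argument.
plot_faceted_sketch <- function(tbl, y_axis) {
  tbl %>%
    ggplot(aes(year, {{ y_axis }})) +
    geom_line() +
    facet_wrap(~ state)
}

by_year <- summarize_by_year(toy, inf_adj)
p <- plot_faceted_sketch(toy, inf_adj)
```

Because the column is passed unquoted, the same helper can be pointed at any other numeric column without rewriting the pipeline.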

dir, map_df, function, set_names, pivot_longer, separate, extract
base, purrr, tidyr

Use the dir function with pattern and purrr package's map_df function to read in many different .csv files with GDP values for each state.

Troubleshooting the "Can't combine <character> and <double> columns" error by wrapping the read step in a function that uses mutate with across and as.numeric.

Extract the state name from each filename using extract from tidyr and a regular expression.
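A self-contained sketch of this pattern follows. The file names, the gdp_ prefix, the column names and the "(D)" placeholder value are all invented for the example; the real GDP files are laid out differently.

```r
library(dplyr)
library(purrr)
library(readr)
library(tidyr)

# Write two tiny made-up files so the sketch runs on its own.
csv_dir <- file.path(tempdir(), "gdp_sketch")
dir.create(csv_dir, showWarnings = FALSE)
writeLines(c("year,gdp", "2015,100", "2016,110"), file.path(csv_dir, "gdp_Ohio.csv"))
# The "(D)" entry makes readr guess this column as character
writeLines(c("year,gdp", "2015,(D)", "2016,95"), file.path(csv_dir, "gdp_Utah.csv"))

# List every .csv file in the folder
files <- dir(csv_dir, pattern = "\\.csv$", full.names = TRUE)

read_gdp <- function(path) {
  read_csv(path, col_types = cols()) %>%
    # Coerce every non-year column to numeric so that a character column in one
    # file can be row-bound with a double column in another
    mutate(across(-year, ~ suppressWarnings(as.numeric(.x))))
}

gdp <- files %>%
  set_names(basename(files)) %>%
  # .id stores each element's name (the file name) in a new column
  map_df(read_gdp, .id = "filename") %>%
  # Pull the state out of the file name with a capture group
  tidyr::extract(filename, "state", "gdp_(.*)\\.csv")
```

Without the as.numeric step, binding the two files fails because gdp is double in one and character in the other. Values that cannot be parsed become NA.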

read_xlsx
readxl

Unsuccessful attempt at importing state population data from a hard-to-use census.gov dataset by skipping the first 3 rows of the Excel file.

geom_col, fct_reorder, scale_fill_discrete
ggplot2, forcats

Use geom_col to see which states spend the most per child, first for a single variable and then for multiple variables selected with %in%.

Use scale_fill_discrete with guide_legend(reverse = TRUE) to change the ordering of the legend.
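A minimal sketch of the bar chart and the reversed legend, using made-up numbers. The variable names PK12ed and highered are taken from later in these notes; ordering the bars by the summed value is an assumption.

```r
library(dplyr)
library(ggplot2)
library(forcats)

# Made-up per-child spending for three states and two variables.
toy <- tibble(
  state = rep(c("A", "B", "C"), times = 2),
  variable = rep(c("PK12ed", "highered"), each = 3),
  inf_adj_perchild = c(5, 9, 7, 2, 4, 1)
)

ranked <- toy %>%
  filter(variable %in% c("PK12ed", "highered")) %>%
  # Order states by total per-child spending across the selected variables
  mutate(state = fct_reorder(state, inf_adj_perchild, .fun = sum))

p <- ggplot(ranked, aes(inf_adj_perchild, state, fill = variable)) +
  geom_col() +
  # Reverse the legend order
  scale_fill_discrete(guide = guide_legend(reverse = TRUE))
```

Reversing the legend only changes the order of the legend keys; the order of the stacked segments in the bars is unchanged.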

pairwise_cor, fct_reorder
widyr

Use geom_col and pairwise_cor to visualize the correlation between variables across states in 2016 using pairwise correlation.
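The pairwise correlation step could be sketched as follows, assuming the widyr package is installed. The toy values are invented so that the correlations are easy to check by eye, and the libraries variable name is a placeholder.

```r
library(dplyr)
library(widyr)
library(ggplot2)
library(forcats)

# Made-up single-year data: one value per state and variable.
toy <- tibble(
  state = rep(c("A", "B", "C", "D"), times = 3),
  variable = rep(c("PK12ed", "highered", "libraries"), each = 4),
  inf_adj_perchild = c(1, 2, 3, 4,  2, 4, 6, 8,  4, 3, 2, 1)
)

# Correlate variables with each other, using states as the observations
cors <- toy %>%
  pairwise_cor(variable, state, inf_adj_perchild, sort = TRUE)

# Bar chart of how each other variable correlates with PK12ed
p <- cors %>%
  filter(item1 == "PK12ed") %>%
  mutate(item2 = fct_reorder(item2, correlation)) %>%
  ggplot(aes(correlation, item2)) +
  geom_col()
```

pairwise_cor returns one row per ordered pair of items, in the columns item1, item2 and correlation.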

pivot_wider, geom_point, geom_text
ggplot2, tidyr

Use geom_point to plot inf_adj_perchild_PK12ed versus inf_adj_perchild_highered. geom_text is used to label each point with its state name.
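One way to get the two columns side by side is to widen the data before plotting. The sketch below uses made-up values. Producing the prefixed column names with names_prefix is an assumption about how they were built.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up per-child spending in long format.
toy <- tibble(
  state = rep(c("A", "B", "C"), times = 2),
  variable = rep(c("PK12ed", "highered"), each = 3),
  inf_adj_perchild = c(5, 9, 7, 2, 4, 1)
)

# One row per state, one column per variable
wide <- toy %>%
  pivot_wider(
    names_from = variable,
    values_from = inf_adj_perchild,
    names_prefix = "inf_adj_perchild_"
  )

p <- ggplot(wide, aes(inf_adj_perchild_PK12ed, inf_adj_perchild_highered)) +
  geom_point() +
  # Label each point with its state, nudged below and to the left
  geom_text(aes(label = state), vjust = 1, hjust = 1)
```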

Summary of screencast.