African-American Achievements

plotly interactive timeline, Wikipedia web scraping

Published: June 8, 2020

Notable topics: plotly interactive timeline, Wikipedia web scraping

Recorded on: 2020-06-08

Timestamps by: Eric Fletcher

Timestamps

fct_reorder
forcats

Use fct_reorder from the forcats package to reorder the category factor levels by sorting along n.
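
As a rough illustration of the reordering step, the sketch below uses a small made-up table of counts. The category values and counts are hypothetical, not the screencast's data.

```r
library(dplyr)
library(forcats)

# Hypothetical counts per category (illustrative only)
category_counts <- tibble(
  category = c("Arts", "Law", "Sports"),
  n = c(5, 2, 9)
)

# Reorder the factor levels of category by ascending n
reordered <- category_counts %>%
  mutate(category = fct_reorder(category, n))

levels(reordered$category)
```

Ordering the levels this way is what makes a bar chart of the counts come out sorted rather than alphabetical.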

str_remove
stringr

Use str_remove from the stringr package with the regular expression "[\\[\\(].*" to remove everything from the first bracket or parenthesis onward in the person variable. David then discusses how web scraping may be a better option than parsing the strings.

str_trim
stringr

Use str_trim from the stringr package to remove leading and trailing whitespace from the person variable.
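
The two cleaning steps can be sketched together. The input strings below are invented examples in the style of a Wikipedia list entry, not rows from the actual dataset.

```r
library(stringr)

# Hypothetical raw values (illustrative only)
person <- c("Jane Doe [12]", "John Smith (born 1900)", "Alex Roe")

# Drop everything from the first "[" or "(" onward, then trim the
# whitespace that is left behind at the end of the string
cleaned <- str_trim(str_remove(person, "[\\[\\(].*"))

cleaned
```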

ggplotly
plotly

Create an interactive plotly timeline.

ylim
ggplot2

Use ylim(c(-.1, 1)) to set the y-axis limits, which moves the geom_point layer to the bottom of the graph.

paste0
base

Use paste0 from base R to concatenate accomplishment and person, separated by ": ", for display in the timeline hover label.

aes
ggplot2

Set y to category in ggplot aesthetics to get 8 separate timelines on one plot, one for each category. Doing this allows David to remove the ylim mentioned above.

tooltip
plotly

Use the tooltip = "text" argument of ggplotly so that the hover labels show only the single line of text mapped to the text aesthetic.
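
A minimal sketch of how the timeline steps might fit together, assuming a small made-up data frame. The firsts name, its rows, and the label column are hypothetical stand-ins, not the screencast's actual code.

```r
library(ggplot2)
library(plotly)

# Hypothetical rows standing in for the real dataset
firsts <- data.frame(
  year = c(1850, 1900, 1950),
  category = c("Law", "Arts", "Law"),
  accomplishment = c("Example first A", "Example first B", "Example first C"),
  person = c("Person A", "Person B", "Person C")
)

# One-line hover label: "accomplishment: person"
firsts$label <- paste0(firsts$accomplishment, ": ", firsts$person)

# Mapping y to category gives one timeline per category;
# the text aesthetic is only used by plotly for the hover label
p <- ggplot(firsts, aes(year, category, text = label)) +
  geom_point()

# tooltip = "text" restricts the hover label to the text aesthetic
fig <- ggplotly(p, tooltip = "text")
fig
```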

glue
glue

Use glue from the glue package to reformat text with \n included so that the single line of text can now be broken up into 2 separate lines in the hover labels.
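
For illustration, a two-line label could be built roughly as follows. The values are invented, and the exact template used in the screencast may differ.

```r
library(glue)

# Hypothetical values (illustrative only)
accomplishment <- "Example first A"
person <- "Person A"

# The \n in the template puts the person on a second line
label <- glue("{accomplishment}\n{person}")

label
```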

separate_rows
tidyr

Use separate_rows from the tidyr package with sep = "; " to split the semicolon-delimited occupation_s variable from the science dataset into multiple rows, one per occupation.
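
A small sketch of the row-splitting behavior, using a made-up stand-in for the science dataset. Only the occupation_s column name comes from the notes; the rows are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the science dataset
science <- tibble(
  name = c("A. Example", "B. Sample"),
  occupation_s = c("Chemist; Inventor", "Statistician")
)

# Each "; "-delimited occupation becomes its own row
occupations <- science %>%
  separate_rows(occupation_s, sep = "; ")

occupations
```

The other columns are repeated for every new row, so a person with two occupations appears twice.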

str_to_title
stringr

Use str_to_title from the stringr package to convert the occupation_s variable to title case.

str_detect
stringr

Use str_detect from the stringr package to detect the presence of statistician from within the occupation_s variable with regex("statistician", ignore_case = TRUE) to perform a case-insensitive search.
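
The case conversion and the case-insensitive search might look like this on a few invented occupation strings (hypothetical values, not the dataset's).

```r
library(stringr)

# Hypothetical occupation values (illustrative only)
occupation_s <- c("civil rights activist", "STATISTICIAN", "chemist")

# Title case: first letter of each word capitalized
titled <- str_to_title(occupation_s)

# Case-insensitive match, so "STATISTICIAN" is detected too
is_statistician <- str_detect(
  occupation_s,
  regex("statistician", ignore_case = TRUE)
)

titled
is_statistician
```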

read_html, html_nodes, html_table, setNames
rvest

Use the rvest package with Selector Gadget to scrape additional information about the individual from their Wikipedia infobox.
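
A rough, offline sketch of the infobox extraction, assuming the infobox is a table with class "infobox". The inline HTML, the ".infobox" selector, and the key/value column names are assumptions for illustration; a real page would be loaded with read_html on its URL, and the selector found with SelectorGadget may differ. The sketch uses html_node (singular) because it extracts a single table.

```r
library(rvest)

# Hypothetical HTML standing in for a downloaded Wikipedia page
page <- read_html(paste0(
  '<table class="infobox">',
  '<tr><th>Born</th><td>1900</td></tr>',
  '<tr><th>Occupation</th><td>Chemist</td></tr>',
  '</table>'
))

# Pull the infobox table and give its two columns readable names
infobox <- page %>%
  html_node(".infobox") %>%
  html_table(header = FALSE) %>%
  setNames(c("key", "value"))

infobox
```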

map, possibly, read_html
purrr

Use map and possibly from the purrr package to separate downloading the data from parsing out the useful information, so that a page that fails to download does not stop the whole run. David then turns the infobox extraction step into an anonymous function using the . %>% dot-pipe shorthand.
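
The download-then-parse pattern can be sketched without network access. Here fake_download is a hypothetical stand-in for read_html, the URLs are placeholders, and count_chars is a toy parsing step rather than the real infobox extraction.

```r
library(purrr)
library(magrittr)

# Hypothetical stand-in for rvest::read_html(); it fails for one
# input the way a missing page would
fake_download <- function(url) {
  if (grepl("missing", url)) stop("page not found")
  paste("<html>", url, "</html>")
}

# possibly() returns NULL instead of raising an error
safe_download <- possibly(fake_download, otherwise = NULL)

urls <- c("https://example.org/a", "https://example.org/missing")

# Step 1: download everything, keeping failures as NULL
pages <- map(urls, safe_download)

# Step 2: parse separately; . %>% builds an anonymous function
count_chars <- . %>% nchar()
parsed <- pages %>%
  compact() %>%
  map_int(count_chars)

parsed
```

Keeping the two steps apart means the slow downloads run once, while the parsing function can be revised and rerun on the stored pages.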

Summary of screencast.