African-American Achievements
Notable topics: plotly interactive timeline, Wikipedia web scraping
Recorded on: 2020-06-08
Timestamps by: Eric Fletcher
Screencast
Timestamps
Use fct_reorder from the forcats package to reorder the category factor levels by sorting along n.
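A minimal sketch of that reordering step, assuming the Tidy Tuesday firsts data frame with a category column (not necessarily David's exact code):

```r
library(tidyverse)

# Count achievements per category, then reorder the factor levels by that count
# so the categories plot from smallest to largest
firsts %>%
  count(category) %>%
  mutate(category = fct_reorder(category, n)) %>%
  ggplot(aes(n, category)) +
  geom_col()
```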
Use str_remove from the stringr package with the regular expression "[\\[\\(].*" to remove an opening bracket or parenthesis, and everything after it, from the person variable.
Use str_trim from the stringr package to remove leading and trailing whitespace from the person variable. David then discusses how web scraping may be a better option than parsing the strings.
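A rough sketch of both string-cleaning steps, assuming a person column as in the firsts data:

```r
library(tidyverse)

# Drop footnote-style "[...]" or "(...)" suffixes, then trim leftover whitespace
firsts <- firsts %>%
  mutate(person = str_remove(person, "[\\[\\(].*"),
         person = str_trim(person))
```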
Create an interactive plotly timeline.
Use ylim(c(-.1, 1)) to set the y-axis limits, moving the geom_point layer to the bottom of the graph.
Use paste0 from base R to concatenate the accomplishment and person variables with ": " between them, so both are displayed in the timeline hover label.
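A hedged sketch of this first pass at the timeline, assuming firsts has year, accomplishment, and person columns:

```r
library(tidyverse)
library(plotly)

# All points sit at y = 0; ylim(c(-.1, 1)) pushes them to the bottom of the panel.
# The extra "text" aesthetic is ignored by ggplot2 but picked up by ggplotly.
g <- firsts %>%
  ggplot(aes(year, 0, text = paste0(accomplishment, ": ", person))) +
  geom_point() +
  ylim(c(-.1, 1))

ggplotly(g)
```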
Set y to category in ggplot aesthetics to get 8 separate timelines on one plot, one for each category. Doing this allows David to remove the ylim mentioned above.
Use the ggplotly tooltip = "text" argument so that only the text aesthetic, a single line of text, appears in the plotly hover labels.
Use glue from the glue package to reformat the text with a \n included, so that the single line of text is broken into two separate lines in the hover labels.
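A sketch of the refined version, again assuming the firsts columns above; the glue template is illustrative rather than David's exact label format:

```r
library(tidyverse)
library(plotly)
library(glue)

# One row of points per category; the glue template builds a two-line label,
# and tooltip = "text" keeps only that label in the hover box
g <- firsts %>%
  mutate(text = glue("{accomplishment}\n{person}")) %>%
  ggplot(aes(year, category, color = category, text = text)) +
  geom_point() +
  theme(legend.position = "none")

ggplotly(g, tooltip = "text")
```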
Use separate_rows from the tidyr package to separate the occupation_s variable from the science dataset into multiple rows, using sep = "; " as the delimiter.
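For example, assuming the science data's occupation_s column holds semicolon-separated values:

```r
library(tidyverse)

# An entry like "Zoologist; Marine Biologist" becomes two rows, one occupation each
science %>%
  separate_rows(occupation_s, sep = "; ")
```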
Use str_to_title from the stringr package to convert the occupation_s variable to title case.
Use str_detect from the stringr package to detect the presence of "statistician" within the occupation_s variable, using regex("statistician", ignore_case = TRUE) to perform a case-insensitive search.
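A combined sketch of the case normalization and the case-insensitive search:

```r
library(tidyverse)

science %>%
  separate_rows(occupation_s, sep = "; ") %>%
  # Normalize case, e.g. "ZOOLOGIST" -> "Zoologist"
  mutate(occupation_s = str_to_title(occupation_s)) %>%
  # Keep rows whose occupation mentions "statistician", regardless of case
  filter(str_detect(occupation_s, regex("statistician", ignore_case = TRUE)))
```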
Use the rvest package with SelectorGadget to scrape additional information about each individual from their Wikipedia infobox.
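A minimal scraping sketch; the links column and the ".infobox" CSS selector are assumptions based on the screencast's description, not verified against David's code:

```r
library(tidyverse)
library(rvest)

# Read one Wikipedia page from the (assumed) links column and pull the infobox
# table using the CSS selector found with SelectorGadget
url <- science$links[1]

infobox <- read_html(url) %>%
  html_node(".infobox") %>%
  html_table()

infobox
```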
Use map and possibly from the purrr package to separate the downloading of the data from the parsing of the useful information. David then turns the infobox extraction step into an anonymous function, a magrittr functional sequence built with . %>%.
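A sketch of that two-stage approach, with the same assumed links column and ".infobox" selector as above; possibly returns NULL instead of erroring when a page fails:

```r
library(tidyverse)
library(rvest)

# Stage 1: download each page once
science_html <- science %>%
  mutate(html = map(links, possibly(read_html, NULL)))

# Stage 2: parse the infobox from the already-downloaded pages.
# The extraction step is a magrittr functional sequence built with . %>%
extract_infobox <- . %>%
  html_node(".infobox") %>%
  html_table()

science_infobox <- science_html %>%
  filter(!map_lgl(html, is.null)) %>%
  mutate(infobox = map(html, possibly(extract_infobox, NULL)))
```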
Summary of screencast.