R Downloads

Data manipulation (especially time series)

Published: October 29, 2018

Notable topics: Data manipulation (especially time series)

Recorded on: 2018-10-29

Timestamps by: Alex Cookson

Timestamps

geom_line

Using geom_line function to visualize changes over time
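
A minimal sketch of a line chart of downloads over time. The `daily` data frame and its values are hypothetical, made up for illustration rather than taken from the screencast.

```r
library(ggplot2)

# Hypothetical daily download counts (illustrative values only)
daily <- data.frame(
  date = as.Date("2018-10-01") + 0:6,
  downloads = c(120, 135, 128, 140, 150, 90, 85)
)

# One line connecting the daily counts in date order
p <- ggplot(daily, aes(x = date, y = downloads)) +
  geom_line() +
  labs(x = "Date", y = "Downloads")
```

Printing `p` draws the chart.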

lubridate

Starting to decompose time series data into day-of-week trend and overall trend (lots of lubridate package functions)

Using floor_date function from lubridate package to round dates down to the week level

Using min function to drop the incomplete (partial) week at the start of the dataset
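
The two steps above (rounding to the week, then dropping the partial first week) might look roughly like this. The `daily` data and column names are hypothetical, and filtering on `week > min(week)` is one plausible way to use `min` here, not necessarily the exact code from the screencast.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily data that starts mid-week (2018-10-03 is a Wednesday)
daily <- tibble(
  date = as.Date("2018-10-03") + 0:17,
  downloads = 100
)

weekly <- daily %>%
  # Round each date down to the start of its week (Sunday by default)
  mutate(week = floor_date(date, "week")) %>%
  # The earliest week only has Wednesday to Saturday, so drop it
  filter(week > min(week)) %>%
  group_by(week) %>%
  summarize(downloads = sum(downloads))
```

Without the filter, the first week's total would look artificially low because it covers only four days.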

countrycode
countrycode

Using countrycode function from countrycode package to replace two-letter country codes with full names (e.g., "CA" becomes "Canada")
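
A short sketch of the conversion, assuming the codes are ISO 3166 two-letter codes (origin `"iso2c"`). The `codes` vector is a made-up example.

```r
library(countrycode)

# Hypothetical two-letter country codes
codes <- c("CA", "US", "DE")

# Translate ISO two-letter codes into English country names
country_names <- countrycode(codes, origin = "iso2c", destination = "country.name")
```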

fct_lump

Using fct_lump function to get top N categories within a categorical variable and classify the rest as "Other"
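
A small sketch of lumping with a hypothetical `country` factor. Here `n = 2` keeps the two most frequent levels and recodes everything else as "Other".

```r
library(forcats)

# Hypothetical factor: US appears 4 times, CA 3, DE 2, FR 1, JP 1
country <- factor(c("US", "US", "US", "US", "CA", "CA", "CA",
                    "DE", "DE", "FR", "JP"))

# Keep the top 2 levels; all remaining levels become "Other"
lumped <- fct_lump(country, n = 2)

table(lumped)
```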

hour
lubridate

Using hour function from lubridate package to pull out integer hour value from a datetime variable
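
A one-value sketch, using a made-up timestamp parsed with `ymd_hms` (UTC by default):

```r
library(lubridate)

# Hypothetical download timestamp
dt <- ymd_hms("2018-10-29 14:35:10")

# Pull out the hour of the day as a number from 0 to 23
h <- hour(dt)
```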

facet_wrap

Using facet_wrap function to graph small multiples of downloads by country, then changing scales argument to allow different scales on y-axis
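
A sketch of small multiples with independent y-axes. The `by_country` data is hypothetical; the countries are deliberately on very different scales so the effect of `scales = "free_y"` is visible.

```r
library(ggplot2)

# Hypothetical downloads per day for two countries of very different size
by_country <- data.frame(
  date = rep(as.Date("2018-10-01") + 0:4, times = 2),
  country = rep(c("United States", "Canada"), each = 5),
  downloads = c(5000, 5200, 5100, 5300, 4900, 300, 320, 310, 290, 305)
)

# One panel per country; "free_y" gives each panel its own y-axis range
p <- ggplot(by_country, aes(x = date, y = downloads)) +
  geom_line() +
  facet_wrap(~ country, scales = "free_y")
```

With the default fixed scales, the smaller country's line would be squashed flat near the bottom of its panel.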

Starting analysis of downloads by IP address

as.POSIXlt

Using as.POSIXlt to combine separate date and time variables to get a single datetime variable
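
A base R sketch, assuming the date and time arrive as separate character vectors (the values are made up). Pasting them together first gives `as.POSIXlt` a single string per row to parse.

```r
# Hypothetical separate date and time fields
date <- c("2018-10-29", "2018-10-29")
time <- c("08:15:30", "21:02:05")

# Join with a space, then parse into a single datetime (time zone assumed to be UTC)
datetime <- as.POSIXlt(paste(date, time), tz = "UTC")

# Note: to store the result in a tibble column, convert it with as.POSIXct() first
```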

lag

Using lag function to calculate time between downloads (time between events) per IP address (comparable to SQL window function)
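
A sketch of time between events per IP address, using a hypothetical `downloads` table with made-up `ip_id` and `datetime` columns. Grouping before calling `lag` plays the role of `PARTITION BY` in a SQL window function.

```r
library(dplyr)

# Hypothetical download log: three downloads from IP 1, two from IP 2
downloads <- tibble(
  ip_id = c(1, 1, 1, 2, 2),
  datetime = as.POSIXct(
    c("2018-10-29 08:00:00", "2018-10-29 08:00:30", "2018-10-29 09:00:30",
      "2018-10-29 10:00:00", "2018-10-29 10:05:00"),
    tz = "UTC"
  )
)

gaps <- downloads %>%
  group_by(ip_id) %>%
  arrange(datetime, .by_group = TRUE) %>%
  # lag() returns the previous row's value within each IP address;
  # the first download of each IP has no predecessor, so its gap is NA
  mutate(since_previous = difftime(datetime, lag(datetime), units = "secs")) %>%
  ungroup()
```

Fixing `units = "secs"` avoids the automatic unit choice that `difftime` would otherwise make.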

as.numeric

Using as.numeric function to convert variable from a time interval object to a numeric variable (number in seconds)
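
A base R sketch with two made-up timestamps. Setting the units explicitly before converting makes it clear what the resulting number means.

```r
# Hypothetical consecutive downloads, 2 minutes 30 seconds apart
first  <- as.POSIXct("2018-10-29 08:00:00", tz = "UTC")
second <- as.POSIXct("2018-10-29 08:02:30", tz = "UTC")

# A time interval object (class "difftime") measured in seconds
gap <- difftime(second, first, units = "secs")

# Strip the class and units to get a plain number of seconds
gap_seconds <- as.numeric(gap)

# The same interval expressed in minutes
gap_minutes <- as.numeric(gap, units = "mins")
```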

Explanation of a bimodal log-normal distribution

scale_x_log10

Handy trick for passing easy-to-interpret time intervals to the breaks argument of the scale_x_log10 function
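
One way to get readable breaks, sketched under the assumption that the x variable is in seconds: powers of 60 land on a second, a minute, and an hour, and are evenly spaced on a log scale. The simulated `gaps` data and the label text are illustrative choices, not taken from the screencast.

```r
library(ggplot2)

# Hypothetical log-normally distributed gaps between downloads, in seconds
set.seed(2018)
gaps <- data.frame(seconds = exp(rnorm(500, mean = 6, sd = 2)))

# Powers of 60: 1 s, 60 s, 3,600 s, 216,000 s, 12,960,000 s
time_breaks <- 60 ^ (0:4)
time_labels <- c("1 second", "1 minute", "1 hour", "2.5 days", "150 days")

p <- ggplot(gaps, aes(x = seconds)) +
  geom_histogram(bins = 30) +
  scale_x_log10(breaks = time_breaks, labels = time_labels)
```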

Starting to explore package downloads

Adding 1 to the numerator and denominator when calculating a ratio to get around dividing by zero
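
A sketch of the add-one adjustment with a hypothetical `pkgs` table. The column names and the week-over-week framing are assumptions for illustration.

```r
library(dplyr)

# Hypothetical download counts for two periods
pkgs <- tibble(
  package = c("a", "b", "c"),
  this_week = c(50, 3, 0),
  last_week = c(25, 0, 0)
)

pkgs <- pkgs %>%
  mutate(
    # Plain ratio: gives Inf for package "b" and NaN for package "c"
    naive_ratio = this_week / last_week,
    # Adding 1 to both sides keeps every ratio finite
    smoothed_ratio = (this_week + 1) / (last_week + 1)
  )
```

The adjustment shifts ratios slightly toward 1, which matters most when counts are small.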

cran_downloads
cranlogs

Showing how to look at package download data over time using cran_downloads function from the cranlogs package