Art Collections

geom_area plot, distributions, calculating area (square meters) and ratio (width / height)

Published

January 11, 2021

Notable topics: geom_area plot, distributions, calculating area (square meters) and ratio (width / height)

Recorded on: 2021-01-11

Timestamps by: Eric Fletcher

Timestamps

clean_names
janitor

Use clean_names to convert variable names from camelCase to snake_case.
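The core of the camelCase to snake_case step can be sketched in plain Python. `to_snake_case` is a hypothetical helper, not janitor's implementation, which handles more edge cases. The column names are only examples.

```python
import re


def to_snake_case(name: str) -> str:
    """Hypothetical helper: convert a camelCase column name to snake_case."""
    # Insert "_" wherever a lowercase letter or digit is followed by an uppercase letter.
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Collapse runs of spaces, dots, brackets, etc. into a single "_".
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    return name.strip("_").lower()


columns = ["artistRole", "acquisitionYear", "Width (mm)"]
print([to_snake_case(c) for c in columns])
# ['artist_role', 'acquisition_year', 'width_mm']
```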

fct_reorder, geom_col
forcats, ggplot2

Use fct_reorder to sort the geom_col bars in ascending order of their values.
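fct_reorder sorts a factor's levels by a summary of another variable, using the median by default. That ordering is what puts the bars in ascending order. The Python sketch below shows the same ordering. `reorder_levels` is a hypothetical helper, and the counts are made up.

```python
from statistics import median


def reorder_levels(levels, values, fun=median):
    """Return the distinct levels sorted (ascending) by fun() of their values."""
    groups = {}
    for level, value in zip(levels, values):
        groups.setdefault(level, []).append(value)
    return sorted(groups, key=lambda level: fun(groups[level]))


# One row per bar, as in a geom_col of counts (illustrative numbers)
print(reorder_levels(["Oil paint", "Graphite", "Watercolour"], [120, 300, 45]))
# ['Watercolour', 'Oil paint', 'Graphite']
```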

extract, separate
tidyr

Use extract to split a character column into multiple columns using the regular expression "(.*) on (.*)". At 6:05 David switches to separate with sep = " on ", fill = "left", and extra = "merge" to control what happens when there are too few or too many pieces. At 7:10 he changes this to fill = "right".
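The practical difference between the two approaches shows up with strings that contain " on " twice or not at all. This is a plain-Python sketch with hypothetical helper names, not tidyr itself. The greedy regex keeps everything up to the *last* " on " in the first column. Splitting once, the `extra = "merge"` behaviour, keeps everything after the *first* " on " together. `fill` decides which side gets the missing value.

```python
import re


def extract_on(text):
    """Like extract() with "(.*) on (.*)": greedy first group, None if no match."""
    match = re.fullmatch(r"(.*) on (.*)", text)
    return match.groups() if match else (None, None)


def separate_on(text, fill="right"):
    """Like separate(sep = " on ", extra = "merge") with a choice of fill side."""
    # maxsplit=1 keeps everything after the first " on " together (extra = "merge").
    pieces = text.split(" on ", 1)
    if len(pieces) == 2:
        return tuple(pieces)
    # Too few pieces: pad with None on the requested side.
    return (pieces[0], None) if fill == "right" else (None, pieces[0])


print(extract_on("Graphite on paper on board"))   # ('Graphite on paper', 'board')
print(separate_on("Graphite on paper on board"))  # ('Graphite', 'paper on board')
print(separate_on("Graphite"))                    # ('Graphite', None)
print(separate_on("Graphite", fill="left"))       # (None, 'Graphite')
```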

replace_na
tidyr

Use replace_na to replace NAs with specified values; here they are replaced with "Missing".
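replace_na takes per-column replacement values. The sketch below mirrors that with a dict in Python, using None in place of NA. The helper and the row are hypothetical. Columns without a replacement keep their missing values.

```python
def replace_na(row, replacements):
    """Swap None values for the per-column replacement, leaving other columns alone."""
    return {
        column: replacements.get(column, value) if value is None else value
        for column, value in row.items()
    }


row = {"medium": None, "surface": "canvas", "units": None}
print(replace_na(row, {"medium": "Missing"}))
# {'medium': 'Missing', 'surface': 'canvas', 'units': None}
```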

fct_lump, filter
forcats, dplyr

Use fct_lump to lump all artist and medium levels except the n most frequent into an "Other" level. At 11:30 David uses filter(fct_lump(artist, 16) != "Other") to drop the lumped "Other" artist category.
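Here is a minimal Python sketch of lumping followed by filtering out "Other". The counts are made up, and `lump` is a hypothetical helper. One caveat is that fct_lump can keep extra levels when counts are tied. `Counter.most_common` instead breaks ties by first appearance.

```python
from collections import Counter


def lump(values, n, other="Other"):
    """Keep the n most frequent levels and relabel every other value as `other`."""
    keep = {level for level, _ in Counter(values).most_common(n)}
    return [value if value in keep else other for value in values]


artists = ["Turner", "Turner", "Turner", "Blake", "Blake", "Constable"]
lumped = lump(artists, 2)
print(lumped)  # ['Turner', 'Turner', 'Turner', 'Blake', 'Blake', 'Other']
# Equivalent of filter(fct_lump(artist, 2) != "Other"): drop the lumped rows.
print([a for a in lumped if a != "Other"])  # ['Turner', 'Turner', 'Turner', 'Blake', 'Blake']
```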

geom_area
ggplot2

Create a geom_area plot showing the distribution of paintings by medium over time. At 15:35 David switches from counts to percentages with mutate(pct = n / sum(n)), which makes differences in composition easier to see.
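n / sum(n) gives a per-decade share only if the data are grouped by decade first, for example with group_by(decade) before the mutate. Each decade's shares then add up to 1. The sketch below makes that grouping explicit in plain Python, using made-up (decade, medium) pairs.

```python
from collections import Counter


def composition(pieces):
    """Share of each medium within its decade, from (decade, medium) pairs."""
    counts = Counter(pieces)                          # like count(decade, medium)
    totals = Counter(decade for decade, _ in pieces)  # sum(n) within each decade
    return {(decade, medium): n / totals[decade] for (decade, medium), n in counts.items()}


pieces = [(1900, "Oil paint"), (1900, "Oil paint"), (1900, "Graphite"), (1910, "Oil paint")]
for key, pct in composition(pieces).items():
    print(key, f"{pct:.0%}")
```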

count, round
base, dplyr

Bucket the year variable into decades using round(year, -1), which rounds each year to the nearest 10.
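Rounding to the nearest 10 puts 1987 in the "1990" bucket. Integer-division flooring is a common alternative when each decade should start at a year ending in 0. The sketch below compares the two in Python. Tie-breaking for years ending in 5 can differ between languages, so the example avoids them.

```python
def decade_nearest(year):
    """round(year, -1): nearest multiple of 10, so 1987 -> 1990."""
    return round(year, -1)


def decade_floor(year):
    """Alternative: truncate to the start of the decade, so 1987 -> 1980."""
    return year // 10 * 10


print(decade_nearest(1987), decade_floor(1987))  # 1990 1980
print(decade_nearest(1983), decade_floor(1983))  # 1980 1980
```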

scale_y_continuous
scales

Use scale_y_continuous(labels = scales::percent) to change y-axis labels to percent format.

facet_wrap, geom_col
ggplot2

Turn the geom_area plot into a faceted geom_col.

mutate, group_by, summarize, complete
dplyr, tidyr

Calculate the percentage of artists for each medium per decade.
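complete fills in decade/medium combinations that have no rows. Every medium then has an explicit 0 in decades where it does not appear, instead of a gap. In R this is typically something like complete(decade, medium, fill = list(n = 0)). Below is a plain-Python sketch of the same idea with made-up counts and a hypothetical helper.

```python
from itertools import product


def complete_counts(counts, decades, media, fill=0):
    """Every decade x medium combination, with `fill` where no pieces were counted."""
    return {(d, m): counts.get((d, m), fill) for d, m in product(decades, media)}


counts = {(1900, "Oil paint"): 2, (1910, "Graphite"): 1}
full = complete_counts(counts, [1900, 1910], ["Oil paint", "Graphite"])
print(full)
# {(1900, 'Oil paint'): 2, (1900, 'Graphite'): 0, (1910, 'Oil paint'): 0, (1910, 'Graphite'): 1}
```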

filter, mutate, ggplot, geom_histogram, scale_x_log10, geom_vline
dplyr, ggplot2

Calculate the area (square meters) and ratio (width / height) of each art piece, then look at their distributions.
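This sketch assumes width and height are recorded in millimetres, so it is worth checking the dataset's units first. Under that assumption, area in square meters is (width / 1000) × (height / 1000), and ratio > 1 means wider than tall. The `area_and_ratio` helper is hypothetical. Its missing-value check stands in for a filter() step.

```python
def area_and_ratio(width_mm, height_mm):
    """Return (area in square meters, width / height), or None if a dimension is missing."""
    # Mirrors a filter() step: skip pieces without usable dimensions.
    if not width_mm or not height_mm:
        return None
    area_m2 = (width_mm / 1000) * (height_mm / 1000)  # mm -> m on each side
    return area_m2, width_mm / height_mm


print(area_and_ratio(1000, 500))   # (0.5, 2.0)
print(area_and_ratio(300, None))   # None
```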

mutate, case_when, geom_area, complete
dplyr, ggplot2

Categorize the pieces by shape (landscape, portrait, square) based on their ratio, then plot with geom_area to look at the composition over time.
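Like case_when, the sketch below checks its conditions in order and the first match wins. The 5% tolerance around a ratio of 1 is a hypothetical choice, not the cutoff used in the screencast.

```python
def shape(ratio, tolerance=0.05):
    """case_when-style bucketing; the 5% tolerance is a hypothetical choice."""
    if ratio > 1 + tolerance:
        return "Landscape"
    if ratio < 1 - tolerance:
        return "Portrait"
    return "Square"  # close enough to 1 to count as square


print([shape(r) for r in (1.5, 0.7, 1.02)])  # ['Landscape', 'Portrait', 'Square']
```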

group_by, summarize, filter, ggplot, geom_line, geom_point
dplyr, ggplot2

Create a line plot showing the median ratio by decade over time.

group_by, summarize, filter, ggplot, geom_line, geom_point
dplyr, ggplot2

Create a line plot showing the median area by decade over time.
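Both line plots rest on the same group_by / summarize / filter pattern: bucket by decade, take the median, then drop decades with too few pieces. Here is a plain-Python sketch. The `min_pieces` threshold is hypothetical, since the screencast's exact filter is not given here.

```python
from collections import defaultdict
from statistics import median


def median_by_decade(pieces, min_pieces=1):
    """Median value per decade from (year, value) pairs, dropping sparse decades."""
    groups = defaultdict(list)
    for year, value in pieces:
        groups[round(year, -1)].append(value)  # round(year, -1) decade buckets
    return {
        decade: median(values)
        for decade, values in sorted(groups.items())
        if len(values) >= min_pieces  # like filter(n >= min_pieces) after summarize
    }


pieces = [(1901, 1.0), (1903, 3.0), (1912, 2.0)]
print(median_by_decade(pieces))                # {1900: 2.0, 1910: 2.0}
print(median_by_decade(pieces, min_pieces=2))  # {1900: 2.0}
```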

mutate, filter, ggplot, geom_boxplot, scale_y_log10
dplyr, ggplot2

Create a boxplot showing the distribution of area over time.

group_by, summarize, arrange
dplyr

Create various summary statistics for each artist, such as avg_year, first_year, last_year, n_pieces, median_area, and median_ratio.
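This plain-Python sketch builds the per-artist summary from a list of pieces. The records are made up. Sorting by n_pieces, largest first, is an assumed stand-in for the arrange step, since the sort column is not given here.

```python
from collections import defaultdict
from statistics import mean, median


def artist_summary(pieces):
    """Per-artist summary statistics, sorted by n_pieces (largest first)."""
    by_artist = defaultdict(list)
    for piece in pieces:
        by_artist[piece["artist"]].append(piece)  # group_by(artist)

    summary = []
    for artist, rows in by_artist.items():  # summarize(...)
        years = [r["year"] for r in rows]
        summary.append({
            "artist": artist,
            "avg_year": mean(years),
            "first_year": min(years),
            "last_year": max(years),
            "n_pieces": len(rows),
            "median_area": median(r["area"] for r in rows),
            "median_ratio": median(r["ratio"] for r in rows),
        })
    # e.g. arrange(desc(n_pieces)); the sort column is an assumption
    return sorted(summary, key=lambda s: s["n_pieces"], reverse=True)


pieces = [
    {"artist": "A", "year": 1900, "area": 0.5, "ratio": 1.0},
    {"artist": "A", "year": 1910, "area": 1.5, "ratio": 2.0},
    {"artist": "B", "year": 1950, "area": 0.2, "ratio": 0.5},
]
print(artist_summary(pieces)[0])
```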

filter, add_count, mutate, ggplot, geom_boxplot, scale_x_log10, geom_vline, glue
dplyr, ggplot2, glue

Create a boxplot showing the distribution of ratio for each of n artists. Use glue to append each artist's number of pieces to their name on the y-axis.
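In R this labelling step is likely something like add_count(artist) followed by mutate(artist = glue("{artist} ({n})")). Below is a plain-Python sketch of the same labelling with made-up names. It works the same way for media.

```python
from collections import Counter


def label_with_counts(names):
    """Append each name's row count, like add_count() followed by glue("{artist} ({n})")."""
    counts = Counter(names)
    return [f"{name} ({counts[name]})" for name in names]


print(label_with_counts(["Turner", "Blake", "Turner"]))
# ['Turner (2)', 'Blake (1)', 'Turner (2)']
```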

filter, add_count, mutate, ggplot, geom_boxplot, scale_x_log10, geom_vline, glue
dplyr, ggplot2, glue

Create a boxplot showing the distribution of ratio for each medium. Use glue to append each medium's number of pieces to its name on the y-axis.

Summary of screencast