Himalayan Climbers

Notable topics: Data Manipulation, Empirical Bayes, Logistic Regression Model

Recorded on: 2020-09-21

Timestamps by: Eric Fletcher

## Screencast

## Timestamps

Create a `geom_col` chart to visualize the top 50 tallest mountains.

Use `fct_reorder` to reorder the `peak_name` factor levels by sorting along the `height_metres` variable.
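
As a minimal sketch of the two steps above, the following uses a made-up `peaks` data frame (the peak names and heights are placeholders, not values from the screencast) and keeps the top 3 rather than the top 50 so the result is easy to inspect. `fct_reorder` sorts the factor levels by `height_metres`, so the tallest peak ends up at the top of the horizontal bar chart.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Toy stand-in for the peaks data; names and heights are made up.
peaks <- data.frame(
  peak_name = c("Peak A", "Peak B", "Peak C", "Peak D"),
  height_metres = c(7200, 8100, 6900, 7800)
)

tallest <- peaks %>%
  slice_max(height_metres, n = 3) %>%
  # Levels are ordered by ascending height, which ggplot2 draws bottom to top.
  mutate(peak_name = fct_reorder(peak_name, height_metres))

p <- ggplot(tallest, aes(height_metres, peak_name)) +
  geom_col()
```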

Use `summarize` with `across` to get the total number of climbs, climbers, deaths, and first year climbed.

Use `mutate` to calculate the percent death rate for members and hired staff.
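
One way the per-mountain summary and death rates could be computed is sketched below. The toy `expeditions` data frame and its column names (`members`, `member_deaths`, `hired_staff`, `hired_staff_deaths`, `year`) are assumptions for illustration, and the rates are kept as proportions rather than multiplied by 100.

```r
library(dplyr)

# Toy stand-in for the expeditions data; all values are made up.
expeditions <- data.frame(
  peak_id = c("P1", "P1", "P2"),
  year = c(1975, 1990, 2001),
  members = c(10, 8, 5),
  member_deaths = c(1, 0, 0),
  hired_staff = c(4, 6, 2),
  hired_staff_deaths = c(0, 1, 0)
)

peak_summary <- expeditions %>%
  group_by(peak_id) %>%
  summarize(
    n_climbs = n(),
    # across() applies sum() to each listed column and keeps the column names.
    across(c(members, member_deaths, hired_staff, hired_staff_deaths), sum),
    first_climb = min(year),
    .groups = "drop"
  ) %>%
  mutate(
    pct_death = member_deaths / members,
    pct_hired_staff_deaths = hired_staff_deaths / hired_staff
  )
```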

Use `inner_join` and `select` to join with the `peaks` dataset by `peak_id`.
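
A small sketch of the join, assuming a summary table keyed by `peak_id` and a `peaks` table with a few extra columns (both data frames here are made up). `select` trims `peaks` to the columns of interest before joining, and `inner_join` drops summary rows that have no matching peak.

```r
library(dplyr)

# Toy tables; the values and the climbing_status column are illustrative only.
peak_summary <- data.frame(
  peak_id = c("P1", "P2", "P3"),
  n_climbs = c(2, 1, 4)
)
peaks <- data.frame(
  peak_id = c("P1", "P2"),
  peak_name = c("Peak A", "Peak B"),
  height_metres = c(8100, 7200),
  climbing_status = c("Climbed", "Climbed")
)

joined <- peak_summary %>%
  inner_join(
    peaks %>% select(peak_id, peak_name, height_metres),
    by = "peak_id"
  )
```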

A discussion of statistical `noise`, how it distorts the death rate for mountains with few climbs, and how to account for it using statistical methods including `Beta Binomial Regression` and `Empirical Bayes`.

Further description of `Empirical Bayes` and how it avoids overestimating the death rate for mountains with fewer climbers.

Recommended reading: Introduction to Empirical Bayes: Examples from Baseball Statistics by David Robinson

Use the `ebbr` package (Empirical Bayes for Binomial in R) to create an Empirical Bayes estimate for each mountain by fitting a prior distribution across the data and adjusting the death rates down or up based on that prior.
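
To make the shrinkage idea concrete, here is a minimal sketch that does not call `ebbr` at all. It swaps in a hand-rolled method-of-moments fit of a Beta prior in base R, so it illustrates the general technique rather than the package's own fitting procedure. The counts are made up.

```r
# Made-up death and climber counts; not taken from the Himalayan data.
peaks <- data.frame(
  peak = c("A", "B", "C", "D", "E"),
  deaths = c(0, 1, 12, 30, 2),
  climbers = c(5, 10, 400, 1500, 40)
)
peaks$raw <- peaks$deaths / peaks$climbers

# Method-of-moments fit of a Beta(alpha0, beta0) prior to the raw rates.
m <- mean(peaks$raw)
v <- var(peaks$raw)
prior_strength <- m * (1 - m) / v - 1
alpha0 <- m * prior_strength
beta0 <- (1 - m) * prior_strength

# Posterior mean: each raw rate is pulled toward the prior mean,
# and peaks with few climbers are pulled the most.
peaks$fitted <- (peaks$deaths + alpha0) / (peaks$climbers + alpha0 + beta0)
```

In this toy data, peak A (0 deaths in 5 climbers) is adjusted up from a raw rate of zero and peak B (1 death in 10 climbers) is adjusted down, while peak D with 1,500 climbers barely moves.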

Use a `geom_point` chart to visualize the difference between the raw death rate and the new `ebbr` fitted death rate.

Use `geom_point` to visualize how deadly each mountain is, with `geom_errorbarh` representing the 95% credible interval between minimum and maximum values.
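
The sketch below shows one way such an interval could be derived and drawn, assuming a Beta prior whose parameters `alpha0` and `beta0` are already known (the values here are hypothetical, as are the counts). With a Beta prior and binomial data, the posterior is also a Beta distribution, so `qbeta` gives the 2.5% and 97.5% bounds directly.

```r
library(ggplot2)

# Hypothetical prior parameters, chosen only for illustration.
alpha0 <- 1
beta0 <- 25

# Made-up counts.
peaks <- data.frame(
  peak = c("A", "B", "C"),
  deaths = c(0, 12, 30),
  climbers = c(5, 400, 1500)
)

# Posterior is Beta(alpha0 + deaths, beta0 + climbers - deaths).
alpha1 <- alpha0 + peaks$deaths
beta1 <- beta0 + peaks$climbers - peaks$deaths
peaks$fitted <- alpha1 / (alpha1 + beta1)
peaks$low <- qbeta(0.025, alpha1, beta1)
peaks$high <- qbeta(0.975, alpha1, beta1)

p <- ggplot(peaks, aes(fitted, reorder(peak, fitted))) +
  geom_point() +
  geom_errorbarh(aes(xmin = low, xmax = high))
```

The peak with only 5 climbers gets a much wider interval than the peak with 1,500, which is the uncertainty the error bars are meant to convey.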

Use `geom_point` to visualize the relationship between `death rate` and `height` of mountain.

There is not a clear relationship, but David does briefly mention how one could use `Beta Binomial Regression` to further inspect for possible relationships / trends.

Use `geom_histogram` and `geom_boxplot` to visualize the distribution of time it took climbers to go from basecamp to the mountain’s high point for successful climbs only.

Use `mutate` to calculate the number of days it took climbers to get from basecamp to the highpoint.
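
A minimal sketch of that calculation, assuming the two dates are stored in `Date` columns called `basecamp_date` and `highpoint_date` (the column names, the new `days_to_highpoint` name, and the dates are assumptions for illustration). Subtracting two `Date` values gives a difference in days, and a missing date propagates as `NA`.

```r
library(dplyr)

# Toy data; the dates are made up.
expeditions <- data.frame(
  basecamp_date = as.Date(c("2001-04-10", "2005-09-01", NA)),
  highpoint_date = as.Date(c("2001-05-20", "2005-09-15", "2010-05-01"))
)

expeditions <- expeditions %>%
  # Date subtraction returns a difftime in days; convert it to a plain integer.
  mutate(days_to_highpoint = as.integer(highpoint_date - basecamp_date))
```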

Add a column to the data using `case_when` and `str_detect` to identify strings in `termination_reason` that contain the word `Success` and rename them to `Success`, then use a `vector` and `%in%` to change multiple values in `termination_reason` to `NA` and the rest to `Failed`.
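
The recoding could look roughly like the sketch below. The example `termination_reason` strings, the contents of the `unknown_reasons` vector, and the new `success` column name are assumptions for illustration. `case_when` checks its conditions in order, so the `Success` match is tested first and everything not caught by the first two rules falls through to `Failed`.

```r
library(dplyr)
library(stringr)

# Toy data; these reason strings are illustrative.
expeditions <- data.frame(
  termination_reason = c(
    "Success (main peak)",
    "Success (subpeak)",
    "Bad weather (storms, high winds)",
    "Unknown",
    "Accident (death or serious injury)"
  )
)

# Reasons to treat as missing; this vector is an assumption.
unknown_reasons <- c("Unknown", "Attempt rumoured")

expeditions <- expeditions %>%
  mutate(success = case_when(
    str_detect(termination_reason, "Success") ~ "Success",
    termination_reason %in% unknown_reasons ~ NA_character_,
    TRUE ~ "Failed"
  ))
```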

Use `fct_lump` to show the top 10 mountains while lumping the other factor levels (mountains) into `other`.
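
As a small illustration of the lumping behaviour, the sketch below keeps the 2 most frequent peaks instead of the top 10, using made-up peak names. `fct_lump` labels the lumped group `Other` by default.

```r
library(forcats)

# Made-up peak names: one entry per climb.
peak_name <- c(
  rep("Peak A", 5),
  rep("Peak B", 3),
  "Peak C",
  "Peak D"
)

# Keep the 2 most frequent peaks and lump the rest together.
lumped <- fct_lump(peak_name, n = 2)
table(lumped)
```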

For just Mount Everest, use `geom_histogram` and `geom_density` with `fill = success` to visualize the days from basecamp to highpoint for climbs that ended in `success`, `failure` or `other`.

For just Mount Everest, use `geom_histogram` to see the distribution of climbs per year.

For just Mount Everest, use `geom_line` and `geom_point` to visualize `pct_death` over time by decade.

Use `mutate` with `pmax` and `integer division` to create a decade variable that lumps together the data for 1970 and before.
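
A minimal sketch of the decade calculation on made-up years: integer division by 10 followed by multiplication by 10 floors each year to its decade, and `pmax` then raises anything below 1970 up to 1970 so the early climbs share one bucket.

```r
library(dplyr)

# Made-up climb years.
climbs <- data.frame(year = c(1953, 1968, 1975, 1989, 2004, 2019))

climbs <- climbs %>%
  # %/% is integer division; pmax() takes the element-wise maximum with 1970.
  mutate(decade = pmax(10 * (year %/% 10), 1970))
```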

Write a function for summary statistics such as `n_climbs`, `pct_success`, `first_climb`, `pct_death`, `pct_hired_staff_death`.
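
Such a helper might look like the sketch below. The function name `summarize_climbs`, the input column names, and the data are all hypothetical. Because the function only wraps `summarize`, it respects whatever grouping the caller has already applied, so the same helper works per decade, per peak, or overall.

```r
library(dplyr)

# Hypothetical helper: one row of summary statistics per group.
summarize_climbs <- function(tbl) {
  tbl %>%
    summarize(
      n_climbs = n(),
      pct_success = mean(success),
      first_climb = min(year),
      pct_death = sum(member_deaths) / sum(members),
      pct_hired_staff_deaths = sum(hired_staff_deaths) / sum(hired_staff),
      .groups = "drop"
    )
}

# Toy data; all values are made up.
expeditions <- data.frame(
  year = c(1975, 1978, 1985, 1988),
  decade = c(1970, 1970, 1980, 1980),
  success = c(TRUE, FALSE, TRUE, TRUE),
  members = c(10, 10, 5, 15),
  member_deaths = c(1, 0, 0, 1),
  hired_staff = c(5, 5, 4, 6),
  hired_staff_deaths = c(1, 0, 0, 0)
)

by_decade <- expeditions %>%
  group_by(decade) %>%
  summarize_climbs()
```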

For just Mount Everest, use `geom_line` and `geom_point` to visualize `pct_success` over time by decade.

For just Mount Everest, use `geom_line` and `geom_point` to visualize `pct_hired_staff_deaths` over time by decade.

David decides to visualize the `pct_hired_staff_deaths` and `pct_death` charts together on the same plot.

For just Mount Everest, fit a logistic regression model to predict the probability of death, using `format.pval` to format the `p.value`.
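
The sketch below fits a logistic regression with base R's `glm` on simulated data and formats the coefficient p-values with `format.pval`. The predictors (`age`, `hired`), the simulated effect sizes, and the seed are assumptions for illustration and say nothing about the real Everest data.

```r
# Simulated climbers; every value here is made up.
set.seed(2020)
n <- 500
climbers <- data.frame(
  age = round(runif(n, 20, 70)),
  hired = rbinom(n, 1, 0.3) == 1
)
# Death probability rises slightly with age in this simulation.
climbers$died <- rbinom(n, 1, plogis(-4 + 0.03 * climbers$age))

# family = "binomial" makes glm() fit a logistic regression.
model <- glm(died ~ age + hired, data = climbers, family = "binomial")

coefs <- summary(model)$coefficients
# format.pval() turns the numeric p-values into readable strings.
p_values <- format.pval(coefs[, "Pr(>|z|)"], digits = 2)
```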

Use `fct_lump` to lump together all `expedition_role` factors except for the n most frequent.

Use `group_by` with `integer division` and `summarize` to calculate `n_climbers` and `pct_death` for age bucketed into decades.
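
A minimal sketch of the age bucketing, using a made-up `members` data frame with `age` and a logical `died` column (both names are assumptions, as is the new `age_decade` column). `group_by` can create the bucket inline, and the mean of a logical column is the proportion of `TRUE` values.

```r
library(dplyr)

# Toy data; ages and outcomes are made up.
members <- data.frame(
  age = c(24, 29, 31, 38, 45, 47),
  died = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

by_age <- members %>%
  # Integer division floors each age to its decade: 24 -> 20, 38 -> 30.
  group_by(age_decade = 10 * (age %/% 10)) %>%
  summarize(
    n_climbers = n(),
    pct_death = mean(died),
    .groups = "drop"
  )
```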

Use `geom_point` and `geom_errorbarh` to visualize the logistic regression model with confidence intervals.
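
One way to build such a coefficient plot is sketched below on simulated data. It uses base R's `confint.default` for Wald intervals on the log-odds scale, which is a choice made here for a dependency-free example and not necessarily how the intervals were obtained in the screencast. The predictors and simulated effects are made up.

```r
library(ggplot2)

# Simulated climbers; every value here is made up.
set.seed(2020)
n <- 500
climbers <- data.frame(
  age = round(runif(n, 20, 70)),
  hired = rbinom(n, 1, 0.3) == 1
)
climbers$died <- rbinom(n, 1, plogis(-4 + 0.03 * climbers$age))

model <- glm(died ~ age + hired, data = climbers, family = "binomial")

# Wald 95% confidence intervals for each coefficient (log-odds scale).
ci <- confint.default(model)
coef_tbl <- data.frame(
  term = names(coef(model)),
  estimate = unname(coef(model)),
  conf.low = ci[, 1],
  conf.high = ci[, 2]
)
# The intercept is usually not of interest in a coefficient plot.
coef_tbl <- coef_tbl[coef_tbl$term != "(Intercept)", ]

p <- ggplot(coef_tbl, aes(estimate, term)) +
  geom_point() +
  geom_errorbarh(aes(xmin = conf.low, xmax = conf.high)) +
  geom_vline(xintercept = 0, linetype = "dashed")
```

An interval that crosses the dashed line at zero indicates a term whose effect cannot be distinguished from no effect at the 95% level.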

Summary of screencast