library(tidyverse)
library(PPBDS.data)
library(learnr)
library(shiny)
library(rstanarm)

knitr::opts_chunk$set(echo = FALSE, message = FALSE)
options(tutorial.storage = "local")

ch_9 <- governors %>%
  select(last_name, year, state, sex, alive_post, alive_pre)

gov_1 <- stan_glm(data = ch_9, formula = alive_post ~ sex + alive_pre, refresh = 0)
gov_2 <- stan_glm(data = ch_9, formula = alive_post ~ state + sex*alive_pre, refresh = 0, iter = 1000)
Confirm that you have the correct version of PPBDS.data installed by pressing "Run Code."
packageVersion('PPBDS.data')
The answer should be ‘0.3.2.9008’, or a higher number. If it is not, you should upgrade your installation by issuing these commands:
remove.packages('PPBDS.data')
library(remotes)
remotes::install_github('davidkane9/PPBDS.data')
Strictly speaking, it should not be necessary to remove a package. Just installing it again should overwrite the current version. But weird things sometimes happen, so removing first is the safest approach.
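The "or a higher number" comparison can also be done in code rather than by eye. The sketch below uses base R's `package_version()`; the helper name `is_current()` is hypothetical and only illustrates the comparison against the minimum version named above.

```r
# Minimum version named in the text above
required <- package_version("0.3.2.9008")

# Hypothetical helper: TRUE when a version string is at least the minimum
is_current <- function(installed, minimum = required) {
  package_version(installed) >= minimum
}

is_current("0.3.2.9008")  # exactly the minimum
is_current("0.3.1")       # older than the minimum
```

To check your own installation, you could pass `as.character(packageVersion("PPBDS.data"))` to the helper.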
question_text( "Student Name:", answer(NULL, correct = TRUE), incorrect = "Ok", try_again_button = "Modify your answer", allow_retry = TRUE )
``` {r email, echo=FALSE}
question_text( "Email:", answer(NULL, correct = TRUE), incorrect = "Ok", try_again_button = "Modify your answer", allow_retry = TRUE )
```
## EDA of governors

Let's create this graph.

```r
ch_9 %>%
  ggplot(aes(x = sex, y = alive_post)) +
  geom_boxplot() +
  labs(title = "US Gubernatorial Candidate Lifespans",
       subtitle = "Male candidates live much longer",
       x = "Gender",
       y = "Days Lived After Election") +
  scale_y_continuous(labels = scales::label_number())
```
Start a pipe with `governors`. Select the `last_name`, `year`, `state`, `sex`, `alive_post`, and `alive_pre` variables. Assign your work to an object called `ch_9`.
Continue the pipe into `ggplot()`. Map `sex` to the x-axis and `alive_post` to the y-axis. Use `geom_boxplot()`.
Title your graph "US Gubernatorial Candidate Lifespans". Label your x-axis "Gender" and your y-axis "Days Lived After Election". Add the subtitle "Male candidates live much longer".
Let's also display the y-axis values as ordinary decimal numbers rather than in scientific notation. Use `scale_y_continuous()`. Within `scale_y_continuous()`, set `labels` to `scales::label_number()`.
Let's take a closer look at the graph you just created in the previous section. Say our motive for creating this graph was to answer the following question: Do male candidates or female candidates live longer after the election?
Using Wisdom, write a paragraph about whether or not this data is relevant for the problem we face. See The Primer for guidance.
question_text( "Answer:", answer(NULL, correct = TRUE), incorrect = "Ok", try_again_button = "Modify your answer", allow_retry = TRUE )
Let's build another model. Our outcome variable will be `alive_post`, and we will have two explanatory variables: `alive_pre` and `sex`. Below is the math model we will be using.
$$ alive\_post_i = \beta_0 + \beta_1 male_i + \beta_2 alive\_pre_i + \epsilon_i $$
Looking at the model above, what are the parameters here? You do not need to figure out how to display the symbols in your answer; just write their names (e.g., "epsilon," "delta," etc.).
question_text( "Answer:", answer(NULL, correct = TRUE), incorrect = "Ok", try_again_button = "Modify your answer", allow_retry = TRUE )
Great! Now write a sentence for each parameter that describes what it means.
question_text( "Answer:", answer(NULL, correct = TRUE), incorrect = "Ok", try_again_button = "Modify your answer", allow_retry = TRUE )
Let's implement the model using `stan_glm()`. The formula argument should be `alive_post ~ sex + alive_pre`. Set `data` to `ch_9` and `refresh` to 0. Assign your work to an object named `gov_1`.
Use `print()` to look at our parameter values. Set the argument `detail` to `FALSE`.
print(gov_1, detail = ...)
Look at the results above. Write two sentences, using your own words, explaining the significance of the value 8103.0.
question_text( "Answer:", answer(NULL, correct = TRUE), incorrect = "Ok", try_again_button = "Modify your answer", allow_retry = TRUE )
Write two sentences that explain how you would find the `alive_post` value for a male candidate whose `alive_pre` equals the average number of days across all candidates. In addition to your explanation, provide the numerical value.
question_text( "Answer:", answer(NULL, correct = TRUE), incorrect = "Ok", try_again_button = "Modify your answer", allow_retry = TRUE )
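The arithmetic behind a prediction like this can be sketched with made-up numbers. The coefficient values and the average below are hypothetical placeholders, not the estimates printed by `gov_1`; the sketch only shows how the three terms of the math model combine for a male candidate.

```r
# Hypothetical coefficient values, NOT the estimates from gov_1
coefs <- c(intercept = 20000, sexMale = -1000, alive_pre = -0.8)

# Hypothetical average of alive_pre across all candidates
mean_alive_pre <- 18000

# Male candidate: intercept + male offset + slope * average alive_pre
pred_male <- coefs[["intercept"]] +
  coefs[["sexMale"]] +
  coefs[["alive_pre"]] * mean_alive_pre

pred_male
```

With real output, you would swap in the medians reported by `print()` and the mean of `alive_pre` in `ch_9`.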
Let's now create the following posterior.
gov_1 %>%
  as_tibble() %>%
  mutate(male_days = `(Intercept)` + sexMale) %>%
  rename(female_days = `(Intercept)`) %>%
  select(female_days, male_days) %>%
  pivot_longer(cols = female_days:male_days,
               names_to = "parameters",
               values_to = "days") %>%
  ggplot(aes(days, color = parameters)) +
  geom_density() +
  labs(title = "Posterior Probability Distribution",
       subtitle = "Men live longer",
       x = "Average Days Lived Post Election",
       y = "Probability") +
  theme_classic() +
  scale_y_continuous(labels = scales::percent_format())
Start a pipe with `gov_1` and use `as_tibble()`. Continue the pipe with `mutate()` to create a new variable `male_days`. `male_days` should be equal to the sum of the `(Intercept)` and `sexMale` columns. Make sure you place backticks on either side of the parentheses enclosing `Intercept`.
gov_1 %>% as_tibble() %>% mutate(... = `(Intercept)` + ...)
Continue the pipe. Use `rename()` to rename the `(Intercept)` column as `female_days`. Then continue the pipe again to select `female_days` and `male_days`.
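To see what these steps do without waiting on a fitted model, here is a minimal sketch on a toy tibble. The three rows and their values are invented for illustration; only the column names `(Intercept)`, `sexMale`, and `alive_pre` mirror the draws described above.

```r
library(tidyverse)

# Toy stand-in for posterior draws; the numbers are made up
draws <- tibble(`(Intercept)` = c(100, 110, 120),
                sexMale       = c(-5, -10, -15),
                alive_pre     = c(-0.8, -0.9, -0.7))

tidy_draws <- draws %>%
  # Backticks are needed because the name contains parentheses
  mutate(male_days = `(Intercept)` + sexMale) %>%
  rename(female_days = `(Intercept)`) %>%
  select(female_days, male_days)

tidy_draws
```

Each row keeps its own pairing of intercept and male offset, so the sum is computed draw by draw rather than from a single summary number.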
Continue the pipe even further. Use `pivot_longer()`. Set `cols` to `female_days:male_days` (make sure you insert a colon between the two names). `names_to` should be set to "parameters" and `values_to` should be set to "days".
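The reshaping done by `pivot_longer()` is easier to picture on a small example. The sketch below assumes a toy tibble with invented values; it shows that two columns of three rows become one `days` column of six rows, labelled by `parameters`.

```r
library(tidyverse)

# Toy wide data: one column per group, values are made up
wide <- tibble(female_days = c(1, 2, 3),
               male_days   = c(4, 5, 6))

long <- wide %>%
  pivot_longer(cols = female_days:male_days,
               names_to = "parameters",
               values_to = "days")

long
```

The long layout is what lets `ggplot()` map `parameters` to color and draw one density per group.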
Pipe in `ggplot()` to plot your data. Map `days` to the x-axis, and map `parameters` to the color. Add the layer `geom_density()`.
Title your graph "Posterior Probability Distribution" with the subtitle "Men live longer". Label your x-axis "Average Days Lived Post Election" and y-axis "Probability". Also add the layer `theme_classic()`.
Let's also change the y-axis values for probability to show percents rather than decimals. Use `scale_y_continuous()`. Within `scale_y_continuous()`, set `labels` to `scales::percent_format()`.
In two sentences, explain one interpretation you could make from the graph you created.
question_text( "Answer:", answer(NULL, correct = TRUE), incorrect = "Ok", try_again_button = "Modify your answer", allow_retry = TRUE )
Let's build another model. The outcome variable `alive_post` will be a function of the two explanatory variables we used above: `alive_pre` and `sex`. We are also adding `state`, which means we will have 55 different intercepts rather than only two as in our previous model.
The formula also includes an interaction between `sex` and `alive_pre`. Recall from the chapter that this means there are two different slopes to consider: one for male candidates and one for female candidates. In the previous model we built, there was one slope for both men and women. Here is the math model we will be using:
$$ y_i = \beta_0 + \beta_1 x_{AK,i} + \beta_2 x_{AR,i} + \dots + \beta_{49} x_{WY,i} + \beta_{50} male_i + \beta_{51} alive\_pre_i + \beta_{52} male_i * alive\_pre_i + \epsilon_i $$
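To see how R turns an interaction such as `sex*alive_pre` into the separate terms of the math model, you can inspect a design matrix. This sketch uses base R's `model.matrix()` on a tiny invented data frame and leaves out `state` to keep the output short.

```r
# Toy data; the values are made up for illustration
toy <- data.frame(sex = factor(c("Female", "Male", "Male")),
                  alive_pre = c(100, 200, 300))

# sex * alive_pre expands to both main effects plus their interaction
X <- model.matrix(~ sex * alive_pre, data = toy)

colnames(X)
# "(Intercept)" "sexMale" "alive_pre" "sexMale:alive_pre"
```

The `sexMale:alive_pre` column equals `alive_pre` for men and zero for women, which is why its coefficient acts as an adjustment to the slope for male candidates only.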
Let's implement the model using `stan_glm()`. The formula argument should be `alive_post ~ state + sex*alive_pre`. Set `data` to `ch_9`, `refresh` to 0, and `iter` to 1000. Assign your work to an object named `gov_2`.
Use `print()` to look at our parameter values. Set the argument `detail` to `FALSE`.
print(gov_2, detail = ...)
Look at the results above. Write two sentences, using your own words, explaining the significance of the value 4855.9.
question_text( "Answer:", answer(NULL, correct = TRUE), incorrect = "Ok", try_again_button = "Modify your answer", allow_retry = TRUE )
Write two sentences that explain how you would find the slope for a male candidate from Wisconsin. In addition to your explanation, provide the numerical value.
question_text( "Answer:", answer(NULL, correct = TRUE), incorrect = "Ok", try_again_button = "Modify your answer", allow_retry = TRUE )
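The way an interaction changes a slope can be sketched with made-up numbers. Both coefficient values below are hypothetical, not the estimates from `gov_2`. Under the math model above, the state terms shift only the intercept, so they do not enter the slope.

```r
# Hypothetical coefficient values, NOT the estimates from gov_2
coefs <- c("alive_pre" = -0.1, "sexMale:alive_pre" = -0.7)

# Slope for female candidates: the alive_pre coefficient alone
female_slope <- coefs[["alive_pre"]]

# Slope for male candidates: alive_pre plus the interaction adjustment
male_slope <- coefs[["alive_pre"]] + coefs[["sexMale:alive_pre"]]

c(female = female_slope, male = male_slope)
```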
Let's now create the following posterior to see how the alive_post values vary for female candidates and male candidates from Idaho.
gov_2 %>%
  as_tibble() %>%
  mutate(Idaho_females = `(Intercept)` + 1754.0) %>%
  mutate(Idaho_males = `(Intercept)` + 1869.4 + 4420.2) %>%
  select(Idaho_females, Idaho_males) %>%
  pivot_longer(cols = Idaho_females:Idaho_males,
               names_to = "parameters",
               values_to = "days") %>%
  ggplot(aes(days, color = parameters)) +
  geom_density() +
  labs(title = "Posterior Probability Distribution",
       subtitle = "for women and men that live in Idaho",
       x = "Average Days Lived Post Election",
       y = "Probability") +
  theme_classic() +
  scale_y_continuous(labels = scales::percent_format())
Start a pipe with `gov_2` and use `as_tibble()`. Continue the pipe with `mutate()` to create a new variable `Idaho_males`. `Idaho_males` should be equal to the sum of `(Intercept)`, 1869.4, 4420.2, and `sexMale`. Make sure you place backticks on either side of the parentheses enclosing `Intercept`.
Continue the pipe and use `mutate()` again to create the column `Idaho_females`. `Idaho_females` should be equal to the sum of `(Intercept)` and 1754. Make sure you place backticks on either side of the parentheses enclosing `Intercept`.
Continue the pipe and select `Idaho_females` and `Idaho_males`. Continue the pipe again and use `pivot_longer()`. Set `cols` to `Idaho_females:Idaho_males` (make sure you insert a colon between the two names). `names_to` should be set to "parameters" and `values_to` should be set to "days".
Pipe in `ggplot()` to plot your data. Map `days` to the x-axis, and map `parameters` to the color. Add the layer `geom_density()`.
Title your graph "Posterior Probability Distribution" with the subtitle "for women and men that live in Idaho". Label your x-axis "Average Days Lived Post Election" and y-axis "Probability". Also add the layer `theme_classic()`.
Let's also change the y-axis values for probability to show percents rather than decimals. Use `scale_y_continuous()`. Within `scale_y_continuous()`, set `labels` to `scales::percent_format()`.
In two sentences, explain one interpretation you could make from the graph you created.
question_text( "Answer:", answer(NULL, correct = TRUE), incorrect = "Ok", try_again_button = "Modify your answer", allow_retry = TRUE )
Using Temperance, write a paragraph about how you should use this estimate. Are you sure it is correct? How safely can you apply data from 8 years ago to today? How similar is the population from which you drew the data to the population to which you hope to apply your model? See The Primer for guidance.
question_text( "Answer:", answer(NULL, correct = TRUE), incorrect = "Ok", try_again_button = "Modify your answer", allow_retry = TRUE )