library(tidyverse)
library(PPBDS.data)
library(learnr)
library(shiny)
library(rstanarm)
knitr::opts_chunk$set(echo = FALSE, message = FALSE)
options(tutorial.storage="local")  

ch_9 <- governors %>% 
  select(last_name, year, state, sex, alive_post, alive_pre)

gov_1 <- stan_glm(data = ch_9,
                      formula = alive_post ~ sex + alive_pre,
                      refresh = 0)

gov_2 <- stan_glm(data = ch_9,
                      formula = alive_post ~ state + sex*alive_pre,
                      refresh = 0,
                      iter = 1000)

## Confirm Correct Package Version

Confirm that you have the correct version of PPBDS.data installed by pressing "Run Code."

packageVersion('PPBDS.data')

The answer should be ‘0.3.2.9008’, or a higher number. If it is not, you should upgrade your installation by issuing these commands:

remove.packages('PPBDS.data')  
library(remotes)  
remotes::install_github('davidkane9/PPBDS.data')  

Strictly speaking, it should not be necessary to remove a package. Just installing it again should overwrite the current version. But weird things sometimes happen, so removing first is the safest approach.
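The version check can also be done programmatically rather than by eye. The sketch below is one way to do it, assuming a hypothetical helper named `meets_minimum()` (it is not part of PPBDS.data); it relies only on base R's `package_version()`.

```r
# Hypothetical helper (not part of PPBDS.data): TRUE when an installed
# version string is at least the minimum this tutorial expects.
meets_minimum <- function(installed, minimum = "0.3.2.9008") {
  package_version(installed) >= package_version(minimum)
}

ok_same  <- meets_minimum("0.3.2.9008")  # exactly the minimum
ok_older <- meets_minimum("0.3.1")       # an older release

# With the package installed, the real check would be:
# meets_minimum(as.character(packageVersion("PPBDS.data")))
```

Comparing `package_version` objects rather than raw strings means each component is compared numerically, so a version such as 0.3.10 sorts after 0.3.2.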

## Name

question_text(
  "Student Name:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

## Email

```{r email, echo=FALSE}
question_text(
  "Email:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)
```

## EDA of governors 

Let's create this graph.

```r
ch_9 %>%
  ggplot(aes(x = sex, y = alive_post)) +
  geom_boxplot() +
  labs(title = "US Gubernatorial Candidate Lifespans",
       subtitle = "Male candidates live much longer",
       x = "Gender",
       y = "Days Lived After Election") +
  scale_y_continuous(labels = scales::label_number())
```

Exercise 1

Start a pipe with governors. Select the last_name, year, state, sex, alive_post, and alive_pre variables. Assign your work to an object called ch_9.


Exercise 2

Continue the pipe into ggplot(). Map sex to the x-axis and alive_post to the y-axis. Add the layer geom_boxplot().


Exercise 3

Title your graph "US Gubernatorial Candidate Lifespans". Label your x-axis "Gender" and your y-axis "Days Lived After Election". Add the subtitle "Male candidates live much longer".


Exercise 4

Let's also format the y-axis values as plain numbers. Use scale_y_continuous(). Within scale_y_continuous(), set the labels argument to scales::label_number().


## Wisdom

Let's take a closer look at the graph you just created in the previous section. Say our motive for creating this graph was to answer the following question: Do male candidates or female candidates live longer after the election?

Using Wisdom, write a paragraph about whether or not this data is relevant for the problem we face. See The Primer for guidance.

question_text(
  "Answer:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

## Justice and Courage

Exercise 1

Let's build a model. Our outcome variable will be alive_post, and we will have two explanatory variables: alive_pre and sex. Below is the math model we will be using.

$$ alive\_post_i = \beta_0 + \beta_1 male_i + \beta_2 alive\_pre_i + \epsilon_i $$

Looking at the model above, what are the parameters? You do not need to figure out how to display the symbols in your answer; just write their names (e.g. "epsilon," "delta," etc.).

question_text(
  "Answer:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

Exercise 2

Great! Now write a sentence for each parameter that describes what it means.

question_text(
  "Answer:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

Exercise 3

Let's implement the model using stan_glm(). The formula argument should be alive_post ~ sex + alive_pre. Set data to ch_9 and refresh to 0. Assign your work to an object named gov_1.


Exercise 4

Use print() to look at our parameter values. Set the argument detail to FALSE.


print(gov_1, detail = ...)

Exercise 5

Look at the results above. Write two sentences, using your own words, explaining the significance of the value 8103.0.

question_text(
  "Answer:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

Exercise 6

Write two sentences that explain how you would find the alive_post value for a male candidate who has been alive for the average number of days among all candidates. In addition to your explanation, provide the numerical value.

question_text(
  "Answer:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)
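The calculation asked for here follows directly from the math model: intercept, plus the male coefficient, plus the alive_pre coefficient times the average of alive_pre. The sketch below illustrates that arithmetic on simulated data. Everything in it is an assumption made for illustration: the `toy` data frame and its numbers are made up, and `lm()` is used only as a quick stand-in for `stan_glm()`, so the values will not match the governors results.

```r
# Simulated stand-in data; the values are made up, not the governors data.
set.seed(9)
n <- 200
toy <- data.frame(
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  alive_pre = round(rnorm(n, mean = 19000, sd = 2500))
)
toy$alive_post <- 20000 + 500 * (toy$sex == "Male") - 0.8 * toy$alive_pre +
  rnorm(n, sd = 1500)

# lm() is used here only as a quick stand-in for stan_glm().
fit <- lm(alive_post ~ sex + alive_pre, data = toy)
b <- coef(fit)

# Intercept + male coefficient + slope * average alive_pre.
avg_pre <- mean(toy$alive_pre)
by_hand <- unname(b["(Intercept)"] + b["sexMale"] + b["alive_pre"] * avg_pre)

# The same number from predict(), as a cross-check.
by_predict <- unname(predict(
  fit,
  newdata = data.frame(sex = factor("Male", levels = levels(toy$sex)),
                       alive_pre = avg_pre)
))
```

The hand calculation and `predict()` agree, which is a useful sanity check that each coefficient has been read correctly.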

Exercise 7

Let's now create the following posterior.

gov_1 %>% 
  as_tibble() %>% 
  mutate(male_days = `(Intercept)` + sexMale) %>% 
  rename(female_days = `(Intercept)`) %>% 
  select(female_days, male_days) %>% 
  pivot_longer(cols = female_days:male_days, 
               names_to = "parameters",
               values_to = "days") %>% 
  ggplot(aes(days, color = parameters)) +
    geom_density() +
     labs(title = "Posterior Probability Distribution",
         subtitle = "Men live longer",
         x = "Average Days Lived Post Election",
         y = "Probability") + 
    theme_classic() + 
    scale_y_continuous(labels=scales::percent_format())

Exercise 8

Start a pipe with gov_1 and use as_tibble(). Continue the pipe with mutate() to create a new variable male_days. male_days should be equal to the following argument: (Intercept) + sexMale. Make sure you place back tick marks on either side of the parentheses enclosing Intercept.


gov_1 %>% 
  as_tibble() %>% 
  mutate(... = `(Intercept)` + ...)
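The backticks matter because `(Intercept)` is not a syntactically valid R name. The sketch below shows the same mutate() pattern on a small tibble of made-up draws; the `draws` object and its values are hypothetical stand-ins for what as_tibble() would return from a fitted model.

```r
library(dplyr)
library(tibble)

# Made-up posterior draws; real ones would come from as_tibble() on a model.
draws <- tibble(`(Intercept)` = c(7000, 7100, 6900),
                sexMale = c(1200, 1000, 1300))

# Backticks let R treat "(Intercept)" as a column name despite the parentheses.
draws_with_male <- draws %>%
  mutate(male_days = `(Intercept)` + sexMale)
```

Without the backticks, R would read the parentheses as grouping and look for an object called Intercept, which does not exist.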

Exercise 9

Continue the pipe. Use rename() to rename the (Intercept) column as female_days. Then continue the pipe again to select female_days and male_days.


Exercise 10

Continue the pipe even further. Use pivot_longer(). Set cols to female_days and male_days (Make sure you insert a colon between them). names_to should be set to "parameters" and values_to should be set to "days".
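To see what pivot_longer() does with these arguments, here is a minimal sketch on a two-row tibble. The `wide` object and its values are made up for illustration; only the column names follow the exercise.

```r
library(dplyr)
library(tibble)
library(tidyr)

# Made-up values standing in for posterior draws.
wide <- tibble(female_days = c(7000, 7100),
               male_days = c(8200, 8100))

# Each row of `wide` becomes two rows: one per original column.
long <- wide %>%
  pivot_longer(cols = female_days:male_days,
               names_to = "parameters",
               values_to = "days")
```

The result has one column naming the parameter and one holding the value, which is the shape ggplot() needs in order to map parameters to color.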


Exercise 11

Pipe in ggplot() to plot your data. Map days to the x-axis, and map parameters to the color. Add the layer geom_density().


Exercise 12

Title your graph "Posterior Probability Distribution" with the subtitle "Men live longer". Label your x-axis "Average Days Lived Post Election" and y-axis "Probability". Also add the layer theme_classic().


Exercise 13

Let's also change the y-axis values for probability to show percents rather than decimals. Use scale_y_continuous(). Within scale_y_continuous(), set the labels argument to scales::percent_format().
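It may help to know that scales::percent_format() does not format anything itself; it returns a function, and scale_y_continuous() calls that function on the axis breaks. A minimal sketch, using made-up proportions and an explicit `accuracy = 1` (an assumption added here so the rounding is predictable):

```r
library(scales)

# percent_format() returns a function that turns proportions into labels.
as_percent <- percent_format(accuracy = 1)
labels <- as_percent(c(0.05, 0.25, 1))
# labels is c("5%", "25%", "100%")
```

This is why the exercise passes the result of `scales::percent_format()`, parentheses included, to the labels argument.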


Exercise 14

In two sentences, explain one interpretation you could make from the graph you created.

question_text(
  "Answer:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

Exercise 15

Let's build another model. The outcome variable alive_post will be a function of the two explanatory variables we used above: alive_pre and sex. We are also adding "state" which means we will have 55 different intercepts rather than only having two like in our previous model.

The formula also includes an interaction between sex and alive_pre. Recall from the chapter that this means there are two different slopes to consider: one for male candidates and one for female candidates. In the previous model we built, there was one slope for both men and women. Here is the math model we will be using:

$$ y_i = \beta_0 + \beta_1 x_{AK,i} + \beta_2 x_{AR,i} + \dots + \beta_{49} x_{WY,i} + \beta_{50} male_i + \beta_{51} alive\_pre_i + \beta_{52} male_i * alive\_pre_i + \epsilon_i $$
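To see where the two slopes come from, substitute the two possible values of $male_i$ into the model above. In the sketch below, $s_i$ is shorthand introduced here (it is not part of the original notation) for the state term of candidate $i$, that is, whichever state coefficient is switched on for that candidate's state.

$$ \text{female } (male_i = 0): \quad y_i = (\beta_0 + s_i) + \beta_{51} \, alive\_pre_i + \epsilon_i $$

$$ \text{male } (male_i = 1): \quad y_i = (\beta_0 + s_i + \beta_{50}) + (\beta_{51} + \beta_{52}) \, alive\_pre_i + \epsilon_i $$

Read this way, the slope on alive_pre is $\beta_{51}$ for female candidates and $\beta_{51} + \beta_{52}$ for male candidates, while the state only shifts the intercept.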

Exercise 16

Let's implement the model using stan_glm(). The formula argument should be alive_post ~ state + sex*alive_pre. Set data to ch_9, refresh to 0, and iter to 1000. Assign your work to an object named gov_2.


Exercise 17

Use print() to look at our parameter values. Set the argument detail to FALSE.


print(gov_2, detail = ...)

Exercise 18

Look at the results above. Write two sentences, using your own words, explaining the significance of the value 4855.9.

question_text(
  "Answer:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

Exercise 19

Write two sentences that explain how you would find the slope for a male candidate from Wisconsin. In addition to your explanation, provide the numerical value.

question_text(
  "Answer:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

Exercise 20

Let's now create the following posterior to see how the alive_post values vary for female candidates and male candidates from Idaho.

gov_2 %>% 
  as_tibble() %>% 
  mutate(Idaho_females = `(Intercept)` + 1754.0) %>% 
  mutate(Idaho_males = `(Intercept)` + 1869.4 + 4420.2) %>% 
  select(Idaho_females, Idaho_males) %>% 
  pivot_longer(cols = Idaho_females:Idaho_males, 
               names_to = "parameters",
               values_to = "days") %>% 
  ggplot(aes(days, color = parameters)) +
    geom_density() +
     labs(title = "Posterior Probability Distribution",
         subtitle = "for women and men that live in Idaho",
         x = "Average Days Lived Post Election",
         y = "Probability") + 
    theme_classic() + 
    scale_y_continuous(labels=scales::percent_format())

Exercise 21

Start a pipe with gov_2 and use as_tibble(). Continue the pipe with mutate() to create a new variable Idaho_males. Idaho_males should be equal to the following argument: (Intercept) + 1869.4 + 4420.2 + sexMale. Make sure you place back tick marks on either side of the parentheses enclosing Intercept.


Exercise 22

Continue the pipe and use mutate() again to create the column Idaho_females. Idaho_females should be equal to the following argument: (Intercept) + 1754. Make sure you place back tick marks on either side of the parentheses enclosing Intercept.


Exercise 23

Continue the pipe and select Idaho_females and Idaho_males. Continue the pipe again and use pivot_longer(). Set cols to Idaho_females and Idaho_males (make sure you insert a colon between them). names_to should be set to "parameters" and values_to should be set to "days".


Exercise 24

Pipe in ggplot() to plot your data. Map days to the x-axis, and map parameters to the color. Add the layer geom_density().


Exercise 25

Title your graph "Posterior Probability Distribution" with the subtitle "for women and men that live in Idaho". Label your x-axis "Average Days Lived Post Election" and y-axis "Probability". Also add the layer theme_classic().


Exercise 26

Let's also change the y-axis values for probability to show percents rather than decimals. Use scale_y_continuous(). Within scale_y_continuous(), set the labels argument to scales::percent_format().


Exercise 27

In two sentences, explain one interpretation you could make from the graph you created.

question_text(
  "Answer:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

## Temperance

Exercise

Using Temperance, write a paragraph about how you should use this estimate. Are you sure it is correct? How safely can you apply data from 8 years ago to today? How similar is the population from which you drew the data to the population to which you hope to apply your model? See The Primer for guidance.

question_text(
  "Answer:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)


davidkane9/PPBDS.data documentation built on Nov. 18, 2020, 1:17 p.m.