knitr::opts_chunk$set( collapse = TRUE, comment = "#>", echo = TRUE, warning = FALSE )
library(Stat231) library(tidyverse) library(infer)
As we've seen in chapter 6, we can calculate confidence intervals and conduct hypothesis test to make claims about proportions in our population but based on our sample. As you will see in chapter 7, we can also do this for averages. In this lab, we'll build an interval and conduct a hypothesis test for a single sample proportion and as well as a two-sample average (section 7.3).
We'll be using the piracy
data set included in our Stat231
package. It contains data for each member of congress and money they received from lobby groups for (money_pro) and against (money_con) anti-piracy legislation.
For the proportion section, we'll conduct a hypothesis test to determine if a majority of a randomly chosen congress are Democrats. First, we'll need to make a new data set that contains the proportion of Democrats and Other parties.
prop_data <- Stat231::piracy prop_data$party <- recode(prop_data$party, "D" = 1, .default = 0) prop_data$party <- as.double(prop_data$party) dem_count <- prop_data %>% summarize(democrats = sum(party), other = n()- democrats, total = n(), democrats_prop = democrats/total, other_prop = other/total)
There is a function, prop.test()
, that calculates the test and interval for us. However, it is not a true 1 proportion z test. Instead it runs a chi-squared test assuming that the distribution is binomial. We'll use the categorical_z
function from our Stat231
package. The function has arguments of sample proportion, (assumed) population proportion and sample size.
$H_0: p = 0.50$
$H_1: p > 0.50$
$\alpha = 0.05 \rightarrow Z* = 1.645$
z <- categorical_z(dem_count$democrats_prop, 0.50, dem_count$total) #calculate z score z # print z score pval <- pnorm(z, lower.tail = FALSE) #calculate the p-value pval #print the p-value
Using the classic method, our test statistic is not in the critical region so we fail to reject $H_0$. There is not enough evidence to support the claim that the majority of a randomly chosen congress is Democrat.
Using the p-value method, our p-value is much greater than alpha so we fail to reject $H_0$. There is not enough evidence to support the claim that the majority of a randomly chosen congress is Democrat.
If there is not a majority, what proportion of Democrats can we expect? This is where our confidence interval comes in. We'll still use $\alpha = 0.05$; however, $Z$ will change since we are now using two-tails. The function will change $Z$ for us behind the scenes.
prop_ci(dem_count$democrats_prop, 0.05, dem_count$total)
We can expect a randomly chosen congress to be between 41.28% Democrat and 49.73 % Other.
For our means section Our test/interval will be to determine if there is a difference in money received by Democrats and Republicans, regardless of which lobby group the money came from.
To clean our data, we'll need to replace missing values with 0 (rationale discussed in lab/the screencast), create a new variable for the sum of money received and filter our the few Independent party observations.
summary_data <- Stat231::piracy %>% mutate(money_pro = replace_na(money_pro, 0), money_con = replace_na(money_con,0), money_total = money_pro+money_con) %>% select(-c(money_pro, money_con)) %>% filter(party!= "I")
Now we're ready to run our test.
t_test(summary_data, #choose the data frame formula = money_total~party, #define the relationship/variables for testing order = c("D","R"), #choose an order for the mean difference to be calculated mu = 0, #define the null hypothesis alternative = "two.sided", #define the alternative hypothesis var.equal = TRUE, #assume the populations have equal variance conf_int = TRUE, #will calculate a confidence interval for the mean difference as part of the test conf_level = 0.99) #defines the confidence/significance level
From the output, we can conclude there is a difference. Based on the interval is appears that Democrats received more money in relation to the anti-piracy legislation.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.