In npaterno/stat231: Course Materials for Stat 231

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  echo = TRUE,
  warning = FALSE
)

library(Stat231)
library(tidyverse)
library(infer)

Making Claims about a Population Based on Sample Data

As we've seen in chapter 6, we can calculate confidence intervals and conduct hypothesis test to make claims about proportions in our population but based on our sample. As you will see in chapter 7, we can also do this for averages. In this lab, we'll build an interval and conduct a hypothesis test for a single sample proportion and as well as a two-sample average (section 7.3).

We'll be using the piracy data set included in our Stat231 package. It contains data for each member of congress and money they received from lobby groups for (money_pro) and against (money_con) anti-piracy legislation.

Proportions

For the proportion section, we'll conduct a hypothesis test to determine if a majority of a randomly chosen congress are Democrats. First, we'll need to make a new data set that contains the proportion of Democrats and Other parties.

prop_data <- Stat231::piracy

prop_data$party <- recode(prop_data$party, "D" = 1, .default = 0)
prop_data$party <- as.double(prop_data$party)

dem_count <- prop_data %>% 
  summarize(democrats = sum(party),
            other = n()- democrats,
            total = n(),
            democrats_prop = democrats/total,
            other_prop = other/total)

There is a function, prop.test(), that calculates the test and interval for us. However, it is not a true 1 proportion z test. Instead it runs a chi-squared test assuming that the distribution is binomial. We'll use the categorical_z function from our Stat231 package. The function has arguments of sample proportion, (assumed) population proportion and sample size.

$H_0: p = 0.50$

$H_1: p > 0.50$

$\alpha = 0.05 \rightarrow Z* = 1.645$

z <- categorical_z(dem_count$democrats_prop, 0.50, dem_count$total) #calculate z score
z # print z score

pval <- pnorm(z, lower.tail = FALSE) #calculate the p-value
pval #print the p-value

Using the classic method, our test statistic is not in the critical region so we fail to reject $H_0$. There is not enough evidence to support the claim that the majority of a randomly chosen congress is Democrat.

Using the p-value method, our p-value is much greater than alpha so we fail to reject $H_0$. There is not enough evidence to support the claim that the majority of a randomly chosen congress is Democrat.

If there is not a majority, what proportion of Democrats can we expect? This is where our confidence interval comes in. We'll still use $\alpha = 0.05$; however, $Z$ will change since we are now using two-tails. The function will change $Z$ for us behind the scenes.

prop_ci(dem_count$democrats_prop, 0.05, dem_count$total)

We can expect a randomly chosen congress to be between 41.28% Democrat and 49.73 % Other.

Means

For our means section Our test/interval will be to determine if there is a difference in money received by Democrats and Republicans, regardless of which lobby group the money came from.

To clean our data, we'll need to replace missing values with 0 (rationale discussed in lab/the screencast), create a new variable for the sum of money received and filter our the few Independent party observations.

summary_data <- Stat231::piracy %>% 
  mutate(money_pro = replace_na(money_pro, 0), 
         money_con = replace_na(money_con,0),
         money_total = money_pro+money_con) %>% 
  select(-c(money_pro, money_con)) %>%  
  filter(party!= "I")

Now we're ready to run our test.

t_test(summary_data, #choose the data frame
       formula = money_total~party, #define the relationship/variables for testing
       order = c("D","R"), #choose an order for the mean difference to be calculated
       mu = 0, #define the null hypothesis
       alternative = "two.sided", #define the alternative hypothesis
       var.equal = TRUE, #assume the populations have equal variance
       conf_int = TRUE, #will calculate a confidence interval for the mean difference as part of the test
       conf_level = 0.99) #defines the confidence/significance level

From the output, we can conclude there is a difference. Based on the interval is appears that Democrats received more money in relation to the anti-piracy legislation.

npaterno/stat231 documentation built on May 1, 2023, 6:07 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com