In emeyers/SDS230: Tools for the class Data Exploration and Analysis

$\$

# install.packages("latex2exp")

library(latex2exp)

knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(fig.width=6, fig.height=4) 


set.seed(230)

# get some images that are used in this document
SDS230::download_image("gingko_pills.jpg")

Overview

Hypothesis tests for a single proportion in R
Hypothesis tests for two means

$\$

Part 1: Running a randomization test for a single proportion in R

$\$

Part 1.1: Is it possible to smell whether someone has Parkinson's disease?

Joy Milne claimed to have the ability to smell whether someone had Parkinson’s disease.

To test this claim, researchers gave Joy 12 shirts: 6 that had been worn by people who had Parkinson’s disease and 6 that had been worn by people who did not.

Joy identified 11 out of the 12 shirts correctly.

Let's run a hypothesis test to assess whether there is significant evidence that Joy really can smell whether someone has Parkinson's disease.

$\$

Step 1: State the null and alternative hypotheses in symbols and words, and set up the rules of the game

In words:

Null hypothesis: Joy is guessing, so the probability that she correctly identifies whether a shirt was worn by someone with Parkinson's disease is 1/2
Alternative hypothesis: Joy can smell whether someone has Parkinson's disease, so the probability that she identifies a shirt correctly is greater than the chance level of 1/2

Using symbols

$H_0: \pi = 0.5$ $H_A: \pi > 0.5$

Rules of the game

If there is less than a 5% probability of getting a random statistic as extreme as or more extreme than the observed statistic when $H_0$ is true, then we will reject $H_0$ in favor of $H_A$.

$\alpha = 0.05$

$\$

Step 2: Calculate the observed statistic

(obs_stat <- 11/12)

$\$

Step 3: Create the null distribution

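# simulate 10,000 experiments of 12 fair coin flips; each value is a number of correct guesses
flip_sims <- rbinom(10000, 12, .5)

# convert the counts of correct guesses to proportions
flip_sims_prop <- flip_sims/12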

barplot(table(flip_sims), 
        xlab = "Number of heads (i.e., correct guesses)", 
        ylab = "Number of simulations",
        main = "10,000 simulations of 12 coin flips")

hist(flip_sims_prop, breaks = 100)

table(flip_sims)
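
Because the simulated null distribution comes from 12 fair coin flips, it can also be compared with the exact binomial probabilities. The following is a minimal sketch of that comparison, not part of the original worksheet; the names `n_shirts`, `exact_probs`, `sim_counts`, and `sim_probs` are illustrative assumptions.

```r
# Exact null probabilities of 0 to 12 correct guesses when each guess is a fair coin flip
n_shirts <- 12
exact_probs <- dbinom(0:n_shirts, size = n_shirts, prob = 0.5)
names(exact_probs) <- 0:n_shirts

# Simulated proportions for comparison (a fresh simulation with its own seed)
set.seed(230)
sim_counts <- rbinom(10000, size = n_shirts, prob = 0.5)
sim_probs <- table(factor(sim_counts, levels = 0:n_shirts)) / length(sim_counts)

# Side-by-side table: the two columns should be close to each other
round(cbind(exact = exact_probs, simulated = as.numeric(sim_probs)), 4)
```

With 10,000 simulations the simulated column should typically agree with the exact column to about two decimal places.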

$\$

Step 4: Calculate a p-value

(p_value <- sum(flip_sims_prop >= obs_stat)/length(flip_sims))
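
As a rough check on the simulated p-value, the same tail probability can be computed exactly from the binomial distribution. This sketch is an addition to the worksheet and assumes the same setup (12 shirts, 11 correct, chance level 0.5); `exact_p_value` and `binom_test_p` are illustrative names.

```r
# Exact probability of 11 or 12 correct guesses out of 12 under H0: pi = 0.5
exact_p_value <- sum(dbinom(11:12, size = 12, prob = 0.5))
exact_p_value   # 13/4096, about 0.0032

# The built-in exact binomial test gives the same one-sided p-value
binom_test_p <- binom.test(11, 12, p = 0.5, alternative = "greater")$p.value
all.equal(exact_p_value, binom_test_p)
```

The simulated p-value should land near this exact value, with some variation from run to run because only 10,000 simulations are used.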

$\$

Step 5: Make a decision

Since the p-value (`r p_value`) is less than $\alpha = 0.05$, we can reject the null hypothesis (and perhaps say the results are "statistically significant").

$\$

Questions

Do you believe Joy can really smell Parkinson's disease?
- What about after you read this article?
Is it better to report the actual p-value or just whether we rejected the null hypothesis $H_0$?

$\$

Part 2: Permutation tests for comparing two means in R

Let us examine the randomized controlled trial by Solomon et al. (2002) to see if there is evidence that taking gingko pills improves memory. To read the original paper see: https://jamanetwork.com/journals/jama/fullarticle/195207

Step 1: State the null and alternative hypotheses

$H_0: \mu_{gingko} - \mu_{control} = 0$ $H_A: \mu_{gingko} - \mu_{control} \ne 0$

$\alpha = 0.05$

$\$

Step 2a: Plot the data

load("gingko_RCT.rda")


# plot the data
boxplot(gingko, placebo, 
        names = c("Gingko", "Placebo"),
        ylab = "Memory score")



# create a stripchart
data_list <- list(gingko, placebo) 


stripchart(data_list, 
           group.names = c("Gingko", "Placebo"), 
           method = "jitter",
           xlab = "Memory score", 
           col = c("red", "blue"))

$\$

Step 2b: Calculate the observed statistic

(obs_stat <- mean(gingko) - mean(placebo))

$\$

Step 3: Create the null distribution

# combine the data from the treatment and placebo groups together
combined_data <- c(gingko, placebo)
n_gingko <- length(gingko)
total <- length(combined_data)


# use a for loop to create shuffled treatment and placebo groups and shuffled statistics
null_distribution <- numeric(10000)   # preallocate one slot per shuffle
for (i in 1:10000) {

        # shuffle data
        shuff_data <- sample(combined_data)

        # create fake treatment and control groups
        shuff_gingko   <-  shuff_data[1:n_gingko]
        shuff_placebo  <-  shuff_data[(n_gingko + 1):total]

        # save the statistic of interest
        null_distribution[i] <- mean(shuff_gingko) - mean(shuff_placebo)

}


# plot the null distribution as a histogram
hist(null_distribution, 
     breaks = 20,
     main = "Null distribution", 
     xlab = TeX("$\\bar{x}_{shuff-gingko} - \\bar{x}_{shuff-placebo}$"))
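
The shuffling step can also be wrapped in a function and repeated with `replicate()`. The sketch below is one possible way to do this and is not the worksheet's own code. Because `gingko_RCT.rda` is not available here, it uses made-up data (`fake_treatment`, `fake_control`), and the helper `shuffle_mean_diff()` is a hypothetical name.

```r
set.seed(230)

# Made-up scores standing in for the real data (assumption: both groups share one distribution)
fake_treatment <- rnorm(30, mean = 5, sd = 1)
fake_control   <- rnorm(30, mean = 5, sd = 1)

# Hypothetical helper: shuffle the pooled data once and return the difference in group means
shuffle_mean_diff <- function(x, y) {
  shuffled <- sample(c(x, y))
  idx_x <- seq_len(length(x))
  mean(shuffled[idx_x]) - mean(shuffled[-idx_x])
}

# Repeat the shuffle 10,000 times to build the null distribution
null_dist_sketch <- replicate(10000, shuffle_mean_diff(fake_treatment, fake_control))

# The null distribution should be centered near 0
mean(null_dist_sketch)
```

To use this with the real data, one would call `shuffle_mean_diff(gingko, placebo)` inside `replicate()` instead of the made-up vectors.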

$\$

Step 4: Calculate a p-value

# plot the null distribution again with a red line at the value of the observed statistic
hist(null_distribution, 
     breaks = 20,
     main = "Null distribution", 
     xlab = TeX("$\\bar{x}_{shuff-gingko} - \\bar{x}_{shuff-placebo}$"))


abline(v = obs_stat, col = "red")



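# calculate the two-sided p-value: count shuffled statistics at least as extreme
# as the observed statistic in each tail, whatever the sign of obs_stat
(p_value_left_tail <- sum(null_distribution <= -abs(obs_stat))/length(null_distribution))
(p_value_right_tail <- sum(null_distribution >= abs(obs_stat))/length(null_distribution))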

(p_value <- p_value_left_tail + p_value_right_tail)
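
For a two-sided test, the p-value is the proportion of shuffled statistics that are at least as far from zero as the observed statistic, in either direction. The sketch below illustrates this on a toy null distribution small enough to check by hand. The helper `two_sided_p_value()` and the vector `toy_null` are illustrative assumptions, not part of the original worksheet.

```r
# Hypothetical helper: proportion of null statistics at least as extreme as the observed one
two_sided_p_value <- function(null_stats, observed) {
  mean(abs(null_stats) >= abs(observed))
}

# Toy null distribution that is easy to verify by hand
toy_null <- c(-3, -2, -1, 0, 1, 2, 3)

two_sided_p_value(toy_null, observed = -2)  # 4 of the 7 values have |stat| >= 2
two_sided_p_value(toy_null, observed = 2)   # same answer for the mirrored statistic
```

Taking absolute values on both sides means the result does not depend on whether the observed difference is positive or negative.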

$\$

Step 5: Make a decision

Since the p-value (`r p_value`) is greater than $\alpha = 0.05$, we cannot reject the null hypothesis. Thus, if we are using the Neyman-Pearson paradigm, we do not have sufficient evidence to say that the pill is effective.

$\$

emeyers/SDS230 documentation built on Feb. 6, 2025, 4:55 p.m.
