# In emeyers/SDS230: Tools for the class Data Exploration and Analysis

$\$

# install.packages("latex2exp")

library(latex2exp)

knitr::opts_chunk$set(echo = TRUE) knitr::opts_chunk$set(fig.width=6, fig.height=4)

set.seed(230)

# get some images that are used in this document



# Overview

• Hypothesis tests for a single proportion in R
• Hypothesis tests for two means

$\$

## Part 1: Running a randomization test for a single proportion in R

$\$

### Part 1.1: Is it possible to smell whether someone has Parkinson's disease?

Joy Milne claimed to have the ability to smell whether someone had Parkinson’s disease.

To test this claim, researchers gave Joy 6 shirts that had been worn by people who had Parkinson’s disease and 6 people who did not.

Joy identified 11 out of the 12 shirts correctly.

Let's run a hypothesis test to assess whether there is significant evidence to suggest that Joy can really could smell whether someone has Parkinson's disease.

$\$

### Step 1: State the null and alternative hypotheses in symbols and words, and set up the rules of the game

In words:

• Null hypothesis: Joy's ability to smell whether someone has Parkinson's disease is a chance guess of 1/2
• Alternative hypothesis: Joy can smell whether someone has Parkinson's disease at a level greater than chance of 1/2

Using symbols

$H_0: \pi = 0.5$ $H_A: \pi > 0.5$

Rules of the game

If there is a less than 5% probability we would get a random statistic as or more extreme than the observed statistic if $H_0$ was true, then we will reject $H_0$ and say that $H_A$ is likely to be true.

$\alpha = 0.05$

$\$

### Step 2: Calculate the observed statistic

(obs_stat <- 11/12)


$\$

### Step 3: Create the null distribution

flip_sims <- rbinom(10000, 12, .5)

flip_sims_prop <- flip_sims/12

barplot(table(flip_sims),
xlab = "Number of heads (i.e., correct guesses)",
ylab = "Number of simulations",
main = "10,000 simulations of 12 coin flips")

hist(flip_sims_prop, breaks = 100)

table(flip_sims)


$\$

### Step 4: Calculate a p-value

(p_value <- sum(flip_sims_prop >= obs_stat)/length(flip_sims))


$\$

### Step 5: Make a decision

Since r p_value is less than $\alpha = 0.05$ we can reject the null hypothesis (and perhaps say the results are "statistically significant").

$\$

Questions

1. Do you believe Joy can really smell Parkinson's disease?

2. Is it better to report the actual p-value or just whether we rejected the null hypothesis $H_0$?

$\$

## Part 2: Permutation tests for comparing two means in R

Let's us examine the randomized controlled trial experiment by Solomon et al (2002) to see if there is evidence that taking a gingko pills improves memory. To read the original paper see: https://jamanetwork.com/journals/jama/fullarticle/195207

### Step 1: State the null and alternative hypotheses

$H_0: \mu_{gingko} - \mu_{control} = 0$ $H_A: \mu_{gingko} - \mu_{control} \ne 0$

$\alpha = 0.05$

$\$

### Step 2a: Plot the data

load("gingko_RCT.rda")

# plot the data
boxplot(gingko, placebo,
names = c("Gingko", "Placebo"),
ylab = "Memory score")

# create a stripchart
data_list <- list(gingko, placebo)

stripchart(data_list,
group.names = c("Gingko", "Placebo"),
method = "jitter",
xlab = "Memory score",
col = c("red", "blue"))


$\$

### Step 2b: Calculate the observed statistic

(obs_stat <- mean(gingko) - mean(placebo))


$\$

### Step 3: Create the null distribution

# combine the data from the treatment and placebo groups together
combined_data <- c(gingko, placebo)
n_gingko <- length(gingko)
total <- length(combined_data)

# use a for loop to create shuffled treatment and placebo groups and shuffled statistics
null_distribution <-  NULL
for (i in 1:10000) {

# shuffle data
shuff_data <- sample(combined_data)

# create fake treatment and control groups
shuff_gingko   <-  shuff_data[1:n_gingko]
shuff_placebo  <-  shuff_data[(n_gingko + 1):total]

# save the statistic of interest
null_distribution[i] <- mean(shuff_gingko) - mean(shuff_placebo)

}

# plot the null distribution as a histogram
hist(null_distribution,
breaks = 20,
main = "Null distribution",
xlab = TeX("$\\bar{x}_{shuff-gingko} - \\bar{x}_{shuff-placebo}$"))


$\$

### Step 4: Calculate a p-value

# plot the null distribution again with a red line a the value of the observed statistic
hist(null_distribution,
breaks = 20,
main = "Null distribution",
xlab = TeX("$\\bar{x}_{shuff-gingko} - \\bar{x}_{shuff-placebo}$"))

abline(v = obs_stat, col = "red")

# calculate the p-value
(p_value_left_tail <- sum(null_distribution <= obs_stat)/10000)
(p_value_right_tail <- sum(null_distribution >= abs(obs_stat))/10000)

(p_value <- p_value_left_tail + p_value_right_tail)


$\$

### Step 5: Make a decision

Since r p_value is greater than $\alpha = 0.05$ we can not reject the null hypothesis. Thus if we are using the Neyman-Pearson paradigm, we do not have sufficient evidence to say that the pill is effective.

$\$ emeyers/SDS230 documentation built on Jan. 13, 2023, 5:16 a.m.