Home

/

GitHub

/

emeyers/SDS230

/

In emeyers/SDS230: Tools for the class Data Exploration and Analysis

$\$

Overview

Randomization test comparing two means
Permutation test comparing more than two means

$\$

# install.packages("latex2exp")

library(latex2exp)

#knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(fig.width=6, fig.height=4) 


set.seed(230)

# get some images and data that are used in this document
SDS230::download_data("gingko_RCT.rda")
SDS230::download_data("alcohol.rda")
SDS230::download_image("gingko_pills.jpg")

# a function to get the MAD statistic
get_MAD_stat <- function(quantitative_data, grouping_data) {

 # we can use the by() function to get the means separately for each group
 group_means <- as.vector(by(quantitative_data, grouping_data, mean, na.rm = TRUE))

  total <- 0 
  num_group_pairs <- 0

  for (iGroup1 in 1:(length(group_means) - 1)) {
    for (iGroup2 in (iGroup1 + 1):(length(group_means))) {

      total <- total + abs(group_means[iGroup1] - group_means[iGroup2])
      num_group_pairs <- num_group_pairs + 1

    }
  }

  total/num_group_pairs

}  # end of the MAD function

$\$

Part 1: Randomization test for comparing two means in R

Let's us examine the randomized controlled trial experiment by Solomon et al (2002) to see if there is evidence that taking a gingko pills improves memory. To read the original paper see: https://jamanetwork.com/journals/jama/fullarticle/195207

Step 1: State the null and alternative hypotheses

$\$

Step 2a: Plot the data

load("gingko_RCT.rda")



# plot the data




# create a stripchart

$\$

Step 2b: Calculate the observed statistic

$\$

Step 3: Create the null distribution

# combine the data from the treatment and placebo groups together




# use a for loop to create shuffled treatment and placebo groups and shuffled statistics 



        # shuffle data


        # create fake treatment and control groups



        # save the statistic of interest

# plot the null distribution as a histogram

$\$

Step 4: Calculate a p-value

# plot the null distribution again with a red line a the value of the observed statistic




# calculate the p-value

$\$

Step 5: Make a decision

$\$

Part 2: Permutation test comparing more than two means in R

Is there differences between the average amount of beer that countries in different continents drink? Let's explore this more below!

Note: This question was inspired by these sources: - https://blog.minitab.com/en/michelle-paret/what-is-anova-and-who-drinks-the-most-beer - https://fivethirtyeight.com/features/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/

Please check out those websites if you are interested in reading more!

$\$

Step 1: State the null and alternative hypotheses

$\$

Step 2: Calculate the statistic of interest and visualize the data

load("alcohol.rda")



# look at the average number of beers consumed per capita in each country




# view a box plot of the data




# use the get_MAD_stat() defined at the top of this R Markdown document to get the observed statistic

$\$

Step 3: Create a null distribution

To create a null distribution, we need to create a statistic consistent with the null hypothesis, and then repeat the process many times.

We can create a MAD statistic consistent with the null hypothesis by shuffling the continent names and then recalculate the MAD statistic on the shuffled data. To shuffle the data we can use the sample() function.

Because calculating the MAD statistic is slow, let's only create 1,000 points in our null distribution.

# create the null distribution

$\$

Step 4: Calculate the p-value

# visualize the null distribution





# calculate the p-value

$\$

Step 5: Conclusion

$\$

What is the underlying population/random process we are making claims about???

Every time you do an analysis that involves statistical inference (e.g., every time you create a confidence interval, run a hypothesis tests, etc.) you are trying to answer a question about some larger underlying population or random process. For the hypothesis test run here that looks at the relationship between beer consumption and continents, what is the underlying population/process we are trying to make inferences about?

emeyers/SDS230 documentation built on Feb. 6, 2025, 4:55 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

emeyers/SDS230
Tools for the class Data Exploration and Analysis

In emeyers/SDS230: Tools for the class Data Exploration and Analysis

Overview

Part 1: Randomization test for comparing two means in R

Step 1: State the null and alternative hypotheses

Step 2a: Plot the data

Step 2b: Calculate the observed statistic

Step 3: Create the null distribution

Step 4: Calculate a p-value

Step 5: Make a decision

Part 2: Permutation test comparing more than two means in R

Step 1: State the null and alternative hypotheses

Step 2: Calculate the statistic of interest and visualize the data

Step 3: Create a null distribution

Step 4: Calculate the p-value

Step 5: Conclusion

What is the underlying population/random process we are making claims about???

R Package Documentation

Browse R Packages

We want your feedback!

emeyers/SDS230 Tools for the class Data Exploration and Analysis

In emeyers/SDS230: Tools for the class Data Exploration and Analysis

Overview

Part 1: Randomization test for comparing two means in R

Step 1: State the null and alternative hypotheses

Step 2a: Plot the data

Step 2b: Calculate the observed statistic

Step 3: Create the null distribution

Step 4: Calculate a p-value

Step 5: Make a decision

Part 2: Permutation test comparing more than two means in R

Step 1: State the null and alternative hypotheses

Step 2: Calculate the statistic of interest and visualize the data

Step 3: Create a null distribution

Step 4: Calculate the p-value

Step 5: Conclusion

What is the underlying population/random process we are making claims about???

R Package Documentation

Browse R Packages

We want your feedback!

emeyers/SDS230
Tools for the class Data Exploration and Analysis