$\$
$\$
# install.packages("latex2exp")
library(latex2exp)
#knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(fig.width=6, fig.height=4)
set.seed(230)
# get some images and data that are used in this document
SDS230::download_data("gingko_RCT.rda")
SDS230::download_data("alcohol.rda")
SDS230::download_image("gingko_pills.jpg")
# a function to get the MAD statistic
get_MAD_stat <- function(quantitative_data, grouping_data) {
  # we can use the by() function to get the means separately for each group
  group_means <- as.vector(by(quantitative_data, grouping_data, mean, na.rm = TRUE))
  total <- 0
  num_group_pairs <- 0
  for (iGroup1 in 1:(length(group_means) - 1)) {
    for (iGroup2 in (iGroup1 + 1):(length(group_means))) {
      total <- total + abs(group_means[iGroup1] - group_means[iGroup2])
      num_group_pairs <- num_group_pairs + 1
    }
  }
  total/num_group_pairs
}  # end of the MAD function
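The statistic that `get_MAD_stat()` computes can also be written as a formula. The notation here is an assumption introduced for illustration, not taken from the original notes: $k$ is the number of groups and $\bar{x}_1, \dots, \bar{x}_k$ are the group means. The function averages the absolute differences between the means over all pairs of groups:

$$\text{MAD} = \frac{1}{\binom{k}{2}} \sum_{i < j} \left| \bar{x}_i - \bar{x}_j \right|$$

As a small worked example, three groups with means 1, 2, and 4 give $\text{MAD} = (|1 - 2| + |1 - 4| + |2 - 4|)/3 = (1 + 3 + 2)/3 = 2$.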
$\$
Let us examine the randomized controlled trial by Solomon et al. (2002) to see if there is evidence that taking gingko pills improves memory. To read the original paper, see: https://jamanetwork.com/journals/jama/fullarticle/195207
$H_0: \mu_{Gingko} - \mu_{Placebo} = 0$

$H_A: \mu_{Gingko} - \mu_{Placebo} \ne 0$
$\alpha = 0.05$
$\$
load("gingko_RCT.rda")
# plot the data
boxplot(gingko, placebo, names = c("Gingko", "Placebo"))
# create a stripchart
data_list <- list(gingko, placebo)
stripchart(data_list, group.names = c("Gingko", "Placebo"), col = c("red", "blue"), method = "jitter")
$\$
(obs_stat <- mean(gingko) - mean(placebo))
$\$
# combine the data from the treatment and placebo groups together
combo_data <- c(gingko, placebo)
n_gingko <- length(gingko)
n_total <- length(combo_data)
# use a for loop to create shuffled treatment and placebo groups and shuffled statistics
null_dist <- NULL
for (i in 1:10000) {
  # shuffle data
  shuff_data <- sample(combo_data)
  # create fake treatment and control groups
  shuff_treat <- shuff_data[1:n_gingko]
  shuff_control <- shuff_data[(n_gingko + 1):n_total]
  # save the statistic of interest
  null_dist[i] <- mean(shuff_treat) - mean(shuff_control)
}
# plot the null distribution as a histogram
hist(null_dist, breaks = 20)
abline(v = obs_stat, col = "red")
abline(v = -1 * obs_stat, col = "red")
$\$
# calculate the two-sided p-value
# abs() keeps the two tails correct whether the observed statistic is negative or positive
(pval_left <- mean(null_dist <= -1 * abs(obs_stat)))
(pval_right <- mean(null_dist >= abs(obs_stat)))
(pval <- pval_left + pval_right)
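The shuffling steps above can also be wrapped in a reusable function. The following is a minimal sketch rather than the course's own code: `perm_test_diff_means()` is a hypothetical helper name, and the data are simulated for illustration, not the gingko data. It uses `replicate()` in place of the for loop and counts both tails at once by comparing absolute values.

```r
# Hypothetical helper (not part of the SDS230 package): a two-sided
# permutation test for a difference in means between two groups.
perm_test_diff_means <- function(x, y, n_shuffles = 10000) {
  obs_stat <- mean(x) - mean(y)
  combo <- c(x, y)
  n_x <- length(x)
  # each replicate shuffles the pooled data and recomputes the statistic
  null_dist <- replicate(n_shuffles, {
    shuff <- sample(combo)
    mean(shuff[1:n_x]) - mean(shuff[-(1:n_x)])
  })
  # proportion of shuffled statistics at least as extreme as the observed one
  p_value <- mean(abs(null_dist) >= abs(obs_stat))
  list(obs_stat = obs_stat, null_dist = null_dist, p_value = p_value)
}

# simulated data for illustration only
set.seed(1)
fake_treat <- rnorm(30, mean = 0.5)
fake_control <- rnorm(30, mean = 0)
result <- perm_test_diff_means(fake_treat, fake_control, n_shuffles = 2000)
result$p_value
```

Comparing `abs(null_dist)` with `abs(obs_stat)` gives the same value as adding the left-tail and right-tail proportions, as long as the observed statistic is not exactly zero.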
$\$
$\$
$\$
Are there differences in the average amount of beer that countries on different continents drink? Let's explore this below!
Note: This question was inspired by these sources:

- https://blog.minitab.com/en/michelle-paret/what-is-anova-and-who-drinks-the-most-beer
- https://fivethirtyeight.com/features/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/
Please check out those websites if you are interested in reading more!
$\$
$\$
load("alcohol.rda")
# look at the average number of beers consumed per capita in each country
# view a box plot of the data
# use the get_MAD_stat() defined at the top of this R Markdown document to get the observed statistic
$\$
To create a null distribution, we need to create a statistic consistent with the null hypothesis, and then repeat the process many times.
We can create a MAD statistic consistent with the null hypothesis by shuffling the continent names and then recalculating the MAD statistic on the shuffled data. To shuffle the data we can use the sample() function.
Because calculating the MAD statistic is slow, let's only create 1,000 points in our null distribution.
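The shuffle-and-recalculate idea described above can be sketched on simulated data. Everything in this block is an assumption made for illustration: `mad_stat()` is a hypothetical compact stand-in for the MAD function so the example runs on its own, and `fake_groups` and `fake_values` are made-up data, not the contents of alcohol.rda.

```r
# Hypothetical compact version of the MAD statistic: the average of
# |mean_i - mean_j| over all pairs of groups.
mad_stat <- function(values, groups) {
  group_means <- as.vector(tapply(values, groups, mean, na.rm = TRUE))
  # dist() on a numeric vector returns every pairwise absolute difference
  mean(as.vector(dist(group_means)))
}

# simulated data for illustration only
set.seed(1)
fake_groups <- rep(c("A", "B", "C"), each = 20)
fake_values <- rnorm(60, mean = 100, sd = 15)

obs_mad <- mad_stat(fake_values, fake_groups)

# shuffle the group labels, recompute the statistic, and repeat 1,000 times
null_mads <- replicate(1000, mad_stat(fake_values, sample(fake_groups)))

# the MAD statistic is never negative, so only the right tail is counted
(p_value <- mean(null_mads >= obs_mad))
```

Shuffling the labels rather than the values is an equivalent choice: either way, each value is paired with a randomly assigned group, which is what the null hypothesis of no group differences implies.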
# create the null distribution
$\$
# visualize the null distribution
# calculate the p-value
$\$
$\$
Every time you do an analysis that involves statistical inference (e.g., every time you create a confidence interval, run a hypothesis test, etc.) you are trying to answer a question about some larger underlying population or random process. For the hypothesis test run here that looks at the relationship between beer consumption and continents, what is the underlying population/process we are trying to make inferences about?