```r
library(learnr)
library(tidyverse)
library(BayesFactor)
library(effsize)
knitr::opts_chunk$set(echo = FALSE, warning = FALSE)
tutorial_options(exercise.cap = "Exercise")

# Read data files needed for the tutorial
teddy_tib <- adventr::teddy_dat
zhang_female_tib <- adventr::zhang_female_dat

# Set up objects for code blocks
teddy_sum <- teddy_tib %>%
  dplyr::group_by(study_n, group) %>%
  dplyr::summarize(
    mean = mean(self_esteem),
    sd = sd(self_esteem),
    ci_low = ggplot2::mean_cl_normal(self_esteem)$ymin,
    ci_upp = ggplot2::mean_cl_normal(self_esteem)$ymax
  )
```

This tutorial is one of a series that accompanies *An Adventure in Statistics* [@RN10163] by me, Andy Field. These tutorials contain abridged sections from the book, so there are some copyright considerations, but I offer them under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.^[Basically you can use this tutorial for teaching and non-profit activities but do not meddle with it or claim it as your own work.]

- Who is the tutorial aimed at?
    - Anyone teaching from or reading *An Adventure in Statistics* may find them useful.
- What is covered?
    - This tutorial looks at how to compute effect sizes and Bayes factors for a basic research design that compares two groups. It would be a useful tutorial to run alongside teaching based on Chapter 11 of *An Adventure in Statistics*.
    - This tutorial *does not* teach the background theory: it is assumed you have either attended my lecture or read the relevant chapter in the aforementioned book (or someone else's). The aim of this tutorial is to augment the theory that you already know by guiding you through fitting linear models using **R** and **RStudio** and asking you questions to test your knowledge along the way.

Because these tutorials accompany my book *An adventure in statistics*, which uses a fictional narrative to teach the statistics, some of the examples might not make sense unless you know something about the story. For those of you who don't have the book I begin each tutorial with a précis of the story. If you're not interested then fair enough - click past this section.

It is the future. Zach, a rock musician, and Alice, a geneticist, have been together since high school and live together in Elpis, the ‘City of Hope’.

Zach and Alice were born in the wake of the Reality Revolution which occurred after a Professor Milton Gray invented the Reality Prism – a transparent pyramid worn on the head – that brought honesty to the world. Propaganda and media spin became unsustainable, religions collapsed, advertising failed. Society could no longer be lied to. Everyone could know the truth about anything that they could look at. A gift, some said, to a previously self-interested, self-obsessed society in which the collective good had been eroded.

But also a curse. For, it soon became apparent that through this Reality Prism, people could no longer kid themselves about their own puffed-up selves as they could see what they were really like – by and large, pretty ordinary. And this caused mass depression. People lost faith in themselves. Artists abandoned their pursuits, believing they were untalented and worthless.

Zach and Alice have never worn a Reality Prism and have no concept of their limitations. They were born after the World Governance Agency (WGA) destroyed all Reality Prisms, along with many other pre-revolution technologies, with the aim of restoring community and well-being. However, this has not been straightforward and in this post-Prism world, society has split into pretty much two factions:

- The Chippers who have had WiFi-enabled chips implanted into their brains, enabling them to record and broadcast what they see and think in real time; upload memories for future generations into a newly-created memoryBank and live-stream music and films directly into their brains.
- The Clocktarians, followers of the old pre-Prism ways who use steampunk-style technologies and who have elected not to have chips in their brains; they are regarded by the Chippers as backward-looking, stuck in a ‘clockwork, Victorian society’.

Everyone has a star, a limitless space on which to store their digital world.

Zach and Alice are Clocktarians. Their technology consists mainly of:

- A Proteus, a device made from programmable matter that can transform shape and function simply by the owners’ wishes. Zach calls his a diePad, in the shape of a tombstone in an ironic reference to an over-reliance on technology at the expense of memory.
- A Reality Checker, a clockwork mechanism that, at the point of critical velocity, projects an opaque human head that is linked to everything and can tell you anything. Every head has a personality and Zach’s is a handsome, laid back ‘dude’ who is like an electronic friend, who answers questions if he feels like it and often winds Zach up by giving him false information. And he often flirts with Alice.

- Zach
- Rock musician in band called The Reality Enigma.
- Spellbinding performer, has huge fan-base.
- Only people living in Elpis get to see The Reality Enigma in the flesh. Otherwise all performances are done via an oculus riff, a multisensory headset for experiencing virtual gigs.
- Zach’s music has influenced and changed thousands of lives.
- Wishes he had lived pre-Revolutionary times, the turn of the 21st Century, a golden age for music when bands performed in reality at festivals.
- Kind, gentle and self-doubting.
- Believes science and maths are dull and uninspiring. Creates a problem between him and Alice as she thinks that because he isn’t interested in science, he isn’t interested in her. Leads to lots of misunderstandings between them.

- Alice
- Shy, lonely, academically-gifted – estranged from the social world until she met Zach in the college library.
- Serious scientist, works at the Beimeni Centre of Genetics.
- At 21, won the World Science Federation’s Einstein Medal for her genetics research
- Desperately wants Zach to get over his fear of science so he can open his mind to the beauty of it.

Alice has been acting strangely, on edge for weeks, disconnected and uncommunicative, as if she is hiding something and Zach can’t get through to her. When he arrives home from band practice, unusually, she is already home and listening to an old album that the two of them enjoyed together, back in a simpler, less complicated time in their relationship. During an increasingly testy evening, which involves a discussion with the Head about whether or not a Proteus causes brain cancer, Alice is interrupted by an urgent call which she takes in private. She returns looking worried and is once again distracted. She tells Zach that she has ‘a big decision to make’. Before going to bed, Zach asks her if he can help with the decision but she says he ‘already has’, thanking him for making ‘everything easier.’ He has no idea what she means and goes to sleep, uneasy.

On waking, Zach senses that something is wrong. And he is right. Alice has disappeared. Her clothes, her possessions and every photo of them together have gone. He can’t get hold of any of her family or friends as their contact information is stored on her Proteus, not on his diePad. He manages to contact the Beimeni Centre but is told that no one by the name of Alice Nightingale has ever worked there. He logs into their constellation but her star has gone. He calls her but finds that her number never existed. She has, thinks Zach, been ‘wiped from the planet.’ He summons The Head but he can’t find her either. He tells Zach that there are three possibilities: Alice doesn’t want to be found, someone else doesn’t want her to be found, or she never existed.

Zach calls his friend Nick, fellow band member and fan of the WGA-installed Repositories, vast underground repositories of actual film, books, art and music. Nick is a Chipper – solely for the purpose of promoting the band using memoryBank – and he puts the word out to their fans about Alice missing.

Thinking as hard as he can, Zach recalls the lyrics of the song she’d been playing the previous evening. Maybe they are significant? It may well be a farewell message and the Head is right. In searching for clues, he comes across a ‘memory stone’ which tells him to read what’s on there. File 1 is a research paper that Zach can’t fathom. It’s written in the ‘language of science’ and the Head offers to help Zach translate it and tells him that it looks like the results of her current work were ‘gonna blow the world’. Zach resolves to do ‘something sensible’ with the report.

Zach doesn’t want to believe that Alice has simply just left him. Rather, that someone has taken her and tried to erase her from the world. He decides to find her therapist, Dr Murali Genari and get Alice’s file. As he breaks into his office, Dr Genari comes up behind him and demands to know what he is doing. He is shaking but not with rage – with fear of Zach. Dr Genari turns out to be friendly and invites Zach to talk to him. Together they explore the possibilities of where Alice might have gone and the likelihood, rating her relationship satisfaction, that she has left him. During their discussion Zach is interrupted by a message on his diePad from someone called Milton. Zach is baffled as to who he is and how he knows that he is currently discussing reverse scoring. Out of the corner of his eye, he spots a ginger cat jumping down from the window ledge outside. The counsellor has to go but suggests that Zach and ‘his new friend Milton’ could try and work things out.

This tutorial uses the following packages:

- `BayesFactor` [@RN9444] to compute Bayes factors
- `effsize` [@RN11406] to compute Cohen's *d*
- `Hmisc` [@RN11417] to compute confidence intervals
- `tidyverse` [@RN11407] for general data processing

These packages are automatically loaded within this tutorial. If you are working outside of this tutorial (i.e. in **RStudio**) then you need to make sure that the package has been installed by executing `install.packages("package_name")`, where *package_name* is the name of the package. If the package is already installed, then you need to reference it in your current session by executing `library(package_name)`, where *package_name* is the name of the package.

This tutorial has the data files pre-loaded so you shouldn't need to do anything to access the data from within the tutorial. However, if you want to play around with what you have learnt in this tutorial outside of the tutorial environment (i.e. in a stand-alone **RStudio** session) you will need to download the data files and then read them into your **R** session. This tutorial uses the following file:

You can load the files in several ways (using the first file as an example):

- Assuming that you follow the workflow recommended in the tutorial **adventr_02** (see also this online tutorial), you can load the data into an object called `teddy_tib` by executing:
    `teddy_tib <- readr::read_csv("../data/ais_10_teddy_therapy.csv")`
- If you don't follow my suggested workflow, you will need to adjust the file location in the above command.
- Alternatively, if you have an internet connection (and my server has not exploded!), load the file directly from the URL by executing:
    `teddy_tib <- readr::read_csv("http://www.discoveringstatistics.com/repository/ais_data/ais_10_teddy_therapy.csv")`

Adapt the above instructions to load the second data file into a tibble called `zhang_female_tib`.

Zach has just escaped from the Secret Philanthropic Society, who perform bizarre rituals that involve their members shoving their hands into boxes that may or may not contain a lethal gernal worm. They use probability to determine whether the box will be empty. He meets Emily, the member whose ritual he witnessed. She is riddled with doubts about the society's methods, but is powerless to escape. With Zach's help she does, and they are led by a strange moon-faced druid into a tombstone. It's never a good idea to follow moon-faced druids into tombstones. Nevertheless they do. Inside, Zach encounters the Doctrine of Chance, led by Sister Price. They explain alternative ways to evaluate evidence: specifically, effect sizes and Bayes factors.

We're going to use the data in the tibble called `teddy_tib`, which contains data from two studies looking at whether cuddling a teddy bear (compared to the cardboard box that the teddy was packaged in) affects self-reported self-esteem. This tibble contains 4 variables:

- **id**: Participant ID
- **group**: whether the participant cuddled a teddy bear or the cardboard box that contained the teddy bear
- **study_n**: Factor that distinguishes the two studies in the data set. The first was based on a total *N* of 20 and the second was based on a total *N* of 200
- **self_esteem**: Self-reported self-esteem scores

Using what you have learnt to date, can you create a tibble called `teddy_sum` containing the means, standard deviations and confidence intervals for each group across the two studies? (If you're doing this outside of the tutorial remember to load *tidyverse* [@RN11407] and *Hmisc* [@RN11417].)

```r
teddy_sum <- teddy_tib %>%
  dplyr::group_by(study_n, group) %>%
  dplyr::summarize(
    mean = mean(self_esteem),
    sd = sd(self_esteem),
    ci_low = ggplot2::mean_cl_normal(self_esteem)$ymin,
    ci_upp = ggplot2::mean_cl_normal(self_esteem)$ymax
  )
```

Now try plotting an error bar graph using `facet_wrap()` to display graphs for the two studies side by side. Use `coord_cartesian()` to set the *y*-limits to be 0-20, `scale_y_continuous()` to set ticks along the axis every two units (e.g., 0, 2, 4, 6 ...), and `labs()` to define labels for the *x*- and *y*-axes.

```r
ted_plot <- ggplot2::ggplot(teddy_tib, aes(group, self_esteem))
ted_plot +
  stat_summary(fun.data = "mean_cl_normal", size = 1) +
  facet_wrap(~ study_n) +
  coord_cartesian(ylim = c(0, 20)) +
  scale_y_continuous(breaks = seq(0, 20, 2)) +
  labs(x = "Experimental condition", y = "Self-esteem (0-20)") +
  theme_bw()
```

A useful measure of effect size is Cohen’s *d*, which is the difference between two means divided by some estimate of the standard deviation of those means:

$$ \hat{d} = \frac{\bar{X}_1-\bar{X}_2}{s} $$

I have put a hat on the *d* to remind us that we’re really interested in the effect size in the population, but because we can’t measure that directly, we estimate it from the samples (The hat means ‘estimate of’). By dividing by the standard deviation we are expressing the difference in means in standard deviation units (a bit like a *z*–score). The standard deviation is a measure of ‘error’ or ‘noise’ in the data, so *d* is effectively a signal-to-noise ratio. However, if we’re using two means, then there will be a standard deviation associated with each of them so which one should we use? There are three choices:

- If one of the groups is a control group it makes sense to use that group's standard deviation to compute *d* (the argument being that the experimental manipulation might affect the standard deviation of the experimental group, so the control group *SD* is a 'purer' measure of natural variation in scores).
- Sometimes we assume that group variances (and therefore standard deviations) are equal (homogeneity of variance) and if they are we can pick a standard deviation from either of the groups because it won't matter.
- We use what's known as a 'pooled estimate', which is the weighted average of the two group variances. This is given by the following equation:

$$ s_p = \sqrt{\frac{(N_1-1) s_1^2+(N_2-1) s_2^2}{N_1+N_2-2}} $$

Say we wanted to estimate *d* for the difference in self-esteem after cuddling the teddy compared to the box (control). Assuming you successfully created the tibble (`teddy_sum`) then you should have this information:

```r
knitr::kable(teddy_sum, caption = "Summary statistics for the teddy and box groups within the two experiments")
```

We have a logical control group so *d* is simply:

$$ \hat{d}_\text{teddy vs box} = \frac{15-10}{5.96} = 0.839 $$

If we use the pooled *s* then we'd get:

$$
\begin{aligned}
s_p &= \sqrt{\frac{(N_1-1) s_1^2+(N_2-1) s_2^2}{N_1+N_2-2}} \\
&= \sqrt{\frac{(10-1)5.96^2+(10-1)6.02^2}{10+10-2}} \\
&= \sqrt{\frac{645.86}{18}} \\
&= 5.99
\end{aligned}
$$

Basically, because the group standard deviations are more or less the same (5.96 and 6.02) the pooled estimate is very similar too (it falls between those two values). Consequently, the resulting effect size is not going to change much when we use the pooled estimate:

$$ \hat{d}_\text{teddy vs box} = \frac{15-10}{5.99} = 0.835 $$
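If you want to check this arithmetic for yourself in **R**, a quick sketch like the following reproduces the pooled estimate and the resulting *d*. (This is just a sanity check, not part of the tutorial's code; the group sizes, means and standard deviations are taken from the summary table above.)

```r
# Group summaries from the N = 20 study (teddy vs box)
n1 <- 10; n2 <- 10        # group sizes
s1 <- 5.96; s2 <- 6.02    # group standard deviations
m1 <- 15;  m2 <- 10       # group means (teddy, box)

# Pooled standard deviation: the weighted average of the group variances
s_p <- sqrt(((n1 - 1)*s1^2 + (n2 - 1)*s2^2)/(n1 + n2 - 2))
s_p              # approximately 5.99

# Cohen's d using the pooled estimate
(m1 - m2)/s_p    # approximately 0.835
```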

Let's look at computing Cohen's *d* using the pooled estimate in **R**. We're going to do this separately for the two studies in our tibble (`teddy_tib`). As a brief recap, this tibble contains 4 variables:

- **id**: Participant ID
- **group**: whether the participant cuddled a teddy bear or the cardboard box that contained the teddy bear
- **study_n**: Factor that distinguishes the two studies in the data set. The first was based on a total *N* of 20 and the second was based on a total *N* of 200. We're going to use this variable to `dplyr::filter()` the tibble before we compute *d*.
- **self_esteem**: Self-reported self-esteem scores

Like many things in **R**, there are numerous packages that offer functions to compute effect sizes. We're going to use the `cohen.d()` function from the `effsize` package [@RN11406], primarily because it computes the effect size from the raw data (rather than from summary statistics) and because it has an option to compute it in designs that use repeated measures (i.e. the means across conditions are dependent). The general format of this function is:

`cohen.d(formula = outcome ~ group_variable, pooled = TRUE, paired = FALSE, na.rm = FALSE, hedges.correction = FALSE, conf.level = 0.95)`

The arguments within this function are:

- `formula = outcome ~ group_variable`: this allows us to specify the outcome variable (in this case **self_esteem**) and the variable that defines the two groups (in this case **group**). This format of specifying a model as `outcome ~ predictor` will become very familiar to you as we progress through these tutorials.
- `pooled = TRUE`: by default the pooled standard deviation, $s_p$, will be used. If you set this option to FALSE then the standard deviation of the whole sample will be used. Typically you'd leave this option alone (i.e., use the pooled estimate).
- `paired = FALSE`: by default the function treats scores as coming from different entities (i.e. an independent design). That's the design our teddy bear experiments use. However, if you have a repeated-measures design (i.e. in our example, participants cuddle both a teddy and a box at different points in time) then set paired to TRUE.
- `na.rm = FALSE`: we've seen this option before. By default missing values are not removed; you should set this option to TRUE if you have missing values.
- `hedges.correction = FALSE`: this option determines whether a correction is applied to *d*, which converts *d* to something called Hedges' *g*. By default the correction isn't applied, but it is a good idea to apply it.
- `conf.level = 0.95`: by default the confidence interval will be a 95% one, but you can adjust this option to change the width of the confidence interval. The default is fine.

The function returns an object that contains the value of *d*, the confidence interval and some other information. We're going to use a pipe (`%>%`) to link:

- `teddy_tib`: our input will be the tibble of raw data
- `dplyr::filter(study_n == "Total N = 20")`: we're going to filter that tibble to extract the experiment based on a sample size of 20
- `cohen.d()`: the function described above will compute the effect size

```r
teddy_tib %>%
  dplyr::filter(study_n == "Total N = 20") %>%
  effsize::cohen.d(formula = self_esteem ~ group, data = .)
```

This command takes the `teddy_tib` tibble, extracts the study based on 20 participants, then applies the `cohen.d()` function. Within this function we have left the defaults as they are; our formula specifies that we're predicting **self_esteem** from the variable **group** (which defines whether the individual hugs a teddy or a box), and we specify the data within the function with a period (`data = .`). By using a period we're telling the function to use the data coming through from the pipe. Copy this command into the code box below and run the code. Then edit it to apply Hedges' correction, then edit the code again to get the effect size for the experiment that had 200 participants.

```r
# To apply Hedges' correction we'd add hedges.correction = TRUE:
teddy_tib %>%
  dplyr::filter(study_n == "Total N = 20") %>%
  effsize::cohen.d(formula = self_esteem ~ group, data = ., hedges.correction = TRUE)

# To get the effect size for the larger study we'd change the filter() function
# to extract the study based on 200 participants:
teddy_tib %>%
  dplyr::filter(study_n == "Total N = 200") %>%
  effsize::cohen.d(formula = self_esteem ~ group, data = ., hedges.correction = TRUE)
```

Applying the Hedges' correction to both studies, we'd get these outputs:

```r
teddy_tib %>%
  dplyr::filter(study_n == "Total N = 20") %>%
  effsize::cohen.d(formula = self_esteem ~ group, data = ., hedges.correction = TRUE)

teddy_tib %>%
  dplyr::filter(study_n == "Total N = 200") %>%
  effsize::cohen.d(formula = self_esteem ~ group, data = ., hedges.correction = TRUE)
```

This shows that for the smaller study, the effect of cuddling a teddy compared to a box on self-esteem was *g* = -0.80 [-1.78, 0.18]. Whether *d* (or *g*) has a positive or negative sign reflects which way around you subtracted the group means. The teddy group had a mean of 15 and the box group a mean of 10. The difference between these means will be 5 if you subtract the box mean from the teddy mean (15 - 10 = 5), but -5 if you subtract the teddy mean from the box mean (10 - 15 = -5). So, to interpret the effect size look at the group means (not the plus or minus sign of *g*). In this case, self-esteem is 0.80 of a standard deviation higher after cuddling a teddy than after cuddling a box. The confidence interval for this effect size suggests (if we assume this sample is one of the 95% that contain the population value) that the population effect could range between -1.78 and 0.18. Crucially this means the effect could be 0, and could reflect higher self-esteem in the box group OR in the teddy group.

```r
question("In the larger study the confidence interval ranged from -1.13 to -0.54. What does this tell us?",
         answer("If this confidence interval is one of the 95% that contain the population value then the population value of the difference between group means lies between -1.13 and -0.54.", correct = TRUE),
         answer("There is a 95% chance that the population value of the difference between group means lies between -1.13 and -0.54.", message = "You cannot make probability statements from a confidence interval. We don't know whether this particular CI is one of the 95% that contain the population value of the difference between means."),
         answer("The probability of this confidence interval containing the population value is 0.95.", message = "The probability of this confidence interval containing the population value is either 0 (it doesn't) or 1 (it does), but it's impossible to know which."),
         answer("I can be 95% confident that the population value of the difference between group means lies between -1.13 and -0.54.", message = "Confidence intervals do not quantify your subjective confidence."),
         correct = "Correct - well done!",
         random_answer_order = TRUE,
         allow_retry = TRUE
)
```

I've struggled to find a function that will automatically calculate *d* using the standard deviation of the control group. However, it is simple enough to get **R** to subtract two values and then divide by another value. For example, to add *x* and *y* and then divide by *z* you could execute `(x + y)/z`. Simple. What's not simple is re-arranging the data so that we have the group means in columns. It can be done, but it involves restructuring the data in a crazy pipe involving lots of functions that we haven't learnt! I'm going to plough through how it's done and explain what everything does, but not cover the new functions in detail. So, by all means ignore this section if you like.

Earlier we created a tibble called `teddy_sum` that includes the means, standard deviations and confidence intervals of those means. The problem is that the means and *sd*s for the box and teddy groups are in different rows, and we want them in different columns so that we can subtract them. This is to remind you of what the tibble currently contains:

```r
knitr::kable(teddy_sum, caption = "Summary statistics for the teddy and box groups within the two experiments")
```

I'm going to restructure this tibble so that all of the information from a given experiment is in a single row (that is, create columns containing the means, *sd*s and confidence intervals for each group). That's the hard part. Then I will add a column that computes *d* (that's the easy part!).

Restructuring data using the *tidyverse* package *tidyr* is something that I find endlessly confusing, so if you find it confusing too then don't worry - you're in good company! To restructure the data we're going to use this pipe:

```r
teddy_wide <- teddy_sum %>%
  tidyr::gather(measure, value, -c(study_n, group)) %>%
  tidyr::unite(gp_measure, group, measure) %>%
  tidyr::spread(gp_measure, value)
```

It's a pipe that could carry gas across the Atlantic. Let's break it down.

- `teddy_sum %>%`: we begin with the `teddy_sum` tibble and carry that into the next function using the pipe operator (`%>%`).
- `tidyr::gather(measure, value, -c(study_n, group))`: this takes each variable in `teddy_sum` and places its value in a variable called **value** (you can name it what you like, I chose 'value') and creates a variable called **measure** (again, you can name it differently) that identifies what the value relates to. For example, if the value is from the column **mean** then the value placed in **measure** will be the word 'mean'. Crucially, I have used `-c(study_n, group)` to exclude the variables **study_n** and **group** from this process; these variables retain their existing values. To get a feel for what's happening, execute the pipe just to this point in the code box (i.e., execute `teddy_sum %>% tidyr::gather(measure, value, -c(study_n, group))`). Essentially the variables **mean**, **sd**, **ci_low** and **ci_upp**, which were in 4 columns, have been replaced by two columns: one (**measure**) contains the name of the original column and the other (**value**) contains the value.
- `tidyr::unite(gp_measure, group, measure)`: this is used to unite the variables **group** and **measure** into a single variable called **gp_measure** (again, choose a different name if you like). It basically combines the values from the columns and separates them with an underscore. For example, if the value of **group** is *Box* and the value of **measure** is *mean* then the resulting value in **gp_measure** will be *Box_mean*. In effect, we're creating a variable to contain the column names to use when we transform the data back. Again, if you want to see what's going on, use the code box to run the pipe up to this point (`teddy_sum %>% tidyr::gather(measure, value, -c(study_n, group)) %>% tidyr::unite(gp_measure, group, measure)`).
- `tidyr::spread(gp_measure, value)`: finally, we spread the data back out into separate columns, using **gp_measure** to define the column names and **value** to define the values placed within each column.

```r
teddy_wide <- teddy_sum %>%
  tidyr::gather(measure, value, -c(study_n, group)) %>%
  tidyr::unite(gp_measure, group, measure) %>%
  tidyr::spread(gp_measure, value)
teddy_wide
```

Run the full pipe to create this new tibble (which I've called `teddy_wide`). Execute the name of the tibble to look at it. The resulting tibble contains these variables:

- **study_n**: Factor that distinguishes the two studies in the data set. The first was based on a total *N* of 20 and the second was based on a total *N* of 200.
- **Box_ci_low**: lower limit of the confidence interval in the box group.
- **Box_ci_upp**: upper limit of the confidence interval in the box group.
- **Box_mean**: mean self-esteem in the box group.
- **Box_sd**: standard deviation of self-esteem in the box group.
- **`Teddy Bear_ci_low`**: lower limit of the confidence interval in the teddy group.
- **`Teddy Bear_ci_upp`**: upper limit of the confidence interval in the teddy group.
- **`Teddy Bear_mean`**: mean self-esteem in the teddy group.
- **`Teddy Bear_sd`**: standard deviation of self-esteem in the teddy group.

The variables for the teddy bear group have been named a bit clunkily (because we'd defined the group as 'Teddy Bear'). Let's tidy these names up using the `rename()` function:

```r
teddy_wide <- teddy_wide %>%
  rename(
    teddy_ci_low = `Teddy Bear_ci_low`,
    teddy_ci_upp = `Teddy Bear_ci_upp`,
    teddy_mean = `Teddy Bear_mean`,
    teddy_sd = `Teddy Bear_sd`
  )
```

This code recreates the `teddy_wide` tibble from itself but creates a variable **teddy_ci_low** from the variable currently called **`Teddy Bear_ci_low`**, and so on. In effect I'm renaming the 'Teddy Bear' variables to omit spaces and remove upper case letters (and be shorter). Try this in the code box and inspect the resulting tibble.

```r
teddy_wide <- teddy_sum %>%
  tidyr::gather(measure, value, -c(study_n, group)) %>%
  tidyr::unite(gp_measure, group, measure) %>%
  tidyr::spread(gp_measure, value)

teddy_wide <- teddy_wide %>%
  rename(
    teddy_ci_low = `Teddy Bear_ci_low`,
    teddy_ci_upp = `Teddy Bear_ci_upp`,
    teddy_mean = `Teddy Bear_mean`,
    teddy_sd = `Teddy Bear_sd`
  )
teddy_wide
```

Well done if you're still with me! Let's now do the (relatively) easy bit. Now we have the group means in columns, it's relatively straightforward to use `mutate()` to add a column that contains *d*:

```r
teddy_wide <- teddy_wide %>%
  dplyr::mutate(
    d = (teddy_mean - Box_mean)/Box_sd
  )
```

This code re-creates the `teddy_wide` tibble from itself, but then uses `mutate()` to add a variable called **d**, which is defined as `(teddy_mean - Box_mean)/Box_sd`. That is, it subtracts the scores in the column **Box_mean** from the scores in the column **teddy_mean** and divides the result by the scores in the column **Box_sd**. In other words, it takes the difference between the group means and divides it by the standard deviation of the control group! Try this below and then view the resulting tibble to see the values of *d*.

```r
teddy_wide <- teddy_sum %>%
  tidyr::gather(measure, value, -c(study_n, group)) %>%
  tidyr::unite(gp_measure, group, measure) %>%
  tidyr::spread(gp_measure, value)
teddy_wide <- teddy_wide %>%
  rename(
    teddy_ci_low = `Teddy Bear_ci_low`,
    teddy_ci_upp = `Teddy Bear_ci_upp`,
    teddy_mean = `Teddy Bear_mean`,
    teddy_sd = `Teddy Bear_sd`
  )

teddy_wide <- teddy_wide %>%
  dplyr::mutate(
    d = (teddy_mean - Box_mean)/Box_sd
  )
teddy_wide
```

```r
question("Which of these statements about Cohen's *d* is **NOT** correct?",
         answer("The value of *d* cannot exceed 1.", correct = TRUE, message = "This statement is false and so is the correct answer."),
         answer("*d* is the difference between two means expressed in standard deviation units.", message = "This statement is true so is not the correct answer."),
         answer("A *d* of 0.2 would be considered small.", message = "This statement is true so is not the correct answer."),
         answer("*d* can be computed using a control group standard deviation, the standard deviation of all scores or a pooled standard deviation.", message = "This statement is true so is not the correct answer."),
         correct = "Correct - well done!",
         random_answer_order = TRUE,
         allow_retry = TRUE
)
```

The Bayes factor quantifies the probability of the data given the alternative hypothesis (in this case that self-esteem differs in the teddy and box groups) relative to the probability of the data given the null hypothesis (in this case that self-esteem is the same in the teddy and box groups):

$$ \text{Bayes factor} = \frac{p(\text{data}|\text{alternative})}{p(\text{data}|\text{null})} $$

The Bayesian *t*-test uses a default prior distribution based on a value, *r*, which scales the distribution [@RN9316]. The advantage of this approach is that it enables people to use a Bayesian method without needing to choose a prior from an almost endless set of possibilities. By constraining the decisions, some of the hard thinking is removed. The disadvantage is that defaults can tempt people to use them without any thought, and it’s tricky to understand what’s going on under the hood. To encourage you to give some thought to your prior, let’s look under the hood (just a bit). The prior distribution is constrained to be from a family known as the Cauchy distribution. Members of the Cauchy family look a bit like normal distributions, but they’re not. Figure 1 shows Cauchy distributions with different scale factors *r*: medium, *r* = 0.5 (left); wide, *r* = 0.7071 (middle); and ultrawide, *r* = 1 (right). Each curve has a shaded region with boundaries from −*r* to +*r*, and the region under the curve between these boundaries represents 50% of the area. Note that this region gets wider as *r* increases. When you set the value of *r* you are assigning a 50% probability that the effect size (Cohen’s *d*) lies between −*r* and +*r*. Given the distribution is symmetrical, you’re also assigning a 25% prior belief that *d* is greater than *r* and the remaining 25% that *d* is less than −*r*.

If you use the default value of *r* = 0.7071 you are, therefore, saying that your prior belief is that there’s a 50% probability that the effect size (Cohen’s *d*) lies between −0.7071 and +0.7071, a 25% probability that it is greater than +0.7071 and a 25% probability that it is less than −0.7071. Remembering what *d* represents, you’re saying that there’s a 50% probability that the difference between the two means lies between −0.7071 and +0.7071 standard deviations.
If you set *r* = 0.5 (Figure 1, left) the Cauchy distribution becomes narrower and you are similarly narrowing your prior beliefs to a 50% probability that the difference between the group means lies between −0.5 and +0.5 standard deviations. Setting *r* = 1 (Figure 1, right) widens the distribution, representing wider beliefs. You’d be placing a 50% probability that the difference between the group means lies between −1 and +1 standard deviations. Remembering that *d* = 0.8 is a large effect, you’d be assigning a 50% probability to the difference between means being somewhere between a very large effect in one direction and a very large effect in the opposite direction. This is too wide a belief for most situations (bearing in mind that another 50% of the probability is assigned to the effect being outside those limits).
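You can check the 50% claim yourself with base R's `pcauchy()` (the Cauchy cumulative distribution function): for a Cauchy distribution with scale *r*, the area between −*r* and +*r* is always exactly one half, whatever the value of *r*. A quick sketch:

```r
# Probability that a Cauchy(location = 0, scale = r) variable falls in [-r, r].
# The CDF at +r is 0.75 and at -r is 0.25, so this area is always 0.5.
prior_mass <- function(r) {
  pcauchy(r, location = 0, scale = r) - pcauchy(-r, location = 0, scale = r)
}

prior_mass(0.5)     # narrower prior: 0.5
prior_mass(0.7071)  # the default:    0.5
prior_mass(1)       # wider prior:    0.5
```

So changing *r* doesn't change the 50% probability itself, only how wide the interval is that the 50% is spread over.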

The `BayesFactor` package contains various functions for computing Bayes factors, and we'll use it at various points for other models. For now, we'll use the `ttestBF()` function, which is designed for situations where you want a Bayes factor that quantifies the difference between two means. It has the general form:

`ttestBF(formula = outcome ~ group_variable, mu = 0, paired = FALSE, data = NULL, rscale = "medium")`

It has some other arguments not listed above, but these are the key ones for our purposes:

`formula = outcome ~ group_variable`
:   This allows us to specify the outcome variable (in this case **self_esteem**) and the variable that defines the two groups (in this case **group**). This is basically the same as for the `cohen.d()` function.

`mu = 0`
:   For this example we can ignore this, but if you want to use the function on a one-sample or paired design then you can use it to specify the null value of the mean (or mean difference in a paired design). The default of 0 is usually what you want.

`paired = FALSE`
:   By default the function treats scores as coming from different entities (i.e. an independent design). That's the design our teddy bear experiments use. However, if you have a repeated-measures design (i.e. in our example, participants cuddle both a teddy and a box at different points in time) then set `paired = TRUE`.

`data = NULL`
:   Use this option to specify the tibble containing the data.

`rscale = "medium"`
:   This option specifies the prior. You can input a numeric value or use one of three pre-defined defaults: "medium", "wide" and "ultrawide", which correspond to values of sqrt(2)/2 (i.e. 0.7071), 1 and sqrt(2) respectively. I explained these priors earlier.
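To make those named defaults concrete, here is a small sketch (base R only) of the numeric value each one corresponds to, and the central interval of *d* to which each assigns a 50% prior probability:

```r
# Numeric scale values of the three named priors
rscales <- c(medium = sqrt(2) / 2, wide = 1, ultrawide = sqrt(2))

# Each prior places a 50% probability on d lying between -r and +r
data.frame(
  prior = names(rscales),
  r     = round(rscales, 4),
  lower = round(-rscales, 4),
  upper = round(rscales, 4)
)
# medium: r = 0.7071, wide: r = 1, ultrawide: r = 1.4142
```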

Like when we computed an effect size, we're going to use a pipe (`%>%`) to link:

`teddy_tib`
:   Our input will be the tibble of raw data.

`dplyr::filter(study_n == "Total N = 20")`
:   We're going to filter that tibble to extract the experiment based on a sample size of 20.

`ttestBF()`
:   The function described above will compute the Bayes factor.

teddy_tib %>% dplyr::filter(study_n == "Total N = 20") %>% BayesFactor::ttestBF(formula = self_esteem ~ group, data = .)

This command takes the `teddy_tib` tibble, extracts the study based on 20 participants, then applies the `ttestBF()` function. You'll get a warning about the data being coerced from a tibble to a data frame: the function works with data frames, so it converts your tibble to one. The warning is nothing to worry about.

Within this function we have left the defaults as they are (i.e. a 'medium' prior scale value of 0.7071), our formula specifies that we're predicting **self_esteem** from the variable **group** (which defines whether the individual hugs a teddy or a box), and we specify the data within the function with a period (`data = .`). By using a period we're telling the function to use the data coming through the pipe. Copy this command into the code box below and run it. Then edit it to get the Bayes factor for the experiment that had 200 participants.

teddy_tib %>% dplyr::filter(study_n == "Total N = 200") %>% BayesFactor::ttestBF(formula = self_esteem ~ group, data = .)

The resulting Bayes factor is 1.27, which means that the probability of the data given the alternative hypothesis is about the same as the probability of the data given the null. This value suggests that we should not change our prior beliefs: by using the default prior we assigned a 50% probability to the effect size (*d*) lying between −0.7071 and +0.7071, and this Bayes factor tells us not to shift that belief.

For the larger sample the Bayes factor is 754768.4, which is huge. The probability of the data given the alternative hypothesis is 754768.4 times greater than the probability of the data given the null. This value suggests that we should shift our prior beliefs towards the alternative hypothesis by a factor of 754768.4. In other words, having looked at the data we should hold a very much stronger belief that the means are different.
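If you want the Bayes factor as a plain number (e.g. to report it, or to express the evidence the other way around, for the null), the `BayesFactor` package provides `extractBF()`, and a Bayes factor object can be inverted with `1/`. A sketch, assuming `teddy_tib` is loaded as elsewhere in this tutorial:

```r
library(dplyr)
library(BayesFactor)

# Store the result instead of just printing it
bf_200 <- teddy_tib %>%
  dplyr::filter(study_n == "Total N = 200") %>%
  BayesFactor::ttestBF(formula = self_esteem ~ group, data = .)

# Pull out the numeric Bayes factor (alternative relative to null)
extractBF(bf_200)$bf

# The reciprocal quantifies evidence for the null relative to the alternative
1 / bf_200
```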

Remember that the effect sizes in the two studies were basically identical, so it may seem strange that the Bayes factors are so different. However, Bayesian methods update prior beliefs using the data, and a small amount of data will have less impact on beliefs than a huge sample (if the priors are the same). Imagine you're asked the probability that a sports team will win their next game. You guess a 33% chance that they'll win (you assume win, lose and draw are equally likely). Take two scenarios. In the first, you're told that this team won their last game. This might increase your belief in a win, but probably not by much - perhaps that game was a fluke. In the second, you're told that they have won their last 20 games. Your belief will probably shift quite strongly towards a win - they have consistently shown a winning mentality. In both cases you're being told that they have won 100% of their past games, but in the first case that is based on a tiny amount of data, whereas in the latter case it's based on a lot of data. This illustrates the principle that (other things being equal) larger data sets have a greater influence on beliefs.

question("What is a Bayes factor?", answer("The relative probability of the data given the alternative to the data given the null.", correct = TRUE), answer("The ratio of the probability of the alternative hypothesis given the data to the probability of the null hypothesis given the data.", message = "This statement describes the posterior odds."), answer("The ratio of the probability of the alternative hypothesis to the probability of the null hypothesis.", message = "This statement describes the prior odds."), answer("Tell me a terrible and possibly inappropriate joke about Bayesians.", message = "Q. Why do Bayesians make good proctologists? A. Because they're always looking at the posterior."), correct = "Correct - well done!", random_answer_order = TRUE, allow_retry = TRUE )

The tibble `zhang_female_tib` contains a small random subsample of females from a study that looked at performance on a maths test when the test is taken under the participant's own name or a fake name. There are two variables:

- **name**: Specifies whether participants completed the test under their own name or a fake name
- **accuracy**: Accuracy on the maths test (%)

In the code box, produce some code to:

- Produce an error bar plot of the two group means and their confidence intervals
- Compute Hedges' *g* for the difference between mean accuracy scores in the two groups
- Compute a Bayes factor comparing the two group means

zhang_plot <- ggplot2::ggplot(zhang_female_tib, aes(name, accuracy))
zhang_plot +
  stat_summary(fun.data = "mean_cl_normal", size = 1) +
  coord_cartesian(ylim = c(0, 100)) +
  scale_y_continuous(breaks = seq(0, 100, 10)) +
  labs(x = "Experimental condition", y = "Accuracy on maths test (%)") +
  theme_bw()

#To get Hedges' g:
zhang_female_tib %>%
  effsize::cohen.d(formula = accuracy ~ name, data = ., hedges.correction = TRUE)

#To get a Bayes factor:
zhang_female_tib %>%
  BayesFactor::ttestBF(formula = accuracy ~ name, data = .)

- The tutorials typically follow examples described in detail in @RN10163, so for most of them there's a thorough account in there. You might also find @RN4832 useful for the **R** stuff.
- There are free lectures and screencasts on my YouTube channel.
- There are free statistical resources on my website, www.discoveringstatistics.com

- Information on using ggplot2
- R for data science is the open-access version of the book by tidyverse creator Hadley Wickham [@RN11404]. It covers the *tidyverse* and data management.
- ModernDive is an open-access textbook on **R** and **RStudio**.
- RStudio cheat sheets
- RStudio list of online resources
- SwirlStats is a package for **R** that launches a bunch of interactive tutorials.
