adventr: factorial designs

library(forcats)
library(learnr)
library(tidyverse)

library(BayesFactor)
library(car)
library(lm.beta)
library(robust)
library(sjstats)
library(WRS2)



knitr::opts_chunk$set(echo = FALSE, warning = FALSE)
tutorial_options(exercise.cap = "Exercise")

hint_text <- function(text, text_color = "#E69F00"){
  hint <- paste("<font color='", text_color, "'>", text, "</font>", sep = "")
  return(hint)
}

#Read data files needed for the tutorial

mem_cov_tib <- adventr::mem_cov_dat
alice_tib <- adventr::alice_dat

#setup objects for code blocks

An Adventure in R: Factorial designs (and adjusted means)

Overview

This tutorial is one of a series that accompanies An Adventure in Statistics [@RN10163] by me, Andy Field. These tutorials contain abridged sections from the book, so there are some copyright considerations, but I offer them under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.^[Basically you can use this tutorial for teaching and non-profit activities but do not meddle with it or claim it as your own work.]

Story précis

Why a précis?

Because these tutorials accompany my book An adventure in statistics, which uses a fictional narrative to teach the statistics, some of the examples might not make sense unless you know something about the story. For those of you who don't have the book I begin each tutorial with a précis of the story. If you're not interested then fair enough - click past this section.

General context for the story

It is the future. Zach, a rock musician, and Alice, a geneticist, have been together since high school and now live together in Elpis, the ‘City of Hope’.

Zach and Alice were born in the wake of the Reality Revolution which occurred after a Professor Milton Gray invented the Reality Prism – a transparent pyramid worn on the head – that brought honesty to the world. Propaganda and media spin became unsustainable, religions collapsed, advertising failed. Society could no longer be lied to. Everyone could know the truth about anything that they could look at. A gift, some said, to a previously self-interested, self-obsessed society in which the collective good had been eroded.

But also a curse. For, it soon became apparent that through this Reality Prism, people could no longer kid themselves about their own puffed-up selves as they could see what they were really like – by and large, pretty ordinary. And this caused mass depression. People lost faith in themselves. Artists abandoned their pursuits, believing they were untalented and worthless.

Zach and Alice have never worn a Reality Prism and have no concept of their limitations. They were born after the World Governance Agency (WGA) destroyed all Reality Prisms, along with many other pre-revolution technologies, with the aim of restoring community and well-being. However, this has not been straightforward and in this post-Prism world, society has split into pretty much two factions.

Everyone has a star, a limitless space on which to store their digital world.

Zach and Alice are Clocktarians. Their technology consists mainly of:

Main Protagonists

How Zach's adventure begins

Alice has been acting strangely, on edge for weeks, disconnected and uncommunicative, as if she is hiding something and Zach can’t get through to her. When Zach arrives home from band practice, she is, unusually, already home and listening to an old album that the two of them enjoyed together, back in a simpler, less complicated time in their relationship. During an increasingly testy evening that involves a discussion with the Head about whether or not a Proteus causes brain cancer, Alice is interrupted by an urgent call which she takes in private. She returns looking worried and is, once again, distracted. She tells Zach that she has ‘a big decision to make’. Before going to bed, Zach asks her if he can help with the decision but she says he ‘already has’, thanking him for making ‘everything easier.’ He has no idea what she means and goes to sleep, uneasy.

On waking, Zach senses that something is wrong. And he is right. Alice has disappeared. Her clothes, her possessions and every photo of them together have gone. He can’t get hold of any of her family or friends as their contact information is stored on her Proteus, not on his diePad. He manages to contact the Beimeni Centre but is told that no one by the name of Alice Nightingale has ever worked there. He logs into their constellation but her star has gone. He calls her but finds that her number never existed. She has, thinks Zach, been ‘wiped from the planet.’ He summons The Head but he can’t find her either. He tells Zach that there are three possibilities: Alice doesn’t want to be found, someone else doesn’t want her to be found, or she never existed.

Zach calls his friend Nick, fellow band member and fan of the WGA-installed Repositories, vast underground stores of actual film, books, art and music. Nick is a Chipper – solely for the purpose of promoting the band using memoryBank – and he puts the word out to their fans about Alice going missing.

Thinking as hard as he can, Zach recalls the lyrics of the song she’d been playing the previous evening. Maybe they are significant? Perhaps it is a farewell message and the Head is right. In searching for clues, he comes across a ‘memory stone’ which tells him to read what’s on it. File 1 is a research paper that Zach can’t fathom. It’s written in the ‘language of science’ and the Head offers to help Zach translate it, telling him that it looks like the results of her current work were ‘gonna blow the world’. Zach resolves to do ‘something sensible’ with the report.

Zach doesn’t want to believe that Alice has simply left him; rather, that someone has taken her and tried to erase her from the world. He decides to find her therapist, Dr Murali Genari, and get Alice’s file. As he breaks into the therapist’s office, Dr Genari comes up behind him and demands to know what he is doing. Genari is shaking, but not with rage – with fear of Zach. Dr Genari turns out to be friendly and invites Zach to talk to him. Together they explore the possibilities of where Alice might have gone and the likelihood, rating her relationship satisfaction, that she has left him. During their discussion Zach is interrupted by a message on his diePad from someone called Milton. Zach is baffled as to who he is and how he knows that he is currently discussing reverse scoring. Out of the corner of his eye, he spots a ginger cat jumping down from the window ledge outside. The counsellor has to go but suggests that Zach and ‘his new friend Milton’ could try and work things out.

Packages and data

Packages

This tutorial uses the following packages:

These packages are automatically loaded within this tutorial. If you are working outside of this tutorial (i.e. in RStudio) then you need to make sure that the package has been installed by executing install.packages("package_name"), where package_name is the name of the package. If the package is already installed, then you need to reference it in your current session by executing library(package_name), where package_name is the name of the package.
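For example, here is a minimal setup sketch if you are working in your own RStudio session. The package names follow the setup chunk at the top of this tutorial (forcats is installed as part of tidyverse, and learnr is only needed to run the tutorial itself):

# Install once (skip any packages you already have)
install.packages(c("BayesFactor", "car", "lm.beta", "robust", "sjstats", "tidyverse", "WRS2"))

# Load the packages for the current session
library(BayesFactor)
library(car)
library(lm.beta)
library(robust)
library(sjstats)
library(tidyverse)
library(WRS2)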

Data

This tutorial has the data files pre-loaded so you shouldn't need to do anything to access the data from within the tutorial. However, if you want to play around with what you have learnt in this tutorial outside of the tutorial environment (i.e. in a stand-alone RStudio session) you will need to download the data files and then read them into your R session. This tutorial uses the following files:

You can load the files in several ways (using the first file as an example):
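For instance, here is a minimal sketch of one approach using readr; the file name and location are assumptions, so substitute the path to wherever you saved the downloaded file:

library(tidyverse)

# Assumed file name/location for the first data file; adjust to your own setup
alice_tib <- readr::read_csv("alice_dat.csv")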

Adapt the above instructions to load the second data file into a tibble called mem_cov_tib.

Categorical variables

When working within the tutorial the data are already prepared for you. However, if you are trying to replicate the tutorial within R or RStudio then you need to explicitly convert categorical predictors to factors. The first data file, which you have loaded into alice_tib, has two categorical variables (picture and gene), which will be read in from the csv file as character variables. To convert these variables to factors we can use the as_factor() function from the forcats package. There are several ways to do this, but the most tidyverse way is to use mutate() from dplyr, which is a way of adding variables to a tibble or overwriting existing variables. We could execute:

library(tidyverse)
alice_tib <- alice_tib %>% 
  dplyr::mutate(
    picture = forcats::as_factor(picture),
    gene = forcats::as_factor(gene)
  )

This code recreates the alice_tib tibble from itself then uses mutate to recreate the variables picture and gene from themselves by placing each one in turn within the as_factor() function.

We can do the same for the second data file, which you have stored in mem_cov_tib. This tibble contains one categorical variable called group. We can convert this variable to a factor by adapting the code above:

library(tidyverse)

mem_cov_tib <- mem_cov_tib %>% 
  dplyr::mutate(
    group = forcats::as_factor(group)
  )

Exploring the data

The model

At the start of the story when Alice first disappears Zach finds a memory stone containing some of Alice's secret research. Unable to make sense of it, he keeps the stone and sets off on his mission to find her. At the climax of the book, Zach foolishly gives Vic Luten this stone. Luten takes Alice, against her will, to the epicentre of the JIG:SAW complex and throws her into a Schrödinger's prison. Zach and Milton follow. Vic brokers a deal: he will release Alice if Zach helps him to understand the results of Alice's secret research. Can Zach fathom it out and save Alice?

The experiment on Alice's memory stone relates to work she has been doing to see whether a genetic toggle switch can be used in conjunction with the chameleon gene to help people with injuries to regenerate their wounds. Participants who had facial burns were recruited. One week before testing, a gene, C–10XFMG (the so-called chameleon gene), was introduced into all participants. Two days prior to testing, a genetic toggle switch was introduced into half of the participants. Half of the participants were asked to look at a photograph of themselves from before their injury and to imagine the cells in their faces changing to become like the photo. The remainder acted as a control and were asked to look at a picture of a same-sex stranger and to try to change their face to become the person in the picture. All participants looked at the picture for 6 sessions of 20 minutes each. At the end of the sessions their faces were scanned into a computer and compared to the face in the photograph. Facial recognition software produced a precise resemblance measure as a percentage (100% = the participant's face is exactly like the face in the photograph, 0% = the person in the photo bears no resemblance at all to the participant).

The data are in the tibble alice_tib, which has four variables:

Use the code box to inspect the tibble.


alice_tib   

The model we're fitting is described by the following equation:

$$
\begin{aligned}
Y_i & = b_0 + b_1X_{1i} + b_2X_{2i} + b_3X_{1i}X_{2i} + ε_i\\
\text{resemblance}_i & = b_0 + b_1\text{gene}_i + b_2\text{picture}_i + b_3(\text{gene} \times \text{picture})_i + ε_i
\end{aligned}
$$

The variables gene and picture are made up of 0s and 1s that code the four groups across the two variables (see the book chapter for details).

Descriptive statistics

Using what you've learnt in previous tutorials, create a tibble called alice_summary containing the group means and their confidence intervals split by gene and picture.

r hint_text("Tip: to group means by more than one variable you can use group_by(x, y). If you're doing this outside of the tutorial remember to load the packages tidyverse and Hmisc")


alice_summary <- alice_tib %>%
  dplyr::group_by(gene, picture) %>%
  dplyr::summarize(
    mean = mean(resemblance),
    ci_low = ggplot2::mean_cl_normal(resemblance)$ymin,
    ci_upp = ggplot2::mean_cl_normal(resemblance)$ymax
)
alice_summary              

It would be useful to get some means for the main effects of gene and picture, so create two objects gene_main and pic_main that contain means and confidence intervals when the data are split by only gene (for gene_main) and picture (for pic_main).


gene_main <- alice_tib %>%
  dplyr::group_by(gene) %>%
  dplyr::summarize(
    mean = mean(resemblance),
    ci_low = ggplot2::mean_cl_normal(resemblance)$ymin,
    ci_upp = ggplot2::mean_cl_normal(resemblance)$ymax
)

pic_main <- alice_tib %>%
  dplyr::group_by(picture) %>%
  dplyr::summarize(
    mean = mean(resemblance),
    ci_low = ggplot2::mean_cl_normal(resemblance)$ymin,
    ci_upp = ggplot2::mean_cl_normal(resemblance)$ymax
)

gene_main  
pic_main

Plotting the data

Let's plot the data. Use the code box below to create an error bar chart with resemblance on the y-axis, gene on the x-axis and picture displayed in different colours. If you feel like it, try to superimpose the raw data.


alice_plot <- ggplot2::ggplot(alice_tib, aes(gene, resemblance, colour = picture))
alice_plot +
  stat_summary(fun.data = "mean_cl_normal", position = position_dodge(width = 0.5)) +
  scale_colour_manual(values = c("#E69F00", "#56B4E9")) + 
  coord_cartesian(ylim = c(0,100)) +
  scale_y_continuous(breaks = seq(0, 100, 10)) +
  labs(y = "Resemblance (%)", x = "Participant", colour = "Picture") + 
  theme_bw()

Several categorical predictors (factorial designs)

Fitting the model

To fit the model we use the lm() function, because we are fitting a linear model with two categorical predictors (and an interaction term). We've used this function before; to recap, it takes the following general form (I've retained only the key options):

lm(outcome ~ predictor(s), data, subset, na.action = na.fail)

We can specify the model itself in one of two ways. The first is to write out each predictor and the interaction term longhand. To specify an interaction term we use a colon to separate the terms in the interaction. So, the $gene \times picture$ interaction would be specified as gene:picture. Therefore, we could specify the model within lm() as:

resemblance ~ gene + picture + gene:picture

A quicker way, for models where we want to retain all main effects and lower-order interaction terms, is just to specify the highest-order interaction, separating the variable names with an asterisk. This has the effect of including that interaction and any lower-order effects. For example, if we specify:

resemblance ~ gene*picture

The resulting model will include the main effects of gene and picture as well as the interaction between them.

Using what I have just explained and what you know from previous tutorials do the following:

Fit the model in the code box below.


gen_mod <- lm(resemblance ~ gene*picture, data = alice_tib)
summary(gen_mod)
confint(gen_mod)
plot(gen_mod)

The output provides estimates of the model parameters (the b-values) and the significance of these values. R automatically turns the variables gene and picture into dummy variables. Left to its own devices R will order the categories alphabetically (so the order for gene will be C-Gene and C-Gene + Toggle and for picture will be Other and Self) but I have turned these variables into factors and in doing so changed the order of levels for picture to be Self and Other. It doesn't make any difference for these variables (because they have only two categories) but for predictors with more than two categories bear in mind that you might want to set specific orders for factor levels for the predictor variables to make the model parameters conform to meaningful group comparisons.

The Y intercept ($b_0$) is 51.50, which is the value of resemblance when all three predictors are zero. That is when gene is zero (i.e. C-Gene) and picture is zero (i.e. Self) and the interaction is also zero. So, ($b_0$) is 51.50 because that is the mean of the group that had C-gene therapy only and studied photos of their own face. The predictors are dummy variables, and their respective bs reflect the differences between means that the dummy variables code.

$$
\begin{aligned}
b_3 &= \big(\bar{X}_\text{Toggle, other} - \bar{X}_\text{C-gene, other}\big) - \big(\bar{X}_\text{Toggle, self} - \bar{X}_\text{C-gene, self}\big)\\
&= \big(85.25-8.63\big)-\big(86.13-51.50\big) \\
&= 42
\end{aligned}
$$
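If you want to check that difference of differences yourself, here is a minimal sketch using the cell means reported above (the same means appear in alice_summary, which we computed earlier):

# Cell means reported in the text
cgene_self   <- 51.50
cgene_other  <- 8.63
toggle_self  <- 86.13
toggle_other <- 85.25

# b_3 is the difference between the picture effects in the two gene conditions
(toggle_other - cgene_other) - (toggle_self - cgene_self)  # approximately 42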

Using F-statistics to evaluate effects

It is common to evaluate the effects in factorial models using an F-statistic (the so-called analysis of variance, or ANOVA). We can get these F-statistics in two ways. The first is to place the model we have just created into the aov() function and then summarize it. For example, we could execute:

gen_aov <- aov(gen_mod)
summary(gen_aov)

The first line takes the model gen_mod that we just created and extracts the information needed to present a summary in terms of F-statistics. We store this information in a new object called gen_aov. The second line uses summary() to print the summary table of F-values. This method will give us Type I sums of squares (refer to the lecture). These sums of squares are sequential and are affected by the order in which we place variables into the model. That's why software like SAS and SPSS uses Type III sums of squares. The fact that our data are balanced (equal numbers of participants in all conditions) and orthogonal (all of our predictors are binary) means that for these data the Type I sums of squares are the same as the Type III. However, if you have unbalanced designs or predictors with more than two categories you need to look at Type III sums of squares.

r hint_text("Warning: the following section is quite tricky", "#DF4738")

To get these we use the Anova() function from the car package. It's a complicated process because for Type III sums of squares to make sense, the contrasts for predictor variables need to be independent (or, to use the posh term, orthogonal). Orthogonal means that the numbers used to code groups must sum to zero and their cross-products must also sum to zero. By default, R will use dummy coding (0s and 1s), which results in non-orthogonal contrasts. You can tell that the contrasts are not orthogonal by the fact that the codes do not sum to zero, and when you cross-multiply them that sum is also not zero. Table 1 shows what I mean. The codes for the variable gene are 0 for c-gene and 1 for toggle switch. So, the two c-gene groups get a code of 0 and the two toggle groups get a code of 1. Sum these codes and you get 0 + 0 + 1 + 1 = 2. This sum needs to be zero. Similarly, for picture, other is coded as 1 and self as 0. So the two 'other' groups get a 1 and the two 'self' groups get a 0. The resulting codes again do not sum to zero: 1 + 0 + 1 + 0 = 2. The final column shows the codes for gene multiplied by the corresponding code for picture. For example, for the c-gene and other condition we get $0 \times 1 = 0$, for the c-gene and self condition we get $0 \times 0 = 0$ and so on. If we add these products we get a sum of 1. Again, we need this sum to be zero.

Group <-  c("C-Gene, Other", "C-Gene, Self", "Toggle, Other", "Toggle, Self", "Sum")
gene_codes <- c(0, 0, 1, 1, 2)
picture_codes <- c(1, 0, 1, 0, 2)
codes_multiplied <- c(0, 0, 1, 0, 1)

knitr::kable(tibble(Group, gene_codes, picture_codes, codes_multiplied), caption = "Table 1: default codes for the two predictors")

Table 2 shows a different set of codes that result in orthogonal contrasts. Instead of using 0 and 1, we use -1 and 1 (this is known as effect coding). All I have done is replace all of the zero codes with -1. The codes for the variable gene are -1 for c-gene and 1 for toggle switch. So, the two c-gene groups get a code of -1 and the two toggle groups get a code of 1. Sum these codes and you get -1 + -1 + 1 + 1 = 0. Similarly, for picture, other is coded as 1 and self as -1. So the two 'other' groups get a 1 and the two 'self' groups get a -1. The resulting codes again sum to zero: 1 - 1 + 1 - 1 = 0. The final column shows the codes for gene multiplied by the corresponding code for picture. For example, for the c-gene and other condition we get $-1 \times 1 = -1$, for the c-gene and self condition we get $-1 \times -1 = 1$ and so on. If we add these products we get a sum of -1 + 1 + 1 - 1 = 0. So the codes for each predictor sum to zero, as do their cross-products. These are, therefore, orthogonal contrast codes.

Group <-  c("C-Gene, Other", "C-Gene, Self", "Toggle, Other", "Toggle, Self", "Sum")
gene_codes <- c(-1, -1, 1, 1, 0)
picture_codes <- c(1, -1, 1, -1, 0)
codes_multiplied <- c(-1, 1, 1, -1, 0)

knitr::kable(tibble(Group, gene_codes, picture_codes, codes_multiplied), caption = "Table 2: Codes for the two predictors resulting in orthogonal contrasts")
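If you want to convince yourself that these effect codes are orthogonal, here is a quick sketch (the codes are those from Table 2, without the 'Sum' row that the tables above include for display):

gene_effect_codes    <- c(-1, -1, 1, 1)
picture_effect_codes <- c(1, -1, 1, -1)

sum(gene_effect_codes)                          # 0: gene codes sum to zero
sum(picture_effect_codes)                       # 0: picture codes sum to zero
sum(gene_effect_codes * picture_effect_codes)   # 0: cross-products sum to zero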

So, to get Type III sums of squares we do the following:

contrasts(alice_tib$picture) <- c(1, -1)
contrasts(alice_tib$gene) <- c(1, -1)
gen_mod2 <- lm(resemblance ~ gene*picture, data = alice_tib)
car::Anova(gen_mod2, type = 3)
car::Anova(gen_mod2, type = 3, white.adjust = "hc3")

The results show significant main effects and interactions for all variables. Using the means we generated at the start of the tutorial we can say that resemblance scores are significantly higher for the C-gene and Toggle group than those that had only C-gene therapy. Resemblance scores were significantly higher for those studying their own face than someone else's face. There was also a significant interaction: in the group that had the toggle switch resemblance scores are similar when they studied their own face compared to studying a stranger's face, but in the c-gene group resemblance scores are much lower when studying a stranger's face compared to their own.

Use the code box to generate F-statistics for this model based on Type III sums of squares and with a white adjustment ("hc3").


contrasts(alice_tib$picture) <- c(1, -1)
contrasts(alice_tib$gene) <- c(1, -1)
gen_mod2 <- lm(resemblance ~ gene*picture, data = alice_tib)
car::Anova(gen_mod2, type = 3)
car::Anova(gen_mod2, type = 3, white.adjust = "hc3")

Robust models

Robust standard errors

As in previous tutorials, we can use one of the methods (HC0 to HC5) for producing standard errors (and hence confidence intervals) that are robust to heteroscedasticity.

r hint_text("Tip: if you're doing this outside of the tutorial you need the sjstats package installed and you need to execute library(sjstats).")

Remember that we place the model we created with lm() (in this case gen_mod) into the function robust(). The HC3 and HC4 options refer to different methods of correcting the standard errors (refer back to the tutorial adventr_14). Try these commands in the code box below and compare the results:

gen_mod <- lm(resemblance ~ gene*picture, data = alice_tib)

sjstats::robust(gen_mod, conf.int = T)
sjstats::robust(gen_mod, vcov.type = "HC4", conf.int = T)

Robust Fs

We saw in the previous section that if we evaluate effects with an F-statistic we can correct for heteroscedasticity by including white.adjust = "hc3" in the car::Anova() function.

Robust estimates

In adventr_14 we used the lmRob() function from the robust package (again, if you're working outside of the tutorial remember to execute library(robust)) to get robust parameter estimates. Recall that the function is used in the same way as lm(), so you can copy your earlier code and replace lm with lmRob. Try this in the code box to create a model called gen_rob, and use summary() to view it.


gen_rob <- robust::lmRob(resemblance ~ gene*picture, data = alice_tib)
summary(gen_rob)   

Note that the parameter estimates have changed because they no longer represent differences between arithmetic means but instead robust estimates of those differences.

The WRS2 package

The WRS2 package [@RN10205] wraps up a few of the many functions described by Wilcox to perform robust variants of tests [@RN5244]. We'll look at these functions that compare several means:

These functions have similar arguments:

See if you can run the commands in the code box below based on my description (if not, take a look at the solution).


WRS2::t2way(resemblance ~ gene*picture, data = alice_tib)
WRS2::mcp2atm(resemblance ~ gene*picture, data = alice_tib)
WRS2::pbad2way(resemblance ~ gene*picture, data = alice_tib, nboot = 1000)
WRS2::mcp2a(resemblance ~ gene*picture, data = alice_tib, nboot = 1000)

Because our predictors are both made up of only two categories, there isn't really any need for post hoc tests (because the main effects can only reflect the difference between the two groups). Based on t2way() we'd conclude that there was a significant main effect of gene, Q = 451.36, p = 0.001, a significant main effect of picture, Q = 65.50, p = 0.001, and a significant interaction between the two, Q = 61.47, p = 0.001. pbad2way() returns only p-values for each effect, and these are all 0, which would again suggest all effects in the model are significant.

Bayesian models

Bayes factors

Like in previous tutorials (adventr_11, and adventr_14 to adventr_16) we can obtain Bayes factors using the BayesFactor package. We are going to use the lmBF() function, which is basically the same as the regressionBF() and anovaBF() functions that we have encountered before. The difference is that with lmBF we can build individual models and compare them. This function has this format:

object = lmBF(formula = outcome ~ group_variable, data = tibble, rscaleCont = "medium")

The function uses default priors and by default (i.e. if you exclude the option altogether) the prior is set at "medium" ($\sqrt{2}/4$), but you can specify rscaleCont = "wide" ($1/2$) or rscaleCont = "ultrawide" ($\sqrt{2}/2$). These priors are explained in adventr_11; see also @RN9309.

We're going to build three models: (1) gene as the only predictor; (2) a model that adds picture as a predictor; (3) a model that adds the interaction term (gene:picture). Having created these models we will compare them. So, we'll start by looking at the Bayes factor for gene as the sole predictor, then get the Bayes factor for what picture adds to the model, then finally get the Bayes factor for what the interaction term adds to the model.

r hint_text("Tip: if you are using a tibble (rather than a data frame) then place the tibble's name into the function data.frame(), to convert it to a data frame. This step is necessary because (at the time of writing) the BayesFactor package doesn't accept tibbles.")

For our data (using the default prior) we could execute the following to create our three models:

gene_bf <- alice_tib %>% 
  BayesFactor::lmBF(formula = resemblance ~ gene, data = .)
pic_bf <- alice_tib %>% 
  BayesFactor::lmBF(formula = resemblance ~ gene + picture, data = .)
int_bf <- alice_tib %>% 
  BayesFactor::lmBF(formula = resemblance ~ gene + picture + gene:picture, data = .)

The first line creates model 1 and stores it in the object gene_bf, the second line creates model 2 and stores it in the object pic_bf, and the third line creates model 3 and stores it in the object int_bf. Having created them we can compare them with the following commands:

gene_bf
pic_bf/gene_bf
int_bf/pic_bf

The first line gives us the Bayes factor for model 1, the second shows us the Bayes factor for model 2 relative to model 1 (i.e. what does picture add to the model), and the third line shows us the Bayes factor for model 3 relative to model 2 (i.e. what does the gene:picture interaction add above and beyond the main effects?)

Try these commands in the code box:


gene_bf <- alice_tib %>% 
  BayesFactor::lmBF(formula = resemblance ~ gene, data = .)
pic_bf <- alice_tib %>% 
  BayesFactor::lmBF(formula = resemblance ~ gene + picture, data = .)
int_bf <- alice_tib %>% 
  BayesFactor::lmBF(formula = resemblance ~ gene + picture + gene:picture, data = .)

gene_bf
pic_bf/gene_bf
int_bf/pic_bf

Because the results are based on sampling (and so will vary each time you generate them) Figure 1 shows a particular set of results that I'll interpret. First we put the main effect of gene therapy into the model and see that the Bayes factor is 16929560, which is huge. You should shift your belief in the alternative hypothesis relative to the null by a factor of 16929560. In short, your belief in gene therapy predicting resemblance scores should shift a lot. We then added the main effect of picture to see how this improved the model compared to when we included only gene therapy as a predictor. The Bayes factor for this main effect is 237.49, which also suggests that your beliefs should move a lot in favour of the type of picture predicting resemblance scores. Finally, we added the interaction effect and compared this full model to the model that included only the main effects. The Bayes factor for the interaction term is 174996.6. Again, this number is huge, suggesting that you should shift your belief in the alternative hypothesis relative to the null by a factor of 174996.6. In short, your belief in the combined effect of gene therapy and the type of picture studied predicting resemblance scores should shift a lot.

Figure 1: Bayes factors output

Adjusting means (ANCOVA)

A familiar example (unless your ID chip has been zapped)

In the lecture (but not the book) we also looked at models including categorical and continuous predictors. In these models, the parameters for the categorical predictors no longer compare the raw means of the categories, but the means adjusted for the other variables in the model. This is the basic 'analysis of covariance (ANCOVA)' model. In the lecture we imagined an experiment similar to the one we encountered in the previous tutorial, but with an added covariate. Just to recap:

During Zach's visit to the JIG:SAW complex he goes into several research buildings. In the second, he discovers experiments being conducted related to memory. In one experiment thirty-six participants with ID chips were tested. All participants met a stranger who had a 5-minute scripted conversation with them containing 10 critical pieces of information. A week later, the participants were asked to recall the encounter for 5 minutes, and then after a 10-minute break wrote everything that they could remember about the original encounter. During the 5-minute initial recall, one of three things happened: the control group (N = 12) received no intervention; the erase group (N = 12) had a pulse of electricity sent to their brain via their ID chip; and the replace group (N = 12) had conflicting verbal descriptions sent to their ID chip. The outcome was how many of the 10 critical pieces of information the participants wrote down after the recall phase. The prediction was that recall would be worse (i.e. lower scores) for participants who had an electrical pulse or conflicting information sent to their ID chips during recall than for those in the control group, who experienced no interference. Imagine that (unlike in the book) they had also measured working memory in each participant (as their digit span). The data are in the tibble mem_cov_tib, which has these variables:

Use the code box to inspect the tibble.


mem_cov_tib   

The model we're fitting is described by the following equation:

$$
\begin{aligned}
Y_i & = b_0 + b_1X_{1i} + b_2X_{2i} + b_3X_{3i} + ε_i\\
\text{recall}_i & = b_0 + b_1\text{dummy 1}_i + b_2\text{dummy 2}_i + b_3\text{covariate}_i + ε_i \\
& = b_0 + b_1\text{erase vs. control}_i + b_2\text{replace vs. control}_i + b_3\text{working memory}_i + ε_i
\end{aligned}
$$

Fitting the model

We can fit this model in the same way as described for a factorial design. Remember that R will dummy code the variable group for us, but we'd want to set some orthogonal contrasts. We did this in the previous tutorial using these commands:

interference_vs_none <- c(-2, 1, 1)
erase_vs_replace <- c(0, -1, 1)
contrasts(mem_cov_tib$group) <- cbind(interference_vs_none, erase_vs_replace)

Having done that we can follow the same process as the previous example for fitting the model (using lm()), getting F-statistics (using Anova()), fitting robust models (using lmRob() or robust()) and getting Bayes factors (using lmBF()). Either in a separate RStudio session or in the box below generate a script to do the following:

If you want help, take a look at the solution below.


interference_vs_none <- c(-2, 1, 1)
erase_vs_replace <- c(0, -1, 1)
contrasts(mem_cov_tib$group) <- cbind(interference_vs_none, erase_vs_replace)

mem_cov_mod <- lm(recall ~ working_mem + group, data = mem_cov_tib)
summary(mem_cov_mod)

car::Anova(mem_cov_mod, type = 3)
sjstats::robust(mem_cov_mod)
mem_cov_rob <- robust::lmRob(recall ~ working_mem + group, data = mem_cov_tib)
summary(mem_cov_rob)

wm_bf <- mem_cov_tib %>% 
  BayesFactor::lmBF(formula = recall ~ working_mem, data = .)
grp_bf <- mem_cov_tib %>% 
  BayesFactor::lmBF(formula = recall ~ working_mem + group, data = .)

wm_bf
grp_bf/wm_bf
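The output above gives the model parameters and F-statistics, but not the adjusted means themselves. If you want those, here is a minimal sketch using base R's predict() to get each group's predicted recall at the mean level of working memory (one common way to define an adjusted mean); the object and variable names come from the code above, and packages such as emmeans offer more general tools:

# Predicted (adjusted) group means: predicted recall for each group at the
# sample mean of working memory
adjusted <- data.frame(
  group = levels(mem_cov_tib$group),
  working_mem = mean(mem_cov_tib$working_mem)
)
adjusted$adjusted_mean <- predict(mem_cov_mod, newdata = adjusted)
adjusted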

Other resources

Statistics

R

References


