# adventr: repeated-measures designs

In adventr: Interactive R Tutorials to Accompany Field (2016), *An Adventure in Statistics*

```r
library(forcats)
library(learnr)
library(tidyverse)
library(BayesFactor)
library(nlme)
library(WRS2)
```

### Plotting the data

Use the code box below to create an error bar chart with face_type on the x-axis and recognition on the y-axis. If you feel like it, try to superimpose the raw data. (For this last part you'll use the geom_point() function, but add position = position_jitter(width = 0.1, height = 0) into the function. This adds what's known as a 'jitter' to the points, which just means that a small random value is added so that the dots don't overlap. Setting width to 0.1 limits the horizontal jitter, and setting height to 0 disables the vertical jitter so that the values on the y-axis are true to the actual scores.) Remember to use the 'tidy' version of the data.

```r
calcite_plot <- ggplot2::ggplot(calcite_tidy, aes(face_type, recognition))
calcite_plot +
  geom_point(colour = "#E69F00", position = position_jitter(width = 0.1, height = 0)) +
  stat_summary(fun.data = "mean_cl_normal", geom = "pointrange") +
  labs(x = "Experimental condition", y = "Recognition (out of 20)") +
  coord_cartesian(ylim = c(0, 20)) +
  scale_y_continuous(breaks = 0:20) +
  scale_x_discrete(labels = c("Actual face", "Different face")) + # adds more descriptive labels to the x-axis
  theme_bw()
```

## Fitting the model

It looks like recognition scores are lower in the different-face condition. To fit the model we use the lme() function from the nlme package, because we are fitting a linear model with a categorical predictor but we also want to model the dependency between scores. This function is explained in the tutorial adventr_mlm; to recap, it takes the following general form (I've retained only the key options):

```r
new_model <- lme(outcome ~ predictors, random = formula, data = tibble, method = "REML", na.action = na.fail)
```

• new_model: an object created that contains information about the model. We can get summary statistics for this model by executing the name of the model.
• outcome: the variable that contains the scores for the outcome measure (in this case recognition).
• predictor: the variable that contains information about which condition a score belongs to (in this case face_type).
• random: defines the random parts of the model. This takes a formula in the style ~ predictor|context. In this case our context is participants, which is represented by the variable id. If we wanted only to let intercepts vary by participant we could use ~ 1|id, but in fact we also want to let the effect of face_type vary by participant, so we use ~ face_type|id.
• tibble: the name of the tibble containing the data (in this case calcite_tidy).
• method: defines which estimation method is used. By default restricted maximum likelihood (REML) is used, but if you want to compare models you should override the default and use maximum likelihood (method = "ML").
• na.action: if you have complete data (as we have here) exclude this option, but if you have missing values (i.e., NAs in the data frame) then by default the model will fail, so include na.action = na.exclude to exclude cases with missing values.

We could, therefore, specify the model as:

```r
calcite_rs <- nlme::lme(recognition ~ face_type, random = ~ face_type|id, data = calcite_tidy, method = "ML")
summary(calcite_rs)
intervals(calcite_rs, which = "fixed")
```

The first line creates an object called calcite_rs, which predicts recognition scores from the effect of face_type but with intercepts and slopes allowed to vary across participants (I appended _rs to the name to remind me it is a random-slopes model). The summary() function gives a summary of the model (including a t-statistic and p-value for the fixed effect of face_type) and intervals() gives 95% confidence intervals for the fixed effects (hence the which = "fixed"), which in this case are the intercept and the effect of face_type. Try executing these commands in the code box.
```r
calcite_rs <- nlme::lme(recognition ~ face_type, random = ~ face_type|id, data = calcite_tidy, method = "ML")
summary(calcite_rs)
intervals(calcite_rs, which = "fixed")
```

The output provides estimates of the model parameters (the b-values) and the significance of these values. The Y intercept ($b_0$) is 9.20, which is the value of recognition when face_type = 0. R will have coded the variable face_type using dummy coding. It does this alphabetically (unless you tell it otherwise), which means that it will have coded 'actual' as 0 and 'different' as 1 (because a comes before d in the alphabet). Note that 9.20 is the value of the mean in the actual condition that we calculated earlier.

```r
quiz(
  question("How would we interpret the *Value* (-2.35) for *face_typedifferent*? [Select **two** correct answers.]",
    answer("As the value of **face_type** changed from 0 (actual) to 1 (different), recognition scores decrease by 2.35.", correct = TRUE),
    answer("The difference between group means is -2.35.", correct = TRUE),
    answer("As the value of **face_type** changed from 0 (actual) to 1 (different), recognition scores decrease by 2.35 of a standard deviation.", message = "This describes the *standardized* b, not the *unstandardized* b."),
    answer("Group membership explains -2.35% of the variance in recognition scores.", message = "This is what $R^2$ tells us, and you can't have a negative percentage of variance explained!"),
    correct = "Correct - well done!",
    random_answer_order = TRUE,
    allow_retry = TRUE
  )
)
```

The confidence interval for the effect of face_type tells us that (assuming this is one of the 95% of samples that yields a confidence interval containing the population value) the true difference between recognition scores when a picture of the actual person's face was used compared to a different face lies between -4.596 and -0.104. This means it could be anything from a very small effect (practically zero) to a fairly large one (4 to 5 fewer faces recognised).
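If you want to see this default dummy coding for yourself, you can inspect the contrasts R builds for a two-level factor. This is a minimal base-R sketch: the factor below is a stand-in with the same level names as face_type, not the real calcite data.

```r
# R's default treatment (dummy) coding assigns 0 and 1 alphabetically:
# 'actual' becomes the baseline (0) and 'different' the comparison level (1)
face_type <- factor(c("actual", "different", "actual", "different"))
contrasts(face_type)
#>           different
#> actual            0
#> different         1
```

This is why the intercept equals the mean of the actual condition and the face_typedifferent parameter is the difference between conditions.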
The p-value suggests that this difference is significant at p = 0.046. In other words, we can significantly predict recognition scores from the type of face implanted while the participant met people with calcite paste on their face.

### A more traditional approach

We have just used a hierarchical linear model to compare two means using a t-statistic. You have likely been taught about t-tests as tests in their own right, and also taught that to compare dependent means you should use the paired (dependent) t-test. You can do the paired t-test in R using the t.test() function described in adventr_15:

```r
new_model <- t.test(outcome ~ predictor, data = tibble, paired = FALSE, var.equal = FALSE, conf.level = 0.95, na.action)
```

In which:

• new_model: an object created that contains information about the model. We can get summary statistics for this model by executing the name of the model.
• outcome: the variable that contains the scores for the outcome measure (in this case recognition).
• predictor: the variable that contains information about which group a score belongs to (in this case face_type).
• tibble: the name of the tibble containing the data (in this case calcite_tidy).
• paired: by default scores are treated as independent (paired = FALSE), but in this case we want to treat scores as dependent, so we'd change this to paired = TRUE.
• conf.level: determines the alpha level for the p-value and confidence intervals. By default it is 0.95 (for 95% confidence intervals) and usually you'd exclude this option, but if you want to use a different value, say 99%, you could include conf.level = 0.99.
• na.action: if you have complete data (as we have here) exclude this option, but if you have missing values (i.e., NAs in the data frame) it can be useful to include na.action = na.exclude, which will exclude all cases with missing values.

To get a t-test for the current data we would execute:

```r
t.test(recognition ~ face_type, data = calcite_tidy, paired = TRUE)
```

or, to create an object that we can use later:

```r
calcite_t <- t.test(recognition ~ face_type, data = calcite_tidy, paired = TRUE)
calcite_t
```

Try this in the code box:

```r
calcite_t <- t.test(recognition ~ face_type, data = calcite_tidy, paired = TRUE)
calcite_t
```

The value of t (2.1346) and the corresponding p (0.04604) are identical to those from the linear model that we have already fitted. However, for more complicated designs the linear model has many benefits.

## Robust models

The WRS2 package [@RN10205] wraps up a few of the many functions described by Wilcox to perform robust variants of tests [@RN5244]. We'll look at one function that compares two dependent means:

```r
yuend(x, y, tr = 0.2)
```

The arguments are:

• x: scores in the first condition.
• y: scores in the second condition.
• tr = 0.2: determines the level of trim. The default is a 20% trim, which has been shown to perform well, but you could change this to 0.1 (a 10% trim) or any value up to 0.5. You can omit this argument if you're happy with a 20% trim.

To specify x and y we need to extract them from our tibble. (I will describe one approach that is consistent with using tidy data, but there are others.)
We would start by creating separate objects for the scores in the actual and different conditions, using select() and filter(), then insert these into the yuend() function:

```r
actual <- calcite_tidy %>%
  dplyr::filter(face_type == "actual") %>%
  dplyr::select(recognition)
different <- calcite_tidy %>%
  dplyr::filter(face_type == "different") %>%
  dplyr::select(recognition)
WRS2::yuend(different, actual, tr = 0.2)
```

The first command takes the calcite_tidy tibble and filters it to retain only the scores for which the variable face_type has a value equal to "actual", then selects the variable recognition, which contains the recognition scores. This pipe basically extracts the 20 recognition scores from the actual condition and stores them in a tibble called actual. The second command does the same but extracts the 20 recognition scores from the different condition and stores them in a tibble called different. The third line applies yuend() to the two tibbles we have just created. (You can exclude tr = 0.2 because it is the default, but I have left it in to make explicit that we are performing a 20% trim.) Execute these commands in the code box:

```r
actual <- calcite_tidy %>%
  dplyr::filter(face_type == "actual") %>%
  dplyr::select(recognition)
different <- calcite_tidy %>%
  dplyr::filter(face_type == "different") %>%
  dplyr::select(recognition)
WRS2::yuend(different, actual, tr = 0.2)
```

Note that the trimmed mean difference is -2.75 (rather than the raw mean difference of -2.35). This difference is not significant because p = 0.087, and the confidence interval crosses zero, indicating that if this is one of the 95% of samples where the confidence interval contains the population value, then the population difference between means could be positive, negative or zero.
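If you want a feel for what a 20% trim does before interpreting yuend(), base R's mean() has a trim argument. This is a toy sketch with made-up scores (not the calcite data), chosen to include an outlier:

```r
# Toy scores with one outlier (20) to show the effect of trimming
scores <- c(1, 4, 5, 5, 6, 6, 7, 7, 8, 20)
mean(scores)              # ordinary mean: 6.9, dragged up by the outlier
mean(scores, trim = 0.2)  # 20% trimmed mean: 6; the lowest 20% of scores
                          # (1, 4) and highest 20% (8, 20) are discarded first
```

Trimming the extremes before averaging is what makes the trimmed mean robust to outliers and skew.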
The effect size is a robust variant of Cohen's d, so can be interpreted as implanting a different face producing recognition scores 0.52 of a standard deviation below those for the actual face.

## Bayesian models

### Bayes factors

As in adventr_15 we can use the ttestBF() function from the BayesFactor package. This function was explained in adventr_15, but with repeated-measures data we can't use formula input:

```r
object = ttestBF(x, y, data = tibble, paired = FALSE, rscale = "medium")
```

To recap, by default the function assumes independent scores (paired = FALSE), so for dependent scores we'd change this to paired = TRUE. The function uses default priors that can be specified as a number or as "medium" (the default), "wide", or "ultrawide". These labels correspond to r scale values of $\sqrt{2}/2$, 1, and $\sqrt{2}$. These priors were explained in adventr_11.

To specify x and y we can use the actual and different objects that we created for the yuend() function. ttestBF() expects only scores, not a tibble or data frame, and the objects actual and different are tibbles. In both objects the scores are stored in a variable called recognition, so we can specify the scores as different$recognition and actual$recognition. (Remember that $ means 'in', so different$recognition translates as 'the variable called recognition in the tibble called different'.) It's a good idea to save this model into a new object (let's call it calcite_bf) because you can do useful things with it (we didn't do this for the yuend() function, but you could). For our data (using the default prior) we could execute:

```r
calcite_bf <- BayesFactor::ttestBF(different$recognition, actual$recognition, paired = TRUE)
calcite_bf
```

The first line creates the object calcite_bf as described above and the second line prints it for us to see.
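If you ever need the numeric values behind the named priors, they are quick to compute in base R (this sketch just evaluates the r-scale values stated above):

```r
# Numeric r-scale values corresponding to the named priors in ttestBF()
c(medium = sqrt(2) / 2, wide = 1, ultrawide = sqrt(2))
#>    medium      wide ultrawide
#> 0.7071068 1.0000000 1.4142136
```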
Try this in the code box:

```r
actual <- calcite_tidy %>%
  dplyr::filter(face_type == "actual") %>%
  dplyr::select(recognition)
different <- calcite_tidy %>%
  dplyr::filter(face_type == "different") %>%
  dplyr::select(recognition)

calcite_bf <- BayesFactor::ttestBF(different$recognition, actual$recognition, paired = TRUE)
calcite_bf
```

The value matches that in the book (An Adventure in Statistics). The Bayes factor is 1.48, which means that the probability of the data given the alternative hypothesis is 1.48 times greater than the probability of the data given the null hypothesis. Put another way, you should shift your belief in the alternative hypothesis relative to the null by a factor of 1.48 (i.e., not by much at all). The result favours the alternative hypothesis over the null by an amount that is 'barely worth mentioning'.

### Bayesian parameter estimates

We can get estimates of the parameters in the model using the posterior() function that we used in adventr_15 to extract the same information. The process is identical to that in the previous tutorial: we place the object we just created (calcite_bf) into the function and specify a number of iterations, then use the summary() function to view what we have created:

```r
calcite_bf_est <- BayesFactor::posterior(calcite_bf, iterations = 10000)
summary(calcite_bf_est)
```

Try executing these two commands to view the estimates:

```r
actual <- calcite_tidy %>%
  dplyr::filter(face_type == "actual") %>%
  dplyr::select(recognition)
different <- calcite_tidy %>%
  dplyr::filter(face_type == "different") %>%
  dplyr::select(recognition)
calcite_bf <- BayesFactor::ttestBF(different$recognition, actual$recognition, paired = TRUE)

calcite_bf_est <- BayesFactor::posterior(calcite_bf, iterations = 10000)
summary(calcite_bf_est)
```


The output will differ each time you run this (because it is based on a sampling process) so Figure 2 shows my output. The Bayesian estimate, assuming that the alternative hypothesis is true, of the difference between means (mu) is -2.09. You can use the 2.5% and 97.5% quantiles as the limits of the 95% credible interval for that difference. Again, assuming the alternative hypothesis is true, there is a 95% probability that the difference between means is somewhere between -4.298 and 0.056. Remember that you cannot use a credible interval to test hypotheses because it is constructed assuming that the alternative hypothesis is true. It tells you the interval within which the effect will fall with a 95% probability, assuming that the effect exists.
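To make the link between quantiles and the credible interval concrete, here is a base-R sketch using simulated posterior draws. The normal distribution and its parameters here are made up for illustration; they are not the real posterior stored in calcite_bf_est.

```r
# Simulate 10,000 'posterior' draws centred on a difference of -2.09
set.seed(1)
posterior_mu <- rnorm(10000, mean = -2.09, sd = 1.1)
# The limits of the 95% credible interval are just the
# 2.5% and 97.5% quantiles of the posterior samples
quantile(posterior_mu, probs = c(0.025, 0.975))
```

The same logic applies to the quantiles reported by summary(calcite_bf_est): they summarise where the bulk of the posterior samples for each parameter fall.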

## Other resources

### Statistics

• The tutorials typically follow examples described in detail in @RN10163, so for most of them there's a thorough account in there. You might also find @RN4832 useful for the R stuff.
• There are free lectures and screencasts on my YouTube channel.
• There are free statistical resources on my website, www.discoveringstatistics.com.