Introduction

This vignette documents the use of the MANOVA.RM package for the analysis of semi-parametric repeated measures designs and multivariate data. The package consists of three parts - one for repeated measurements, one for multivariate data and one for the combination of both - which will be explained in detail below. All functions calculate the Wald-type statistic (WTS) as well as the ANOVA-type statistic (ATS) for repeated measures and a modification thereof (MATS) for multivariate data based on means. These test statistics can be used for arbitrary semi-parametric designs, even with unequal covariance matrices among groups and small sample sizes. For a short overview of the test statistics and corresponding references see the next section. Furthermore, different resampling approaches are provided in order to improve the small sample behavior of the test statistics. The WTS requires non-singular covariance matrices. If the covariance matrix is singular, a warning is returned.

For detailed explanations and examples, see also Friedrich, Konietschke and Pauly (2019).

Theoretical background

We consider the following model $$ X_{ik} = \mu_i + \varepsilon_{ik} $$ for observation vectors from group $i$ and individual $k$. Here, $\mu_i$ denotes the group mean which we wish to infer and $\varepsilon_{ik}$ are the common error terms with expectation 0 and existing (co-)variances. We do not need to assume normally distributed errors for these methods and the covariances may be unequal between groups. Null hypotheses are formulated via contrast matrices, i.e., as $H \mu = 0$, where the rows of $H$ sum to zero. Then the Wald-type statistic (WTS) is a quadratic form in the estimated mean vector involving the Moore-Penrose inverse of the (transformed) empirical covariance matrix. Under the null hypothesis the WTS is asymptotically $\chi^2$ distributed with rank(H) degrees of freedom. However, this approximation is only valid for large sample sizes, see e.g. Brunner (2001). We therefore recommend to use a resampling approach as proposed by Konietschke et al. (2015) or Friedrich et al. (2017).

Since the WTS requires non-singular covariance matrices, two other test statistics have been proposed: the ATS for repeated measures designs (Brunner, 2001) and the MATS for multivariate data (Friedrich and Pauly, 2018). The ATS is a scaled quadratic form in the mean vector, which is usually approximated by an F-distribution with estimated degrees of freedom. This approach is also implemented in the SAS PROC Mixed procedure. However, it is rather conservative for small sample sizes and does not provide an asymptotic level $\alpha$ test.

The MATS for multivariate data has the advantage of being invariant under scale transformations of the data, an important feature for multivariate data. Since its asymptotic distribution involves unknown parameters, we again propose to use bootstrap approaches instead (Friedrich and Pauly, 2018).

Difference between RM and MANOVA design

Note that a combination of both types of data, i.e., multivariate longitudinal data, can now also be analyzed with MANOVA.RM, see below.

The RM function

The RM function calculates the WTS and ATS in a repeated measures design with an arbitrary number of crossed whole-plot (between-subject) and sub-plot (within-subject) factors. The resampling methods provided are a permutation procedure, a parametric bootstrap approach and a wild bootstrap using Rademacher weights. The permutation procedure provides no valid approach for the ATS and is thus not implemented.

Data Example 1 (One between-subject and two within-subject factors)

For illustration purposes, we consider the data set o2cons, which is included in MANOVA.RM.

library(MANOVA.RM)
data(o2cons)

The data set contains measurements on the oxygen consumption of leukocytes in the presence and absence of inactivated staphylococci at three consecutive time points. More details on the study and the statistical model can be found in Friedrich et al. (2017). Due to the study design, both time and staphylococci are within-subject factors while the treatment (Verum vs. Placebo) is a between-subject factor.

head(o2cons)

We will now analyze this data using the RM function. The RM function takes as arguments:

model1 <- RM(O2 ~ Group * Staphylococci * Time, data = o2cons, 
             subject = "Subject", no.subf = 2, iter = 1000, 
             resampling = "Perm", seed = 1234)
summary(model1)

The output consists of four parts: model1$Descriptive gives an overview of the descriptive statistics: The number of observations, mean and confidence intervals (based on either quantiles of the t-distribution or the resampling-based quantiles) are displayed for each factor level combination. model1$WTS contains the results for the Wald-type test: The test statistic, degree of freedom and p-values based on the asymptotic (\chi^2) distribution are displayed. Note that the $\chi^2$ approximation is very liberal for small sample sizes. model1$ATS contains the corresponding results based on the ATS. This test statistic tends to rather conservative decisions in case of small sample sizes and is even asymptotically only an approximation, thus not providing an asymptotic level $\alpha$ test. Finally, model1$resampling contains the p-values based on the chosen resampling approach. For the ATS, the permutation procedure cannot be applied. Due to the above mentioned issues for small sample sizes, the resampling procedure is recommended.

Data Example 2 (Two within-subject and two between-subject factors)

We consider the data set EEG from the MANOVA.RM package: At the Department of Neurology, University Clinic of Salzburg, 160 patients were diagnosed with either Alzheimer's Disease (AD), mild cognitive impairments (MCI), or subjective cognitive complaints without clinically significant deficits (SCC), based on neuropsychological diagnostics (Bathke et al.(2018)). This data set contains z-scores for brain rate and Hjorth complexity, each measured at frontal, temporal and central electrode positions and averaged across hemispheres. In addition to standardization, complexity values were multiplied by -1 in order to make them more easily comparable to brain rate values: For brain rate we know that the values decrease with age and pathology, while Hjorth complexity values are known to increase with age and pathology. The three between-subject factors considered were sex (men vs. women), diagnosis (AD vs. MCI vs. SCC), and age ($< 70$ vs. $>= 70$ years). Additionally, the within-subject factors region (frontal, temporal, central) and feature (brain rate, complexity) structure the response vector.

data(EEG)
EEG_model <- RM(resp ~ sex * diagnosis * feature * region, 
                data = EEG, subject = "id", within = c("feature", "region"), 
                resampling = "WildBS",
                iter = 1000,  alpha = 0.01, seed = 987)
summary(EEG_model)

We find significant effects at level $\alpha = 0.01$ of the between-subject factors sex and diagnosis, while none of the within-subject factors or interactions become significant.

Plotting

The RM() function is equipped with a plotting option, displaying the calculated means along with the $(1-\alpha)$ confidence intervals. The plot function takes an RM object as an argument. Furthermore, additional graphical parameters can be used to customize the plots. The optional argument legendpos specifies the position of the legend in higher-way layouts, while the argument gap (default 0.1) specifies the distance between the error bars.

plot(model1, leg = FALSE)

For illustration purposes, we reduce the EEG-model above to a two-way design:

EEGnew <- EEG[EEG$region == "temporal", ]
EEG_model2 <- RM(resp ~ sex*feature, within = "feature", no.subf = 1, subject = "id", data = EEGnew)
plot(EEG_model2, legendpos = "topleft", col = c(4, 2))

The MANOVA function

The MANOVA function calculates the WTS for multivariate data in a design with crossed or nested factors. Additionally, a modified ANOVA-type statistic (MATS) is calculated which has the additional advantage of being applicable to designs involving singular covariance matrices and is invariant under scale transformations of the data, see Friedrich and Pauly (2018) for details. The resampling methods provided are a parametric bootstrap approach and a wild bootstrap using Rademacher weights. Note that only balanced nested designs (i.e., the same number of factor levels $b$ for each level of the factor $A$) with up to three factors are implemented. Designs involving both crossed and nested factors are not implemented. Data must be provided in long format (for wide format, see MANOVA.wide below).

Data Example MANOVA (two crossed factors)

We again consider the data set EEG from the MANOVA.RM package, but now we ignore the within-subject factors. Therefore, we are now in a multivariate setting with 6 measurements per patient and three crossed factors sex, age and diagnosis. Due to the small number of subjects in some groups (e.g., only 2 male patients aged $<$ 70 were diagnosed with AD) we restrict our analyses to two factors at a time. The analysis of this example is shown below.

The most important arguments of the MANOVA function are:

data(EEG)
EEG_MANOVA <- MANOVA(resp ~ sex * diagnosis, 
                     data = EEG, subject = "id", resampling = "paramBS", 
                     iter = 1000,  alpha = 0.01, seed = 987)
summary(EEG_MANOVA)

The output consists of several parts: First, some descriptive statistics of the data set are displayed, namely the sample size and mean for each factor level combination and each dimension. (Dimensions occur in the same order as in the original data set. For a labeled output, use MANOVA.wide().) In this example, Mean 1 to Mean 3 correspond to the brainrate (temporal, frontal, central) while Mean 4--6 correspond to complexity. Second, the results based on the WTS are displayed. For each factor, the test statistic, degree of freedom and p-value is given. For the MATS, only the value of the test statistic is given, since inference is here based on the resampling procedure only. The resampling-based p-values are finally displayed for both test statistics.

The MANOVA.wide function

The MANOVA.wide function is used for data provided in wide format, i.e., with one row per unit. Input and output are almost identical to the MANOVA function, except that no subject variable needs to be specified. The formula now consists of the matrix of outcome variables (bound together via cbind()) on the left hand side of the \~ operator and the factors of interest on the right. For an example we use a data set on producing plastic film from Krzanowski (1998, p. 381), see also summary.manova:

tear <- c(6.5, 6.2, 5.8, 6.5, 6.5, 6.9, 7.2, 6.9, 6.1, 6.3,
          6.7, 6.6, 7.2, 7.1, 6.8, 7.1, 7.0, 7.2, 7.5, 7.6)
gloss <- c(9.5, 9.9, 9.6, 9.6, 9.2, 9.1, 10.0, 9.9, 9.5, 9.4,
           9.1, 9.3, 8.3, 8.4, 8.5, 9.2, 8.8, 9.7, 10.1, 9.2)
opacity <- c(4.4, 6.4, 3.0, 4.1, 0.8, 5.7, 2.0, 3.9, 1.9, 5.7,
             2.8, 4.1, 3.8, 1.6, 3.4, 8.4, 5.2, 6.9, 2.7, 1.9)
rate     <- gl(2,10, labels = c("Low", "High"))
additive <- gl(2, 5, length = 20, labels = c("Low", "High"))

example <- data.frame(tear, gloss, opacity, rate, additive)
fit <- MANOVA.wide(cbind(tear, gloss, opacity) ~ rate * additive, data = example, iter = 1000)
summary(fit)

Confidence regions

A function for calculating and plotting of confidence regions is available for MANOVA objects. Details on the methods can be found in Friedrich and Pauly (2018).

Confidence regions

Confidence regions can be calculated using the conf.reg function. Note that confidence regions can only be plotted in designs with 2 dimensions. The conf.reg function takes as arguments:

As an example, we consider the data set water from the package HSAUR3. The data set contains measurements of mortality and drinking water hardness for 61 cities in England and Wales. Suppose we want to analyse whether these measurements differ between northern and southern towns. Since the data set is in wide format, we need to use the MANOVA.wide function.

if (requireNamespace("HSAUR3", quietly = TRUE)) {
library(HSAUR3)
data(water)
test <- MANOVA.wide(cbind(mortality, hardness) ~ location, data = water, iter = 1000, resampling = "paramBS", seed = 123)
summary(test)
cr <- conf.reg(test)
cr
plot(cr, col = 2, lty = 2, xlab = "Difference in mortality", ylab ="Difference in water hardness")
}

The output consists of the necessary parameters specifying the ellipsoid: the center, the eigenvectors which determine the axes of the ellipsoid as well as the scaling factors for the eigenvectors, which are calculated based on the eigenvalues, the bootstrap quantile and the total sample size. For more information on the construction of the confidence ellipses see Friedrich and Pauly (2018). For observations with two dimensions, the confidence ellipse can be plotted using the generic plot function. The usual plotting parameters can be used to customize the plots.

Post-hoc comparisons

Multivariate post-hoc comparisons and simultaneous confidence intervals for contrasts

Calculation of simultaneous confidence intervals and multivariate p-values for contrasts of the mean vector is based on the sum statistic, see Friedrich and Pauly (2018) for details. Note that the confidence intervals and p-values returned are simultaneous, i.e., they maintain the given alpha-level. Confidence intervals are calculated based on summary effects, i.e., averaging over all dimensions, whereas the returned p-values are multivariate. If the original model contains more than one factor, the corresponding contrasts are calculated for all combinations of factor levels. If this is not desired, the parameter interaction may be set to FALSE and the factor of interest specified. This has the same effect as fitting the model with only the respective factor and then calculating the post-hoc tests based on this model. The function simCI takes the following arguments:

As an example, we consider the EEG_MANOVA example from above:

# pairwise comparison using Tukey contrasts
simCI(EEG_MANOVA, contrast = "pairwise", type = "Tukey")

A one-way layout using MANOVA.wide():

oneway <- MANOVA.wide(cbind(brainrate_temporal, brainrate_central) ~ diagnosis, data = EEGwide, iter = 1000)
# and a user-defined contrast matrix
H <- as.matrix(cbind(rep(1, 5), -1*Matrix::Diagonal(5)))
# user-specified comparison
simCI(oneway, contrast = "user-defined", contmat = H)

Univariate comparisons

If the global null hypothesis is rejected, it may be of interest to infer the univariate outcomes that caused the rejection. To answer this question, one can simply calculate the univariate p-values and adjust them accordingly for multiple testing, e.g., using Bonferroni-correction. An example is given below. We consider a one-way layout of the EEG data with influencing factor sex and all 6 outcome variables:

model_sex <- MANOVA.wide(cbind(brainrate_temporal, brainrate_central, brainrate_frontal,
                            complexity_temporal, complexity_central, complexity_frontal) ~ sex, data = EEGwide, iter = 1000, seed = 987)
summary(model_sex)

Since the global hypothesis is rejected at 5% level, we continue with the univariate calculations:

EEG1 <- MANOVA.wide(brainrate_temporal ~ sex, data = EEGwide, iter = 1000, seed = 987)
EEG2 <- MANOVA.wide(brainrate_central ~ sex, data = EEGwide, iter = 1000, seed = 987)
EEG3 <- MANOVA.wide(brainrate_frontal ~ sex, data = EEGwide, iter = 1000, seed = 987)
EEG4 <- MANOVA.wide(complexity_temporal ~ sex, data = EEGwide, iter = 1000, seed = 987)
EEG5 <- MANOVA.wide(complexity_central ~ sex, data = EEGwide, iter = 1000, seed = 987)
EEG6 <- MANOVA.wide(complexity_frontal ~ sex, data = EEGwide, iter = 1000, seed = 987)

Adjust for multiple testing using the parametric bootstrap MATS and Bonferroni adjustment:

p.adjust(c(EEG1$resampling[, 2], EEG2$resampling[, 2], EEG3$resampling[, 2],
           EEG4$resampling[, 2], EEG5$resampling[, 2], EEG6$resampling[, 2]),
         method = "bonferroni")

This reveals that the central variables (comparison 2 and 5) do not contribute to the significant difference between male and female patients.

Nested Design

To create a data example for a nested design, we use the curdies data set from the GFD package and extend it by introducing an artificial second outcome variable. In this data set, the levels of the nested factor (site) are named uniquely, i.e., levels 1-3 of factor site belong to "WINTER", whereas levels 4-6 belong to "SUMMER". Therefore, nested.levels.unique must be set to TRUE. The code for the analysis using both wide and long format is presented below.

if (requireNamespace("GFD", quietly = TRUE)) {
library(GFD)
data(curdies)
set.seed(123)
curdies$dug2 <- curdies$dugesia + rnorm(36)

# first possibility: MANOVA.wide
fit1 <- MANOVA.wide(cbind(dugesia, dug2) ~ season + season:site, data = curdies, iter = 1000, nested.levels.unique = TRUE, seed = 123)

# second possibility: MANOVA (long format)
dug <- c(curdies$dugesia, curdies$dug2)
season <- rep(curdies$season, 2)
site <- rep(curdies$site, 2)
curd <- data.frame(dug, season, site, subject = rep(1:36, 2))

fit2 <- MANOVA(dug ~ season + season:site, data = curd, subject = "subject", nested.levels.unique = TRUE, seed = 123, iter = 1000)

# comparison of results
summary(fit1)
summary(fit2)
}

The multRM() function

The multRM() function provides a combination of the approaches described above. It is suitable for repeated measures designs, in which multiple outcomes have been recorded at each time point. The multRM() function takes as arguments:

As an example, we again use the EEG dataset. This time, imagine we have two outcomes (brainrate and complexity) measured for each of the three regions (within-subject factor). We additionally consider the between-subject factor sex. The tidyr package can be used to transform our original data to this format.

if (requireNamespace("tidyr", quietly = TRUE)) {
library(tidyr)
eeg <- spread(EEG, feature, resp)
head(eeg)
fit <- multRM(cbind(brainrate, complexity) ~ sex * region, data = eeg, subject = "id", within = "region", iter = 1000)
summary(fit)
}

The output is similar to that of MANOVA() described above.

References



smn74/MANOVA.RM documentation built on Aug. 30, 2023, 12:01 a.m.