knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%",
  fig.asp = 0.7,
  fig.width = 12,
  fig.align = "center",
  cache = FALSE,
  external = FALSE
)
library("ALASCA")
library("data.table")
library("ggplot2")
theme_set(theme_bw() + theme(legend.position = "bottom"))

Regression models in ALASCA

Please note: We do not comment on scaling in this vignette. For details, see our paper ALASCA: An R package for longitudinal and cross-sectional analysis of multivariate data by ASCA-based methods.

Observational data

Generating a data set

We will start by creating an artificial data set with 100 participants, 5 time points, 2 groups, and 20 variables. The variables follow four patterns.

The two groups are different at baseline, and one of the groups has larger changes throughout the study.

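The data-generating code is not rendered here; the following is a minimal sketch of how a comparable data set could be simulated with data.table. The patterns, effect sizes, and seed are assumptions for illustration, not the vignette's actual values.

# Sketch only: an assumed simulation, not the vignette's actual code.
# Each participant/variable combination gets a random intercept, group B
# starts at a different baseline, and group B changes more over time.
set.seed(1)
df <- CJ(
  id       = seq_len(100),                        # 100 participants
  time     = 1:5,                                 # 5 time points
  variable = paste0("variable_", seq_len(20))     # 20 variables
)
df[, group := fifelse(id <= 50, "A", "B")]
df[, random_intercept := rnorm(1), by = .(id, variable)]
df[, pattern := (as.integer(sub("variable_", "", variable)) - 1) %% 4 + 1]
df[, value := random_intercept +
      0.5 * (group == "B") +                      # baseline group difference
      fcase(
        pattern == 1,  0.5 * time,                # linear increase
        pattern == 2, -0.5 * time,                # linear decrease
        pattern == 3,  0.0 * time,                # flat
        pattern == 4,  0.5 * time - 0.1 * time^2  # peaking course
      ) * (1 + (group == "B")) +                  # larger changes in group B
      rnorm(.N, sd = 0.3)]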

Overall (ignoring the random effects), the four patterns look like this:

ggplot(df[variable %in% c("variable_1", "variable_2", "variable_3", "variable_4"),],
       aes(time, value, color = group)) +
  geom_smooth() +
  facet_wrap(~variable, scales = "free_y") +
  scale_color_viridis_d(end = 0.8)

We want time to be a categorical variable:

df[, time := paste0("t_", time)]

Common time and group development

With ALASCA, you can either assess how the two groups develop overall, or you can focus your analyses on how they diverge (see below). In R, you commonly specify your regression equation as value ~ v1*v2 + (1|v3), where value is the outcome, v1 and v2 are predictors, and v3 defines the random intercept. v1*v2 is shorthand for v1 + v1:v2 + v2, that is, the main effects of v1 and v2 plus their interaction v1:v2. You can, of course, also write v1 + v1:v2 + v2 explicitly; this becomes important when you do not want the main effect of v2 (see the randomized trial below).
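As a quick, ALASCA-independent check of that expansion, you can inspect the term labels of the fixed-effect part of the formula:

# `time*group` expands to the two main effects plus their interaction
attr(terms(value ~ time * group), "term.labels")
#> [1] "time"       "group"      "time:group"

Below, we fit a model with both main effects, their interaction, and a random intercept per participant: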

res <- ALASCA(
  df,
  value ~ time*group + (1|id)
)
plot(res, component = c(1,2), type = 'effect')

Accessing the regression coefficients and p-values

The regression coefficients can be fetched with:

res$regression_coefficients

P-values are not available when you use the Rfast package. If you choose to use the lme4 package instead, p-values will be estimated with the lmerTest package:

res <- ALASCA(
  df,
  value ~ time*group + (1|id),
  use_Rfast = FALSE
)
res$regression_coefficients

You can also specify a method to adjust for multiple testing (by default, unadjusted p-values are provided). Here, Benjamini-Hochberg is used (the alternatives are "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", and "none"):

res <- ALASCA(
  df,
  value ~ time*group + (1|id),
  use_Rfast = FALSE,
  p_adjust_method = "BH"
)
res$regression_coefficients

The p-values are then adjusted within each level of the predictor (e.g., all p-values belonging to time point 1 are adjusted together).
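For reference, this corresponds roughly to running p.adjust() within each term. The column names covar and p_value below are assumptions for illustration and may differ from the actual columns of res$regression_coefficients:

# Hypothetical illustration of term-wise adjustment; `covar` and `p_value`
# are assumed column names, check names(res$regression_coefficients).
coefs <- data.table(
  covar   = rep(c("time_t_2", "time_t_2:groupB"), each = 3),
  p_value = c(0.010, 0.040, 0.200, 0.030, 0.500, 0.002)
)
coefs[, p_adjusted := p.adjust(p_value, method = "BH"), by = covar]
coefs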

Separating time and group effects

Now, we would like to assess how the two groups diverge over time. To do this, we separate the time and group effects by setting separate_effects = TRUE.

res <- ALASCA(
  df,
  value ~ time*group + (1|id),
  separate_effects = TRUE
)

Importantly, we must plot the two effects separately. We begin with the time effect of the reference group:

plot(res, component = c(1,2), effect = 1, type = 'effect')

Next, we will look at the effect of group and how the two groups differ:

flip(res, effect = 2) # We flip the model to simplify the comparison with the previous model
plot(res, component = c(1,2), effect = 2, type = 'effect')

In short, we can see that the difference between the two groups increases for some variables (e.g., variable_13 and variable_8). However, the latter plot shows that some variables become more similar over time (e.g., variable_15). An easy way to confirm how a variable develops is to look at the marginal means. For example, variable_8, variable_13, and variable_15 can be assessed like this:

plot(res, type = 'prediction', variable = c("variable_8", "variable_13", "variable_15"))

See the vignette on plotting the model for more visualizations.

Randomized, interventional data

Generating a data set

We will start by creating an artificial data set with 100 participants, 5 time points, 2 groups, and 20 variables. The variables follow four patterns.

The two groups have a similar baseline, but one of the groups has a larger change in the values.

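Again, the data-generating code is not rendered here; the sketch below shows how a comparable data set could be simulated, with the same caveats as above (assumed patterns, effect sizes, and seed).

# Sketch only: an assumed simulation, not the vignette's actual code.
# Both groups share the same baseline; group B changes twice as much.
set.seed(2)
df <- CJ(
  id       = seq_len(100),
  time     = 1:5,
  variable = paste0("variable_", seq_len(20))
)
df[, group := fifelse(id <= 50, "A", "B")]
df[, random_intercept := rnorm(1), by = .(id, variable)]
df[, pattern := (as.integer(sub("variable_", "", variable)) - 1) %% 4 + 1]
df[, value := random_intercept +
      fcase(
        pattern == 1,  0.5 * time,
        pattern == 2, -0.5 * time,
        pattern == 3,  0.0 * time,
        pattern == 4,  0.5 * time - 0.1 * time^2
      ) * (1 + (group == "B")) +                  # larger changes in group B
      rnorm(.N, sd = 0.3)]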

Overall (ignoring the random effects), the four patterns look like this:

ggplot(df[variable %in% c("variable_1", "variable_2", "variable_3", "variable_4"),],
       aes(time, value, color = group)) +
  geom_smooth() +
  facet_wrap(~variable, scales = "free_y") +
  scale_color_viridis_d(end = 0.8)

We want time to be a categorical variable:

df[, time := paste0("t_", time)]

Regression model

A regression model that assumes equal baseline between the two groups can be defined as:

res <- ALASCA(
  df,
  value ~ time + time:group + (1|id),
  equal_baseline = TRUE
)

plot(res, component = c(1,2), type = 'effect')

Why is equal_baseline = TRUE necessary? It comes down to how R constructs the model matrices: by default, there is an interaction term between group and the first time point. Just compare these two model matrices:

res_2 <- ALASCA(
  df,
  value ~ time + time:group + (1|id),
  equal_baseline = FALSE
)

res$effect_list$model_matrix[[1]][1:3, ]
res_2$effect_list$model_matrix[[1]][1:3, ]
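The same behaviour can be reproduced outside ALASCA with base R's model.matrix(); the toy data below is purely illustrative:

# Toy illustration (not ALASCA code): with `time + time:group`, R codes the
# interaction against all time levels, including the first one, because the
# main effect of group is absent. equal_baseline = TRUE drops that first
# interaction column, forcing the groups to share a baseline.
toy <- expand.grid(
  time  = factor(paste0("t_", 1:3)),
  group = factor(c("A", "B"))
)
colnames(model.matrix(~ time + time:group, data = toy))
# Includes "timet_1:groupB", i.e. an interaction with the first time point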

