knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%", 
  fig.asp = 0.7,
  fig.width = 12,
  fig.align = "center",
  cache = FALSE,
  external = FALSE
)
library("ALASCA")
library("data.table")
library("ggplot2")
theme_set(
  theme_bw() + theme(legend.position = "bottom")
  )

Using ALASCA for classification or prediction

We will use the same code to simulate data sets as in the Regression vignette. In brief, we generate a training and a test data set, and use ALASCA together with PLS-DA to classify participants by group.

Generate a data set

We will start by creating an artificial data set with 100 participants, 5 time points, 2 groups, and 20 variables. The variables follow four patterns.

The two groups differ at baseline, and one of the groups has larger changes throughout the study.

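A minimal sketch of such a simulation is shown below. The seed, effect sizes, and exact pattern shapes are assumptions for illustration, not the vignette's actual code.

```r
library(data.table)

set.seed(1)  # hypothetical seed
n_id <- 100; n_time <- 5; n_var <- 20

df <- CJ(id = seq_len(n_id), time = seq_len(n_time), variable = seq_len(n_var))
df[, group := ifelse(id <= n_id / 2, "Group 1", "Group 2")]

# Four repeating patterns across the 20 variables: linear increase,
# linear decrease, a mid-study peak, and no time effect
df[, pattern := (variable - 1) %% 4 + 1]
df[, trend := fcase(
  pattern == 1, as.numeric(time),
  pattern == 2, -as.numeric(time),
  pattern == 3, -(time - 3)^2,
  pattern == 4, 0
)]

# Group 2 differs at baseline and changes more over time
df[, value := trend * fifelse(group == "Group 2", 1.5, 1) +
     (group == "Group 2") + rnorm(.N, sd = 0.5)]
# Simple random intercept per participant
df[, value := value + rnorm(1), by = id]
df[, variable := paste0("variable_", variable)]
df[, c("pattern", "trend") := NULL]
```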

Overall (ignoring the random effects), the four patterns look like this:

ggplot(df[variable %in% c("variable_1", "variable_2", "variable_3", "variable_4"),],
       aes(time, value, color = group)) +
  geom_smooth() +
  facet_wrap(~variable, scales = "free_y") +
  scale_color_viridis_d(end = 0.8)

We want time to be a categorical variable:

df[, time := paste0("t_", time)]

Generate a second data set

We now generate a second data set using the same code as above. We will do classification on these data.


Subtract baseline

Later, we will do classification on the test data set. Since we would like to take individual differences into account, we create copies of the data sets and subtract each participant's baseline values.

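As a minimal, self-contained sketch of the baseline subtraction (using a toy table as a stand-in for the simulated data; the baseline label `t_1` is an assumption):

```r
library(data.table)

# Toy stand-in for the simulated data
dt <- data.table(
  id = rep(c("id_1", "id_2"), each = 3),
  variable = "variable_1",
  time = rep(c("t_1", "t_2", "t_3"), 2),
  value = c(1, 2, 4, 2, 2, 5)
)

# Copy the data and subtract each participant's baseline value
# (here assumed to be time point "t_1") per variable
dt_baseline <- copy(dt)
dt_baseline[, value := value - value[time == "t_1"], by = .(id, variable)]
```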

Run ALASCA and calculate scores

We now use the first data set to create an ALASCA model.

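A sketch of the model call is given below; the formula — time, group, their interaction, and a random intercept per participant — is an assumption about what the vignette fits, not its actual code.

```r
library(ALASCA)

# Hypothetical model specification for a repeated-measures design
mod <- ALASCA(
  df,
  value ~ time * group + (1 | id)
)
```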

Next, we use the ALASCA::predict_scores() function introduced in version 1.0.14 to get a score for each data point. Note that the number of ASCA components can be specified. For simplicity, we only use three here, but increasing the number of components may improve the classification model.

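As a sketch of this step — the argument names (`newdata`, `n_components`) and the object names are hypothetical; see `?predict_scores` for the actual interface:

```r
# Hypothetical calls: scores for the training data and for the
# baseline-subtracted test data, using three ASCA components
scores_train <- predict_scores(mod, newdata = df, n_components = 3)
scores_test  <- predict_scores(mod, newdata = df_test_baseline, n_components = 3)
```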

Just for illustration, we can plot the first three PC scores of the training set (on which we built the ALASCA model, without removing the baseline).

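Such a plot could be sketched as below; the structure of the score table (long format with columns `component`, `score`, `time`, and `group`) is an assumption about what `predict_scores()` returns.

```r
library(ggplot2)

# Hypothetical: `scores_train` as a long-format data.table of ASCA scores
ggplot(scores_train[component %in% 1:3],
       aes(time, score, color = group)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~component, scales = "free_y") +
  scale_color_viridis_d(end = 0.8)
```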

The test set scores can be visualized in the same way.

Using PLS-DA for classification

Since ASCA is not intended for classification, we will construct a PLS-DA model using the ASCA scores. Note that the number of components must be specified; in this example, we use four components as an illustration.


Next, we do prediction on the test data set using the PLS-DA model above.

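The two steps — fitting a PLS-DA model on the ASCA scores and predicting the test set — could be sketched as follows. The use of mixOmics is an assumption (the vignette may use a different PLS-DA implementation), and the score matrices here are toy stand-ins.

```r
library(mixOmics)

# Toy stand-ins: in the vignette, X would be the matrix of ASCA scores
# for the training observations and y the corresponding group labels
set.seed(1)
X <- matrix(rnorm(60 * 6), nrow = 60,
            dimnames = list(NULL, paste0("score_", 1:6)))
y <- factor(rep(c("Group 1", "Group 2"), each = 30))
X[y == "Group 2", 1] <- X[y == "Group 2", 1] + 2  # make groups separable

# Fit a PLS-DA model with four components
plsda_model <- plsda(X, y, ncomp = 4)

# Predict group membership for new (test) score vectors
X_test <- matrix(rnorm(10 * 6), nrow = 10,
                 dimnames = list(NULL, paste0("score_", 1:6)))
pred <- predict(plsda_model, newdata = X_test)
pred_class <- pred$class$max.dist[, 4]  # class calls using all 4 components
```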

And, as we can see, the model does quite well:

caret::confusionMatrix(table(kkk[, .(pred, group)]))


andjar/ALASCA documentation built on March 2, 2024, 12:55 p.m.