knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%",
  fig.asp = 0.7,
  fig.width = 12,
  fig.align = "center",
  cache = FALSE,
  external = FALSE
)

library("ALASCA")
library("data.table")
library("ggplot2")

theme_set(theme_bw() + theme(legend.position = "bottom"))
We use the same code to simulate data sets here as in the Regression vignette. In brief, we generate a training and a test data set, and use ALASCA and PLS-DA to test group classification.
We will start by creating an artificial data set with 100 participants, 5 time points, 2 groups, and 20 variables. The variables follow four patterns.
The two groups differ at baseline, and one of the groups has larger changes throughout the study.
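The simulation code itself is shared with the Regression vignette and is not repeated here. As a rough sketch of what such a simulation could look like in base R, consider the block below. The four time profiles, the group labels `A` and `B`, the effect sizes, the participant column `id`, and the object name `sim` are all illustrative assumptions, not the values used to produce the results in this vignette.

```r
set.seed(1)

n_participants <- 100
n_time         <- 5
n_variables    <- 20

# Hypothetical time profiles (assumed for illustration only)
patterns <- list(
  function(t) t,           # linear increase
  function(t) -t,          # linear decrease
  function(t) (t - 2)^2,   # U-shape
  function(t) -(t - 2)^2   # inverted U-shape
)

# Long format: one row per participant x time point x variable
sim <- expand.grid(
  id       = seq_len(n_participants),
  time     = seq_len(n_time) - 1,
  variable = paste0("variable_", seq_len(n_variables)),
  stringsAsFactors = FALSE
)

# First half of the participants in group A, second half in group B
sim$group <- ifelse(sim$id <= n_participants / 2, "A", "B")

# Cycle through the four patterns across the 20 variables
var_index   <- as.integer(sub("variable_", "", sim$variable))
pattern_idx <- (var_index - 1) %% length(patterns) + 1
shape <- numeric(nrow(sim))
for (k in seq_along(patterns)) {
  rows <- pattern_idx == k
  shape[rows] <- patterns[[k]](sim$time[rows])
}

# Group B differs at baseline and changes more over time (assumed effect sizes)
in_b           <- sim$group == "B"
baseline_shift <- ifelse(in_b, 1, 0)
change_scale   <- ifelse(in_b, 1.5, 1)

# Random intercept per participant and variable, plus residual noise
pv <- as.integer(factor(paste(sim$id, sim$variable)))
random_intercept <- rnorm(max(pv))[pv]

sim$value <- baseline_shift + change_scale * shape +
  random_intercept + rnorm(nrow(sim), sd = 0.5)
```

Calling the same code again with a different seed would give an independent data set with the same structure, which is how a training/test pair could be produced under these assumptions.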
Overall (ignoring the random effects), the four patterns look like this:
ggplot(
  df[variable %in% c("variable_1", "variable_2", "variable_3", "variable_4"), ],
  aes(time, value, color = group)
) +
  geom_smooth() +
  facet_wrap(~variable, scales = "free_y") +
  scale_color_viridis_d(end = 0.8)
We want time to be a categorical variable:
df[, time := paste0("t_", time)]
We now generate a second data set using the same code as above. We will do classification on these data.
Later on, we will classify the test data set. Because we would like to take individual differences into account, we create copies of both data sets and subtract each participant's baseline values.
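Baseline subtraction on long-format data can be sketched as follows. This is a minimal base R illustration, not the vignette's own code: the helper name `subtract_baseline`, the participant column `id`, and the baseline label `"t_0"` are assumptions, so adjust them to match the actual data.

```r
# Subtract each participant's baseline value, separately for every variable.
# `d` is a long-format data frame with columns id, variable, time, value.
subtract_baseline <- function(d, baseline_time = "t_0") {
  # Keep one baseline row per participant and variable
  base <- d[d$time == baseline_time, c("id", "variable", "value")]
  names(base)[names(base) == "value"] <- "baseline"

  # Attach the baseline to every row of the same participant and variable
  out <- merge(d, base, by = c("id", "variable"))
  out$value <- out$value - out$baseline
  out$baseline <- NULL
  out[order(out$id, out$variable, out$time), ]
}

# Toy example with two participants and one variable
toy <- data.frame(
  id       = rep(1:2, each = 3),
  variable = "variable_1",
  time     = rep(c("t_0", "t_1", "t_2"), times = 2),
  value    = c(10, 12, 15, 20, 19, 25)
)
toy_adj <- subtract_baseline(toy)
```

After the subtraction every baseline row is exactly zero, so the remaining values describe change from baseline rather than absolute level.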
We now use the first data set to create an ALASCA model:
Next, we use the ALASCA::predict_scores() function, introduced in version 1.0.14, to get a score for each data point. Note that the number of ASCA components can be specified. For simplicity, we only use three here, but increasing the number of components may improve the classification model.
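To illustrate the general idea of scoring new data points against components learned from a training set, here is a sketch using ordinary PCA from base R (`prcomp`). It is not ASCA and does not show how `ALASCA::predict_scores()` works internally; the random data and object names are assumptions made for the example.

```r
set.seed(2)

variable_names <- paste0("variable_", 1:6)
train <- matrix(rnorm(200 * 6), nrow = 200, dimnames = list(NULL, variable_names))
test  <- matrix(rnorm(50 * 6),  nrow = 50,  dimnames = list(NULL, variable_names))

# Learn centering, scaling, and loadings from the training data only
pca <- prcomp(train, center = TRUE, scale. = TRUE)

# Keep the first three components, mirroring the choice made in the text
n_components <- 3
train_scores <- pca$x[, seq_len(n_components)]

# Project the test data onto the training loadings
test_scores <- predict(pca, newdata = test)[, seq_len(n_components)]

# The same projection written out by hand
manual <- scale(test, center = pca$center, scale = pca$scale) %*%
  pca$rotation[, seq_len(n_components)]
```

The key design point is that the test data never influence the centering, scaling, or loadings; they are only projected onto what was learned from the training data.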
Just for illustration, here are the first three PC scores of the training set (on which we built the ALASCA model, without removing the baseline):
And here is the test data set:
Since ASCA is not intended to be used for classification, we will construct a PLS-DA model using the ASCA scores. Note that the number of components must be specified. In this example, we use four components as an illustration.
Next, we do prediction on the test data set using the PLS-DA model above.
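The fit-then-predict workflow can be sketched without any modelling package. The block below is a minimal two-class PLS-DA written in base R: a PLS1 regression (NIPALS algorithm) on a 0/1 dummy-coded group, thresholded at 0.5. It is an illustration under stated assumptions, not the code used in this vignette; `fit_plsda`, `predict_plsda`, `simulate_scores`, and the simulated score matrices are hypothetical.

```r
# Fit a two-class PLS-DA model via PLS1 (NIPALS) on a dummy-coded response
fit_plsda <- function(X, group, ncomp) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2, ncomp <= ncol(X))
  y <- as.numeric(group == levels(group)[2])      # 0/1 coding of the two groups
  x_mean <- colMeans(X)
  y_mean <- mean(y)
  Xr <- sweep(X, 2, x_mean)                       # centered predictors, deflated below
  yr <- y - y_mean
  W <- P <- matrix(0, ncol(X), ncomp)
  q <- numeric(ncomp)
  for (a in seq_len(ncomp)) {
    w   <- drop(crossprod(Xr, yr))                # weights: covariance of X with y
    w   <- w / sqrt(sum(w^2))
    t_a <- drop(Xr %*% w)                         # latent scores
    p_a <- drop(crossprod(Xr, t_a)) / sum(t_a^2)  # X loadings
    q[a] <- sum(yr * t_a) / sum(t_a^2)            # y loading
    Xr <- Xr - outer(t_a, p_a)                    # deflate X
    yr <- yr - t_a * q[a]                         # deflate y
    W[, a] <- w
    P[, a] <- p_a
  }
  # Regression coefficients in terms of the original (centered) predictors
  coef <- W %*% solve(crossprod(P, W), q)
  list(coef = coef, x_mean = x_mean, y_mean = y_mean, levels = levels(group))
}

# Predict group membership for new rows by thresholding the fitted response
predict_plsda <- function(model, X) {
  y_hat <- drop(sweep(X, 2, model$x_mean) %*% model$coef) + model$y_mean
  factor(ifelse(y_hat > 0.5, model$levels[2], model$levels[1]),
         levels = model$levels)
}

# Simulated stand-ins for training and test score matrices (assumed data)
set.seed(3)
simulate_scores <- function(n_per_group) {
  group <- rep(c("A", "B"), each = n_per_group)
  shift <- as.numeric(group == "B")
  n <- length(group)
  X <- cbind(PC1 = rnorm(n) + 3 * shift,
             PC2 = rnorm(n) + 1.5 * shift,
             PC3 = rnorm(n))
  list(X = X, group = group)
}
train_set <- simulate_scores(100)
test_set  <- simulate_scores(50)

model    <- fit_plsda(train_set$X, train_set$group, ncomp = 2)
pred     <- predict_plsda(model, test_set$X)
accuracy <- mean(pred == test_set$group)
```

The 0.5 threshold is a reasonable default only because the two groups are equally sized here; with unbalanced groups a different cut-off or a probability-based rule may be more appropriate.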
And, as we can see, the model does quite well:
caret::confusionMatrix(table(kkk[, .(pred, group)]))
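For readers who want to see what the headline numbers in such a summary mean, the following base R sketch computes a confusion table, accuracy, sensitivity, and specificity by hand. The `pred` and `truth` vectors are made-up example data, not results from this vignette, and the first factor level is treated as the positive class.

```r
# Hypothetical predicted and true group labels
pred  <- factor(c("A", "A", "B", "B", "A", "B", "B", "A"), levels = c("A", "B"))
truth <- factor(c("A", "A", "B", "B", "B", "B", "A", "A"), levels = c("A", "B"))

# Predictions in rows, reference labels in columns
cm <- table(pred = pred, group = truth)

accuracy    <- sum(diag(cm)) / sum(cm)       # share of correct predictions
sensitivity <- cm["A", "A"] / sum(cm[, "A"]) # true A predicted as A
specificity <- cm["B", "B"] / sum(cm[, "B"]) # true B predicted as B
```

Keeping predictions in rows and the reference in columns matters, because swapping them exchanges sensitivity with positive predictive value.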