soplsda: Block dimension reduction by SO-PLS-DA

soplsrdaR Documentation

Block dimension reduction by SO-PLS-DA

Description

Function soplsrda implements dimension reductions of pre-selected blocks of variables (= set of columns) of a reference (= training) matrix, by sequential orthogonalization-PLS (said "SO-PLS") in a context of discrimination.

Function soplsrdacv perfoms repeteated cross-validation of an SO-PLS-RDA model in order to choose the optimal lv combination from the different blocks.

The block reduction consists in calculating latent variables (= scores) for each block, each block being sequentially orthogonalized to the information computed from the previous blocks.

The function allows giving a priori weights to the rows of the reference matrix in the calculations.

Insoplslda and soplsqda, probabilistic LDA and QDA are run over the PLS2 LVs, respectively.

Usage


soplsrda(Xlist, y, Xscaling = c("none", "pareto", "sd")[1], 
Yscaling = c("none", "pareto", "sd")[1], weights = NULL, nlv)

soplslda(Xlist, y, Xscaling = c("none", "pareto", "sd")[1], 
Yscaling = c("none", "pareto", "sd")[1], weights = NULL, nlv, 
prior = c("unif", "prop"))

soplsqda(Xlist, y, Xscaling = c("none", "pareto", "sd")[1], 
Yscaling = c("none", "pareto", "sd")[1], weights = NULL, nlv, 
prior = c("unif", "prop"))

soplsrdacv(Xlist, y, Xscaling = c("none", "pareto", "sd")[1], 
Yscaling = c("none", "pareto", "sd")[1], weights = NULL, nlvlist=list(), 
nbrep=30, cvmethod="kfolds", seed = 123, samplingk = NULL, nfolds = 7, 
optimisation = c("global","sequential")[1], 
criterion = c("err","rmse")[1], selection = c("localmin","globalmin","1std")[1])

soplsldacv(Xlist, y, Xscaling = c("none", "pareto", "sd")[1], 
Yscaling = c("none", "pareto", "sd")[1], weights = NULL, nlvlist=list(), 
prior = c("unif", "prop"), nbrep = 30, cvmethod = "kfolds", seed = 123, samplingk = NULL, 
nfolds = 7, optimisation = c("global","sequential")[1], 
criterion = c("err","rmse")[1], selection = c("localmin","globalmin","1std")[1])

soplsqdacv(Xlist, y, Xscaling = c("none", "pareto", "sd")[1], 
Yscaling = c("none", "pareto", "sd")[1], weights = NULL, nlvlist = list(), 
prior = c("unif", "prop"), nbrep = 30, cvmethod = "kfolds", seed = 123, samplingk = NULL, 
nfolds = 7, optimisation = c("global","sequential")[1], 
criterion = c("err","rmse")[1], selection = c("localmin","globalmin","1std")[1])

## S3 method for class 'Soplsrda'
transform(object, X, ...) 

## S3 method for class 'Soplsprobda'
transform(object, X, ...) 

## S3 method for class 'Soplsrda'
predict(object, X, ...) 

## S3 method for class 'Soplsprobda'
predict(object, X, ...) 

Arguments

Xlist

For the main functions: A list of matrices or data frames of reference (= training) observations.

X

For the auxiliary functions: list of new X-data, with the same variables than the training X-data.

y

Training class membership (n). Note: If y is a factor, it is replaced by a character vector.

Xscaling

vector (of length Xlist) of variable scaling for each datablock, among "none" (mean-centering only), "pareto" (mean-centering and pareto scaling), "sd" (mean-centering and unit variance scaling). If "pareto" or "sd", uncorrected standard deviation is used.

Yscaling

variable scaling for the Y-block, among "none" (mean-centering only), "pareto" (mean-centering and pareto scaling), "sd" (mean-centering and unit variance scaling). If "pareto" or "sd", uncorrected standard deviation is used.

weights

a priori weights to the rows of the reference matrix in the calculations.

nlv

A vector of same length as the number of blocks defining the number of scores to calculate for each block, or a single number. In this last case, the same number of scores is used for all the blocks.

nlvlist

A list of same length as the number of X-blocks. Each component of the list gives the number of PLS components of the corresponding X-block to test.

nbrep

An integer, setting the number of CV repetitions. Default value is 30.

cvmethod

"kfolds" for k-folds cross-validation, or "loo" for leave-one-out.

seed

a numeric. Seed used for the repeated resampling, and if cvmethod is "kfolds" and samplingk is not NULL.

samplingk

Optional. A vector of length n. The elements are the values of a qualitative variable used for stratified partition creation.

nfolds

An integer, setting the number of partitions to create. Default value is 7.

optimisation

"global" or "sequential" optimisation of the number of components. If "sequential", the optimal lv number is found for the first X-block, then for the 2nd one, etc...

criterion

optimisation criterion among "rmse" and "err" (for classification error rate)

selection

a character indicating the selection method to use to choose the optimal combination of components, among "localmin","globalmin","1std". If "localmin": the optimal combination corresponds to the first local minimum of the mean CV rmse or error rate. If "globalmin" : the optimal combination corresponds to the minimum mean CV rmse or error rate. If "1std" (one standard error rule) : it corresponds to the first combination after which the mean cross-validated rmse or error rate does not decrease significantly.

prior

The prior probabilities of the classes. Possible values are "unif" (default; probabilities are set equal for all the classes) or "prop" (probabilities are set equal to the observed proportions of the classes in y).

object

For the auxiliary functions: A fitted model, output of a call to the main functions.

...

For the auxiliary functions: Optional arguments. Not used.

Value

For soplsrda, soplslda, soplsqda:

fm

list with the PLS models: (T): X-scores matrix; (P): X-loading matrix;(R): The PLS projection matrix (p,nlv); (W): X-loading weights matrix ;(C): The Y-loading weights matrix; (TT): the X-score normalization factor; (xmeans): the centering vector of X (p,1); (ymeans): the centering vector of Y (q,1); (weights): vector of observation weights; (Xscales): X scaling values; (Yscales): Y scaling values; (U): intermediate output.

lev

classes

ni

number of observations in each class

For transform.Soplsrda, transform.Soplsprobda: the LVs Calculated for the new matrices list Xlist from the model.

For predict.Soplsrda, predict.Soplsprobda:

pred

predicted class for each observation

posterior

calculated probability of belonging to a class for each observation

For soplsrdacv, soplsldacv, soplsqdacv:

lvcombi

matrix or list of matrices, of tested component combinations.

optimCombiLine

number of the combination line corresponding to the optimal one. In the case of a sequential optimisation, it is the number of the combination line in the model with all the X-blocks.

optimcombi

the number of PLS components of each X-block allowing the optimisation of the mean rmseCV.

optimExplVarCV

cross-validated explained variance for the optimal soplsda model.

rmseCV

matrix or list of matrices of mean and sd of cross-validated rmse in the model for each combination and response variables.

ExplVarCV

matrix or list of matrices of mean and sd of cross-validated explained variances in the model for each combination and response variables.

errCV

matrix or list of matrices of mean and sd of cross-validated classification error rates in the model for each combination and response variables.

References

- Biancolillo et al. , 2015. Combining SO-PLS and linear discriminant analysis for multi-block classification. Chemometrics and Intelligent Laboratory Systems, 141, 58-67.

- Biancolillo, A. 2016. Method development in the area of multi-block analysis focused on food analysis. PhD. University of copenhagen.

- Menichelli et al., 2014. SO-PLS as an exploratory tool for path modelling. Food Quality and Preference, 36, 122-134.

- Tenenhaus, M., 1998. La régression PLS: théorie et pratique. Editions Technip, Paris, France.

See Also

soplsr_soplsda_allsteps function to help determine the optimal number of latent variables, perform a permutation test, calculate model parameters and predict new observations.

Examples


N <- 10 ; p <- 12
set.seed(1)
X <- matrix(rnorm(N * p, mean = 10), ncol = p, byrow = TRUE)
y <- matrix(sample(c("1", "4", "10"), size = N, replace = TRUE), ncol=1)
colnames(X) <- paste("x", 1:ncol(X), sep = "")
set.seed(NULL)

n <- nrow(X)

X_list <- list(X[,1:4], X[,5:7], X[,9:ncol(X)])
X_list_2 <- list(X[1:2,1:4], X[1:2,5:7], X[1:2,9:ncol(X)])

# EXEMPLE WITH SO-PLS-RDA
soplsrdacv(X_list, y, Xscaling = c("none", "pareto", "sd")[1], 
Yscaling = c("none", "pareto", "sd")[1], weights = NULL,
nlvlist=list(0:1, 1:2, 0:1), nbrep=1, cvmethod="loo", seed = 123, 
samplingk = NULL, nfolds = 3, optimisation = "global", 
criterion = c("err","rmse")[1], selection = "localmin")

ncomp <- 2
fm <- soplsrda(X_list, y, nlv = ncomp)
predict(fm,X_list_2)
transform(fm,X_list_2)

ncomp <- c(2, 0, 3)
fm <- soplsrda(X_list, y, nlv = ncomp)
predict(fm,X_list_2)
transform(fm,X_list_2)

ncomp <- 0
fm <- soplsrda(X_list, y, nlv = ncomp)
predict(fm,X_list_2)
transform(fm,X_list_2)

# EXEMPLE WITH SO-PLS-LDA
ncomp <- 2
weights <- rep(1 / n, n)
#w <- 1:n
soplslda(X_list, y, Xscaling = "none", nlv = ncomp, weights = weights)
soplslda(X_list, y, Xscaling = "pareto", nlv = ncomp, weights = weights)
soplslda(X_list, y, Xscaling = "sd", nlv = ncomp, weights = weights)

fm <- soplslda(X_list, y, Xscaling = c("none","pareto","sd"), nlv = ncomp, weights = weights)
predict(fm,X_list_2)
transform(fm,X_list_2)


rchemo documentation built on Sept. 11, 2024, 8:05 p.m.