soplsr: Block dimension reduction by SO-PLS

soplsr R Documentation

Description

Function soplsr implements dimension reduction of pre-selected blocks of variables (= sets of columns) of a reference (= training) matrix, by sequential and orthogonalized PLS ("SO-PLS").

Function soplsrcv performs repeated cross-validation of an SO-PLS model in order to choose the optimal combination of numbers of latent variables (LVs) across the blocks.

SO-PLS is described for instance in Menichelli et al. (2014), Biancolillo et al. (2015) and Biancolillo (2016).

The block reduction consists of calculating latent variables (= scores) for each block, each block being sequentially orthogonalized with respect to the information computed from the previous blocks.
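As a rough illustration of the orthogonalization step, the base-R sketch below computes one PLS-type score for a first block, projects it out of a second block, and then computes a score from the orthogonalized second block. The data, the single-component weight vector and all object names (X1, X2, T1, X2o, ...) are assumptions made for this sketch; it is not the code used by soplsr.

```r
set.seed(1)
n  <- 20
X1 <- matrix(rnorm(n * 4), n, 4)   # first block (simulated)
X2 <- matrix(rnorm(n * 3), n, 3)   # second block (simulated)
y  <- rnorm(n)                     # single response (simulated)

# Mean-center the blocks and the response
X1c <- scale(X1, center = TRUE, scale = FALSE)
X2c <- scale(X2, center = TRUE, scale = FALSE)
yc  <- y - mean(y)

# One PLS component for block 1: weight vector proportional to X1' y
w1 <- crossprod(X1c, yc)
w1 <- w1 / sqrt(sum(w1^2))
T1 <- X1c %*% w1

# Projector onto the space spanned by the block-1 scores
P <- T1 %*% solve(crossprod(T1), t(T1))

# Orthogonalize block 2 (and the response) with respect to T1
X2o  <- X2c - P %*% X2c
yres <- yc - P %*% yc

# One PLS component for the orthogonalized block 2
w2 <- crossprod(X2o, yres)
w2 <- w2 / sqrt(sum(w2^2))
T2 <- X2o %*% w2

# The scores of the two blocks are orthogonal (up to rounding error)
max(abs(crossprod(T1, T2)))
```

Because the second block is orthogonalized before its scores are computed, the concatenated scores cbind(T1, T2) carry non-redundant information from the two blocks.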

The function allows a priori weights to be given to the rows of the reference matrix in the calculations.
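To make the role of row weights concrete, the base-R sketch below computes weighted column means. It assumes the weights are normalized to sum to 1, which is an assumption of this sketch and not a statement about the internals of soplsr; the helper name weighted_colmeans is hypothetical.

```r
set.seed(1)
n <- 10
X <- matrix(rnorm(n * 3, mean = 10), nrow = n)

# Hypothetical helper: weighted column means, weights normalized to sum to 1
weighted_colmeans <- function(X, weights) {
  w <- weights / sum(weights)
  colSums(w * X)   # w is recycled down each column, i.e. applied row-wise
}

# Uniform weights reproduce the ordinary column means
m_uniform <- weighted_colmeans(X, rep(1 / n, n))

# Non-uniform weights give more influence to the later rows
m_linear <- weighted_colmeans(X, 1:n)
```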

Auxiliary functions

transform: calculates the LVs for any new list of matrices X from the model.

predict: calculates the predictions for any new list of matrices X from the model.

Usage


soplsr(Xlist, Y, Xscaling = c("none", "pareto", "sd")[1], 
Yscaling = c("none", "pareto", "sd")[1], weights = NULL, nlv)

soplsrcv(Xlist, Y, Xscaling = c("none", "pareto", "sd")[1], 
Yscaling = c("none", "pareto", "sd")[1], weights = NULL, nlvlist = list(), 
nbrep = 30, cvmethod = "kfolds", seed = 123, samplingk = NULL, nfolds = 7, 
optimisation = c("global","sequential")[1], 
selection = c("localmin","globalmin","1std")[1], majorityvote = FALSE)


## S3 method for class 'Soplsr'
transform(object, X, ...)  

## S3 method for class 'Soplsr'
predict(object, X, ...)  

Arguments

Xlist

A list of matrices or data frames of reference (= training) observations.

X

For the auxiliary functions: a list of new X-data, with the same variables as the training X-data.

Y

A n x q matrix or data frame, or a vector of length n, of reference (= training) responses.

Xscaling

A vector (of the same length as Xlist) giving the variable scaling for each data block, among "none" (mean-centering only), "pareto" (mean-centering and Pareto scaling) and "sd" (mean-centering and unit variance scaling). If "pareto" or "sd", the uncorrected standard deviation is used.

Yscaling

The variable scaling for the Y-block, among "none" (mean-centering only), "pareto" (mean-centering and Pareto scaling) and "sd" (mean-centering and unit variance scaling). If "pareto" or "sd", the uncorrected standard deviation is used.

weights

A priori weights for the rows of the reference matrix, used in the calculations.

nlv

A vector of the same length as the number of blocks, defining the number of scores to calculate for each block, or a single number. In the latter case, the same number of scores is used for all the blocks.

nlvlist

A list of same length as the number of X-blocks. Each component of the list gives the number of PLS components of the corresponding X-block to test.

nbrep

An integer, setting the number of CV repetitions. Default value is 30.

cvmethod

"kfolds" for k-folds cross-validation, or "loo" for leave-one-out.

seed

A numeric seed, used for the repeated resampling, and when cvmethod is "kfolds" and samplingk is not NULL.

samplingk

A vector of length n. The elements are the values of a qualitative variable used to create stratified partitions. If NULL, the first observation is assigned to the first fold, the second observation to the second fold, and so on.

nfolds

An integer, setting the number of partitions to create. Default value is 7.

optimisation

"global" or "sequential" optimisation of the number of components. If "sequential", the optimal number of LVs is found for the first X-block, then for the second one, and so on.

selection

A character string indicating the selection method used to choose the optimal combination of components, among "localmin", "globalmin" and "1std". If "localmin": the optimal combination corresponds to the first local minimum of the mean CV RMSE. If "globalmin": the optimal combination corresponds to the minimum mean CV RMSE. If "1std" (one standard error rule): it corresponds to the first combination after which the mean cross-validated RMSE does not decrease significantly.

majorityvote

Used only if optimisation is "global" or if there is a single X-block. If majorityvote is TRUE, the optimal combination is chosen for each Y variable, with the chosen selection method, before a majority vote. If majorityvote is FALSE, the optimal combination is simply chosen with the chosen selection method.

object

For the auxiliary functions: A fitted model, output of a call to the main functions.

...

For the auxiliary functions: Optional arguments. Not used.
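The scaling options described for Xscaling and Yscaling can be sketched in base R as follows. The sketch assumes the usual definitions: the uncorrected standard deviation divides by n rather than n - 1, "sd" divides each centered column by that standard deviation, and "pareto" divides by its square root. The helper names sd_uncorrected and scale_block are hypothetical and are not functions of the package.

```r
# Uncorrected standard deviation (divisor n instead of n - 1)
sd_uncorrected <- function(x) sqrt(mean((x - mean(x))^2))

# Hypothetical helper: mean-center a block, then apply the chosen scaling
scale_block <- function(X, scaling = c("none", "pareto", "sd")) {
  scaling <- match.arg(scaling)
  X  <- as.matrix(X)
  mu <- colMeans(X)
  s  <- apply(X, 2, sd_uncorrected)
  div <- switch(scaling,
                none   = rep(1, ncol(X)),  # mean-centering only
                pareto = sqrt(s),          # Pareto scaling
                sd     = s)                # unit variance scaling
  sweep(sweep(X, 2, mu, "-"), 2, div, "/")
}

X <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)
Z_none   <- scale_block(X, "none")
Z_pareto <- scale_block(X, "pareto")
Z_sd     <- scale_block(X, "sd")
```

With "sd", every column ends up with an uncorrected standard deviation of 1, whereas "pareto" only shrinks the differences in spread between columns.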

Value

For soplsr:

fm

A list of the plsr models.

T

A matrix with the concatenated scores calculated from the X-blocks.

pred

An n x q matrix of fitted values.

xmeans

list of vectors of X-mean values.

ymeans

vector of Y-mean values.

xscales

list of vectors of X-scaling values.

yscales

vector of Y-scaling values.

b

A list of X-loading weights, used in the orthogonalization step.

weights

Weights applied to the training observations.

nlv

vector of numbers of latent variables from each X-block.

For transform.Soplsr: the LVs calculated for the new list of matrices X from the model.

For predict.Soplsr: the predicted values for each observation.

For soplsrcv:

lvcombi

matrix or list of matrices, of tested component combinations.

optimcombi

the number of PLS components of each X-block allowing the optimisation of the mean rmseCV.

rmseCV_byY

matrix or list of matrices of mean and sd of cross-validated RMSE in the model for each combination and each response variable.

ExplVarCV_byY

matrix or list of matrices of mean and sd of cross-validated explained variances in the model for each combination and each response variable.

rmseCV

matrix or list of matrices of mean and sd of cross-validated RMSE in the model for each combination and response variables.

ExplVarCV

matrix or list of matrices of mean and sd of cross-validated explained variances in the model for each combination and response variables.
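The relationship between the tested combinations and the three selection rules can be sketched in base R on made-up numbers. The grid below is built with expand.grid from the same nlvlist as in the Examples section; the RMSE values, the ordering of the combinations and the helper names are assumptions of this sketch, and the "1std" helper implements the conventional one-standard-error rule, which may differ in detail from the rule used by soplsrcv.

```r
# Grid of component combinations for nlvlist = list(0:1, 1:2, 0:1)
lvcombi <- expand.grid(block1 = 0:1, block2 = 1:2, block3 = 0:1)

# Made-up mean and sd of the CV RMSE, one value per combination
rmse_mean <- c(1.00, 0.80, 0.85, 0.70, 0.72, 0.75, 0.71, 0.78)
rmse_sd   <- c(0.10, 0.08, 0.09, 0.12, 0.10, 0.10, 0.11, 0.09)

# "globalmin": combination with the smallest mean RMSE
select_globalmin <- function(m) which.min(m)

# "localmin": last combination before the mean RMSE first increases
select_localmin <- function(m) {
  up <- which(diff(m) > 0)
  if (length(up) == 0) length(m) else up[1]
}

# "1std": first combination within one sd of the smallest mean RMSE
select_1std <- function(m, s) {
  best <- which.min(m)
  which(m <= m[best] + s[best])[1]
}

i_global <- select_globalmin(rmse_mean)
i_local  <- select_localmin(rmse_mean)
i_1std   <- select_1std(rmse_mean, rmse_sd)

# Number of components per block for the globally optimal combination
lvcombi[i_global, ]
```

On these values, "localmin" and "1std" both stop at an earlier, more parsimonious combination than "globalmin".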

References

- Biancolillo et al., 2015. Combining SO-PLS and linear discriminant analysis for multi-block classification. Chemometrics and Intelligent Laboratory Systems, 141, 58-67.

- Biancolillo, A., 2016. Method development in the area of multi-block analysis focused on food analysis. PhD thesis, University of Copenhagen.

- Menichelli et al., 2014. SO-PLS as an exploratory tool for path modelling. Food Quality and Preference, 36, 122-134.

- Tenenhaus, M., 1998. La régression PLS: théorie et pratique. Editions Technip, Paris, France.

See Also

soplsr_soplsda_allsteps: function to help determine the optimal number of latent variables, perform a permutation test, calculate model parameters and predict new observations.

Examples


library(rchemo)

N <- 10 ; p <- 12
set.seed(1)
X <- matrix(rnorm(N * p, mean = 10), ncol = p, byrow = TRUE)
Y <- matrix(rnorm(N * 2, mean = 10), ncol = 2, byrow = TRUE)
colnames(X) <- paste("varx", 1:ncol(X), sep = "")
colnames(Y) <- paste("vary", 1:ncol(Y), sep = "")
rownames(X) <- rownames(Y) <- paste("obs", 1:nrow(X), sep = "")
set.seed(NULL)
X
Y

n <- nrow(X)

X_list <- list(X[,1:4], X[,5:7], X[,9:ncol(X)])
X_list_2 <- list(X[1:2,1:4], X[1:2,5:7], X[1:2,9:ncol(X)])

soplsrcv(X_list, Y, Xscaling = c("none", "pareto", "sd")[1], 
Yscaling = c("none", "pareto", "sd")[1], weights = NULL, 
nlvlist=list(0:1, 1:2, 0:1), nbrep=1, cvmethod="loo", seed = 123, samplingk=NULL,
optimisation="global", selection="localmin", majorityvote=FALSE)


ncomp <- 2
fm <- soplsr(X_list, Y, nlv = ncomp)
transform(fm, X_list_2)
predict(fm, X_list_2)

mse(predict(fm, X_list), Y)

# VIP calculation based on the proportion of Y-variance explained by the components
vip(fm$fm[[1]], X_list[[1]], Y = NULL, nlv = ncomp)
vip(fm$fm[[2]], X_list[[2]], Y = NULL, nlv = ncomp)
vip(fm$fm[[3]], X_list[[3]], Y = NULL, nlv = ncomp)

ncomp <- c(2, 0, 3)
fm <- soplsr(X_list, Y, nlv = ncomp)
transform(fm, X_list_2)
predict(fm, X_list_2)
mse(predict(fm, X_list), Y)

ncomp <- 0
fm <- soplsr(X_list, Y, nlv = ncomp)
transform(fm, X_list_2)
predict(fm, X_list_2)

ncomp <- 2
weights <- rep(1 / n, n)
#w <- 1:n
fm <- soplsr(X_list, Y, Xscaling = c("sd","pareto","none"), nlv = ncomp, weights = weights)
transform(fm, X_list_2)
predict(fm, X_list_2)


rchemo documentation built on Sept. 11, 2024, 8:05 p.m.