Function to evaluate the performance of the fitted PLS, sparse PLS, PLS-DA, sparse PLS-DA, MINT (mint.splsda) and DIABLO (block.splsda) models using various criteria.
perf(object, ...)
## S3 method for class 'mixo_pls'
perf(
object,
validation = c("Mfold", "loo"),
folds = 10,
progressBar = FALSE,
...
)
## S3 method for class 'mixo_spls'
perf(
object,
validation = c("Mfold", "loo"),
folds = 10,
progressBar = FALSE,
...
)
## S3 method for class 'mixo_plsda'
perf(
object,
dist = c("all", "max.dist", "centroids.dist", "mahalanobis.dist"),
validation = c("Mfold", "loo"),
folds = 10,
nrepeat = 1,
auc = FALSE,
progressBar = FALSE,
signif.threshold = 0.01,
cpus = 1,
...
)
## S3 method for class 'mixo_splsda'
perf(
object,
dist = c("all", "max.dist", "centroids.dist", "mahalanobis.dist"),
validation = c("Mfold", "loo"),
folds = 10,
nrepeat = 1,
auc = FALSE,
progressBar = FALSE,
signif.threshold = 0.01,
cpus = 1,
...
)
## S3 method for class 'sgccda'
perf(
object,
dist = c("all", "max.dist", "centroids.dist", "mahalanobis.dist"),
validation = c("Mfold", "loo"),
folds = 10,
nrepeat = 1,
auc = FALSE,
progressBar = FALSE,
signif.threshold = 0.01,
cpus = 1,
...
)
## S3 method for class 'mint.pls'
perf(
object,
validation = c("Mfold", "loo"),
folds = 10,
progressBar = FALSE,
...
)
## S3 method for class 'mint.spls'
perf(
object,
validation = c("Mfold", "loo"),
folds = 10,
progressBar = FALSE,
...
)
## S3 method for class 'mint.plsda'
perf(
object,
dist = c("all", "max.dist", "centroids.dist", "mahalanobis.dist"),
auc = FALSE,
progressBar = FALSE,
signif.threshold = 0.01,
...
)
## S3 method for class 'mint.splsda'
perf(
object,
dist = c("all", "max.dist", "centroids.dist", "mahalanobis.dist"),
auc = FALSE,
progressBar = FALSE,
signif.threshold = 0.01,
...
)

object 
object of class inherited from 
... 
not used 
validation 
character. What kind of (internal) validation to use,
matching one of 
folds 
the folds in the M-fold cross-validation. See Details.
progressBar 
by default set to 
dist 
only applies to an object inheriting from 
nrepeat 
Number of times the cross-validation process is repeated. Repeating the process is important for obtaining a performance estimate that is as accurate as possible.
auc 
if 
signif.threshold 
numeric between 0 and 1 indicating the significance threshold required for improvement in error rate of the components. Defaults to 0.01.
cpus 
Number of CPUs to use when running the code in parallel.
Procedure. The process of evaluating the performance of a fitted model object is similar for all PLS-derived methods; a cross-validation approach is used to fit the method of object on folds-1 subsets of the data and then to predict on the subset left out. Different measures of performance are available depending on the model. Parameters such as logratio, multilevel, keepX or keepY are retrieved from object.
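The fold logic described above can be sketched in base R. This is only an illustration under stated assumptions: a plain `lm` fit stands in for the PLS-derived model, the names `fold.id` and `sq.err` are hypothetical, and this is not the code that `perf` runs internally.

```r
# Minimal M-fold cross-validation sketch (illustrative only).
# lm() is a stand-in for the PLS-derived model that perf() refits.
set.seed(1)
n <- 60
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n, sd = 0.5)

folds <- 5
# randomly assign each sample to one of the folds
fold.id <- sample(rep(seq_len(folds), length.out = n))

sq.err <- numeric(n)
for (k in seq_len(folds)) {
  test <- which(fold.id == k)
  # fit on the folds - 1 training subsets
  fit <- lm(y ~ x, data = dat[-test, ])
  # predict on the subset left out
  pred <- predict(fit, newdata = dat[test, ])
  sq.err[test] <- (dat$y[test] - pred)^2
}

# cross-validated mean squared error of prediction
msep <- mean(sq.err)
msep
```

Every sample is predicted exactly once, by a model that never saw it during fitting.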
Parameters. If validation = "Mfold", M-fold cross-validation is performed. folds specifies the number of folds to generate. The folds can also be supplied as a list of vectors containing the indexes defining each fold, as produced by split. When using validation = "Mfold", make sure that you repeat the process several times (as the results will be highly dependent on the random splits and the sample size). If validation = "loo", leave-one-out cross-validation is performed (in that case, there is no need to repeat the process).
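As an illustration of supplying the folds as a list, the sketch below builds index vectors with `split`. The object name `my.folds` is hypothetical, and the commented-out `perf` call is an assumed usage that relies on a fitted model such as `liver.pls` from the Examples.

```r
# Build 5 folds of sample indices for n = 64 samples (illustrative).
set.seed(12)
n <- 64
my.folds <- split(sample(seq_len(n)), rep(1:5, length.out = n))

# each element holds the indices of the samples left out in that fold
lengths(my.folds)

# every sample appears in exactly one fold
stopifnot(identical(sort(unlist(my.folds, use.names = FALSE)), seq_len(n)))

# assumed usage, given a fitted model such as liver.pls:
# liver.val <- perf(liver.pls, validation = "Mfold", folds = my.folds)
```

Fixing the folds this way makes a comparison between two models use the same partitions, at the cost of losing the variability information that repeated random splits provide.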
Measures of performance. For fitted PLS and sPLS regression models, perf estimates the mean squared error of prediction (MSEP), R^2, and Q^2 to assess the predictive validity of the model using M-fold or leave-one-out cross-validation. Note that only the classic, regression and invariant modes can be applied. For sPLS, the MSEP, R^2, and Q^2 criteria are averaged across all folds. Note that for PLS and sPLS objects, perf is performed on the preprocessed data after log ratio transform and multilevel analysis, if any.
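For reference, the Q^2 criterion is commonly written as below (after Tenenhaus, 1998). The notation is an assumption introduced here for illustration: PRESS_{kh} denotes the cross-validated predictive residual sum of squares for Y-variable k with h components, and RSS_{k(h-1)} the residual sum of squares for the same variable with h-1 components.

```latex
Q^2_{kh} = 1 - \frac{\mathrm{PRESS}_{kh}}{\mathrm{RSS}_{k(h-1)}},
\qquad
Q^2_{\mathrm{total},h} = 1 - \frac{\sum_{k} \mathrm{PRESS}_{kh}}{\sum_{k} \mathrm{RSS}_{k(h-1)}}
```

Under this convention, component h is usually considered worth keeping when Q^2_{total,h} is at least 1 - 0.95^2 = 0.0975, which is the threshold drawn in the Examples.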
Sparse methods. The sPLS, sPLS-DA and sgccda functions are run on several different subsets of the data (the cross-validation folds) and will most likely select a different subset of features on each. These are summarised in the output features$stable (see Value below) to assess how often the variables are selected across all folds. Note that for PLS-DA and sPLS-DA objects, perf is performed on the original data, i.e. before the preprocessing step of the log ratio transform and multilevel analysis, if any. In addition, for these methods the classification error rate is averaged across all folds.
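The idea behind a selection-stability summary can be sketched in base R as follows. The per-fold feature sets are made up, the names `selected` and `stability` are hypothetical, and the exact layout of features$stable in the returned object may differ.

```r
# Hypothetical sets of features selected by a sparse model in each of 4 folds.
selected <- list(
  fold1 = c("g1", "g2", "g3"),
  fold2 = c("g1", "g3", "g5"),
  fold3 = c("g1", "g2", "g4"),
  fold4 = c("g1", "g3", "g4")
)

# proportion of folds in which each feature was selected
all.features <- unique(unlist(selected))
stability <- sapply(all.features, function(f) {
  mean(sapply(selected, function(s) f %in% s))
})
stability <- sort(stability, decreasing = TRUE)
stability
```

Features with a proportion close to 1 are selected regardless of which samples are left out, which suggests a more reproducible signature.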
The mint.sPLS-DA function estimates errors based on leave-one-group-out cross-validation (where each level of object$study is left out, and predicted, once) and provides study-specific outputs (study.specific.error) as well as global outputs (global.error).
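Leave-one-group-out partitions can be sketched in base R as follows. The `study` vector is made up and `logo.folds` is a hypothetical name; this only shows how the samples are partitioned, not how the MINT model is refitted.

```r
# Hypothetical study labels for 10 samples from 3 studies.
study <- factor(c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C"))

# one fold per study: the indices of the samples left out together
logo.folds <- split(seq_along(study), study)

for (s in names(logo.folds)) {
  test <- logo.folds[[s]]
  train <- setdiff(seq_along(study), test)
  cat("study", s, "left out:", length(test), "test /",
      length(train), "training samples\n")
}
```

Because the number of folds is fixed by the number of studies, there are no random splits to repeat.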
AUROC. For PLS-DA, sPLS-DA, mint.PLS-DA, mint.sPLS-DA, and block.splsda methods: if auc=TRUE, Area Under the Curve (AUC) values are calculated from the predicted scores obtained from the predict function applied to the internal test sets in the cross-validation process, either for all samples or for study-specific samples (for mint models). This minimises the risk of overfitting. For a block.splsda model, the calculated AUC is simply the blocks-combined AUC for each component, calculated using auroc.sgccda. See auroc for more details. Our multivariate supervised methods already use a prediction threshold based on distances (see predict) that optimally determines class membership of the samples tested. As such, AUC and ROC are not needed to estimate the performance of the model. We provide those outputs as complementary performance measures. See more details in our mixOmics article.
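To make the link between predicted scores and AUC concrete, here is a small base R sketch using the rank (Mann-Whitney) formulation for a two-class problem. The function `auc.rank` and the data are hypothetical, and this is not how auroc is implemented in the package.

```r
# AUC from predicted scores via the rank (Mann-Whitney) formulation.
# 'scores' and 'truth' are made-up test-set predictions for two classes.
auc.rank <- function(scores, truth) {
  r <- rank(scores)                 # average ranks handle ties
  n.pos <- sum(truth)
  n.neg <- sum(!truth)
  (sum(r[truth]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

scores <- c(0.9, 0.8, 0.35, 0.6, 0.4, 0.2)
truth  <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# 7 of the 9 (positive, negative) pairs are ranked correctly
auc.rank(scores, truth)
```

The value is the proportion of positive/negative pairs in which the positive sample receives the higher score, so it does not depend on any single classification threshold.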
Prediction distances. See details from ?predict, and also our supplemental material in the mixOmics article.
Repeats of the CV-folds. Repeated cross-validation implies that the whole CV process is repeated a number of times (nrepeat) to reduce variability across the different subset partitions. In the case of leave-one-out CV (validation = 'loo'), each sample is left out once (folds = N is set internally) and therefore nrepeat is by default 1.
BER is appropriate in case of an unbalanced number of samples per class as it calculates the average proportion of wrongly classified samples in each class, weighted by the number of samples in each class. BER is less biased towards majority classes during the performance assessment.
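The difference between the overall error rate and the BER can be seen on a small made-up example. The sketch assumes BER is the average of the per-class error rates; the variable names are hypothetical.

```r
# Overall error rate vs balanced error rate (BER) on made-up predictions.
truth <- factor(c(rep("A", 8), rep("B", 2)))
pred  <- factor(c(rep("A", 8), "A", "B"), levels = levels(truth))

# overall error rate: proportion of misclassified samples
overall <- mean(pred != truth)

# per-class error rates, then BER as their average
per.class <- tapply(pred != truth, truth, mean)
ber <- mean(per.class)

c(overall = overall, BER = ber)
```

Here the overall error rate is 0.1 while the BER is 0.25, because half of the minority class B is misclassified.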
More details about the PLS modes in ?pls.
For PLS and sPLS models, perf produces a list with the following components:
MSEP 
Mean Square Error Prediction for each
Y variable, only applies to object inherited from 
R2 
a matrix of R^2 values of the Y-variables for models with 1, … , 
Q2 
if Y contains one variable, a vector of Q^2 values, otherwise a list with a matrix of Q^2 values for each Y-variable. Note that in the specific case of an sPLS model, it is better to have a look at the Q2.total criterion, only applies to object inherited from

Q2.total 
a vector of Q^2-total values for models with 1, … , 
features 
a list of features selected across the folds
( 
error.rate 
For
PLSDA and sPLSDA models, 
auc 
Averaged AUC values
over the 
For mint.splsda models, perf produces the following outputs:
study.specific.error 
A list that gives BER, overall error rate and error rate per class, for each study 
global.error 
A list that gives BER, overall error rate and error rate per class for all samples 
predict 
A list of length 
class 
A list which gives the
predicted class of each sample for each 
auc 
AUC values 
auc.study 
AUC values for each study in mint models 
For sgccda models, perf produces the following outputs:
error.rate 
Prediction error rate for each block of 
error.rate.per.class 
Prediction error rate for
each block of 
predict 
Predicted values of each sample for each class, each block and each component 
class 
Predicted class of each sample for each
block, each 
features 
a
list of features selected across the folds ( 
AveragedPredict.class 
if more than one block, returns
the average predicted class over the blocks (averaged of the 
AveragedPredict.error.rate 
if more than one block, returns the
average predicted error rate over the blocks (using the

WeightedPredict.class 
if more
than one block, returns the weighted predicted class over the blocks
(weighted average of the 
WeightedPredict.error.rate 
if more than
one block, returns the weighted average predicted error rate over the blocks
(using the 
MajorityVote 
if more than one block, returns the majority class over the blocks. NA for a sample means that there is no consensus on the predicted class for this particular sample over the blocks. 
MajorityVote.error.rate 
if more than one
block, returns the error rate of the 
WeightedVote 
if more than one block, returns the weighted majority class over the blocks. NA for a sample means that there is no consensus on the predicted class for this particular sample over the blocks. 
WeightedVote.error.rate 
if more than one block, returns the error
rate of the 
weights 
Returns the weights of each block used for the weighted predictions, for each nrepeat and each fold 
choice.ncomp 
For supervised models; returns the optimal number of components for the model for each prediction distance using one-sided t-tests that test for a significant difference in the mean error rate (gain in prediction) when components are added to the model. See more details in Rohart et al 2017 Suppl. For more than one block, an optimal ncomp is returned for each prediction framework.
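The idea of choosing ncomp with one-sided t-tests can be sketched in base R as follows. The error rates are made up, the names `err` and `choice` are hypothetical, and the sequential scheme shown is an assumption for illustration; the exact procedure used by perf may differ.

```r
# Made-up error rates: rows = repeats, columns = number of components.
err <- cbind(
  comp1 = c(0.40, 0.42, 0.38, 0.41, 0.39),
  comp2 = c(0.20, 0.22, 0.19, 0.21, 0.18),
  comp3 = c(0.20, 0.21, 0.19, 0.22, 0.18)
)
signif.threshold <- 0.01

# move to a larger model only if its mean error rate is significantly lower
choice <- 1
for (h in 2:ncol(err)) {
  p <- t.test(err[, choice], err[, h], alternative = "greater")$p.value
  if (p < signif.threshold) choice <- h
}
choice
```

With these numbers the second component halves the error rate and is kept, while the third brings no gain, so the sketch settles on two components. Such tests need several repeats (nrepeat > 1) to have any variability to work with.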
Ignacio González, Amrit Singh, Kim-Anh Lê Cao, Benoit Gautier, Florian Rohart, Al J Abadi
Singh A., Shannon C., Gautier B., Rohart F., Vacher M., Tebbutt S. and Lê Cao K.A. (2019), DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays, Bioinformatics, Volume 35, Issue 17, 1 September 2019, Pages 3055–3062.
mixOmics article:
Rohart F, Gautier B, Singh A, Lê Cao KA (2017). mixOmics: an R package for 'omics feature selection and multiple data integration. PLoS Comput Biol 13(11): e1005752.
MINT:
Rohart F, Eslami A, Matigian, N, Bougeard S, Lê Cao KA (2017). MINT: A multivariate integrative approach to identify a reproducible biomarker signature across multiple experiments and platforms. BMC Bioinformatics 18:128.
PLS and PLS criteria for PLS regression: Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technip.
Chavent, Marie and Patouille, Brigitte (2003). Calcul des coefficients de regression et du PRESS en regression PLS1. Modulad 30, 1-11. (this is the formula we use to calculate the Q2 in perf.pls and perf.spls)
Mevik, B.-H., Cederkvist, H. R. (2004). Mean Squared Error of Prediction (MSEP) Estimates for Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR). Journal of Chemometrics 18(9), 422-429.
sparse PLS regression mode:
Lê Cao, K. A., Rossouw D., Robert-Granié, C. and Besse, P. (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.
One-sided t-tests (suppl material):
Rohart F, Mason EA, Matigian N, Mosbergen R, Korn O, Chen T, Butcher S, Patel J, Atkinson K, Khosrotehrani K, Fisk NM, Lê Cao KA&, Wells CA& (2016). A Molecular Classification of Human Mesenchymal Stromal Cells. PeerJ 4:e1845.
predict, nipals, plot.perf, auroc and www.mixOmics.org for more details.
library(mixOmics)
## validation for objects of class 'pls' (regression)
#
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
# tune the number of components to choose
#
# first fit the full model
liver.pls <- pls(X, Y, ncomp = 10)
# with 5-fold cross-validation: we use the same parameters as in the model above
# but we perform cross-validation to compute the MSEP, Q2 and R2 criteria
#
liver.val <- perf(liver.pls, validation = "Mfold", folds = 5)
# Q2 total should decrease until it reaches a threshold
liver.val$Q2.total
# ncomp = 2 is enough
plot(liver.val$Q2.total, type = 'l', col = 'red', ylim = c(-0.5, 0.5),
     xlab = 'PLS components', ylab = 'Q2 total')
abline(h = 0.0975, col = 'darkgreen')
legend('topright', col = c('red', 'darkgreen'),
       legend = c('Q2 total', 'threshold 0.0975'), lty = 1)
title('Liver toxicity PLS 5-fold, Q2 total values')
## Not run:
#have a look at the other criteria
# 
# R2
liver.val$R2
matplot(t(liver.val$R2), type = 'l', xlab = 'PLS components', ylab = 'R2 for each variable')
title('Liver toxicity PLS 5-fold, R2 values')
# MSEP
liver.val$MSEP
matplot(t(liver.val$MSEP), type = 'l', xlab = 'PLS components', ylab = 'MSEP for each variable')
title('Liver toxicity PLS 5-fold, MSEP values')
## validation for objects of class 'spls' (regression)
# 
ncomp = 7
# first, learn the model on the whole data set
model.spls = spls(X, Y, ncomp = ncomp, mode = 'regression',
keepX = c(rep(10, ncomp)), keepY = c(rep(4,ncomp)))
# with 5-fold cross-validation (use validation = "loo" for leave-one-out)
## set.seed(45)
model.spls.val <- perf(model.spls, validation = "Mfold", folds = 5)
# Q2 total
model.spls.val$Q2.total
# R2: we can see how the performance degrades when ncomp increases
model.spls.val$R2
plot(model.spls.val, criterion="R2", type = 'l')
plot(model.spls.val, criterion="Q2", type = 'l')
## validation for objects of class 'splsda' (classification)
# 
data(srbct)
X <- srbct$gene
Y <- srbct$class
ncomp = 2
srbct.splsda <- splsda(X, Y, ncomp = ncomp, keepX = rep(10, ncomp))
# with M-fold
#
set.seed(45)
error <- perf(srbct.splsda, validation = "Mfold", folds = 8,
              dist = "all", auc = TRUE)
error
error$auc
plot(error)
# parallel code
set.seed(45)
error <- perf(srbct.splsda, validation = "Mfold", folds = 8,
              dist = "all", auc = TRUE, cpus = 2)
# with 5 components and nrepeat = 5, to get a $choice.ncomp
ncomp = 5
srbct.splsda <- splsda(X, Y, ncomp = ncomp, keepX = rep(10, ncomp))
set.seed(45)
error <- perf(srbct.splsda, validation = "Mfold", folds = 8,
              dist = "all", nrepeat = 5)
error
plot(error)
# parallel code
set.seed(45)
error <- perf(srbct.splsda, validation = "Mfold", folds = 8,
              dist = "all", auc = TRUE, cpus = 2)
## validation for objects of class 'mint.splsda' (classification)
# 
data(stemcells)
res = mint.splsda(X = stemcells$gene, Y = stemcells$celltype, ncomp = 3, keepX = c(10, 5, 15),
study = stemcells$study)
out = perf(res, auc = TRUE)
out
out$auc
out$auc.study
## validation for objects of class 'sgccda' (classification)
# 
data(nutrimouse)
Y = nutrimouse$diet
data = list(gene = nutrimouse$gene, lipid = nutrimouse$lipid)
design = matrix(c(0,1,1,1,0,1,1,1,0), ncol = 3, nrow = 3, byrow = TRUE)
nutrimouse.sgccda <- block.splsda(X = data,
                                  Y = Y,
                                  design = design,
                                  keepX = list(gene = c(10, 10), lipid = c(15, 15)),
                                  ncomp = 2,
                                  scheme = "horst")
# store the result under a name that does not mask the perf() function
perf.res = perf(nutrimouse.sgccda)
perf.res
# with 5 components and nrepeat = 5 to get $choice.ncomp
nutrimouse.sgccda <- block.splsda(X = data,
                                  Y = Y,
                                  design = design,
                                  keepX = list(gene = c(10, 10), lipid = c(15, 15)),
                                  ncomp = 5,
                                  scheme = "horst")
perf.res = perf(nutrimouse.sgccda, folds = 5, nrepeat = 5)
perf.res
perf.res$choice.ncomp
## End(Not run)
