Function to perform sparse Partial Least Squares to classify samples (supervised analysis) and select variables.
X 
Numeric matrix of predictors. 
Y 
a factor or a class vector for the discrete outcome. 
ncomp 
Integer, the number of components to include in the model. Defaults to 2. 
keepX 
numeric vector of length 
scale 
Logical. If scale = TRUE, each block is standardized to zero means and unit variances (default: TRUE) 
tol 
Numeric, convergence stopping value. 
max.iter 
Integer, the maximum number of iterations. 
near.zero.var 
Logical, see the internal 
logratio 
Character, one of ('none', 'CLR'). Specifies the log ratio transformation used to deal with compositional values that may arise from specific normalisation in sequencing data. Defaults to 'none'. 
multilevel 
sample information for multilevel decomposition for
repeated measurements. A numeric matrix or data frame indicating the
repeated measures on each individual, i.e. the individuals ID. See examples
in 
all.outputs 
Logical. Computation can be faster when some specific
(and nonessential) outputs are not calculated. Default = 
The splsda function fits an sPLS model with 1, …, ncomp components to the factor or class vector Y. The appropriate indicator (dummy) matrix is created.
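To make the indicator (dummy) matrix concrete, here is a minimal base R sketch. It is not the package's internal code; the class labels and the object name `ind.mat` (borrowed from the returned component of the same name) are used for illustration only.

```r
# Hypothetical class labels, for illustration only
Y <- factor(c("A", "B", "A", "C", "B"))

# One column per class: entry is 1 when the sample belongs to that class, else 0
ind.mat <- sapply(levels(Y), function(lv) as.numeric(Y == lv))
rownames(ind.mat) <- seq_along(Y)

ind.mat
```

Each row contains exactly one 1, so the matrix encodes class membership without imposing any ordering on the classes.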
Logratio transformation and multilevel analysis are performed sequentially as internal preprocessing steps, through logratio.transfo and withinVariation respectively. Logratio can only be applied if the data do not contain any 0 value (for count data, we thus advise normalising raw data with an offset of 1).
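The two preprocessing steps can be sketched in base R as follows. This is only an illustration of the idea under stated assumptions, not the code of logratio.transfo or withinVariation: the count matrix and the `subject` factor are hypothetical, the CLR is computed as the log of each value minus the mean log of its sample, and the multilevel step is shown for the one-factor case as the removal of each individual's mean profile.

```r
# Hypothetical count data: 4 samples (rows) x 3 features (columns),
# with two repeated measurements on each of two individuals
counts <- matrix(c(0, 5, 10,
                   3, 0, 7,
                   2, 8, 1,
                   6, 4, 0), nrow = 4, byrow = TRUE)
subject <- factor(c("s1", "s1", "s2", "s2"))

# Step 1: add an offset of 1 so that no value is zero, then apply the
# centered log ratio (CLR) within each sample (row)
log.x <- log(counts + 1)
X.clr <- log.x - rowMeans(log.x)

# Step 2: keep the within-subject part by subtracting, for every feature,
# the mean of each individual's repeated measurements
subject.means <- apply(X.clr, 2, function(col) ave(col, subject))
X.within <- X.clr - subject.means
```

Without the offset, `log(0)` would give `-Inf` and the CLR would be undefined for any sample containing a zero count.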
The type of deflation used is 'regression' for discriminant algorithms, i.e. no deflation is performed on Y.
splsda returns an object of class "splsda", a list that contains the following components:
X 
the centered and standardized original predictor matrix. 
Y 
the centered and standardized indicator response vector or matrix. 
ind.mat 
the indicator matrix. 
ncomp 
the number of components included in the model. 
keepX 
number of X variables kept in the model on each component. 
variates 
list containing the variates. 
loadings 
list containing the estimated loadings for the 
names 
list containing the names to be used for individuals and variables. 
nzv 
list containing the zero- or near-zero-variance predictor information. 
tol 
the tolerance used in the iterative algorithm, used for subsequent S3 methods 
iter 
Number of iterations of the algorithm for each component 
max.iter 
the maximum number of iterations, used for subsequent S3 methods 
scale 
boolean indicating whether the data were scaled in MINT S3 methods 
logratio 
whether logratio transformations were used for compositional data 
explained_variance 
amount of variance explained per component (note that contrary to PCA, this amount may not decrease as the aim of the method is not to maximise the variance, but the covariance between X and the dummy matrix Y). 
mat.c 
matrix of coefficients from the regression of X / residual matrices X on the X-variates, to be used internally by

defl.matrix 
residual matrices X for each dimension. 
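The relationship between the X-variates, the regression coefficients stored in mat.c and the residual matrices in defl.matrix can be sketched for a single component as below. This is an assumption based on the descriptions above, not the package's internal code: the data are simulated, the sparse loading vector `a` is hypothetical, and all object names are illustrative.

```r
set.seed(1)

# Simulated, centered and scaled predictor matrix (5 samples x 4 variables)
X <- scale(matrix(rnorm(20), nrow = 5, ncol = 4))

# Hypothetical sparse loading vector: two variables selected, two set to zero
a <- c(0.8, 0, -0.6, 0)

# X-variate: projection of the samples onto the loading vector
variate <- X %*% a

# Coefficients from regressing each column of X on the variate
c.coef <- crossprod(X, variate) / drop(crossprod(variate))

# Residual (deflated) matrix used to compute the next component
X.defl <- X - variate %*% t(c.coef)
```

After this step the residual matrix is orthogonal to the variate, which is why successive components capture different information in X.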
Florian Rohart, Ignacio González, Kim-Anh Lê Cao, Al J Abadi
On sPLS-DA:
Lê Cao, K.-A., Boitard, S. and Besse, P. (2011). Sparse PLS Discriminant Analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics 12:253.
On log ratio transformations:
Filzmoser, P., Hron, K. and Reimann, C. (2009). Principal component analysis for compositional data with outliers. Environmetrics 20(6), 621-632.
Lê Cao, K.-A., Costello, M.E., Lakis, V.A., Bartolo, F., Chua, X.Y., Brazeilles, R. and Rondeau, P. (2016). MixMC: Multivariate insights into Microbial Communities. PLoS ONE 11(8): e0160169.
On multilevel decomposition:
Westerhuis, J.A., van Velzen, E.J., Hoefsloot, H.C. and Smilde, A.K. (2010). Multivariate paired data analysis: multilevel PLSDA versus OPLSDA. Metabolomics 6(1), 119-128.
Liquet, B., Lê Cao, K.-A., Hocini, H. and Thiebaut, R. (2012). A novel approach for biomarker selection and the integration of repeated measures experiments from two assays. BMC Bioinformatics 13(1), 325.
spls, summary, plotIndiv, plotVar, cim, network, predict, perf, mint.block.splsda, block.splsda and http://www.mixOmics.org for more details.
## First example
data(breast.tumors)
X <- breast.tumors$gene.exp
# Y will be transformed as a factor in the function,
# but we set it as a factor to set up the colors.
Y <- as.factor(breast.tumors$sample$treatment)
res <- splsda(X, Y, ncomp = 2, keepX = c(25, 25))
# individual names appear
plotIndiv(res, ind.names = Y, legend = TRUE, ellipse = TRUE)
## Not run:
## Second example: one-factor analysis with sPLS-DA, selecting a subset of variables
# as in the paper Liquet et al.
#
data(vac18)
X <- vac18$genes
Y <- vac18$stimulation
# sample indicates the repeated measurements
design <- data.frame(sample = vac18$sample)
Y = data.frame(stimul = vac18$stimulation)
# multilevel sPLS-DA model
res.1level <- splsda(X, Y = Y, ncomp = 3, multilevel = design,
                     keepX = c(30, 137, 123))
# set up colors for plotIndiv
col.stim <- c("darkblue", "purple", "green4", "red3")
plotIndiv(res.1level, ind.names = Y, col.per.group = col.stim)
## Third example: two-factor analysis with sPLS-DA, selecting a subset of variables
# as in the paper Liquet et al.
#
data(vac18.simulated) # simulated data
X <- vac18.simulated$genes
design <- data.frame(sample = vac18.simulated$sample)
Y = data.frame(stimu = vac18.simulated$stimulation,
               time = vac18.simulated$time)
res.2level <- splsda(X, Y = Y, ncomp = 2, multilevel = design,
                     keepX = c(200, 200))
plotIndiv(res.2level, group = Y$stimu, ind.names = vac18.simulated$time,
          legend = TRUE, style = 'lattice')
## Fourth example: with more than two classes
#
data(liver.toxicity)
X <- as.matrix(liver.toxicity$gene)
# Y will be transformed as a factor in the function,
# but we set it as a factor to set up the colors.
Y <- as.factor(liver.toxicity$treatment[, 4])
splsda.liver <- splsda(X, Y, ncomp = 2, keepX = c(20, 20))
# individual name is set to the treatment
plotIndiv(splsda.liver, ind.names = Y, ellipse = TRUE, legend = TRUE)
## Fifth example: 16S data with multilevel decomposition and log ratio transformation
#
data(diverse.16S)
splsda.16S = splsda(
  X = diverse.16S$data.TSS,        # TSS normalised data
  Y = diverse.16S$bodysite,
  multilevel = diverse.16S$sample, # multilevel decomposition
  ncomp = 2,
  keepX = c(10, 150),
  logratio = 'CLR')                # CLR log ratio transformation
plotIndiv(splsda.16S, ind.names = FALSE, pch = 16, ellipse = TRUE, legend = TRUE)
# OTUs selected at the family level
diverse.16S$taxonomy[selectVar(splsda.16S, comp = 1)$name, 'Family']
## End(Not run)
