splsda: Sparse Partial Least Squares Discriminant Analysis (sPLS-DA)
In mixOmics: Omics Data Integration Project

Description Usage Arguments Details Value Author(s) References See Also Examples

Function to perform sparse Partial Least Squares to classify samples (supervised analysis) and select variables.

splsda(X,
       Y,
       ncomp = 2,
       mode = c("regression", "canonical", "invariant", "classic"),
       keepX,
       scale = TRUE,
       tol = 1e-06,
       max.iter = 100,
       near.zero.var = FALSE,
       logratio = "none",  # one of "none", "CLR"
       multilevel = NULL,
       all.outputs = TRUE)

`X`	numeric matrix of predictors. `NA`s are allowed.
`Y`	a factor or a class vector for the discrete outcome.
`ncomp`	the number of components to include in the model (see Details), an integer from one to the rank of `X`. Default is 2.
`mode`	character string. What type of algorithm to use, (partially) matching one of `"regression"`, `"canonical"`, `"invariant"` or `"classic"`. See Details.
`keepX`	numeric vector of length `ncomp`, the number of variables to keep in X-loadings. By default all variables are kept in the model.
`scale`	boolean. If `scale = TRUE`, each block is standardized to zero mean and unit variance (default: `TRUE`).
`tol`	Convergence stopping value.
`max.iter`	integer, the maximum number of iterations.
`near.zero.var`	boolean, see the internal `nearZeroVar` function (should be set to TRUE in particular for data with many zero values). Setting this argument to FALSE (when appropriate) will speed up the computations. Default value is FALSE.
`logratio`	one of ('none','CLR'). Specifies the log ratio transformation used to deal with compositional values that may arise from specific normalisation in sequencing data. Defaults to 'none'.
`multilevel`	sample information for multilevel decomposition for repeated measurements. A numeric matrix or data frame indicating the repeated measures on each individual, i.e. the individuals' IDs. See Examples.
`all.outputs`	boolean. Computation can be faster when some specific (and non-essential) outputs are not calculated. Default = `TRUE`.
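To give a feel for the `multilevel` argument, here is a minimal base-R sketch of a one-factor within-individual decomposition. It assumes that, for a single factor, the decomposition amounts to subtracting each individual's mean profile from that individual's observations. The helper name `within_variation` and the toy data are hypothetical; this is not the package's `withinVariation` code, which may differ in detail.

```r
# Hypothetical helper (not part of mixOmics): remove between-individual
# variation by subtracting each individual's mean profile.
# Assumes a one-factor repeated-measures design.
within_variation <- function(X, id) {
  id <- droplevels(as.factor(id))
  # one row of column means per individual, rows named by individual ID
  ind.means <- rowsum(X, id) / as.vector(table(id))
  X - ind.means[as.character(id), , drop = FALSE]
}

# Toy data: 2 individuals, each measured twice, on 2 variables
X <- matrix(c(1, 3, 10, 14,
              2, 4, 20, 24), ncol = 2,
            dimnames = list(NULL, c("gene1", "gene2")))
design <- data.frame(sample = c(1, 1, 2, 2))  # individual ID for each row of X

Xw <- within_variation(X, design$sample)
print(Xw)
```

After this step every individual has a zero mean on every variable, so what remains is the variation within individuals.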

The splsda function fits an sPLS model with 1, …, ncomp components to the factor or class vector Y. The appropriate indicator (dummy) matrix is created. The logratio transformation and the multilevel decomposition are applied sequentially as internal pre-processing steps, through logratio.transfo and withinVariation respectively.
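As an illustration of the indicator (dummy) matrix mentioned above, the following base-R sketch builds one 0/1 column per class level. The helper name `make_indicator` and the class labels are hypothetical; this is a sketch of the idea, not the code used inside splsda.

```r
# Hypothetical helper (not part of mixOmics): build a samples x classes
# 0/1 indicator matrix from a factor or class vector.
make_indicator <- function(Y) {
  Y <- as.factor(Y)
  ind <- sapply(levels(Y), function(lv) as.numeric(Y == lv))
  # force a matrix even when there is a single sample or a single level
  matrix(ind, nrow = length(Y), dimnames = list(NULL, levels(Y)))
}

Y <- c("ctrl", "treated", "ctrl", "relapse")
ind.mat <- make_indicator(Y)
print(ind.mat)
```

Each row contains exactly one 1, in the column of the class that sample belongs to.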

The logratio transformation can only be applied if the data do not contain any 0 values (for count data, we thus advise normalising the raw data with an offset of 1).
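The zero restriction follows from the logarithm in the transformation. The base-R sketch below applies an offset of 1 to toy counts, converts them to proportions, and then computes a centred log-ratio (CLR) per sample. The helper name `clr_rows` and the toy counts are hypothetical, and the package's `logratio.transfo` may differ in detail.

```r
# Hypothetical helper (not part of mixOmics): centred log-ratio (CLR)
# transform applied to each sample (row).
clr_rows <- function(X) {
  stopifnot(all(X > 0))   # log is undefined for zero or negative values
  logX <- log(X)
  logX - rowMeans(logX)   # subtract each row's mean log (log of its geometric mean)
}

counts <- matrix(c(0, 5, 10,
                   3, 0, 7), nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), c("otu1", "otu2", "otu3")))

counts.offset <- counts + 1                      # offset of 1 removes the zeros
props <- counts.offset / rowSums(counts.offset)  # per-sample proportions
X.clr <- clr_rows(props)
print(round(X.clr, 3))
```

Calling `clr_rows(counts)` directly would stop at the positivity check, since the raw counts contain zeros.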

See ?pls for more details about the PLS modes.

splsda returns an object of class "splsda", a list that contains the following components:

`X`	the centered and standardized original predictor matrix.
`Y`	the centered and standardized indicator response vector or matrix.
`ind.mat`	the indicator matrix.
`ncomp`	the number of components included in the model.
`keepX`	number of X variables kept in the model on each component.
`variates`	list containing the variates.
`loadings`	list containing the estimated loadings for the `X` and `Y` variates.
`names`	list containing the names to be used for individuals and variables.
`nzv`	list containing the zero- or near-zero predictors information.
`tol`	the tolerance used in the iterative algorithm, used for subsequent S3 methods.
`iter`	number of iterations of the algorithm for each component.
`max.iter`	the maximum number of iterations, used for subsequent S3 methods.
`scale`	boolean indicating whether the data were scaled in MINT S3 methods
`logratio`	whether logratio transformations were used for compositional data
`explained_variance`	amount of variance explained per component (note that contrary to PCA, this amount may not decrease as the aim of the method is not to maximise the variance, but the covariance between X and the dummy matrix Y).
`mat.c`	matrix of coefficients from the regression of X / residual matrices X on the X-variates, to be used internally by `predict`.
`defl.matrix`	residual matrices X for each dimension.

Florian Rohart, Ignacio González, Kim-Anh Lê Cao.

On sPLS-DA:

Lê Cao, K.-A., Boitard, S. and Besse, P. (2011). Sparse PLS Discriminant Analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics 12:253.

An overview as part of mixOmics: Rohart F, Gautier B, Singh A, Lê Cao K-A (2017) mixOmics: An R package for ‘omics feature selection and multiple data integration. PLoS Comput Biol 13(11): e1005752. https://doi.org/10.1371/journal.pcbi.1005752

On log ratio transformations:

Filzmoser, P., Hron, K., Reimann, C.: Principal component analysis for compositional data with outliers. Environmetrics 20(6), 621-632 (2009)

Lê Cao K.-A., Costello ME, Lakis VA, Bartolo F, Chua XY, Brazeilles R, Rondeau P. MixMC: Multivariate insights into Microbial Communities. PLoS ONE, 11(8): e0160169 (2016).

On multilevel decomposition:

Westerhuis, J.A., van Velzen, E.J., Hoefsloot, H.C., Smilde, A.K.: Multivariate paired data analysis: multilevel plsda versus oplsda. Metabolomics 6(1), 119-128 (2010)

Liquet, B., Lê Cao K.-A., Hocini, H., Thiebaut, R.: A novel approach for biomarker selection and the integration of repeated measures experiments from two assays. BMC bioinformatics 13(1), 325 (2012)

spls, summary, plotIndiv, plotVar, cim, network, predict, perf, mint.block.splsda, block.splsda and http://www.mixOmics.org for more details.

## First example
data(breast.tumors)
X <- breast.tumors$gene.exp
# Y will be transformed as a factor in the function,
# but we set it as a factor to set up the colors.
Y <- as.factor(breast.tumors$sample$treatment)

res <- splsda(X, Y, ncomp = 2, keepX = c(25, 25))


# individual names appear
plotIndiv(res, ind.names = Y, legend = TRUE, ellipse = TRUE)

## Second example: one-factor analysis with sPLS-DA, selecting a subset of variables
# as in the paper Liquet et al.
#--------------------------------------------------------------
data(vac18)
X <- vac18$genes
Y <- vac18$stimulation
# sample indicates the repeated measurements
design <- data.frame(sample = vac18$sample)
Y = data.frame(stimul = vac18$stimulation)

# multilevel sPLS-DA model
res.1level <- splsda(X, Y = Y, ncomp = 3, multilevel = design,
    keepX = c(30, 137, 123))

# set up colors for plotIndiv
col.stim <- c("darkblue", "purple", "green4","red3")
plotIndiv(res.1level, ind.names = Y, col.per.group = col.stim)

## Third example: two-factor analysis with sPLS-DA, selecting a subset of variables
# as in the paper Liquet et al.
#--------------------------------------------------------------
## Not run: 
data(vac18.simulated) # simulated data

X <- vac18.simulated$genes
design <- data.frame(sample = vac18.simulated$sample)
Y = data.frame( stimu = vac18.simulated$stimulation,
                time = vac18.simulated$time)

res.2level <- splsda(X, Y = Y, ncomp = 2, multilevel = design,
    keepX = c(200, 200))

plotIndiv(res.2level, group = Y$stimu, ind.names = vac18.simulated$time,
    legend = TRUE, style = 'lattice')

## End(Not run)


## Fourth example: with more than two classes
# ------------------------------------------------
## Not run: 
data(liver.toxicity)
X <- as.matrix(liver.toxicity$gene)
# Y will be transformed as a factor in the function,
# but we set it as a factor to set up the colors.
Y <- as.factor(liver.toxicity$treatment[, 4])

splsda.liver <- splsda(X, Y, ncomp = 2, keepX = c(20, 20))

# individual name is set to the treatment
plotIndiv(splsda.liver, ind.names = Y, ellipse = TRUE, legend = TRUE)

## End(Not run)

## Fifth example: 16S data with multilevel decomposition and log ratio transformation
# ------------------------------------------------
## Not run: 
splsda.16S = splsda(
    X = diverse.16S$data.TSS,         # TSS normalised data
    Y = diverse.16S$bodysite,
    multilevel = diverse.16S$sample,  # multilevel decomposition
    ncomp = 2,
    keepX = c(10, 150),
    logratio = 'CLR')                 # CLR log ratio transformation


plotIndiv(splsda.16S, ind.names = FALSE, pch = 16, ellipse = TRUE, legend = TRUE)
#OTUs selected at the family level
diverse.16S$taxonomy[selectVar(splsda.16S, comp = 1)$name,'Family']

## End(Not run)

Loading required package: MASS
Loading required package: lattice
Loading required package: ggplot2

Loaded mixOmics 6.2.0

Visit http://www.mixOmics.org for more details about our methods.
Any bug reports or comments? Notify us at mixomics at math.univ-toulouse.fr or https://bitbucket.org/klecao/package-mixomics/issues

Thank you for using mixOmics!
Splitting the variation for 1 level factor.
Splitting the variation for 2 level factors.
Splitting the variation for 1 level factor.
        OTU_97.38174         OTU_97.39439           OTU_97.108 
 "Flavobacteriaceae"   "Streptococcaceae"      "Neisseriaceae" 
           OTU_97.20         OTU_97.39456            OTU_97.55 
  "Burkholderiaceae"   "Streptococcaceae" "Campylobacteraceae" 
        OTU_97.29530            OTU_97.46         OTU_97.33396 
  "Streptococcaceae"      "Neisseriaceae"     "Micrococcaceae" 
         OTU_97.1893 
    "Micrococcaceae"

mixOmics documentation built on June 1, 2018, 5:06 p.m.
