Partial Least Squares Discriminant Analysis for IsomirDataSeq

Share:

Description

Use PLS-DA method with the normalized count data to detect the most important features (miRNAs/isomiRs) that explain better the group of samples given by the experimental design. It is a supervised clustering method with permutations to calculate the significance of the analysis.

Usage

1
2
isoPLSDA(ids, group, validation = NULL, learn = NULL, test = NULL,
 tol = 0.001, nperm = 400, refinment = FALSE, vip = 1.2)

Arguments

ids

object of class IsomirDataSeq

group

column name in colData(ids) to use as variable to explain.

validation

type of validation, either NULL or "learntest". Default NULL

learn

optional vector of indexes for a learn-set. Only used when validation="learntest". Default NULL

test

optional vector of indices for a test-set. Only used when validation="learntest". Default NULL

tol

tolerance value based on maximum change of cumulative R-squared coefficient for each additional PLS component. Default tol=0.001

nperm

number of permutations to compute the PLD-DA p-value based on R2 magnitude. Default nperm=400

refinment

logical indicating whether a refined model, based on filtering out variables with low VIP values

vip

Variance Importance in Projection threshold value when a refinement process is considered. Default vip=1.2

Details

Partial Least Squares Discriminant Analysis (PLS-DA) is a technique specifically appropriate for analysis of high dimensionality data sets and multicollinearity (Perez-Enciso, 2013). PLS-DA is a supervised method (i.e. makes use of class labels) with the aim to provide a dimension reduction strategy in a situation where we want to relate a binary response variable (in our case young or old status) to a set of predictor variables. Dimensionality reduction procedure is based on orthogonal transformations of the original variables (miRNAs/isomiRs) into a set of linearly uncorrelated latent variables (usually termed as components) such that maximizes the separation between the different classes in the first few components (Xia, 2011). We used sum of squares captured by the model (R2) as a goodness of fit measure.

We implemented this method using the DiscriMiner-package into isoPLSDA function. The output p-value of this function will tell about the statistical significant of the group separation using miRNA/isomiR expression data.

Read more about the parameters related to the PLS-DA directly from plsDA function.

Value

A list with the following elements: R2Matrix (R-squared coefficients of the PLS model), components (of the PLS, similar to PCs in a PCA), vip (most important isomiRs/miRNAs), group (classification of the samples), p.value and R2PermutationVector obtained by the permutations.

If the option refinment is set to TRUE, then the following elements will appear: R2RefinedMatrix and componentsRefinedModel (R-squared coefficients of the PLS model only using the most important miRNAs/isomiRs). As well, p.valRefined and R2RefinedPermutationVector with p-value and R2 of the permutations where samples were randomized. And finally, p.valRefinedFixed and R2RefinedFixedPermutationVector with p-value and R2 of the permutations where miRNAs/isomiRs were randomized.

References

Perez-Enciso, Miguel and Tenenhaus, Michel. Prediction of clinical outcome with microarray data: a partial least squares discriminant analysis (PLS-DA) approach. Human Genetics. 2003.

Xia, Jianguo and Wishart, David S. Web-based inference of biological patterns, functions and pathways from metabolomic data using MetaboAnalyst. Nature Protocols. 2011.

Examples

1
2
3
4
5
6
7
data(mirData)
# Only miRNAs with > 10 reads in all samples.
ids <- isoCounts(mirData, minc=10, mins=6)
ids <- isoNorm(ids)
pls.ids = isoPLSDA(ids, "condition", nperm = 2)
cat(paste0("pval:",pls.ids$p.val))
cat(paste0("components:",pls.ids$components))