isoPLSDA: Partial Least Squares Discriminant Analysis for...
In isomiRs: Analyze isomiRs and miRNAs from small RNA-seq

Description Usage Arguments Details Value References Examples

Use PLS-DA method with the normalized count data to detect the most important features (miRNAs/isomiRs) that explain better the group of samples given by the experimental design. It is a supervised clustering method with permutations to calculate the significance of the analysis.

1 2	isoPLSDA(ids, group, validation = NULL, learn = NULL, test = NULL, tol = 0.001, nperm = 400, refinment = FALSE, vip = 1.2)

`ids`	Object of class `IsomirDataSeq`
`group`	Column name in `colData(ids)` to use as variable to explain.
`validation`	Type of validation, either NULL or "learntest". Default NULL.
`learn`	Optional vector of indexes for a learn-set. Only used when validation="learntest". Default NULL.
`test`	Optional vector of indices for a test-set. Only used when validation="learntest". Default NULL
`tol`	Tolerance value based on maximum change of cumulative R-squared coefficient for each additional PLS component. Default tol=0.001.
`nperm`	Number of permutations to compute the PLD-DA p-value based on R2 magnitude. Default nperm=400.
`refinment`	Logical indicating whether a refined model, based on filtering out variables with low VIP values.
`vip`	Variance Importance in Projection threshold value when a refinement process is considered. Default vip=1.2 .

Partial Least Squares Discriminant Analysis (PLS-DA) is a technique specifically appropriate for analysis of high dimensionality data sets and multicollinearity (Perez-Enciso, 2013). PLS-DA is a supervised method (i.e. makes use of class labels) with the aim to provide a dimension reduction strategy in a situation where we want to relate a binary response variable (in our case young or old status) to a set of predictor variables. Dimensionality reduction procedure is based on orthogonal transformations of the original variables (miRNAs/isomiRs) into a set of linearly uncorrelated latent variables (usually termed as components) such that maximizes the separation between the different classes in the first few components (Xia, 2011). We used sum of squares captured by the model (R2) as a goodness of fit measure.

We implemented this method using the DiscriMiner::DiscriMiner-package into isoPLSDA() function. The output p-value of this function will tell about the statistical significant of the group separation using miRNA/isomiR expression data.

Read more about the parameters related to the PLS-DA directly from DiscriMiner::plsDA() function.

A base::list with the following elements: R2Matrix (R-squared coefficients of the PLS model), components (of the PLS, similar to PCs in a PCA), vip (most important isomiRs/miRNAs), group (classification of the samples), p.value and R2PermutationVector obtained by the permutations.

If the option refinment is set to TRUE, then the following elements will appear: R2RefinedMatrix and componentsRefinedModel (R-squared coefficients of the PLS model only using the most important miRNAs/isomiRs). As well, p.valRefined and R2RefinedPermutationVector with p-value and R2 of the permutations where samples were randomized. And finally, p.valRefinedFixed and R2RefinedFixedPermutationVector with p-value and R2 of the permutations where miRNAs/isomiRs were randomized.

Perez-Enciso, Miguel and Tenenhaus, Michel. Prediction of clinical outcome with microarray data: a partial least squares discriminant analysis (PLS-DA) approach. Human Genetics. 2003.

Xia, Jianguo and Wishart, David S. Web-based inference of biological patterns, functions and pathways from metabolomic data using MetaboAnalyst. Nature Protocols. 2011.

data(mirData)
# Only miRNAs with > 10 reads in all samples.
ids <- isoCounts(mirData, minc=10, mins=6)
ids <- isoNorm(ids, formula=~condition)
pls.ids = isoPLSDA(ids, "condition", nperm = 2)
cat(paste0("pval:",pls.ids$p.val))
cat(paste0("components:",pls.ids$components))