svplsSurr: svplsSurr


View source: R/svplsSurr.R

Description

This function extracts surrogate estimates of the hidden variables in the data by applying the partial least squares (PLS) algorithm to two multivariate random matrices. It provides the user with two options:

(1) Unsupervised SVAPLS: A standard linear regression model is first fitted to a transformed version of the expression count matrix to estimate the primary signals of differential expression for all the features. The fitted model residuals and the transformed count matrix are then organized into two multivariate matrices, E and Y respectively, such that each column corresponds to one feature. Y is then regressed on E using a non-linear partial least squares (NPLS) algorithm, and the extracted factor estimates (scores) in the column-space of Y are taken as the surrogate variables.

(2) Supervised SVAPLS: If information on a set of control features (control genes, transcripts, spike-ins, etc.) is provided, the function uses a non-linear partial least squares (NPLS) algorithm to regress Y on another expression matrix, Y.cont, corresponding to the set of controls. The factor estimates (scores) in the column-space of Y.cont are taken as the surrogate variables.
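The matrices named in the two options above can be sketched in base R. This is only an illustration on made-up counts, not the package's internal code: the objects counts and ctrl.idx and the simulated data are hypothetical, and the NPLS regression step itself is left out.

```r
set.seed(1)
# Hypothetical toy data: 50 features (rows) x 20 samples (columns) of counts
counts <- matrix(rpois(50 * 20, lambda = 20), nrow = 50)
group <- factor(c(rep(1, 10), rep(-1, 10)))

# Transform with the default phi (log with offset const = 1), then transpose
# so that each column of Y corresponds to one feature
const <- 1
phi <- function(x) log(x + const)
Y <- t(phi(counts))                 # 20 samples x 50 features

# Unsupervised option: feature-wise linear model on the group factor;
# lm() accepts a matrix response and fits every column at once
fit <- lm(Y ~ group)
E <- residuals(fit)                 # residual matrix, same shape as Y

# Supervised option: the sub-matrix of Y for an assumed set of control features
ctrl.idx <- 1:5                     # hypothetical control feature indices
Y.cont <- Y[, ctrl.idx, drop = FALSE]

dim(E)
dim(Y.cont)
```

In this sketch E and Y (or Y and Y.cont) are the two matrices that would be handed to the PLS step.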

An optimal subset of these variables is then selected either manually by the user (manual selection) or by testing them for statistical significance (automatic selection). For automatic selection, the function regresses the first right singular vector of the residual matrix E (for Unsupervised SVAPLS) or of the control matrix Y.cont (for Supervised SVAPLS) on all the surrogate variables, and the estimated regression coefficients are tested with a t-test against a user-specified p-value cutoff. The variables yielding a p-value below the cutoff are returned as the optimal surrogate variables.
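The automatic selection rule can be sketched in base R as follows. This is a hedged illustration, not the package's implementation: surrs and E are simulated, hypothetical objects, and the singular vector is taken along the sample dimension (svd()$u for a samples x features matrix), which is an assumption about orientation.

```r
set.seed(2)
n <- 20
# Hypothetical candidate surrogate variables (columns), one value per sample
surrs <- matrix(rnorm(n * 3), nrow = n,
                dimnames = list(NULL, c("sv1", "sv2", "sv3")))

# Hypothetical residual matrix E (samples x features) carrying a hidden
# effect driven by sv1, plus a small amount of noise
E <- outer(surrs[, "sv1"], rnorm(50)) +
  matrix(rnorm(n * 50, sd = 0.1), nrow = n)

# First singular vector of E with one entry per sample
v1 <- svd(E)$u[, 1]

# Regress it on all candidates and read off the t-test p-values
fit <- lm(v1 ~ surrs)
pvals <- summary(fit)$coefficients[-1, "Pr(>|t|)"]   # drop the intercept row

# Keep the candidates whose p-value falls below the cutoff
cutoff <- 10^-7
selected <- surrs[, pvals < cutoff, drop = FALSE]
colnames(selected)
```

Only candidates that explain the dominant direction of variation in E pass such a strict cutoff; raising cutoff keeps more variables.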

Usage

svplsSurr(dat, group, controls = NULL, phi = function(x) log(x + const),
  const = 1, pls.method = "oscorespls", max.surrs = 3, opt.surrs = 1,
  surr.select = c("automatic", "manual"), cutoff = 10^-7,
  parallel = FALSE, num.cores = NULL, plot = FALSE)

Arguments

dat

The original feature expression count matrix.

group

A factor indicating which of the two groups each sample belongs to.

controls

The set of control features with no differential expression between the two groups (set to NULL by default).

phi

The transforming function to be applied on the original feature expression count data (set to be log function with an offset const).

const

The offset parameter for the transforming function phi (set to 1 by default).
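As a small illustration of how phi and const interact, the default transform can be applied to a toy matrix directly. The counts object and the alternative phi2 are hypothetical and chosen only for demonstration.

```r
# Hypothetical toy counts, including a zero
counts <- matrix(c(0, 5, 10, 100), nrow = 2)

# Default form from the Usage section: log with offset const
const <- 1
phi <- function(x) log(x + const)
phi(counts)                          # the zero count maps to log(1) = 0

# A user-supplied alternative (hypothetical choice): log2 with a 0.5 offset
phi2 <- function(x) log2(x + 0.5)
phi2(counts)
```

The offset keeps zero counts finite under the log. Going by the signature in Usage, a custom transform would be supplied through the phi argument.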

pls.method

The non-linear partial least squares method to be used. The available options are the classical orthogonal scores algorithm ("oscorespls", default), the kernel algorithm ("kernelpls") and the wide kernel algorithm ("widekernelpls"). The "oscorespls" option is recommended because it produces mutually orthogonal surrogate variables.

max.surrs

The maximum number of factor estimates to be extracted from the NPLS algorithm (set to 3 by default).

opt.surrs

The index vector of factor estimates to be taken as the optimal surrogate variables (used for manual selection only).

surr.select

The method for selecting the optimal surrogate variables ("automatic" or "manual").

cutoff

The user-specified p-value cutoff for testing the significance of the extracted surrogate variables (set to 1e-07 by default; used for "automatic" selection only).

parallel

Logical, indicating if the computations should be parallelized or not (set to FALSE by default).

num.cores

The requested number of cores to be used in the parallel computations inside the function (used only when parallel is TRUE, NULL by default).

plot

Logical, if TRUE a barplot of the variance proportions explained by the significant surrogate variables is returned (set to FALSE by default).

Value

surr A data.frame of the optimal surrogate variables.

prop.vars A vector of the variance proportions explained by the variables in surr.

Examples

## Loading the package and a simulated RNAseq gene expression count dataset
library(SVAPLSseq)
data(sim.dat)

## Extracting the optimal surrogate variables
## (two groups of 10 samples each, coded 1 and -1)
group <- as.factor(c(rep(1, 10), rep(-1, 10)))
sv <- svplsSurr(dat = sim.dat, group = group, surr.select = "automatic")
slotNames(sv)
head(surr(sv))
head(prop.vars(sv))

SVAPLSseq documentation built on April 28, 2020, 6:30 p.m.