
This function extracts surrogate estimates of the hidden variables in the data by applying the partial least squares (PLS) algorithm to two multivariate random matrices. It provides the user with two options:

(1) **Unsupervised SVAPLS**: Here a standard linear regression model is first fitted to
a transformed version of the expression count matrix to estimate the primary signals
of differential expression for all the features. The fitted model residuals and the
transformed count matrix are then organized into two multivariate matrices, `E` and `Y`
respectively, in such a way that each column corresponds to a certain feature.
`Y` is then regressed on `E` using a non-linear partial least squares (NPLS)
algorithm, and the extracted factor estimates (scores) in the column-space of `Y`
are taken as the surrogate variables.

(2) **Supervised SVAPLS**: If information on a set of control features (control genes, transcripts, spike-ins, etc.)
is provided, this function uses a non-linear partial least squares (NPLS) algorithm to regress `Y` on another expression
matrix, `Y.cont`, corresponding to the set of controls. The factor estimates (scores) in the column-space of `Y.cont`
are taken as the surrogate variables.
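The two options can be illustrated with a minimal base-R sketch on simulated counts. This is not the package's implementation: the simulated data, the features-in-rows layout of `dat`, the helper `first_pls_scores` and the names `sv_unsup` and `sv_sup` are all hypothetical, and a single component of ordinary (linear) PLS stands in for the NPLS fit.

```r
set.seed(42)

# Simulated data (hypothetical): 200 features in rows, 12 samples in two groups.
n_feat <- 200
n_samp <- 12
group  <- factor(rep(c("A", "B"), each = n_samp / 2))
dat    <- matrix(rpois(n_feat * n_samp, lambda = 50), nrow = n_feat)

# Transformed counts with one column per feature (log with an offset of 1).
Y <- t(log(dat + 1))

# Residuals from a linear model of every feature on the group factor.
E <- residuals(lm(Y ~ group))

# First pair of PLS scores, taken from the SVD of the cross-product matrix.
# This is ordinary linear PLS, used here only as a stand-in for NPLS.
first_pls_scores <- function(X, Yresp) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  Yc <- scale(Yresp, center = TRUE, scale = FALSE)
  s  <- svd(crossprod(Xc, Yc), nu = 1, nv = 1)
  list(x = drop(Xc %*% s$u),   # score in the column-space of X
       y = drop(Yc %*% s$v))   # score in the column-space of Yresp
}

# Unsupervised: regress Y on E and keep the score in the column-space of Y.
sv_unsup <- first_pls_scores(E, Y)$y

# Supervised: regress Y on the control columns and keep the Y.cont score.
controls <- 1:20                          # hypothetical control feature indices
Y.cont   <- Y[, controls, drop = FALSE]
sv_sup   <- first_pls_scores(Y.cont, Y)$x
```

Each result is a vector with one value per sample, which is the shape a candidate surrogate variable needs to have before the selection step.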

An optimal subset of these variables is then selected either manually by the user (manual selection) or by testing them for
statistical significance (automatic selection). For the automatic selection, the function regresses the first right
singular vector of the residual matrix `E` (for Unsupervised SVAPLS) or the control matrix `Y.cont`
(for Supervised SVAPLS) on all the surrogate variables, and the estimated regression coefficients are used to perform
a t-test with a user-specified p-value cutoff. The variables yielding a p-value below the cutoff are returned
as the optimal surrogate variables.
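The automatic selection step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the candidate variables `sv` and the matrix `resid_mat` are simulated, and `resid_mat` is oriented as features by samples so that its first right singular vector has one entry per sample.

```r
set.seed(1)

n_samp <- 20
n_feat <- 50

# Three hypothetical candidate surrogate variables (one row per sample).
sv <- data.frame(sv1 = rnorm(n_samp), sv2 = rnorm(n_samp), sv3 = rnorm(n_samp))

# Simulated residual matrix (features x samples) driven by sv1 plus small noise.
loadings  <- rnorm(n_feat)
resid_mat <- 3 * outer(loadings, sv$sv1) +
  matrix(rnorm(n_feat * n_samp, sd = 0.1), nrow = n_feat)

# First right singular vector: one entry per sample.
v1 <- svd(resid_mat)$v[, 1]

# Regress it on all candidates and read off the coefficient t-test p-values
# (the first row of the coefficient table is the intercept, so it is dropped).
fit   <- lm(v1 ~ ., data = data.frame(v1 = v1, sv))
pvals <- summary(fit)$coefficients[-1, "Pr(>|t|)"]

# Keep the candidates whose p-value falls below the cutoff.
cutoff   <- 1e-07
selected <- names(pvals)[pvals < cutoff]
surr     <- sv[, selected, drop = FALSE]
```

Because the simulated residuals are built from `sv1` alone, that candidate should clear the cutoff while the unrelated ones are expected not to.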


| Argument | Description |
|---|---|
| `dat` | The original feature expression count matrix. |
| `group` | A factor representing the sample indices belonging to the two different groups. |
| `controls` | The set of control features with no differential expression between the two groups (set to NULL by default). |
| `phi` | The transforming function to be applied on the original feature expression count data (set to be log function with an offset |
| `const` | The offset parameter for the transforming function. |
| `pls.method` | The non-linear partial least squares method to be used. The available options are the classical orthogonal scores algorithm ("oscorespls", default), the kernel algorithm ("kernelpls") and the wide kernel algorithm ("widekernelpls"). The "oscorespls" option is recommended for producing mutually orthogonal surrogate variables. |
| `max.surrs` | The maximum number of factor estimates to be extracted from the NPLS algorithm (set to 3 by default). |
| `opt.surrs` | The index vector of factor estimates to be taken as the optimal surrogate variables (used for manual selection only). |
| `surr.select` | The method for selecting the optimal surrogate variables ("automatic" or "manual"). |
| `cutoff` | The user-specified p-value cutoff for testing the significance of the extracted surrogate variables (set to 1e-07 by default; used for "automatic" selection only). |
| `parallel` | Logical, indicating if the computations should be parallelized or not (set to |
| `num.cores` | The requested number of cores to be used in the parallel computations inside the function (used only when |
| `plot` | Logical, if |

| Value | Description |
|---|---|
| `surr` | A `data.frame` of the optimal surrogate variables. |
| `prop.vars` | A vector of the variance proportions explained by the variables in `surr`. |

