gsvaParam-class: 'gsvaParam' class
In rcastelo/GSVA: Gene Set Variation Analysis for Microarray and RNA-Seq Data

`gsvaParam` class

Description

S4 class for GSVA method parameter objects.

Objects of class gsvaParam contain the parameters for running the GSVA method.

Usage

gsvaParam(
  exprData,
  geneSets,
  assay = NA_character_,
  annotation = NULL,
  minSize = 1,
  maxSize = Inf,
  kcdf = c("auto", "Gaussian", "Poisson", "none"),
  kcdfNoneMinSampleSize = 200,
  tau = 1,
  maxDiff = TRUE,
  absRanking = FALSE,
  sparse = TRUE,
  checkNA = c("auto", "yes", "no"),
  use = c("everything", "all.obs", "na.rm")
)

## S4 replacement method for signature 'gsvaRanksParam,GsvaGeneSets'
geneSets(object) <- value

Arguments

`exprData`	The expression data set. Must be one of the classes supported by `GsvaExprData`. For a list of these classes, see its help page using `help(GsvaExprData)`.
`geneSets`	The gene sets. Must be one of the classes supported by `GsvaGeneSets`. For a list of these classes, see its help page using `help(GsvaGeneSets)`.
`assay`	Character vector of length 1. The name of the assay to use in case `exprData` is a multi-assay container, otherwise ignored. By default, an assay called 'logcounts' will be used if present, otherwise the first assay is used.
`annotation`	An object of class `GeneIdentifierType` from package `GSEABase` describing the gene identifiers used as the row names of the expression data set. See `GeneIdentifierType` for help on available gene identifier types and how to construct them. This information can be used to map gene identifiers occurring in the gene sets. If the default value `NULL` is provided, an attempt will be made to extract the gene identifier type from the expression data set provided as `exprData` (by calling `gsvaAnnotation` on it). If that also fails, `NullIdentifier()` will be used as the gene identifier type, gene identifier mapping will be disabled, and gene identifiers in the expression data set and the gene sets can only be matched directly.
`minSize`	Numeric vector of length 1. Minimum size of the resulting gene sets after gene identifier mapping. By default, the minimum size is 1.
`maxSize`	Numeric vector of length 1. Maximum size of the resulting gene sets after gene identifier mapping. By default, the maximum size is `Inf`.
`kcdf`	Character vector of length 1 denoting the kernel to use during the non-parametric estimation of the empirical cumulative distribution function (ECDF) of expression levels across samples. The value `kcdf="auto"` will allow GSVA to automatically choose one of the possible values. The value `kcdf="Gaussian"` is suitable when input expression values are continuous, such as microarray fluorescent units in logarithmic scale, RNA-seq log-CPMs, log-RPKMs, or log-TPMs. When input expression values are integer counts, such as those derived from RNA-seq experiments, then this argument should be set to `kcdf="Poisson"`. When we do not want to use a kernel approach for the estimation of the ECDF, then we should set `kcdf="none"`.
`kcdfNoneMinSampleSize`	Integer vector of length 1. When `kcdf="auto"`, this parameter sets the minimum sample size at which `kcdf="none"` is chosen, i.e., at which the empirical cumulative distribution function (ECDF) of expression levels across samples is estimated directly, without using a kernel. By default, this value is set to 200; see the `kcdf` slot.
`tau`	Numeric vector of length 1. The exponent defining the weight of the tail in the random walk performed by the `GSVA` (Hänzelmann et al., 2013) method. The default value is 1 as described in the paper.
`maxDiff`	Logical vector of length 1 which offers two approaches to calculate the enrichment statistic (ES) from the KS random walk statistic. `FALSE`: ES is calculated as the maximum distance of the random walk from 0. This approach produces a distribution of enrichment scores that is bimodal, but it can give large enrichment scores to gene sets whose genes are not concordantly activated in one direction only. `TRUE` (the default): ES is calculated as the magnitude difference between the largest positive and negative random walk deviations. This default value gives larger enrichment scores to gene sets whose genes are concordantly activated in one direction only.
`absRanking`	Logical vector of length 1 used only when `maxDiff=TRUE`. When `absRanking=FALSE` (default) a modified Kuiper statistic is used to calculate enrichment scores, taking the magnitude difference between the largest positive and negative random walk deviations. When `absRanking=TRUE` the original Kuiper statistic that sums the largest positive and negative random walk deviations is used.
`sparse`	Logical vector of length 1 used only when the input expression data in `exprData` is stored in a sparse matrix (e.g., a `dgCMatrix` or a `SingleCellExperiment` object storing the expression data in a `dgCMatrix`). In such a case, when `sparse=TRUE` (default), a sparse version of the GSVA algorithm will be applied. Otherwise, when `sparse=FALSE`, the classical version of the GSVA algorithm will be used.
`checkNA`	Character vector of length 1 specifying whether the input expression data should be checked for the presence of missing (`NA`) values. This must be one of the strings `"auto"` (default), `"yes"`, or `"no"`. The default value `"auto"` means that the software will perform that check only when the input expression data is provided as a base `matrix`, an `ExpressionSet` or a `SummarizedExperiment` object, while every other type of input expression data container (e.g., `SingleCellExperiment`, etc.) will not be checked. If `checkNA="yes"`, then the input expression data will be checked for missing values irrespective of the object class of the data container, and if `checkNA="no"`, then that check will not be performed.
`use`	Character vector of length 1 specifying a policy for dealing with missing values (`NA`s) in the input expression data argument `exprData`. It only applies when either `checkNA="yes"` or `checkNA="auto"` (see the `checkNA` parameter). The argument value must be one of the strings `"everything"` (default), `"all.obs"`, or `"na.rm"`. The default policy `"everything"` propagates `NA`s, so that the resulting enrichment score will be `NA` whenever one or more of its contributing values is `NA`, giving a warning when that happens. When `use="all.obs"`, the presence of `NA`s in the input expression data will produce an error. Finally, when `use="na.rm"`, `NA` values in the input expression data will be removed from calculations, giving a warning when that happens, and giving an error if no values are left after removing the `NA` values.
`object`	For the replacement method, an object of class `gsvaRanksParam`.
`value`	For the replacement method, an object of the classes supported by `GsvaGeneSets`.
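
As a minimal sketch of how the `kcdf`, `minSize` and `maxSize` arguments might be combined, the following builds parameter objects from a simulated matrix and a plain list of gene sets. The data, gene names and set names are invented for illustration, and the GSVA calls are only run when an installed GSVA version exports `gsvaParam`.

```r
set.seed(1)

# Simulated data: 50 genes x 20 samples (all names are illustrative only).
genes  <- paste0("g", 1:50)
counts <- matrix(rpois(50 * 20, lambda = 20), nrow = 50,
                 dimnames = list(genes, paste0("s", 1:20)))
logexpr <- log2(counts + 1)

# Gene sets as a named list of gene identifiers matching the row names.
gs <- list(setA = genes[1:10], setB = genes[11:25], setC = genes[26:28])

# Only call GSVA if it is installed and provides the constructor.
has_gsva <- requireNamespace("GSVA", quietly = TRUE) &&
  "gsvaParam" %in% getNamespaceExports("GSVA")

if (has_gsva) {
  # Continuous (log-scale) values: Gaussian kernel.
  p_cont <- GSVA::gsvaParam(logexpr, gs, kcdf = "Gaussian",
                            minSize = 5, maxSize = 20)
  # Integer counts: Poisson kernel.
  p_count <- GSVA::gsvaParam(counts, gs, kcdf = "Poisson",
                             minSize = 5, maxSize = 20)
  # Run the method on the continuous data; gene sets outside the
  # size limits are expected to be filtered out at this stage.
  es <- GSVA::gsva(p_cont)
  dim(es)
}
```

Leaving `kcdf` at its default `"auto"` delegates this choice to GSVA, which is probably the simpler option when the nature of the input values is not known in advance.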

Details

In addition to the common parameter slots inherited from `GsvaMethodParam`, which hold the parameters shared by all methods implemented in package GSVA, this class has slots for the six method-specific parameters of the GSVA method. All of these parameters are described in detail below.

Value

A new gsvaParam object.

Slots

kcdf

Character vector of length 1 denoting the kernel to use during the non-parametric estimation of the empirical cumulative distribution function (ECDF) of expression levels across samples. The value kcdf="auto" will allow GSVA to automatically choose one of the possible values. The value kcdf="Gaussian" is suitable when input expression values are continuous, such as microarray fluorescent units in logarithmic scale, RNA-seq log-CPMs, log-RPKMs, or log-TPMs. When input expression values are integer counts, such as those derived from RNA-seq experiments, then this argument should be set to kcdf="Poisson". When we do not want to use a kernel approach for the estimation of the ECDF, then we should set kcdf="none".

kcdfNoneMinSampleSize

Integer vector of length 1. When kcdf="auto", this parameter sets the minimum sample size at which kcdf="none" is chosen, i.e., at which the empirical cumulative distribution function (ECDF) of expression levels across samples is estimated directly, without using a kernel; see the kcdf slot.
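
To illustrate the difference between a direct and a kernel-based estimate of the ECDF, the base R sketch below computes both for one simulated gene. It is not GSVA's implementation; in particular, the bandwidth `h` is an assumption made for this example.

```r
set.seed(42)

# Expression values of one gene across 30 samples (simulated).
x <- rnorm(30)

# Direct ECDF: proportion of samples with a value <= each observed value.
F_direct <- ecdf(x)(x)

# Gaussian-kernel estimate: average of normal CDFs centred on each sample.
# The bandwidth h is an assumption of this sketch.
h <- sd(x) / 4
F_kernel <- sapply(x, function(v) mean(pnorm((v - x) / h)))

# Show both estimates for the five smallest expression values.
round(cbind(x, F_direct, F_kernel)[order(x)[1:5], ], 3)
```

The direct estimate is a step function, while the kernel estimate is smooth. With many samples the two tend to agree closely, which is one way to read the sample-size threshold described above.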

tau

Numeric vector of length 1. The exponent defining the weight of the tail in the random walk performed by the GSVA (Hänzelmann et al., 2013) method.

maxDiff

Logical vector of length 1 which offers two approaches to calculate the enrichment statistic (ES) from the KS random walk statistic.

FALSE: ES is calculated as the maximum distance of the random walk from 0.
TRUE: ES is calculated as the magnitude difference between the largest positive and negative random walk deviations.

absRanking

Logical vector of length 1 used only when maxDiff=TRUE. When absRanking=FALSE, a modified Kuiper statistic is used to calculate enrichment scores, taking the magnitude difference between the largest positive and negative random walk deviations. When absRanking=TRUE, the original Kuiper statistic, which sums the largest positive and negative random walk deviations, is used. In this latter case, gene sets with genes enriched on either extreme (high or low) will be regarded as 'highly' activated.
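
The three ways of summarising a random walk described in the maxDiff and absRanking slots can be sketched in base R on a toy walk. The walk values are invented and this is an illustration of the definitions above, not GSVA's internal code.

```r
# Toy random walk (running sum) for one gene set in one sample.
walk <- c(0.25, 0.5, 0.25, 0, -0.25, 0)

max_pos <- max(walk, 0)   # largest positive deviation: 0.5
max_neg <- min(walk, 0)   # largest negative deviation: -0.25

# maxDiff = FALSE: the deviation farthest from 0, keeping its sign.
es_max <- if (max_pos > abs(max_neg)) max_pos else max_neg

# maxDiff = TRUE, absRanking = FALSE: difference of the two magnitudes.
es_diff <- max_pos - abs(max_neg)

# maxDiff = TRUE, absRanking = TRUE: sum of the two magnitudes.
es_sum <- max_pos + abs(max_neg)

c(es_max = es_max, es_diff = es_diff, es_sum = es_sum)
# es_max = 0.50, es_diff = 0.25, es_sum = 0.75
```

A walk that deviates equally in both directions would give a difference of 0 but a large sum, which is why the summed statistic treats gene sets enriched at either extreme as activated.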

sparse

Logical vector of length 1 used only when the input expression data in exprData is stored in a sparse matrix (e.g., a dgCMatrix or a container object, such as a SingleCellExperiment, storing the expression data in a dgCMatrix). In such a case, when sparse=TRUE, a sparse version of the GSVA algorithm will be applied. Otherwise, when sparse=FALSE, the classical version of the GSVA algorithm will be used.

checkNA

Character vector of length 1. One of the strings "auto" (default), "yes", or "no", which refer to whether the input expression data should be checked for the presence of missing (NA) values.

didCheckNA

Logical vector of length 1, indicating whether the input expression data was checked for the presence of missing (NA) values.

anyNA

Logical vector of length 1, indicating whether the input expression data contains missing (NA) values.

use

Character vector of length 1. One of the strings "everything" (default), "all.obs", or "na.rm", which refer to three different policies to apply in the presence of missing values in the input expression data; see ssgseaParam.
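
The three policies can be mimicked with a small base R helper. `summarise_with_policy()` is a hypothetical function written for this sketch and is not part of GSVA; it uses a plain mean as a stand-in for an enrichment score.

```r
# Hypothetical helper (not part of GSVA): apply a `use`-style NA policy
# to a numeric vector and return its mean.
summarise_with_policy <- function(x,
                                  use = c("everything", "all.obs", "na.rm")) {
  use <- match.arg(use)
  if (!anyNA(x)) return(mean(x))
  switch(use,
    everything = {
      # Propagate NAs: the result is NA, with a warning.
      warning("NA values propagated")
      NA_real_
    },
    all.obs = stop("NA values present in input"),
    na.rm = {
      # Drop NAs, failing if nothing is left.
      kept <- x[!is.na(x)]
      if (length(kept) == 0) stop("no values left after removing NAs")
      warning("NA values removed")
      mean(kept)
    })
}

x <- c(1, 2, NA, 6)
r_everything <- suppressWarnings(summarise_with_policy(x, "everything"))
r_narm       <- suppressWarnings(summarise_with_policy(x, "na.rm"))
r_allobs     <- tryCatch(summarise_with_policy(x, "all.obs"),
                         error = function(e) "error")
```

Here `r_everything` is `NA`, `r_narm` is the mean of the three remaining values, and `r_allobs` records that an error was raised.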

References

Hänzelmann, S., Castelo, R. and Guinney, J. GSVA: Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics, 14:7, 2013. doi:10.1186/1471-2105-14-7

Examples

library(GSVA)
suppressPackageStartupMessages(library(GSVAdata))

data(leukemia)
data(c2BroadSets)

## for simplicity, use only a subset of the sample data
ses <- leukemia_eset[1:1000, ]
gsc <- c2BroadSets[1:100]
gp1 <- gsvaParam(ses, gsc)
gp1
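
The replacement method listed under Usage applies to `gsvaRanksParam` objects. The sketch below shows one way it might be used, assuming that the installed GSVA version exports `gsvaRanks()` and `gsvaScores()` to compute ranks and scores in two steps. The simulated matrix and gene set names are invented for illustration.

```r
set.seed(7)

# Simulated data: 60 genes x 15 samples (all names are illustrative only).
genes <- paste0("g", 1:60)
expr  <- matrix(rnorm(60 * 15), nrow = 60,
                dimnames = list(genes, paste0("s", 1:15)))
sets1 <- list(setA = genes[1:12],  setB = genes[13:30])
sets2 <- list(setC = genes[31:45], setD = genes[40:60])

# Only run if GSVA is installed and exports the functions assumed here.
needed <- c("gsvaParam", "gsvaRanks", "gsvaScores", "geneSets<-")
has_ranks <- requireNamespace("GSVA", quietly = TRUE) &&
  all(needed %in% getNamespaceExports("GSVA"))

if (has_ranks) {
  library(GSVA)  # attach so that the replacement method is found
  gp  <- gsvaParam(expr, sets1, kcdf = "Gaussian")
  rp  <- gsvaRanks(gp)      # expected to return a gsvaRanksParam object
  es1 <- gsvaScores(rp)     # scores for the first collection of gene sets
  geneSets(rp) <- sets2     # swap in new gene sets, keeping the ranks
  es2 <- gsvaScores(rp)     # scores for the second collection
  rownames(es2)
}
```

Replacing the gene sets on the ranks object avoids recomputing the ranks, which does not depend on the gene sets, when the same expression data is scored against several collections.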

rcastelo/GSVA documentation built on June 14, 2025, 6:38 p.m.