gsva: gsva
In oppar: Outlier profile and pathway analysis in R

Gene Set Variation Analysis

gsva(expr, gset.idx.list, ...)

## S4 method for signature 'ExpressionSet,list'
gsva(expr, gset.idx.list, annotation,
  method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE,
  abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0,
  bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK",
  mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25, NA),
  kernel = TRUE, ssgsea.norm = TRUE, verbose = TRUE,
  is.gset.list.up.down = FALSE)

## S4 method for signature 'ExpressionSet,GeneSetCollection'
gsva(expr, gset.idx.list,
  annotation, method = c("gsva", "ssgsea", "zscore", "plage"),
  rnaseq = FALSE, abs.ranking = FALSE, min.sz = 1, max.sz = Inf,
  no.bootstraps = 0, bootstrap.percent = 0.632, parallel.sz = 0,
  parallel.type = "SOCK", mx.diff = TRUE, tau = switch(method, gsva = 1,
  ssgsea = 0.25, NA), kernel = TRUE, ssgsea.norm = TRUE, verbose = TRUE,
  is.gset.list.up.down = FALSE)

## S4 method for signature 'matrix,GeneSetCollection'
gsva(expr, gset.idx.list, annotation,
  method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE,
  abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0,
  bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK",
  mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25, NA),
  kernel = TRUE, ssgsea.norm = TRUE, verbose = TRUE,
  is.gset.list.up.down = FALSE)

## S4 method for signature 'matrix,list'
gsva(expr, gset.idx.list, annotation,
  method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE,
  abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0,
  bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK",
  mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25, NA),
  kernel = TRUE, ssgsea.norm = TRUE, verbose = TRUE,
  is.gset.list.up.down = FALSE)

`expr`	Gene expression data which can be given either as an `ExpressionSet` object or as a matrix of expression values where rows correspond to genes and columns correspond to samples.
`gset.idx.list`	Gene sets provided either as a `list` object or as a `GeneSetCollection` object.
`...`	Other optional arguments.
`annotation`	In the case of calling `gsva()` with expression data in a `matrix` and gene sets as a `GeneSetCollection` object, the `annotation` argument can be used to supply the name of the Bioconductor package that contains annotations for the class of gene identifiers occurring in the row names of the expression data matrix. By default `gsva()` will try to match the identifiers in `expr` to the identifiers in `gset.idx.list` just as they are, unless the `annotation` argument is set.
`method`	Method used to estimate gene-set enrichment scores per sample. The default is `gsva` (Hanzelmann et al., 2013); the other options are `ssgsea` (Barbie et al., 2009), `zscore` (Lee et al., 2008) and `plage` (Tomfohr et al., 2005). The latter two first standardize expression profiles into z-scores over the samples. `zscore` then combines them as their sum divided by the square root of the size of the gene set, while `plage` computes the singular value decomposition (SVD) over the genes in the gene set and uses the coefficients of the first right-singular vector as the pathway activity profile.
`rnaseq`	Flag to inform whether the input gene expression data comes from microarray (`rnaseq=FALSE`, default) or RNA-Seq (`rnaseq=TRUE`) experiments.
`abs.ranking`	Flag to determine whether genes should be ranked according to their sign (`abs.ranking=FALSE`) or by absolute value (`abs.ranking=TRUE`). In the latter case, pathways with genes enriched on either extreme (high or low) will be regarded as 'highly' activated.
`min.sz`	Minimum size of the resulting gene sets.
`max.sz`	Maximum size of the resulting gene sets.
`no.bootstraps`	Number of bootstrap iterations to perform.
`bootstrap.percent`	Proportion of samples to bootstrap in each iteration. The default of `0.632` is considered the ideal proportion.
`parallel.sz`	Number of processors to use when doing the calculations in parallel. This requires loading either the `parallel` or the `snow` library beforehand. If `parallel` is loaded and this argument is left at its default value (`parallel.sz=0`), all available cores are used unless the argument is set to a smaller number. If `snow` is loaded, this argument must be set to a positive integer that specifies the number of processors to employ in the parallel calculation.
`parallel.type`	Type of cluster architecture when using `snow`.
`mx.diff`	Offers two approaches to calculate the enrichment statistic (ES) from the KS random walk statistic. `mx.diff=FALSE`: ES is calculated as the maximum distance of the random walk from 0. `mx.diff=TRUE` (default): ES is calculated as the magnitude difference between the largest positive and negative random walk deviations.
`tau`	Exponent defining the weight of the tail in the random walk performed by both the `gsva` (Hanzelmann et al., 2013) and the `ssgsea` (Barbie et al., 2009) methods. By default, `tau=1` when `method="gsva"` and `tau=0.25` when `method="ssgsea"`, just as specified by Barbie et al. (2009), where this parameter is called `alpha`.
`kernel`	Logical. Set to `TRUE` (default) for the GSVA method to employ a kernel non-parametric estimation of the empirical cumulative distribution function, and to `FALSE` to estimate this function directly from the observed data. The latter option is justified for large sample sizes by the Glivenko-Cantelli theorem.
`ssgsea.norm`	Logical. When set to `TRUE` (default) with `method="ssgsea"`, runs the ssGSEA method from Barbie et al. (2009), normalizing the scores by the absolute difference between the minimum and the maximum, as described in their paper. When `ssgsea.norm=FALSE`, this last normalization step is skipped.
`verbose`	Gives information about each calculation step. Default: `TRUE`.
`is.gset.list.up.down`	Logical. Is the gene list divided into up/down sublists? If this argument is used, the up-regulated gene set must be named `up` and the down-regulated gene set `down` (e.g. `gset = list(up = up_gset, down = down_gset)`).
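
The `mx.diff` and `tau` arguments above can be made concrete with a small base-R sketch. This is not the package's implementation: the helper `toy_walk()`, the toy statistics and the gene-set membership are hypothetical, the walk is a simplified GSEA-style weighted running sum, and keeping the sign of the largest deviation when `mx.diff=FALSE` is an assumption of this sketch.

```r
# Hypothetical helper: a simplified weighted random walk over a ranked gene list.
# ranked_stat: per-gene statistic, already sorted in decreasing order.
# in_set:      logical vector marking which ranked genes belong to the gene set.
# tau:         exponent weighting the steps taken at gene-set members.
toy_walk <- function(ranked_stat, in_set, tau = 1) {
  hit_weight <- ifelse(in_set, abs(ranked_stat)^tau, 0)
  step_up <- hit_weight / sum(hit_weight)   # increments at gene-set members
  step_down <- (!in_set) / sum(!in_set)     # decrements at all other genes
  cumsum(step_up - step_down)
}

# Toy data (made up for illustration): 10 ranked genes, 3 of them in the set.
ranked_stat <- c(5, 4, 3, 2, 1, -1, -2, -3, -4, -5)
in_set <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
walk <- toy_walk(ranked_stat, in_set, tau = 1)

max_pos <- max(0, walk)   # largest positive deviation from 0
max_neg <- min(0, walk)   # largest negative deviation from 0 (signed)

# In the spirit of mx.diff = FALSE: the deviation of largest magnitude.
es_max <- if (max_pos > abs(max_neg)) max_pos else max_neg

# In the spirit of mx.diff = TRUE: difference between the two magnitudes.
es_diff <- max_pos - abs(max_neg)
```

With `mx.diff=TRUE`, a gene set whose members sit at both extremes of the ranking has its positive and negative deviations partly cancel, which is why `es_diff` is smaller than `es_max` in this toy case.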

Returns gene set enrichment scores for each sample and gene set.
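
For the two simpler methods, `zscore` and `plage`, the way a table of scores per gene set and sample arises can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's code: the toy matrix, the gene and sample names, and the object names `zscore_es` and `plage_es` are all hypothetical.

```r
set.seed(1)

# Toy expression matrix: 6 genes (rows) x 5 samples (columns); names are made up.
expr <- matrix(rnorm(30), nrow = 6,
               dimnames = list(paste0("g", 1:6), paste0("s", 1:5)))
gene_sets <- list(setA = c("g1", "g2", "g3"), setB = c("g4", "g5"))

# Standardize each gene into z-scores over the samples.
z <- t(scale(t(expr)))

# Combined z-score: sum of the z-scores in the set divided by sqrt(set size).
zscore_es <- t(sapply(gene_sets, function(gs) {
  colSums(z[gs, , drop = FALSE]) / sqrt(length(gs))
}))

# PLAGE-style profile: first right-singular vector of the set's z-score sub-matrix.
plage_es <- t(sapply(gene_sets, function(gs) {
  svd(z[gs, , drop = FALSE])$v[, 1]
}))
colnames(plage_es) <- colnames(expr)
```

Both results have one row per gene set and one column per sample. The sign of a singular vector is arbitrary, so a PLAGE-style profile is only defined up to a flip of sign.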

expr = ExpressionSet,gset.idx.list = list: Method for ExpressionSet and list
expr = ExpressionSet,gset.idx.list = GeneSetCollection: Method for ExpressionSet and GeneSetCollection
expr = matrix,gset.idx.list = GeneSetCollection: Method for matrix and GeneSetCollection
expr = matrix,gset.idx.list = list: Method for matrix and list
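
The way a method is chosen from the classes of `expr` and `gset.idx.list` can be illustrated with a small S4 sketch. The generic `describe_inputs()` below is hypothetical and exists only for this illustration; it defines just the `matrix,list` case, since the other signatures need classes from additional packages.

```r
library(methods)

# Hypothetical generic that dispatches on the classes of both arguments,
# mirroring the (expr, gset.idx.list) signatures listed above.
setGeneric("describe_inputs",
           function(expr, gset.idx.list, ...) standardGeneric("describe_inputs"))

# Method selected when expr is a matrix and gset.idx.list is a list.
setMethod("describe_inputs",
          signature(expr = "matrix", gset.idx.list = "list"),
          function(expr, gset.idx.list, ...) {
            sprintf("matrix with %d genes, %d gene sets",
                    nrow(expr), length(gset.idx.list))
          })

# Toy inputs (made up): 3 genes x 2 samples and a single gene set.
m <- matrix(0, nrow = 3, ncol = 2,
            dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
msg <- describe_inputs(m, list(setA = c("g1", "g2")))
```

Calling the generic with a pair of classes that has no registered method raises an error, which is why each supported combination is listed separately.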

Hanzelmann, S., Castelo, R., & Guinney, J. (2013). GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics, 14, 7. http://doi.org/10.1186/1471-2105-14-7

library(oppar)

data("Maupin")
names(maupin)

# All signature genes (both up- and down-regulated), as Entrez IDs
geneSet <- maupin$sig$EntrezID

# Split the signature into up- and down-regulated genes
up_sig <- maupin$sig[maupin$sig$upDown == "up", ]
d_sig <- maupin$sig[maupin$sig$upDown == "down", ]
u_geneSet <- up_sig$EntrezID
d_geneSet <- d_sig$EntrezID

# Score each sample against the up/down gene sets
es.dif <- gsva(maupin$data, list(up = u_geneSet, down = d_geneSet), mx.diff = TRUE,
    verbose = TRUE, abs.ranking = FALSE, is.gset.list.up.down = TRUE, parallel.sz = 1)$es.obs
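
The up/down list passed to `gsva()` in the example above can also be built in one step with base R's `split()`. The sketch below uses a small hypothetical table in place of `maupin$sig`, with the same `EntrezID` and `upDown` columns; the identifiers are made up.

```r
# Hypothetical signature table standing in for maupin$sig; the IDs are made up.
sig <- data.frame(EntrezID = c("101", "102", "103", "104", "105"),
                  upDown   = c("up", "down", "up", "up", "down"),
                  stringsAsFactors = FALSE)

# split() groups the identifiers by direction; elements are named "down" and "up".
gset <- split(sig$EntrezID, sig$upDown)

# Reorder so the list reads list(up = ..., down = ...), as in the example above.
gset <- gset[c("up", "down")]
```

Because the list elements take their names from the values of `upDown`, this only yields the required `up` and `down` names when the column uses exactly those labels.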