PGSEA: Parametric Gene Set Enrichment Analysis



This package contains functions for an exploratory parametric analysis of gene expression data. This type of analysis can assist in determining whether lists of genes, such as those deregulated in defined experimental systems, are similarly deregulated in other data sets.

This function subsets the data based on lists of genes, computes a summary statistic for each gene list, and returns the results in a convenient form.


PGSEA(exprs, cl, range = c(25, 500), ref = NULL, center = TRUE, p.value = 0.005, weighted = TRUE, enforceRange=TRUE, ...)



exprs: expression data, given as a numeric matrix, eSet, or ExpressionSet

cl: gene set list, either a "GeneSetCollection" or a list of "SMC" objects

range: a 2 element vector describing the min and max length of concepts to analyze

enforceRange: boolean - if TRUE, the expression matrix must contain data for the proper number of genes, as set by the range argument, to return a significant result (this argument is used for data that contains NAs)

ref: a vector containing the index of reference samples from which to make comparisons. Defaults to NULL (internally referenced samples)

center: boolean - median center the gene expression matrix columns prior to analysis. Can be helpful if 'ref' is used

p.value: numeric p.value threshold, or NA to return all data, or TRUE to return a matrix of p.values

weighted: boolean - weight results by the size of each gene list

...: extra arguments passed along to FUN


Gene expression values are separated into subsets based on the lists of genes contained in the cl argument. This can be a "GeneSetCollection" or a list of "SMC" (Simple Molecular Concept) objects. For example, readGmt can be used to produce a list of "SMC" objects from a simple tab-delimited text file. The gene expression values for each of these gene lists are extracted, and a summary statistic is computed for each subset (or region, in the case of chromosomal bands/arms).

The expression data must use the same identifiers as the lists of genes being tested. If it does not, the expression data can be converted using the aggregateExprs function, which can use a current annotation environment to convert and condense the gene expression data.
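
The idea of converting and condensing identifiers can be illustrated with a small base R sketch. It is not the aggregateExprs function itself: the probe names, the probe-to-gene mapping, and the choice of the median as the condensing function are all assumptions made for illustration.

```r
# Toy probe-level matrix: 5 probes x 2 samples (all identifiers are made up)
probeExpr <- matrix(c(1, 3, 5, 2, 4,
                      2, 4, 6, 1, 3),
                    nrow = 5,
                    dimnames = list(paste0("probe", 1:5), c("s1", "s2")))

# Hypothetical probe-to-gene mapping; probe5 has no gene and is dropped
probe2gene <- c(probe1 = "geneA", probe2 = "geneA", probe3 = "geneB",
                probe4 = "geneC", probe5 = NA)

geneIds <- probe2gene[rownames(probeExpr)]
keep    <- !is.na(geneIds)

# Condense probes that map to the same gene by taking the per-sample median
geneExpr <- apply(probeExpr[keep, , drop = FALSE], 2,
                  function(col) tapply(col, geneIds[keep], median))
```

The resulting matrix has one row per gene identifier, so its row names can be matched directly against gene lists that use the same identifier type.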

By default, the method set out by Kim and Volsky is applied to each gene set. If weighted is FALSE, the default t.test function is used instead.
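
As a rough illustration of the parametric statistic described by Kim and Volsky, the base R sketch below computes a Z score for one gene set in one sample: the mean of the set is compared with the mean of all genes, then scaled by the overall standard deviation and the square root of the set size. The function name pageZ and the toy data are hypothetical, and this is not the package's internal code.

```r
# Hypothetical helper, not part of PGSEA: Z score of one gene set in one sample
pageZ <- function(values, setIds) {
  inSet <- values[names(values) %in% setIds]
  m     <- length(inSet)            # number of set members found in the data
  mu    <- mean(values)             # mean over all genes
  sigma <- sd(values)               # standard deviation over all genes
  (mean(inSet) - mu) * sqrt(m) / sigma
}

set.seed(1)
# Toy fold changes for 1000 genes; names are made-up identifiers
fc <- rnorm(1000)
names(fc) <- paste0("g", seq_along(fc))
upSet <- paste0("g", 1:50)
fc[upSet] <- fc[upSet] + 1          # shift the set upward

z <- pageZ(fc, upSet)
p <- 2 * pnorm(-abs(z))             # two-sided p-value from the normal distribution
```

Because the statistic is scaled by the square root of the set size, the same mean shift yields a larger Z score for a larger gene set.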

The function is set up to perform the analysis on individual samples. For a convenient method to analyze groups of samples, see the "Limma User's Guide" for more information on how to set up a contrast matrix and perform a linear model fit. The coefficients of the fit can then be used as input to the PGSEA function.
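
The sketch below shows the shape of such group-level input. It uses base R's lm.fit as a stand-in for the limma workflow, so it only illustrates the idea of a per-gene coefficient matrix. The toy data, the two-group design, and the geneSets name in the final comment are assumptions.

```r
set.seed(2)
# Toy log-expression matrix: 6 genes x 6 samples (identifiers are made up)
expr <- matrix(rnorm(36), nrow = 6,
               dimnames = list(paste0("g", 1:6), paste0("s", 1:6)))
group <- factor(rep(c("control", "treated"), each = 3))

# Design matrix with an intercept (control mean) and a treated-vs-control term
design <- model.matrix(~ group)

# Least-squares fit for every gene at once; lm.fit() accepts a matrix response
fit <- lm.fit(design, t(expr))

# Keep the treated-vs-control coefficient as a genes x 1 matrix
coefs <- matrix(fit$coefficients["grouptreated", ],
                ncol = 1,
                dimnames = list(rownames(expr), "treated_vs_control"))

# coefs could then be supplied as the exprs argument, e.g.
# PGSEA(coefs, cl = geneSets, ref = NULL)   # geneSets is a hypothetical gene set list
```

Each column of the coefficient matrix takes the place of a sample, so the enrichment results describe a group comparison rather than an individual array.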

This package has not been extensively tested beyond a set of well defined curated pathways using the Affymetrix platform and significance values represent approximations. Any results should be confirmed by additional gene set testing methodologies.


If p.value is set to a number, a matrix of the results that pass at that significance is returned, with one row per molecular concept and one column per sample. Results that do not pass are reported as NA.

If p.value is set to NA, all results are returned.

If p.value is set to TRUE, then a list is returned that consists of the PGSEA results as well as their p.values.
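
The three return modes can be pictured with a small base R sketch. It assumes the gene set statistics are compared against a standard normal distribution. The toy values and the list element names are hypothetical and are not taken from the package.

```r
# Toy matrix of gene set statistics: 3 gene sets x 2 samples (made-up values)
z <- matrix(c(0.5, 3.1, -4.2,
              2.0, -0.3, 2.9),
            nrow = 3,
            dimnames = list(c("setA", "setB", "setC"), c("s1", "s2")))

# Two-sided p-values, assuming the statistics follow a standard normal distribution
pvals <- 2 * pnorm(-abs(z))

# Numeric threshold: results that do not pass are masked with NA
threshold <- 0.005
passed <- z
passed[pvals > threshold] <- NA

# Keeping both pieces together, in the spirit of p.value = TRUE
both <- list(results = z, p.results = pvals)
```

Leaving the matrix unmasked corresponds to requesting all results with p.value = NA.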



Kim SY, Volsky DJ. PAGE: Parametric Analysis of Gene Set Enrichment. BMC Bioinformatics. 2005;6:144.


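	library(PGSEA)
	datadir <- system.file("extdata", package = "PGSEA")
	sample <- readGmt(file.path(datadir, "sample.gmt"))  # gene lists read from the bundled GMT file
	data(nbEset)  # example expression data set shipped with the package
	pg <- PGSEA(nbEset, cl = sample, ref = 1:5)  # samples 1 to 5 serve as the reference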

Example output

Loading required package: GO.db
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append, cbind,
    colMeans, colSums, colnames, duplicated, eval, evalq, get, grep,
    grepl, intersect, is.unsorted, lapply, lengths, mapply, match, mget,
    order, paste, pmax, pmin, rank, rbind, rowMeans, rowSums, rownames,
    sapply, setdiff, sort, table, tapply, union, unique, unsplit, which,
    which.max, which.min

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':


Loading required package: KEGG.db

KEGG.db contains mappings based on older data because the original
  resource was removed from the the public domain before the most
  recent update was produced. This package should now be considered
  deprecated and future versions of Bioconductor may not have it
  available.  Users who want more current data are encouraged to look
  at the KEGGREST or reactome.db packages

Loading required package: annaffy
                           GSM90306.CEL GSM90307.CEL GSM90308.CEL
ras UP - pmid: 16273092 NA           NA           NA           NA
ras DN - pmid: 16273092 NA           NA           NA           NA
myc UP - pmid: 16273092 NA           NA           NA           NA
myc DN - pmid: 16273092 NA           NA           NA           NA
5p NA                                NA           NA           NA
5q NA                                NA     3.142932           NA
                           GSM90309.CEL GSM90310.CEL GSM90387.CEL
ras UP - pmid: 16273092 NA           NA           NA           NA
ras DN - pmid: 16273092 NA           NA           NA           NA
myc UP - pmid: 16273092 NA           NA           NA           NA
myc DN - pmid: 16273092 NA           NA           NA           NA
5p NA                                NA           NA           NA
5q NA                                NA      4.78358           NA
                           GSM90388.CEL GSM90389.CEL GSM90390.CEL
ras UP - pmid: 16273092 NA           NA           NA    -2.925284
ras DN - pmid: 16273092 NA           NA           NA           NA
myc UP - pmid: 16273092 NA     4.114215     3.910187     5.732666
myc DN - pmid: 16273092 NA           NA           NA    -2.936305
5p NA                                NA           NA           NA
5q NA                                NA           NA           NA
ras UP - pmid: 16273092 NA           NA
ras DN - pmid: 16273092 NA           NA
myc UP - pmid: 16273092 NA           NA
myc DN - pmid: 16273092 NA           NA
5p NA                                NA
5q NA                                NA

PGSEA documentation built on April 28, 2020, 8:28 p.m.