gagePipe: GAGE analysis pipeline

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/gagePipe.R

Description

Function gagePipe runs mutliple rounds of GAGE in a batch without interference, and outputs signcant gene set lists in text format, heatmaps in pdf format, and save the results in RData format.

Usage

1
2
3
4
gagePipe(arraydata, dataname = "arraydata", trim.at=TRUE, sampnames, gsdata = NULL,
gsname = c("kegg.gs", "go.gs"), ref.list, samp.list, weight.list = NULL,
comp.list = "paired", q.cutoff = 0.1, heatmap=TRUE, pdf.size = c(7,
7), p.limit=c(0.5, 5.5), stat.limit=5, ... )

Arguments

arraydata

corresponds to exprs argument for gage function. But can either be a matrix-like data structure when the data has been loaded into R or the full path to the data file in .RData format if the data has not been loaded.

dataname

character, the name of the data to be analyzed. This name will be included as the prefix of the output file names. Default to be "arraydata".

trim.at

boolean, whether to trim the suffix "_at" from the probe set IDs or row names of the microarray data. Default to be TRUE.

sampnames

character vector, the names of the sample groups, on which the GAGE analysis is done. Each sample groups corresponds to one element of samp.list and the matching element of ref.list. These names will be included in the output file names or object names.

gsdata

character, the full path to the gene set data file in .RData format if the data has not been loaded. Default to be NULL, i.e. the gene set data has been loaded. Make sure that the same gene ID system is used for both gsdata and arraydata.

gsname

character, the name(s) of the gene set collections to be used. Default to be c("kegg.gs", "go.gs").

ref.list

a list of ref inputs for gage function. In other words, each element of the list is a column number vector for the reference condition or phenotype (i.e. the control group) in the exprs data matrix.

samp.list

a list of samp inputs for gage function. In other words, each element of the list is a column number vector for the target condition or phenotype (i.e. the experiment group) in the exprs data matrix.

weight.list

a list or a vector of weights input(s) for gage function. As a list, the length of weight.list should equal to the length of ref.list and samp.list or 1. The weight.list vector or its element vectors of should match the elements of ref.list and samp.list in length or being NULL. When weight.list is a vector or length 1 list, all analyses will use the same weights setting.

comp.list

a list or a vector of compare input(s) for gage function. The length of the list or vector should equal to the length of ref.list and samp.list or 1. In the latter case, all analyses will use the same comparison scheme. The same as compare, the element value(s) in comp.list can be 'paired', 'unpaired', '1ongroup' or 'as.group'. Default to be 'paired'.

q.cutoff

numeric, q-value cutoff between 0 and 1 for signficant gene sets selection. Default to be 0.1.

heatmap

boolean, whether to plot heatmap for the selected gene data as a PDF file. Default to be FALSE.

pdf.size

a numeric vector to specify the the width and height of PDF graphics region in inches. Default to be c(7, 7).

stat.limit

numeric vector of length 1 or 2 to specify the value range of gene set statistics to visualize using the heatmap. Statistics beyong will be reset to equal the proximal limit. Default to 5, i.e. plot all gene set statistics within (-5, 5) range. May also be NULL, i.e. plot all statistics without limit. This argument allows optimal differentiation between most gene set statistic values when extremely positive/negative values exsit and squeeze the normal-value region.

p.limit

numeric vector of length 1 or 2 to specify the value range of gene set -log10(p-values) to visualize using the heatmap. Values beyong will be reset to equal the proximal limit. Default to c(0.5,5.5), i.e. plot all -log10(p-values) within this range. This argument is similar to argument stat.limit.

...

other arguments to be passed into gage or gs.heatmap function, which is a wrapper of the heatmap2 function.

Details

gagePipe carries two rounds of GAGE analysis on each sample groups for each gene set collection specified in gsnames: one test for 1-direction changes (up- or down-regualted gene sets), one test for 2-direction changes (two-way perturbed gene sets). Correspondingly, the gage result p-value matrices for the signficant gene sets are written into two tab-delimited text files, named after the dataname and sampnames. Note that the text file for 1-direction changes tests combines results for both up- and down-regulated gene sets. By default, heatmaps in pdf format are also produced to show the gene set perturbations using either -log10(p-value) or statistics. Meanwhile, the full gage analysis result objects (named lists of p-value or statistics matrices) are saved into a .RData file. The result objects are name after the sampnames and gsnames.

Value

The function returns invisible 1 when successfully executed.

Author(s)

Weijun Luo <luo_weijun@yahoo.com>

References

Luo, W., Friedman, M., Shedden K., Hankenson, K. and Woolf, P GAGE: Generally Applicable Gene Set Enrichment for Pathways Analysis. BMC Bioinformatics 2009, 10:161

See Also

gage the main function for GAGE analysis; heter.gage GAGE analysis for heterogeneous data

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
data(gse16873)
cn=colnames(gse16873)
hn=grep('HN',cn, ignore.case =TRUE)
dcis=grep('DCIS',cn, ignore.case =TRUE)
data(kegg.gs)

library(gageData)
data(gse16873.2)
cn2=colnames(gse16873.2)
hn2=grep('HN',cn2, ignore.case =TRUE)
dcis2=grep('DCIS',cn2, ignore.case =TRUE)

#multiple GAGE analysis in a batch with the combined data
gse16873=cbind(gse16873, gse16873.2)
dataname='gse16873' #output data prefix
sampnames=c('dcis.1', 'dcis.2')
refList=list(hn, hn2+12)
sampList=list(dcis, dcis2+12)
gagePipe(gse16873, gsname = "kegg.gs", dataname = "gse16873",
    sampnames = sampnames, ref.list = refList, samp.list = sampList,
    comp.list = "paired")

#follow up comparison between the analyses
load('gse16873.gage.RData')
#list gage result objects
objects(pat = "[.]p$")
gageComp(sampnames, dataname, gsname = "kegg.gs",
    do.plot = TRUE)

Example output

[1] "dcis.1.kegg.gs.2d.p" "dcis.1.kegg.gs.p"    "dcis.2.kegg.gs.2d.p"
[4] "dcis.2.kegg.gs.p"   

gage documentation built on Dec. 13, 2020, 2:01 a.m.