geneData: View the expression data for selected genes
In gage: Generally Applicable Gene-set Enrichment for Pathway Analysis


This function outputs and visualizes the expression data for selected genes. Potential output files include a tab-delimited text file, a heatmap in PDF format, and a scatter plot in PDF format.

geneData(genes, exprs, ref = NULL, samp = NULL, outname = "array",
txt = TRUE, heatmap = FALSE, scatterplot = FALSE, samp.mean = FALSE,
pdf.size = c(7, 7), cols = NULL, scale = "row", limit = NULL,
label.groups = TRUE, ...)

`genes`	character, either a vector of gene IDs of interest or a 2-column matrix, where the first column specifies the gene IDs used in `exprs` and the second column gives another type of ID to use in the output data files.
`exprs`	an expression matrix or matrix-like data structure, with genes as rows and samples as columns.
`ref`	a numeric vector of column numbers for the reference condition or phenotype (i.e. the control group) in the exprs data matrix. Default is ref = NULL, in which case all columns are treated as target experiments.
`samp`	a numeric vector of column numbers for the target condition or phenotype (i.e. the experiment group) in the exprs data matrix. Default is samp = NULL, in which case all columns other than ref are treated as target experiments.
`outname`	a character string, used as the prefix of the output data files. Defaults to "array".
`txt`	boolean, whether to output the selected gene data as a tab-delimited text file. Defaults to TRUE.
`heatmap`	boolean, whether to plot a heatmap of the selected gene data as a PDF file. Defaults to FALSE.
`scatterplot`	boolean, whether to make a scatter plot of the selected gene data as a PDF file. Defaults to FALSE.
`samp.mean`	boolean, whether to take the mean of the gene data over the ref and samp groups when making the scatter plot. Defaults to FALSE, i.e. make scatter plots for the first two ref-samp pairs and label them differently on the same graph panel.
`pdf.size`	a numeric vector specifying the width and height of the PDF graphics region in inches. Defaults to c(7, 7).
`cols`	a character vector to specify colors used for the heatmap image blocks. Default to be NULL, i.e. to generate a green-red spectrum based on the gene data automatically.
`scale`	character indicating if the values should be centered and scaled in either the row direction or the column direction, or none for the heatmap. The default is "row", other options include "column" and "none".
`limit`	numeric value specifying the maximal absolute value of gene data to visualize in the heatmap. Gene data beyond this value are reset to equal it. Defaults to NULL, i.e. plot all gene data values. This argument allows optimal differentiation between most gene data values when extremely positive or negative values exist and squeeze the normal-value region. limit = 3 is recommended when the gene data is scaled by row.
`label.groups`	boolean, whether to label the two sample groups, i.e. ref and samp, differently using side color bars along the heatmap area. Default to be TRUE.
`...`	other arguments to be passed on to the internal `heatmap2` function.
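As an illustration of the `genes`, `ref` and `samp` arguments, the following base-R sketch builds these inputs for a toy matrix. The expression values and the labels geneA and geneB are made up, and the final subsetting step only mimics what the argument descriptions imply; it is not taken from the geneData source code.

```r
# Toy expression matrix: 4 genes (rows) x 6 samples (columns).
# Values and the output labels below are hypothetical.
set.seed(1)
exprs <- matrix(rnorm(24, mean = 9), nrow = 4,
                dimnames = list(c("1345", "5691", "51128", "2923"),
                                c("HN_1", "HN_2", "HN_3",
                                  "DCIS_1", "DCIS_2", "DCIS_3")))

# ref and samp are column numbers, here looked up from the column names.
ref  <- grep("HN", colnames(exprs), ignore.case = TRUE)
samp <- grep("DCIS", colnames(exprs), ignore.case = TRUE)

# genes as a 2-column character matrix: column 1 holds the IDs used as
# row names of exprs, column 2 holds the alternative IDs for the output.
genes <- cbind(c("1345", "2923"), c("geneA", "geneB"))

# Assumed equivalent of the selection step: pick the rows named in
# column 1 and relabel them with column 2.
selected <- exprs[genes[, 1], c(ref, samp), drop = FALSE]
rownames(selected) <- genes[, 2]
print(selected)
```

With inputs shaped like this, a call such as geneData(genes = genes, exprs = exprs, ref = ref, samp = samp) would follow the usage shown above.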

This function integrates the three most common presentation methods for gene expression data: tab-delimited text file, heatmap and scatter plot. A heatmap is ideal for visualizing relative changes with gene-wise standardized (or row-scaled) data. The heatmap is generated by calling an improved version of the heatmap.2 function from the gplots package. A scatter plot is ideal for visualizing modest or small but consistent changes over a gene set between the two states under comparison.
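To make the row scaling and the effect of `limit` concrete, here is a minimal base-R sketch on simulated data. It reproduces the described behaviour (gene-wise standardization, then resetting values beyond the limit) and is an assumption about the effect, not the code used inside geneData or heatmap2.

```r
# Simulated data: 4 genes x 20 samples, with one extreme value in gene g1.
set.seed(42)
x <- matrix(rnorm(80), nrow = 4,
            dimnames = list(paste0("g", 1:4), paste0("s", 1:20)))
x["g1", "s1"] <- 50

# Row scaling: centre each gene on 0 and divide by its standard deviation.
# scale() works column-wise, so transpose before and after.
z <- t(scale(t(x)))

# Reset values beyond +/- limit to the limit itself.
limit <- 3
clamped <- z
clamped[clamped > limit] <- limit
clamped[clamped < -limit] <- -limit

# The outlier dominates the colour range before clamping, but not after.
print(range(z))
print(range(clamped))
```

After clamping, the colour scale spans at most -3 to 3, so the remaining values are spread over the full spectrum instead of being compressed by the single outlier.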

Although geneData is designed to be a standalone function, it is frequently used in tandem with the essGene function to present the changes of the essential genes in significant gene sets.

The function invisibly returns 1 when executed successfully.

Weijun Luo <luo_weijun@yahoo.com>

Luo, W., Friedman, M., Shedden, K., Hankenson, K. and Woolf, P. GAGE: Generally Applicable Gene Set Enrichment for Pathway Analysis. BMC Bioinformatics 2009, 10:161

essGene: extracts the essential member genes in a gene set; gage: the main function for GAGE analysis.

library(gage)

# load the example expression data and locate the two sample groups
data(gse16873)
cn <- colnames(gse16873)
hn <- grep('HN', cn, ignore.case = TRUE)
dcis <- grep('DCIS', cn, ignore.case = TRUE)

# KEGG test for 1-directional changes
data(kegg.gs)
gse16873.kegg.p <- gage(gse16873, gsets = kegg.gs,
    ref = hn, samp = dcis)
rownames(gse16873.kegg.p$greater)[1:3]

# extract the essential genes from the top 3 gene sets of the greater test
gs <- unique(unlist(kegg.gs[rownames(gse16873.kegg.p$greater)[1:3]]))
essData <- essGene(gs, gse16873, ref = hn, samp = dcis)
head(essData)

# in essData the HN (ref) columns come first, followed by the DCIS (samp) columns
ref1 <- 1:6
samp1 <- 7:12

# generate a text file for the data table, PDF files for heatmap and scatter plot
for (gs in rownames(gse16873.kegg.p$greater)[1:3]) {
    outname <- gsub(" |:|/", "_", substr(gs, 10, 100))
    geneData(genes = kegg.gs[[gs]], exprs = essData, ref = ref1,
        samp = samp1, outname = outname, txt = TRUE, heatmap = TRUE,
        Colv = FALSE, Rowv = FALSE, dendrogram = "none", limit = 3, scatterplot = TRUE)
}

[1] "hsa04141 Protein processing in endoplasmic reticulum"
[2] "hsa00190 Oxidative phosphorylation"                  
[3] "hsa03050 Proteasome"                                 
          HN_1     HN_2      HN_3     HN_4      HN_5      HN_6    DCIS_1
1345  9.109413 9.373454 10.988181 9.161435 11.032016 11.231293 12.675099
5691  8.283191 7.716745  7.553621 8.381538  7.768811  7.635745  8.840405
51128 7.424312 7.970012  8.034436 6.806669  8.508019  8.523295  8.636513
2923  9.362371 9.150221  8.537944 8.828966  9.890736  9.980784 11.168409
10130 9.088828 8.983823  9.493544 8.255197 10.040715  9.959327 10.563274
3312  9.696461 9.782686  9.219330 8.553472 10.165793  9.924388 10.670421
         DCIS_2    DCIS_3    DCIS_4    DCIS_5    DCIS_6
1345  11.231271 12.547915  8.979639 13.470266 12.156052
5691   8.125168  7.782958 11.910352  7.956182  7.713826
51128  8.639654  8.740153  8.037517  9.060658  8.845460
2923   9.396683  9.176227  9.752254 10.529351 10.242883
10130  9.158080 10.215777  9.383801 10.444705 10.319854
3312  10.309618  9.811460  9.144538 11.019411 10.437532
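
The output file prefix in the loop above is derived from the gene set name. As a small base-R check of that step, using the first gene set name printed in the output, the KEGG ID prefix (the first 9 characters, including the space) is dropped and any spaces, colons and slashes are replaced by underscores:

```r
# First gene set name from the example output.
gs <- "hsa04141 Protein processing in endoplasmic reticulum"

# Drop the ID prefix, then make the remainder safe to use in a file name.
label <- substr(gs, 10, 100)
outname <- gsub(" |:|/", "_", label)
print(outname)
# [1] "Protein_processing_in_endoplasmic_reticulum"
```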