essGene: Essential member genes in a gene set

Description

This function extracts expression data for the essential member genes of a gene set. Essential genes are those whose changes exceed the noise level.

Usage

essGene(gs, exprs, ref = NULL, samp = NULL, gsets = NULL,
    compare = "paired", use.fold = TRUE, rank.abs = FALSE,
    use.chi = FALSE, chi.p = 0.05, ...)

Arguments

gs

character, either the name of a gene set of interest in the gene set collection passed via the gsets argument, or a vector of gene IDs. Make sure that gs and exprs use the same gene ID system.

exprs

an expression matrix or matrix-like data structure, with genes as rows and samples as columns.

ref

a numeric vector of column numbers for the reference condition or phenotype (i.e. the control group) in the exprs data matrix. Default is ref = NULL, in which case all columns are considered target experiments.

samp

a numeric vector of column numbers for the target condition or phenotype (i.e. the experiment group) in the exprs data matrix. Default is samp = NULL, in which case all columns other than ref are considered target experiments.

gsets

a named list in which each element is a gene set, given as a character vector of gene IDs or symbols. For an example, type head(kegg.gs). A gene set can also be an "smc" object as defined in the PGSEA package. Make sure that gsets and exprs use the same gene ID system. Default is NULL, in which case the gs argument needs to be a vector of gene IDs.

compare

character, which comparison scheme to use: 'paired', 'unpaired', '1ongroup' or 'as.group'. 'paired' is the default: ref and samp are of equal length and paired one-on-one by the original experimental design. 'as.group' is a group-on-group comparison between ref and samp. 'unpaired' (formerly '1on1') is a one-on-one comparison between all possible ref and samp combinations, even though the original experimental design may not be one-on-one paired. '1ongroup' compares one samp column at a time against the average of all ref columns.

use.fold

boolean, whether fold changes (rather than t-test statistics) are used as the per-gene statistics. Default is use.fold = TRUE.

rank.abs

boolean, whether to sort the essential gene data by absolute changes. Default is FALSE.

use.chi

boolean, whether to use a chi-square test to select the essential genes. Default is FALSE, in which case the mean plus standard deviation of all gene changes is used as the cutoff instead. See Details for more information.

chi.p

numeric value between 0 and 1, the p-value cutoff for the chi-square test used to select the essential genes. Default is 0.05.

...

other arguments to be passed to the internal gagePrep function.
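The four comparison schemes listed under compare can be illustrated with plain matrix arithmetic. The following is only a sketch in base R on a made-up toy matrix; it does not call gage, and the variable names (paired, as_group, one_on_group, unpaired) are hypothetical. The preprocessing that gage actually performs may differ in detail.

```r
# Toy log-scale expression matrix: 4 genes x 4 samples (made-up values).
# matrix() fills column by column, so each row of numbers below is one sample.
exprs <- matrix(
  c(1, 2, 3, 4,    # ref_1
    2, 2, 3, 5,    # ref_2
    3, 2, 1, 4,    # samp_1
    5, 3, 3, 9),   # samp_2
  nrow = 4,
  dimnames = list(paste0("g", 1:4), c("ref_1", "ref_2", "samp_1", "samp_2"))
)
ref  <- 1:2
samp <- 3:4

# 'paired': samp[i] minus ref[i], one column per designed pair
paired <- exprs[, samp] - exprs[, ref]

# 'as.group': mean of the samp group minus mean of the ref group, one value per gene
as_group <- rowMeans(exprs[, samp]) - rowMeans(exprs[, ref])

# '1ongroup': each samp column minus the average of all ref columns
one_on_group <- exprs[, samp] - rowMeans(exprs[, ref])

# 'unpaired': every samp column against every ref column
combos   <- expand.grid(ref = ref, samp = samp)
unpaired <- exprs[, combos$samp] - exprs[, combos$ref]

dim(paired)    # 4 genes x 2 pairs
dim(unpaired)  # 4 genes x 4 ref/samp combinations
```

With two ref and two samp columns, 'paired' and '1ongroup' each give two difference columns, 'unpaired' gives four, and 'as.group' collapses to a single value per gene.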

Details

There are two different criteria for essential gene selection. The first uses a chi-square test to determine whether the change of a gene is more than noise. The second treats any change beyond 1 standard deviation from the mean of all genes as real.
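The two criteria can be sketched in base R on simulated per-gene changes. This is an illustration under stated assumptions, not the code used inside essGene: the standard deviation rule follows the description above, while the chi-square form shown here (a squared, noise-standardized change compared against a chi-square distribution with 1 degree of freedom, with the noise level estimated by mad) is an assumption made for the sake of the example. All variable names are hypothetical.

```r
set.seed(1)

# Simulated per-gene changes: 95 noise genes plus 5 genes with strong changes.
change <- c(rnorm(95, mean = 0, sd = 0.2), rep(c(2, -2), length.out = 5))
names(change) <- paste0("gene", seq_along(change))

# Standard deviation rule: keep genes whose change lies more than
# 1 SD away from the mean of all gene changes.
sel_sd <- abs(change - mean(change)) > sd(change)

# Chi-square rule (ASSUMED form, for illustration only):
# standardize each change by a robust noise estimate, square it, and
# compare against a chi-square distribution with 1 degree of freedom.
chi.p    <- 0.05
noise_sd <- mad(change)
p_val    <- pchisq((change / noise_sd)^2, df = 1, lower.tail = FALSE)
sel_chi  <- p_val < chi.p

# Genes selected by each rule
names(change)[sel_sd]
names(change)[sel_chi]
```

In this simulation both rules pick up the five strongly changed genes; they can differ in how many borderline noise genes they let through.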

Note that essential genes are different from the core genes considered in the esset.grp function. Essential genes may change in a different direction than the overall change of a gene set, whereas core genes need to change in the direction(s) of interest for the gene set test.
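The distinction between essential and core genes can be made concrete with a small base R sketch. The gene names, change values and cutoff below are made up, and the selection is a simplified stand-in rather than what essGene or esset.grp compute internally.

```r
# Hypothetical per-gene changes for a gene set that tested as up-regulated.
change <- c(geneA = 1.8, geneB = -1.5, geneC = 0.1, geneD = 2.2, geneE = -0.2)
cutoff <- 1  # hypothetical noise cutoff

# Essential genes: change beyond the noise level in either direction.
essential <- names(change)[abs(change) > cutoff]

# Core genes (for an up-regulation test): change in the tested direction only.
core_up <- names(change)[change > cutoff]

essential
core_up

# Essential but not core: changes against the direction of the gene set.
setdiff(essential, core_up)
```

Here geneB is essential because its change is well above the cutoff in magnitude, but it is not a core gene because it moves opposite to the gene set as a whole.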

Value

An expression data matrix extracted for the essential genes, with a structure similar to exprs.

Author(s)

Weijun Luo <luo_weijun@yahoo.com>

References

Luo, W., Friedman, M., Shedden, K., Hankenson, K. and Woolf, P. GAGE: Generally Applicable Gene Set Enrichment for Pathways Analysis. BMC Bioinformatics 2009, 10:161.

See Also

gage, the main function for GAGE analysis; geneData, output and visualization of expression data for selected genes; esset.grp, non-redundant significant gene set list.

Examples

library(gage)

# load the example data and locate the control (HN) and experiment (DCIS) columns
data(gse16873)
cn <- colnames(gse16873)
hn <- grep('HN', cn, ignore.case = TRUE)
dcis <- grep('DCIS', cn, ignore.case = TRUE)

# KEGG test for 1-directional changes
data(kegg.gs)
gse16873.kegg.p <- gage(gse16873, gsets = kegg.gs,
    ref = hn, samp = dcis)
rownames(gse16873.kegg.p$greater)[1:3]

# essential genes pooled from the top 3 up-regulated gene sets
gs <- unique(unlist(kegg.gs[rownames(gse16873.kegg.p$greater)[1:3]]))
essData <- essGene(gs, gse16873, ref = hn, samp = dcis)
head(essData)

# column positions within essData: HN columns first, then DCIS (see the example output)
ref1 <- 1:6
samp1 <- 7:12

# generate a text file for the data table, and pdf files for the heatmap and scatterplot
for (gs in rownames(gse16873.kegg.p$greater)[1:3]) {
    # drop the leading KEGG ID and replace characters that are awkward in file names
    outname <- gsub(" |:|/", "_", substr(gs, 10, 100))
    geneData(genes = kegg.gs[[gs]], exprs = essData, ref = ref1,
        samp = samp1, outname = outname, txt = TRUE, heatmap = TRUE,
        Colv = FALSE, Rowv = FALSE, dendrogram = "none", limit = 3,
        scatterplot = TRUE)
}

Example output

[1] "hsa04141 Protein processing in endoplasmic reticulum"
[2] "hsa00190 Oxidative phosphorylation"                  
[3] "hsa03050 Proteasome"                                 
          HN_1     HN_2      HN_3     HN_4      HN_5      HN_6    DCIS_1
1345  9.109413 9.373454 10.988181 9.161435 11.032016 11.231293 12.675099
5691  8.283191 7.716745  7.553621 8.381538  7.768811  7.635745  8.840405
51128 7.424312 7.970012  8.034436 6.806669  8.508019  8.523295  8.636513
2923  9.362371 9.150221  8.537944 8.828966  9.890736  9.980784 11.168409
10130 9.088828 8.983823  9.493544 8.255197 10.040715  9.959327 10.563274
3312  9.696461 9.782686  9.219330 8.553472 10.165793  9.924388 10.670421
         DCIS_2    DCIS_3    DCIS_4    DCIS_5    DCIS_6
1345  11.231271 12.547915  8.979639 13.470266 12.156052
5691   8.125168  7.782958 11.910352  7.956182  7.713826
51128  8.639654  8.740153  8.037517  9.060658  8.845460
2923   9.396683  9.176227  9.752254 10.529351 10.242883
10130  9.158080 10.215777  9.383801 10.444705 10.319854
3312  10.309618  9.811460  9.144538 11.019411 10.437532

gage documentation built on Dec. 13, 2020, 2:01 a.m.