tsea_mGSEA: Target Set Enrichment Analysis (TSEA) with mGSEA Algorithm


View source: R/tsea_mGSEA.R

Description

The tsea_mGSEA function performs a Modified Gene Set Enrichment Analysis (mGSEA) that supports test sets (e.g. genes or protein IDs) with duplications. The duplication support is achieved by a weighting method for duplicated items, where the weighting is proportional to the frequency of the items in the test set.

Usage

tsea_mGSEA(
  drugs,
  type = "GO",
  ont = "MF",
  nPerm = 1000,
  exponent = 1,
  pAdjustMethod = "BH",
  pvalueCutoff = 0.05,
  minGSSize = 5,
  maxGSSize = 500,
  verbose = FALSE,
  dt_anno = "all",
  readable = FALSE
)

Arguments

drugs

character vector containing drug identifiers used for functional enrichment testing. This can be the top ranking drugs from a GESS result. Internally, drug test sets are translated to the corresponding target protein test sets based on the drug-target annotations provided under the dt_anno argument.

type

one of 'GO', 'KEGG' or 'Reactome'

ont

character(1). If type is 'GO', set ont (ontology) to one of 'BP', 'MF', 'CC' or 'ALL'. If type is 'KEGG' or 'Reactome', ont is ignored.

nPerm

integer defining the number of permutation iterations for calculating p-values

exponent

integer value used as the exponent in the GSEA algorithm. It defines the weight of the items in the item set S.

pAdjustMethod

p-value adjustment method, one of 'holm', 'hochberg', 'hommel', 'bonferroni', 'BH', 'BY', 'fdr'

pvalueCutoff

double, p-value cutoff

minGSSize

integer, minimum size of each gene set in annotation system

maxGSSize

integer, maximum size of each gene set in annotation system

verbose

TRUE or FALSE, print message or not

dt_anno

drug-target annotation source. Currently, one of 'DrugBank', 'CLUE', 'STITCH' or 'all'. If 'dt_anno' is 'all', the targets from the DrugBank, CLUE and STITCH databases will be combined. Usually, it is recommended to set the 'dt_anno' to 'all' since it provides the most complete drug-target annotations. Choosing a single annotation source results in sparser drug-target annotations (particularly CLUE), and thus less complete enrichment results.

readable

TRUE or FALSE. Applies only when type is 'KEGG' or 'Reactome' and indicates whether to convert Entrez gene IDs to gene symbols in the 'itemID' column of the result table.
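The method names listed for pAdjustMethod are among those accepted by the base R function stats::p.adjust. The following minimal sketch shows what the default 'BH' adjustment does to a vector of nominal p-values. The p-values and category names are invented for illustration, and it is an assumption that tsea_mGSEA applies the adjustment in the same way internally.

```r
# Hypothetical nominal p-values for four functional categories
pvals <- c(GO_A = 0.01, GO_B = 0.04, GO_C = 0.03, GO_D = 0.20)

# Benjamini-Hochberg adjustment, matching the default pAdjustMethod = "BH"
padj <- p.adjust(pvals, method = "BH")
print(padj)
```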

Details

The original GSEA method proposed by Subramanian et al. (2005) uses predefined gene sets S defined by functional annotation systems such as GO and KEGG. The goal is to determine whether the genes in S are randomly distributed throughout a ranked test gene list L (e.g. all genes ranked by log2 fold changes) or enriched at the top or bottom of the test list. This is expressed by an Enrichment Score (ES) reflecting the degree to which a set S is overrepresented at the extremes of L.
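For reference, the enrichment score of the original method can be written as follows, using the notation of Subramanian et al. (2005). Here r_j is the score of gene g_j at rank j in L, p is the exponent (the exponent argument), N is the length of L, and N_H is the number of genes in S. This is a summary of the published definition, not a transcription of the package code.

```latex
\begin{aligned}
N_R &= \sum_{g_j \in S} |r_j|^{p} \\
P_{\mathrm{hit}}(S, i) &= \sum_{g_j \in S,\; j \le i} \frac{|r_j|^{p}}{N_R} \\
P_{\mathrm{miss}}(S, i) &= \sum_{g_j \notin S,\; j \le i} \frac{1}{N - N_H} \\
ES(S) &= P_{\mathrm{hit}}(S, i^{*}) - P_{\mathrm{miss}}(S, i^{*}),
\quad i^{*} = \arg\max_{i} \left| P_{\mathrm{hit}}(S, i) - P_{\mathrm{miss}}(S, i) \right|
\end{aligned}
```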

For TSEA, the query is a target protein set in which duplicated entries need to be maintained. To perform GSEA with duplication support, here referred to as mGSEA, the target set is transformed into a score-ranked target list L_tar of all targets provided by the corresponding annotation system. The weight of each target in the query target set is its frequency divided by the number of targets in the target set. Targets present in the annotation system but absent from the target set are assigned a score of 0. Thus, every target in the annotation system receives a score, and the targets are then sorted in decreasing order of score to obtain L_tar.
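The construction of L_tar can be sketched in a few lines of base R. The helper name build_ranked_target_list, the target symbols and the annotation universe are hypothetical and chosen only for illustration. The sketch also assumes that "the number of targets in the target set" counts duplicated entries; the package may implement this step differently.

```r
# Hypothetical inputs: a query target set with duplicates, and the set of all
# targets known to the annotation system (both invented for illustration).
target_set <- c("HDAC1", "HDAC1", "HDAC2", "EGFR")
annotation_universe <- c("HDAC1", "HDAC2", "EGFR", "TP53", "BRCA1")

# Hypothetical helper, not a signatureSearch function
build_ranked_target_list <- function(target_set, universe) {
  # Weight = frequency of a target / size of the target set
  # (assumption: the size counts duplicated entries)
  freq <- table(target_set)
  weights <- as.numeric(freq) / length(target_set)
  names(weights) <- names(freq)

  # Targets in the annotation system but absent from the query score 0
  scores <- setNames(numeric(length(universe)), universe)
  shared <- intersect(names(weights), universe)
  scores[shared] <- weights[shared]

  # Sort decreasingly to obtain the ranked list
  sort(scores, decreasing = TRUE)
}

L_tar <- build_ranked_target_list(target_set, annotation_universe)
print(L_tar)
```

In this toy input, HDAC1 occurs twice among four entries and therefore receives the largest weight, while TP53 and BRCA1 are retained in the list with a score of 0.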

In the case of TSEA, the original GSEA method cannot be used directly because L_tar contains a large proportion of zeros. If the scores of the genes in set S are all zero, N_R (the sum of the scores of the genes in set S) is zero and cannot be used as the denominator. In this case, ES is set to -1. If only some genes in set S have scores of zero, N_R is set to a larger number to decrease the weight of the genes in S that have non-zero scores.

The reason for this modification is that if only one gene in gene set S has a non-zero score and this gene ranks high in L_tar, the weight of this gene will be 1, resulting in an ES(S) close to 1. Thus, the original GSEA method will score the gene set S as significantly enriched. However, this is undesirable because in this example only one gene is shared between the target set and the gene set S. Therefore, giving small weights (the lowest non-zero score in L_tar) to genes in S that have zero scores decreases the weight of the genes in S that have non-zero scores, thereby decreasing the false positive rate. To favor truly enriched functional categories (gene set S) at the top of L_tar, only gene sets with positive ES are selected.
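The effect of the modification can be illustrated with a small base R sketch. The function mgsea_es and the toy data are hypothetical and not part of signatureSearch. The sketch assumes one particular reading of the text above: N_R is enlarged by adding the lowest non-zero score in L_tar once for every zero-score gene in S, while the running sum itself is computed as in the original method.

```r
# Toy ranked target list (invented values, sorted decreasingly)
L_tar <- c(HDAC1 = 0.5, HDAC2 = 0.25, EGFR = 0.25, TP53 = 0, BRCA1 = 0)

# Hypothetical helper, not a signatureSearch function
mgsea_es <- function(L_tar, gene_set, exponent = 1) {
  S <- intersect(gene_set, names(L_tar))
  N <- length(L_tar)
  N_H <- length(S)
  hits <- names(L_tar) %in% S

  hit_scores <- numeric(N)
  hit_scores[hits] <- abs(L_tar[hits])^exponent
  N_R <- sum(hit_scores)

  # All genes in S have a score of zero: N_R cannot serve as denominator
  if (N_R == 0) return(-1)

  # Some genes in S have zero scores: enlarge N_R by the lowest non-zero
  # score in L_tar for each of them (assumed reading of the description)
  n_zero <- sum(hits & L_tar == 0)
  if (n_zero > 0) {
    N_R <- N_R + n_zero * min(L_tar[L_tar > 0])^exponent
  }

  # Running sum: weighted steps up at hits, uniform steps down at misses
  P_hit <- cumsum(hit_scores / N_R)
  P_miss <- cumsum(ifelse(hits, 0, 1 / (N - N_H)))
  running <- P_hit - P_miss

  # ES is the maximum deviation of the running sum from zero
  running[which.max(abs(running))]
}

es_partial <- mgsea_es(L_tar, c("HDAC1", "TP53"))   # one hit, one zero-score gene
es_allzero <- mgsea_es(L_tar, c("TP53", "BRCA1"))   # only zero-score genes
print(c(partial = es_partial, all_zero = es_allzero))
```

For the set containing HDAC1 and TP53, the unmodified N_R would be 0.5 and the single top-ranked hit would push the running sum to 1. With the enlarged N_R of 0.75, the same hit contributes only 2/3, which is the dampening effect described above.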

Value

feaResult object. The result table contains the enriched functional categories (e.g. GO terms or KEGG pathways) ranked by the corresponding enrichment statistic.

Column description

The TSEA results (including tsea_mGSEA) stored in the feaResult object can be returned in tabular format, here a tibble, with the result method. The columns of this tibble are described below.

Additional columns are described under the 'result' slot of the feaResult object.

References

GSEA algorithm: Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Mesirov, J. P. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43), 15545-15550. URL: https://doi.org/10.1073/pnas.0506580102

See Also

feaResult, fea

Examples

data(drugs10)
## GO annotation system
#res1 <- tsea_mGSEA(drugs=drugs10, type="GO", ont="MF", exponent=1,
#                   nPerm=1000, pvalueCutoff=1, minGSSize=5)
#result(res1)
## KEGG annotation system
#res2 <- tsea_mGSEA(drugs=drugs10, type="KEGG", exponent=1,
#                   nPerm=100, pvalueCutoff=1, minGSSize=5)
#result(res2)
## Reactome annotation system
#res3 <- tsea_mGSEA(drugs=drugs10, type="Reactome", pvalueCutoff=1)
#result(res3)

signatureSearch documentation built on April 16, 2021, 6 p.m.