The tsea_mGSEA function performs a Modified Gene Set Enrichment Analysis (mGSEA) that supports test sets (e.g. genes or protein IDs) with duplications. The duplication support is achieved by a weighting method for duplicated items, where the weighting is proportional to the frequency of the items in the test set.
drugs: character vector containing drug identifiers used for functional enrichment testing. This can be the top ranking drugs from a GESS result. Internally, drug test sets are translated to the corresponding target protein test sets based on the drug-target annotations provided under the
type: one of 'GO', 'KEGG' or 'Reactome'
ont: character(1). If type is 'GO', assign
nPerm: integer defining the number of permutation iterations for calculating p-values
exponent: integer value used as exponent in GSEA algorithm. It defines the weight of the items in the item set S.
pAdjustMethod: p-value adjustment method, one of 'holm', 'hochberg', 'hommel', 'bonferroni', 'BH', 'BY', 'fdr'
pvalueCutoff: double, p-value cutoff
minGSSize: integer, minimum size of each gene set in annotation system
maxGSSize: integer, maximum size of each gene set in annotation system
verbose: TRUE or FALSE, print message or not
dt_anno: drug-target annotation source. Currently, one of 'DrugBank', 'CLUE', 'STITCH' or 'all'. If 'dt_anno' is 'all', the targets from the DrugBank, CLUE and STITCH databases will be combined. Usually, it is recommended to set the 'dt_anno' to 'all' since it provides the most complete drug-target annotations. Choosing a single annotation source results in sparser drug-target annotations (particularly CLUE), and thus less complete enrichment results.
readable: TRUE or FALSE, it applies when type is 'KEGG' or 'Reactome' indicating whether to convert gene Entrez ids to gene Symbols in the 'itemID' column in the result table.
The original GSEA method proposed by Subramanian et al., 2005 uses predefined gene sets S defined by functional annotation systems such as GO and KEGG. The goal is to determine whether the genes in S are randomly distributed throughout a ranked test gene list L (e.g. all genes ranked by log2 fold changes) or enriched at the top or bottom of the test list. This is expressed by an Enrichment Score (ES) reflecting the degree to which a set S is overrepresented at the extremes of L.
For TSEA, the query is a target protein set where duplicated entries need to be maintained. To perform GSEA with duplication support, here referred to as mGSEA, the target set is transformed to a score ranked target list L_tar of all targets provided by the corresponding annotation system. For each target in the query target set, its frequency is divided by the number of targets in the target set, which is the weight of that target. For targets present in the annotation system but absent in the target set, their scores are set to 0. Thus, every target in the annotation system will be assigned a score and then sorted decreasingly to obtain L_tar.
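The construction of L_tar can be illustrated with a minimal R sketch. The target names and the annotation universe below are hypothetical, and the sketch assumes that "the number of targets in the target set" means the total number of entries including duplicates; it is not the package's internal code.

```r
# Hypothetical query target set with duplicates (several drugs share targets)
query_targets <- c("HDAC1", "HDAC1", "HDAC2", "EGFR", "HDAC1", "EGFR")
# Hypothetical universe: all targets known to the annotation system
universe <- c("HDAC1", "HDAC2", "HDAC3", "EGFR", "TP53", "BRCA1")

# Weight of a target = its frequency / total number of entries in the query
# (assumption: duplicates are counted in the denominator)
freq <- table(query_targets)
weights <- as.numeric(freq) / length(query_targets)
names(weights) <- names(freq)

# Targets in the annotation system but absent from the query get score 0
L_tar <- setNames(numeric(length(universe)), universe)
shared <- intersect(names(weights), universe)
L_tar[shared] <- weights[shared]

# Sort decreasingly to obtain the score-ranked target list
L_tar <- sort(L_tar, decreasing = TRUE)
print(L_tar)
```

In this toy example HDAC1 occurs three times out of six entries and therefore ranks first, while the three targets missing from the query keep a score of 0.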
In the case of TSEA, the original GSEA method cannot be used directly since L_tar contains a large portion of zeros. If the scores of the genes in set S are all zero, N_R (the sum of the scores of the genes in set S) will be zero and cannot be used as the denominator. In this case, ES is set to -1. If only some genes in set S have scores of zero, then N_R is set to a larger number to decrease the weight of the genes in S that have non-zero scores.
The reason for this modification is that if only one gene in gene set S has a non-zero score and this gene ranks high in L_tar, the weight of this gene will be 1 resulting in an ES(S) close to 1. Thus, the original GSEA method will score the gene set S as significantly enriched. However, this is undesirable because in this example only one gene is shared among the target set and the gene set S. Therefore, giving small weights (lowest non-zero score in L_tar) to genes in S that have zero scores could decrease the weight of the genes in S that have non-zero scores, thereby decreasing the false positive rate. To favor truly enriched functional categories (gene set S) at the top of L_tar, only gene sets with positive ES are selected.
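The effect of this modification can be sketched in R as follows. The function name `mgsea_es` and the toy data are hypothetical. The sketch assumes that zero-scored members of S contribute the lowest non-zero score in L_tar both to N_R and to their own increments; it is an illustration of the idea described above, not the package's implementation.

```r
# Sketch of an enrichment score with the zero-score handling described above
mgsea_es <- function(L_tar, S, exponent = 1) {
  hits <- names(L_tar) %in% S
  n_hit <- sum(hits)
  N <- length(L_tar)
  if (n_hit == 0 || n_hit == N) return(NA_real_)
  w <- abs(L_tar)^exponent
  # All members of S have zero score: N_R would be 0, so ES is set to -1
  if (sum(w[hits]) == 0) return(-1)
  # Zero-scored members of S receive the lowest non-zero score in L_tar,
  # which enlarges N_R and shrinks the relative weight of the scored members
  w[hits & w == 0] <- min(w[w > 0])
  N_R <- sum(w[hits])
  # Running sum: step up at members of S, step down otherwise
  step <- ifelse(hits, w / N_R, -1 / (N - n_hit))
  running <- cumsum(step)
  running[which.max(abs(running))]
}

# Hypothetical score-ranked target list
L_tar <- c(HDAC1 = 0.5, EGFR = 1/3, HDAC2 = 1/6, HDAC3 = 0, TP53 = 0, BRCA1 = 0)

es_all_zero <- mgsea_es(L_tar, c("TP53", "BRCA1"))           # no shared target
es_one_hit  <- mgsea_es(L_tar, c("HDAC1", "TP53", "BRCA1"))  # one shared target
es_two_hits <- mgsea_es(L_tar, c("HDAC1", "HDAC2", "HDAC3")) # two shared targets
```

Without the small weights, the set sharing only HDAC1 with the query would reach an ES of 1 at the first position. With them, its ES drops below that of the set sharing two targets.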
feaResult object, the result table contains the enriched functional categories (e.g. GO terms or KEGG pathways) ranked by the corresponding enrichment statistic.
The TSEA results (including tsea_mGSEA) stored in the feaResult object can be returned with the result method in tabular format, here tibble. The columns of this tibble are described below.
enrichmentScore: ES from the GSEA algorithm (Subramanian et al., 2005). The score is calculated by walking down the gene list L, increasing a running-sum statistic when we encounter a gene in S and decreasing it when we encounter a gene not in S. The magnitude of the increment depends on the gene scores. The ES is the maximum deviation from zero encountered in the random walk. It corresponds to a weighted Kolmogorov-Smirnov-like statistic.
NES: Normalized enrichment score. The positive and negative enrichment scores are normalized separately by permuting the composition of the gene list L nPerm times, and dividing the enrichment score by the mean of the permutation ES with the same sign.
pvalue: The nominal p-value of the ES is calculated using a permutation test. Specifically, the composition of the gene list L is permuted and the ES of the gene set is recomputed for the permuted data, generating a null distribution for the ES. The p-value of the observed ES is then calculated relative to this null distribution.
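A permutation-based NES and nominal p-value can be sketched in R as shown below. The helper `es_score`, the item names and the scores are hypothetical, the helper is a plain weighted running sum without the zero-score handling of mGSEA, and the p-value formula with a pseudocount of 1 is one common convention assumed here rather than taken from this page.

```r
set.seed(42)  # reproducible permutations

# Plain weighted running-sum ES (assumes all members of S have non-zero scores)
es_score <- function(L, S, exponent = 1) {
  hits <- names(L) %in% S
  w <- abs(L)^exponent
  step <- ifelse(hits, w / sum(w[hits]), -1 / sum(!hits))
  running <- cumsum(step)
  running[which.max(abs(running))]
}

# Hypothetical ranked list of 50 items with decreasing positive scores
L <- setNames(seq(1, 0.02, length.out = 50), paste0("g", 1:50))
S <- paste0("g", c(1, 2, 4, 7, 9))  # hypothetical set concentrated at the top

nPerm <- 1000
obs <- es_score(L, S)

# Null distribution: shuffle which item sits at which rank, keep the scores
perm_es <- replicate(nPerm, {
  Lp <- setNames(as.numeric(L), sample(names(L)))
  es_score(Lp, S)
})

# Normalize and test against permutation scores with the same sign only
same_sign <- perm_es[sign(perm_es) == sign(obs)]
NES <- obs / abs(mean(same_sign))
pvalue <- (sum(abs(same_sign) >= abs(obs)) + 1) / (length(same_sign) + 1)
```

Because the set members sit near the top of L, the observed ES is far larger than most permutation scores, so NES exceeds 1 and the nominal p-value is small.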
leadingEdge: Genes in the gene set S (functional category) that appear in the ranked list L at, or before, the point where the running sum reaches its maximum deviation from zero. It can be interpreted as the core of a gene set that accounts for the enrichment signal.
ledge_rank: Ranks of genes in 'leadingEdge' in gene list L.
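For a positive ES, the leading edge and its ranks can be derived from the running sum as in the following R sketch. The ranked list and the gene set are hypothetical, and the running sum is the plain weighted version without the zero-score handling of mGSEA.

```r
# Hypothetical score-ranked list and gene set
L <- c(HDAC1 = 0.5, EGFR = 1/3, HDAC2 = 1/6, KDR = 0.1, TP53 = 0.05, BRCA1 = 0.01)
S <- c("HDAC1", "EGFR", "TP53")

hits <- names(L) %in% S
# Running sum: step up (weighted) at members of S, step down otherwise
step <- ifelse(hits, L / sum(L[hits]), -1 / sum(!hits))
running <- cumsum(step)

# Position where the running sum reaches its maximum deviation from zero
peak <- which.max(abs(running))

# Members of S at or before the peak form the leading edge (positive ES case)
in_edge <- hits & seq_along(L) <= peak
leadingEdge <- names(L)[in_edge]
ledge_rank <- which(in_edge)
```

Here the running sum peaks at the second position, so TP53 belongs to S but falls outside the leading edge.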
Additional columns are described under the 'result' slot of the feaResult object.
GSEA algorithm: Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Mesirov, J. P. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43), 15545-15550. URL: https://doi.org/10.1073/pnas.0506580102
library(signatureSearch)
data(drugs10)
## GO annotation system
#res1 <- tsea_mGSEA(drugs=drugs10, type="GO", ont="MF", exponent=1,
#                   nPerm=1000, pvalueCutoff=1, minGSSize=5)
#result(res1)
## KEGG annotation system
#res2 <- tsea_mGSEA(drugs=drugs10, type="KEGG", exponent=1,
#                   nPerm=100, pvalueCutoff=1, minGSSize=5)
#result(res2)
## Reactome annotation system
#res3 <- tsea_mGSEA(drugs=drugs10, type="Reactome", pvalueCutoff=1)
#result(res3)