knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

# knitr knits in a new session with an empty global workspace after setting its
# working directory to ./vignettes. To make your package functions available in
# the vignette, you have to load the library. The following two lines should
# accomplish this without manual intervention:
pkgName <- trimws(gsub("^Package:", "", readLines("../DESCRIPTION")[1]))
library(pkgName, character.only = TRUE)
This vignette [@steipe-rptPlus] demonstrates the use of the domainEnrichment() function in the BCB420.2019.ESA package.
The function identifies the protein domains that are enriched in a given system. Its input is a system code; it draws a plot and returns a data frame with the results.
The enriched domains can then be used to indicate enriched gene functions, to help define subsystems that share domains, to identify genes with a very special role in the system, or to flag anomalies.
This function uses data from InterPro [@pmid30398656], which is fetched by fetchData().
The InterPro data consists of two large lists: one maps HGNC symbols to domain IDs, the other maps domain IDs to HGNC symbols. The data was fetched by Dr. Steipe.
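To make the two-list layout concrete, here is a minimal sketch with toy data. The object names symToDomain and domainToSym, the gene symbols, and the domain IDs are hypothetical placeholders; the real InterPro lists are far larger and their actual names are not shown in this vignette.

```r
# Hypothetical toy mapping from HGNC symbol to InterPro domain IDs.
symToDomain <- list(
  GENE1 = c("IPR000001", "IPR000002"),
  GENE2 = c("IPR000002"),
  GENE3 = character(0)              # a gene with no annotated domain
)

# Build the reverse mapping (domain ID -> symbols) from the forward one.
genes   <- rep(names(symToDomain), lengths(symToDomain))
domains <- unlist(symToDomain, use.names = FALSE)
domainToSym <- split(genes, domains)

domainToSym[["IPR000002"]]          # symbols of genes that carry this domain
```

Keeping both directions available means that either lookup (domains of a gene, genes with a domain) is a single list access.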
Fisher's exact test is used to calculate the p-value. A contingency table is generated for each domain, and the p-value is calculated by stats::fisher.test() [@72556].
Example contingency table for a domain:
x <- matrix(c("x", "k-x", "k",
              "m-x", "n-(k-x)", "m+n-k",
              "m", "n", "m+n"),
            nrow = 3, ncol = 3)
colnames(x) <- c("Have domain", "Not have domain", "Total")
rownames(x) <- c("In System", "Not in System", "Total")
knitr::kable(x, caption = "Example Contingency Table", align = "c")
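As a worked illustration of the table above, the following sketch plugs hypothetical counts into the same layout and runs stats::fisher.test() on the inner 2x2 cells. The counts are invented, and the one-sided alternative = "greater" is an assumption made here to test for enrichment only; domainEnrichment() may use the two-sided default instead.

```r
# Hypothetical counts, using the symbols of the contingency table:
m <- 50      # genes in the system
n <- 19950   # genes not in the system
k <- 200     # genes that have the domain
x <- 10      # genes in the system that have the domain

# Inner 2x2 table (the "Total" row and column are not passed to the test).
tab <- matrix(c(x, k - x, m - x, n - (k - x)),
              nrow = 2,
              dimnames = list(c("In System", "Not in System"),
                              c("Have domain", "Not have domain")))

# Assumption: one-sided test for over-representation of the domain.
res <- stats::fisher.test(tab, alternative = "greater")
res$p.value
```

Under independence, only about 50 * 200 / 20000 = 0.5 system genes would be expected to carry the domain, so observing 10 yields a very small p-value.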
Multiple testing correction is used to control the false discovery rate (FDR). This function uses the Benjamini-Hochberg procedure.
First, we need to know which system codes are available for analysis:
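The effect of the correction can be sketched with stats::p.adjust(). The p-values below are made up for illustration, and whether domainEnrichment() calls p.adjust() internally is an assumption, not something this vignette states.

```r
# Hypothetical raw p-values, one per tested domain.
p <- c(0.001, 0.008, 0.039, 0.041, 0.27)

# Benjamini-Hochberg adjusted p-values.
padj <- stats::p.adjust(p, method = "BH")

alpha <- 0.05
passing <- which(padj < alpha)   # indices of domains that pass the FDR control
padj
passing
```

In this toy case four raw p-values are below 0.05, but only the first two remain below alpha after adjustment.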
names(SyDBgetRootSysIDs(fetchData("SysDB")))
Then the user can choose one system code as input for domainEnrichment(). Here I use "PHALY" as an example:
exampleOutcome <- domainEnrichment("PHALY") # alpha value is 0.05 by default
The first rows of the output data frame:
head(exampleOutcome)
The example outcome looks like this:
(Temporarily removed to reduce vignette size)
The return value of the function is a data frame that contains the domain ID, its description, and the results of the statistical analysis.
The output plot contains all domains that pass the Benjamini-Hochberg control, that is, all domains whose adjusted p-value is smaller than alpha. The null hypothesis is that the two categories (having the domain and being in the system) are independent. A small p-value leads us to reject the null hypothesis and conclude that the categories are not independent; the domain is therefore considered enriched in the system.
The enriched domains can be used to learn more about the system:
The enriched domains can indicate which gene functions are enriched in the system and give a sense of the possible mechanisms at work in it.
If a large map of the enriched domains of many systems were generated, it might be possible to predict which kinds of systems a new gene is involved in from the domains it contains.
If the system is too big, the enriched domains may be useful for defining subsystems.
If the system has very specific enriched domains, genes that lack these domains but are still part of the system may have a very specific role in it, or may be anomalies.
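The last idea above can be sketched in a few lines. Everything here is hypothetical toy data: the gene symbols, the domain IDs, and the names systemGenes, symToDomain and enriched do not come from the package.

```r
# Hypothetical inputs: genes of one system, their domains, and the
# domain(s) found to be enriched in that system.
systemGenes <- c("GENE1", "GENE2", "GENE3", "GENE4")
symToDomain <- list(
  GENE1 = c("IPR000001"),
  GENE2 = c("IPR000001", "IPR000003"),
  GENE3 = c("IPR000002"),
  GENE4 = character(0)
)
enriched <- c("IPR000001")

# For each system gene, check whether it carries any enriched domain.
hasEnriched <- vapply(symToDomain[systemGenes],
                      function(d) any(d %in% enriched),
                      logical(1))

# Genes in the system without an enriched domain: candidates for a
# special role, or possible anomalies worth a closer look.
candidates <- systemGenes[!hasEnriched]
candidates
```

Such a list is only a starting point for manual inspection; lacking an enriched domain does not by itself say anything about a gene's role.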
This release of the BCB420.2019.ESA package was produced in the following context of supporting packages:
sessionInfo()