cisTarget | R Documentation |
Identifies DNA motifs significantly over-represented in a gene-set.
This is the main function to run RcisTarget. It includes the following steps:
1. Motif enrichment analysis (calcAUC)
2. Motif-TF annotation (addMotifAnnotation)
3. Selection of significant genes (addSignificantGenes)
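The three steps above can also be run one at a time. The following is a minimal sketch, assuming the calcAUC, addMotifAnnotation and addSignificantGenes signatures documented in the package; the wrapper name runStepsManually is hypothetical, and arguments not shown are left at each function's own defaults, which may differ from the defaults of cisTarget.

```r
# Hypothetical helper that chains the three steps wrapped by cisTarget().
# Defining it does not require RcisTarget; calling it does.
runStepsManually <- function(geneSets, motifRankings, motifAnnot,
                             nesThreshold = 3, geneErnMaxRank = 5000) {
  # 1. Motif enrichment analysis: one AUC per motif and gene-set
  motifsAUC <- RcisTarget::calcAUC(geneSets, motifRankings)

  # 2. Motif-TF annotation, keeping the motifs that pass the NES threshold
  enrichment <- RcisTarget::addMotifAnnotation(motifsAUC,
                                               nesThreshold = nesThreshold,
                                               motifAnnot = motifAnnot)

  # 3. Selection of the genes with high enrichment of each motif
  RcisTarget::addSignificantGenes(enrichment,
                                  geneSets = geneSets,
                                  rankings = motifRankings,
                                  maxRank = geneErnMaxRank,
                                  method = "aprox")
}
```

Running the steps separately is mainly useful for inspecting the intermediate AUC object or re-annotating with a different threshold without recomputing the AUC.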
cisTarget(
geneSets,
motifRankings,
motifAnnot = NULL,
motifAnnot_highConfCat = c("directAnnotation", "inferredBy_Orthology"),
motifAnnot_lowConfCat = c("inferredBy_MotifSimilarity",
"inferredBy_MotifSimilarity_n_Orthology"),
highlightTFs = NULL,
nesThreshold = 3,
aucMaxRank = 0.05 * ncol(motifRankings),
geneErnMethod = "aprox",
geneErnMaxRank = 5000,
nCores = 1,
verbose = TRUE
)
geneSets |
List of gene-sets to analyze.
The gene-sets should be provided as |
motifRankings |
Database of the appropriate organism and search space
(e.g. 10 kbp around or 500 bp upstream of the TSS).
These objects are provided in separate files,
which can be imported with
See |
motifAnnot |
Motif annotation database containing the annotations of the motif to transcription factors. |
motifAnnot_highConfCat |
Categories considered as source for 'high confidence' annotations. By default, "directAnnotation" (annotated in the source database), and "inferredBy_Orthology" (the motif is annotated to a homologous/orthologous gene). |
motifAnnot_lowConfCat |
Categories considered as source for 'lower confidence' annotations. By default, the annotations inferred based on motif similarity ("inferredBy_MotifSimilarity", "inferredBy_MotifSimilarity_n_Orthology"). |
highlightTFs |
Character. If a list of transcription factors is provided, the column TFinDB in the output table will indicate whether any of those TFs are included within the 'high-confidence' annotation (two asterisks, **) or 'low-confidence' annotation (one asterisk, *) of the motif. The vector can be named to indicate which TFs to highlight for each gene-set. Otherwise, all TFs will be used for all gene-sets. |
nesThreshold |
Numeric. NES threshold used to consider a motif significant (3.0 by default). The NES is calculated for each motif based on the AUC distribution of all the motifs for the gene-set [(x-mean)/sd]. Motifs are considered significantly enriched if they pass the Normalized Enrichment Score (NES) threshold. |
aucMaxRank |
Threshold to calculate the AUC. In a simplified way, the AUC value represents the fraction of genes, within the top X genes in the ranking, that are included in the signature. This parameter sets how many genes at the top of the ranking are used for the computation. By default it is set to 5% of the total number of genes in the rankings. Common values range from 1 to 10%. |
geneErnMethod |
"iCisTarget" or "aprox". Method to identify the highly ranked genes (see addSignificantGenes for details). |
geneErnMaxRank |
Maximum rank to take into account for the gene enrichment recovery curve (see addSignificantGenes for details). |
nCores |
Number of cores to use for computation. Note: In general, using a higher number of cores (i.e. processes) decreases overall running time. However, it also depends on the available memory and overall system load. Setting nCores too high might also decrease performance. |
verbose |
Should the function show progress messages? (TRUE / FALSE) |
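To make the aucMaxRank description more concrete, here is a minimal base-R sketch on toy data. The ranking, the gene names, the database size and the way the area is normalized are all assumptions chosen for illustration; this is not RcisTarget's internal implementation.

```r
# Hypothetical ranking of 20 genes for one motif (rank 1 = best-scoring gene)
ranking <- setNames(1:20, paste0("gene", 1:20))
geneSet <- c("gene1", "gene3", "gene4", "gene15")
aucMaxRank <- 5  # toy value; only the top 5 genes are considered

# Simplified view: fraction of the top-ranked genes that are in the gene-set
topGenes <- names(ranking)[ranking <= aucMaxRank]
fracInSet <- mean(topGenes %in% geneSet)

# Recovery curve: number of gene-set genes found at or above each rank
setRanks <- ranking[geneSet]
recovery <- sapply(seq_len(aucMaxRank), function(r) sum(setRanks <= r))

# Area under the recovery curve, scaled here by (ranks considered x set size);
# this scaling is an illustrative choice
auc <- sum(recovery) / (aucMaxRank * length(geneSet))

# The default threshold is 5% of the genes in the rankings,
# e.g. for a hypothetical database with 20000 genes:
nGenesInDb <- 20000
defaultMaxRank <- 0.05 * nGenesInDb

print(c(fracInSet = fracInSet, auc = auc, defaultMaxRank = defaultMaxRank))
```

A larger aucMaxRank lets genes further down the ranking contribute to the score, while a smaller one focuses on the very top of the ranking.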
A data.table containing the over-represented motifs (according to the selected NES threshold), their statistics, annotation to transcription factors and the genes with high enrichment of the motif.
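The NES threshold that determines which motifs appear in the returned table can be illustrated with a small base-R sketch. The AUC values below are made up, and treating the threshold as inclusive (>=) is an assumption; the formula (x-mean)/sd is the one given for the nesThreshold argument.

```r
# Hypothetical AUC values for 31 motifs on one gene-set:
# 30 background motifs plus one clearly enriched motif
auc <- c(rep(c(0.02, 0.03, 0.04), times = 10), 0.10)
names(auc) <- paste0("motif", seq_along(auc))

# NES: standardize each AUC against the distribution over all motifs
nes <- (auc - mean(auc)) / sd(auc)

# Keep the motifs that reach the threshold
# (the inclusive comparison is an assumption)
nesThreshold <- 3
enriched <- names(nes)[nes >= nesThreshold]
print(enriched)
```

Because the score is relative to the whole motif collection, it only becomes informative when many motifs are scored together: with just a handful of values, no standardized score can reach 3.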
See the package vignette for examples and more details:
vignette("RcisTarget")
# Example for running RcisTarget using cisTarget() function (workflow wrapper)
## Not run:
##################################################
### Load your gene sets
# As an example, the package includes a hypoxia gene set:
txtFile <- file.path(system.file('examples', package='RcisTarget'),
"hypoxiaGeneSet.txt")
geneLists <- list(hypoxia=read.table(txtFile, stringsAsFactors=FALSE)[,1])
### Load databases
# Motif rankings: Select according to organism and distance around TSS
# (See the vignette for URLs to download)
motifRankings <- importRankings("hg19-500bp-upstream-7species.mc9nr.feather")
## Motif - TF annotation:
data(motifAnnotations_hgnc_v9) # human TFs (for motif collection 9)
motifAnnotation <- motifAnnotations_hgnc_v9
##################################################
# Run (R)cisTarget
motifEnrichmentTable_wGenes <- cisTarget(geneLists, motifRankings,
motifAnnot=motifAnnotation,
nesThreshold=3.5, geneErnMethod="aprox", nCores=2)
## End(Not run)
# Load results from analysis
load(file.path(system.file('examples', package='RcisTarget'),
"motifEnrichmentTable_wGenes.RData"))
### Exploring the output:
# Note: If using the fake-database, the results are not meaningful
# Number of enriched motifs (Over the given NES threshold)
nrow(motifEnrichmentTable_wGenes)
# Available info (columns)
colnames(motifEnrichmentTable_wGenes)
# The object returned is a data.table (for faster computation),
# which has a different syntax from the standard data.frame or matrix
# Feel free to convert it to a data.frame (as.data.frame())
class(motifEnrichmentTable_wGenes)
motifEnrichmentTable_wGenes[,1:5]
# Enriched genes
enrGenes <- as.character(motifEnrichmentTable_wGenes[1,"enrichedGenes"])
strsplit(enrGenes, ";")
# Interactive exploration
motifEnrichmentTable_wGenes <- addLogo(motifEnrichmentTable_wGenes)
DT::datatable(motifEnrichmentTable_wGenes[,1:9], escape = FALSE, filter="top",
options=list(pageLength=5))
# Note: If using the fake database, the results of this analysis are meaningless