RcisTarget: cisTarget

cisTargetR Documentation

cisTarget

Description

Identifies DNA motifs significantly over-represented in a gene-set.

This is the main function to run RcisTarget. It includes on the following steps:

  • 1. Motif enrichment analysis (calcAUC)

  • 2. Motif-TF annotation (addMotifAnnotation)

  • 3. Selection of significant genes (addSignificantGenes)

Usage

cisTarget(
  geneSets,
  motifRankings,
  motifAnnot = NULL,
  motifAnnot_highConfCat = c("directAnnotation", "inferredBy_Orthology"),
  motifAnnot_lowConfCat = c("inferredBy_MotifSimilarity",
    "inferredBy_MotifSimilarity_n_Orthology"),
  highlightTFs = NULL,
  nesThreshold = 3,
  aucMaxRank = 0.05 * ncol(motifRankings),
  geneErnMethod = "aprox",
  geneErnMaxRank = 5000,
  nCores = 1,
  verbose = TRUE
)

Arguments

geneSets

List of gene-sets to analyze. The gene-sets should be provided as GeneSet, GeneSetCollection or character list (see examples).

motifRankings

Database of the appropiate organism and search-space (i.e. 10kbp around- or 500bp upstream the TSS). These objects are provided in separate files, which can be imported with importRankings():

See vignette("RcisTarget") for an exhaustive list of databases.

motifAnnot

Motif annotation database containing the annotations of the motif to transcription factors.

motifAnnot_highConfCat

Categories considered as source for 'high confidence' annotations. By default, "directAnnotation" (annotated in the source database), and "inferredBy_Orthology" (the motif is annotated to an homologous/ortologous gene).

motifAnnot_lowConfCat

Categories considered 'lower confidence' source for annotations. By default, the annotations inferred based on motif similarity ("inferredBy_MotifSimilarity", "inferredBy_MotifSimilarity_n_Orthology").

highlightTFs

Character. If a list of transcription factors is provided, the column TFinDB in the otuput table will indicate whether any of those TFs are included within the 'high-confidence' annotation (two asterisks, **) or 'low-confidence' annotation (one asterisk, *) of the motif. The vector can be named to indicate which TF to highlight for each gene-set. Otherwise, all TFs will be used for all geneSets.

nesThreshold

Numeric. NES threshold to calculate the motif significant (3.0 by default). The NES is calculated -for each motif- based on the AUC distribution of all the motifs for the gene-set [(x-mean)/sd]. The motifs are considered significantly enriched if they pass the the Normalized Enrichment Score (NES) threshold.

aucMaxRank

Threshold to calculate the AUC. In a simplified way, the AUC value represents the fraction of genes -within the top X genes in the ranking- that are included in the signature. The parameter 'aucThresholdPERC' allows to modify the percentage of genes (of the top of the ranking) that is used to perform this computation. By default it is set to 5% of the total number of genes in the rankings. Common values range from 1 to 10%.

geneErnMethod

"iCisTarget" or "aprox". Method to identify the highly ranked genes (see addSignificantGenes for details).

geneErnMaxRank

Maximum rank to take into account for the gene enrichment recovery curve (see addSignificantGenes for details).

nCores

Number of cores to use for computation. Note: In general, using a higher number of cores (e.g. processes) decreases overall running time. However, it also deppends on the available memory and overall system load. Setting nCores too high might also decrease performance.

verbose

Should the function show progress messages? (TRUE / FALSE)

Value

data.table containing the over-represented motifs (according to the selected NES threshold), their statistics, annotation to transcription factors and the genes with high enrichment of the motif.

See Also

See the package vignette for examples and more details: vignette("RcisTarget")

Examples



# Example for running RcisTarget using cisTarget() function (workflow wrapper)

## Not run: 

##################################################
### Load your gene sets
# As example, the package includes an Hypoxia gene set:
txtFile <- paste(file.path(system.file('examples', package='RcisTarget')),
                 "hypoxiaGeneSet.txt", sep="/")
geneLists <- list(hypoxia=read.table(txtFile, stringsAsFactors=FALSE)[,1])

### Load databases
# Motif rankings: Select according to organism and distance around TSS
# (See the vignette for URLs to download)
motifRankings <- importRankings("hg19-500bp-upstream-7species.mc9nr.feather")

## Motif - TF annotation:
data(motifAnnotations_hgnc_v9) # human TFs (for motif collection 9)
motifAnnotation <- motifAnnotations_hgnc_v9
##################################################

# Run (R)cisTarget
motifEnrichmentTable_wGenes <- cisTarget(geneLists, motifRankings,
  motifAnnot_direct=hg19_direct_motifAnnotation,
  nesThreshold=3.5, geneErnMethod="aprox", nCores=2)


## End(Not run)

# Load results from analysis
load(paste(file.path(system.file('examples', package='RcisTarget')),
           "motifEnrichmentTable_wGenes.RData", sep="/"))


### Exploring the output:
# Note: If using the fake-database, the results are not meaningful

# Number of enriched motifs (Over the given NES threshold)
nrow(motifEnrichmentTable_wGenes)

# Available info (columns)
colnames(motifEnrichmentTable_wGenes)

# The object returned is a data.table (for faster computation),
# which has a diferent syntax from the standard data.frame or matrix
# Feel free to convert it to a data.frame (as.data.frame())
class(motifEnrichmentTable_wGenes)
motifEnrichmentTable_wGenes[,1:5]

# Enriched genes
enrGenes <- as.character(motifEnrichmentTable_wGenes[1,"enrichedGenes"])
strsplit(enrGenes, ";")


# Interactive exploration
motifEnrichmentTable_wGenes <- addLogo(motifEnrichmentTable_wGenes)
DT::datatable(motifEnrichmentTable_wGenes[,1:9], escape = FALSE, filter="top",
              options=list(pageLength=5))
# Note: If using the fake database, the results of this analysis are meaningless


aertslab/RcisTarget documentation built on March 7, 2024, 11:21 p.m.