addMotifAnnotation: Add motif annotation

View source: R/02_addMotifAnnotation.R

addMotifAnnotationR Documentation

Add motif annotation

Description

Select significant motifs and/or annotate motifs to genes or transcription factors. The motifs are considered significantly enriched if they pass the the Normalized Enrichment Score (NES) threshold.

Usage

addMotifAnnotation(
  auc,
  nesThreshold = 3,
  digits = 3,
  motifAnnot = NULL,
  motifAnnot_highConfCat = c("directAnnotation", "inferredBy_Orthology"),
  motifAnnot_lowConfCat = c("inferredBy_MotifSimilarity",
    "inferredBy_MotifSimilarity_n_Orthology"),
  idColumn = "motif",
  highlightTFs = NULL,
  keepAnnotationCategory = TRUE
)

Arguments

auc

Output from calcAUC.

nesThreshold

Numeric. NES threshold to calculate the motif significant (3.0 by default). The NES is calculated -for each motif- based on the AUC distribution of all the motifs for the gene-set [(x-mean)/sd].

digits

Integer. Number of digits for the AUC and NES in the output table.

motifAnnot

Motif annotation database containing the annotations of the motif to transcription factors. The names should match the ranking column names.

motifAnnot_highConfCat

Categories considered as source for 'high confidence' annotations. By default, "directAnnotation" (annotated in the source database), and "inferredBy_Orthology" (the motif is annotated to an homologous/ortologous gene).

motifAnnot_lowConfCat

Categories considered 'lower confidence' source for annotations. By default, the annotations inferred based on motif similarity ("inferredBy_MotifSimilarity", "inferredBy_MotifSimilarity_n_Orthology").

idColumn

Annotation column containing the ID (e.g. motif, accession)

highlightTFs

Character. If a list of transcription factors is provided, the column TFinDB in the otuput table will indicate whether any of those TFs are included within the 'high-confidence' annotation (two asterisks, **) or 'low-confidence' annotation (one asterisk, *) of the motif. The vector can be named to indicate which TF to highlight for each gene-set. Otherwise, all TFs will be used for all geneSets.

keepAnnotationCategory

Include annotation type in the TF information?

Value

data.table with the folowing columns:

  • geneSet: Name of the gene set

  • motif: ID of the motif (colnames of the ranking, it might be other kind of feature)

  • NES: Normalized enrichment score of the motif in the gene-set

  • AUC: Area Under the Curve (used to calculate the NES)

  • TFinDB: Indicates whether the highlightedTFs are included within the high-confidence annotation (two asterisks, **) or lower-confidence annotation (one asterisk, *)

  • TF_highConf: Transcription factors annotated to the motif based on high-confidence annotations.

  • TF_lowConf: Transcription factors annotated to the motif according to based on lower-confidence annotations.

See Also

Next step in the workflow: addSignificantGenes.

Previous step in the workflow: calcAUC.

See the package vignette for examples and more details: vignette("RcisTarget")

Examples


##################################################
# Setup & previous steps in the workflow:

#### Gene sets
# As example, the package includes an Hypoxia gene set:
txtFile <- paste(file.path(system.file('examples', package='RcisTarget')),
                 "hypoxiaGeneSet.txt", sep="/")
geneLists <- list(hypoxia=read.table(txtFile, stringsAsFactors=FALSE)[,1])

#### Databases
## Motif rankings: Select according to organism and distance around TSS
## (See the vignette for URLs to download)
# motifRankings <- importRankings("hg19-500bp-upstream-7species.mc9nr.feather")

## For this example we will use a SUBSET of the ranking/motif databases:
library(RcisTarget.hg19.motifDBs.cisbpOnly.500bp)
data(hg19_500bpUpstream_motifRanking_cispbOnly)
motifRankings <- hg19_500bpUpstream_motifRanking_cispbOnly

## Motif - TF annotation:
data(motifAnnotations_hgnc_v9) # human TFs (for motif collection 9)
motifAnnotation <- motifAnnotations_hgnc_v9

### Run RcisTarget
# Step 1. Calculate AUC
motifs_AUC <- calcAUC(geneLists, motifRankings)

##################################################

### (This step: Step 2)
# Before starting: Setup the paralell computation
library(BiocParallel); register(MulticoreParam(workers = 2)) 
# Select significant motifs, add TF annotation & format as table
motifEnrichmentTable <- addMotifAnnotation(motifs_AUC,
                                           motifAnnot=motifAnnotation)

# Alternative: Modifying some options
motifEnrichment_wIndirect <- addMotifAnnotation(motifs_AUC, nesThreshold=2, 
        motifAnnot=motifAnnotation,
        highlightTFs = "HIF1A", 
        motifAnnot_highConfCat=c("directAnnotation"), 
        motifAnnot_lowConfCat=c("inferredBy_MotifSimilarity",
                                "inferredBy_MotifSimilarity_n_Orthology",
                                "inferredBy_Orthology"),
        digits=3)

# Getting TFs for a given TF:
motifs <- motifEnrichmentTable$motif[1:3]

getMotifAnnotation(motifs, motifAnnot=motifAnnotation)
getMotifAnnotation(motifs, motifAnnot=motifAnnotation, returnFormat="list")

### Exploring the output:
# Number of enriched motifs (Over the given NES threshold)
nrow(motifEnrichmentTable)

# Interactive exploration
motifEnrichmentTable <- addLogo(motifEnrichmentTable)
DT::datatable(motifEnrichmentTable, filter="top", escape=FALSE,
              options=list(pageLength=50))
# Note: If using the fake database, the results of this analysis are meaningless

# The object returned is a data.table (for faster computation),
# which has a diferent syntax from the standard data.frame or matrix
# Feel free to convert it to a data.frame (as.data.frame())
motifEnrichmentTable[,1:6]


##################################################
# Next step (step 3, optional):
## Not run: 
  motifEnrichmentTable_wGenes <- addSignificantGenes(motifEnrichmentTable,
                                                     geneSets=geneLists,
                                                     rankings=motifRankings,
                                                     method="aprox")

## End(Not run)

aertslab/RcisTarget documentation built on March 7, 2024, 11:21 p.m.