calcAUC | R Documentation |
Calculates the Area Under the Curve (AUC) of each gene-set for each motif ranking. This measure is used in the following steps to identify the DNA motifs that are significantly over-represented in the gene-set.
calcAUC(
geneSets,
rankings,
nCores = 1,
aucMaxRank = 0.03 * getNumColsInDB(rankings),
verbose = TRUE
)
## S4 method for signature 'list'
calcAUC(
geneSets,
rankings,
nCores = 1,
aucMaxRank = 0.03 * getNumColsInDB(rankings),
verbose = TRUE
)
## S4 method for signature 'character'
calcAUC(
geneSets,
rankings,
nCores = 1,
aucMaxRank = 0.03 * getNumColsInDB(rankings),
verbose = TRUE
)
## S4 method for signature 'GeneSet'
calcAUC(
geneSets,
rankings,
nCores = 1,
aucMaxRank = 0.03 * getNumColsInDB(rankings),
verbose = TRUE
)
## S4 method for signature 'GeneSetCollection'
calcAUC(
geneSets,
rankings,
nCores = 1,
aucMaxRank = 0.03 * getNumColsInDB(rankings),
verbose = TRUE
)
geneSets |
List of gene-sets to analyze.
The gene-sets should be provided as a list, a character vector, a GeneSet or a GeneSetCollection (see the method signatures above). |
rankings |
'Motif rankings' database for the required organism and
search space (i.e. 10 kbp around the TSS, or 500 bp upstream of the TSS).
These objects are provided in separate files,
which can be imported with importRankings().
See the package vignette for download URLs.
Since the normalized enrichment score (NES) of a motif depends on the total number of motifs in the database, we highly recommend using the full version of the databases (20k motifs). A smaller version of the human databases, containing only the 4.6k motifs from cisbp, is available in Bioconductor.
|
nCores |
Number of cores to use for computation. Note: In general, using more cores (i.e. processes) decreases overall running time. However, this also depends on the available memory and overall system load, and setting nCores too high might decrease performance. |
aucMaxRank |
Threshold to calculate the AUC.
In a simplified way, the AUC value represents the fraction of genes,
within the top X genes in the ranking, that are included in the signature.
The parameter 'aucMaxRank' sets the number of genes
(maximum ranking) used to perform this computation.
By default it is set to 3% of the total number of genes in the rankings
(0.03 * getNumColsInDB(rankings)).
Common values range from 1 to 10%.
See the package vignette, vignette("RcisTarget"), for more details. |
verbose |
Should the function show progress messages? (TRUE / FALSE) |
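The simplified description of the AUC given for 'aucMaxRank' can be illustrated with a toy ranking. The sketch below is only an illustration of the idea and not the package's implementation: the gene names, the ranking and the helper toyAUC() are all hypothetical. It counts how many gene-set members have been recovered at each position up to aucMaxRank, and scales the resulting area by the maximum possible area.

```r
# Hypothetical ranking of 10 genes for a single motif (1 = best-scoring gene).
ranking <- c(geneA = 1, geneB = 2, geneC = 3, geneD = 4, geneE = 5,
             geneF = 6, geneG = 7, geneH = 8, geneI = 9, geneJ = 10)

# Hypothetical gene-set (signature) with three members.
geneSet <- c("geneA", "geneC", "geneI")

# toyAUC() is an illustrative helper, not an RcisTarget function.
toyAUC <- function(ranking, geneSet, aucMaxRank) {
  # Ranks of the gene-set members that fall within the top aucMaxRank positions
  hits <- ranking[names(ranking) %in% geneSet]
  hits <- hits[hits <= aucMaxRank]
  # Recovery curve: number of gene-set members found at or before each position
  recovered <- vapply(seq_len(aucMaxRank),
                      function(pos) sum(hits <= pos),
                      numeric(1))
  # Area under the step curve, scaled by the largest area that could be reached
  sum(recovered) / (aucMaxRank * length(geneSet))
}

aucTop5  <- toyAUC(ranking, geneSet, aucMaxRank = 5)   # geneI (rank 9) is ignored
aucTop10 <- toyAUC(ranking, geneSet, aucMaxRank = 10)  # geneI now contributes
```

With a small aucMaxRank only genes near the top of the ranking contribute, which is why the threshold changes the resulting value.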
An aucScores object
of gene-sets (columns) by motifs (rows),
with the AUC value of each pair as content.
Next step in the workflow: addMotifAnnotation.
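The description of 'rankings' notes that the NES of a motif depends on the total number of motifs in the database. A minimal sketch of why, assuming the NES is the AUC standardized against the AUC distribution of all motifs for the gene-set, (AUC - mean) / sd: that formula is an assumption here, as this page does not state it, and the AUC values and the helper toyNES() are hypothetical.

```r
# Hypothetical AUC values of one gene-set across five motifs.
auc <- c(motif1 = 0.020, motif2 = 0.030, motif3 = 0.025,
         motif4 = 0.100, motif5 = 0.028)

# toyNES() is an illustrative helper, not an RcisTarget function.
# Assumption: the score is the AUC standardized across all motifs.
toyNES <- function(auc) (auc - mean(auc)) / sd(auc)

# Score of motif4 when all five motifs are in the database ...
nesFull <- toyNES(auc)

# ... and when only a subset of the motifs is available.
nesSubset <- toyNES(auc[c("motif3", "motif4", "motif5")])

c(full = nesFull[["motif4"]], subset = nesSubset[["motif4"]])
```

The same AUC value yields a different score once the set of motifs changes, which is consistent with the recommendation to use the full databases.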
See the package vignette for examples and more details:
vignette("RcisTarget")
# RcisTarget workflow for advanced users:
# Running the workflow steps individually
## Not run:
##################################################
#### Load your gene sets
# As an example, the package includes a hypoxia gene set:
txtFile <- file.path(system.file('examples', package='RcisTarget'),
                     "hypoxiaGeneSet.txt")
geneLists <- list(hypoxia=read.table(txtFile, stringsAsFactors=FALSE)[,1])
#### Load databases
## Motif rankings: Select according to organism and distance around TSS
## (See the vignette for URLs to download)
motifRankings <- importRankings("hg19-500bp-upstream-7species.mc9nr.feather")
## Motif - TF annotation:
data(motifAnnotations_hgnc_v9) # human TFs (for motif collection 9)
motifAnnotation <- motifAnnotations_hgnc_v9
##################################################
#### Run RcisTarget
# Step 1. Calculate AUC
motifs_AUC <- calcAUC(geneLists, motifRankings)
# Step 2. Select significant motifs, add TF annotation & format as table
motifEnrichmentTable <- addMotifAnnotation(motifs_AUC,
                                           motifAnnot=motifAnnotation)
# Step 3 (optional). Identify genes that have the motif significantly enriched
# (i.e. genes from the gene set in the top of the ranking)
motifEnrichmentTable_wGenes <- addSignificantGenes(motifEnrichmentTable,
                                                   geneSets=geneLists,
                                                   rankings=motifRankings,
                                                   method="aprox")
## End(Not run)