calcAUC: Calculate AUC


Description

Calculates the Area Under the Curve (AUC) of each gene-set for each motif ranking. This measure is used in the following steps to identify the DNA motifs that are significantly over-represented in the gene-set.

Usage

calcAUC(
  geneSets,
  rankings,
  nCores = 1,
  aucMaxRank = 0.03 * getNumColsInDB(rankings),
  verbose = TRUE
)

## S4 method for signature 'list'
calcAUC(
  geneSets,
  rankings,
  nCores = 1,
  aucMaxRank = 0.03 * getNumColsInDB(rankings),
  verbose = TRUE
)

## S4 method for signature 'character'
calcAUC(
  geneSets,
  rankings,
  nCores = 1,
  aucMaxRank = 0.03 * getNumColsInDB(rankings),
  verbose = TRUE
)

## S4 method for signature 'GeneSet'
calcAUC(
  geneSets,
  rankings,
  nCores = 1,
  aucMaxRank = 0.03 * getNumColsInDB(rankings),
  verbose = TRUE
)

## S4 method for signature 'GeneSetCollection'
calcAUC(
  geneSets,
  rankings,
  nCores = 1,
  aucMaxRank = 0.03 * getNumColsInDB(rankings),
  verbose = TRUE
)

Arguments

geneSets

List of gene-sets to analyze. The gene-sets should be provided as GeneSet, GeneSetCollection or character list (see examples).
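The accepted input formats can be sketched as follows. This is only an illustration: the gene symbols and the set name "other" are placeholders, not the contents of any file shipped with the package, and the GSEABase part runs only if that package is installed.

```r
# A single gene set: a character vector of gene symbols (placeholders)
hypoxiaGenes <- c("VEGFA", "CA9", "ADM")

# Several gene sets: a named list of character vectors
geneLists <- list(hypoxia = hypoxiaGenes,
                  other   = c("TP53", "MDM2"))

# Optional: the same genes as GSEABase objects
if (requireNamespace("GSEABase", quietly = TRUE)) {
  geneSet    <- GSEABase::GeneSet(hypoxiaGenes, setName = "hypoxia")
  collection <- GSEABase::GeneSetCollection(geneSet)
}
```

Any of these objects can then be passed as the geneSets argument. Naming the list elements makes the gene sets easier to identify in the output.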

rankings

'Motif rankings' database for the required organism and search space (e.g. 10 kbp around the TSS, or 500 bp upstream of the TSS). These objects are provided in separate files, which can be imported with importRankings().

See vignette("RcisTarget") for an exhaustive list of databases.

Since the normalized enrichment score (NES) of a motif depends on the total number of motifs in the database, we highly recommend using the full version of the databases (20k motifs). A smaller version of the human databases, containing only the 4.6k motifs from cisbp, is available in Bioconductor:

  • RcisTarget.hg19.motifDBs.cisbpOnly.500bp (Human)
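To see why the NES depends on the database size, consider this toy sketch. It assumes the NES of a motif is its AUC standardized against the AUC distribution of all motifs for the gene set, (x - mean) / sd. The AUC values and motif names are made up for illustration.

```r
# Made-up AUC values of four motifs for one gene set
auc <- c(motifA = 0.020, motifB = 0.030, motifC = 0.025, motifD = 0.100)

# Toy NES: standardize each AUC against all motifs in the "database"
toyNES <- function(auc) (auc - mean(auc)) / sd(auc)

nesFull   <- toyNES(auc)       # all four motifs
nesSubset <- toyNES(auc[-1])   # same motifs, but motifA is missing

# The score of motifD changes although its AUC did not
nesFull["motifD"]
nesSubset["motifD"]
```

Because the mean and standard deviation are taken over all motifs, the same motif gets a different score in a reduced database. NES values from databases of different sizes are therefore not directly comparable.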

nCores

Number of cores to use for computation. Note: In general, using more cores (i.e. parallel processes) decreases the overall running time. However, the gain also depends on the available memory and the overall system load, so setting nCores too high may decrease performance.
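One possible way to choose nCores is sketched below. Leaving one core free is only a rule of thumb assumed here, not a recommendation from the package. The calcAUC() call is commented out because it needs a rankings database.

```r
# detectCores() can return NA on some platforms, so fall back to 1
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Leave one core free for the rest of the system; never go below 1
nCores <- max(1L, available - 1L)

# Not run (geneLists and motifRankings as in the Examples section):
# motifs_AUC <- calcAUC(geneLists, motifRankings, nCores = nCores)
```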

aucMaxRank

Threshold to calculate the AUC. In a simplified way, the AUC value represents the fraction of genes, within the top X genes in the ranking, that are included in the signature. The parameter 'aucMaxRank' sets the number of genes (maximum rank) used in this computation. By default it is set to 3% of the total number of genes in the rankings (0.03 * getNumColsInDB(rankings)). Common values range from 1% to 10%. See vignette("RcisTarget") for examples and more details.
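The role of the threshold can be illustrated with a simplified sketch. toyAUC() is a hypothetical helper written for this illustration, not the package's internal code. It computes the area under the recovery curve of one gene set in one motif ranking, truncated at aucMaxRank, and all ranks and sizes are made up.

```r
# Hypothetical helper: truncated recovery-curve AUC for one ranking
toyAUC <- function(geneSetRanks, aucMaxRank) {
  # Keep only gene-set members ranked within the top aucMaxRank positions
  x <- sort(geneSetRanks[geneSetRanks < aucMaxRank])
  # After the i-th recovered gene the curve has height i; each step
  # lasts until the next recovered gene (or until the cutoff)
  area <- sum(diff(c(x, aucMaxRank)) * seq_along(x))
  # Normalize by the largest possible area (all genes at the very top)
  area / (aucMaxRank * length(geneSetRanks))
}

nGenes       <- 1000                      # genes in the toy ranking
geneSetRanks <- c(2, 10, 25, 400, 900)    # ranks of the gene-set members

# Same fraction as the default shown in Usage
aucDefault <- toyAUC(geneSetRanks, aucMaxRank = 0.03 * nGenes)
# A wider window (10%) for comparison
aucWide    <- toyAUC(geneSetRanks, aucMaxRank = 0.10 * nGenes)
```

Genes ranked beyond the threshold (here 400 and 900) do not add to the area, although they still count towards the normalization. Changing aucMaxRank changes how much of the ranking is taken into account.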

verbose

Should the function show progress messages? (TRUE / FALSE)

Value

aucScores of gene-sets (columns) by motifs (rows) with the value of AUC for each pair as content.

See Also

Next step in the workflow: addMotifAnnotation.

See the package vignette for examples and more details: vignette("RcisTarget")

Examples

# RcisTarget workflow for advanced users:
# Running the workflow steps individually

## Not run: 
  
##################################################
#### Load your gene sets
# As an example, the package includes a hypoxia gene set:
txtFile <- file.path(system.file('examples', package='RcisTarget'),
                     "hypoxiaGeneSet.txt")
geneLists <- list(hypoxia=read.table(txtFile, stringsAsFactors=FALSE)[,1])
  
#### Load databases
## Motif rankings: Select according to organism and distance around TSS
## (See the vignette for URLs to download)
motifRankings <- importRankings("hg19-500bp-upstream-7species.mc9nr.feather")

## Motif - TF annotation:
data(motifAnnotations_hgnc) # human TFs (for motif collection 9)
motifAnnotation <- motifAnnotations_hgnc
##################################################

#### Run RcisTarget

# Step 1. Calculate AUC
motifs_AUC <- calcAUC(geneLists, motifRankings)

# Step 2. Select significant motifs, add TF annotation & format as table
motifEnrichmentTable <- addMotifAnnotation(motifs_AUC,
                         motifAnnot=motifAnnotation)

# Step 3 (optional). Identify genes that have the motif significantly enriched
# (i.e. genes from the gene set in the top of the ranking)
motifEnrichmentTable_wGenes <- addSignificantGenes(motifEnrichmentTable,
                                                   geneSets=geneLists,
                                                   rankings=motifRankings,
                                                   method="aprox")


## End(Not run)

RcisTarget documentation built on Nov. 8, 2020, 6:57 p.m.