addSignificantGenes: Add significant genes


Description

Identify which genes (of the gene-set) are highly ranked for each motif.

Usage

addSignificantGenes(
  resultsTable,
  geneSets,
  rankings,
  maxRank = 5000,
  plotCurve = FALSE,
  genesFormat = "geneList",
  method = "aprox",
  nMean = 50,
  nCores = 1
)

## S4 method for signature 'list'
addSignificantGenes(
  resultsTable,
  geneSets,
  rankings,
  maxRank = 5000,
  plotCurve = FALSE,
  genesFormat = "geneList",
  method = "aprox",
  nMean = 50,
  nCores = 1
)

## S4 method for signature 'character'
addSignificantGenes(
  resultsTable,
  geneSets,
  rankings,
  maxRank = 5000,
  plotCurve = FALSE,
  genesFormat = "geneList",
  method = "aprox",
  nMean = 50,
  nCores = 1
)

## S4 method for signature 'GeneSet'
addSignificantGenes(
  resultsTable,
  geneSets,
  rankings,
  maxRank = 5000,
  plotCurve = FALSE,
  genesFormat = "geneList",
  method = "aprox",
  nMean = 50,
  nCores = 1
)

## S4 method for signature 'GeneSetCollection'
addSignificantGenes(
  resultsTable,
  geneSets,
  rankings,
  maxRank = 5000,
  plotCurve = FALSE,
  genesFormat = "geneList",
  method = "aprox",
  nMean = 50,
  nCores = 1
)

getSignificantGenes(
  geneSet,
  rankings,
  signifRankingNames = NULL,
  method = "iCisTarget",
  maxRank = 5000,
  plotCurve = FALSE,
  genesFormat = c("geneList", "incidMatrix"),
  nCores = 1,
  digits = 3,
  nMean = 50
)

## S4 method for signature 'list'
getSignificantGenes(
  geneSet,
  rankings,
  signifRankingNames = NULL,
  method = "iCisTarget",
  maxRank = 5000,
  plotCurve = FALSE,
  genesFormat = c("geneList", "incidMatrix"),
  nCores = 1,
  digits = 3,
  nMean = 50
)

## S4 method for signature 'character'
getSignificantGenes(
  geneSet,
  rankings,
  signifRankingNames = NULL,
  method = "iCisTarget",
  maxRank = 5000,
  plotCurve = FALSE,
  genesFormat = c("geneList", "incidMatrix"),
  nCores = 1,
  digits = 3,
  nMean = 50
)

## S4 method for signature 'factor'
getSignificantGenes(
  geneSet,
  rankings,
  signifRankingNames = NULL,
  method = "iCisTarget",
  maxRank = 5000,
  plotCurve = FALSE,
  genesFormat = c("geneList", "incidMatrix"),
  nCores = 1,
  digits = 3,
  nMean = 50
)

## S4 method for signature 'GeneSet'
getSignificantGenes(
  geneSet,
  rankings,
  signifRankingNames = NULL,
  method = "iCisTarget",
  maxRank = 5000,
  plotCurve = FALSE,
  genesFormat = c("geneList", "incidMatrix"),
  nCores = 1,
  digits = 3,
  nMean = 50
)

## S4 method for signature 'GeneSetCollection'
getSignificantGenes(
  geneSet,
  rankings,
  signifRankingNames = NULL,
  method = "iCisTarget",
  maxRank = 5000,
  plotCurve = FALSE,
  genesFormat = c("geneList", "incidMatrix"),
  nCores = 1,
  digits = 3,
  nMean = 50
)

Arguments

resultsTable

[addSignificantGenes] Output table from addMotifAnnotation

geneSets

[addSignificantGenes] List of gene sets that were analyzed.

rankings

Motif rankings used to analyze the gene list (they should be the same rankings used for calcAUC in this analysis).

maxRank

Maximum rank to take into account for the recovery curve (Default: 5000).

plotCurve

Logical. Whether to plot the recovery curve (Default: FALSE).

genesFormat

"geneList" or "incidMatrix". Format to return the genes (Default: "geneList").

method

"iCisTarget" or "aprox". There are two methods to identify the highly ranked genes: (1) equivalent to the ones used in iRegulon and i-cisTarget (method="iCisTarget", recommended if running time is not an issue), and (2) a faster implementation based on an approximate distribution using the average at each rank (method="aprox", useful to scan multiple gene sets). (Default: "aprox")

nMean

Only used for the "aprox" method: Interval used to calculate the running mean and SD (Default: 50, approximately nGenesInRanking/400).

nCores

Number of cores to use for parallelization (Default: 1).

geneSet

[getSignificantGenes] Gene-set to analyze (Only one).

signifRankingNames

[getSignificantGenes] Name(s) of the motif ranking(s) to analyze.

digits

[getSignificantGenes] Number of digits to include in the output.
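
To make the rankings and maxRank arguments more concrete, the following minimal sketch builds a recovery curve for a single motif in base R. It is not RcisTarget's implementation: recoveryCurve and the toy ranks are hypothetical, and the sketch assumes the curve simply counts how many gene-set genes have been seen up to each rank position.

```r
# Hypothetical helper (not part of RcisTarget): cumulative number of
# gene-set genes recovered at each rank position, up to maxRank.
recoveryCurve <- function(geneRanks, maxRank = 5000) {
  keptRanks <- geneRanks[geneRanks <= maxRank]   # ignore genes ranked beyond maxRank
  cumsum(tabulate(keptRanks, nbins = maxRank))   # hits per rank, then accumulate
}

# Made-up ranks of five gene-set genes in one motif ranking
toyRanks <- c(geneA = 3, geneB = 10, geneC = 12, geneD = 4000, geneE = 9000)
rcc <- recoveryCurve(toyRanks, maxRank = 5000)

# Genes recovered at a few rank positions; geneE (rank 9000) is never counted
rcc[c(3, 12, 5000)]
```

Lowering maxRank shortens the curve and drops any gene ranked beyond it, which is why the same value should be used consistently within one analysis.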

Details

The highly ranked genes are selected based on the distribution of the recovery curves of the gene set across all the motifs in the database. In the plot, the red line indicates the average of the recovery curves of all the motifs, the green line the average + standard deviation, and the blue line the recovery curve of the current motif. The rank at which the distance between the current motif and the green curve (mean + SD) is largest is selected as the rank of maximum enrichment. All the gene-set genes with a lower rank are considered enriched.
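
The selection rule described above can be illustrated with simulated data. This is a sketch under stated assumptions, not the package's code: the rankings are random toy data, recoveryCurve is a hypothetical helper, and genes ranked at or before the selected position are treated as enriched.

```r
set.seed(42)
nGenesInRanking <- 2000   # toy ranking size (assumption)
maxRank         <- 500
geneSetSize     <- 40

# Hypothetical helper: cumulative hits per rank, up to maxRank
recoveryCurve <- function(geneRanks, maxRank) {
  cumsum(tabulate(geneRanks[geneRanks <= maxRank], nbins = maxRank))
}

# Background: 300 simulated motifs in which the gene set is ranked at random
backgroundCurves <- replicate(300,
  recoveryCurve(sample(nGenesInRanking, geneSetSize), maxRank))

# Current motif: half of the gene set is placed within the top 100 positions
currentRanks <- c(sample(100, 20), sample(101:nGenesInRanking, 20))
names(currentRanks) <- paste0("gene", seq_along(currentRanks))
currentCurve <- recoveryCurve(currentRanks, maxRank)

# Mean (red line) and mean + SD (green line) at each rank
meanCurve  <- rowMeans(backgroundCurves)
meanPlusSd <- meanCurve + apply(backgroundCurves, 1, sd)

# Rank with the maximum distance between the current motif and mean + SD
rankAtMax <- which.max(currentCurve - meanPlusSd)

# Gene-set genes ranked at or before that position are taken as enriched
enrichedGenes <- names(currentRanks)[currentRanks <= rankAtMax]
```

Plotting currentCurve, meanCurve and meanPlusSd against the rank reproduces the three lines mentioned in the text, with rankAtMax marking the cut-off.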

Depending on whether the method is "iCisTarget" or "aprox", the mean and SD at each rank are calculated slightly differently. The "iCisTarget" method calculates the recovery curves for all the motifs, and then calculates the average and SD at each rank. Due to the implementation of the function in R, this is slower than the "aprox" method, which only subsets the ranks of the gene-set genes for each motif and calculates the average of the available values at each position with a sliding window. Since there are over 18k motifs, the chances of getting several measurements at each rank are very high, and the results closely resemble those calculated by iCisTarget, though they are often not exactly the same (hence the name: "aprox" method).
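
The difference between the two strategies can be sketched with simulated ranks. The code below is only an illustration of the idea, assuming that the approximate route keeps the observed (rank, genes recovered) points of every motif and averages those that fall within a window of nMean ranks around each position. The actual computation in RcisTarget may differ, and all object names here are hypothetical.

```r
set.seed(7)
nGenesInRanking <- 2000   # toy ranking size (assumption)
maxRank         <- 500
nMean           <- 50
geneSetSize     <- 40
nMotifs         <- 300

# Simulated, sorted ranks of the gene-set genes (one column per motif)
ranksPerMotif <- replicate(nMotifs, sort(sample(nGenesInRanking, geneSetSize)))

# Exact route: full recovery curve for every motif, then the mean at each rank
fullCurves <- apply(ranksPerMotif, 2, function(r)
  cumsum(tabulate(r[r <= maxRank], nbins = maxRank)))
exactMean <- rowMeans(fullCurves)

# Approximate route: keep only the observed (rank, genes recovered) points ...
observed <- data.frame(
  rank      = as.vector(ranksPerMotif),
  recovered = rep(seq_len(geneSetSize), times = nMotifs)  # k-th gene => k recovered
)
observed <- observed[observed$rank <= maxRank, ]

# ... and average the points available in a sliding window around each rank
halfWindow <- nMean / 2
approxMean <- vapply(seq_len(maxRank), function(pos) {
  inWindow <- abs(observed$rank - pos) <= halfWindow
  mean(observed$recovered[inWindow])
}, numeric(1))

# Compare both estimates at a few positions
cbind(exactMean, approxMean)[c(50, 250, 500), ]
```

In this toy setup the windowed estimate follows the same trend as the exact mean without matching it exactly. The SD could be estimated in the same way by replacing mean with sd inside the window.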

Value

Output from addMotifAnnotation with additional columns describing the highly ranked genes (the Examples below access rankAtMax and enrichedGenes).

If plotCurve=TRUE, the recovery curve is plotted.
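
As a rough illustration of how the two genesFormat options relate, the sketch below converts a "geneList"-style text field into an incidence matrix using base R. The toy table, motif names and gene names are made up, and the ";" separator is assumed from the strsplit call in the Examples.

```r
# Toy table mimicking the "geneList" output (all values are made up)
toyTable <- data.frame(
  motif = c("motifA", "motifB"),
  enrichedGenes = c("GENE1;GENE2;GENE3", "GENE2;GENE4"),
  stringsAsFactors = FALSE
)

# Split each text field into a character vector of genes
genesPerMotif <- strsplit(toyTable$enrichedGenes, ";")
names(genesPerMotif) <- toyTable$motif

# Motif x gene incidence matrix: 1 = gene enriched for the motif, 0 = not
allGenes <- sort(unique(unlist(genesPerMotif)))
incidMat <- t(vapply(genesPerMotif,
                     function(genes) as.integer(allGenes %in% genes),
                     integer(length(allGenes))))
colnames(incidMat) <- allGenes
incidMat
```

The list form is convenient for reading the genes of a single motif, while the matrix form is easier to feed into clustering or network functions.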

See Also

Previous step in the workflow: addMotifAnnotation.

See the package vignette for examples and more details: vignette("RcisTarget")

Examples

##################################################
# Setup & previous steps in the workflow:

#### Gene sets
# As an example, the package includes a hypoxia gene set:
txtFile <- file.path(system.file('examples', package='RcisTarget'),
                     "hypoxiaGeneSet.txt")
geneLists <- list(hypoxia=read.table(txtFile, stringsAsFactors=FALSE)[,1])

#### Databases
## Motif rankings: Select according to organism and distance around TSS
## (See the vignette for URLs to download)
# motifRankings <- importRankings("hg19-500bp-upstream-7species.mc9nr.feather")

## For this example we will use a SUBSET of the ranking/motif databases:
library(RcisTarget.hg19.motifDBs.cisbpOnly.500bp)
data(hg19_500bpUpstream_motifRanking_cispbOnly)
motifRankings <- hg19_500bpUpstream_motifRanking_cispbOnly

## Motif - TF annotation:
data(motifAnnotations_hgnc) # human TFs (for motif collection 9)
motifAnnotation <- motifAnnotations_hgnc

### Run RcisTarget
# Step 1. Calculate AUC
motifs_AUC <- calcAUC(geneLists, motifRankings)
# Step 2. Select significant motifs, add TF annotation & format as table
motifEnrichmentTable <- addMotifAnnotation(motifs_AUC,
           motifAnnot=motifAnnotation)

##################################################

##################################################
# (This step: Step 3)
# Identify the genes that have the motif significantly enriched
# (i.e. genes from the gene set at the top of the ranking)
par(mfrow=c(1,2))
motifEnrichmentTable_wGenes <- addSignificantGenes(motifEnrichmentTable,
                                       genesFormat="geneList",
                                       plotCurve=TRUE,
                                       geneSets=geneLists,
                                       rankings=motifRankings,
                                       method="aprox")

#### Exploring the output:
# The object returned is a data.table
# Feel free to convert it to a data.frame:
motifEnrichmentTable_wGenes <- as.data.frame(motifEnrichmentTable_wGenes)

# Enriched genes
enrGenes <- motifEnrichmentTable_wGenes[1,"enrichedGenes"]
enrGenes
strsplit(enrGenes, ";")

# As incidence matrix
motifEnr_wIncidMat <- addSignificantGenes(motifEnrichmentTable,
                geneSets=geneLists, rankings=motifRankings,
                method="aprox",
                genesFormat = "incidMatrix")

motifEnr_wIncidMat <- as.data.frame(motifEnr_wIncidMat)
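# Locate the "rankAtMax" column (assumption: the gene columns start right after it)
rankAtMaxCol <- which(colnames(motifEnr_wIncidMat) == "rankAtMax")

incidMat <- motifEnr_wIncidMat[,(rankAtMaxCol+1):ncol(motifEnr_wIncidMat)]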
rownames(incidMat) <- motifEnr_wIncidMat[,"motif"]
incidMat <- incidMat[, colSums(incidMat)>0, drop=FALSE]

# Plot as network
par(mfrow=c(1,1))
library(igraph)
plot(graph.incidence(incidMat))

###############################################################
# Alternative method: getSignificantGenes()
selectedMotif <- rownames(incidMat)
onlyGenes <- getSignificantGenes(geneSet=geneLists$hypoxia,
                            signifRankingNames=selectedMotif,
                            genesFormat="incidMatrix",
                            plotCurve=TRUE,
                            rankings=motifRankings,
                            method="aprox")

aertslab/RcisTarget documentation built on April 21, 2021, 9:46 a.m.