Threshold possible binding sites by Score or FDR

Description

Threshold the possible binding sites based on score, or False Discovery Rate (FDR). To threshold on FDR, you must have computed an FDR/Score map using calc.fdr, and chosen an FDR threshold, for which makeFdrPlot() is helpful.

Usage

1
2
output.sites(seqsScores, scoreThreshold = NULL, fdrScoreMap = NULL,
  fdrThreshold = NULL)

Arguments

seqsScores

score.ms output representing scores for candidate binding sites

scoreThreshold

A numeric value giving the lower score boundary significance threshold. Sequences with scores higher than this boundary will be selected. (Not required if thresholding by FDR.)

fdrScoreMap

calc.fdr output giving mapping between score/FDR (only required if thresholding by FDR).

fdrThreshold

A numeric value between 0 and 1 giving upper FDR boundary- any site with a lower FDR score will be output. (only required if thresholding by FDR)

Value

Features object containing thresholded Transcription Factor Binding Sites, their locations, scores, strand, etc. If thresholding by score, this is equivalent to seqsScores[seqsScores$score > scoreThreshold,].

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
require("rtfbs")
exampleArchive <- system.file("extdata", "NRSF.zip", package="rtfbs")
seqFile <- "input.fas"
unzip(exampleArchive, seqFile)
# Read in FASTA file "input.fas" from the examples into an 
#   MS (multiple sequences) object
ms <- read.ms(seqFile);
pwmFile <- "pwm.meme"
unzip(exampleArchive, pwmFile)
# Read in Position Weight Matrix (PWM) from MEME file from
#  the examples into a Matrix object
pwm <- read.pwm(pwmFile)
# Build a 3rd order Markov Model to represent the sequences
#   in the MS object "ms".  The Model will be a list of
#   matrices  corrisponding in size to the order of the 
#   Markov Model
mm <- build.mm(ms, 3);
# Match the PWM against the sequences provided to find
#   possible transcription factor binding sites.  A 
#   Features object is returned, containing the location
#   of each possible binding site and an associated score.
#   Sites with a negative score are not returned unless 
#   we set threshold=-Inf as a parameter.
cs <- score.ms(ms, pwm, mm)
# Generate a sequence 1000 bases long using the supplied
#   Markov Model and random numbers
v <- simulate.ms(mm, 10000)
# Match the PWM against the sequences provided to find
#   possible transcription factor binding sites.  A 
#   Features object is returned, containing the location
#   of each possible binding site and an associated score.
#   Sites with a negative score are not returned unless 
#   we set threshold=-Inf as a parameter. Any identified
#   binding sites from simulated data are false positives
#   and used to calculate False Discovery Rate
xs <- score.ms(v, pwm, mm)
# Calculate the False Discovery Rate for each possible
#   binding site in the Features object CS.  Return
#   a mapping between each binding site score and the
#   associated FDR.
fdr <- calc.fdr(ms, cs, v, xs)
# Output identified transcription factor binding sites 
#   below a 0.5 FDR threshold
output.sites(cs, NULL, fdr, 0.5)
# OR
# Output identified transcription factor binding sites
#   above a 5.2 score threshold 
output.sites(cs, 5.2)