Calculate FDR

Description

Calculate the False Discovery Rate (FDR) of possible binding sites. This function uses two sets of scores, realSeqsScores and simSeqsScores. realSeqsScores are the scores for the sequences being scanned for binding sites. simSeqsScores are the scores for the simulated sequences. The simulated sequences must be generated from, and simSeqsScores computed with, the same Markov Model that was used to compute realSeqsScores.
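The idea behind comparing real and simulated scores can be illustrated with a minimal base-R sketch. This is not the rtfbs implementation: `empirical_fdr` and all of its arguments are hypothetical names, and the length normalisation is an assumption about how a simulation-based FDR is commonly estimated. At a given score threshold, the simulated hits (all false positives by construction) are rescaled to the size of the real data and divided by the number of real hits.

```r
# Hypothetical helper, NOT part of rtfbs: empirical FDR at one score threshold.
# real_len and sim_len are the total number of bases scanned in each data set
# (an assumed normalisation, so that the two data sets are comparable).
empirical_fdr <- function(real_scores, sim_scores, real_len, sim_len, threshold) {
  n_real <- sum(real_scores >= threshold)   # sites called in the real data
  n_sim  <- sum(sim_scores >= threshold)    # sites called in simulated data
  if (n_real == 0) return(NA_real_)         # FDR is undefined with no calls
  expected_false <- n_sim * (real_len / sim_len)
  min(1, expected_false / n_real)           # cap the estimate at 1
}

# Made-up scores for illustration only
real_scores <- c(2.1, 3.4, 5.0, 7.2, 8.8)
sim_scores  <- c(1.0, 2.5, 3.0, 6.0)
fdr_at_3 <- empirical_fdr(real_scores, sim_scores,
                          real_len = 1000, sim_len = 2000, threshold = 3)
print(fdr_at_3)
```

Simulating more sequence than was scanned (a larger `sim_len`) makes the false-positive estimate less noisy, which is presumably why the example below simulates a long sequence.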

Usage

calc.fdr(realSeqs, realSeqsScores, simSeqs, simSeqsScores, interval = 0.01)

Arguments

realSeqs

MS object containing non-simulated sequences

realSeqsScores

Feat object obtained from scoring realSeqs

simSeqs

MS object containing simulated sequences

simSeqsScores

Feat object obtained from scoring simSeqs

interval

Float specifying the distance between the score steps at which the FDR will be calculated. Smaller values give a finer-grained score-to-FDR mapping. If NULL, the FDR is calculated for each unique score.
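The difference between a fixed interval and interval = NULL can be pictured with a small base-R sketch. How calc.fdr actually places its steps is not documented here, so the grid construction below (rounding the score range outward and stepping by `interval`) is an assumption made purely for illustration, and the scores are made up.

```r
# Made-up scores for illustration only
scores <- c(1.23, 1.27, 2.5, 4.01)

# Assumed behaviour with a numeric interval: evenly spaced thresholds
interval <- 0.5
grid_thresholds <- seq(floor(min(scores)), ceiling(max(scores)), by = interval)

# Behaviour described for interval = NULL: one threshold per unique score
unique_thresholds <- sort(unique(scores))

print(grid_thresholds)
print(unique_thresholds)
```

A small interval on a wide score range produces many thresholds, while the unique-score option scales with the number of distinct scores instead.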

Value

Data frame with two columns, 'score' and 'FDR', mapping each score to a single FDR. If the data frame contains any rows, they are sorted by score.
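One common use of such a mapping is to pick the lowest score cutoff that keeps the FDR under a chosen level. The sketch below uses a hand-made data frame in place of real calc.fdr output; the column names follow the description above, but it is worth checking `names()` on the actual result, and the 0.1 level is an arbitrary choice.

```r
# Stand-in for the data frame returned by calc.fdr (values are made up)
fdr_map <- data.frame(score = c(2, 4, 6, 8),
                      FDR   = c(0.60, 0.25, 0.08, 0.01))

# Lowest score whose FDR is at or below the chosen level
max_fdr <- 0.1
passing <- fdr_map$score[fdr_map$FDR <= max_fdr]
cutoff  <- if (length(passing) > 0) min(passing) else NA_real_

print(cutoff)
```

If no score meets the level, `cutoff` is NA, which signals that the chosen FDR level cannot be reached with the available scores.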

Note

realSeqsScores and simSeqsScores are both objects returned by score.ms. The same arguments (threshold, conservative, strand) should be used in both calls to score.ms; otherwise the FDR will not be valid.

If calc.fdr returns an FDR of zero for all scores, you can probably increase the number of significant results by re-running score.ms with a lower threshold for both the simulated and the real sequences.

See Also

score.ms

Examples

library(rtfbs)
exampleArchive <- system.file("extdata", "NRSF.zip", package="rtfbs")
seqFile <- "input.fas"
unzip(exampleArchive, seqFile)
# Read in FASTA file "input.fas" from the examples into an 
#   MS (multiple sequences) object
ms <- read.ms(seqFile)
pwmFile <- "pwm.meme"
unzip(exampleArchive, pwmFile)
# Read in Position Weight Matrix (PWM) from MEME file from
#  the examples into a Matrix object
pwm <- read.pwm(pwmFile)
# Build a 3rd order Markov Model to represent the sequences
#   in the MS object "ms".  The Model will be a list of
#   matrices corresponding in size to the order of the
#   Markov Model
mm <- build.mm(ms, 3)
# Match the PWM against the sequences provided to find
#   possible transcription factor binding sites.  A
#   Features object is returned, containing the location
#   of each possible binding site and an associated score.
#   Sites scoring below the threshold are not returned;
#   threshold=-2 is used here, and threshold=-Inf would
#   return every site.
cs <- score.ms(ms, pwm, mm, threshold=-2)
# Generate a sequence 100000 bases long using the supplied
#   Markov Model and random numbers
v <- simulate.ms(mm, 100000)
# Match the PWM against the simulated sequence, using the
#   same Markov Model and threshold as for the real
#   sequences.  A Features object is returned, containing
#   the location of each possible binding site and an
#   associated score.  Any binding sites identified in
#   simulated data are false positives and are used to
#   calculate the False Discovery Rate.
xs <- score.ms(v, pwm, mm, threshold=-2)
# Calculate the False Discovery Rate for the possible
#   binding sites in the Features object "cs".  Return
#   a mapping between binding site scores and the
#   associated FDR.
fdr <- calc.fdr(ms, cs, v, xs)
# Print the Data.Frame containing the FDR/Score mapping
fdr