PWM creating, matching, and related utilities
Description
Position Weight Matrix (PWM) creating, matching, and related utilities for DNA data. (PWM for amino acid sequences are not supported.)
Usage
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15  PWM(x, type = c("log2probratio", "prob"),
prior.params = c(A=0.25, C=0.25, G=0.25, T=0.25))
matchPWM(pwm, subject, min.score="80%", with.score=FALSE, ...)
countPWM(pwm, subject, min.score="80%", ...)
PWMscoreStartingAt(pwm, subject, starting.at=1)
## Utility functions for basic manipulation of the Position Weight Matrix
maxWeights(x)
minWeights(x)
maxScore(x)
minScore(x)
unitScale(x)
## S4 method for signature 'matrix'
reverseComplement(x, ...)

Arguments
x 
For For 
type 
The type of Position Weight Matrix, either "log2probratio" or "prob". See Details section for more information. 
prior.params 
A positive numeric vector, which represents the parameters of the Dirichlet conjugate prior, with names A, C, G, and T. See Details section for more information. 
pwm 
A Position Weight Matrix represented as a numeric matrix with row names A, C, G and T. 
subject 
Typically a DNAString object. A Views object on a DNAString subject, a MaskedDNAString object, or a single character string, are also supported. IUPAC ambiguity letters in 
min.score 
The minimum score for counting a match.
Can be given as a character string containing a percentage (e.g.

with.score 

starting.at 
An integer vector specifying the starting positions of the Position Weight Matrix relatively to the subject. 
... 
Additional arguments for methods. 
Details
The PWM
function uses a multinomial model with a Dirichlet conjugate
prior to calculate the estimated probability of base b at position i. As
mentioned in the Arguments section, prior.params
supplies the
parameters for the DNA bases A, C, G, and T in the Dirichlet prior. These
values result in a position independent initial estimate of the probabilities
for the bases to be
priorProbs = prior.params/sum(prior.params)
and the
posterior (data infused) estimate for the probabilities for the bases in each
of the positions to be
postProbs = (consensusMatrix(x) + prior.params)/(length(x) + sum(prior.params))
.
When type = "log2probratio"
, the PWM = unitScale(log2(postProbs/priorProbs))
.
When type = "prob"
, the PWM = unitScale(postProbs)
.
Value
A numeric matrix representing the Position Weight Matrix for PWM
.
A numeric vector containing the Position Weight Matrixbased scores
for PWMscoreStartingAt
.
An XStringViews object for matchPWM
.
A single integer for countPWM
.
A vector containing the max weight for each position in pwm
for maxWeights
.
A vector containing the min weight for each position in pwm
for minWeights
.
The highest possible score for a given Position Weight Matrix for
maxScore
.
The lowest possible score for a given Position Weight Matrix for
minScore
.
The modified numeric matrix given by
(x  minScore(x)/ncol(x))/(maxScore(x)  minScore(x))
for
unitScale
.
A PWM obtained by reverting the column order in PWM x
and by
reassigning each row to its complementary nucleotide
for reverseComplement
.
Author(s)
H. Pagès and P. Aboyoun
References
Wasserman, WW, Sandelin, A., (2004) Applied bioinformatics for the identification of regulatory elements, Nat Rev Genet., 5(4):27687.
See Also
consensusMatrix
,
matchPattern
,
reverseComplement
,
DNAStringclass,
XStringViewsclass
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34  ## Data setup:
data(HNF4alpha)
library(BSgenome.Dmelanogaster.UCSC.dm3)
chr3R < Dmelanogaster$chr3R
chr3R
## Create a PWM from a PFM or directly from a rectangular
## DNAStringSet object:
pfm < consensusMatrix(HNF4alpha)
pwm < PWM(pfm) # same as 'PWM(HNF4alpha)'
## Perform some general routines on the PWM:
round(pwm, 2)
maxWeights(pwm)
maxScore(pwm)
reverseComplement(pwm)
## Score the first 5 positions:
PWMscoreStartingAt(pwm, chr3R, starting.at=1:5)
## Match the plus strand:
hits < matchPWM(pwm, chr3R)
nhit < countPWM(pwm, chr3R) # same as 'length(hits)'
## Use 'with.score=TRUE' to get the scores of the hits:
hits < matchPWM(pwm, chr3R, with.score=TRUE)
head(mcols(hits)$score)
min(mcols(hits)$score / maxScore(pwm)) # should be >= 0.8
## The scores can also easily be postcalculated:
scores < PWMscoreStartingAt(pwm, subject(hits), start(hits))
## Match the minus strand:
matchPWM(reverseComplement(pwm), chr3R)
