PWM creating, matching, and related utilities
Position Weight Matrix (PWM) creating, matching, and related utilities for DNA data. (PWM for amino acid sequences are not supported.)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
PWM(x, type = c("log2probratio", "prob"), prior.params = c(A=0.25, C=0.25, G=0.25, T=0.25)) matchPWM(pwm, subject, min.score="80%", with.score=FALSE, ...) countPWM(pwm, subject, min.score="80%", ...) PWMscoreStartingAt(pwm, subject, starting.at=1) ## Utility functions for basic manipulation of the Position Weight Matrix maxWeights(x) minWeights(x) maxScore(x) minScore(x) unitScale(x) ## S4 method for signature 'matrix' reverseComplement(x, ...)
The type of Position Weight Matrix, either "log2probratio" or "prob". See Details section for more information.
A positive numeric vector, which represents the parameters of the Dirichlet conjugate prior, with names A, C, G, and T. See Details section for more information.
A Position Weight Matrix represented as a numeric matrix with row names A, C, G and T.
Typically a DNAString object. A Views object on a DNAString subject, a MaskedDNAString object, or a single character string, are also supported.
IUPAC ambiguity letters in
The minimum score for counting a match.
Can be given as a character string containing a percentage (e.g.
An integer vector specifying the starting positions of the Position Weight Matrix relatively to the subject.
Additional arguments for methods.
PWM function uses a multinomial model with a Dirichlet conjugate
prior to calculate the estimated probability of base b at position i. As
mentioned in the Arguments section,
prior.params supplies the
parameters for the DNA bases A, C, G, and T in the Dirichlet prior. These
values result in a position independent initial estimate of the probabilities
for the bases to be
priorProbs = prior.params/sum(prior.params) and the
posterior (data infused) estimate for the probabilities for the bases in each
of the positions to be
postProbs = (consensusMatrix(x) + prior.params)/(length(x) + sum(prior.params)).
type = "log2probratio", the PWM =
type = "prob", the PWM =
A numeric matrix representing the Position Weight Matrix for
A numeric vector containing the Position Weight Matrix-based scores
An XStringViews object for
A single integer for
A vector containing the max weight for each position in
A vector containing the min weight for each position in
The highest possible score for a given Position Weight Matrix for
The lowest possible score for a given Position Weight Matrix for
The modified numeric matrix given by
(x - minScore(x)/ncol(x))/(maxScore(x) - minScore(x)) for
A PWM obtained by reverting the column order in PWM
x and by
reassigning each row to its complementary nucleotide
H. Pagès and P. Aboyoun
Wasserman, WW, Sandelin, A., (2004) Applied bioinformatics for the identification of regulatory elements, Nat Rev Genet., 5(4):276-87.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
## Data setup: data(HNF4alpha) library(BSgenome.Dmelanogaster.UCSC.dm3) chr3R <- Dmelanogaster$chr3R chr3R ## Create a PWM from a PFM or directly from a rectangular ## DNAStringSet object: pfm <- consensusMatrix(HNF4alpha) pwm <- PWM(pfm) # same as 'PWM(HNF4alpha)' ## Perform some general routines on the PWM: round(pwm, 2) maxWeights(pwm) maxScore(pwm) reverseComplement(pwm) ## Score the first 5 positions: PWMscoreStartingAt(pwm, chr3R, starting.at=1:5) ## Match the plus strand: hits <- matchPWM(pwm, chr3R) nhit <- countPWM(pwm, chr3R) # same as 'length(hits)' ## Use 'with.score=TRUE' to get the scores of the hits: hits <- matchPWM(pwm, chr3R, with.score=TRUE) head(mcols(hits)$score) min(mcols(hits)$score / maxScore(pwm)) # should be >= 0.8 ## The scores can also easily be post-calculated: scores <- PWMscoreStartingAt(pwm, subject(hits), start(hits)) ## Match the minus strand: matchPWM(reverseComplement(pwm), chr3R)
Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.