WeightMatrix-class
Class for storing weight matrices that VariantFiltering uses to score potential cryptic splice sites.
## S4 method for signature 'WeightMatrix'
width(x)
## S4 method for signature 'WeightMatrix'
conservedPositions(x)
## S4 method for signature 'WeightMatrix'
wmName(x)
## S4 method for signature 'WeightMatrix'
wmFilename(x)
## S4 method for signature 'WeightMatrix'
wmLocations(x)
## S4 method for signature 'WeightMatrix'
wmStrictLocations(x)
## S4 method for signature 'WeightMatrix'
show(object)
## S4 method for signature 'WeightMatrix,DNAStringSet'
wmScore(object, dnaseqs)
## S4 method for signature 'WeightMatrix,character'
wmScore(object, dnaseqs)
## S4 method for signature 'character,DNAStringSet'
wmScore(object, dnaseqs, locations, strictLocations)
## S4 method for signature 'character,character'
wmScore(object, dnaseqs, locations, strictLocations)
x | A WeightMatrix object.
object | A WeightMatrix object or, for wmScore(), a character string with the file name of a weight matrix.
dnaseqs | Either a vector of character strings or a DNAStringSet object with the DNA sequences to score.
locations | Character vector of the annotated variant locations for which the weight matrix will be used to score binding sites. The possible values can be obtained by typing
strictLocations | Logical vector flagging whether the weight matrix should score binding sites strictly within the boundaries of the given locations.
The WeightMatrix class and its associated methods enable the VariantFiltering package to score synonymous and intronic genetic variants for potential cryptic splice sites. The class and the methods are nonetheless exposed to the end user, since they may be useful for other analyses.
The VariantFiltering package contains two weight matrices, one for 5' splice sites (5'ss) and another for 3' splice sites (3'ss). They were built with a statistical method that accounts for dependencies between splice site positions, which minimizes the rate of false positive predictions. Concretely, the method builds these models by inclusion-driven learning of Bayesian networks (idlBNs); further details can be found in the paper of Castelo and Guigo (2004).
The function readWm() reads a weight matrix stored in a text file in a particular format and returns a WeightMatrix object. See the .ibn files in the extdata folder of the VariantFiltering package for examples of this format, which is specifically designed to store weights that may depend on the nucleotides occurring at other positions of the matrix.
Besides this specific weight matrix format, readWm() can also read the MEME motif format specified at http://meme-suite.org/doc/meme-format.html. For this format, the function currently reads only one matrix per file. When the values are probabilities (specified under the motif letter-probability matrix line), they are automatically converted to weights using either the background frequencies given in the background letter frequencies line or, if that line is absent, a uniform distribution.
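The probability-to-weight conversion can be sketched in base R as follows. The helper prob_to_weights() is hypothetical and assumes log-odds weights of the form log2(p / background); the exact transformation used by readWm() (logarithm base, handling of zero probabilities or pseudocounts) is not stated here and may differ.

```r
# Hypothetical helper: convert a letter-probability matrix (rows = motif
# positions, columns = A, C, G, T) into log-odds weights.
# ASSUMPTION: weights are log2(p / background); readWm() may differ.
prob_to_weights <- function(probs, background = NULL) {
  alphabet <- colnames(probs)
  if (is.null(background)) {
    # No background frequencies available: fall back to a uniform distribution
    background <- setNames(rep(1 / length(alphabet), length(alphabet)), alphabet)
  }
  # Divide each column (letter) by its background frequency;
  # 'background' must be a vector named by letter
  ratios <- sweep(probs, 2, background[alphabet], "/")
  log2(ratios)
}

probs <- matrix(c(0.25, 0.25, 0.25, 0.25,
                  0.00, 0.00, 1.00, 0.00,
                  0.00, 0.00, 0.00, 1.00,
                  0.50, 0.10, 0.30, 0.10),
                nrow = 4, byrow = TRUE,
                dimnames = list(NULL, c("A", "C", "G", "T")))

w_uniform <- prob_to_weights(probs)
w_bg <- prob_to_weights(probs, background = c(A = 0.3, C = 0.2, G = 0.2, T = 0.3))
```

Under this assumed scheme a probability of zero yields a weight of -Inf, so a position where only one nucleotide has non-zero probability (rows 2 and 3 here) behaves as a fully conserved position.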
The method wmScore() scores one or more nucleotide sequences using the input WeightMatrix object. When the input object is the file name of a weight matrix, readWm() is called first to read that weight matrix and internally create a WeightMatrix object. This way of calling wmScore() is required when running it in parallel, since WeightMatrix objects are currently not serializable.
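A minimal sketch of parallel scoring that passes the file name, rather than a WeightMatrix object, to wmScore(), using a PSOCK cluster from the base parallel package. The file name and the locations and strictLocations values are taken from the example at the end of this page; splitting the sequences into two chunks is an arbitrary choice for illustration, and the scoring only runs when VariantFiltering is installed.

```r
library(parallel)

wmfile <- file.path(system.file("extdata", package = "VariantFiltering"),
                    "hsap.donors.hcmc10_15_1.ibn")
seqs <- c("CAGGTAGGA", "CAGGAAGGA", "CAGGTCCTG", "CAGGTCGTGGAG")

# Split the sequences into chunks so the matrix file is read once per chunk
# rather than once per sequence (arbitrary chunking for illustration)
chunks <- split(seqs, rep_len(1:2, length(seqs)))

scores <- list()
if (requireNamespace("VariantFiltering", quietly = TRUE)) {
  cl <- makeCluster(2L)  # PSOCK workers: arguments are serialized to them
  scores <- tryCatch(
    parLapply(cl, chunks, function(s, f) {
      # Only a character string (the file name) is sent to the worker,
      # which builds its own WeightMatrix object from the file
      VariantFiltering::wmScore(f, s, locations = "fiveSpliceSite",
                                strictLocations = TRUE)
    }, f = wmfile),
    finally = stopCluster(cl))
}
```

Because each worker receives only a character string, nothing non-serializable crosses process boundaries.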
If the sequences are longer than the width of the weight matrix, the method scores every possible site within them. It returns a list in which each element is a vector with the scores calculated for the corresponding DNA sequence. When a score cannot be calculated because a conserved position does not occur in the sequence (e.g., the absence of a GT dinucleotide with the 5'ss weight matrix), NA is returned as the corresponding score.
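The sliding-window behaviour and the NA rule can be illustrated with a simple independent-position matrix. In the following sketch, the helper score_all_sites(), the toy 4-position matrix and its conserved G-T at positions 2 and 3 are hypothetical and much simpler than the dependency-aware matrices used by wmScore(); only the output shape (a list with one score vector per sequence) and the NA convention follow the description above.

```r
# Hypothetical toy matrix: rows = nucleotides, columns = positions 1..4
wm_toy <- matrix(c( 0.2, 0, 0,  1.0,   # A
                    0.1, 0, 0, -1.0,   # C
                    0.5, 0, 0,  0.3,   # G
                   -0.3, 0, 0, -0.5),  # T
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("A", "C", "G", "T"), NULL))
# Conserved positions: names = position in the site, values = required nucleotide
conserved_toy <- c("2" = "G", "3" = "T")

# Score every window of width ncol(wm) in each sequence; windows missing a
# conserved nucleotide get NA. Returns one numeric vector per sequence.
score_all_sites <- function(wm, conserved, seqs) {
  w <- ncol(wm)
  lapply(seqs, function(sq) {
    s <- strsplit(toupper(sq), "")[[1]]
    n <- length(s) - w + 1L
    if (n < 1L) return(numeric(0))  # sequence shorter than the matrix
    vapply(seq_len(n), function(start) {
      site <- s[start:(start + w - 1L)]
      if (any(site[as.integer(names(conserved))] != conserved)) return(NA_real_)
      # Sum the weight of the observed nucleotide at each position
      sum(wm[cbind(match(site, rownames(wm)), seq_len(w))])
    }, numeric(1))
  })
}

res <- score_all_sites(wm_toy, conserved_toy, c("CAGGTAGG", "CAGGAAGG", "GT"))
# res[[1]]: NA NA 1.5 NA NA (only window 3, "GGTA", has G-T at positions 2-3)
# res[[2]]: all NA, since "CAGGAAGG" contains no GT dinucleotide
```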
The method width() takes a WeightMatrix object as input and returns the number of positions of the weight matrix.
The method conservedPositions() takes a WeightMatrix object as input and returns the number of fully conserved positions in the weight matrix.
R. Castelo
Castelo, R and Guigo, R. Splice site identification by idlBNs. Bioinformatics, 20(1):i69-i76, 2004.
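# Read the 5'ss (donor) weight matrix shipped with VariantFiltering
wm <- readWm(file.path(system.file("extdata", package="VariantFiltering"), "hsap.donors.hcmc10_15_1.ibn"),
             locations="fiveSpliceSite", strictLocations=TRUE)
# Print the object and query its accessors
wm
wmFilename(wm)
width(wm)
wmName(wm)
wmLocations(wm)
wmStrictLocations(wm)
conservedPositions(wm)
# Score candidate 5'ss; "CAGGAAGGA" contains no GT dinucleotide
wmScore(wm, "CAGGTAGGA")
wmScore(wm, "CAGGAAGGA")
wmScore(wm, "CAGGTCCTG")
# A longer sequence: every possible site within it is scored
wmScore(wm, "CAGGTCGTGGAG")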