WeightMatrix-class: Weight matrix class
In rcastelo/VariantFiltering: Filtering of coding and non-coding genetic variants

Description

Class for storing weight matrices that VariantFiltering uses to score potential cryptic splice sites.

Usage

## S4 method for signature 'WeightMatrix'
width(x)
## S4 method for signature 'WeightMatrix'
conservedPositions(x)
## S4 method for signature 'WeightMatrix'
wmName(x)
## S4 method for signature 'WeightMatrix'
wmFilename(x)
## S4 method for signature 'WeightMatrix'
wmLocations(x)
## S4 method for signature 'WeightMatrix'
wmStrictLocations(x)
## S4 method for signature 'WeightMatrix'
show(object)
## S4 method for signature 'WeightMatrix,DNAStringSet'
wmScore(object, dnaseqs)
## S4 method for signature 'WeightMatrix,character'
wmScore(object, dnaseqs)
## S4 method for signature 'character,DNAStringSet'
wmScore(object, dnaseqs, locations, strictLocations)
## S4 method for signature 'character,character'
wmScore(object, dnaseqs, locations, strictLocations)

Arguments

`x`	A `WeightMatrix` object.
`object`	A `WeightMatrix` object or the file name of a weight matrix.
`dnaseqs`	Either a vector of character strings or a `DNAStringSet` object, both of which store nucleotide sequences to be scored using the input `WeightMatrix` object.
`locations`	Character vector of the annotated variant locations for which the weight matrix will be used to score binding sites. The possible values can be obtained by typing `variantLocations()`.
`strictLocations`	Logical vector flagging whether the weight matrix should score binding sites strictly within the boundaries of the given locations.

Details

The WeightMatrix class and associated methods serve the purpose of enabling the VariantFiltering package to score synonymous and intronic genetic variants for potential cryptic splice sites. The class and the methods, however, are exposed to the end user since they could be useful for other analysis purposes.

The VariantFiltering package contains two weight matrices, one for 5'ss and another for 3'ss, which have been built by a statistical method that accounts for dependencies between the splice site positions, minimizing the rate of false positive predictions. Specifically, the method builds these models by inclusion-driven learning of Bayesian networks; further details can be found in Castelo and Guigo (2004).

The function readWm() reads a weight matrix stored in a text file in a particular format and returns a WeightMatrix object. See the .ibn files located in the extdata folder of the VariantFiltering package for examples of this format, which is specifically designed to store weights that may depend on the occurrence of nucleotides at other positions of the matrix.

In addition to this specific weight matrix format, the function readWm() can also read the MEME motif format specified at http://meme-suite.org/doc/meme-format.html. Under this format, the function currently reads only one matrix per file. When the values correspond to probabilities (specified under the motif letter-probability matrix line), they are automatically converted to weights using the background frequencies given in the background letter frequencies line or, if that line is absent, a uniform distribution.
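
The conversion from probabilities to weights can be illustrated with a small base R sketch. It assumes the usual log-odds definition, weight = log2(probability / background frequency); the exact formula used internally by readWm() is not stated here, and the function name `prob_to_weights()` and the toy matrix are hypothetical.

```r
# Toy letter-probability matrix: one row per motif position, columns A, C, G, T.
# The values are invented for illustration only.
probs <- matrix(c(0.70, 0.10, 0.10, 0.10,
                  0.05, 0.05, 0.85, 0.05,
                  0.25, 0.25, 0.25, 0.25),
                ncol = 4, byrow = TRUE,
                dimnames = list(NULL, c("A", "C", "G", "T")))

# Hypothetical helper (not part of VariantFiltering).
# Assumption: weights are log2 ratios of probability to background frequency.
prob_to_weights <- function(probs, background = NULL) {
  if (is.null(background)) {
    # no background frequencies available: fall back to a uniform distribution
    background <- setNames(rep(1 / ncol(probs), ncol(probs)), colnames(probs))
  }
  stopifnot(!is.null(names(background)),
            all(colnames(probs) %in% names(background)))
  # divide every column by the background frequency of its nucleotide
  log2(sweep(probs, 2, background[colnames(probs)], "/"))
}

# weights under a uniform background
w_uniform <- prob_to_weights(probs)

# weights under an explicit, AT-rich background
w_atrich <- prob_to_weights(probs, c(A = 0.3, C = 0.2, G = 0.2, T = 0.3))
```

Under this definition a probability equal to its background frequency gives a weight of zero, and a probability of zero gives -Inf, so a real implementation would probably need some form of smoothing.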

The method wmScore() scores one or more nucleotide sequences using the input WeightMatrix object. When the input object is the file name of a weight matrix, the function readWm() is first called to read that weight matrix and internally create a WeightMatrix object. This way of calling wmScore() is required when using it in parallel, since WeightMatrix objects are currently not serializable.

If the sequences are longer than the width of the weight matrix, wmScore() scores every possible site within those sequences. It returns a list where each element is a vector with the calculated scores of the corresponding DNA sequence. When a score cannot be calculated because a conserved position does not occur in the site (e.g., absence of the GT dinucleotide with the 5'ss weight matrix), NA is returned as the corresponding score value.
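
The behaviour described above (one score per possible site, NA when a conserved position is missing) can be sketched in base R with a toy matrix. This is not the scoring code of VariantFiltering: the sketch assumes position-independent, additive weights, whereas the matrices shipped with the package may encode dependencies between positions. The names `toy_weights`, `score_sites()` and `score_all()` and all numeric values are hypothetical.

```r
# Toy weight matrix of width 4 whose positions 2 and 3 are fully conserved (G, T).
# Rows are positions and columns are nucleotides; all values are invented.
toy_weights <- rbind(c(A = 0.5, C = 0.2, G = -0.3, T = -0.4),  # position 1
                     c(A = 0.0, C = 0.0, G =  0.0, T =  0.0),  # position 2, conserved G
                     c(A = 0.0, C = 0.0, G =  0.0, T =  0.0),  # position 3, conserved T
                     c(A = 0.6, C = -0.5, G = 0.4, T = -0.5))  # position 4

# Score every window of one sequence; NA where the conserved nucleotides are absent.
score_sites <- function(seq, weights, conserved_pos, conserved_nt) {
  w <- nrow(weights)
  nts <- strsplit(seq, "")[[1]]
  n_sites <- length(nts) - w + 1
  if (n_sites < 1) return(numeric(0))  # sequence shorter than the matrix
  vapply(seq_len(n_sites), function(start) {
    site <- nts[start:(start + w - 1)]
    if (!all(site[conserved_pos] == conserved_nt)) return(NA_real_)
    # add up the weight of the observed nucleotide at each position
    sum(weights[cbind(seq_len(w), match(site, colnames(weights)))])
  }, numeric(1))
}

# Mirror the list-valued return: one vector of scores per input sequence.
score_all <- function(seqs, ...) lapply(seqs, score_sites, ...)

scores <- score_all(c("AGTA", "AGAA", "AGTAGTC"),
                    toy_weights, conserved_pos = c(2, 3), conserved_nt = c("G", "T"))
```

Here the third input has 7 nucleotides and the toy matrix has width 4, so its element of `scores` holds 7 - 4 + 1 = 4 values, with NA for the windows that lack GT at positions 2 and 3.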

The method width() takes a WeightMatrix object as input and returns the number of positions of the weight matrix.

The method conservedPositions() takes a WeightMatrix object as input and returns the number of fully conserved positions in the weight matrix.

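Value

readWm() returns a WeightMatrix object. wmScore() returns a list in which each element is a vector with the scores calculated for the corresponding DNA sequence. width() and conservedPositions() return, respectively, the number of positions and the number of fully conserved positions of the weight matrix.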

Author(s)

R. Castelo

References

Castelo, R. and Guigo, R. Splice site identification by idlBNs. Bioinformatics, 20(Suppl 1):i69-i76, 2004.

Examples

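## read the 5'ss weight matrix shipped in the extdata folder of the package
wm <- readWm(file.path(system.file("extdata", package="VariantFiltering"), "hsap.donors.hcmc10_15_1.ibn"),
             locations="fiveSpliceSite", strictLocations=TRUE)
wm

## accessor methods
wmFilename(wm)
width(wm)
wmName(wm)
wmLocations(wm)
wmStrictLocations(wm)
conservedPositions(wm)

## score candidate sites; the second sequence lacks the GT dinucleotide,
## so its score should be NA, as described in Details
wmScore(wm, "CAGGTAGGA")
wmScore(wm, "CAGGAAGGA")
wmScore(wm, "CAGGTCCTG")

## a longer sequence: every possible site within it is scored
wmScore(wm, "CAGGTCGTGGAG")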

rcastelo/VariantFiltering documentation built on July 5, 2025, 5:38 a.m.