WeightMatrix-class | R Documentation |

Class for storing weight matrices that VariantFiltering uses to score potential cryptic splice sites.

```
## S4 method for signature 'WeightMatrix'
width(x)
## S4 method for signature 'WeightMatrix'
conservedPositions(x)
## S4 method for signature 'WeightMatrix'
wmName(x)
## S4 method for signature 'WeightMatrix'
wmFilename(x)
## S4 method for signature 'WeightMatrix'
wmLocations(x)
## S4 method for signature 'WeightMatrix'
wmStrictLocations(x)
## S4 method for signature 'WeightMatrix'
show(object)
## S4 method for signature 'WeightMatrix,DNAStringSet'
wmScore(object, dnaseqs)
## S4 method for signature 'WeightMatrix,character'
wmScore(object, dnaseqs)
## S4 method for signature 'character,DNAStringSet'
wmScore(object, dnaseqs, locations, strictLocations)
## S4 method for signature 'character,character'
wmScore(object, dnaseqs, locations, strictLocations)
```

`x` |
A |

`object` |
A |

`dnaseqs` |
Either a vector of character strings a |

`locations` |
Character vector of the annotated locations to variants under which the weight matrix will be used for scoring binding sites.
The possible values can be obtained by typing |

`strictLocations` |
Logical vector flagging whether the weight matrix should be scoring binding sites strictly within the boundaries of the given locations. |

The `WeightMatrix` class and associated methods serve the purpose of enabling the `VariantFiltering` package
to score synonymous and intronic genetic variants for potential cryptic splice sites. The class and the methods,
however, are exposed to the end user since they could be useful for other analysis purposes.

The `VariantFiltering` package contains two weight matrices, one for 5'ss and another for 3'ss, which have been built
by a statistical method that accounts for dependencies between the splice site positions, minimizing the rate of
false positive predictions. Concretely, the method builds these models by inclusion-driven learning of Bayesian
networks; further details can be found in the paper of Castelo and Guigo (2004).

The function `readWm()` reads a weight matrix stored in a text file in a particular format and returns
a `WeightMatrix` object. See the `.ibn` files located in the `extdata` folder of the `VariantFiltering`
package for an example of this format, which is specifically designed to store weights that may
depend on the occurrence of nucleotides at other positions in the matrix.

In addition to this specific weight matrix format, the function `readWm()` can also read the MEME motif format specified at
http://meme-suite.org/doc/meme-format.html. Under this format, the function currently reads only one matrix per file.
When the values correspond to probabilities (specified under the motif `letter-probability matrix` line), they are
automatically converted to weights, using the background frequencies specified in the `background letter frequencies`
line when it is present and a uniform distribution otherwise.
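The conversion from probabilities to weights can be pictured with a small base-R sketch. This is not the code used by `readWm()`: the log2-odds formula, the pseudocount, the toy matrix `probs` and the helper name `prob2weights()` are all assumptions made for illustration, since this page does not state the exact formula the package applies.

```r
# Hypothetical letter-probability matrix: one row per motif position,
# columns in the order A, C, G, T.
probs <- matrix(c(0.70, 0.10, 0.10, 0.10,
                  0.05, 0.05, 0.85, 0.05,
                  0.25, 0.25, 0.25, 0.25),
                ncol = 4, byrow = TRUE,
                dimnames = list(NULL, c("A", "C", "G", "T")))

# Illustrative helper (not part of VariantFiltering): turn probabilities
# into log2-odds weights against a background distribution, which
# defaults to a uniform one.
prob2weights <- function(probs, bg = rep(1 / ncol(probs), ncol(probs)),
                         pseudocount = 1e-6) {
  # Assumption: a small pseudocount avoids log2(0) for zero probabilities.
  p <- (probs + pseudocount) / rowSums(probs + pseudocount)
  # Divide every column by its background frequency, then take log2.
  log2(sweep(p, 2, bg, "/"))
}

wts <- prob2weights(probs)
wts
```

Under this scheme, a position whose probabilities match the background gets weights close to zero, while enriched nucleotides get positive weights and depleted ones negative weights.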

The method `wmScore()` scores one or more sequences of nucleotides using the input `WeightMatrix` object.
When the input object is the file name of a weight matrix, the function `readWm()` is first called to read that
weight matrix and internally create a `WeightMatrix` object. Calling `wmScore()` in this way is required when
using it in parallel, since `WeightMatrix` objects are currently not serializable.
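A possible way to follow this advice is sketched below, under several assumptions: the base `parallel` package and `mclapply()` are one choice among many, the chunking into groups of two sequences and the number of cores are arbitrary, and the file name and `locations` value are borrowed from the example at the end of this page. The call is guarded so that it only runs when `VariantFiltering` is installed.

```r
library(parallel)

# Weight matrix file shipped with VariantFiltering; system.file()
# returns "" when the package is not installed.
wmfile <- system.file("extdata", "hsap.donors.hcmc10_15_1.ibn",
                      package = "VariantFiltering")
seqs <- c("CAGGTAGGA", "CAGGAAGGA", "CAGGTCCTG", "CAGGTCGTGGAG")

# Split the sequences into contiguous chunks of two (arbitrary choice).
chunks <- split(seqs, ceiling(seq_along(seqs) / 2))

# mclapply() relies on forking, which is unavailable on Windows.
ncores <- if (.Platform$OS.type == "windows") 1L else 2L

if (nzchar(wmfile)) {
  # Pass the file name, not a WeightMatrix object, to every worker.
  scores <- mclapply(chunks, function(s)
    VariantFiltering::wmScore(wmfile, s, locations = "fiveSpliceSite",
                              strictLocations = TRUE),
    mc.cores = ncores)
  # Flatten the per-chunk lists into one list with one element per sequence.
  scores <- unlist(unname(scores), recursive = FALSE)
}
```

Each worker reads the weight matrix from disk itself, so no `WeightMatrix` object has to be sent between processes.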

If the sequences are longer than the width of the weight matrix, this function will score every possible site
within those sequences. It returns a list where each element is a vector with the calculated scores of the corresponding
DNA sequence. When a score cannot be calculated
because a conserved position does not occur in the sequence (e.g., absence of a GT dinucleotide with the
5'ss weight matrix), it returns `NA` as the corresponding score value.
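The behaviour described above (one score per possible site, `NA` when a conserved position is missing) can be illustrated with a simplified base-R sketch. It is not the package's implementation: the toy matrix `wts`, the conserved positions and the helper `scoreSites()` are hypothetical, and the weights here are independent per position, whereas the matrices shipped with `VariantFiltering` model dependencies between positions.

```r
# Toy weight matrix of width 4 (made-up values); rows are positions.
wts <- matrix(c(0.5, -0.2, 0.3, -0.6,
                0.0,  0.0, 0.0,  0.0,
                0.0,  0.0, 0.0,  0.0,
                0.8, -0.4, 0.1, -0.5),
              ncol = 4, byrow = TRUE,
              dimnames = list(NULL, c("A", "C", "G", "T")))
consPos <- c(2L, 3L)    # assumed fully conserved positions
consNt  <- c("G", "T")  # nucleotides required at those positions

# Illustrative helper: score every site of one sequence.
scoreSites <- function(dnaseq, wts, consPos, consNt) {
  w <- nrow(wts)
  nts <- strsplit(dnaseq, "")[[1]]
  nsites <- length(nts) - w + 1L
  if (nsites < 1L) return(numeric(0))
  vapply(seq_len(nsites), function(start) {
    site <- nts[start:(start + w - 1L)]
    # A site lacking a conserved nucleotide cannot be scored.
    if (any(site[consPos] != consNt)) return(NA_real_)
    # Sum the weight of the observed nucleotide at each position.
    sum(wts[cbind(seq_len(w), match(site, colnames(wts)))])
  }, numeric(1))
}

# One list element per input sequence, as wmScore() is described to return.
scores <- lapply(c("AGTA", "AGAA", "CAGTCGTG"), scoreSites,
                 wts = wts, consPos = consPos, consNt = consNt)
scores
```

The last sequence is longer than the matrix, so it yields one value per sliding window, with `NA` for the windows that lack the required GT at positions 2 and 3.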

The method `width()` takes a `WeightMatrix` object as input and returns the number of positions of the
weight matrix.

The method `conservedPositions()` takes a `WeightMatrix` object as input and returns the number of
fully conserved positions in the weight matrix.


R. Castelo

Castelo, R and Guigo, R. Splice site identification by idlBNs. Bioinformatics, 20(1):i69-i76, 2004.

```
library(VariantFiltering)

## read the 5'ss weight matrix shipped with the package
wm <- readWm(file.path(system.file("extdata", package="VariantFiltering"), "hsap.donors.hcmc10_15_1.ibn"),
             locations="fiveSpliceSite", strictLocations=TRUE)
wm
wmFilename(wm)
width(wm)
wmName(wm)
wmLocations(wm)
wmStrictLocations(wm)
conservedPositions(wm)
wmScore(wm, "CAGGTAGGA")
wmScore(wm, "CAGGAAGGA")
wmScore(wm, "CAGGTCCTG")
wmScore(wm, "CAGGTCGTGGAG")
```
