convert_type: Convert universalmotif type.
In bjmt/universalmotif: Import, Modify, and Export Motifs with R

convert_type

R Documentation

Convert universalmotif type.

Description

Switch between position count matrix (PCM), position probability matrix (PPM), position weight matrix (PWM), and information count matrix (ICM) types. See the "Introduction to sequence motifs" vignette for details. Please also note that type conversion occurs implicitly throughout the universalmotif package, so there is generally no need to perform this manual conversion. Also please be aware that the message concerning pseudocount-adjusting motifs can be disabled via options(pseudocount.warning=FALSE).

Usage

convert_type(motifs, type, pseudocount, nsize_correction = FALSE,
  relative_entropy = FALSE)

Arguments

`motifs`	See `convert_motifs()` for acceptable formats.
`type`	`character(1)` One of `c('PCM', 'PPM', 'PWM', 'ICM')`.
`pseudocount`	`numeric(1)` Correction to be applied to prevent `-Inf` from appearing in PWM matrices. If missing, the pseudocount stored in the universalmotif 'pseudocount' slot will be used.
`nsize_correction`	`logical(1)` If true, the ICM at each position will be corrected to account for small sample sizes. Only used if `relative_entropy = FALSE`.
`relative_entropy`	`logical(1)` If true, the ICM will be calculated as relative entropy. See details.

Details

PCM

Position count matrix (PCM), also known as position frequency matrix (PFM). For n sequences from which the motif was built, each position is represented by the numbers of each letter at that position. In theory all positions should have sums equal to n, but not all databases are this consistent. If converting from another type to PCM, column sums will be equal to the 'nsites' slot. If empty, 100 is used.

PPM

Position probability matrix (PPM), also known as position frequency matrix (PFM). At each position, the probability of individual letters is calculated by dividing the count for that letter by the total sum of counts at that position (letter_count / position_total). As a result, each position will sum to 1. Letters with counts of 0 will thus have a probability of 0, which can be undesirable when searching for motifs in a set of sequences. To avoid this a pseudocount can be added ((letter_count + pseudocount) / (position_total + pseudocount)).

PWM

Position weight matrix (PWM; Stormo et al. (1982)), also known as position-specific weight matrix (PSWM), position-specific scoring matrix (PSSM), or log-odds matrix. At each position, each letter is represented by it's log-likelihood (log2(letter_probability / background_probility)), which is normalized using the background letter frequencies. A PWM matrix is constructed from a PPM. If any position has 0-probability letters to which pseudocounts were not added, then the final log-likelihood of these letters will be -Inf.

ICM

Information content matrix (ICM; Schneider and Stephens 1990). An ICM is a PPM where each letter probability is multiplied by the total information content at that position. The information content of each position is determined as: totalIC - Hi, where the total information totalIC

totalIC <- log2(alphabet_length), and the Shannon entropy (Shannon 1948) for a specific position (Hi)

⁠Hi <- -sum(sapply(alphabet_frequencies, function(x) x * log(2))⁠.

As a result, the total sum or height of each position is representative of it's sequence conservation, measured in the unit 'bits', which is a unit of energy (Schneider 1991; see https://fr-s-schneider.ncifcrf.gov/logorecommendations.html for more information). However not all programs will calculate information content the same. Some will 'correct' the total information content at each position using a correction factor as described by Schneider et al. (1986). This correction can applied by setting nsize_correction = TRUE, however it will only be applied if the 'nsites' slot is not empty. This is done using TFBSTools:::schneider_correction (Tan and Lenhard 2016). As such, converting from an ICM to which some form of correction has been applied will result in a PCM/PPM/PWM with slight inaccuracies.

Another method of calculating information content is calculating the relative entropy, also known as Kullback-Leibler divergence (Kullback and Leibler 1951). This accounts for background frequencies, which can be useful for genomes with a heavy imbalance in letter frequencies. For each position, the individual letter frequencies are calculated as letter_freq * log2(letter_freq / bkg_freq). When calculating information content using Shannon entropy, the maximum content for each position will always be log2(alphabet_length). This does not hold for information content calculated as relative entropy. Please note that conversion from ICM assumes the information content was not calculated as relative entropy.

Value

See convert_motifs() for possible output motif objects.

Author(s)

Benjamin Jean-Marie Tremblay, benjamin.tremblay@uwaterloo.ca

References

Kullback S, Leibler RA (1951). “On information and sufficiency.” The Annals of Mathematical Statistics, 22, 79-86.

Nishida K, Frith MC, Nakai K (2009). “Pseudocounts for transcription factor binding sites.” Nucleic Acids Research, 37, 939-944.

Schneider TD, Stormo GD, Gold L, Ehrenfeucht A (1986). “Information content of binding sites on nucleotide sequences.” Journal of Molecular Biology, 188, 415-431.

Schneider TD, Stephens RM (1990). “Sequence Logos: A New Way to Display Consensus Sequences.” Nucleic Acids Research, 18, 6097-6100.

Schneider TD (1991). “Theory of Molecular Machines. II. Energy Dissipation from Molecular Machines.” Journal of Theoretical Biology, 148, 125-137.

Shannon CE (1948). “A Mathematical Theory of Communication.” Bell System Technical Journal, 27, 379-423.

Stormo GD, Schneider TD, Gold L, Ehrenfeucht A (1982). “Use of the Perceptron algorithm to distinguish translational initiation sites in E. coli.” Nucleic Acids Research, 10, 2997-3011.

Tan G, Lenhard B (2016). “TFBSTools: an R/Bioconductor package for transcription factor binding site analysis.” Bioinformatics, 32, 1555-1556. doi: 10.1093/bioinformatics/btw024.

Examples

jaspar.pcm <- read_jaspar(system.file("extdata", "jaspar.txt",
                                      package = "universalmotif"))

## The motifs pseudocounts are 1: these will be used in the PCM->PPM
## calculation
jaspar.pwm <- convert_type(jaspar.pcm, type = "PPM")

## Setting pseudocount to 0 will prevent any correction from being
## applied to PPM/PWM matrices, overriding the motifs own pseudocounts
jaspar.pwm <- convert_type(jaspar.pcm, type = "PWM", pseudocount = 0)

bjmt/universalmotif documentation built on June 11, 2025, 2:34 a.m.

bjmt/universalmotif index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

bjmt/universalmotif
Import, Modify, and Export Motifs with R

convert_type: Convert universalmotif type.
In bjmt/universalmotif: Import, Modify, and Export Motifs with R

Convert universalmotif type.

Description

Usage

Arguments

Details

PCM

PPM

PWM

ICM

Value

Author(s)

References

See Also

Examples

Related to convert_type in bjmt/universalmotif...

R Package Documentation

Browse R Packages

We want your feedback!

bjmt/universalmotif Import, Modify, and Export Motifs with R

convert_type: Convert universalmotif type. In bjmt/universalmotif: Import, Modify, and Export Motifs with R

Convert universalmotif type.

Description

Usage

Arguments

Details

PCM

PPM

PWM

ICM

Value

Author(s)

References

See Also

Examples

Related to convert_type in bjmt/universalmotif...

R Package Documentation

Browse R Packages

We want your feedback!

bjmt/universalmotif
Import, Modify, and Export Motifs with R

convert_type: Convert universalmotif type.
In bjmt/universalmotif: Import, Modify, and Export Motifs with R