motif_pvalue: Motif P-value and scoring utility
In bjmt/universalmotif: Import, Modify, and Export Motifs with R

motif_pvalue

R Documentation

Description

Calculate P-values from logodds scores, or logodds scores from P-values, for any number of motifs.

Usage

motif_pvalue(motifs, score, pvalue, bkg.probs, use.freq = 1, k = 8,
  nthreads = 1, rand.tries = 10, rng.seed = sample.int(10000, 1),
  allow.nonfinite = FALSE, method = c("dynamic", "exhaustive"))

Arguments

`motifs`	See `convert_motifs()` for acceptable motif formats.
`score`	`numeric`, `list` Get a P-value for a motif from a logodds score. See details for an explanation of how to vectorize the calculation for `method = "dynamic"`.
`pvalue`	`numeric`, `list` Get a logodds score for a motif from a P-value. See details for an explanation of how to vectorize the calculation for `method = "dynamic"`.
`bkg.probs`	`numeric`, `list` A vector of background probabilities. If supplying individual background probabilities for each motif, a list of such vectors. If missing, the background is retrieved from the motif `bkg` slot. Note that this option is only used when `method = "dynamic"`, or when `method = "exhaustive"` and providing a P-value and returning a score; for the inverse, the motifs are first converted to PWMs via `convert_type()`, which uses the motif `bkg` slot for background adjustment.
`use.freq`	`numeric(1)` By default uses the regular motif matrix; otherwise uses the corresponding `multifreq` matrix. Max is 3 when `method = "exhaustive"`.
`k`	`numeric(1)` For speed, scores/P-values can be approximated after subsetting the motif every `k` columns when `method = "exhaustive"`. If `k` is equal to or greater than the size of the input motif(s), then the calculations are exact. The default, 8, is recommended for those looking for a good tradeoff between speed and accuracy in jobs requiring repeated calculations. Note that this is ignored when `method = "dynamic"`, as subsetting is not required.
`nthreads`	`numeric(1)` Run `motif_pvalue()` in parallel with `nthreads` threads. `nthreads = 0` uses all available threads. Currently only applied when `method = "exhaustive"`.
`rand.tries`	`numeric(1)` When `k < ncol(motif)` and `method = "exhaustive"`, an approximation is used. This involves randomly approximating the overall motif score distribution. To increase accuracy, the distribution is approximated `rand.tries` times and the final scores averaged. Note that this is ignored when `method = "dynamic"`, as subsetting is not required.
`rng.seed`	`numeric(1)` To allow `motif_pvalue()` to perform C++ level parallelisation, it must work independently of R. This means it cannot communicate with R to get or set the R RNG state. To get around this, the RNG seed used by the C++ function can be set with `rng.seed`. To make sure each thread gets a different seed, however, the seed is multiplied by the iteration count. For example, when working with two motifs, the second motif gets the seed `rng.seed * 2`. The default is a random number chosen by `sample.int()`, which effectively makes `motif_pvalue()` dependent on the R RNG state. Note that this is ignored when `method = "dynamic"`, as the random subsetting is only used for `method = "exhaustive"`.
`allow.nonfinite`	`logical(1)` If `FALSE`, a pseudocount is applied if non-finite values are found in the PWM. Note that if the motif has a pseudocount greater than zero and is not currently of type PWM, then this parameter has no effect, as the pseudocount is applied automatically when the motif is converted to a PWM internally. Note that `allow.nonfinite = TRUE` is incompatible with `method = "dynamic"`. A message is printed if a pseudocount is applied. To disable this, set `options(pseudocount.warning=FALSE)`.
`method`	`character(1)` One of `c("dynamic", "exhaustive")`. Algorithm used for calculating P-values. The `"exhaustive"` method involves finding all possible motif matches at or above the specified score using a branch-and-bound algorithm, which can be computationally intensive (Hartmann et al., 2013). Additionally, the computation must be repeated for each hit. The `"dynamic"` method calculates the distribution of possible motif scores using a much faster dynamic programming algorithm, and the distribution can be reused for multiple scores (Grant et al., 2011). The only disadvantage is the inability to use `allow.nonfinite = TRUE`.

Details

Regarding vectorization

A note regarding vectorizing the calculation when method = "dynamic" (no vectorization is possible with method = "exhaustive"): to avoid performing the P-value/score calculation repeatedly for individual motifs, provide the score/pvalue arguments as a list, with each entry corresponding to the scores/P-values to be calculated for the respective motifs provided to motifs. If you simply provide a list of repeating motifs and a single numeric vector of corresponding input scores/P-values, then motif_pvalue() will not vectorize. See the Examples section.

The dynamic method

One of the algorithms available to motif_pvalue() to calculate scores or P-values is the dynamic programming algorithm used by FIMO (Grant et al., 2011). In this method, a range of possible scores spanning the minimum to the maximum possible score is created, and the cumulative probability of each score in this distribution is incrementally calculated using the logodds scores and the background probabilities. This distribution of scores and associated P-values can then be used to calculate P-values or scores for any input, any number of times. This method scales well with large motifs and with multifreq representations. The only downside is that it is incompatible with allow.nonfinite = TRUE, as non-finite values would not allow for the creation of the initial range of scores. Although described for a different purpose, the basic premise of the dynamic programming algorithm is also described in Gupta et al. (2007).
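The idea can be illustrated with a minimal base-R sketch. This is not the package's or FIMO's implementation: the helper names `dp_pvalues()` and `lookup_pvalue()`, the integer scaling factor of 1000, and the toy PWM are all assumptions made for illustration. Scores are rounded to integers, the distribution of total scores is built up one motif position at a time, and the resulting table is then reused for any number of lookups.

```r
# Illustrative sketch only; not the universalmotif or FIMO source code.
# pwm: logodds matrix (rows = letters, columns = positions), finite values only.
# bkg: background probabilities, one per row of pwm.
dp_pvalues <- function(pwm, bkg, scale = 1000) {
  ipwm <- round(pwm * scale)                 # discretize scores to integers
  col_min <- apply(ipwm, 2, min)
  shifted <- sweep(ipwm, 2, col_min)         # every column now starts at 0
  max_total <- sum(apply(shifted, 2, max))
  dist <- c(1, numeric(max_total))           # dist[i] = P(shifted total == i - 1)
  for (j in seq_len(ncol(shifted))) {
    new_dist <- numeric(max_total + 1)
    for (a in seq_len(nrow(shifted))) {
      s <- shifted[a, j]
      idx <- seq_len(max_total + 1 - s)
      # Adding letter a at position j moves probability mass up by s bins.
      new_dist[idx + s] <- new_dist[idx + s] + dist[idx] * bkg[a]
    }
    dist <- new_dist
  }
  list(
    scores  = (seq(0, max_total) + sum(col_min)) / scale,
    pvalues = rev(cumsum(rev(dist)))         # P(score >= scores[i])
  )
}

# Reuse the table: P-value of the first tabulated score at or above `score`.
lookup_pvalue <- function(tab, score) {
  i <- which(tab$scores >= score)[1]
  if (is.na(i)) 0 else tab$pvalues[i]
}

# Hypothetical 3-position logodds matrix, for illustration only.
pwm <- matrix(c( 1.0, -1.0, -1.0, -1.0,
                -1.0,  1.0, -1.0, -1.0,
                 0.5,  0.5, -1.0, -1.0), nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), NULL))
tab <- dp_pvalues(pwm, bkg = rep(0.25, 4))
# Only ACA and ACC reach the maximum score of 2.5, so this is 2/64.
lookup_pvalue(tab, 2.5)
```

The cost grows with the number of score bins times the motif length rather than with the number of possible sequences, which is why the table-based approach remains practical for long motifs.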

The exhaustive method

Calculating P-values exhaustively for motifs can be very computationally intensive. This is due to how P-values must be calculated: for a given score, all possible sequences which score equal to or higher than it must be found, and the probabilities of these sequences (based on background probabilities) summed. For a DNA motif of length 10, the number of possible unique sequences is 4^10 = 1,048,576. Finding all sequences scoring at or above a given score can be done efficiently with a branch-and-bound algorithm, but as the motif length increases even this calculation becomes impractical. To get around this, the P-value calculation can be approximated.
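The exact search described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's C++ implementation: the function name `bb_pvalue()` and the toy PWM are hypothetical. The recursion walks the motif position by position, abandons a prefix as soon as no completion can reach the threshold, and accepts a prefix wholesale as soon as every completion must reach it.

```r
# Illustrative sketch only; not the universalmotif source code.
# pwm: logodds matrix (rows = letters, columns = positions), finite values only.
# bkg: background probabilities, one per row of pwm.
bb_pvalue <- function(pwm, bkg, threshold) {
  # Best and worst achievable score from column j to the end (0 past the end).
  max_rest <- c(rev(cumsum(rev(apply(pwm, 2, max)))), 0)
  min_rest <- c(rev(cumsum(rev(apply(pwm, 2, min)))), 0)
  walk <- function(j, score, prob) {
    # Bound: even the best completion falls short, so drop this branch.
    if (score + max_rest[j] < threshold) return(0)
    # Even the worst completion qualifies, so count the whole prefix.
    if (score + min_rest[j] >= threshold) return(prob)
    total <- 0
    for (a in seq_len(nrow(pwm))) {
      total <- total + walk(j + 1, score + pwm[a, j], prob * bkg[a])
    }
    total
  }
  walk(1, 0, 1)
}

# Hypothetical 3-position logodds matrix, for illustration only.
pwm <- matrix(c( 1.0, -1.0, -1.0, -1.0,
                -1.0,  1.0, -1.0, -1.0,
                 0.5,  0.5, -1.0, -1.0), nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), NULL))
# Only ACA and ACC score 2.5 or higher, so this is 2/64.
bb_pvalue(pwm, bkg = rep(0.25, 4), threshold = 2.5)
```

Pruning helps most for high thresholds, where few prefixes survive; for low thresholds or long motifs the number of surviving branches still grows quickly, which is the motivation for the approximation.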

In order to calculate P-values for longer motifs, this function uses the approximation proposed by Hartmann et al. (2013), where the motif is subset, P-values are calculated for the subsets, and these are finally combined into a total P-value. The smaller the subsets, the faster the calculation, but also the coarser the approximation. The subset size is controlled by setting k. For smaller motifs (< 13 positions), exact P-values can be calculated in reasonable time by setting k = 12.

To calculate a score from a P-value, all possible scores are calculated and the score at the (1 - pvalue) * 100th percentile is returned. When k < ncol(motif), the complete set of scores is instead approximated by randomly adding up all possible scores from each subset. Note that this approximation can itself be quite expensive, and at times even slower than the exact version; for jobs requiring many repeated calculations, a bit of benchmarking beforehand can be useful to find the optimal settings.
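For a motif small enough to enumerate, the exact case can be sketched in base R. This is one reading of the description above rather than the package's code: the function name `score_from_pvalue()`, the toy PWM, and the choice to weight each sequence by its background probability and return the lowest score whose tail probability does not exceed the requested P-value are all assumptions of this sketch.

```r
# Illustrative sketch only; not the universalmotif source code.
# Enumerates every sequence, so it is only practical for short motifs.
score_from_pvalue <- function(pwm, bkg, pvalue) {
  cols <- seq_len(ncol(pwm))
  # One row per possible sequence: its total score and background probability.
  scores <- rowSums(expand.grid(lapply(cols, function(j) unname(pwm[, j]))))
  probs  <- apply(expand.grid(lapply(cols, function(j) bkg)), 1, prod)
  ord <- order(scores, decreasing = TRUE)
  scores <- scores[ord]
  probs  <- probs[ord]
  tail_prob <- cumsum(probs)                 # P(score >= scores[i]) at the end of each tie
  last_of_tie <- !duplicated(scores, fromLast = TRUE)
  # Small tolerance guards against floating-point error in the running sum.
  ok <- last_of_tie & tail_prob <= pvalue + 1e-12
  if (!any(ok)) return(NA_real_)             # no score is rare enough
  min(scores[ok])
}

# Hypothetical 3-position logodds matrix, for illustration only.
pwm <- matrix(c( 1.0, -1.0, -1.0, -1.0,
                -1.0,  1.0, -1.0, -1.0,
                 0.5,  0.5, -1.0, -1.0), nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), NULL))
bkg <- rep(0.25, 4)
score_from_pvalue(pwm, bkg, pvalue = 0.05)   # 2.5: only 2 of 64 sequences score this high
score_from_pvalue(pwm, bkg, pvalue = 0.10)   # 1: 4 of 64 sequences score 1 or higher
```

Because score distributions are discrete, the P-value actually achieved by the returned score can be smaller than the one requested, and a very small P-value may have no matching score at all (here signalled with `NA`).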

Please note that bugs are more likely to occur when using the exhaustive method, as its algorithm contains several times more code than the dynamic method. Unless you have a strong need to use allow.nonfinite = TRUE, avoid using this method.

Value

numeric, list A vector or list of vectors of scores/P-values.

Author(s)

Benjamin Jean-Marie Tremblay, benjamin.tremblay@uwaterloo.ca

References

Grant CE, Bailey TL, Noble WS (2011). "FIMO: scanning for occurrences of a given motif." Bioinformatics, 27, 1017-1018.

Gupta S, Stamatoyannopoulos JA, Bailey TL, Noble WS (2007). "Quantifying similarity between motifs." Genome Biology, 8, R24.

Hartmann H, Guthöhrlein EW, Siebert M, Luehr S, Söding J (2013). "P-value-based regulatory motif discovery using positional weight matrices." Genome Research, 23, 181-194.

Examples

if (R.Version()$arch != "i386") {

## P-value/score calculations are performed using the PWM version of the
## motif
library(universalmotif)
data(examplemotif)

## Get a minimum score based on a P-value
motif_pvalue(examplemotif, pvalue = 0.001)

## Get the probability of a particular sequence hit
motif_pvalue(examplemotif, score = 0)

## The calculations can be performed for multiple motifs
motif_pvalue(c(examplemotif, examplemotif), pvalue = c(0.001, 0.0001))

## Compare score thresholds and P-value:
scores <- motif_score(examplemotif, c(0.6, 0.7, 0.8, 0.9))
motif_pvalue(examplemotif, scores)

## Calculate the probability of getting a certain match or better:
TATATAT <- score_match(examplemotif, "TATATAT")
TATATAG <- score_match(examplemotif, "TATATAG")
motif_pvalue(examplemotif, TATATAT)
motif_pvalue(examplemotif, TATATAG)

## Get all possible matches by P-value:
get_matches(examplemotif, motif_pvalue(examplemotif, pvalue = 0.0001))

## Vectorize the calculation for multiple motifs and scores/P-values:
m <- create_motif()
motif_pvalue(c(examplemotif, m), list(1:5, 2:3))
## The non-vectorized equivalent:
motif_pvalue(
  c(rep(list(examplemotif), 5), rep(list(m), 2)), c(1:5, 2:3)
)
}

bjmt/universalmotif documentation built on June 11, 2025, 2:34 a.m.
