scan_sequences: Scan sequences for matches to input motifs.
In bjmt/universalmotif: Import, Modify, and Export Motifs with R

scan_sequences

R Documentation

Scan sequences for matches to input motifs.

Description

For sequences of any alphabet, scan them using the PWM matrices of a set of input motifs.

Usage

scan_sequences(motifs, sequences, threshold = 1e-04,
  threshold.type = c("pvalue", "qvalue", "logodds", "logodds.abs"),
  RC = FALSE, use.freq = 1, verbose = 0, nthreads = 1,
  motif_pvalue.k = 8, use.gaps = TRUE, allow.nonfinite = FALSE,
  warn.NA = TRUE, calc.pvals = TRUE, return.granges = FALSE,
  no.overlaps = FALSE, no.overlaps.by.strand = FALSE,
  no.overlaps.strat = c("score", "order"), respect.strand = FALSE,
  motif_pvalue.method = c("dynamic", "exhaustive"),
  calc.qvals = calc.pvals, calc.qvals.method = c("fdr", "BH",
  "bonferroni"))

Arguments

`motifs`	See `convert_motifs()` for acceptable motif formats.
`sequences`	`XStringSet` Sequences to scan. Alphabet should match motif.
`threshold`	`numeric(1)` See details.
`threshold.type`	`character(1)` One of `c('pvalue', 'qvalue', 'logodds', 'logodds.abs')`. See details.
`RC`	`logical(1)` If `TRUE`, check reverse complement of the input sequences. Only available for DNA/RNA.
`use.freq`	`numeric(1)` The default, 1, uses the motif matrix (from the `motif['motif']` slot) to search for sequences. If a higher number is used, then the matching k-let matrix from the `motif['multifreq']` slot is used. See `add_multifreq()`.
`verbose`	`numeric(1)` Describe progress, from none (`0`) to verbose (`3`).
`nthreads`	`numeric(1)` Run `scan_sequences()` in parallel with `nthreads` threads. `nthreads = 0` uses all available threads. Note that no speed up will occur for jobs with only a single motif and sequence.
`motif_pvalue.k`	`numeric(1)` Control `motif_pvalue()` approximation. See `motif_pvalue()`. Only used if `motif_pvalue.method = "exhaustive"`.
`use.gaps`	`logical(1)` Set this to `FALSE` to ignore motif gaps, if present.
`allow.nonfinite`	`logical(1)` If `FALSE`, then apply a pseudocount if non-finite values are found in the PWM. Note that if the motif has a pseudocount greater than zero and the motif is not currently of type PWM, then this parameter has no effect as the pseudocount will be applied automatically when the motif is converted to a PWM internally. This value is set to `FALSE` by default in order to stay consistent with pre-version 1.8.0 behaviour. Also note that this parameter is not compatible with `motif_pvalue.method = "dynamic"`. A message will be printed if a pseudocount is applied. To disable this, set `options(pseudocount.warning=FALSE)`.
`warn.NA`	`logical(1)` Whether to warn about the presence of non-standard letters in the input sequence, such as those in masked sequences.
`calc.pvals`	`logical(1)` Calculate P-values for each hit. This is a convenience option which simply gives `motif_pvalue()` the input motifs and the scores of each hit. Be careful about setting this to `TRUE` if you anticipate getting thousands of hits and are using `motif_pvalue.method = "exhaustive"`: expect to wait a few seconds or minutes for the calculations to finish. Increasing the `nthreads` value can help greatly here. See Details for more information on P-value calculation. If `motif_pvalue.method = "dynamic"`, then this is usually not an issue.
`return.granges`	`logical(1)` Return the results as a `GRanges` object. Requires the `GenomicRanges` package to be installed.
`no.overlaps`	`logical(1)` Remove overlapping hits from the same motifs. Overlapping hits from different motifs are preserved. Please note that the current implementation of this feature can add significantly to the run time for large inputs.
`no.overlaps.by.strand`	`logical(1)` Whether to discard overlapping hits from the opposite strand (`TRUE`), or to only discard overlapping hits on the same strand (`FALSE`).
`no.overlaps.strat`	`character(1)` One of `c("score", "order")`. The former option keeps the highest scoring overlapping hit (and the first of these within ties), and the latter simply keeps the first overlapping hit.
`respect.strand`	`logical(1)` If motifs are DNA/RNA, then setting this option to `TRUE` will make `scan_sequences()` only scan the strands of the input sequences as indicated in the motif `strand` slot.
`motif_pvalue.method`	`character(1)` One of `c("dynamic", "exhaustive")`. Algorithm used for calculating P-values. The `"exhaustive"` method involves finding all possible motif matches at or above the specified score using a branch-and-bound algorithm, which can be computationally intensive (Hartman et al., 2013). Additionally, the computation must be repeated for each hit. The `"dynamic"` method calculates the distribution of possible motif scores using a much faster dynamic programming algorithm, and can be recycled for multiple scores (Grant et al., 2011). The only disadvantage is the inability to use `allow.nonfinite = TRUE`. See `motif_pvalue()` for details.
`calc.qvals`	`logical(1)` Whether to also calculate adjusted P-values. Only valid if `calc.pvals = TRUE`.
`calc.qvals.method`	`character(1)` One of `c("fdr", "BH", "bonferroni")`. The method for calculating adjusted P-values. These are described in depth in the Sequence Searches vignette. Also see Noble (2009).

Details

Logodds scoring

Similar to Biostrings::matchPWM(), the scanning method uses logodds scoring. (To see the scoring matrix for any motif, simply run convert_type(motif, "PWM"). For a multifreq scoring matrix: apply(motif["multifreq"][["2"]], 2, ppm_to_pwm)). In order to score a sequence, at each position within a sequence of length equal to the length of the motif, the scores for each base are summed. If the score sum is above the desired threshold, it is kept.

Thresholds

If threshold.type = 'logodds', then the threshold value is multiplied by the maximum possible motif scores. To calculate the maximum possible scores a motif (of type PWM) manually, run motif_score(motif, 1). If threshold.type = 'pvalue', then threshold logodds scores are generated using motif_pvalue(). Finally, if threshold.type = 'logodds.abs', then the exact values provided will be used as thresholds. Finally, if threshold.type = 'qvalue', then the threshold is calculated as if threshold.type = 'pvalue' and the final set of hits are filtered based on their calculated Q-value. (Note: this means that the thresh.score column will be incorrect!) This is done since most Q-values cannot be calculated prior to scanning. If you are running a very large job, it may be wise to use a P-value threshold followed by manually filtering by Q-value; this will avoid the scanning have to parse the larger number of hits from the internally-lowered threshold.

Non-standard letters

Non-standard letters (such as "N", "+", "-", ".", etc in DNAString objects) will be safely ignored, resulting only in a warning and a very minor performance cost. This can used to scan masked sequences. See Biostrings::mask() for masking sequences (generating MaskedXString objects), and Biostrings::injectHardMask() to recover masked XStringSet objects for use with scan_sequences(). There is also a provided wrapper function which performs both steps: mask_seqs().

Value

DataFrame, GRanges with each row representing one hit. If the input sequences are DNAStringSet or RNAStringSet, then an additional column with the strand is included. Function args are stored in the metadata slot. If return.granges = TRUE then a GRanges object is returned.

Author(s)

Benjamin Jean-Marie Tremblay, benjamin.tremblay@uwaterloo.ca

References

Grant CE, Bailey TL, Noble WS (2011). "FIMO: scanning for occurrences of a given motif." Bioinformatics, 27, 1017-1018.

Hartmann H, Guthohrlein EW, Siebert M, Soding SLJ (2013). “P-value-based regulatory motif discovery using positional weight matrices.” Genome Research, 23, 181-194.

Noble WS (2009). "How does multiple testing work?" Nature Biotechnology, 27, 1135-1137.

Examples

## any alphabet can be used
## Not run: 
set.seed(1)
alphabet <- paste(c(letters), collapse = "")
motif <- create_motif("hello", alphabet = alphabet)
sequences <- create_sequences(alphabet, seqnum = 1000, seqlen = 100000)
scan_sequences(motif, sequences)

## End(Not run)

## Sequence masking:
if (R.Version()$arch != "i386") {
library(Biostrings)
data(ArabidopsisMotif)
data(ArabidopsisPromoters)
seq <- mask_seqs(ArabidopsisPromoters, "AAAAA")
scan_sequences(ArabidopsisMotif, seq)
# A warning regarding the presence of non-standard letters will be given,
# but can be safely ignored in this case.
}

bjmt/universalmotif documentation built on June 11, 2025, 2:34 a.m.

bjmt/universalmotif index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

bjmt/universalmotif
Import, Modify, and Export Motifs with R

scan_sequences: Scan sequences for matches to input motifs.
In bjmt/universalmotif: Import, Modify, and Export Motifs with R

Scan sequences for matches to input motifs.

Description

Usage

Arguments

Details

Logodds scoring

Thresholds

Non-standard letters

Value

Author(s)

References

See Also

Examples

Related to scan_sequences in bjmt/universalmotif...

R Package Documentation

Browse R Packages

We want your feedback!

bjmt/universalmotif Import, Modify, and Export Motifs with R

scan_sequences: Scan sequences for matches to input motifs. In bjmt/universalmotif: Import, Modify, and Export Motifs with R

Scan sequences for matches to input motifs.

Description

Usage

Arguments

Details

Logodds scoring

Thresholds

Non-standard letters

Value

Author(s)

References

See Also

Examples

Related to scan_sequences in bjmt/universalmotif...

R Package Documentation

Browse R Packages

We want your feedback!

bjmt/universalmotif
Import, Modify, and Export Motifs with R

scan_sequences: Scan sequences for matches to input motifs.
In bjmt/universalmotif: Import, Modify, and Export Motifs with R