merge_similar: Identify and merge similar motifs within a collection of...
In bjmt/universalmotif: Import, Modify, and Export Motifs with R

merge_similar

R Documentation

Identify and merge similar motifs within a collection of motifs (or simply cluster motifs).

Description

Given a list of motifs, merge_similar() will identify similar motifs with compare_motifs(), and merge similar ones with merge_motifs().

Usage

merge_similar(motifs, threshold = 0.95, threshold.type = "score.abs",
  method = "PCC", use.type = "PPM", min.overlap = 6, min.mean.ic = 0,
  tryRC = TRUE, relative_entropy = FALSE, normalise.scores = FALSE,
  min.position.ic = 0, score.strat.compare = "a.mean",
  score.strat.merge = "sum", nthreads = 1, return.clusters = FALSE)

Arguments

`motifs`	See `convert_motifs()` for acceptable motif formats.
`threshold`	`numeric(1)` The minimum (for similarity metrics) or maximum (for distance metrics) threshold score for merging.
`threshold.type`	`character(1)` Type of score used for thresholding. Currently unused.
`method`	`character(1)` One of PCC, EUCL, SW, KL, BHAT, HELL, SEUCL, MAN, WEUCL, WPCC. See `compare_motifs()`. (The ALLR and ALLR_LL methods cannot be used for distance matrix construction.)
`use.type`	`character(1)` One of `'PPM'` and `'ICM'`. The latter allows for taking into account the background frequencies if `relative_entropy = TRUE`. Note that `'ICM'` is not allowed when `method = c("ALLR", "ALLR_LL")`.
`min.overlap`	`numeric(1)` Minimum overlap required when aligning the motifs. Setting this to a number higher than the width of the motifs will not allow any overhangs. Can also be a number between 0 and 1, representing the minimum fraction that the motifs must overlap.
`min.mean.ic`	`numeric(1)` Minimum mean information content between the two motifs for an alignment to be scored. This helps prevent scoring alignments between low information content regions of two motifs. Note that this can result in some comparisons failing if no alignment passes the mean IC threshold. Use `average_ic()` to filter out low IC motifs to get around this if you want to avoid getting `NA`s in your output.
`tryRC`	`logical(1)` Also try the reverse complement of the motifs, and report the best score.
`relative_entropy`	`logical(1)` Change the ICM calculation affecting `min.position.ic` and `min.mean.ic`. See `convert_type()`.
`normalise.scores`	`logical(1)` Favour alignments which leave fewer unaligned positions, as well as alignments between motifs of similar length. Similarity scores are multiplied by the ratio of aligned positions to the total number of positions in the larger motif, and the inverse for distance scores.
`min.position.ic`	`numeric(1)` Minimum information content required between individual alignment positions for it to be counted in the final alignment score. It is recommended to use this together with `normalise.scores = TRUE`, as this will help punish scores resulting from only a fraction of an alignment.
`score.strat.compare`	`character(1)` The `score.strat` parameter used by `compare_motifs()`. For clustering purposes, the `"sum"` option cannot be used.
`score.strat.merge`	`character(1)` The `score.strat` parameter used by `merge_motifs()`. As discussed in `merge_motifs()`, the `"sum"` option is recommended over `"a.mean"` to maximize the overlap between motifs.
`nthreads`	`numeric(1)` Run `compare_motifs()` in parallel with `nthreads` threads. `nthreads = 0` uses all available threads.
`return.clusters`	`logical(1)` Return the clusters instead of merging.
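
The `normalise.scores` adjustment described above can be illustrated with a little arithmetic. The sketch below uses base R only; `normalise_score()` is a hypothetical helper written for this illustration, not a function from the package, and reading "the inverse for distance scores" as division by the same ratio is an assumption.

```r
# Hypothetical helper (not part of universalmotif): applies the scaling
# described for normalise.scores to a single alignment score.
normalise_score <- function(score, n_aligned, width_a, width_b,
                            type = c("similarity", "distance")) {
  type <- match.arg(type)
  # Ratio of aligned positions to the width of the larger motif
  ratio <- n_aligned / max(width_a, width_b)
  # Assumption: similarity scores are scaled down by the ratio,
  # distance scores are scaled up by its inverse
  if (type == "similarity") score * ratio else score / ratio
}

# 6 aligned positions between motifs of width 8 and 10: ratio = 6 / 10
sim_adj  <- normalise_score(0.9, n_aligned = 6, width_a = 8, width_b = 10)
dist_adj <- normalise_score(0.3, n_aligned = 6, width_a = 8, width_b = 10,
                            type = "distance")
```

Under this reading, a partial alignment is penalised in both cases: the similarity score shrinks and the distance score grows.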

Details

See compare_motifs() for more info on comparison parameters, and merge_motifs() for more info on motif merging.
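
To make the role of `threshold` more concrete, here is a minimal base R sketch of how a pairwise similarity matrix could be turned into clusters. The similarity values are made up, and the use of complete-linkage hierarchical clustering cut at `1 - threshold` is an assumption chosen for illustration; it is not necessarily what `merge_similar()` does internally.

```r
# Hypothetical pairwise PCC similarities for four motifs; in practice a
# matrix like this could come from compare_motifs().
ids <- c("m1", "m2", "m3", "m4")
sim <- matrix(c(1.00, 0.97, 0.40, 0.35,
                0.97, 1.00, 0.42, 0.38,
                0.40, 0.42, 1.00, 0.96,
                0.35, 0.38, 0.96, 1.00),
              nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

threshold <- 0.95

# Convert similarity to distance so that hclust() can be used
dist_mat <- as.dist(1 - sim)

# Assumption: complete-linkage clustering, cut at the distance that
# corresponds to the similarity threshold
tree <- hclust(dist_mat, method = "complete")
clusters <- cutree(tree, h = 1 - threshold)

# Motif names grouped by cluster; each group would then be merged
groups <- split(names(clusters), clusters)
```

With these values, `m1` and `m2` fall into one group and `m3` and `m4` into another, since only those pairs exceed the 0.95 similarity threshold.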

Value

See convert_motifs() for available output formats.

Author(s)

Benjamin Jean-Marie Tremblay, benjamin.tremblay@uwaterloo.ca

Examples

## Not run: 
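library(universalmotif)
library(MotifDb)
# Take the first 50 bHLH-family motifs from MotifDb
motifs <- filter_motifs(MotifDb, family = "bHLH")[1:50]
length(motifs)
# Merge similar motifs using the default settings (PCC, threshold = 0.95)
motifs <- merge_similar(motifs)
# Number of motifs remaining after merging
length(motifs)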

## End(Not run)
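
A smaller sketch that does not depend on MotifDb may also be useful. It builds three toy motifs from made-up consensus strings with `create_motif()` and then calls `merge_similar()` with and without `return.clusters = TRUE`. The motif names and sequences are illustrative assumptions, and the exact structure of the object returned with `return.clusters = TRUE` is not described on this page, so the sketch only inspects it with `str()`.

```r
library(universalmotif)

# Three toy motifs built from consensus strings (illustrative only);
# toy1 and toy2 are identical, toy3 is different
motifs <- list(
  create_motif("CACGTGAC", name = "toy1"),
  create_motif("CACGTGAC", name = "toy2"),
  create_motif("TTGACGTA", name = "toy3")
)

# Inspect the clusters without merging
clusters <- merge_similar(motifs, threshold = 0.95, method = "PCC",
                          return.clusters = TRUE)
str(clusters, max.level = 1)

# Merge similar motifs using the same comparison settings
merged <- merge_similar(motifs, threshold = 0.95, method = "PCC")
length(merged)
```

Checking the clusters first is a cheap way to judge whether a given `threshold` is too strict or too lenient before committing to the merge.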

bjmt/universalmotif documentation built on June 11, 2025, 2:34 a.m.