count_multigrams: Detect and count multiple n-grams in sequences
In biogram: N-Gram Analysis of Biological Sequences

A convenient wrapper around count_ngrams for counting n-grams over multiple values of n and d.

count_multigrams(
  ns,
  ds = rep(0, length(ns)),
  seq,
  u,
  pos = FALSE,
  scale = FALSE,
  threshold = 0
)

`ns`	`numeric` vector of n-grams' sizes. See Details.
`ds`	`list` of distances between elements of n-grams. Each element of the list is a vector used as distance for the respective n-gram size given by the `ns` parameter.
`seq`	a vector or matrix describing sequence(s).
`u`	`integer`, `numeric` or `character` vector of all possible unigrams.
`pos`	`logical`, if `TRUE` position-specific n-grams are counted.
`scale`	`logical`, if `TRUE` output data is normalized. May be applied only to the counts of n-grams without position information. See `Details`.
`threshold`	`integer`, if not equal to 0, data is binarized into two groups (greater than or equal to threshold vs. smaller than threshold).
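
The effect of `threshold` can be pictured with a small base R sketch. It does not call biogram; the `counts` matrix is made-up toy data, and encoding the two groups as 1 and 0 is an assumption made for illustration, not a statement about the package's internal representation.

```r
# Toy count matrix (hypothetical data): 2 sequences, 3 n-gram columns.
counts <- matrix(c(0L, 2L, 1L, 3L, 0L, 1L), nrow = 2,
                 dimnames = list(NULL, c("a", "b", "c")))

threshold <- 2L

# Assumed encoding: 1 where the count is >= threshold, 0 otherwise.
# Multiplying the logical matrix by 1L keeps the dimnames and yields integers.
binarized <- (counts >= threshold) * 1L
print(binarized)
```

With the default `threshold = 0`, no binarization would be applied and the raw counts are kept.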

The ns vector and the ds list must have equal length. Each element of ds serves as the d parameter for the corresponding value in ns. For example, if ns is c(4, 4, 4), then ds must be a list of length 3. Each element of that list must have length 3 or 1, as required for the d parameter of the count_ngrams function when n is 4.
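
The length rule can be checked before calling the function. The sketch below uses a hypothetical helper, `check_ns_ds`, which is not part of biogram. It assumes that for an n-gram of size n a distance vector has either length 1 or length n - 1, a generalization of the 4-gram case described above.

```r
# Hypothetical helper (not part of biogram): validates ns against ds.
# Assumption: each d has length 1 or n - 1 (at least 1 for unigrams).
check_ns_ds <- function(ns, ds) {
  if (length(ns) != length(ds)) {
    stop("ns and ds must have equal length")
  }
  for (i in seq_along(ns)) {
    n_gaps <- max(ns[i] - 1, 1)
    if (!(length(ds[[i]]) %in% c(1, n_gaps))) {
      stop("element ", i, " of ds must have length 1 or ", n_gaps)
    }
  }
  invisible(TRUE)
}

# Three 4-grams: two full distance vectors and one single distance.
check_ns_ds(c(4, 4, 4), list(c(2, 0, 1), c(2, 1, 0), 0))

# A distance vector of length 2 does not fit a 4-gram, so this fails.
ok <- tryCatch(check_ns_ds(c(4, 4), list(c(2, 0), 0)),
               error = function(e) FALSE)
```

The helper also accepts a plain numeric vector for ds, such as the default `rep(0, length(ns))`, because `[[` indexes vectors and lists alike.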

An integer matrix with named columns. The naming conventions are the same as in count_ngrams.

library(biogram)

# 12 random sequences of length 50 over the alphabet 1:4
seqs <- matrix(sample(1L:4, 600, replace = TRUE), ncol = 50)
# position-specific 3-grams with distances c(1, 0), and unigrams
count_multigrams(c(3, 1), list(c(1, 0), 0), seqs, 1L:4, pos = TRUE)
# if the ds parameter is not given, n-grams are counted with distance 0
count_multigrams(c(3, 1), seq = seqs, u = 1L:4)

# count 4-grams three times, each time with different distances between
# elements
count_multigrams(c(4, 4, 4), list(c(2, 0, 1), c(2, 1, 0), c(0, 1, 2)),
                 seqs, 1L:4, pos = TRUE)