kmers: Generates genome kmers

View source: R/RcppExports.R

kmers    R Documentation

Generates genome kmers

Description

Generates the kmers (substrings of length k) of a genome string together with their counts.

Usage

kmers(
  x,
  k = 3L,
  simplify = FALSE,
  canonical = TRUE,
  squeeze = FALSE,
  anchor = TRUE,
  clean_up = TRUE,
  key_as_int = FALSE,
  starting_index = 1L
)

Arguments

x

genome in string format

k

kmer length

simplify

returns a numeric vector of kmer counts, without the associated kmer strings. This is useful to save memory, but should always be used with anchor = TRUE.

canonical

only record canonical kmers (i.e., the lexicographically smaller of a kmer and its reverse complement)

squeeze

remove non-canonical kmers

anchor

includes unobserved kmers (with counts of 0). This is useful when generating a dense matrix where kmers of different genomes align.

clean_up

only include kmers made of valid bases (A, C, G, T) in the counts (excludes other symbols such as N)

key_as_int

return the kmer index (as "kmer_index") rather than the full kmer string. Useful for index-coded data formats such as libsvm.

starting_index

the starting index, only used if key_as_int = TRUE.
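
To make the canonical and anchor options concrete, here is a minimal base R sketch of the idea they describe. It is not the package's implementation (which is compiled code exposed through R/RcppExports.R); the helpers rev_comp, canonical_kmer, all_kmers and count_kmers are hypothetical names used only for illustration, and the sketch assumes an upper-case DNA string.

```r
# Hypothetical helpers for illustration only; not part of the MIC package.

# Reverse complement of a DNA string made of A, C, G, T.
rev_comp <- function(s) {
  chars <- rev(strsplit(s, "", fixed = TRUE)[[1]])
  paste(chartr("ACGT", "TGCA", chars), collapse = "")
}

# Canonical form: the lexicographically smaller of a kmer and its
# reverse complement (radix sort gives locale-independent ordering).
canonical_kmer <- function(s) {
  sort(c(s, rev_comp(s)), method = "radix")[1]
}

# All 4^k possible kmers in lexicographic order.
all_kmers <- function(k) {
  bases <- c("A", "C", "G", "T")
  grid <- expand.grid(rep(list(bases), k), stringsAsFactors = FALSE)
  sort(do.call(paste0, unname(as.list(grid))), method = "radix")
}

# Count kmers with a sliding window of width k.
# anchor = TRUE returns a dense vector over all 4^k keys (zeros included),
# so vectors from different genomes line up position by position.
count_kmers <- function(x, k = 3L, canonical = TRUE, anchor = TRUE) {
  n <- nchar(x)
  starts <- if (n < k) integer(0) else seq_len(n - k + 1L)
  observed <- vapply(starts, function(i) substr(x, i, i + k - 1L), character(1))
  # Keep only windows made entirely of valid bases.
  observed <- observed[grepl("^[ACGT]+$", observed)]
  if (canonical) {
    observed <- vapply(observed, canonical_kmer, character(1), USE.NAMES = FALSE)
  }
  counts <- table(observed)
  if (!anchor) {
    return(setNames(as.integer(counts), names(counts)))
  }
  keys <- all_kmers(k)
  dense <- setNames(integer(length(keys)), keys)
  dense[names(counts)] <- as.integer(counts)
  dense
}

dense <- count_kmers("ATCGCAGT", k = 3L)
sparse <- count_kmers("ATCGCAGT", k = 3L, anchor = FALSE)
```

In this sketch the anchored result always has 4^k entries, and with canonical = TRUE the non-canonical keys simply stay at zero. Dropping those slots would be a separate step, which is what the squeeze argument appears to be for.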

Value

A list of kmer values: either a list holding a single vector (if simplify = TRUE), or a named list containing "kmer_string" and "kmer_value".

Examples

kmers("ATCGCAGT")
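
A few further calls, sketched from the argument descriptions above. They assume the MIC package is installed and that kmers is exported from it; the guard skips the calls otherwise. No output is shown because the exact return values are not given in this page.

```r
genome <- "ATCGCAGT"

# Assumption: MIC is installed and exports kmers(); skipped otherwise.
if (requireNamespace("MIC", quietly = TRUE)) {
  # Count vector only, keeping unobserved 2-mers as zeros so that
  # vectors from different genomes align.
  dense <- MIC::kmers(genome, k = 2L, simplify = TRUE, anchor = TRUE)
  str(dense)

  # Integer keys ("kmer_index") instead of kmer strings, starting at 1.
  indexed <- MIC::kmers(genome, k = 2L, key_as_int = TRUE, starting_index = 1L)
  str(indexed)
}
```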

MIC documentation built on April 12, 2025, 2:26 a.m.