genome_to_libsvm: Converts a genome to kmers stored in libsvm format on disk


genome_to_libsvm    R Documentation

Converts a genome to kmers stored in libsvm format on disk

Description

This function converts a single genome to kmer counts and writes them to a libsvm file on disk. The libsvm format is as follows:

  label 1:count 2:count 3:count ...

The label is optional and defaults to 0. Each feature index is the kmer index, i.e. the position of the kmer in the lexicographically sorted list of kmers of length k. libsvm is a sparse format: features with a value of zero may be omitted.
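To make the indexing concrete, here is a minimal base R sketch of how a lexicographic kmer index and a libsvm line could be built. It is an illustration only, not the package's implementation: kmer_index() is a hypothetical helper, and the sketch assumes the alphabet A, C, G, T, 1-based indices over all 4^k kmers, and no canonical-kmer handling, so the line it builds may differ from what genome_to_libsvm() writes.

```r
# Hypothetical helper (not part of the package): 1-based position of a kmer
# in the lexicographically sorted list of all 4^k kmers over A, C, G, T.
kmer_index <- function(kmer) {
  bases <- c(A = 0, C = 1, G = 2, T = 3)
  digits <- bases[strsplit(kmer, "")[[1]]]
  k <- length(digits)
  # Read the kmer as a base-4 number, then shift to 1-based indexing.
  sum(digits * 4^((k - 1):0)) + 1
}

kmer_index("AAA")  # 1
kmer_index("AAC")  # 2
kmer_index("TTT")  # 64

# Count every overlapping 3-mer of a short genome (no canonical handling).
genome <- "ATCGCAGT"
k <- 3
starts <- seq_len(nchar(genome) - k + 1)
kmers <- substring(genome, starts, starts + k - 1)
counts <- table(vapply(kmers, kmer_index, numeric(1)))

# Assemble "label index:count ..." with the default label "0".
line <- paste("0", paste0(names(counts), ":", counts, collapse = " "))
line  # "0 12:1 14:1 19:1 26:1 37:1 55:1"
```

Only the six kmers that occur appear in the line; the other 58 possible 3-mers have a count of zero and are left out, which is what makes the format sparse.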

Usage

genome_to_libsvm(
  x,
  target_path,
  label = as.character(c("0")),
  k = 3L,
  canonical = TRUE,
  squeeze = FALSE
)

Arguments

x

genome in string format

target_path

path to store libsvm file (.txt)

label

libsvm label

k

kmer length

canonical

only record canonical kmers (i.e., the lexicographically smaller of a kmer and its reverse complement)

squeeze

remove non-canonical kmers
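The canonical form described for the canonical argument can be sketched in base R as follows. Both reverse_complement() and canonical_kmer() are hypothetical helpers written for this illustration; they are not exported by the package, and the sketch assumes uppercase input restricted to A, C, G and T.

```r
# Hypothetical helpers for illustration only; not package functions.
reverse_complement <- function(kmer) {
  # Swap each base for its complement, then reverse the string.
  complemented <- chartr("ACGT", "TGCA", kmer)
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}

canonical_kmer <- function(kmer) {
  # Radix sort compares strings in the C locale, so the result does not
  # depend on the session locale.
  sort(c(kmer, reverse_complement(kmer)), method = "radix")[1]
}

reverse_complement("AGT")  # "ACT"
canonical_kmer("AGT")      # "ACT"
canonical_kmer("ATC")      # "ATC" (its reverse complement is "GAT")
```

Under this definition a kmer and its reverse complement map to the same canonical string, so counts from both strands are pooled.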

Value

logical (TRUE or FALSE) indicating success

See Also

To process multiple genomes in a directory in parallel, see genomes_to_kmer_libsvm()

For more details on libsvm format, see https://xgboost.readthedocs.io/en/stable/tutorials/input_format.html

Examples

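# Write the kmer counts of a short example genome to a temporary libsvm file
temp_libsvm_path <- tempfile(fileext = ".txt")
genome_to_libsvm("ATCGCAGT", temp_libsvm_path)
# Inspect the file contents
readLines(temp_libsvm_path)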
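A line read back with readLines() can be split into its label and index:count pairs with a few lines of base R. The parser below is a hypothetical sketch, not a package function, and the input string is made up for illustration rather than taken from the output of genome_to_libsvm().

```r
# Hypothetical parser for a single libsvm line ("label index:value ...").
parse_libsvm_line <- function(line) {
  fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  # Everything after the label is an "index:value" pair.
  pairs <- strsplit(fields[-1], ":", fixed = TRUE)
  list(
    label = fields[1],
    index = as.integer(vapply(pairs, `[`, character(1), 1)),
    count = as.numeric(vapply(pairs, `[`, character(1), 2))
  )
}

# Made-up example line, not actual function output.
parsed <- parse_libsvm_line("0 12:1 14:2 55:1")
parsed$label  # "0"
parsed$index  # 12 14 55
parsed$count  # 1 2 1
```

Keeping the indices and counts as two parallel vectors mirrors the sparse layout of the file, since only the kmers that are present are stored.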

MIC documentation built on April 12, 2025, 2:26 a.m.