This is an in-memory, probabilistic, highly optimized, and multi-threaded implementation of a k-mer counting algorithm.
The function supports:
several types of k-mers (contiguous, gapped, and positional variants)
all biological sequences (in particular, nucleic acids and proteins)
two common in-memory representations of sequences, i.e., string vectors and lists of string vectors
Moreover, several extra features are provided (for more information, see Details):
configurable k-mer alphabet (i.e., which elements of a sequence should be considered during the k-mer counting procedure)
verbose mode
configurable batch size (i.e., how many sequences are processed in a single step)
configurable dimension of the hash value of a k-mer
possibility to compute k-mers with or without their frequencies
possibility to compute a result k-mer matrix with or without human-readable k-mer (column) names
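To make the three k-mer variants named above concrete, here is a minimal base R sketch that enumerates contiguous, gapped, and positional k-mers of a single sequence. It is a naive illustration only, not seqR's implementation: the function name `naive_kmers` and the `position_kmer` label format are hypothetical, and no hashing, batching, or multi-threading is involved.

```r
# Naive reference sketch (NOT seqR's implementation): count the k-mers of one
# sequence given as a character vector of elements.
# `gaps` holds the gap lengths between consecutive k-mer elements, so the
# k-mer length is length(gaps) + 1; gaps = 0 yields contiguous 2-mers.
naive_kmers <- function(elements, gaps = integer(0), positional = FALSE) {
  # Offsets of the k-mer elements relative to the start position
  offsets <- c(0, cumsum(gaps + 1))
  span <- max(offsets) + 1
  n_starts <- length(elements) - span + 1
  if (n_starts < 1) return(table(character(0)))
  kmers <- vapply(seq_len(n_starts), function(start) {
    kmer <- paste(elements[start + offsets], collapse = "")
    # Hypothetical label format: a positional k-mer is tied to its start index
    if (positional) paste0(start, "_", kmer) else kmer
  }, character(1))
  table(kmers)
}

x <- strsplit("ACAT", split = "")[[1]]
naive_kmers(x, gaps = 0)                     # contiguous 2-mers: AC, CA, AT
naive_kmers(x, gaps = 1)                     # gapped 2-mers (one gap): AA, CT
naive_kmers(x, gaps = 0, positional = TRUE)  # 1_AC, 2_CA, 3_AT
```

In the positional variant the same k-mer at two different positions is counted as two distinct features, which is why the labels carry the start index.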
count_kmers(
  sequences,
  k = length(kmer_gaps) + 1,
  kmer_alphabet = getOption("seqR_kmer_alphabet_default"),
  positional = getOption("seqR_positional_default"),
  kmer_gaps = c(),
  with_kmer_counts = getOption("seqR_with_kmer_counts_default"),
  with_kmer_names = getOption("seqR_with_kmer_names_default"),
  batch_size = getOption("seqR_batch_size_default"),
  hash_dim = getOption("seqR_hash_dim_default"),
  verbose = getOption("seqR_verbose_default")
)
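The usage above relies on two standard R mechanisms: a default that refers to another argument (`k = length(kmer_gaps) + 1`) and defaults read from global options via `getOption()`. The sketch below reproduces that pattern with a hypothetical stand-in function, `show_defaults`, and a hypothetical option name, `demo_batch_size_default`; it only reports the resolved values and performs no counting.

```r
# Hypothetical stand-in with the same default-argument pattern as the usage
# above. Defaults are evaluated lazily, at call time.
show_defaults <- function(k = length(kmer_gaps) + 1,
                          kmer_gaps = c(),
                          batch_size = getOption("demo_batch_size_default")) {
  list(k = k, kmer_gaps = kmer_gaps, batch_size = batch_size)
}

old <- options(demo_batch_size_default = 200)  # hypothetical option name

show_defaults()$k                     # no gaps given, so k resolves to 1
show_defaults(kmer_gaps = c(1, 2))$k  # two gaps given, so k resolves to 3

options(demo_batch_size_default = 50)
show_defaults()$batch_size            # the option is read when the call is made

options(old)                          # restore the previous option value
```

By the same mechanism, changing one of the `seqR_*_default` options with `options()` before a call should change the corresponding default, while an explicitly passed argument always takes precedence.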
sequences |
input sequences of one of two supported types,
either a string vector or a list of string vectors |
k |
an |
kmer_alphabet |
a |
positional |
a single |
kmer_gaps |
an |
with_kmer_counts |
a single |
with_kmer_names |
a single |
batch_size |
a single |
hash_dim |
a single |
verbose |
a single |
A comprehensive description of the supported features is available
in vignette("features-overview", package = "seqR").
A Matrix value that represents the resulting k-mer matrix.
The result is a sparse matrix in order to reduce memory consumption.
The i-th row of the matrix represents the k-mers found in the i-th input sequence.
Each column represents a distinct k-mer.
The column names conform to a human-readable schema for k-mers
if the parameter with_kmer_names = TRUE.
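As a rough illustration of how a result with this layout can be inspected, the sketch below builds a small sparse matrix by hand with the Matrix package and queries it. The matrix is a stand-in, not actual `count_kmers` output, and the column names used here are illustrative only and do not follow seqR's naming schema.

```r
library(Matrix)

# Hand-built stand-in: 2 sequences (rows) x 4 distinct 2-mers (columns),
# mimicking counts for "ACAT" (AC, CA, AT) and "ACC" (AC, CC).
kmer_matrix <- sparseMatrix(
  i = c(1, 1, 1, 2, 2),
  j = c(1, 2, 3, 1, 4),
  x = rep(1, 5),
  dims = c(2, 4),
  dimnames = list(NULL, c("AC", "CA", "AT", "CC"))
)

rowSums(kmer_matrix)  # number of k-mer occurrences per sequence
colSums(kmer_matrix)  # number of occurrences of each k-mer across sequences

# k-mers present in the second sequence
colnames(kmer_matrix)[kmer_matrix[2, ] > 0]

# Dense copy; only advisable for small results
as.matrix(kmer_matrix)
```

Keeping the result sparse matters because the number of distinct k-mers (columns) grows quickly with k, while each sequence contains only a small fraction of them.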
Function that counts many k-mer variants in a single invocation: count_multimers
Function that merges several k-mer matrices by rows (rbind): rbind_columnwise
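Row-binding two k-mer matrices is not a plain `rbind`, because the matrices may contain different k-mers in different column orders. The base R sketch below shows one way such a merge could work, assuming columns are aligned by name and missing k-mers are filled with zeros. The helper `rbind_by_colnames` is hypothetical, operates on dense matrices, and is not the implementation of rbind_columnwise.

```r
# Illustrative dense-matrix sketch (NOT seqR's implementation): bind two
# matrices by rows, aligning columns by name and filling absent k-mers with 0.
rbind_by_colnames <- function(a, b) {
  all_cols <- union(colnames(a), colnames(b))
  expand <- function(m) {
    out <- matrix(0, nrow = nrow(m), ncol = length(all_cols),
                  dimnames = list(NULL, all_cols))
    out[, colnames(m)] <- m  # copy existing counts into the matching columns
    out
  }
  rbind(expand(a), expand(b))
}

# Illustrative column names, not seqR's naming schema
a <- matrix(c(1, 1, 1), nrow = 1, dimnames = list(NULL, c("AC", "CA", "AT")))
b <- matrix(c(1, 1), nrow = 1, dimnames = list(NULL, c("AC", "CC")))

merged <- rbind_by_colnames(a, b)
merged
```

The merged matrix has one column per k-mer seen in either input, so counts for the same k-mer end up in the same column regardless of the original column order.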
batch_size <- 1
# Counting 1-mers of two DNA sequences
count_kmers(
c("ACAT", "ACC"),
batch_size=batch_size)
# Counting 2-mers of two DNA sequences
count_kmers(
c("ACAT", "ACC"),
k=2,
batch_size=batch_size)
# Counting positional 2-mers of two DNA sequences
count_kmers(
c("ACAT", "ACC"),
k=2,
positional=TRUE,
batch_size=batch_size)
# Counting positional 2-mers of two DNA sequences (second representation)
count_kmers(
list(c("A", "C", "A", "T"), c("A", "C", "C")),
k=2,
positional=TRUE,
batch_size=batch_size)
# Counting 2-mers of two DNA sequences, considering only A and C elements
count_kmers(
c("ACAT", "ACC"),
k=2,
kmer_alphabet=c("A", "C"),
batch_size=batch_size)
# Counting gapped 3-mers with lengths of gaps 1 and 2
count_kmers(
c("ACATACTAT", "ACCCCCC"),
kmer_gaps=c(1,2),
batch_size=batch_size)
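The examples use both supported input representations: a string vector and a list of string vectors. The base R snippet below shows how one can be converted into the other for single-character elements; the variable names are illustrative.

```r
# Convert between the two input representations for single-character elements
as_strings <- c("ACAT", "ACC")

# String vector -> list of string vectors (one element per character)
as_elements <- strsplit(as_strings, split = "")
as_elements

# List of string vectors -> string vector
back <- vapply(as_elements, paste, character(1), collapse = "")
identical(back, as_strings)
```

The list form is the more general of the two, since each element of a sequence can then be a string longer than one character.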