count_multimers: Count k-mers of various types for a given collection of...


View source: R/count_multimers.R

Description

This function is a wrapper around count_kmers that counts several types of k-mers in a single call.

A user can input multiple k-mer configurations in the following way. Each parameter that is related to the configuration (i.e., k_vector, positional_vector, and kmer_gaps_list) is represented in a sequential form (i.e., a list or a vector). The i-th entry of each sequence corresponds to the i-th configuration.
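As an illustration of this index-wise pairing, the following base R sketch lines up three configurations and prints how the i-th entries belong together. It does not call seqR; the particular values are chosen only for demonstration.

```r
# Three k-mer configurations, expressed as parallel sequences.
# The values are illustrative only.
k_vector <- c(1, 2, 2)
positional_vector <- c(FALSE, FALSE, TRUE)
kmer_gaps_list <- list(NULL, NULL, c(1))

# Every configuration-related argument needs one entry per configuration.
stopifnot(
  length(positional_vector) == length(k_vector),
  length(kmer_gaps_list) == length(k_vector)
)

# Print the i-th configuration assembled from the i-th entries.
for (i in seq_along(k_vector)) {
  gaps <- kmer_gaps_list[[i]]
  cat(sprintf(
    "configuration %d: k = %d, positional = %s, gaps = %s\n",
    i, k_vector[i], positional_vector[i],
    if (is.null(gaps)) "none" else paste(gaps, collapse = ",")
  ))
}
```

Note that list(NULL, NULL, c(1)) keeps its NULL entries, so the list still has length 3 and the third entry lines up with the third value of k_vector.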

Usage

count_multimers(
  sequences,
  k_vector,
  kmer_alphabet = getOption("seqR_kmer_alphabet_default"),
  positional_vector = rep(getOption("seqR_positional_default"), length(k_vector)),
  kmer_gaps_list = rep(list(c()), length(k_vector)),
  with_kmer_counts = getOption("seqR_with_kmer_counts_default"),
  with_kmer_names = getOption("seqR_with_kmer_names_default"),
  batch_size = getOption("seqR_batch_size_default"),
  hash_dim = getOption("seqR_hash_dim_default"),
  verbose = getOption("seqR_verbose_default")
)

Arguments

sequences

input sequences of one of two supported types, either string vector or list of string vectors

k_vector

an integer vector that represents the lengths of k-mers. The i-th element corresponds to the value of k for the i-th k-mer configuration.

kmer_alphabet

a string vector representing the elements that should be used during the construction of k-mers. By default, all elements that are present in sequences are taken into account

positional_vector

a logical vector that indicates, for each k-mer configuration, whether the k-mers are positional. The i-th element corresponds to the i-th k-mer configuration

kmer_gaps_list

a list of integer vectors that represents the lengths of k-mer gaps for each configuration separately. The i-th element of the list corresponds to the lengths of gaps of the i-th k-mer configuration

with_kmer_counts

a single logical value that determines whether the result should contain k-mer frequencies

with_kmer_names

a single logical value that determines whether the result should contain human-readable k-mer names

batch_size

a single integer value that represents the number of sequences that are being processed in a single step

hash_dim

a single integer value (1 <= hash_dim <= 500) representing the length of a hash vector that is internally used in the algorithm

verbose

a single logical value that denotes whether a user wants to get extra information on the current state of computations

Details

The comprehensive description of supported features is available in vignette("features-overview", package = "seqR").

Value

a Matrix object that holds the resulting k-mer matrix. The result is a sparse matrix in order to reduce memory consumption. The i-th row of the matrix represents the k-mers found in the i-th input sequence. Each column represents a distinct k-mer. The column names follow the human-readable schema for k-mers if with_kmer_names = TRUE
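To make the row and column layout concrete, here is a hand-built toy matrix of 1-mer counts. It is only a sketch: it uses base R and the Matrix package rather than seqR, and its column names are plain letters, which is an assumption made for readability and not necessarily the naming schema seqR uses.

```r
library(Matrix)

sequences <- c("AAAACFVV", "AAAAAA", "AAAAD")

# Count single characters (1-mers) per sequence with base R only.
counts <- lapply(strsplit(sequences, ""), table)
alphabet <- sort(unique(unlist(lapply(counts, names))))

# Triplets (row, column, value) for the non-zero entries.
i <- rep(seq_along(counts), lengths(counts))
j <- match(unlist(lapply(counts, names)), alphabet)
x <- as.numeric(unlist(counts))

# One row per input sequence, one column per distinct 1-mer.
toy <- sparseMatrix(
  i = i, j = j, x = x,
  dims = c(length(sequences), length(alphabet)),
  dimnames = list(NULL, alphabet)
)
print(toy)
```

Only the non-zero counts are stored, which is why a sparse representation saves memory when most k-mers are absent from most sequences.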

See Also

Function that counts k-mers of one type: count_kmers

Function that merges several k-mer matrices (rbind): rbind_columnwise

Examples

library(seqR)

# Process one sequence per step
batch_size <- 1

# Counting 1-mers
count_multimers(
  c("AAAACFVV", "AAAAAA", "AAAAD"),
  k_vector = c(1),
  batch_size = batch_size)

# Counting 1-mers and 2-mers
count_multimers(
  c("AAAACFVV", "AAAAAA", "AAAAD"),
  k_vector = c(1, 2),
  batch_size = batch_size)

# Counting 1-mers, 2-mers, and gapped 2-mers with the length of the gap = 1
count_multimers(
  c("AAAACFVV", "AAAAAA", "AAAAD"),
  k_vector = c(1, 2, 2),
  kmer_gaps_list = list(NULL, NULL, c(1)),
  batch_size = batch_size)

# Counting 3-mers, positional 3-mers, and positional gapped 2-mers with the length of the gap = 1
count_multimers(
  c("AAAACFVV", "AAAAAA", "AAAAD"),
  k_vector = c(3, 3, 2),
  kmer_gaps_list = list(NULL, NULL, c(1)),
  positional_vector = c(FALSE, TRUE, TRUE),
  batch_size = batch_size)
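
For readers unsure what a gapped 2-mer with a gap of length 1 is, the following base R sketch enumerates them by hand for one sequence. The helper gapped_2mers and the "X_Y" labels are hypothetical names introduced for this illustration; they are not part of seqR and do not reproduce its column naming.

```r
# Hypothetical helper (not part of seqR): count 2-mers whose two elements
# are separated by `gap` skipped positions.
gapped_2mers <- function(sequence, gap) {
  chars <- strsplit(sequence, "")[[1]]
  n <- length(chars)
  offset <- gap + 1
  # Sequences that are too short contain no such k-mer.
  if (n <= offset) return(table(character(0)))
  starts <- seq_len(n - offset)
  table(paste(chars[starts], chars[starts + offset], sep = "_"))
}

# Pairs positions (1,3), (2,4), and (3,5) of "AAAAD".
result <- gapped_2mers("AAAAD", gap = 1)
print(result)
```

With gap = 1, each 2-mer takes one element, skips the next, and takes the one after that.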

seqR documentation built on Oct. 6, 2021, 1:10 a.m.