generate_kmers: k-mer Counts for Sequence Set

View source: R/k-mer-based.R

generate_kmers    R Documentation

k-mer Counts for Sequence Set

Description

Counts occurrences of k-mers of length k in the given set of sequences. Corrects for homopolymeric stretches.
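To make the counting step concrete, the following is a minimal base R sketch of plain sliding-window k-mer counting. It is an illustration only: count_kmers_naive is a hypothetical helper that is not part of the package, and it does not apply the homopolymer correction that generate_kmers performs. It also reports only k-mers that actually occur, which may differ from the output of generate_kmers.

```r
# Hypothetical helper (not part of the package): naive sliding-window
# k-mer counting WITHOUT any homopolymer correction.
count_kmers_naive <- function(sequences, k) {
  kmers <- unlist(lapply(sequences, function(s) {
    n <- nchar(s)
    # sequences shorter than k contribute no k-mers
    if (n < k) return(character(0))
    starts <- seq_len(n - k + 1)
    substring(s, starts, starts + k - 1)
  }))
  counts <- table(kmers)
  # return a named numeric vector: names are k-mers, values are counts
  setNames(as.numeric(counts), names(counts))
}

# toy input with k = 3, chosen only to keep the result easy to check by hand
toy_counts <- count_kmers_naive(c("ACGTACGT", "ACG"), 3)
```

Here toy_counts holds four 3-mers (ACG, CGT, GTA, TAC), with ACG counted three times across the two sequences.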

Usage

generate_kmers(sequences, k)

Arguments

sequences

character vector of DNA or RNA sequences

k

length of k-mer, either 6 for hexamers or 7 for heptamers

Value

Returns a named numeric vector, where the elements are k-mer counts and the names are DNA k-mers.
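A named numeric vector of this shape can be queried with ordinary base R indexing. The sketch below uses a small made-up vector (example_counts, with invented values) in place of real output, purely to show the access patterns.

```r
# Made-up stand-in for a k-mer count vector; the values are invented
# for illustration and are not output of generate_kmers.
example_counts <- c(AAAAAA = 0, ACGTAC = 3, TTTTTT = 1, GATTAC = 5)

# look up the count of a single k-mer by name
gattac_count <- example_counts[["GATTAC"]]

# the two most frequent k-mers
top_two <- head(sort(example_counts, decreasing = TRUE), 2)

# names of all k-mers that occur at least once
observed <- names(example_counts)[example_counts > 0]
```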

Warning

generate_kmers always returns DNA k-mers, even if sequences contains RNA sequences. RNA sequences are internally converted to DNA sequences. Mixing DNA and RNA sequences in the same input is not allowed.
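The conversion described here can be pictured as replacing every U with T. The sketch below is one way to do that in base R, with a simple guard against mixed input. to_dna is a hypothetical helper, and both the conversion and the check are assumptions about the behavior described above, not the package's internal code.

```r
# Hypothetical helper (not part of the package): convert RNA to DNA and
# reject input that mixes the two alphabets.
to_dna <- function(sequences) {
  has_u <- any(grepl("U", sequences, fixed = TRUE))
  has_t <- any(grepl("T", sequences, fixed = TRUE))
  if (has_u && has_t) {
    stop("sequences mix DNA (T) and RNA (U)")
  }
  # replace every U with T; DNA input passes through unchanged
  chartr("U", "T", sequences)
}

converted <- to_dna(c("CAGUCAAGACUCC", "AUGGA"))
# converted is c("CAGTCAAGACTCC", "ATGGA")
```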

See Also

Other k-mer functions: calculate_kmer_enrichment(), check_kmers(), compute_kmer_enrichment(), count_homopolymer_corrected_kmers(), create_kmer_origin_list(), draw_volcano_plot(), estimate_significance(), estimate_significance_core(), generate_permuted_enrichments(), run_kmer_spma(), run_kmer_tsma()

Examples

# count hexamers in a set of RNA sequences
rna_sequences <- c(
  "CAACAGCCUUAAUU", "CAGUCAAGACUCC", "CUUUGGGGAAU",
  "UCAUUUUAUUAAA", "AAUUGGUGUCUGGAUACUUCCCUGUACAU",
  "AUCAAAUUA", "AGAU", "GACACUUAAAGAUCCU",
  "UAGCAUUAACUUAAUG", "AUGGA", "GAAGAGUGCUCA",
  "AUAGAC", "AGUUC", "CCAGUAA",
  "UUAUUUA", "AUCCUUUACA", "UUUUUUU", "UUUCAUCAUU",
  "CCACACAC", "CUCAUUGGAG", "ACUUUGGGACA", "CAGGUCAGCA"
)
hexamer_counts <- generate_kmers(rna_sequences, 6)


# count heptamers in a set of DNA sequences
dna_sequences <- c(
  "CAACAGCCTTAATT", "CAGTCAAGACTCC", "CTTTGGGGAAT",
  "TCATTTTATTAAA", "AATTGGTGTCTGGATACTTCCCTGTACAT",
  "ATCAAATTA", "AGAT", "GACACTTAAAGATCCT",
  "TAGCATTAACTTAATG", "ATGGA", "GAAGAGTGCTCA",
  "ATAGAC", "AGTTC", "CCAGTAA",
  "TTATTTA", "ATCCTTTACA", "TTTTTTT", "TTTCATCATT",
  "CCACACAC", "CTCATTGGAG", "ACTTTGGGACA", "CAGGTCAGCA"
)
heptamer_counts <- generate_kmers(dna_sequences, 7)

kkrismer/transite documentation built on July 13, 2024, 8:01 a.m.