| gseq.kmer | R Documentation |
Counts exact matches of a k-mer in DNA sequences over a specified region of interest
(ROI). The ROI is defined by start_pos and end_pos (1-based, inclusive),
with optional extension controlled by extend.
gseq.kmer(
seqs,
kmer,
mode = c("count", "frac"),
strand = 0L,
start_pos = NULL,
end_pos = NULL,
extend = FALSE,
skip_gaps = TRUE,
gap_chars = c("-", ".")
)
seqs |
character vector of DNA sequences (A/C/G/T/N; case-insensitive) |
kmer |
single character string containing the k-mer to search for (A/C/G/T only) |
mode |
character; one of "count" or "frac" |
strand |
integer; 1=forward, -1=reverse, 0=both strands (default: 0) |
start_pos |
integer or NULL; 1-based inclusive start of ROI (default: 1) |
end_pos |
integer or NULL; 1-based inclusive end of ROI (default: sequence length) |
extend |
logical or integer; extension of allowed window starts (default: FALSE) |
skip_gaps |
logical; if TRUE, treat gap characters as holes and skip them while scanning. Windows are k consecutive non-gap bases (default: TRUE) |
gap_chars |
character vector; which characters count as gaps (default: c("-", ".")) |
This function counts k-mer occurrences in DNA sequences directly without requiring
a genomics database. For detailed documentation on k-mer counting parameters, see
gvtrack.create (functions "kmer.count" and "kmer.frac").
The ROI (region of interest) is defined by start_pos and end_pos.
The extend parameter controls whether k-mer matches can extend beyond the ROI boundaries.
For palindromic k-mers, use strand=1 or -1 to avoid double counting.
When skip_gaps=TRUE, characters specified in gap_chars are treated as gaps.
Windows are defined as k consecutive non-gap bases. The frac denominator counts the
number of possible logical starts (non-gap windows) in the region. start_pos and
end_pos are interpreted as physical coordinates on the full sequence.
Numeric vector with counts (for "count" mode) or fractions (for "frac" mode). Returns 0 when sequence is too short or ROI is invalid.
gvtrack.create for detailed k-mer parameter documentation
## Not run:
# Example sequences
seqs <- c("CGCGCGCGCG", "ATATATATAT", "ACGTACGTACGT")
# Count CG dinucleotides on both strands
gseq.kmer(seqs, "CG", mode = "count", strand = 0)
# Count on forward strand only
gseq.kmer(seqs, "CG", mode = "count", strand = 1)
# Get CG fraction
gseq.kmer(seqs, "CG", mode = "frac", strand = 0)
# Count in a specific region
gseq.kmer(seqs, "CG", mode = "count", start_pos = 2, end_pos = 8)
# Allow k-mer to extend beyond ROI boundaries
gseq.kmer(seqs, "CG", mode = "count", start_pos = 2, end_pos = 8, extend = TRUE)
# Calculate GC content by summing G and C fractions
g_frac <- gseq.kmer(seqs, "G", mode = "frac", strand = 1)
c_frac <- gseq.kmer(seqs, "C", mode = "frac", strand = 1)
gc_content <- g_frac + c_frac
gc_content
# Compare AT counts on different strands
at_forward <- gseq.kmer(seqs, "AT", mode = "count", strand = 1)
at_reverse <- gseq.kmer(seqs, "AT", mode = "count", strand = -1)
at_both <- gseq.kmer(seqs, "AT", mode = "count", strand = 0)
data.frame(forward = at_forward, reverse = at_reverse, both = at_both)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.