utils-sequence: Sequence-related utility functions.

utils-sequenceR Documentation

Sequence-related utility functions.

Description

Sequence-related utility functions.

Usage

calc_complexity(string, complexity.method = c("WoottonFederhen",
  "WoottonFederhenFast", "Trifonov", "TrifonovFast", "DUST"), alph = NULL,
  trifonov.max.word.size = 7)

calc_windows(n, window = 1, overlap = 0, return.incomp = TRUE)

count_klets(string, k = 1, alph)

get_klets(lets, k = 1)

mask_ranges(seqs, ranges, letter = "-")

mask_seqs(seqs, pattern, RC = FALSE, letter = "-")

meme_alph(core, file = stdout(), complements = NULL, ambiguity = NULL,
  like = NULL, alph.name = NULL, letter.names = NULL, colours = NULL)

shuffle_string(string, k = 1, method = c("euler", "linear", "markov"),
  rng.seed = sample.int(10000, 1))

slide_fun(string, FUN, FUN.VALUE, window = 1, overlap = 0,
  return.incomp = TRUE)

window_string(string, window = 1, overlap = 0, return.incomp = TRUE,
  nthreads = 1)

Arguments

string

character(1) A character vector containing a single string, with the exception of calc_complexity() where string can be a length greater than one.

complexity.method

character(1) Complexity algorithm. See sequence_complexity().

alph

character(1) A single character string with the desired sequence alphabet. If missing, finds the unique letters within each string.

trifonov.max.word.size

integer(1) Maximum word size for use in the Trifonov complexity methods. See sequence_complexity().

n

integer(1) Total size from which to calculate sliding windows.

window

integer(1) Window size to slide along.

overlap

integer(1) Overlap size between windows.

return.incomp

logical(1) Whether to return the last window if it is smaller then the requested window size.

k

integer(1) K-let size.

lets

character A character vector where each element will be considered a single unit.

seqs

XStringSet Sequences to mask. Cannot be BStringSet.

ranges

GRanges The ranges to mask. Must be a GRanges object from the GenomicRanges package.

letter

character(1) Character to use for masking.

pattern

character(1) Pattern to mask.

RC

logical(1) Whether to mask the reverse complement of the pattern.

core

character(1) Core alphabet symbols. If complements are also provided, then only half of the letters should be provided to this argument.

file

Output file.

complements

character(1), NULL Complementary letters to the core symbols.

ambiguity

character(1), NULL A named vector providing ambiguity codes for the custom alphabet.

like

character(1), NULL How to classify the custom alphabet. If not NULL, then one of c("DNA", "RNA", "PROTEIN").

alph.name

character(1), NULL Custom alphabet name.

letter.names

character, NULL Named vector of core symbol names.

colours

character, NULL Named vector of core symbol colours. MEME requires hex colours.

method

character(1) Shuffling method. One of c("euler", "linear", "markov"). See shuffle_sequences().

rng.seed

numeric(1) Set random number generator seed. Since shuffling in shuffle_sequences() can occur simultaneously in multiple threads using C++, it cannot communicate with the regular R random number generator state and thus requires an independent seed. Since shuffle_string() uses the same underlying code as shuffle_sequences(), it also requires a separate seed even if it is run in serial.

FUN

closure The function to apply per window. (See ?vapply.)

FUN.VALUE

The expected return type for FUN. (See ?vapply.)

nthreads

integer(1) Number of threads to use. Zero uses all available threads.

Value

For calc_complexity(): A vector of numeric values.

For calc_windows(): A data.frame with columns start and stop.

For count_klets(): A data.frame with columns lets and counts.

For get_klets(): A character vector of k-lets.

For mask_ranges(): The masked XStringSet object.

For mask_seqs(): The masked XStringSet object.

For meme_alph(): NULL, invisibly.

For shuffle_string(): A single character string.

For slide_fun(): A vector with type FUN.VALUE.

For window_string(): A character vector.

Author(s)

Benjamin Jean-Marie Tremblay, benjamin.tremblay@uwaterloo.ca

See Also

create_sequences(), get_bkg(), sequence_complexity(), shuffle_sequences()

Examples

#######################################################################
## calc_complexity
## Calculate complexity for abitrary strings
calc_complexity("GTGCCCCGCGGGAACCCCGC", c = "WoottonFederhen")
calc_complexity("GTGCCCCGCGGGAACCCCGC", c = "WoottonFederhenFast")
calc_complexity("GTGCCCCGCGGGAACCCCGC", c = "Trifonov")
calc_complexity("GTGCCCCGCGGGAACCCCGC", c = "TrifonovFast")
calc_complexity("GTGCCCCGCGGGAACCCCGC", c = "DUST")

#######################################################################
## calc_windows
## Calculate window coordinates for any value 'n'.
calc_windows(100, 10, 5)

#######################################################################
## count_klets
## Count k-lets for any string of characters
count_klets("GCAAATGTACGCAGGGCCGA", k = 2)
## The default 'k' value (1) counts individual letters
count_klets("GCAAATGTACGCAGGGCCGA")

#######################################################################
## get_klets
## Generate all possible k-lets for a set of characters
get_klets(c("A", "C", "G", "T"), 3)
## Note that each element in 'lets' is considered a single unit;
## see:
get_klets(c("AA", "B"), k = 2)

#######################################################################
## mask_ranges
## Mask arbitrary ranges
if (requireNamespace("GenomicRanges", quiet = TRUE)) {
ranges <- GenomicRanges::GRanges("A", IRanges::IRanges(1, 5))
seq <- Biostrings::DNAStringSet(c(A = "ATGACTGATTACTTATA"))
mask_ranges(seq, ranges, "-")
}

#######################################################################
## mask_seqs
## Mask repetitive seqeuences
data(ArabidopsisPromoters)
mask_seqs(ArabidopsisPromoters, "AAAAAA")

#######################################################################
## meme_alph
## Create MEME custom alphabet definition files
meme_alph("ACm", complements = "TGM", alph.name = "MethDNA",
  letter.names = c(A = "Adenine", C = "Cytosine", G = "Guanine",
    T = "Thymine", m = "Methylcytosine", M = "mC:Guanine"),
  like = "DNA", ambiguity = c(N = "ACGTmM"))

#######################################################################
## shuffle_string
## Shuffle any string of characters
shuffle_string("ASDADASDASDASD", k = 1)

#######################################################################
## slide_fun
## Apply a function to a character vector along sliding windows
FUN <- function(x) grepl("[GC]", x)
data.frame(
  Window = window_string("ATGCATCTATGCA", 2, 1),
  HasGC = slide_fun("ATGCATCTATGCA", FUN, logical(1), 2, 1)
)

#######################################################################
## window_string
## Get sliding windows for a string of characters
window_string("ABCDEFGHIJ", 2, 1)


bjmt/universalmotif documentation built on Nov. 16, 2024, 7:38 a.m.