get_start_ind | R Documentation |
Helper function for data generators. Computes the start positions in a sequence from which samples can be extracted, given maxlen, step size and ambiguous nucleotide constraints.
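As a rough illustration of the basic idea for a single sequence with no ambiguous characters, the candidate start positions can be pictured as every step-th position at which a full window of maxlen characters still fits. The function name start_positions_single is hypothetical, and the sketch ignores train_mode entirely, so it is not the package's implementation.

```r
# Hypothetical helper (illustration only): start positions for one sequence of
# length seq_len, taking every `step`-th position where a window of `maxlen`
# characters still fits inside the sequence. train_mode is not modelled here.
start_positions_single <- function(seq_len, maxlen, step) {
  last_start <- seq_len - maxlen + 1  # last position where a full window fits
  if (last_start < 1) {
    return(integer(0))  # sequence shorter than maxlen: no sample possible
  }
  seq.int(1L, as.integer(last_start), by = as.integer(step))
}

start_positions_single(seq_len = 15, maxlen = 4, step = 2)
# returns 1 3 5 7 9 11
```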
get_start_ind(
seq_vector,
length_vector,
maxlen,
step,
train_mode = "label",
discard_amb_nuc = FALSE,
vocabulary = c("A", "C", "G", "T")
)
seq_vector |
Vector of character sequences. |
length_vector |
Length of sequences in seq_vector. |
maxlen |
Length of one predictor sequence. |
step |
Distance between samples from one entry in seq_vector. |
train_mode |
Either |
discard_amb_nuc |
Whether to discard all samples that contain characters outside vocabulary. |
vocabulary |
Vector of allowed characters. Characters outside vocabulary get encoded as specified in |
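The interplay of seq_vector, length_vector, step and discard_amb_nuc can be sketched in base R as below. This sketch rests on assumptions not stated on this page: positions index into the concatenation of all entries of seq_vector, a window never crosses the boundary between two entries, and with discard_amb_nuc = TRUE a candidate window is simply dropped when it contains a character outside vocabulary. The function sketch_start_ind is hypothetical; the real function may, for example, restart the step grid after an ambiguous region, so its output can differ.

```r
# Hypothetical re-implementation for illustration only; not the package's code.
sketch_start_ind <- function(seq_vector, length_vector, maxlen, step,
                             discard_amb_nuc = FALSE,
                             vocabulary = c("A", "C", "G", "T")) {
  # Assumption: positions refer to paste0(seq_vector, collapse = "")
  offsets <- c(0, cumsum(length_vector))[seq_along(length_vector)]
  starts <- integer(0)
  for (i in seq_along(length_vector)) {
    last_start <- length_vector[i] - maxlen + 1  # window must stay inside entry i
    if (last_start < 1) next                     # entry too short for one sample
    local <- seq.int(1L, as.integer(last_start), by = as.integer(step))
    starts <- c(starts, as.integer(offsets[i]) + local)
  }
  if (discard_amb_nuc && length(starts) > 0) {
    chars <- strsplit(paste0(seq_vector, collapse = ""), "")[[1]]
    is_amb <- !(chars %in% vocabulary)
    # Keep a start only if its window of maxlen characters has no ambiguous character
    keep <- vapply(starts, function(s) !any(is_amb[s:(s + maxlen - 1)]), logical(1))
    starts <- starts[keep]
  }
  starts
}

sketch_start_ind("AAACCCNNNGGGTTT", length_vector = 15,
                 maxlen = 4, step = 2, discard_amb_nuc = TRUE)
# returns 1 3 11 under these assumptions
```

Filtering after generating the grid keeps the sketch short; an implementation that instead splits the sequence at ambiguous characters and steps through each clean stretch would yield different positions.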
A numeric vector of start positions.
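# A single sequence with an ambiguous stretch ("NNN") in the middle
seq_vector <- c("AAACCCNNNGGGTTT")
get_start_ind(
  seq_vector = seq_vector,
  length_vector = nchar(seq_vector),
  maxlen = 4,
  step = 2,
  train_mode = "label",
  discard_amb_nuc = TRUE,  # discard samples containing characters outside vocabulary
  vocabulary = c("A", "C", "G", "T")
)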
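To see which candidate windows of this example contain an ambiguous character, the windows on a plain step-2 grid can be listed with base R alone. This only inspects the input; exactly which start positions get_start_ind returns with discard_amb_nuc = TRUE depends on how it handles ambiguous regions, which this page does not specify.

```r
# List each candidate window of the example (maxlen = 4, step = 2) and flag
# those that contain a character outside the vocabulary.
seq_string <- "AAACCCNNNGGGTTT"
maxlen <- 4
step <- 2
vocabulary <- c("A", "C", "G", "T")

starts <- seq(1, nchar(seq_string) - maxlen + 1, by = step)
windows <- substring(seq_string, starts, starts + maxlen - 1)
has_amb <- vapply(strsplit(windows, ""),
                  function(ch) any(!(ch %in% vocabulary)), logical(1))
data.frame(start = starts, window = windows, has_amb = has_amb)
```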