seq_encoding_label | R Documentation |
Returns encoding for integer or character sequence.
seq_encoding_label(
sequence = NULL,
maxlen,
vocabulary,
start_ind,
ambiguous_nuc = "zero",
nuc_dist = NULL,
quality_vector = NULL,
use_coverage = FALSE,
max_cov = NULL,
cov_vector = NULL,
n_gram = NULL,
n_gram_stride = 1,
masked_lm = NULL,
char_sequence = NULL,
tokenizer = NULL,
adjust_start_ind = FALSE,
return_int = FALSE
)
sequence |
Sequence of integers. |
maxlen |
Length of predictor sequence. |
vocabulary |
Vector of allowed characters. Characters outside vocabulary get encoded as specified in ambiguous_nuc. |
start_ind |
Start positions of samples in |
ambiguous_nuc |
How to handle nucleotides outside vocabulary, either |
nuc_dist |
Nucleotide distribution. |
quality_vector |
Vector of quality probabilities. |
use_coverage |
Integer or |
max_cov |
Biggest coverage value. Only applies if |
cov_vector |
Vector of coverage values associated with the input. |
n_gram |
Integer. Encode the target not nucleotide-wise but by combining n nucleotides at once. For example for
n_gram_stride |
Step size for n-gram encoding. For AACCGGTT with |
masked_lm |
If not
|
char_sequence |
A character string. |
tokenizer |
A keras tokenizer. |
adjust_start_ind |
Whether to shift values in |
return_int |
Whether to return integer encoding or one-hot encoding. |
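The interaction of n_gram and n_gram_stride can be pictured as a window of width n moved along the sequence in fixed steps. The sketch below is only an illustration in base R: ngram_windows is a hypothetical helper, not a function of this package, and the values n = 4 with strides 4 and 2 are assumptions chosen for the example. It lists the n-grams a window would pick out and says nothing about how the package encodes them afterwards.

```r
# Hypothetical helper, not part of the package: list the n-grams that a
# window of width `n`, moved in steps of `stride`, picks out of a string.
ngram_windows <- function(s, n, stride = 1) {
  stopifnot(nchar(s) >= n, n >= 1, stride >= 1)
  # Start position of every window that still fits inside the string.
  starts <- seq(1, nchar(s) - n + 1, by = stride)
  substring(s, starts, starts + n - 1)
}

# Non-overlapping 4-grams (stride equal to n).
ngram_windows("AACCGGTT", n = 4, stride = 4)

# Overlapping 4-grams (stride smaller than n).
ngram_windows("AACCGGTT", n = 4, stride = 2)
```

With a stride equal to n the windows do not overlap; with a smaller stride, consecutive n-grams share nucleotides.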
A list of 2 tensors.
# use integer sequence as input
x <- seq_encoding_label(sequence = c(1,0,5,1,3,4,3,1,4,1,2),
maxlen = 5,
vocabulary = c("a", "c", "g", "t"),
start_ind = c(1,3),
ambiguous_nuc = "equal")
x[1,,] # 1,0,5,1,3
x[2,,] # 5,1,3,4,3
# use character string as input
x <- seq_encoding_label(maxlen = 5,
vocabulary = c("a", "c", "g", "t"),
start_ind = c(1,3),
ambiguous_nuc = "equal",
char_sequence = "ACTaaTNTNaZ")
x[1,,] # actaa
x[2,,] # taatn
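To show what the slices x[1,,] and x[2,,] in the examples above roughly correspond to, here is a minimal base-R sketch of windowed one-hot encoding. It is not the package's implementation: one_hot_windows is a hypothetical helper, and it assumes that the integers 1 to 4 index the vocabulary and that every other value (such as 0 or 5) counts as ambiguous, which may differ from how the package treats padding and unknown tokens.

```r
# Hypothetical sketch, not the package's code: cut windows of length `maxlen`
# out of an integer sequence and one-hot encode each position.
one_hot_windows <- function(int_seq, maxlen, vocab_size, start_ind,
                            ambiguous = c("zero", "equal")) {
  ambiguous <- match.arg(ambiguous)
  # Every window must fit inside the sequence.
  stopifnot(all(start_ind >= 1),
            all(start_ind + maxlen - 1 <= length(int_seq)))
  # Result has dimensions (samples, maxlen, vocabulary size).
  out <- array(0, dim = c(length(start_ind), maxlen, vocab_size))
  for (i in seq_along(start_ind)) {
    window <- int_seq[start_ind[i]:(start_ind[i] + maxlen - 1)]
    for (j in seq_len(maxlen)) {
      token <- window[j]
      if (token >= 1 && token <= vocab_size) {
        out[i, j, token] <- 1              # regular one-hot row
      } else if (ambiguous == "equal") {
        out[i, j, ] <- 1 / vocab_size      # spread weight evenly
      }                                    # "zero": row stays all zeros
    }
  }
  out
}

x <- one_hot_windows(c(1,0,5,1,3,4,3,1,4,1,2), maxlen = 5, vocab_size = 4,
                     start_ind = c(1,3), ambiguous = "equal")
dim(x)    # samples x maxlen x vocabulary size
x[1,,]    # rows for the window 1,0,5,1,3
```

Under these assumptions, each row of a slice is either a standard one-hot vector or, for an out-of-vocabulary value with ambiguous = "equal", a row with every entry set to 1/4.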