View source: R/run_classifier.R
truncate_seq_pair | R Documentation |
Truncates a sequence pair so that its total length does not exceed the maximum. This is a simple heuristic that always truncates the longer sequence, one token at a time (or the first sequence in case of a tie -JDB). This makes more sense than truncating an equal percentage of tokens from each: if one sequence is very short, each token truncated from it likely carries more information than a token from the longer sequence.
truncate_seq_pair(tokens_a, tokens_b, max_length)
tokens_a | Character; a vector of tokens in the first input sequence.
tokens_b | Character; a vector of tokens in the second input sequence.
max_length | Integer; the maximum total length of the two sequences.
The Python code truncated the sequences in place, relying on Python lists being mutable and passed by reference. In R, a function cannot modify its arguments in place, so we return the truncated sequences in a list instead.
A list containing two character vectors: trunc_a and trunc_b.
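The heuristic described above can be sketched in a few lines of R. This is an illustrative re-implementation, not the package source: the name truncate_seq_pair_sketch is hypothetical, the tie rule follows the description on this page (the first sequence is shortened on a tie), and dropping tokens from the end of a sequence is an assumption, since this page does not say which end is trimmed.

```r
# Hypothetical sketch of the truncation heuristic; not the package's own code.
truncate_seq_pair_sketch <- function(tokens_a, tokens_b, max_length) {
  # Guard against a negative limit, which could never be satisfied.
  stopifnot(max_length >= 0)
  # Drop one token at a time until the combined length fits.
  while (length(tokens_a) + length(tokens_b) > max_length) {
    if (length(tokens_a) >= length(tokens_b)) {
      # tokens_a is longer, or the lengths are tied (tie rule as documented).
      # Assumption: tokens are removed from the end of the sequence.
      tokens_a <- head(tokens_a, -1)
    } else {
      tokens_b <- head(tokens_b, -1)
    }
  }
  # R does not modify the caller's vectors, so return both results.
  list(trunc_a = tokens_a, trunc_b = tokens_b)
}

res <- truncate_seq_pair_sketch(c("a", "b", "c", "d"),
                                c("w", "x", "y", "z"),
                                5)
```

Under these assumptions, res$trunc_a holds two tokens and res$trunc_b holds three. The real function's behavior on ties and its choice of which end to trim should be checked against R/run_classifier.R.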
## Not run:
tokens_a <- c("a", "b", "c", "d")
tokens_b <- c("w", "x", "y", "z")
truncate_seq_pair(tokens_a, tokens_b, 5)
## End(Not run)