truncate_seq_pair: Truncate a sequence pair to the maximum length.

View source: R/run_classifier.R

truncate_seq_pair    R Documentation

Truncate a sequence pair to the maximum length.

Description

Truncates a sequence pair so that its combined length does not exceed the maximum length. This is a simple heuristic that always truncates the longer sequence one token at a time (or the first sequence in case of a tie -JDB). This makes more sense than truncating an equal percentage of tokens from each sequence: if one sequence is very short, each token truncated from it likely carries more information than a token truncated from the longer sequence.
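The heuristic described above can be sketched in a few lines of R. This is an illustrative re-implementation, not the RBERT source: the name truncate_pair_sketch is hypothetical, the tie rule (drop from the first sequence) is taken from the description above, and the guard against a negative max_length is an added assumption.

```r
# Hypothetical sketch of the truncation heuristic; not the RBERT source.
truncate_pair_sketch <- function(tokens_a, tokens_b, max_length) {
  # Assumption: a negative budget is an error (it could never be satisfied).
  stopifnot(max_length >= 0)
  # Drop one trailing token at a time until the pair fits the budget.
  while (length(tokens_a) + length(tokens_b) > max_length) {
    if (length(tokens_a) >= length(tokens_b)) {
      # First sequence is longer, or the lengths are tied.
      tokens_a <- utils::head(tokens_a, -1)
    } else {
      tokens_b <- utils::head(tokens_b, -1)
    }
  }
  list(trunc_a = tokens_a, trunc_b = tokens_b)
}

res <- truncate_pair_sketch(c("a", "b", "c", "d"), c("w", "x", "y", "z"), 5)
res$trunc_a  # "a" "b"
res$trunc_b  # "w" "x" "y"
```

With two sequences of four tokens and a budget of five, the sketch removes tokens in the order a, b, a, which is why the first sequence ends up shorter. A different tie rule would give a different split of the same total.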

Usage

truncate_seq_pair(tokens_a, tokens_b, max_length)

Arguments

tokens_a

Character; a vector of tokens in the first input sequence.

tokens_b

Character; a vector of tokens in the second input sequence.

max_length

Integer; the maximum total length of the two sequences.

Details

The original Python code truncated the sequences in place, relying on Python's pass-by-reference behavior for lists. In R, the truncated sequences are instead returned in a list.
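A minimal sketch of why an R version has to return its result: a vector modified inside a function is changed only in the function's local copy, so the caller's vector is untouched. The helper drop_last is hypothetical and exists only for this illustration.

```r
# Hypothetical helper: removes the last element of its argument.
drop_last <- function(x) {
  x <- utils::head(x, -1)  # modifies only the local copy of x
  x
}

tokens <- c("a", "b", "c")
shorter <- drop_last(tokens)

length(tokens)   # 3: the caller's vector is unchanged
length(shorter)  # 2: the truncated result must be captured from the return value
```

This is why a caller needs to assign the returned list and read trunc_a and trunc_b from it, rather than expecting tokens_a and tokens_b to have been shortened.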

Value

A list containing two character vectors: trunc_a and trunc_b.

Examples

## Not run: 
tokens_a <- c("a", "b", "c", "d")
tokens_b <- c("w", "x", "y", "z")
truncate_seq_pair(tokens_a, tokens_b, 5)

## End(Not run)

jonathanbratt/RBERT documentation built on Jan. 26, 2023, 4:15 p.m.