Term Indices: Convert text to integer indices
tix_seq(corpus, vocab, keep_unknown = nbuckets > 0,
        nbuckets = attr(vocab, "nbuckets"), reverse = FALSE)
tix_df(corpus, vocab, keep_unknown = nbuckets > 0,
       nbuckets = attr(vocab, "nbuckets"), reverse = FALSE,
       as_factor = FALSE)
tix_mat(corpus, vocab, maxlen = 100, pad_right = TRUE,
        trunc_right = TRUE, keep_unknown = nbuckets > 0,
        nbuckets = attr(vocab, "nbuckets"), reverse = FALSE)
corpus |
text corpus; see |
vocab |
data frame produced by |
keep_unknown |
logical. If |
nbuckets |
integer. How many buckets to hash unknowns into. |
reverse |
logical. Should each sequence be reversed in the final
output? Reversal happens after |
as_factor |
if TRUE, the returned index column will be a factor instead
of an integer vector. Will throw an error when |
maxlen |
integer. Maximum length of each sequence. |
pad_right |
logical. Should 0-padding of shorter than |
trunc_right |
logical. Should truncation of longer than |
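The interplay of maxlen, pad_right, and trunc_right can be illustrated with a minimal base R sketch. The helper pad_or_trunc() below is hypothetical and not part of the package. It assumes that padding uses 0 and that "right" means the end of a sequence, which is a reading of the argument names rather than documented behavior.

```r
# Hypothetical helper, not part of the package: fit an integer vector to
# exactly `maxlen` elements by truncating or 0-padding it.
pad_or_trunc <- function(x, maxlen, pad_right = TRUE, trunc_right = TRUE) {
  n <- length(x)
  if (n > maxlen) {
    # Assumption: trunc_right = TRUE drops elements from the end,
    # trunc_right = FALSE drops them from the start.
    if (trunc_right) x[seq_len(maxlen)] else x[seq(n - maxlen + 1L, n)]
  } else {
    pad <- integer(maxlen - n)  # zero-filled integer vector (may be empty)
    # Assumption: pad_right = TRUE appends zeros, FALSE prepends them.
    if (pad_right) c(x, pad) else c(pad, x)
  }
}

pad_or_trunc(1:3, maxlen = 5)                       # 1 2 3 0 0
pad_or_trunc(1:3, maxlen = 5, pad_right = FALSE)    # 0 0 1 2 3
pad_or_trunc(1:7, maxlen = 5)                       # 1 2 3 4 5
pad_or_trunc(1:7, maxlen = 5, trunc_right = FALSE)  # 3 4 5 6 7
```

Under these assumptions, left padding combined with left truncation keeps the most recent tokens at the end of each row, which is a common choice for sequence models.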
tix_seq() returns a list of integer vectors; tix_df() produces a flat index data.frame() with two columns; tix_mat() returns an integer matrix, one row per sequence.
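To make the three return shapes concrete, the following base R sketch builds objects of each kind by hand. It does not call the package: the index values are made up, and the column names doc and ix are hypothetical placeholders, since the actual column names of the tix_df() result are not stated here.

```r
# Illustrative only: objects built by hand, not produced by the package.

# Shape of a list of integer vectors (one vector per document).
seqs <- list(a = c(3L, 1L, 2L), b = c(2L, 1L))

# Flat two-column data frame; `doc` and `ix` are hypothetical column names.
flat <- data.frame(doc = rep(names(seqs), lengths(seqs)),
                   ix  = unlist(seqs, use.names = FALSE))

# Integer matrix, one row per sequence, 0-padded on the right to 4 columns.
mat <- t(vapply(seqs, function(x) c(x, integer(4L - length(x))), integer(4L)))

str(seqs)
flat
mat
```

The list form preserves variable sequence lengths, the flat data frame suits grouped or join-style operations, and the matrix gives a fixed-width input at the cost of padding or truncation.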
corpus <- list(a = c("The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"),
               b = c("the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog",
                     "the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"))
v <- vocab(corpus["b"]) # "The" is unknown
v
tix_seq(corpus, v)
tix_seq(corpus, v, keep_unknown = TRUE)
tix_seq(corpus, v, nbuckets = 1)
tix_seq(corpus, v, nbuckets = 3)
tix_mat(corpus, v, maxlen = 12)
tix_mat(corpus, v, maxlen = 12, keep_unknown = TRUE)
tix_mat(corpus, v, maxlen = 12, nbuckets = 1)
tix_mat(corpus, v, maxlen = 12, nbuckets = 1, reverse = TRUE)
tix_mat(corpus, v, maxlen = 12, pad_right = FALSE, nbuckets = 1)
tix_mat(corpus, v, maxlen = 12, trunc_right = FALSE, nbuckets = 1)