unnest_words_keep_context: Tokenized by word but produce a segment column to keep the...
In lvaudor/textoteR: Format text corpora

unnest_words_keep_context

R Documentation

Tokenize by word while producing a segment column that keeps the context of the words.

unnest_words_keep_context(data, input, nwords = 20)

`data`	data to tokenize (as provided to tidytext::unnest_tokens)
`input`	the column of the data that contains the text to be tokenized
`nwords`	the number of words that constitute a segment

The tokenized data, with an added segment column that gives the context of each word.

corpus_data <- iramuteq_to_rtibble(from_dir = "data-raw", filename = "alltexts_iramuteq.txt")
unnest_words_keep_context(corpus_data, input = "text", nwords = 15)
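
To picture what grouping words into segments of `nwords` can look like, here is a minimal base R sketch. It is not the package's implementation: the toy data, the `doc` column, and the regex tokenizer (standing in for `tidytext::unnest_tokens`) are assumptions made for illustration. Whether the real segment column holds an index or the segment text is not stated here, so the sketch uses an index.

```r
# Hypothetical toy corpus; column names are illustrative only.
toy <- data.frame(
  doc  = c("a", "b"),
  text = c("The quick brown fox jumps over the lazy dog",
           "Hello world again"),
  stringsAsFactors = FALSE
)

nwords <- 4  # number of words per segment

# Crude word tokenization: lowercase, then split on non-alphanumeric runs.
words_by_doc <- strsplit(tolower(toy$text), "[^[:alnum:]]+")

# One row per word, with a segment index that restarts for each document.
toy_words <- do.call(rbind, lapply(seq_along(words_by_doc), function(i) {
  w <- words_by_doc[[i]]
  w <- w[nzchar(w)]  # drop empty tokens produced by leading separators
  data.frame(
    doc     = toy$doc[i],
    word    = w,
    segment = (seq_along(w) - 1) %/% nwords + 1,
    stringsAsFactors = FALSE
  )
}))

print(toy_words)
```

With `nwords = 4`, the nine words of document "a" fall into segments 1, 1, 1, 1, 2, 2, 2, 2, 3, so words that sit next to each other in the text share a segment value.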

lvaudor/textoteR documentation built on April 5, 2025, 3:03 a.m.
