unnest_words_keep_context: Tokenized by word but produce a segment column to keep the...

View source: R/unnest_words_keep_context.R

unnest_words_keep_context    R Documentation

Tokenize by word while producing a segment column that keeps the context of the words

Description

Tokenizes text by word and adds a segment column, so that each word remains associated with the context it appeared in.

Usage

unnest_words_keep_context(data, input, nwords = 20)

Arguments

data

the data to tokenize: a data frame or tibble, as provided to tidytext::unnest_tokens

input

the column of data that contains the text to be tokenized (given as a string in the example, e.g. "text")

nwords

the number of words that constitute a segment (defaults to 20)

Value

The tokenized data, with a segment column that gives the context of each word.
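
To make the idea of a segment column concrete, here is a minimal base-R sketch. It is not the package's implementation: the helper words_with_segments is hypothetical, and it assumes that a segment is a consecutive, non-overlapping run of nwords words, which may differ from how unnest_words_keep_context actually builds its segments.

```r
# Hypothetical helper, not part of textoteR: illustrates one way a
# per-word table can carry a "segment" column holding the word's context.
words_with_segments <- function(text, nwords = 20) {
  # Lowercase the text and split it on runs of non-alphanumeric characters
  words <- strsplit(tolower(text), "[^[:alnum:]]+")[[1]]
  words <- words[nzchar(words)]

  # Assumption: consecutive blocks of `nwords` words form one segment
  segment_id <- ceiling(seq_along(words) / nwords)

  # Rebuild the text of each segment, then attach it to every word in it
  segment_text <- vapply(split(words, segment_id), paste,
                         character(1), collapse = " ")

  data.frame(
    word = words,
    segment_id = segment_id,
    segment = unname(segment_text[as.character(segment_id)]),
    stringsAsFactors = FALSE
  )
}

toy <- words_with_segments("The quick brown fox jumps over the lazy dog",
                           nwords = 4)
print(toy)
```

Each row holds one word, and the segment column repeats the text of the block that word belongs to, so analyses done at the word level can still be traced back to the surrounding text.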

Examples

# Read an IRaMuTeQ-formatted corpus, then tokenize it by word using 15-word segments
corpus_data <- iramuteq_to_rtibble(from_dir = "data-raw", filename = "alltexts_iramuteq.txt")
unnest_words_keep_context(corpus_data, input = "text", nwords = 15)

lvaudor/textoteR documentation built on April 5, 2025, 3:03 a.m.