tokens_chunk: Segment tokens object by chunks of a given size
In koheiw/quanteda.core: Quantitative Analysis of Textual Data


Segment a tokens object into new documents of equal token length, optionally with overlap between consecutive chunks.

tokens_chunk(x, size, overlap = 0, use_docvars = TRUE)

`x`	tokens object whose token elements will be segmented into chunks
`size`	integer; the token length of the chunks
`overlap`	integer; the number of tokens at the start of each chunk that are repeated from the end of the preceding chunk
`use_docvars`	if `TRUE`, repeat the docvar values for each chunk; if `FALSE`, drop the docvars in the chunked tokens
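
How `size` and `overlap` interact can be sketched in base R on a plain character vector. This is only an illustration: `chunk_vector()` is a hypothetical helper, not part of the package, and it assumes that each chunk starts `size - overlap` tokens after the previous one and that trailing chunks may be shorter than `size`. The package's own handling of the final chunks may differ.

```r
# Hypothetical helper, not part of the package: split a character vector
# into chunks of `size` elements, repeating `overlap` elements between
# consecutive chunks.
chunk_vector <- function(x, size, overlap = 0) {
  stopifnot(size >= 1, overlap >= 0, overlap < size)
  if (length(x) == 0) return(list())
  step <- size - overlap                  # distance between chunk starts (assumption)
  starts <- seq(1, length(x), by = step)
  # each chunk runs from its start to start + size - 1, clipped at the end of x
  lapply(starts, function(i) x[i:min(i + size - 1, length(x))])
}

words <- c("a", "b", "c", "d", "e", "f", "g")
chunk_vector(words, size = 3)               # chunks start at positions 1, 4, 7
chunk_vector(words, size = 3, overlap = 1)  # chunks start at positions 1, 3, 5, 7
```

With `overlap = 1`, the last element of one chunk is also the first element of the next, which is the behaviour the `overlap` argument describes.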

A tokens object whose documents have been split into chunks of length `size`.

tokens_segment()

library(quanteda.core)  # the package this page documents

txts <- c(doc1 = "Fellow citizens, I am again called upon by the voice of
                  my country to execute the functions of its Chief Magistrate.",
          doc2 = "When the occasion proper for it shall arrive, I shall
                  endeavor to express the high sense I entertain of this
                  distinguished honor.")
toks <- tokens(txts)

# non-overlapping chunks of five tokens
tokens_chunk(toks, size = 5)
# chunks of five tokens, each repeating four tokens from the preceding chunk
tokens_chunk(toks, size = 5, overlap = 4)