generate_corpus: Generate text corpus


View source: R/generate_corpus.R

Description

Generates and tokenizes a text corpus.

Usage

generate_corpus(text, type, sub_token, label = NULL, clean = TRUE)

Arguments

text

A vector of character strings.

type

Type of text unit to be sampled. Possible values are "texts", "paragraphs", "sentences", "words", and "characters".

sub_token

A string specifying the text unit for filtering texts by length via min_length and max_length.

label

A vector of labels.

clean

If TRUE, the texts are cleaned before sampling. Default is TRUE.

Value

A tokenized text corpus.
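
A call following the usage above might look like the sketch below. The input texts and labels are invented for illustration. The sketch assumes that generate_corpus is exported from the textsampler package, that sub_token accepts the same unit names as type, and that label holds one label per text. None of these is confirmed by this page.

```r
# Hypothetical input data, made up for illustration.
texts <- c(
  "The first document has two sentences. This is the second one.",
  "The second document is short."
)
labels <- c("doc_a", "doc_b")  # assumption: one label per text

# Only call the function if the textsampler package is installed.
if (requireNamespace("textsampler", quietly = TRUE)) {
  corpus <- textsampler::generate_corpus(
    text = texts,
    type = "sentences",   # one of the documented values
    sub_token = "words",  # assumption: same unit names as `type`
    label = labels,
    clean = TRUE          # documented default
  )
  # The structure of the returned corpus is not documented here,
  # so inspect it rather than rely on particular fields.
  str(corpus)
}
```

The requireNamespace guard keeps the script from failing when the package is not installed.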


nproellochs/textsampler documentation built on Nov. 4, 2019, 10:10 p.m.