text_preprocessor: Create a pruned vocabulary from a token iterator

View source: R/text-preprocessor.R

Description

This function creates a vocabulary from a vector of documents. A vocabulary defines the domain of a natural language processing problem: the set of tokens a model recognises. Vocabularies are often used to create vectorisers, which map novel pieces of text onto the vocabulary defined by a training set. The vocabulary is often pruned to exclude tokens that occur very frequently or very infrequently. Pruning reduces the dimension of the problem, which decreases training time and the potential for overfitting.
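
The pruning and vectorising ideas described above can be sketched in base R. This is an illustration only, not the package's implementation: the helper names `build_pruned_vocabulary` and `vectorise`, the `min_doc_prop` and `max_doc_prop` thresholds, and the naive letter-only tokeniser are all assumptions made for this example.

```r
# Hypothetical helper (not part of the package): build a vocabulary and keep
# only tokens whose document proportion lies within the given bounds.
build_pruned_vocabulary <- function(docs, min_doc_prop = 0, max_doc_prop = 1) {
  # Naive tokeniser: lower-case, then split on runs of non-letters.
  tokens <- strsplit(tolower(docs), "[^a-z]+")
  # Keep each token once per document so we count document frequency.
  tokens <- lapply(tokens, function(tk) unique(tk[nzchar(tk)]))
  doc_counts <- table(unlist(tokens))
  doc_props <- as.numeric(doc_counts) / length(docs)
  keep <- doc_props >= min_doc_prop & doc_props <= max_doc_prop
  names(doc_counts)[keep]
}

# Hypothetical vectoriser: count how often each vocabulary token occurs in a
# novel piece of text. Tokens outside the vocabulary are ignored.
vectorise <- function(text, vocab) {
  tk <- strsplit(tolower(text), "[^a-z]+")[[1]]
  counts <- as.integer(table(factor(tk, levels = vocab)))
  setNames(counts, vocab)
}

docs <- c(
  "the cat sat on the mat",
  "the dog sat on the log",
  "the bird flew away"
)

# "the" appears in every document and is pruned as too frequent; tokens that
# appear in only one document are pruned as too rare.
vocab <- build_pruned_vocabulary(docs, min_doc_prop = 0.5, max_doc_prop = 0.9)
print(vocab)  # "on" "sat"

print(vectorise("a zebra sat on a fence on monday", vocab))
```

The novel sentence is reduced to a two-element count vector because only "on" and "sat" survive pruning, which is the dimension reduction the description refers to.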

Usage

text_preprocessor(x)

Arguments

x

Character. Text to be processed.

Value

The same character after processing.


mdneuzerling/ModelAsAPackage documentation built on Feb. 1, 2020, 12:57 a.m.