View source: R/create-vocabulary.R
This function creates a vocabulary from a vector of documents. A vocabulary defines the domain of a natural language processing problem. Vocabularies are often used to create vectorisers, which allow novel pieces of text to be mapped to a vocabulary defined by a training set. To exclude frequently and infrequently occurring tokens, the vocabulary is often trimmed. This reduces the dimension of the problem to decrease training time and the potential for overfitting.
create_vocabulary(documents, doc_proportion_min = 0, doc_proportion_max = 1)
documents | A character vector, often of sentences or paragraphs.
doc_proportion_min | Optional. A number between 0 and 1 specifying the minimum proportion of documents in which a token must appear to be included in the vocabulary. Defaults to 0 (no effect).
doc_proportion_max | Optional. A number between 0 and 1 specifying the maximum proportion of documents in which a token may appear and still be included in the vocabulary. Defaults to 1 (no effect).
A vocabulary object, as used by the text2vec package.
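To make the effect of doc_proportion_min and doc_proportion_max concrete, the following is a minimal base R sketch of trimming by document proportion. It does not call create_vocabulary() and is not the package's implementation: the toy corpus, the whitespace tokenisation, the threshold values and the inclusive comparisons are all assumptions made for illustration.

```r
# Hypothetical toy corpus (not taken from the package documentation).
docs <- c(
  "the cat sat on the mat",
  "the dog sat on the log",
  "the bird flew over the house",
  "the cat chased the bird"
)

# Assumed tokenisation: lower-case and split on whitespace. Keeping each token
# once per document means we count documents, not total occurrences.
tokens_per_doc <- lapply(strsplit(tolower(docs), "\\s+"), unique)

# Proportion of documents in which each token appears.
counts <- table(unlist(tokens_per_doc))
doc_proportion <- setNames(as.numeric(counts) / length(docs), names(counts))

# Illustrative thresholds; inclusive bounds are an assumption.
doc_proportion_min <- 0.5
doc_proportion_max <- 0.9

keep <- doc_proportion >= doc_proportion_min &
  doc_proportion <= doc_proportion_max
trimmed_vocabulary <- names(doc_proportion)[keep]

print(trimmed_vocabulary)
# "bird" "cat" "on" "sat"
```

Here "the" appears in every document (proportion 1) and is dropped by the upper bound, while tokens such as "mat" and "dog" appear in only one of four documents (proportion 0.25) and are dropped by the lower bound.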