README.md


Corpus and Vocabulary Preprocessing Utilities for Natural Language Pipelines (an R package)

The package provides the following two-step abstraction:

  1. The vocabulary object is first built from the entire corpus with the vocab(), update_vocab() and prune_vocab() functions.
  2. The vocabulary is then passed, together with the corpus, to a variety of corpus pre-processing functions. Most of the mlvocab functions accept an nbuckets argument for partial or full hashing of the corpus.
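
The two steps above can be illustrated with a small, library-agnostic sketch. It is written in plain Python and does not use or reproduce the mlvocab API: the helper names (`build_vocab`, `prune`, `to_index_sequences`), the `term_count_min` parameter, the 1-based indexing and the CRC32 hash are all assumptions made for illustration. Only the idea of an `nbuckets` argument comes from the text above, and the reading of "partial hashing" as "hash only the terms missing from the vocabulary" is likewise an assumption.

```python
from collections import Counter
import zlib


def build_vocab(corpus):
    """Step 1a: count term frequencies over whitespace-tokenized documents."""
    counts = Counter()
    for doc in corpus:
        counts.update(doc.split())
    return counts


def prune(counts, term_count_min=1):
    """Step 1b: keep terms seen at least term_count_min times (hypothetical rule).

    Returns a dict mapping each kept term to a 1-based index,
    most frequent terms first.
    """
    kept = [term for term, n in counts.most_common() if n >= term_count_min]
    return {term: i + 1 for i, term in enumerate(kept)}


def to_index_sequences(corpus, vocab, nbuckets=0):
    """Step 2: map each token to an integer using the vocabulary.

    Known terms get their vocabulary index. Unknown terms are hashed into
    one of nbuckets extra indices placed after the vocabulary, or mapped
    to 0 when nbuckets is 0. With an empty vocabulary every term is hashed.
    """
    n_terms = len(vocab)
    sequences = []
    for doc in corpus:
        seq = []
        for tok in doc.split():
            if tok in vocab:
                seq.append(vocab[tok])
            elif nbuckets > 0:
                bucket = zlib.crc32(tok.encode("utf-8")) % nbuckets
                seq.append(n_terms + 1 + bucket)
            else:
                seq.append(0)
        sequences.append(seq)
    return sequences


corpus = ["the cat sat", "the dog sat", "a bird flew"]
vocab = prune(build_vocab(corpus), term_count_min=2)
seqs = to_index_sequences(corpus, vocab, nbuckets=4)
```

In this sketch the pruned vocabulary holds only the terms that occur at least twice, so the remaining tokens fall into the four hash buckets. Passing an empty vocabulary with a positive `nbuckets` would hash every token, which corresponds to the "full hashing" case under the same assumptions.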

Current functionality includes:

Stability

The package is in an alpha state. API changes are likely.



vspinu/mlvocab documentation built on June 11, 2021, 7:37 a.m.