mlvocab: Vocabulary and Corpus Preprocessing for Natural Language Pipelines

Utilities for preprocessing text corpora into data structures suitable for natural language models: integer sequences or matrices, vocabulary embedding matrices, and term-document, document-term, and term co-occurrence matrices. All functions allow full or partial hashing of the terms in the vocabulary.
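
To make the data structures named above concrete, here is a minimal base R sketch of a vocabulary, integer sequences, and a document-term matrix built from a toy corpus. It deliberately does not call any mlvocab function: the corpus and the variable names (`corpus`, `terms`, `seqs`, `dtm`) are hypothetical, and the package's own interface may differ, so consult its man pages for the actual API.

```r
# Toy corpus: a named list of pre-tokenized documents (hypothetical data).
corpus <- list(
  a = c("the", "cat", "sat"),
  b = c("the", "dog", "sat", "down")
)

# Vocabulary: unique terms in order of first appearance.
terms <- unique(unlist(corpus, use.names = FALSE))

# Integer sequences: each document becomes a vector of vocabulary indices.
seqs <- lapply(corpus, function(doc) match(doc, terms))

# Document-term matrix: one row per document, one column per term,
# each cell holding the number of times the term occurs in the document.
dtm <- t(vapply(
  corpus,
  function(doc) tabulate(match(doc, terms), nbins = length(terms)),
  integer(length(terms))
))
colnames(dtm) <- terms

print(seqs)
print(dtm)
```

Transposing `dtm` gives the term-document layout. Hashing of terms, which the description also mentions, is not covered by this sketch.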

Package details
Author	Vitalie Spinu [aut, cre]
Maintainer	Vitalie Spinu <spinuvit@gmail.com>
License	GPL-3
Version	0.1
URL	https://github.com/vspinu/mlvocab/
Package repository	View on CRAN
Installation	Install the latest version of this package by entering the following in R: `install.packages("mlvocab")`