mlvocab: Vocabulary and Corpus Preprocessing for Natural Language Pipelines

Utilities for preprocessing text corpora into data structures suitable for natural language models: integer sequences or matrices, vocabulary embedding matrices, and term-document, document-term, and term co-occurrence matrices. All functions allow full or partial hashing of the terms in the vocabulary.
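
To make the data structures named above concrete, here is a minimal sketch in base R only. It does not use or reproduce the mlvocab API. The toy corpus and the variable names (`corpus`, `vocab`, `int_seqs`, `dtm`) are assumptions made for illustration, and hashing is not shown.

```r
# Toy corpus (hypothetical): one character vector of tokens per document.
corpus <- list(
  doc1 = c("the", "cat", "sat"),
  doc2 = c("the", "dog", "sat", "down")
)

# Vocabulary: unique terms in order of first appearance.
vocab <- unique(unlist(corpus, use.names = FALSE))

# Integer sequences: each token is replaced by its index in the vocabulary.
int_seqs <- lapply(corpus, match, table = vocab)

# Document-term matrix: rows are documents, columns are terms, cells are counts.
count_terms <- function(tokens) {
  tabulate(match(tokens, vocab), nbins = length(vocab))
}
dtm <- t(vapply(corpus, count_terms, integer(length(vocab))))
colnames(dtm) <- vocab

# Term-document matrix: the transpose of the document-term matrix.
tdm <- t(dtm)
```

A package built for this task would typically return sparse matrices and handle large vocabularies efficiently; the dense base R version here is only meant to show the shape of each structure.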

Package details

Author: Vitalie Spinu [aut, cre]
Maintainer: Vitalie Spinu <spinuvit@gmail.com>
License: GPL-3
Version: 0.1
URL: https://github.com/vspinu/mlvocab/
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("mlvocab")

mlvocab documentation built on Sept. 21, 2018, 6:35 p.m.