vspinu/mlvocab: Vocabulary and Corpus Preprocessing for Natural Language Pipelines

Utilities for preprocessing text corpora into data structures suitable for natural language models: integer sequences or matrices, vocabulary embedding matrices, and term-document, document-term, and term co-occurrence matrices. All functions support full or partial hashing of the terms in the vocabulary.
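This page does not document the package's own functions, so the following base-R sketch deliberately avoids the mlvocab API. It is a minimal illustration, under simplifying assumptions, of two of the structures listed above: integer sequences and a document-term matrix built with partial hashing. Partial hashing here means the most frequent terms get their own indices and all remaining terms are hashed into a fixed number of extra buckets. The helper names (`build_vocab`, `hash_term`, `term_index`, `doc_term_matrix`) and the djb2-style hash are hypothetical choices, not the package's method.

```r
# Toy corpus: each document is a character vector of tokens (assumed already tokenized)
corpus <- list(
  d1 = c("the", "cat", "sat"),
  d2 = c("the", "dog", "sat", "down"),
  d3 = c("a", "cat", "and", "a", "dog")
)

# Hypothetical helper: unique terms ordered by decreasing frequency, ties broken by name
build_vocab <- function(corpus) {
  counts <- table(unlist(corpus, use.names = FALSE))
  names(counts)[order(-as.integer(counts), names(counts))]
}

# Hypothetical djb2-style string hash; doubles keep the arithmetic exact (no integer overflow)
hash_term <- function(term, nbuckets) {
  h <- 5381
  for (code in utf8ToInt(term)) h <- (h * 33 + code) %% 2147483647
  h %% nbuckets
}

# Map tokens to integer indices: explicit vocabulary terms get 1..length(vocab),
# out-of-vocabulary terms are hashed into length(vocab) + 1 .. length(vocab) + nbuckets.
# With nbuckets = 0 (no hashing), out-of-vocabulary terms stay NA.
term_index <- function(tokens, vocab, nbuckets = 0) {
  idx <- match(tokens, vocab)
  oov <- is.na(idx)
  if (nbuckets > 0 && any(oov)) {
    idx[oov] <- length(vocab) + 1 +
      vapply(tokens[oov], hash_term, numeric(1), nbuckets = nbuckets, USE.NAMES = FALSE)
  }
  as.integer(idx)
}

# Dense document-term count matrix: one row per document, one column per index
doc_term_matrix <- function(corpus, vocab, nbuckets = 0) {
  ncols <- length(vocab) + nbuckets
  m <- matrix(0L, nrow = length(corpus), ncol = ncols,
              dimnames = list(names(corpus), NULL))
  for (i in seq_along(corpus)) {
    idx <- term_index(corpus[[i]], vocab, nbuckets)
    m[i, ] <- tabulate(idx[!is.na(idx)], nbins = ncols)
  }
  m
}

vocab <- build_vocab(corpus)
top <- vocab[1:4]  # keep the 4 most frequent terms explicitly, hash the rest into 3 buckets
seqs <- lapply(corpus, term_index, vocab = top, nbuckets = 3)
dtm <- doc_term_matrix(corpus, top, nbuckets = 3)
```

Hashing the tail of the vocabulary bounds the number of columns (here 4 + 3) regardless of how many rare terms appear, at the cost of occasional collisions between hashed terms. A real pipeline would use sparse matrices rather than the dense matrix used here for brevity.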


Package details

Maintainer:
License: GPL-3
Version: 0.1
URL: https://github.com/vspinu/mlvocab/
Installation: Install the latest version of this package from GitHub by entering the following in R:
install.packages("devtools")
library(devtools)
install_github("vspinu/mlvocab")
vspinu/mlvocab documentation built on Sept. 23, 2018, 5:16 a.m.