vspinu/mlvocab: Vocabulary and Corpus Preprocessing for Natural Language Pipelines

Utilities for preprocessing text corpora into data structures suitable for natural language models: integer sequences or matrices, vocabulary embedding matrices, and term-document, document-term, and term co-occurrence matrices. All functions allow full or partial hashing of the terms in the vocabulary.
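To make two of the target data structures concrete, here is a minimal sketch in base R of turning a tokenized corpus into integer sequences and a document-term count matrix. It deliberately does not use mlvocab's own functions, whose names and signatures are not shown on this page. The toy corpus and the variable names (`corpus`, `terms`, `int_seqs`, `dtm`) are assumptions made for illustration, and hashing is not covered.

```r
# Illustration only: base R, not the mlvocab API.
# Toy corpus (hypothetical): each document is already split into tokens.
corpus <- list(
  doc1 = c("the", "cat", "sat"),
  doc2 = c("the", "dog", "sat", "down")
)

# Vocabulary: unique terms in order of first appearance.
terms <- unique(unlist(corpus, use.names = FALSE))

# Integer sequences: each token is replaced by its index in the vocabulary.
int_seqs <- lapply(corpus, match, table = terms)

# Document-term count matrix: one row per document, one column per term.
count_terms <- function(doc) tabulate(match(doc, terms), nbins = length(terms))
dtm <- t(vapply(corpus, count_terms, integer(length(terms))))
colnames(dtm) <- terms
```

A dense matrix is fine for a toy example like this one; for a realistic corpus a sparse representation would be the more practical choice.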

Package details

Maintainer
License: GPL-3
Version: 0.1
URL: https://github.com/vspinu/mlvocab/
Installation

Install the latest version of this package by entering the following in R:
install.packages("remotes")
remotes::install_github("vspinu/mlvocab")
vspinu/mlvocab documentation built on June 11, 2021, 7:37 a.m.