R/package.R
In mlvocab: Vocabulary and Corpus Preprocessing for Natural Language Pipelines

##' `mlvocab` package
##'
##' The `mlvocab` package provides a two-step abstraction. First, a
##' vocabulary object is built from the entire corpus with the [vocab()],
##' [update_vocab()] and [prune_vocab()] functions. Second, the vocabulary
##' is passed alongside the corpus to a variety of corpus pre-processing
##' functions.
##'
##' Most `mlvocab` functions accept an `nbuckets` argument for
##' partial or full hashing of the corpus.
##' 
##' Current functionality includes:
##' 
##' \describe{
##'
##' \item{term index sequences}{[tix_seq()] and [tix_mat()] produce integer
##'   sequences suitable for direct consumption by various sequence models.}
##'
##' \item{term matrices}{[dtm()], [tdm()] and [tcm()] create document-term,
##' term-document and term-co-occurrence matrices respectively.}
##'
##' \item{vocabulary embedding}{given pre-trained word vectors,
##' [prune_embeddings()] creates smaller embedding matrices, handling missing
##' and unknown vocabulary terms gracefully.}
##'
##' \item{tfidf weighting}{[tfidf()] computes various versions of term
##' frequency-inverse document frequency weighting of `dtm` and `tdm`
##' matrices.}
##' 
##' }
##'
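##' @examples
##' \dontrun{
##' ## A minimal sketch of the two-step workflow described above; it is not
##' ## taken from the package sources. The toy corpus, the positional
##' ## argument order and the `max_terms` argument are assumptions; consult
##' ## the help page of each function before relying on them.
##' corpus <- list(a = c("the", "quick", "brown", "fox"),
##'                b = c("the", "lazy", "dog"))
##'
##' ## Step 1: build the vocabulary from the entire corpus, then prune it.
##' v <- vocab(corpus)
##' v <- prune_vocab(v, max_terms = 10)
##'
##' ## Step 2: pass the vocabulary alongside the corpus.
##' tix_seq(corpus, v)   # integer term-index sequences
##' dtm(corpus, v)       # document-term matrix
##' }
##'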
##' @author Vitalie Spinu (\email{spinuvit@gmail.com})
##' @import sparsepp
##' @importFrom digest digest
##' @importFrom Rcpp sourceCpp
##' @importFrom Matrix Diagonal t rowSums colSums
##' @importFrom methods new
##' @importFrom utils head tail
##' @useDynLib mlvocab, .registration=TRUE
"_PACKAGE"

mlvocab documentation built on Sept. 21, 2018, 6:35 p.m.