text2vec: Modern Text Mining Framework for R

Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), and similarity computation. The package provides a source-agnostic streaming API that lets researchers analyze document collections larger than available RAM. All core functions are parallelized to take advantage of multicore machines.
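
As a rough illustration of the streaming API described above, the sketch below builds a document-term matrix from the bundled movie_review data set. It assumes text2vec is installed and uses only functions listed in the man pages below (itoken, create_vocabulary, vocab_vectorizer, create_dtm, word_tokenizer). The argument names follow the package vignettes and may differ between versions, and the column names `review` and `id` are assumed from the data set's documentation.

```r
library(text2vec)

# movie_review (IMDB movie reviews) ships with the package.
data("movie_review")

# itoken() creates an iterator over tokens, so documents are
# preprocessed and tokenized in chunks rather than all at once.
it <- itoken(movie_review$review,
             preprocessor = tolower,
             tokenizer = word_tokenizer,
             ids = movie_review$id)

# First pass over the iterator: collect the unique terms.
vocab <- create_vocabulary(it)

# Second pass: map each document onto the vocabulary to get a
# sparse document-term matrix (one row per document).
vectorizer <- vocab_vectorizer(vocab)
dtm <- create_dtm(it, vectorizer)

dim(dtm)
```

The same pattern should carry over to files on disk by replacing the character vector with an ifiles() iterator, which is what makes the input source-agnostic.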

Author
Dmitriy Selivanov [aut, cre], Lincoln Mullen [ctb]
Date of publication
2016-10-04 17:48:07
Maintainer
Dmitriy Selivanov <selivanov.dmitriy@gmail.com>
License
GPL (>= 2) | file LICENSE
Version
0.4.0

Man pages

as.lda_c
Converts a sparse document-term matrix to 'lda_c' format
check_analogy_accuracy
Checks accuracy of word embeddings on the analogy task
create_corpus
Create a corpus
create_dtm
Document-term matrix construction
create_tcm
Term-co-occurrence matrix construction
create_vocabulary
Creates a vocabulary of unique terms
distances
Pairwise Distance Matrix Computation
fit
Fits model to data
fit_transform
Fit model to data, then transform it
get_dtm
Extract document-term matrix
get_idf
Inverse document-frequency scaling matrix
get_tcm
Extract term-co-occurrence matrix
get_tf
Term-frequency scaling matrix
GlobalVectors
Creates Global Vectors word-embeddings model.
glove
Fit a GloVe word-embedding model
ifiles
Creates iterator over text files from the disk
itoken
Iterators over input objects
LatentDirichletAllocation
Creates Latent Dirichlet Allocation model.
LatentSemanticAnalysis
Latent Semantic Analysis model
movie_review
IMDB movie reviews
normalize
Matrix normalization
prepare_analogy_questions
Prepares list of analogy questions
prune_vocabulary
Prune vocabulary
reexports
Objects exported from other packages
RelaxedWordMoversDistance
Creates model which can be used for calculation of "relaxed...
similarities
Pairwise Similarity Matrix Computation
split_into
Split a vector for parallel processing
text2vec
text2vec
TfIdf
TfIdf
tokenizers
Simple tokenization functions, which perform string...
transform
Transforms Matrix-like object using 'model'
transform_filter_commons
Remove terms from a document-term matrix
transform_tf
Scale a document-term matrix
vectorizers
Vocabulary and hash vectorizers

Files in this package

text2vec
text2vec/inst
text2vec/inst/doc
text2vec/inst/doc/glove.html
text2vec/inst/doc/glove.R
text2vec/inst/doc/text-vectorization.Rmd
text2vec/inst/doc/text-vectorization.R
text2vec/inst/doc/files-multicore.html
text2vec/inst/doc/files-multicore.R
text2vec/inst/doc/text-vectorization.html
text2vec/inst/doc/files-multicore.Rmd
text2vec/inst/doc/glove.Rmd
text2vec/tests
text2vec/tests/testthat.R
text2vec/tests/testthat
text2vec/tests/testthat/test-utils.R
text2vec/tests/testthat/test-tcm.R
text2vec/tests/testthat/utf8.r
text2vec/tests/testthat/test-distances.R
text2vec/tests/testthat/test-lsa.R
text2vec/tests/testthat/test-hash-corpus.R
text2vec/tests/testthat/test-iterators.R
text2vec/tests/testthat/not-test-doc2vec.R
text2vec/tests/testthat/test-s3-interface.R
text2vec/tests/testthat/test-vocab-high-level.R
text2vec/tests/testthat/test-vocab-corpus.R
text2vec/src
text2vec/src/Makevars
text2vec/src/Vocabulary.h
text2vec/src/matrix_utils.cpp
text2vec/src/VocabCorpus.h
text2vec/src/utils.cpp
text2vec/src/GloveFitter.cpp
text2vec/src/LDA_gibbs.cpp
text2vec/src/SparseTripletMatrix.h
text2vec/src/GloveFit.h
text2vec/src/Vocabulary.cpp
text2vec/src/text2vec.h
text2vec/src/HashCorpus.cpp
text2vec/src/uint_hash.cpp
text2vec/src/VocabCorpus.cpp
text2vec/src/Makevars.win
text2vec/src/RcppExports.cpp
text2vec/src/Corpus.h
text2vec/src/HashCorpus.h
text2vec/NAMESPACE
text2vec/NEWS.md
text2vec/data
text2vec/data/movie_review.RData
text2vec/data/datalist
text2vec/R
text2vec/R/utils.R
text2vec/R/vocabulary.R
text2vec/R/distance_RWMD.R
text2vec/R/model_GloVe.R
text2vec/R/model_LDA.R
text2vec/R/text2vec.R
text2vec/R/data.R
text2vec/R/vectorizers.R
text2vec/R/model_LSA.R
text2vec/R/RcppExports.R
text2vec/R/models_S3.R
text2vec/R/analogies.R
text2vec/R/models_R6.R
text2vec/R/tcm.R
text2vec/R/dtm.R
text2vec/R/tokenizers.R
text2vec/R/transformers.R
text2vec/R/iterators.R
text2vec/R/zzz.R
text2vec/R/model_tfidf.R
text2vec/R/distance.R
text2vec/vignettes
text2vec/vignettes/text-vectorization.Rmd
text2vec/vignettes/files-multicore.Rmd
text2vec/vignettes/glove.Rmd
text2vec/README.md
text2vec/MD5
text2vec/build
text2vec/build/vignette.rds
text2vec/DESCRIPTION
text2vec/man
text2vec/man/create_dtm.Rd
text2vec/man/transform.Rd
text2vec/man/transform_filter_commons.Rd
text2vec/man/fit.Rd
text2vec/man/distances.Rd
text2vec/man/vectorizers.Rd
text2vec/man/ifiles.Rd
text2vec/man/get_tf.Rd
text2vec/man/tokenizers.Rd
text2vec/man/split_into.Rd
text2vec/man/get_idf.Rd
text2vec/man/get_tcm.Rd
text2vec/man/create_vocabulary.Rd
text2vec/man/check_analogy_accuracy.Rd
text2vec/man/text2vec.Rd
text2vec/man/prepare_analogy_questions.Rd
text2vec/man/transform_tf.Rd
text2vec/man/TfIdf.Rd
text2vec/man/LatentSemanticAnalysis.Rd
text2vec/man/as.lda_c.Rd
text2vec/man/reexports.Rd
text2vec/man/GlobalVectors.Rd
text2vec/man/fit_transform.Rd
text2vec/man/create_corpus.Rd
text2vec/man/get_dtm.Rd
text2vec/man/itoken.Rd
text2vec/man/glove.Rd
text2vec/man/movie_review.Rd
text2vec/man/LatentDirichletAllocation.Rd
text2vec/man/create_tcm.Rd
text2vec/man/normalize.Rd
text2vec/man/similarities.Rd
text2vec/man/RelaxedWordMoversDistance.Rd
text2vec/man/prune_vocabulary.Rd
text2vec/LICENSE