text2vec: Modern Text Mining Framework for R

Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), and similarities. This package provides a source-agnostic streaming API, which allows researchers to analyze document collections that are larger than the available RAM. All core functions are parallelized to benefit from multicore machines.
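The streaming idea behind document-term matrix construction can be illustrated outside R. The following is a minimal Python sketch, not text2vec's API. It assumes a naive tokenizer (lowercase, split on non-alphanumerics) and uses hypothetical helper names (`simple_tokenizer`, `iter_chunks`, `build_vocabulary`, `build_dtm`). It makes two passes over the input: the first counts terms and the second emits one sparse row per document. Only one chunk of raw text is held in memory at a time.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def simple_tokenizer(text: str) -> List[str]:
    # Hypothetical tokenizer: lowercase, then split on runs of non-alphanumerics.
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def iter_chunks(docs: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    # Yield documents in fixed-size chunks so only one chunk of raw text is held at once.
    chunk: List[str] = []
    for doc in docs:
        chunk.append(doc)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def build_vocabulary(docs: Iterable[str], chunk_size: int = 1000) -> Counter:
    # Pass 1: accumulate corpus-wide term counts, chunk by chunk.
    counts: Counter = Counter()
    for chunk in iter_chunks(docs, chunk_size):
        for doc in chunk:
            counts.update(simple_tokenizer(doc))
    return counts


def build_dtm(docs: Iterable[str], vocab: Dict[str, int],
              chunk_size: int = 1000) -> List[Dict[int, int]]:
    # Pass 2: one sparse row per document, mapping column index -> term count.
    # Tokens missing from the vocabulary are skipped.
    rows: List[Dict[int, int]] = []
    for chunk in iter_chunks(docs, chunk_size):
        for doc in chunk:
            rows.append(dict(Counter(vocab[t] for t in simple_tokenizer(doc) if t in vocab)))
    return rows


# Example usage on a tiny in-memory corpus (a list can be iterated twice).
docs = ["The cat sat.", "The dog sat!", "A cat and a dog."]
counts = build_vocabulary(docs, chunk_size=2)
vocab = {term: i for i, term in enumerate(sorted(counts))}
dtm = build_dtm(docs, vocab, chunk_size=2)
```

Because the input is read twice, a real on-disk source would have to be reopened for the second pass. The sparse rows still grow with the corpus. What stays bounded is the amount of raw text held at once.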

Author: Dmitriy Selivanov [aut, cre], Lincoln Mullen [ctb]
Date of publication: 2016-10-04 17:48:07
Maintainer: Dmitriy Selivanov <selivanov.dmitriy@gmail.com>
License: GPL (>= 2) | file LICENSE
Version: 0.4.0
URL: http://text2vec.org

Man pages

as.lda_c: Converts a document-term sparse matrix to 'lda_c' format

check_analogy_accuracy: Checks accuracy of word embeddings on the analogy task

create_corpus: Create a corpus

create_dtm: Document-term matrix construction

create_tcm: Term co-occurrence matrix construction

create_vocabulary: Creates a vocabulary of unique terms

distances: Pairwise Distance Matrix Computation

fit: Fits model to data

fit_transform: Fit model to data, then transform it

get_dtm: Extract document-term matrix

get_idf: Inverse document-frequency scaling matrix

get_tcm: Extract term co-occurrence matrix

get_tf: Term-frequency scaling matrix

GlobalVectors: Creates Global Vectors word-embeddings model.

glove: Fit a GloVe word-embedding model

ifiles: Creates iterator over text files from the disk

itoken: Iterators over input objects

LatentDirichletAllocation: Creates Latent Dirichlet Allocation model.

LatentSemanticAnalysis: Latent Semantic Analysis model

movie_review: IMDB movie reviews

normalize: Matrix normalization

prepare_analogy_questions: Prepares list of analogy questions

prune_vocabulary: Prune vocabulary

reexports: Objects exported from other packages

RelaxedWordMoversDistance: Creates model which can be used for calculation of "relaxed...

similarities: Pairwise Similarity Matrix Computation

split_into: Split a vector for parallel processing

text2vec: text2vec

TfIdf: TfIdf

tokenizers: Simple tokenization functions, which perform string...

transform: Transforms Matrix-like object using 'model'

transform_filter_commons: Remove terms from a document-term matrix

transform_tf: Scale a document-term matrix

vectorizers: Vocabulary and hash vectorizers
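Several of the entries above (`get_tf`, `get_idf`, `TfIdf`, `similarities`) deal with reweighting a document-term matrix and comparing its rows. The arithmetic can be sketched in plain Python on sparse rows, where each row is a dict mapping column index to count. This is an illustration under stated assumptions, not text2vec's implementation. It uses one common TF-IDF variant: tf = count / document length and idf = log(N / df), where N is the number of documents and df is the number of documents containing the term. Rows are then compared with cosine similarity. The package's exact smoothing and normalization may differ, and `tf_idf` and `cosine` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List

SparseRow = Dict[int, float]


def tf_idf(dtm: List[Dict[int, int]]) -> List[SparseRow]:
    # Assumed variant: tf = count / document length, idf = log(N / df).
    n_docs = len(dtm)
    df = Counter(col for row in dtm for col in row)  # documents containing each column
    idf = {col: math.log(n_docs / d) for col, d in df.items()}
    weighted: List[SparseRow] = []
    for row in dtm:
        length = sum(row.values())
        # Empty documents stay empty instead of dividing by zero.
        weighted.append({col: (cnt / length) * idf[col] for col, cnt in row.items()} if length else {})
    return weighted


def cosine(a: SparseRow, b: SparseRow) -> float:
    # Dot product over shared columns divided by the product of the L2 norms.
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Example usage on a tiny hand-made sparse DTM with 3 documents and 4 terms.
dtm = [{0: 1, 1: 1}, {0: 1, 2: 1}, {3: 2}]
weights = tf_idf(dtm)
sim_01 = cosine(weights[0], weights[1])  # documents 0 and 1 share only column 0
sim_02 = cosine(weights[0], weights[2])  # no shared columns
```

Under this variant, a term that occurs in every document gets idf = log(1) = 0. Such a term contributes nothing to the similarity, which is the usual motivation for IDF weighting.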

Files in this package

text2vec
text2vec/inst
text2vec/inst/doc
text2vec/inst/doc/glove.html
text2vec/inst/doc/glove.R
text2vec/inst/doc/text-vectorization.Rmd
text2vec/inst/doc/text-vectorization.R
text2vec/inst/doc/files-multicore.html
text2vec/inst/doc/files-multicore.R
text2vec/inst/doc/text-vectorization.html
text2vec/inst/doc/files-multicore.Rmd
text2vec/inst/doc/glove.Rmd
text2vec/tests
text2vec/tests/testthat.R
text2vec/tests/testthat
text2vec/tests/testthat/test-utils.R
text2vec/tests/testthat/test-tcm.R
text2vec/tests/testthat/utf8.r
text2vec/tests/testthat/test-distances.R
text2vec/tests/testthat/test-lsa.R
text2vec/tests/testthat/test-hash-corpus.R
text2vec/tests/testthat/test-iterators.R
text2vec/tests/testthat/not-test-doc2vec.R
text2vec/tests/testthat/test-s3-interface.R
text2vec/tests/testthat/test-vocab-high-level.R
text2vec/tests/testthat/test-vocab-corpus.R
text2vec/src
text2vec/src/Makevars
text2vec/src/Vocabulary.h
text2vec/src/matrix_utils.cpp
text2vec/src/VocabCorpus.h
text2vec/src/utils.cpp
text2vec/src/GloveFitter.cpp
text2vec/src/LDA_gibbs.cpp
text2vec/src/SparseTripletMatrix.h
text2vec/src/GloveFit.h
text2vec/src/Vocabulary.cpp
text2vec/src/text2vec.h
text2vec/src/HashCorpus.cpp
text2vec/src/uint_hash.cpp
text2vec/src/VocabCorpus.cpp
text2vec/src/Makevars.win
text2vec/src/RcppExports.cpp
text2vec/src/Corpus.h
text2vec/src/HashCorpus.h
text2vec/NAMESPACE
text2vec/NEWS.md
text2vec/data
text2vec/data/movie_review.RData
text2vec/data/datalist
text2vec/R
text2vec/R/utils.R
text2vec/R/vocabulary.R
text2vec/R/distance_RWMD.R
text2vec/R/model_GloVe.R
text2vec/R/model_LDA.R
text2vec/R/text2vec.R
text2vec/R/data.R
text2vec/R/vectorizers.R
text2vec/R/model_LSA.R
text2vec/R/RcppExports.R
text2vec/R/models_S3.R
text2vec/R/analogies.R
text2vec/R/models_R6.R
text2vec/R/tcm.R
text2vec/R/dtm.R
text2vec/R/tokenizers.R
text2vec/R/transformers.R
text2vec/R/iterators.R
text2vec/R/zzz.R
text2vec/R/model_tfidf.R
text2vec/R/distance.R
text2vec/vignettes
text2vec/vignettes/text-vectorization.Rmd
text2vec/vignettes/files-multicore.Rmd
text2vec/vignettes/glove.Rmd
text2vec/README.md
text2vec/MD5
text2vec/build
text2vec/build/vignette.rds
text2vec/DESCRIPTION
text2vec/man
text2vec/man/create_dtm.Rd
text2vec/man/transform.Rd
text2vec/man/transform_filter_commons.Rd
text2vec/man/fit.Rd
text2vec/man/distances.Rd
text2vec/man/vectorizers.Rd
text2vec/man/ifiles.Rd
text2vec/man/get_tf.Rd
text2vec/man/tokenizers.Rd
text2vec/man/split_into.Rd
text2vec/man/get_idf.Rd
text2vec/man/get_tcm.Rd
text2vec/man/create_vocabulary.Rd
text2vec/man/check_analogy_accuracy.Rd
text2vec/man/text2vec.Rd
text2vec/man/prepare_analogy_questions.Rd
text2vec/man/transform_tf.Rd
text2vec/man/TfIdf.Rd
text2vec/man/LatentSemanticAnalysis.Rd
text2vec/man/as.lda_c.Rd
text2vec/man/reexports.Rd
text2vec/man/GlobalVectors.Rd
text2vec/man/fit_transform.Rd
text2vec/man/create_corpus.Rd
text2vec/man/get_dtm.Rd
text2vec/man/itoken.Rd
text2vec/man/glove.Rd
text2vec/man/movie_review.Rd
text2vec/man/LatentDirichletAllocation.Rd
text2vec/man/create_tcm.Rd
text2vec/man/normalize.Rd
text2vec/man/similarities.Rd
text2vec/man/RelaxedWordMoversDistance.Rd
text2vec/man/prune_vocabulary.Rd
text2vec/LICENSE