textTinyR: Text Processing for Small or Big Data Files

The textTinyR package offers functions for splitting, parsing, and tokenizing big text data files and for creating a vocabulary from them. It also includes functions for building a document-term matrix and extracting information from it (term associations, most frequent terms), for calculating token statistics (collocations, look-up tables, string dissimilarities), and for working with sparse matrices. Finally, it includes functions for word vector representations (such as 'GloVe' and 'fasttext') and for calculating (pairwise) text document dissimilarities. The source code is written in 'C++11' and exposed to R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.

Package details
Author	Lampros Mouselimis [aut, cre] (<https://orcid.org/0000-0002-8024-1546>)
Maintainer	Lampros Mouselimis <mouselimislampros@gmail.com>
License	GPL-3
Version	1.1.8
URL	https://github.com/mlampros/textTinyR
Package repository	CRAN
Installation	Install the latest version of this package by entering the following in R: `install.packages("textTinyR")`