corpus: Text Corpus Analysis

Text corpus data analysis, with full support for international text (Unicode). Functions for reading data from newline-delimited 'JSON' files, for normalizing and tokenizing text, for searching for term occurrences, and for computing term occurrence frequencies, including n-grams.
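The workflow in the description can be sketched in R as follows. This is a minimal sketch that assumes the package is already installed. It uses the package's exported functions `read_ndjson()`, `text_filter()`, `text_tokens()`, `text_locate()`, and `term_stats()`. The sample file contents, the field names `id` and `text`, and the search term are illustrative only. Check each function's man page for its full argument list.

```r
# Assumes the corpus package is installed: install.packages("corpus")
library(corpus)

# Write a tiny newline-delimited JSON file (one JSON object per line).
# The field names "id" and "text" are illustrative choices.
path <- tempfile(fileext = ".json")
writeLines(c('{"id": 1, "text": "The quick brown fox jumps over the lazy dog."}',
             '{"id": 2, "text": "The lazy dog sleeps."}'),
           path)

# Read the file into a data frame with one column per JSON field.
data <- read_ndjson(path)

# Normalize and tokenize. The default filter case-folds tokens;
# here the filter is also asked to drop punctuation tokens.
f <- text_filter(drop_punct = TRUE)
toks <- text_tokens(data$text, filter = f)

# Search for every occurrence of a term, with surrounding context.
hits <- text_locate(data$text, "dog")

# Compute term occurrence frequencies for unigrams and bigrams (n = 1, 2).
stats <- term_stats(data$text, filter = f, ngrams = 1:2)
print(stats)
```

The same filter object is passed to both `text_tokens()` and `term_stats()`, so tokenization and counting apply identical normalization rules.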

Vignettes:
- Chinese text handling
- Introduction to corpus
- Stemming Words
- Text data in Corpus and other packages


Package details
Author	Leslie Huang [cre, ctb], Patrick O. Perry [aut, cph], Finn Årup Nielsen [cph, dtc] (AFINN Sentiment Lexicon), Martin Porter and Richard Boulton [ctb, cph, dtc] (Snowball Stemmer and Stopword Lists), The Regents of the University of California [ctb, cph] (Strtod Library Procedure), Carlo Strapparava and Alessandro Valitutti [cph, dtc] (WordNet-Affect Lexicon), Unicode, Inc. [cph, dtc] (Unicode Character Database)
Maintainer	Leslie Huang <lesliehuang@nyu.edu>
License	Apache License (== 2.0) | file LICENSE
Version	0.10.2
URL	https://leslie-huang.github.io/r-corpus/ https://github.com/leslie-huang/r-corpus
Package repository	CRAN
Installation	Install the latest version of this package by entering the following in R: `install.packages("corpus")`