corpus: Text Corpus Analysis

Text corpus data analysis, with full support for international text (Unicode). Functions for reading data from newline-delimited 'JSON' files, for normalizing and tokenizing text, for searching for term occurrences, and for computing term occurrence frequencies, including n-grams.
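As a rough illustration of the workflow described above, the following sketch reads a small newline-delimited JSON file, tokenizes the text, searches for a term, and computes unigram and bigram counts. It assumes the package's `read_ndjson`, `text_tokens`, `text_locate`, and `term_stats` functions; the sample file and its `text` field are invented for this example and are not part of the package.

```r
library(corpus)

# Hypothetical sample data: two short documents, one JSON object per line.
path <- tempfile(fileext = ".json")
writeLines(c('{"text": "The quick brown fox jumps over the lazy dog."}',
             '{"text": "A rose is a rose is a rose."}'),
           path)

# Read the newline-delimited JSON file into a data frame.
docs <- read_ndjson(path)

# Normalize and tokenize each document (default text filter).
tokens <- text_tokens(docs$text)

# Search for occurrences of a term, with surrounding context.
hits <- text_locate(docs$text, "rose")

# Term occurrence frequencies for unigrams and bigrams.
stats <- term_stats(docs$text, ngrams = 1:2)

print(tokens)
print(hits)
print(stats)
```

The default text filter is used throughout; a custom filter created with `text_filter()` could be passed through the `filter` argument if different normalization or stemming is wanted.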

Package details

Author: Leslie Huang [cre, ctb], Patrick O. Perry [aut, cph], Finn Årup Nielsen [cph, dtc] (AFINN Sentiment Lexicon), Martin Porter and Richard Boulton [ctb, cph, dtc] (Snowball Stemmer and Stopword Lists), The Regents of the University of California [ctb, cph] (Strtod Library Procedure), Carlo Strapparava and Alessandro Valitutti [cph, dtc] (WordNet-Affect Lexicon), Unicode, Inc. [cph, dtc] (Unicode Character Database)
Maintainer: Leslie Huang <lesliehuang@nyu.edu>
License: Apache License (== 2.0) | file LICENSE
Version: 0.10.2
URL: https://leslie-huang.github.io/r-corpus/ https://github.com/leslie-huang/r-corpus
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("corpus")


corpus documentation built on May 2, 2021, 9:06 a.m.