cleanNLP: A Tidy Data Model for Natural Language Processing

Provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline uses Stanford's CoreNLP library. Exposed annotation tasks include tokenization, part of speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction. Several datasets containing token unigram, part of speech tag, and dependency type frequencies are also included to assist with analyses. Currently supports parsing text in English, French, German, and Spanish.
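The "normalized tables" idea can be illustrated in a few lines. The following Python sketch is not cleanNLP code and does not call CoreNLP. It substitutes a naive split on periods and whitespace for the real sentence splitter and tokenizer. The column names `doc_id`, `sid`, `tid` and `word`, and the helpers `token_table` and `tokens_per_document`, are hypothetical names chosen for illustration. The sketch shows the core of a tidy token table: one row per token, keyed by document, sentence, and token position, so that other tables could join back on those keys.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRow:
    """One row of a hypothetical tidy token table."""
    doc_id: int  # which document the token belongs to (1-based)
    sid: int     # sentence index within the document (1-based)
    tid: int     # token index within the sentence (1-based)
    word: str    # the token text


def token_table(corpus: list[str]) -> list[TokenRow]:
    """Flatten a corpus into one row per token.

    Assumption: splitting on '.' and whitespace stands in for a real
    sentence splitter and tokenizer such as CoreNLP's.
    """
    rows: list[TokenRow] = []
    for doc_id, text in enumerate(corpus, start=1):
        # Naive sentence split; empty fragments (e.g. after a final '.') are dropped.
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        for sid, sentence in enumerate(sentences, start=1):
            for tid, word in enumerate(sentence.split(), start=1):
                rows.append(TokenRow(doc_id, sid, tid, word))
    return rows


def tokens_per_document(rows: list[TokenRow]) -> Counter:
    """Count tokens per document: a group-by over a single key column."""
    return Counter(row.doc_id for row in rows)
```

Because every row carries its own keys, a summary such as tokens per document reduces to grouping on one column. A frequency table like the bundled word counts could likewise be joined on the `word` column.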

Author: Taylor B. Arnold
Date of publication: 2016-11-11 12:09:27
Maintainer: Taylor B. Arnold <taylor.arnold@acm.org>
License: GPL-3
Version: 0.24


Functions

annotate
cleanNLP
cleanNLP-package
combine_annotators
dep_frequency
doc_id_reset
download_clean_nlp
get_coreference
get_dependency
get_document
get_entity
get_sentiment
get_token
get_triple
init_clean_nlp
obama
pos_frequency
print.annotation
read_annotation
set_language
set_properties
word_frequency
write_annotation

Files

cleanNLP
cleanNLP/inst
cleanNLP/inst/extdata
cleanNLP/inst/extdata/StanfordCoreNLP-arabic.properties
cleanNLP/inst/extdata/StanfordCoreNLP-french.properties
cleanNLP/inst/extdata/StanfordCoreNLP-chinese.properties
cleanNLP/inst/extdata/StanfordCoreNLP-german.properties
cleanNLP/inst/extdata/StanfordCoreNLP-spanish.properties
cleanNLP/inst/extdata/cleanNLP-0.1.jar
cleanNLP/inst/extdata/StanfordCoreNLP.properties
cleanNLP/inst/extdata/CoreNLP-to-HTML.xsl
cleanNLP/inst/extdata/StanfordCoreNLP-english-all.properties
cleanNLP/inst/extdata/StanfordCoreNLP-english-fast.properties
cleanNLP/NAMESPACE
cleanNLP/data
cleanNLP/data/dep_frequency.rda
cleanNLP/data/word_frequency.rda
cleanNLP/data/obama.rda
cleanNLP/data/datalist
cleanNLP/data/pos_frequency.rda
cleanNLP/R
cleanNLP/R/accessors.R
cleanNLP/R/onLoad.R
cleanNLP/R/pkg.R
cleanNLP/R/data.R
cleanNLP/R/download.R
cleanNLP/R/annotate.R
cleanNLP/R/engine.R
cleanNLP/README.md
cleanNLP/MD5
cleanNLP/java
cleanNLP/java/src
cleanNLP/java/src/main
cleanNLP/java/src/main/scripts
cleanNLP/java/src/main/scripts/store.sh
cleanNLP/java/src/main/java
cleanNLP/java/src/main/java/edu
cleanNLP/java/src/main/java/edu/richmond
cleanNLP/java/src/main/java/edu/richmond/nlp
cleanNLP/java/src/main/java/edu/richmond/nlp/AnnotationProcessor.java
cleanNLP/java/src/main/java/edu/richmond/nlp/ConsoleOutputCapturer.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVDependencyDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVDocumentDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVCoreferenceDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVNamedEntityDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVSentimentDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVTripleDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVTokenDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVCoreferenceOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVTokenOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVDocumentOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVNamedEntityOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVTripleOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVDependencyOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVSentimentOutputter.java
cleanNLP/java/pom.xml
cleanNLP/DESCRIPTION
cleanNLP/man
cleanNLP/man/print.annotation.Rd
cleanNLP/man/annotate.Rd
cleanNLP/man/pos_frequency.Rd
cleanNLP/man/read_annotation.Rd
cleanNLP/man/get_token.Rd
cleanNLP/man/write_annotation.Rd
cleanNLP/man/obama.Rd
cleanNLP/man/get_triple.Rd
cleanNLP/man/get_document.Rd
cleanNLP/man/download_clean_nlp.Rd
cleanNLP/man/word_frequency.Rd
cleanNLP/man/set_language.Rd
cleanNLP/man/get_dependency.Rd
cleanNLP/man/get_entity.Rd
cleanNLP/man/get_sentiment.Rd
cleanNLP/man/dep_frequency.Rd
cleanNLP/man/get_coreference.Rd
cleanNLP/man/set_properties.Rd
cleanNLP/man/init_clean_nlp.Rd
cleanNLP/man/doc_id_reset.Rd
cleanNLP/man/cleanNLP.Rd
cleanNLP/man/combine_annotators.Rd
