cleanNLP: A Tidy Data Model for Natural Language Processing

Provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline uses Stanford's CoreNLP library. Exposed annotation tasks include tokenization, part-of-speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction. Several datasets containing token unigram, part-of-speech tag, and dependency type frequencies are also included to assist with analyses. Currently supports parsing text in English, French, German, and Spanish.

Author: Taylor B. Arnold
Date of publication: 2016-11-11 12:09:27
Maintainer: Taylor B. Arnold <taylor.arnold@acm.org>
License: GPL-3
Version: 0.24

Files in this package

cleanNLP
cleanNLP/inst
cleanNLP/inst/extdata
cleanNLP/inst/extdata/StanfordCoreNLP-arabic.properties
cleanNLP/inst/extdata/StanfordCoreNLP-french.properties
cleanNLP/inst/extdata/StanfordCoreNLP-chinese.properties
cleanNLP/inst/extdata/StanfordCoreNLP-german.properties
cleanNLP/inst/extdata/StanfordCoreNLP-spanish.properties
cleanNLP/inst/extdata/cleanNLP-0.1.jar
cleanNLP/inst/extdata/StanfordCoreNLP.properties
cleanNLP/inst/extdata/CoreNLP-to-HTML.xsl
cleanNLP/inst/extdata/StanfordCoreNLP-english-all.properties
cleanNLP/inst/extdata/StanfordCoreNLP-english-fast.properties
cleanNLP/NAMESPACE
cleanNLP/data
cleanNLP/data/dep_frequency.rda
cleanNLP/data/word_frequency.rda
cleanNLP/data/obama.rda
cleanNLP/data/datalist
cleanNLP/data/pos_frequency.rda
cleanNLP/R
cleanNLP/R/accessors.R
cleanNLP/R/onLoad.R
cleanNLP/R/pkg.R
cleanNLP/R/data.R
cleanNLP/R/download.R
cleanNLP/R/annotate.R
cleanNLP/R/engine.R
cleanNLP/README.md
cleanNLP/MD5
cleanNLP/java
cleanNLP/java/src
cleanNLP/java/src/main
cleanNLP/java/src/main/scripts
cleanNLP/java/src/main/scripts/store.sh
cleanNLP/java/src/main/java
cleanNLP/java/src/main/java/edu
cleanNLP/java/src/main/java/edu/richmond
cleanNLP/java/src/main/java/edu/richmond/nlp
cleanNLP/java/src/main/java/edu/richmond/nlp/AnnotationProcessor.java
cleanNLP/java/src/main/java/edu/richmond/nlp/ConsoleOutputCapturer.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVDependencyDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVDocumentDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVCoreferenceDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVNamedEntityDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVSentimentDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVTripleDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVTokenDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVCoreferenceOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVTokenOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVDocumentOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVNamedEntityOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVTripleOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVDependencyOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVSentimentOutputter.java
cleanNLP/java/pom.xml
cleanNLP/DESCRIPTION
cleanNLP/man
cleanNLP/man/print.annotation.Rd
cleanNLP/man/annotate.Rd
cleanNLP/man/pos_frequency.Rd
cleanNLP/man/read_annotation.Rd
cleanNLP/man/get_token.Rd
cleanNLP/man/write_annotation.Rd
cleanNLP/man/obama.Rd
cleanNLP/man/get_triple.Rd
cleanNLP/man/get_document.Rd
cleanNLP/man/download_clean_nlp.Rd
cleanNLP/man/word_frequency.Rd
cleanNLP/man/set_language.Rd
cleanNLP/man/get_dependency.Rd
cleanNLP/man/get_entity.Rd
cleanNLP/man/get_sentiment.Rd
cleanNLP/man/dep_frequency.Rd
cleanNLP/man/get_coreference.Rd
cleanNLP/man/set_properties.Rd
cleanNLP/man/init_clean_nlp.Rd
cleanNLP/man/doc_id_reset.Rd
cleanNLP/man/cleanNLP.Rd
cleanNLP/man/combine_annotators.Rd
