cleanNLP: A Tidy Data Model for Natural Language Processing

Provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline uses Stanford's CoreNLP library. Exposed annotation tasks include tokenization, part-of-speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction. Several datasets containing token unigram, part-of-speech tag, and dependency type frequencies are also included to assist with analyses. Currently supports parsing text in English, French, German, and Spanish.
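
To illustrate what "a set of normalized tables" can look like, here is a minimal base R sketch that does not call cleanNLP at all. The table layouts and column names (id, sid, tid, word, upos, title) are assumptions made up for this example and are not taken from the package documentation. It only shows the general idea: one table per level of analysis, linked by a shared key.

```r
# Hypothetical document-level table: one row per document.
document <- data.frame(
  id = c(1L, 2L),
  title = c("first", "second"),
  stringsAsFactors = FALSE
)

# Hypothetical token-level table: one row per token, keyed by document id.
# sid and tid stand for sentence and token positions (assumed names).
token <- data.frame(
  id = c(1L, 1L, 1L, 2L, 2L),
  sid = c(1L, 1L, 1L, 1L, 1L),
  tid = c(1L, 2L, 3L, 1L, 2L),
  word = c("Dogs", "bark", ".", "Cats", "purr"),
  upos = c("NOUN", "VERB", "PUNCT", "NOUN", "VERB"),
  stringsAsFactors = FALSE
)

# Because the tables are normalized, a join on the shared key
# attaches document metadata to every token row.
joined <- merge(token, document, by = "id")

# Cross-tabulate part-of-speech tags by document title.
pos_counts <- table(joined$title, joined$upos)
print(pos_counts)
```

Keeping tokens and documents in separate tables avoids repeating document metadata on every token row, and the join can be done on demand with standard data frame tools.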

Author: Taylor B. Arnold
Date of publication: 2016-11-11 12:09:27
Maintainer: Taylor B. Arnold <taylor.arnold@acm.org>
License: GPL-3
Version: 0.24

Man pages

annotate
Run the annotation pipeline on a set of documents
cleanNLP
cleanNLP: A Tidy Data Model for Natural Language Processing
combine_annotators
Combine a set of annotations
dep_frequency
Universal Dependency Frequencies
doc_id_reset
Reset document ids
download_clean_nlp
Download Java files needed for cleanNLP
get_coreference
Access coreferences from an annotation object
get_dependency
Access dependencies from an annotation object
get_document
Access document metadata from an annotation object
get_entity
Access named entities from an annotation object
get_sentiment
Access sentiment scores from an annotation object
get_token
Access tokens from an annotation object
get_triple
Access triples from an annotation object
init_clean_nlp
Initialize the cleanNLP Java object
obama
Annotation of Barack Obama's State of the Union Addresses
pos_frequency
Universal Part of Speech Code Frequencies
print.annotation
Print a summary of an annotation object
read_annotation
Read annotation files from disk
set_language
Easy interface for setting up the pipeline
set_properties
Set properties for the CoreNLP pipeline
word_frequency
Most frequent English words
write_annotation
Write annotation files to disk

Files in this package

cleanNLP
cleanNLP/inst
cleanNLP/inst/extdata
cleanNLP/inst/extdata/StanfordCoreNLP-arabic.properties
cleanNLP/inst/extdata/StanfordCoreNLP-french.properties
cleanNLP/inst/extdata/StanfordCoreNLP-chinese.properties
cleanNLP/inst/extdata/StanfordCoreNLP-german.properties
cleanNLP/inst/extdata/StanfordCoreNLP-spanish.properties
cleanNLP/inst/extdata/cleanNLP-0.1.jar
cleanNLP/inst/extdata/StanfordCoreNLP.properties
cleanNLP/inst/extdata/CoreNLP-to-HTML.xsl
cleanNLP/inst/extdata/StanfordCoreNLP-english-all.properties
cleanNLP/inst/extdata/StanfordCoreNLP-english-fast.properties
cleanNLP/NAMESPACE
cleanNLP/data
cleanNLP/data/dep_frequency.rda
cleanNLP/data/word_frequency.rda
cleanNLP/data/obama.rda
cleanNLP/data/datalist
cleanNLP/data/pos_frequency.rda
cleanNLP/R
cleanNLP/R/accessors.R
cleanNLP/R/onLoad.R
cleanNLP/R/pkg.R
cleanNLP/R/data.R
cleanNLP/R/download.R
cleanNLP/R/annotate.R
cleanNLP/R/engine.R
cleanNLP/README.md
cleanNLP/MD5
cleanNLP/java
cleanNLP/java/src
cleanNLP/java/src/main
cleanNLP/java/src/main/scripts
cleanNLP/java/src/main/scripts/store.sh
cleanNLP/java/src/main/java
cleanNLP/java/src/main/java/edu
cleanNLP/java/src/main/java/edu/richmond
cleanNLP/java/src/main/java/edu/richmond/nlp
cleanNLP/java/src/main/java/edu/richmond/nlp/AnnotationProcessor.java
cleanNLP/java/src/main/java/edu/richmond/nlp/ConsoleOutputCapturer.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVDependencyDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVDocumentDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVCoreferenceDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVNamedEntityDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVSentimentDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVTripleDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Writer/CSVTokenDocumentWriter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVCoreferenceOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVTokenOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVDocumentOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVNamedEntityOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVTripleOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVDependencyOutputter.java
cleanNLP/java/src/main/java/edu/richmond/nlp/Outputter/CSVSentimentOutputter.java
cleanNLP/java/pom.xml
cleanNLP/DESCRIPTION
cleanNLP/man
cleanNLP/man/print.annotation.Rd
cleanNLP/man/annotate.Rd
cleanNLP/man/pos_frequency.Rd
cleanNLP/man/read_annotation.Rd
cleanNLP/man/get_token.Rd
cleanNLP/man/write_annotation.Rd
cleanNLP/man/obama.Rd
cleanNLP/man/get_triple.Rd
cleanNLP/man/get_document.Rd
cleanNLP/man/download_clean_nlp.Rd
cleanNLP/man/word_frequency.Rd
cleanNLP/man/set_language.Rd
cleanNLP/man/get_dependency.Rd
cleanNLP/man/get_entity.Rd
cleanNLP/man/get_sentiment.Rd
cleanNLP/man/dep_frequency.Rd
cleanNLP/man/get_coreference.Rd
cleanNLP/man/set_properties.Rd
cleanNLP/man/init_clean_nlp.Rd
cleanNLP/man/doc_id_reset.Rd
cleanNLP/man/cleanNLP.Rd
cleanNLP/man/combine_annotators.Rd