```r
desc <- suppressWarnings(readLines("DESCRIPTION"))
regex <- "(^Version:\\s+)(\\d+\\.\\d+\\.\\d+)"
loc <- grep(regex, desc)
ver <- gsub(regex, "\\2", desc[loc])
verbadge <- sprintf('<a href="https://img.shields.io/badge/Version-%s-orange.svg"><img src="https://img.shields.io/badge/Version-%s-orange.svg" alt="Version"/></a></p>', ver, ver)
verbadge <- ''
```

[![Build Status](https://travis-ci.org/trinker/parsent.svg?branch=master)](https://travis-ci.org/trinker/parsent)
[![Coverage Status](https://coveralls.io/repos/trinker/parsent/badge.svg?branch=master)](https://coveralls.io/r/trinker/parsent?branch=master)
`r verbadge`

```r
library(knitr)
knit_hooks$set(htmlcap = function(before, options, envir) {
    if(!before) {
        paste('<p class="caption"><b><em>',options$htmlcap,"</em></b></p>",sep="")
    }
})
knitr::opts_knit$set(self.contained = TRUE, cache = FALSE)
knitr::opts_chunk$set(fig.path = "tools/figure/")
```

parsent is a collection of tools used to parse sentences. The package is a wrapper for the NLP/openNLP packages that simplifies and extends the user experience.

Functions typically fall into one of three task categories: (1) parsing, (2) converting, and (3) extracting. The main functions, their task categories, and descriptions are summarized in the table below:

| Function                  | Task       | Description                                                  |
|---------------------------|------------|--------------------------------------------------------------|
| `parser`                  | parsing    | Parse sentences into phrases                                 |
| `parse_annotator`         | parsing    | Generate the OpenNLP parser required by the `parser` function |
| `as_tree`                 | converting | Convert `parser` output into tree form                       |
| `as_square_brace`         | converting | Convert `parser` output into square brace form (vs. round)   |
| `as_square_brace_latex`   | converting | Convert `parser` output into LaTeX-ready form                |
| `get_phrases`             | extracting | Extract phrases from `parser` output                         |
| `get_phrase_type`         | extracting | Extract phrases one step down the tree                       |
| `get_phrase_type_regex`   | extracting | Extract phrases at any level in the tree (uses regex)        |
| `get_leaves`              | extracting | Extract the leaves (tokens or words) from a phrase           |
| `take`                    | extracting | Select indexed elements from a vector                        |

To download the development version of parsent:

Download the zip ball or tar ball, decompress and run `R CMD INSTALL` on it, or use the pacman package to install the development version:

```r
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh(c(
    "trinker/textshape",
    "trinker/coreNLPsetup",
    "trinker/parsent"
))
```

You are welcome to:

* submit suggestions and bug-reports at: https://github.com/trinker/parsent/issues
* send a pull request on: https://github.com/trinker/parsent/
* compose a friendly e-mail to: tyler.rinker@gmail.com

```r
if (!require("pacman")) install.packages("pacman")
pacman::p_load(parsent, magrittr)

txt <- c(
    "Really, I like chocolate because it is good. It smells great.",
    "Robots are rather evil and most are devoid of decency.",
    "He is my friend.",
    "Clifford the big red dog ate my lunch.",
    "Professor Johns can not teach",
    "",
    NA
)
```

```r
if(!exists('parse_ann')) {
    parse_ann <- parse_annotator()
}

(x <- parser(txt, parse.annotator = parse_ann))
```

Note that the user may choose to use CoreNLP as a backend by setting `engine = "coreNLP"`. To ensure that coreNLP is set up properly, use `check_setup`.

```r
par(mar = c(0,0,0,.7) + 0.2)
plot(x[[2]])
```

```r
par(
    mfrow = c(3, 2),
    mar = c(0,0,1,1) + 0.1
)
invisible(lapply(x[1:5], plot))
```

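```r
# Noun phrases one step down the tree, then the leaves of the selected phrase
get_phrase_type(x, "NP") %>%
    take() %>%
    get_leaves()

# Verb phrases at any level, then the verb or modal tags (VB*, MD) inside them
get_phrase_type_regex(x, "VP") %>%
    take() %>%
    get_phrase_type_regex("(VB|MD)") %>%
    take() %>%
    get_leaves()

# Verb phrases at any level, then the noun phrases nested inside them
get_phrase_type_regex(x, "VP") %>%
    take() %>%
    get_phrase_type_regex("NP") %>%
    take() %>%
    get_leaves()
```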
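
To make the ideas of "leaves" and regex tag matching more concrete, here is a minimal base R sketch that does not call parsent at all. The bracketed string `parse_str` is a hand-written, Penn Treebank-style example (an assumption for illustration, not captured `parser` output), and the regular expressions only approximate what the extraction functions do. parsent's own implementation may differ.

```r
# Hand-written Penn Treebank-style bracketed parse.
# Assumption: illustrative only, not real parser() output.
parse_str <- "(ROOT (S (NP (PRP He)) (VP (VBZ is) (NP (PRP$ my) (NN friend))) (. .)))"

# Leaves are the tokens that sit directly before a closing parenthesis;
# tags such as NP or VBZ are followed by a space instead, so they are skipped.
leaves <- regmatches(
    parse_str,
    gregexpr("[^\\s()]+(?=\\))", parse_str, perl = TRUE)
)[[1]]
leaves

# A pattern such as "(VB|MD)" matches any tag containing VB or MD,
# so one pattern covers the verb tags (VBZ, VBD, ...) and modals.
tags <- c("VBZ", "MD", "NN", "VBD", "NP")
verb_like <- tags[grepl("(VB|MD)", tags)]
verb_like
```

Here `leaves` holds the words and punctuation of the sentence in order, and `verb_like` keeps only the verb and modal tags. This is why a single call with `"(VB|MD)"` can pick up verbs regardless of tense.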