parsent is a collection of tools used to parse sentences. The package is a wrapper for the NLP/openNLP packages that simplifies and extends the user experience.
Functions typically fall into one of three task categories: (1) parsing, (2) converting, and (3) extracting. The main functions, their task categories, and descriptions are summarized in the table below:
| Function | Task | Description |
|---|---|---|
| parser | parsing | Parse sentences into phrases |
| parse_annotator | parsing | Generate OpenNLP parser required by parser function |
| as_tree | converting | Convert parser output into tree form |
| as_square_brace | converting | Convert parser output into square brace form (vs. round) |
| as_square_brace_latex | converting | Convert parser output into LaTeX-ready form |
| get_phrases | extracting | Extract phrases from parser output |
| get_phrase_type | extracting | Extract phrases one step down the tree |
| get_phrase_type_regex | extracting | Extract phrases at any level in the tree (uses regex) |
| get_leaves | extracting | Extract the leaves (tokens or words) from a phrase |
| take | extracting | Select indexed elements from a vector |
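To make the "square brace form (vs. round)" entry concrete, here is a minimal base R sketch of what such a conversion amounts to. It is only an illustration of the two bracket styles, not how as_square_brace is implemented; the variable names are made up for the example, and it assumes the tokens themselves contain no literal parentheses.

```r
# One parse string in round-bracket form, copied from the example output in this README.
parse_round <- "(TOP (S (NP (PRP He)) (VP (VBZ is) (NP (PRP$ my) (NN friend)))(. .)))"

# Swap each round bracket for its square counterpart, character by character.
# Assumption: no token in the sentence is itself a "(" or ")".
parse_square <- chartr("()", "[]", parse_round)

print(parse_square)
```

Square-bracket notation is the form that many tree-drawing tools expect, which is presumably why the package offers the conversion.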
To download the development version of parsent:
Download the zip ball or tar ball, decompress it, and run R CMD INSTALL on it, or use the pacman package to install the development version:
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh(c(
"trinker/textshape",
"trinker/coreNLPsetup",
"trinker/parsent"
))
You are welcome to:
- submit suggestions and bug-reports at: https://github.com/trinker/parsent/issues
- send a pull request on: https://github.com/trinker/parsent/
- compose a friendly e-mail to: tyler.rinker@gmail.com
if (!require("pacman")) install.packages("pacman")
pacman::p_load(parsent, magrittr)
txt <- c(
"Really, I like chocolate because it is good. It smells great.",
"Robots are rather evil and most are devoid of decency.",
"He is my friend.",
"Clifford the big red dog ate my lunch.",
"Professor Johns can not teach",
"",
NA
)
if(!exists('parse_ann')) {
parse_ann <- parse_annotator()
}
(x <- parser(txt, parse.annotator = parse_ann))
## [[1]]
## [1] "(TOP (S (S (ADVP (RB Really))(, ,) (NP (PRP I)) (VP (VBP like) (NP (NN chocolate)) (SBAR (IN because) (S (NP (PRP it)) (VP (VBZ is) (ADJP (JJ good)))))))(. .) (NP (PRP It)) (VP (VBZ smells) (ADJP (JJ great)))(. .)))"
##
## [[2]]
## [1] "(TOP (S (S (NP (NNP Robots)) (VP (VBP are) (ADJP (RB rather) (JJ evil)))) (CC and) (S (NP (RBS most)) (VP (VBP are) (ADJP (JJ devoid) (PP (IN of) (NP (NN decency))))))(. .)))"
##
## [[3]]
## [1] "(TOP (S (NP (PRP He)) (VP (VBZ is) (NP (PRP$ my) (NN friend)))(. .)))"
##
## [[4]]
## [1] "(TOP (S (NP (NNP Clifford)) (NP (DT the) (JJ big) (JJ red) (NN dog)) (VP (VBD ate) (NP (PRP$ my) (NN lunch)))(. .)))"
##
## [[5]]
## [1] "(TOP (S (S (NP (NNP Professor) (NNP Johns)) (VP (MD can))) (RB not) (VB teach)))"
##
## [[6]]
## [1] NA
##
## [[7]]
## [1] NA
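Each parse is a single bracketed string in which every innermost pair has the shape (TAG token). As a rough illustration of that structure, the base R sketch below pulls the tag/token pairs out of one string with a regular expression. It is not how get_leaves works internally; the pattern and variable names are assumptions made for this example.

```r
# One parse string, copied from element 3 of the output above.
parse_string <- "(TOP (S (NP (PRP He)) (VP (VBZ is) (NP (PRP$ my) (NN friend)))(. .)))"

# An innermost pair looks like "(TAG token)": no nested brackets, one space.
leaf_pattern <- "\\(([^ ()]+) ([^ ()]+)\\)"

# Pull out every innermost pair, then split each into its tag and its token.
pairs  <- regmatches(parse_string, gregexpr(leaf_pattern, parse_string))[[1]]
tags   <- sub(leaf_pattern, "\\1", pairs)
tokens <- sub(leaf_pattern, "\\2", pairs)

print(data.frame(tag = tags, token = tokens, stringsAsFactors = FALSE))
```

A regular expression is enough here because the leaves never nest; extracting whole phrases requires matching balanced brackets, which is the part the package's extraction functions take care of.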
Note that the user may choose to use CoreNLP as a backend by setting engine = "coreNLP". To ensure that coreNLP is set up properly, use check_setup.
par(mar = c(0,0,0,.7) + 0.2)
plot(x[[2]])
par(
mfrow = c(3, 2),
mar = c(0,0,1,1) + 0.1
)
invisible(lapply(x[1:5], plot))
get_phrase_type(x, "NP") %>%
take() %>%
get_leaves()
## [[1]]
## [1] "I"
##
## [[2]]
## [1] "Robots"
##
## [[3]]
## [1] "He"
##
## [[4]]
## [1] "Clifford"
##
## [[5]]
## [1] "Professor" "Johns"
##
## [[6]]
## [1] NA
##
## [[7]]
## [1] NA
get_phrase_type_regex(x, "VP") %>%
take() %>%
get_phrase_type_regex("(VB|MD)") %>%
take() %>%
get_leaves()
## [[1]]
## [1] "like"
##
## [[2]]
## [1] "are"
##
## [[3]]
## [1] "is"
##
## [[4]]
## [1] "ate"
##
## [[5]]
## [1] "can"
##
## [[6]]
## [1] NA
##
## [[7]]
## [1] NA
get_phrase_type_regex(x, "VP") %>%
take() %>%
get_phrase_type_regex("NP") %>%
take() %>%
get_leaves()
## [[1]]
## [1] "chocolate"
##
## [[2]]
## NULL
##
## [[3]]
## [1] "my" "friend"
##
## [[4]]
## [1] "my" "lunch"
##
## [[5]]
## NULL
##
## [[6]]
## [1] NA
##
## [[7]]
## [1] NA
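The result above mixes three kinds of elements: character vectors, NULL (which appears where the selected verb phrase contains no noun phrase), and NA (which appears for the empty and missing inputs). For downstream use it is often handy to collapse such a list into one character vector. The base R sketch below shows one possible way; flatten_phrase is a hypothetical helper, not part of parsent, and the list is typed in by hand from the printed output so the example runs without the package.

```r
# The result printed above, re-entered by hand (assumption: it mirrors the real object).
res <- list("chocolate", NULL, c("my", "friend"), c("my", "lunch"), NULL, NA, NA)

# Collapse one element to a single string.
# NULL (no matching phrase) and NA (nothing parsed) both become NA_character_.
flatten_phrase <- function(v) {
  if (is.null(v) || all(is.na(v))) return(NA_character_)
  paste(v, collapse = " ")
}

# vapply guarantees a character vector with one entry per input sentence.
phrases <- vapply(res, flatten_phrase, character(1))
print(phrases)
```

Keeping one entry per input means the vector can be bound back onto the original text as a new column without losing alignment.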