get_phrase_type_regex: Regex Grab of Top Nest Phrases

Description

Uses a regular expression to grab top-level phrases of a given type, together with their corresponding sub-phrases and words. For example, given x <- "(NP, x)(NP, (VP a)(NP y))(VB z)", requesting "NP" will extract "(NP, x)" and "(NP, (VP a)(NP y))" but not the "(NP y)" nested within "(NP, (VP a)(NP y))". This function is preferable to get_phrase_type for certain parsing tasks in that it can be used at any level of the parse.
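The behaviour described above can be approximated in base R with a recursive PCRE pattern. The sketch below is only an illustration and is not the package's implementation: `top_nest_sketch` is a hypothetical helper, and the trailing word boundary after the phrase label is an assumption of this sketch.

```r
# Hypothetical helper, not part of parsent: grab outermost "(PHRASE ...)" groups.
top_nest_sketch <- function(x, phrase) {
    # Group 1 matches any balanced parenthesised group and recurses into itself.
    balanced <- "(\\((?:[^()]++|(?1))*\\))"
    # "(" + phrase label + any mix of non-paren text and balanced groups + ")".
    # The \\b after the label is an assumption made for this sketch.
    pattern <- paste0("\\((?:", phrase, ")\\b(?:[^()]++|", balanced, ")*\\)")
    # gregexpr() returns non-overlapping matches, so a phrase nested inside an
    # already matched phrase is not returned separately.
    regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

x <- "(NP, x)(NP, (VP a)(NP y))(VB z)"
top_nest_sketch(x, "NP")
# two elements: "(NP, x)" and "(NP, (VP a)(NP y))"
```

Because matching resumes after each outer match, the nested "(NP y)" is consumed as part of its parent and never reported on its own.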

Usage

get_phrase_type_regex(x, phrase)

Arguments

x

A parsed character string or list (see parser).

phrase

A phrase type to extract phrases and corresponding words (see http://www.surdeanu.info/mihai/teaching/ista555-spring15/readings/PennTreebankConstituents.html for more on phrase types).

Value

Returns a list of character vectors of extracted phrases.
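To show the shape of such a return value without running the parser, the following base R sketch builds a list with one character vector per input element. `extract_top_sketch` is a hypothetical stand-in, not the package's function. How the real function treats `NA` or unmatched input is not stated here, so returning an empty vector in those cases is an assumption of this sketch.

```r
# Hypothetical stand-in, not part of parsent: one character vector per element.
extract_top_sketch <- function(x, phrase) {
    # Group 1 matches any balanced parenthesised group (recursive PCRE).
    balanced <- "(\\((?:[^()]++|(?1))*\\))"
    pattern <- paste0("\\((?:", phrase, ")\\b(?:[^()]++|", balanced, ")*\\)")
    lapply(x, function(elem) {
        # Assumption: missing input yields an empty character vector.
        if (is.na(elem)) return(character(0))
        regmatches(elem, gregexpr(pattern, elem, perl = TRUE))[[1]]
    })
}

# Toy bracketed strings standing in for parser output.
parsed <- c("(NP a)(VP (VB b)(NP c))", "(VP d)", NA)
res <- extract_top_sketch(parsed, "NP")
str(res)
```

In this toy input the "(NP c)" inside the VP is still returned, since no enclosing NP consumes it. That mirrors the chained calls in the Examples, where an NP is pulled out of a previously extracted VP.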

Author(s)

Jason Gray and Tyler Rinker <tyler.rinker@gmail.com>.

References

http://stackoverflow.com/a/32899764/1000343

Examples

## Not run: 
txt <- c(
    "Really, I like chocolate because it is good. It smells great.",
    "Robots are rather evil and most are devoid of decency.",
    "He is my friend.",
    "Clifford the big red dog ate my lunch.",
    "Professor Johns can not teach",
    "",
    NA
)

if(!exists('parse_ann')) {
    parse_ann <- parse_annotator()
}
(x <- parser(txt, parse_ann))

get_phrase_type_regex(x, "VP")
get_phrase_type_regex(x, "NP")
get_phrase_type_regex(x, "VBZ")
get_phrase_type_regex(x, "V")

## get the words
get_leaves(get_phrase_type_regex(x, "NP"))

## As a dplyr chain
library(dplyr)
x %>%
    get_phrase_type_regex("NP") %>%
    get_leaves()

## With `get_phrase_type` as a dplyr chain (dplyr is already attached above)
x %>%
    get_phrase_type("NP") %>%
    lapply(get_phrase_type_regex, "(PRP|NN)") %>%
    lapply(unlist)

## Subject
get_phrase_type(x, "NP") %>%
    take() %>%
    get_leaves()

## Predicate Verb
get_phrase_type_regex(x, "VP") %>%
    take() %>%
    get_phrase_type_regex("(VB|MD)") %>%
    take() %>%
    get_leaves()

## Direct Object
get_phrase_type_regex(x, "VP") %>%
    take() %>%
    get_phrase_type_regex("NP") %>%
    take() %>%
    get_leaves()

## End(Not run)

trinker/parsent documentation built on May 31, 2019, 9:41 p.m.