get_leaves: Extract Tokens from a Phrase

Description

Extract the tokens from a phrase.

Usage

get_leaves(x, regex = "@tokens")

Arguments

x

A list/vector of phrases.

regex

A regular expression used to extract tokens. The default, "@tokens", extracts all tokens. Use "@words", or the pattern "(?<=\s)[A-Za-z'-]+(?=\))", to extract words only.

Value

Returns a list of vectors of extracted tokens.
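
To make the word pattern and the shape of the return value concrete, here is a minimal base R sketch. It is not the package's implementation: the bracketed phrases are hypothetical stand-ins for what a parse might contain, and the names phrases, word_regex, and leaves are introduced only for illustration. It shows that the pattern keeps strings preceded by whitespace and followed by a closing parenthesis, which skips the phrase and part-of-speech tags.

```r
# Hypothetical bracketed noun phrases (assumed format, not real parser output)
phrases <- c(
    "(NP (DT the) (JJ big) (JJ red) (NN dog))",
    "(NP (PRP$ my) (NN lunch))"
)

# Word pattern quoted in the description of the regex argument
word_regex <- "(?<=\\s)[A-Za-z'-]+(?=\\))"

# perl = TRUE is needed because the pattern uses lookbehind and lookahead
leaves <- regmatches(phrases, gregexpr(word_regex, phrases, perl = TRUE))

# One character vector per phrase:
# leaves[[1]] is c("the", "big", "red", "dog")
# leaves[[2]] is c("my", "lunch")
leaves
```

Tags such as NP and DT are dropped because each follows an opening parenthesis rather than whitespace.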

Examples

## Not run: 
txt <- c(
    "Really, I like chocolate because it is good. It smells great.",
    "Robots are rather evil and most are devoid of decency.",
    "He is my friend.",
    "Clifford the big red dog ate my lunch.",
    "Professor Johns can not teach",
    "",
    NA
)

parse_ann <- parse_annotator()
(x <- parser(txt, parse_ann))

get_leaves(get_phrase_type_regex(x, "NP"))

## As a dplyr chain
library(dplyr)
x %>%
    get_phrase_type_regex("NP") %>%
    get_leaves()

## Just words (in this case no difference)
x %>%
    get_phrase_type_regex("NP") %>%
    get_leaves("@words")

## End(Not run)

trinker/parser documentation built on May 31, 2019, 9:41 p.m.