tag_pos: Tag Text with Parts of Speech


Description

A wrapper for the NLP and openNLP packages that makes it easy to tag text with parts of speech. The openNLP annotator "computes Penn Treebank parse annotations using the Apache OpenNLP chunking parser for English."

Usage

tag_pos(text.var, engine = "openNLP", element.chunks = floor(2000 *
  (23.5/mean(sapply(text.var, nchar), na.rm = TRUE))), ...)

Arguments

text.var

The text string variable.

engine

The backend part-of-speech tagger, either "openNLP" or "coreNLP". The default, "openNLP", uses the openNLP package. If the Stanford CoreNLP suite (‘http://stanfordnlp.github.io/CoreNLP/’) is installed, it can be used as the tagging backend instead.

element.chunks

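The number of elements to include in each chunk. The text is split into chunks that are processed in turn with lapply, and chunk size is kept within a tolerance to limit memory allocation in the Java-based tagging process. The default scales inversely with element length: 2000 elements per chunk when the mean element length is 23.5 characters, and proportionally fewer for longer elements.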
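The default in Usage and the chunking described above can be sketched in base R. This is a minimal illustration, not tagger's internal code: the helpers default_chunk_size() and chunk_text() are hypothetical names, and splitting with split() is an assumption about how chunks might be formed before being passed through lapply.

```r
# Hypothetical helper: reproduces the documented default for element.chunks,
# floor(2000 * (23.5 / mean number of characters per element)).
default_chunk_size <- function(text.var) {
  floor(2000 * (23.5 / mean(sapply(text.var, nchar), na.rm = TRUE)))
}

# Hypothetical helper: splits a character vector into consecutive chunks of
# at most `size` elements, the kind of list an lapply-based loop would consume.
chunk_text <- function(text.var, size) {
  split(text.var, ceiling(seq_along(text.var) / size))
}

# Toy input: 3000 elements with a mean length of 21 characters.
txt <- rep(c("I do not like green eggs and ham.", "Sam I am."), 1500)

size   <- default_chunk_size(txt)   # floor(2000 * 23.5 / 21) = 2238
chunks <- chunk_text(txt, size)
lengths(chunks)                     # 2238 elements, then the remaining 762
```

Because the mean element length here (21 characters) is a little below 23.5, the default chunk size ends up slightly above 2000.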

...

Other arguments passed to tagger:::core_tagger, including stanford.tagger (default stansent::coreNLP_loc()) and java.path, the paths to CoreNLP and Java respectively. Use check_setup to check that Java is installed and is a suitable version, and that Stanford's CoreNLP is installed in the root directory.

Value

Returns a list of part-of-speech tagged vectors. The pretty printing does not show this structure, but the words and parts of speech are easily accessible through indexing.
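Indexing into the result can be sketched with a hand-built stand-in. This assumes, as c(x) in the Examples is meant to reveal, that each list element is a character vector of words whose names are the part-of-speech tags; the object below is constructed by hand with illustrative tags and is not actual tag_pos output, so run c(x) on a real result to confirm the structure.

```r
# Hand-built stand-in for c(x). ASSUMPTION: each element is a character
# vector of words named by their Penn Treebank tags (illustrative only).
tagged <- list(
  c(PRP = "They", VBP = "refuse", TO = "to", VB = "permit")
)

words <- unname(tagged[[1]])   # the words of the first element
tags  <- names(tagged[[1]])    # the matching tags
verbs <- tagged[[1]][tags == "VB"]  # words tagged as base-form verbs
verbs
```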

Examples

(x <- tag_pos("They refuse to permit us to obtain the refuse permit"))
c(x) ## The true structure of a `tag_pos` object

(out1 <- tag_pos(sam_i_am))
tidy_pos(out1)
as_word_tag(out1)
count_tags(out1)
as_basic(out1)
as_universal(out1)
plot(out1)
## Not run: 
(out2 <- tag_pos(presidential_debates_2012$dialogue)) # ~40 sec run time
count_tags(out2)
count_tags(out2, by = presidential_debates_2012$person)
with(presidential_debates_2012, count_tags(out2, by = list(person, time)))
plot(out2)

## CoreNLP
tag_pos(sam_i_am, engine = 'coreNLP')

## End(Not run)

trinker/tagger documentation built on May 31, 2019, 10:42 p.m.