pos: Parts of Speech Tagging

Description

pos - Apply part of speech tagger to transcript(s).

pos.by - Apply part of speech tagger to transcript(s) by zero or more grouping variable(s).

pos.tags - Useful for interpreting the parts of speech tags created by pos and pos.by.

Usage

  pos(text.var, parallel = FALSE, na.omit = FALSE,
    digits = 1, progress.bar = TRUE, percent = TRUE,
    zero.replace = 0, gc.rate = 10)

  pos.by(text.var, grouping.var = NULL, digits = 1,
    percent = TRUE, zero.replace = 0, ...)

  pos.tags(type = "pretty")

Arguments

text.var

The text variable

parallel

logical. If TRUE, attempts to run the function on multiple cores. This may not yield a speed boost if you have only one core or if the data set is small, because the cluster takes time to create.

na.omit

logical. If TRUE missing values (NA) will be omitted.

digits

Integer; number of decimal places to round when printing.

progress.bar

logical. If TRUE, attempts to provide an OS-appropriate progress bar. If parallel is TRUE, this argument is ignored. Note that setting this argument to TRUE may slow down the function.

percent

logical. If TRUE, output is given as a percent. If FALSE, output is given as a proportion.

zero.replace

Value to replace 0 values with.

gc.rate

An integer value. This is a necessary argument because of a problem with the garbage collection in the openNLP function that pos wraps. Consider adjusting this argument upward if the error java.lang.OutOfMemoryError occurs.

grouping.var

The grouping variables. Default NULL generates one word list for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.

...

Other arguments supplied to pos.

type

An optional character string giving the output format of the pos tags. This must be one of the strings "pretty" (a left justified version of the output optimized for viewing but not good for export), "matrix" (a matrix version of the output), "dataframe"/"df" (a dataframe version of the output), or "all" (a list of all three of the previous output types).
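The interaction of percent, digits, and zero.replace can be illustrated with a small base R sketch. This is not qdap's implementation: the helper row_share and the tag counts are hypothetical, made up only to show the arithmetic the arguments describe.

```r
# Toy tag counts per row; these numbers are made up for illustration.
freq <- matrix(c(2, 0, 1,
                 1, 3, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("DT", "NN", "VB")))

# Hypothetical helper (not part of qdap): convert counts to row-wise shares.
row_share <- function(freq, percent = TRUE, digits = 1, zero.replace = 0) {
  prop <- freq / rowSums(freq)       # proportions: each row sums to 1
  if (percent) prop <- prop * 100    # percents: each row sums to 100
  out <- round(prop, digits)
  out[freq == 0] <- zero.replace     # substitute the chosen value for zero cells
  out
}

pct  <- row_share(freq)                               # percent, 1 decimal place
prp  <- row_share(freq, percent = FALSE, digits = 3)  # proportion, 3 decimal places
miss <- row_share(freq, zero.replace = NA)            # zero cells shown as NA
```

Replacing zeros with a distinct value such as NA can make sparse tag tables easier to scan, since most rows use only a few of the available tags.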

Value

pos returns a list with the following elements:

text

The original text

POStagged

The original words replaced with parts of speech in context.

POSprop

Dataframe of the proportion of parts of speech by row.

POSfreq

Dataframe of the frequency of parts of speech by row.

POSrnp

Dataframe of the frequency and proportions of parts of speech by row.

percent

The value of percent used for plotting purposes.

zero.replace

The value of zero.replace used for plotting purposes.

pos.by returns a list with the following elements:

text

The original text

POStagged

The original words replaced with parts of speech in context.

POSprop

Dataframe of the proportion of parts of speech by row.

POSfreq

Dataframe of the frequency of parts of speech by row.

POSrnp

Dataframe of the frequency and proportions of parts of speech by row.

pos.by.prop

Dataframe of the proportion of parts of speech by grouping variable.

pos.by.freq

Dataframe of the frequency of parts of speech by grouping variable.

pos.by.rnp

Dataframe of the frequency and proportions of parts of speech by grouping variable.

percent

The value of percent used for plotting purposes.

zero.replace

The value of zero.replace used for plotting purposes.
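The relationship between the by-row and by-group tables can be sketched in base R. This is only an illustration under assumed data, not qdap's internals: the counts and the grouping vector sex below are made up, and rowsum is used as one plausible way to aggregate.

```r
# Made-up per-row tag counts, one row per utterance.
freq <- matrix(c(1, 2, 0,
                 0, 1, 1,
                 2, 0, 1,
                 1, 3, 1),
               nrow = 4, byrow = TRUE,
               dimnames = list(NULL, c("DT", "NN", "VB")))
sex <- c("m", "f", "m", "f")   # hypothetical grouping variable

# Sum the per-row counts within each group (result rows are sorted by label).
by_freq <- rowsum(freq, group = sex)

# Express each group's counts as a percent of that group's total tags.
by_prop <- round(100 * by_freq / rowSums(by_freq), 1)
```

Here by_freq plays the role of a frequency-by-group table and by_prop the role of a proportion-by-group table. The percents are computed from the pooled group counts rather than by averaging the per-row percents.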

References

http://opennlp.apache.org

See Also

tagPOS

Examples

posdat <- pos(DATA$state)
ltruncdf(posdat, 7, 4)
## str(posdat)
names(posdat)
posdat$text           #original text
posdat$POStagged      #words replaced with parts of speech
posdat$POSprop[, 1:8] #proportion of parts of speech by row
posdat$POSfreq        #frequency of parts of speech by row

out1 <- pos(DATA$state, parallel = TRUE) # not always useful
ltruncdf(out1, 7, 4)

#use pos.tags to interpret part of speech tags used by pos & pos.by
pos.tags()[1:10, ]
pos.tags("matrix")[1:10, ]
pos.tags("dataframe")[1:10, ]
pos.tags("df")[1:10, ]
ltruncdf(pos.tags("all"), 3)

posbydat <- with(DATA, pos.by(state, sex))
names(posbydat)
ltruncdf(posbydat, 7, 4)
truncdf(posbydat$pos.by.prop, 4)

POSby <- with(DATA, pos.by(state, list(adult, sex)))
plot(POSby, values = TRUE, digits = 2)
#or more quickly - reuse the output from before
out2 <- with(DATA, pos.by(posbydat, list(adult, sex)))

trinker/qdap2 documentation built on May 31, 2019, 9:47 p.m.