kRp.POS.tags: Get elaborated word tag definitions


View source: R/kRp.POS.tags.R

Description

This function returns the set of part-of-speech (POS) tags defined for a given language. The tag sets should conform to the ones used by TreeTagger.

Usage

kRp.POS.tags(lang = get.kRp.env(lang = TRUE), list.classes = FALSE,
  list.tags = FALSE, tags = c("words", "punct", "sentc"))

Arguments

lang

A character string defining a language (see details for valid choices).

list.classes

Logical, if TRUE only the known word classes for the chosen language will be returned.

list.tags

Logical, if TRUE only the POS tags for the chosen language will be returned.

tags

A character vector with at least one of "words", "punct" or "sentc".

Details

Currently supported languages are:

For the internal tokenizer a small subset of tags is also defined, available through lang="kRp". If you don't know the language your text was written in, the function guess.lang should be able to detect it.

With the tags argument you can choose between all tag definitions and a subset, e.g. only the tags for punctuation and sentence endings. Note that you need to request both "punct" and "sentc" to get all punctuation tags.

This function is mainly called internally by several other functions rather than used directly. However, it can still be useful for examining the available POS tags yourself.
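The effect of the tags argument can be sketched as follows. This is a minimal example, assuming the koRpus package is installed and that the small "kRp" tag set of the internal tokenizer defines all three subsets; the object names all.tags, punct.tags and word.tags are only illustrative.

```r
library(koRpus)

# All tag definitions of the small tag set used by the internal tokenizer
all.tags <- kRp.POS.tags(lang = "kRp")

# Only punctuation-related tags: both "punct" and "sentc" must be requested
punct.tags <- kRp.POS.tags(lang = "kRp", tags = c("punct", "sentc"))

# Only the tags that describe words
word.tags <- kRp.POS.tags(lang = "kRp", tags = "words")

# Compare the size of the full set with the sizes of the two subsets
nrow(all.tags)
nrow(punct.tags) + nrow(word.tags)
```

Any other supported language abbreviation, such as "de", can be used in place of "kRp", provided the installed koRpus version ships support for it.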

Value

If both list.classes and list.tags are FALSE, a matrix with the word tag definitions of the given language is returned. The matrix has three columns:

tag:

Word tag

class:

Respective word class

desc:

"Human readable" description of what the tag stands for

Otherwise, a vector with the known word classes or POS tags for the chosen language (restricted to the chosen tag subset, if any) is returned. If both list.classes and list.tags are TRUE, only the POS tags are returned.
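The different return values can be inspected like this. It is a hedged sketch, assuming koRpus is installed and the "kRp" tag set is available. The columns are accessed by position, because the actual column names of the matrix may differ from the labels listed above; all object names are illustrative.

```r
library(koRpus)

# Default call: a matrix with one row per tag and three columns
tag.matrix <- kRp.POS.tags(lang = "kRp")
colnames(tag.matrix)   # check the actual column names
tag.matrix[, 1]        # first column: the tags themselves

# Only the known word classes, as a vector
word.classes <- kRp.POS.tags(lang = "kRp", list.classes = TRUE)

# Only the POS tags, as a vector
pos.tags <- kRp.POS.tags(lang = "kRp", list.tags = TRUE)

# With both switches set, the documentation says only POS tags are returned
both <- kRp.POS.tags(lang = "kRp", list.classes = TRUE, list.tags = TRUE)
identical(both, pos.tags)
```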

Author(s)

m.eik michalke, support for Spanish contributed by Earl Brown, support for Italian contributed by Alberto Mirisola.

References

Santorini, B. (1991). Part-of-Speech Tagging Guidelines for the Penn Treebank Project. URL: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Penn-Treebank-Tagset.pdf

Schiller, A., Teufel, S., Stockert, C. & Thielen, C. (1995). Vorläufige Guidelines für das Tagging deutscher Textcorpora mit STTS. URL: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/stts_guide.pdf

Sharoff, S., Kopotev, M., Erjavec, T., Feldman, A. & Divjak, D. (2008). Designing and evaluating Russian tagsets. In: Proc. LREC 2008, Marrakech. URL: http://corpus.leeds.ac.uk/mocky/

See Also

get.kRp.env

Examples

tags.de <- kRp.POS.tags("de")

Example output

Loading required package: data.table

koRpus documentation built on May 30, 2017, 12:47 a.m.