tagsets | R Documentation |
Tag sets frequently used in Natural Language Processing.
Penn_Treebank_POS_tags
Brown_POS_tags
Universal_POS_tags
Universal_POS_tags_map
Penn_Treebank_POS_tags and Brown_POS_tags provide, respectively, the Penn Treebank POS tags
(https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html, Table 2)
and the POS tags used for the Brown corpus
(https://en.wikipedia.org/wiki/Brown_Corpus),
both as data frames with the following variables:
entry: a character vector with the POS tags
description: a character vector with short descriptions of the tags
examples: a character vector with examples for the tags
Universal_POS_tags provides the universal POS tagset introduced
by Slav Petrov, Dipanjan Das, and Ryan McDonald
(doi:10.48550/arXiv.1104.2086), as a data frame with character
variables entry and description.
Universal_POS_tags_map is a named list of mappings from
language- and treebank-specific POS tagsets to the universal POS tags,
with elements named ‘en-ptb’ and ‘en-brown’ giving the
mappings for the Penn Treebank and Brown POS tags, respectively.
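The idea of mapping treebank-specific tags to universal tags can be sketched with a small hand-written lookup. The vector toy_map below is a hypothetical stand-in, not an element of Universal_POS_tags_map; how the real elements are stored is not stated here, so inspect them with str(Universal_POS_tags_map) before relying on this pattern.

```r
## Hypothetical stand-in for one tagset-specific mapping: a named
## character vector whose names are Penn Treebank tags and whose values
## are universal POS tags. The real elements of Universal_POS_tags_map
## may be stored differently.
toy_map <- c(DT = "DET", JJ = "ADJ", NN = "NOUN", NNS = "NOUN", VBZ = "VERB")

## Penn Treebank tags for a short tagged sentence (illustrative only).
ptb_tags <- c("DT", "JJ", "NN", "VBZ", "NNS")

## Look up each tag by name; tags missing from the map would become NA.
universal_tags <- unname(toy_map[ptb_tags])
universal_tags
```

Indexing by name keeps the lookup vectorized, and any tag absent from the map shows up as NA rather than being silently dropped.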
https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html, http://www.nltk.org/nltk_data/, https://github.com/slavpetrov/universal-pos-tags.
## Penn Treebank POS tags
dim(Penn_Treebank_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Penn_Treebank_POS_tags, 20L))
## Brown POS tags
dim(Brown_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Brown_POS_tags, 20L))
## Universal POS tags
Universal_POS_tags
## Available mappings to universal POS tags
names(Universal_POS_tags_map)