tagsets: NLP Tag Sets

tagsets    R Documentation

NLP Tag Sets

Description

Tag sets frequently used in Natural Language Processing.

Usage

Penn_Treebank_POS_tags
Brown_POS_tags
Universal_POS_tags
Universal_POS_tags_map

Details

Penn_Treebank_POS_tags and Brown_POS_tags provide, respectively, the Penn Treebank POS tags (https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html, Table 2) and the POS tags used for the Brown corpus (https://en.wikipedia.org/wiki/Brown_Corpus), both as data frames with the following variables:

entry

a character vector with the POS tags

description

a character vector with short descriptions of the tags

examples

a character vector with examples for the tags

Universal_POS_tags provides the universal POS tagset introduced by Slav Petrov, Dipanjan Das, and Ryan McDonald (doi:10.48550/arXiv.1104.2086), as a data frame with character variables entry and description.

Universal_POS_tags_map is a named list of mappings from language- and treebank-specific POS tagsets to the universal POS tags, with elements named ‘en-ptb’ and ‘en-brown’ giving the mappings for the Penn Treebank and Brown POS tags, respectively.
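A minimal sketch of how such a mapping can be applied to a sequence of Penn Treebank tags, assuming each element of Universal_POS_tags_map is a named character vector whose names are the source tags and whose values are the universal tags. The small vector ptb_to_universal is a hypothetical, hand-written excerpt standing in for Universal_POS_tags_map[["en-ptb"]], so the block runs in base R without loading NLP.

```r
## Assumption: a mapping is a named character vector (source tag -> universal tag).
## Hypothetical excerpt standing in for Universal_POS_tags_map[["en-ptb"]].
ptb_to_universal <- c(DT = "DET", JJ = "ADJ", NN = "NOUN", NNS = "NOUN",
                      VBZ = "VERB", IN = "ADP")

## Penn Treebank tags for "The quick fox jumps".
ptb_tags <- c("DT", "JJ", "NN", "VBZ")

## Index the named vector by tag; unname() drops the source-tag names.
## Tags missing from the mapping would come back as NA.
universal_tags <- unname(ptb_to_universal[ptb_tags])
print(universal_tags)
```

Here the result is DET, ADJ, NOUN, VERB. If the assumed structure holds, the same name-based indexing would apply directly to Universal_POS_tags_map[["en-ptb"]] or Universal_POS_tags_map[["en-brown"]] once NLP is loaded.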

Source

https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html, http://www.nltk.org/nltk_data/, https://github.com/slavpetrov/universal-pos-tags.

Examples

## Penn Treebank POS tags
dim(Penn_Treebank_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Penn_Treebank_POS_tags, 20L))

## Brown POS tags
dim(Brown_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Brown_POS_tags, 20L))

## Universal POS tags
Universal_POS_tags

## Available mappings to universal POS tags
names(Universal_POS_tags_map)

NLP documentation built on Sept. 11, 2024, 6:59 p.m.
