NLP Tag Sets

Description

Tag sets frequently used in Natural Language Processing.

Usage

Penn_Treebank_POS_tags
Brown_POS_tags
Universal_POS_tags
Universal_POS_tags_map

Details

Penn_Treebank_POS_tags and Brown_POS_tags provide, respectively, the Penn Treebank POS tags (http://www.cis.upenn.edu/~treebank) and the POS tags used for the Brown corpus (http://www.hit.uib.no/icame/brown/bcm.html), both as data frames with the following variables:

entry

a character vector with the POS tags

description

a character vector with short descriptions of the tags

examples

a character vector with examples for the tags
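
Because both data frames share the columns entry, description and examples, a tag can be looked up by plain subsetting. The following is a minimal sketch, assuming the data sets are provided by the CRAN package NLP; lookup_tags is a hypothetical helper written for illustration and is not part of the package.

```r
## Assumption: the tag sets are exported by the NLP package.
library(NLP)

## Hypothetical helper (not part of the package): return the rows of a
## tag set whose 'entry' matches one of the requested tags.
lookup_tags <- function(tagset, tags) {
    tagset[tagset$entry %in% tags, c("entry", "description", "examples")]
}

## Look up two Penn Treebank tags.
res <- lookup_tags(Penn_Treebank_POS_tags, c("NN", "VBZ"))
write.dcf(res)
```

The same helper should work unchanged on Brown_POS_tags, since it only relies on the three documented columns.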

Universal_POS_tags provides the universal POS tagset introduced by Slav Petrov, Dipanjan Das, and Ryan McDonald (http://arxiv.org/abs/1104.2086), as a data frame with character variables entry and description.

Universal_POS_tags_map is a named list of mappings from language and treebank specific POS tagsets to the universal POS tags, with elements named en-ptb and en-brown giving the mappings, respectively, for the Penn Treebank and Brown POS tags.
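
The internal layout of each mapping is not described here, so it may be worth inspecting an element before relying on it. The following sketch, again assuming the data come from the NLP package, extracts the Penn Treebank mapping by name and prints its structure; the variable ptb_map is introduced only for illustration.

```r
## Assumption: the tag sets are exported by the NLP package.
library(NLP)

## Extract the mapping for the Penn Treebank tags by its element name.
ptb_map <- Universal_POS_tags_map[["en-ptb"]]

## Show how the mapping is stored before using it in further code.
str(ptb_map)
```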

Source

http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html, http://www.comp.leeds.ac.uk/ccalas/tagsets/brown.html, https://code.google.com/p/universal-pos-tags/.

Examples

## Penn Treebank POS tags
dim(Penn_Treebank_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Penn_Treebank_POS_tags, 20L))

## Brown POS tags
dim(Brown_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Brown_POS_tags, 20L))

## Universal POS tags
Universal_POS_tags

## Available mappings to universal POS tags
names(Universal_POS_tags_map)