tagsets: NLP Tag Sets


Description

Tag sets frequently used in Natural Language Processing.

Usage

Penn_Treebank_POS_tags
Brown_POS_tags
Universal_POS_tags
Universal_POS_tags_map

Details

Penn_Treebank_POS_tags and Brown_POS_tags provide, respectively, the Penn Treebank POS tags (https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html, Table 2) and the POS tags used for the Brown corpus (http://www.hit.uib.no/icame/brown/bcm.html), both as data frames with the following variables:

entry

a character vector with the POS tags

description

a character vector with short descriptions of the tags

examples

a character vector with examples for the tags
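Because both data frames share the entry, description and examples variables, a tag can be looked up by matching on entry. The following is a minimal sketch, assuming the NLP package is installed and attached; describe_tags is a hypothetical helper written for illustration and is not part of NLP.

```r
library(NLP)  # assumption: the NLP package is installed

## Hypothetical helper (not part of NLP): look up the descriptions
## of the given tags in a tag set data frame.
describe_tags <- function(tags, tagset = Penn_Treebank_POS_tags) {
    ## match() keeps the order of 'tags' and gives NA for unknown tags.
    idx <- match(tags, tagset$entry)
    data.frame(entry = tags,
               description = as.character(tagset$description[idx]),
               stringsAsFactors = FALSE)
}

## Penn Treebank tags (the default tag set)
describe_tags(c("JJ", "MD", "EX"))

## The same helper applied to the Brown tags
describe_tags(c("AT", "BEDZ"), tagset = Brown_POS_tags)
```

Tags that do not occur in the chosen tag set yield NA in the description column rather than an error, which may be convenient when checking tagger output against a tag set.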

Universal_POS_tags provides the universal POS tagset introduced by Slav Petrov, Dipanjan Das, and Ryan McDonald (https://arxiv.org/abs/1104.2086), as a data frame with character variables entry and description.

Universal_POS_tags_map is a named list of mappings from language- and treebank-specific POS tagsets to the universal POS tags, with elements named en-ptb and en-brown giving the mappings, respectively, for the Penn Treebank and Brown POS tags.
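The internal layout of each mapping is not described here, so it may be safest to inspect an element before relying on it. A minimal sketch, assuming the NLP package is installed and attached (ptb_map is just an illustrative variable name):

```r
library(NLP)  # assumption: the NLP package is installed

## Check that the two English mappings named above are present.
c("en-ptb", "en-brown") %in% names(Universal_POS_tags_map)

## Extract the Penn Treebank mapping by name and inspect its structure.
ptb_map <- Universal_POS_tags_map[["en-ptb"]]
str(ptb_map)
```

Extracting with [[ by name avoids depending on the position of a mapping within the list.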

Source

https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html, http://www.hit.uib.no/icame/brown/bcm.html, https://github.com/slavpetrov/universal-pos-tags.

Examples

## Penn Treebank POS tags
dim(Penn_Treebank_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Penn_Treebank_POS_tags, 20L))

## Brown POS tags
dim(Brown_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Brown_POS_tags, 20L))

## Universal POS tags
Universal_POS_tags

## Available mappings to universal POS tags
names(Universal_POS_tags_map)

Example output

[1] 45  3
entry: $
description: dollar
examples: $ -$ --$ A$ C$ HK$ M$ NZ$ S$ U.S.$ US$

entry: ``
description: opening quotation mark
examples: ` ``

entry: ''
description: closing quotation mark
examples: ' ''

entry: (
description: opening parenthesis
examples: ( [ {

entry: )
description: closing parenthesis
examples: ) ] }

entry: ,
description: comma
examples: ,

entry: -
description: dash
examples: -

entry: .
description: sentence terminator
examples: . ! ?

entry: :
description: colon or ellipsis
examples: : ; ...

entry: CC
description: conjunction, coordinating
examples: & 'n and both but either et for less minus neither nor or
        plus so therefore times v. versus vs. whether yet

entry: CD
description: numeral, cardinal
examples: mid-1890 nine-thirty forty-two one-tenth ten million 0.5 one
        forty-seven 1987 twenty '79 zero two 78-degrees eighty-four IX
        '60s .025 fifteen 271,124 dozen quintillion DM2,000 ...

entry: DT
description: determiner
examples: all an another any both del each either every half la many
        much nary neither no some such that the them these this those

entry: EX
description: existential there
examples: there

entry: FW
description: foreign word
examples: gemeinschaft hund ich jeux habeas Haementeria Herr K'ang-si
        vous lutihaw alai je jour objets salutaris fille quibusdam pas
        trop Monte terram fiche oui corporis ...

entry: IN
description: preposition or conjunction, subordinating
examples: astride among uppon whether out inside pro despite on by
        throughout below within for towards near behind atop around if
        like until below next into if beside ...

entry: JJ
description: adjective or numeral, ordinal
examples: third ill-mannered pre-war regrettable oiled calamitous first
        separable ectoplasmic battery-powered participatory fourth
        still-to-be-named multilingual multi-disciplinary ...

entry: JJR
description: adjective, comparative
examples: bleaker braver breezier briefer brighter brisker broader
        bumper busier calmer cheaper choosier cleaner clearer closer
        colder commoner costlier cozier creamier crunchier cuter ...

entry: JJS
description: adjective, superlative
examples: calmest cheapest choicest classiest cleanest clearest closest
        commonest corniest costliest crassest creepiest crudest cutest
        darkest deadliest dearest deepest densest dinkiest ...

entry: LS
description: list item marker
examples: A A. B B. C C. D E F First G H I J K One SP-44001 SP-44002
        SP-44005 SP-44007 Second Third Three Two * a b c d first five
        four one six three two

entry: MD
description: modal auxiliary
examples: can cannot could couldn't dare may might must need ought
        shall should shouldn't will would
[1] 226   3
entry: (
description: opening parenthesis
examples: (

entry: )
description: closing parenthesis
examples: )

entry: *
description: negator
examples: not n't

entry: ,
description: comma
examples: ,

entry: --
description: dash
examples: --

entry: .
description: sentence terminator
examples: . ? ; ! :

entry: :
description: colon
examples: :

entry: ABL
description: determiner/pronoun, pre-qualifier
examples: quite such rather

entry: ABN
description: determiner/pronoun, pre-quantifier
examples: all half many nary

entry: ABX
description: determiner/pronoun, double conjunction or pre-quantifier
examples: both

entry: AP
description: determiner/pronoun, post-determiner
examples: many other next more last former little several enough most
        least only very few fewer past same Last latter less single
        plenty 'nough lesser certain various manye next-to-last
        particular final previous present nuf

entry: AP$
description: determiner/pronoun, post-determiner, genitive
examples: other's

entry: AP+AP
description: determiner/pronoun, post-determiner, hyphenated pair
examples: many-much

entry: AT
description: article
examples: the an no a every th' ever' ye

entry: BE
description: verb "to be", infinitive or imperative
examples: be

entry: BED
description: verb "to be", past tense, 2nd person singular or all
        persons plural
examples: were

entry: BED*
description: verb "to be", past tense, 2nd person singular or all
        persons plural, negated
examples: weren't

entry: BEDZ
description: verb "to be", past tense, 1st and 3rd person singular
examples: was

entry: BEDZ*
description: verb "to be", past tense, 1st and 3rd person singular,
        negated
examples: wasn't

entry: BEG
description: verb "to be", present participle or gerund
examples: being
   entry                                  description
1   VERB                 verbs (all tenses and modes)
2   NOUN                    nouns (common and proper)
3   PRON                                     pronouns
4    ADJ                                   adjectives
5    ADV                                      adverbs
6    ADP adpositions (prepositions and postpositions)
7   CONJ                                 conjunctions
8    DET                                  determiners
9    NUM                             cardinal numbers
10   PRT            particles or other function words
11     X   other: foreign words, typos, abbreviations
12     .                                  punctuation
 [1] "ar-padt"       "bg-btb"        "ca-cat3lb"     "cs-pdt"       
 [5] "da-ddt"        "de-negra"      "de-tiger"      "el-gdt"       
 [9] "en-brown"      "en-ptb"        "en-tweet"      "es-cast3lb"   
[13] "eu-eus3lb"     "fi-tdt"        "fr-paris"      "hu-szeged"    
[17] "it-isst"       "iw-mila"       "ja-kyoto"      "ja-verbmobil" 
[21] "ko-sejong"     "nl-alpino"     "pl-ipipan"     "pt-bosque"    
[25] "ru-rnc"        "sl-sdt"        "sv-talbanken"  "tu-metusbanci"
[29] "zh-ctb6"       "zh-sinica"    

NLP documentation built on Oct. 23, 2020, 6:18 p.m.
