Pre-trained universal POS tagging models for 40+ languages

The pre-trained models were trained on data from Universal Dependencies (UD) v2.0. The table below reports universal POS tagging accuracy (UPOS) on the UD v2.0 test sets used in the CoNLL 2017 shared task. Accuracy is computed with respect to gold-standard segmentation:

UD treebank              | ltcode        | UPOS accuracy
------------------------ | ------------- | -------------
UD_Ancient_Greek         | grc           | 83.61%
UD_Ancient_Greek-PROIEL  | grc_proiel    | 94.69%
UD_Arabic                | ar            | 92.76%
UD_Basque                | eu            | 91.53%
UD_Bulgarian             | bg            | 96.25%
UD_Catalan               | ca            | 96.42%
UD_Chinese               | zh            | 89.12%
UD_Croatian              | hr            | 95.04%
UD_Czech                 | cs            | 97.76%
UD_Czech-CAC             | cs_cac        | 98.05%
UD_Czech-CLTT            | cs_cltt       | 96.74%
UD_Danish                | da            | 93.48%
UD_Dutch                 | nl            | 90.73%
UD_Dutch-LassySmall      | nl_lassysmall | 95.77%
UD_English               | en            | 92.88%
UD_English-LinES         | en_lines      | 94.00%
UD_English-ParTUT        | en_partut     | 92.89%
UD_Estonian              | et            | 86.66%
UD_Finnish               | fi            | 92.11%
UD_Finnish-FTB           | fi_ftb        | 89.27%
UD_French                | fr            | 95.39%
UD_French-ParTUT         | fr_partut     | 91.45%
UD_French-Sequoia        | fr_sequoia    | 95.63%
UD_Galician              | gl            | 96.12%
UD_Galician-TreeGal      | gl_treegal    | 85.91%
UD_German                | de            | 90.24%
UD_Gothic                | got           | 93.48%
UD_Greek                 | el            | 94.24%
UD_Hebrew                | he            | 93.03%
UD_Hindi                 | hi            | 94.91%
UD_Hungarian             | hu            | 87.47%
UD_Indonesian            | id            | 90.80%
UD_Irish                 | ga            | 82.36%
UD_Italian               | it            | 96.22%
UD_Japanese              | ja            | 95.12%
UD_Korean                | ko            | 89.33%
UD_Latin                 | la            | 81.72%
UD_Latin-ITTB            | la_ittb       | 96.87%
UD_Latin-PROIEL          | la_proiel     | 94.31%
UD_Latvian               | lv            | 86.51%
UD_Norwegian-Bokmaal     | no_bokmaal    | 94.47%
UD_Norwegian-Nynorsk     | no_nynorsk    | 94.46%
UD_Old_Church_Slavonic   | cu            | 92.61%
UD_Persian               | fa            | 95.64%
UD_Polish                | pl            | 94.18%
UD_Portuguese            | pt            | 94.72%
UD_Portuguese-BR         | pt_br         | 95.49%
UD_Romanian              | ro            | 95.79%
UD_Russian               | ru            | 92.62%
UD_Russian-SynTagRus     | ru_syntagrus  | 96.88%
UD_Slovak                | sk            | 89.75%
UD_Slovenian             | sl            | 94.62%
UD_Slovenian-SST         | sl_sst        | 87.59%
UD_Spanish               | es            | 95.32%
UD_Spanish-AnCora        | es_ancora     | 96.62%
UD_Swedish               | sv            | 94.18%
UD_Swedish-LinES         | sv_lines      | 92.65%
UD_Turkish               | tr            | 92.10%
UD_Urdu                  | ur            | 91.66%
UD_Vietnamese            | vi            | 86.78%
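To make the metric concrete, here is a minimal Python sketch of UPOS accuracy under gold-standard segmentation. Because the predicted and gold files share the same tokenization, words can be compared one-to-one. The sketch assumes CoNLL-U input and skips multiword-token ranges and empty nodes. The function names `read_upos` and `upos_accuracy` are hypothetical. This is a simplified stand-in, not the official CoNLL 2017 evaluation script (which also aligns words when segmentation differs), and not necessarily how the figures above were produced.

```python
from typing import Iterable, Iterator, List, Tuple


def read_upos(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (FORM, UPOS) for each syntactic word in CoNLL-U text.

    Blank lines, comments, multiword-token ranges (IDs like "1-2") and
    empty nodes (IDs like "8.1") are skipped.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols[1], cols[3]  # FORM is column 2, UPOS is column 4


def upos_accuracy(gold_lines: Iterable[str], pred_lines: Iterable[str]) -> float:
    """Return the percentage of words whose predicted UPOS equals the gold UPOS.

    Assumes gold-standard segmentation: both inputs list the same words in
    the same order, so words are compared position by position.
    """
    gold: List[Tuple[str, str]] = list(read_upos(gold_lines))
    pred: List[Tuple[str, str]] = list(read_upos(pred_lines))
    if len(gold) != len(pred):
        raise ValueError("gold and predicted inputs have different word counts")
    if not gold:
        raise ValueError("no words to evaluate")
    correct = 0
    for (g_form, g_tag), (p_form, p_tag) in zip(gold, pred):
        if g_form != p_form:
            raise ValueError(f"segmentation mismatch: {g_form!r} vs {p_form!r}")
        correct += g_tag == p_tag
    return 100.0 * correct / len(gold)


# Toy example: one sentence, with "cat" mis-tagged as PROPN in the prediction.
GOLD = [
    "# text = The cat sleeps.",
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tcat\tcat\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "",
]
PRED = [line.replace("\tNOUN\t", "\tPROPN\t") for line in GOLD]

if __name__ == "__main__":
    print(f"UPOS: {upos_accuracy(GOLD, PRED):.2f}%")  # UPOS: 75.00%
```

With three of the four words tagged correctly, the toy example scores 75.00%. Each figure in the table is the same ratio taken over every word in a treebank's test set.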

Results of experiments conducted on UD v1.3 are here.



bnosac/RDRPOSTagger documentation built on May 8, 2019, 3:43 p.m.