pattern_pos: POS tagging using the python pattern package including...


Description

Part-of-speech (POS) tagging using the Python pattern package, including relations. See http://www.clips.ua.ac.be/pattern. Only Dutch, French, English, German, Spanish and Italian are supported.

Usage

pattern_pos(x, language, digest = FALSE, as_html = FALSE, core = FALSE,
  tagset = "penn")

Arguments

x

a character string in UTF-8

language

a character string with possible values 'dutch', 'french', 'english', 'german', 'spanish', 'italian'

digest

logical indicating whether to apply digest::digest to the message

as_html

logical indicating whether to return only the XML (for debugging)

core

logical indicating whether to return only the core fields sentence.id, sentence.language, chunk.id, chunk.type, chunk.pnp, chunk.relation, word.id, word, word.type, word.lemma. Defaults to FALSE, in which case, if deeper-level chunks are found, their information is added to the data.frame as extra columns chunk.leveli.type/chunk.leveli.relation/chunkid.leveli. If set to TRUE, rbind-ing is easier because the result always contains only the core columns. See the examples.

tagset

character string with the tagset to use. Defaults to 'penn' (the Penn Treebank tagset). Other options are 'universal' for the universal tagset, 'WOTAN' for Dutch, 'parole' for Spanish and 'STTS' for German. For French/Italian, if you specify neither 'penn' nor 'universal', the tagset from the Pattern model building phase is used.
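
The effect of core on rbind-ing can be illustrated without calling the tagger at all. The sketch below is only an illustration under stated assumptions: make_mock is a hypothetical helper, the values are made up, and the column name chunk.level1.type is an assumed instance of the chunk.leveli.type pattern described above. It shows that two results with different column sets cannot be stacked directly, whereas results restricted to the core columns can.

```r
# Core columns as listed in the description of the `core` argument.
core_cols <- c("sentence.id", "sentence.language", "chunk.id", "chunk.type",
               "chunk.pnp", "chunk.relation", "word.id", "word", "word.type",
               "word.lemma")

# Hypothetical stand-in for a pattern_pos() result (all values are made up).
make_mock <- function(word, extra = FALSE) {
  out <- data.frame(sentence.id = 1L, sentence.language = "en",
                    chunk.id = 1L, chunk.type = "NP",
                    chunk.pnp = NA_character_, chunk.relation = NA_character_,
                    word.id = 1L, word = word, word.type = "NN",
                    word.lemma = tolower(word), stringsAsFactors = FALSE)
  # Assumed name of a deeper-level column, following chunk.leveli.type.
  if (extra) out$chunk.level1.type <- "NP"
  out
}

a <- make_mock("Dog")                # core columns only
b <- make_mock("Cat", extra = TRUE)  # core columns plus one deeper-level column

# rbind() refuses data frames whose numbers of columns differ.
failed <- inherits(try(rbind(a, b), silent = TRUE), "try-error")

# Restricting both to the core columns makes them stackable.
combined <- rbind(a[, core_cols], b[, core_cols])
```

With core = TRUE the function is documented to return only the core columns, so the manual subsetting step shown here should not be needed in that case.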

Value

a data.frame with at least the elements sentence.id, sentence.language, chunk.id, chunk.type, chunk.pnp, chunk.relation, word.id, word, word.type, word.lemma, or an XML object if as_html is set to TRUE. Note that by default all POS tags are mapped to the Penn Treebank tags, which are available inside this package in penn_treebank_postags.

Examples

x <- "Dus godvermehoeren met pus in alle puisten, zei die schele van Van Bukburg 
 en hij had nog gelijk ook. Er was toen dat liedje van tietenkonttieten kont tieten kontkontkont, 
 maar dat hoefden we geenseens niet te zingen"
pattern_pos(x = x, language = 'dutch')

x <- "Il pleure dans mon coeur comme il pleut sur la ville.
 Quelle est cette langueur qui penetre mon coeur?"
pattern_pos(x = x, language = 'french')

x <- "BNOSAC provides consultancy in open source analytical intelligence. 
 We gather dedicated open source software engineers with a focus on data mining, 
 business intelligence, statistical engineering and advanced artificial intelligence."
pattern_pos(x = x, language = 'english')

x <- "Der Turmer, der schaut zu Mitten der Nacht. 	
 Hinab auf die Graber in Lage
 Der Mond, der hat alles ins Helle gebracht.
 Der Kirchhof, er liegt wie am Tage.
 Da regt sich ein Grab und ein anderes dann."
pattern_pos(x = x, language = 'german')

x <- "Pasaron cuatro jinetes, sobre jacas andaluzas
 con trajes de azul y verde, con largas capas oscuras."
pattern_pos(x = x, language = 'spanish')

x <- "Avevamo vegliato tutta la notte - i miei amici ed io sotto lampade 
 di moschea dalle cupole di ottone traforato, stellate come le nostre anime, 
 perche come queste irradiate dal chiuso fulgore di un cuore elettrico.
 Avevamo lungamente calpestata su opulenti tappeti orientali la nostra atavica accidia, 
 discutendo davanti ai confini estremi della logica 
 ed annerendo molta carta di frenetiche scritture."
pattern_pos(x = x, language = 'italian')

pattern_pos(x = x, language = 'italian', core = TRUE)
pattern_pos(x = x, language = 'italian', core = FALSE)
pattern_pos(x = x, language = 'italian', as_html = TRUE)
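
As a further sketch, several texts could be tagged with core = TRUE and stacked into one data.frame. The helper tag_many and the doc.id column are hypothetical names introduced here for illustration and are not part of the package. The call to pattern_pos is guarded so that it only runs when pattern.nlp is installed, and it additionally assumes a working Python pattern installation.

```r
# Hypothetical helper: tag each text with core = TRUE and stack the results.
# `tagger` is any function with the x/language/core arguments of pattern_pos.
tag_many <- function(texts, language, tagger) {
  parts <- lapply(seq_along(texts), function(i) {
    out <- tagger(x = texts[[i]], language = language, core = TRUE)
    out$doc.id <- i  # remember which input text each row came from
    out
  })
  # Safe to rbind because core = TRUE keeps the column set fixed.
  do.call(rbind, parts)
}

texts <- c("BNOSAC provides consultancy in open source analytical intelligence.",
           "We gather dedicated open source software engineers.")

# Only runs when the package is available.
if (requireNamespace("pattern.nlp", quietly = TRUE)) {
  tagged <- tag_many(texts, "english", pattern.nlp::pattern_pos)
  table(tagged$word.type)
}
```

Passing the tagger as an argument keeps the helper independent of the package, which makes it easy to try out with a stub function.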

bnosac/pattern.nlp documentation built on May 12, 2019, 11:27 p.m.