View source: R/parse.pos.tags.R
parse.pos.tags | R Documentation |
Function for extracting textual data from annotated corpora. It understands the output formats of Stanford Tagger, TreeTagger, TaKIPI (a tagger for Polish), and Alpino (a tagger for Dutch). Part-of-speech tags, words, or lemmata can be extracted.
parse.pos.tags(input.text, tagger = "stanford", feature = "pos")
input.text |
any string of characters (e.g. vector) containing markup tags that have to be deleted. |
tagger |
choose the input format: "stanford" for Stanford Tagger, "treetagger" for TreeTagger, "takipi" for TaKIPI. |
feature |
choose "pos" (default), "word", or "lemma" (this one is not available for the Stanford-formatted input). |
If the function is applied to a single text, a vector of extracted features is returned. If it is applied to a corpus (a list, preferably of class "stylo.corpus"), a list of preprocessed texts is returned.
Maciej Eder
load.corpus, txt.to.words, txt.to.words.ext, txt.to.features
text = "I_PRP have_VBP just_RB returned_VBN from_IN a_DT visit_NN
to_TO my_PRP$ landlord_NN -_: the_DT solitary_JJ neighbor_NN that_IN
I_PRP shall_MD be_VB troubled_VBN with_IN ._. This_DT is_VBZ certainly_RB
a_DT beautiful_JJ country_NN !_. In_IN all_DT England_NNP ,_, I_PRP do_VBP
not_RB believe_VB that_IN I_PRP could_MD have_VB fixed_VBN on_IN a_DT
situation_NN so_RB completely_RB removed_VBN from_IN the_DT stir_VB of_IN
society_NN ._."
parse.pos.tags(text, tagger = "stanford", feature = "word")
parse.pos.tags(text, tagger = "stanford", feature = "pos")
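For readers unfamiliar with the Stanford `word_TAG` format used in the example above, the following base-R sketch shows how such a string can be split into words and tags. It is only an illustration of the input format, not the code used by parse.pos.tags; the helper name `split_stanford_sketch` and the variable `tagged.sample` are hypothetical.

```r
# Hypothetical helper (not part of stylo): split Stanford-style "word_TAG"
# tokens into words and part-of-speech tags, using base R only.
split_stanford_sketch <- function(tagged) {
  # tokens are separated by whitespace (spaces or line breaks)
  tokens <- strsplit(tagged, "[[:space:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]  # drop empty strings from leading whitespace
  list(
    word = sub("_[^_]*$", "", tokens),  # everything before the last underscore
    pos  = sub("^.*_", "", tokens)      # everything after the last underscore
  )
}

tagged.sample <- "I_PRP have_VBP just_RB returned_VBN to_TO my_PRP$ landlord_NN"
result <- split_stanford_sketch(tagged.sample)
result$word
result$pos
```

Splitting on the last underscore, rather than the first, is an assumption of this sketch: it keeps a word intact if the word itself contains an underscore.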