parse.pos.tags: Extract POS-tags or Words from Annotated Corpora


View source: R/parse.pos.tags.R

Description

Function for extracting textual data from annotated corpora. It understands the output formats of the Stanford Tagger, TreeTagger, TaKIPI (a tagger for Polish), and Alpino (a tagger for Dutch). Part-of-speech tags, words, or lemmata can be extracted.
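In the Stanford format each tag is attached to its word with an underscore (e.g. returned_VBN), as in the example at the end of this page. The base-R sketch below shows one way such tokens could be split into words or tags. It is not the stylo implementation: the helper name split.tagged.tokens is hypothetical, and the sketch assumes whitespace-separated tokens in which the tag is everything after the last underscore.

```r
# Hypothetical helper, NOT part of stylo: splits Stanford-style
# "word_TAG" tokens. Assumes tokens are separated by whitespace and
# that the tag follows the LAST underscore in each token.
split.tagged.tokens = function(x, feature = c("pos", "word")) {
  feature = match.arg(feature)
  # break the text into tokens and drop empty strings
  # (e.g. produced by leading whitespace)
  tokens = unlist(strsplit(x, "[[:space:]]+"))
  tokens = tokens[tokens != ""]
  if (feature == "word") {
    # remove the final "_TAG" part, keeping the word
    sub("_[^_]*$", "", tokens)
  } else {
    # remove everything up to and including the last underscore
    sub("^.*_", "", tokens)
  }
}

sample.text = "I_PRP have_VBP just_RB returned_VBN"
split.tagged.tokens(sample.text, feature = "word")
# returns c("I", "have", "just", "returned")
split.tagged.tokens(sample.text, feature = "pos")
# returns c("PRP", "VBP", "RB", "VBN")
```

Splitting on the last underscore keeps tags such as PRP$ or punctuation tags like ":" intact.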

Usage

parse.pos.tags(input.text, tagger = "stanford", feature = "pos")

Arguments

input.text

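any string of characters (e.g. a vector) containing annotated text, i.e. words with attached part-of-speech tags, from which the requested feature is to be extracted. A corpus (a list of such texts) is also accepted.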

tagger

choose the input format: "stanford" for Stanford Tagger, "treetagger" for TreeTagger, "takipi" for TaKIPI.

feature

choose "pos" (default), "word", or "lemma"; the last option is not available for Stanford-formatted input, which contains no lemmata.

Value

If the function is applied to a single text, a vector of extracted features is returned. If it is applied to a corpus (a list, preferably of class "stylo.corpus"), a list of preprocessed texts is returned.
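The difference between the two return types can be illustrated with a small sketch. The helper names extract.tags and extract.from.corpus are hypothetical and this is not stylo's code; it handles Stanford-style tags only, returning a vector for a single text and, via lapply(), a named list for a corpus.

```r
# Hypothetical per-text extractor (NOT part of stylo): returns the
# tag after the last underscore of each whitespace-separated token.
extract.tags = function(x) {
  tokens = unlist(strsplit(x, "[[:space:]]+"))
  sub("^.*_", "", tokens[tokens != ""])
}

# Hypothetical dispatcher: a list is treated as a corpus and each
# text is processed separately; lapply() keeps the text names.
extract.from.corpus = function(input) {
  if (is.list(input)) {
    lapply(input, extract.tags)
  } else {
    extract.tags(input)
  }
}

corpus = list(text1 = "I_PRP have_VBP", text2 = "so_RB completely_RB")
extract.from.corpus(corpus)          # a named list of two tag vectors
extract.from.corpus("I_PRP have_VBP") # a plain character vector
```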

Author(s)

Maciej Eder

See Also

load.corpus, txt.to.words, txt.to.words.ext, txt.to.features

Examples

text = "I_PRP have_VBP just_RB returned_VBN from_IN a_DT visit_NN 
  to_TO my_PRP$ landlord_NN -_: the_DT solitary_JJ neighbor_NN  that_IN 
  I_PRP shall_MD be_VB troubled_VBN with_IN ._. This_DT is_VBZ certainly_RB 
  a_DT beautiful_JJ country_NN !_. In_IN all_DT England_NNP ,_, I_PRP do_VBP 
  not_RB believe_VB that_IN I_PRP could_MD have_VB fixed_VBN on_IN a_DT 
  situation_NN so_RB completely_RB removed_VBN from_IN the_DT stir_VB of_IN 
  society_NN ._."

parse.pos.tags(text, tagger = "stanford", feature = "word")
parse.pos.tags(text, tagger = "stanford", feature = "pos")
  

stylo documentation built on Oct. 9, 2018, 1:04 a.m.
