lemmatizer: Lemmatize texts


View source: R/lemmatizer.r

Description

Given a text vector, returns the lemmata of its words.

Usage

lemmatizer(
  rawtext,
  lang = "it",
  TreeTaggerPath = "C:/TreeTagger",
  parallel = TRUE
)

Arguments

rawtext

A text vector containing the raw texts to lemmatize.

lang

Language of the texts. Defaults to "it" (Italian). The following languages are supported:

  • "it": Italian

  • "en": English

  • "de": German

  • "es": Spanish

  • "fr": French

  • "nl": Dutch

  • "pt": Portuguese

  • "ru": Russian

TreeTaggerPath

File path of the local TreeTagger installation (default "C:/TreeTagger").

parallel

Enables parallel processing, which speeds up lemmatization by using multiple cores (default TRUE). The number of cores is automatically set to the number of available cores minus one.
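
The "available cores minus one" rule described for the parallel argument can be sketched as follows. This is only an illustration under the assumption that the core count comes from parallel::detectCores(); the variable names are hypothetical and the function's actual internals may differ.

```r
# Sketch only: assumes the core count is obtained with parallel::detectCores().
# detectCores() can return NA on some platforms, so fall back to a single core.
available <- parallel::detectCores()

# Leave one core free for the rest of the system, but never go below one.
n_cores <- if (is.na(available)) 1L else max(1L, available - 1L)
```

Leaving one core free keeps the machine responsive while the texts are being processed.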

Details

The function is based on TreeTagger and the related R package koRpus. To install TreeTagger, please refer to its online documentation. The language-specific files available in the following repository are also needed. The function returns the lemmata of the "significant" words (nouns, names, adjectives, verbs, and adverbs) most commonly used in social science work. Unrecognized words are also returned.
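
The selection of "significant" words can be illustrated with a small base R sketch. It does not call TreeTagger or koRpus: the data frame below is hand-written, and its column names (token, lemma, wclass), its class labels, and the "<unknown>" placeholder are assumptions made for illustration, not the function's confirmed internals.

```r
# Hypothetical tagger output, one row per token (hand-written for illustration).
tagged <- data.frame(
  token  = c("i", "gatti", "dormono", "xyzzy"),
  lemma  = c("il", "gatto", "dormire", "<unknown>"),
  wclass = c("article", "noun", "verb", "unknown"),
  stringsAsFactors = FALSE
)

# Word classes to keep: the "significant" ones plus unrecognized words.
keep_classes <- c("noun", "name", "adjective", "verb", "adverb", "unknown")
kept <- tagged[tagged$wclass %in% keep_classes, ]

# Assumption: when no lemma is available, fall back to the original token.
kept$lemma <- ifelse(kept$lemma == "<unknown>", kept$token, kept$lemma)

# Collapse the kept lemmata back into a single text.
lemmatized <- paste(kept$lemma, collapse = " ")
```

In this sketch the article is dropped, while the noun, the verb, and the unrecognized word are retained.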

Value

A text vector of lemmata (nouns, names, adjectives, verbs, adverbs, and unrecognized words).

Examples

## Not run: 
dataframe$lemma <- lemmatizer(rawtext = dataframe$text, lang = "it",
                              TreeTaggerPath = "C:/TreeTagger", parallel = TRUE)
## End(Not run)
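
Because the call depends on a local TreeTagger installation and a supported language code, it may help to check both before running it. The helper below is a hypothetical sketch, not part of the package: can_lemmatize() is an invented name, and the list of codes is copied from the Arguments section.

```r
# Hypothetical pre-flight check (not part of the package).
supported_langs <- c("it", "en", "de", "es", "fr", "nl", "pt", "ru")

can_lemmatize <- function(lang, TreeTaggerPath) {
  # TRUE only if the language code is listed and the directory exists.
  lang %in% supported_langs && dir.exists(TreeTaggerPath)
}

# Example check with the documented defaults; the result depends on the machine.
ready <- can_lemmatize("it", "C:/TreeTagger")
```

If ready is FALSE, either the language code is not in the supported list or the TreeTagger directory was not found at the given path.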

nicolarighetti/textools documentation built on Oct. 16, 2021, 11:20 p.m.