lemmatize: Lemmatize Text


View source: R/lemmatize.R

Description

This function lemmatizes input text, reducing each word to its base form (its lemma).

Usage

lemmatize(
  inputText,
  method = "direct",
  treetaggerDirectory = NULL,
  progressBar = TRUE
)

Arguments

inputText

A character string or vector of character strings

method

Either 'direct' (which uses a predefined list of words and their lemmas) or 'treetagger' (which uses the software TreeTagger, implemented through the koRpus package)

treetaggerDirectory

The file path to your TreeTagger installation (see Details below)

progressBar

Show a progress bar. Defaults to TRUE.

Details

When method is 'treetagger', this function is essentially a wrapper for the treetag function from the koRpus package. In turn, koRpus calls the TreeTagger software (available here: https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/). TreeTagger must be downloaded and installed on your local computer in order to use this method. Once it is installed, set the treetaggerDirectory argument to the path where the software was installed.
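As a rough sketch of the setup described above, the check below confirms that the installation directory exists and that koRpus is available before requesting the TreeTagger method. The helper name treetagger_ready is hypothetical and not part of the package; the path is the placeholder used in the Examples, and calling lemmatize through the languagePredictR namespace assumes the function is exported from that package.

```r
# Hypothetical helper (not part of the package): TRUE only if the TreeTagger
# directory exists and the koRpus package can be loaded.
treetagger_ready <- function(path) {
  dir.exists(path.expand(path)) && requireNamespace("koRpus", quietly = TRUE)
}

tt_dir <- "~/path/to/TreeTagger"  # placeholder path; replace with your own

if (treetagger_ready(tt_dir) &&
    requireNamespace("languagePredictR", quietly = TRUE)) {
  # Assumption: lemmatize() is exported by languagePredictR.
  lemmatized_data <- languagePredictR::lemmatize(
    "The largest geese ran very fast.",
    method = "treetagger",
    treetaggerDirectory = tt_dir
  )
} else {
  message("TreeTagger setup not found; skipping the 'treetagger' method.")
}
```

Checking the directory first gives a clearer message than whatever error a missing installation would otherwise produce further down the call chain.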

This function performs "lemmatization," which is one way of reducing words to their most basic units. It is more thorough than "stemming," which only removes suffixes. For example, for the words "walked" and "dogs," both lemmatization and stemming would reduce the words to "walk" and "dog." However, stemming would leave "ran" and "geese" unchanged, while lemmatization would properly render these as "run" and "goose."
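The contrast between the two approaches can be illustrated with a toy sketch in base R. Neither function below is the package's implementation: toy_stem is a deliberately naive suffix stripper, and toy_lemmas is a hypothetical four-word lookup table standing in for a real lemma list.

```r
# Toy "stemmer": strips a trailing "ed" or "s" (illustration only).
toy_stem <- function(words) sub("(ed|s)$", "", words)

# Toy lemma lookup table (hypothetical; not the package's word list).
toy_lemmas <- c(walked = "walk", dogs = "dog", ran = "run", geese = "goose")

# Look each word up in the table; words without an entry are returned as-is.
toy_lemmatize <- function(words) {
  hits <- unname(toy_lemmas[words])
  ifelse(is.na(hits), words, hits)
}

words <- c("walked", "dogs", "ran", "geese")
toy_stem(words)       # "walk" "dog" "ran" "geese"
toy_lemmatize(words)  # "walk" "dog" "run" "goose"
```

Suffix stripping handles the regular forms but has no way to relate irregular forms such as "ran" or "geese" to their base words, which is why a lookup or tagger is needed.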

Value

A data frame containing the lemmatized text, along with columns holding part-of-speech information

See Also

The treetag function from the koRpus package, as well as the TreeTagger documentation: https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

Examples

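myStrings <- c("I walked in the park with both of my dogs.",
               "The largest geese ran very fast.")
## Not run: 
# Pass the TreeTagger path by name; a second positional argument
# would be matched to `method` instead.
lemmatized_data <- lemmatize(myStrings,
                             method = "treetagger",
                             treetaggerDirectory = "~/path/to/TreeTagger")
lemmatized_data$lemma_text
## End(Not run)
# "I walk in the park with both of my dog."
# "The large goose run very fast."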

nlanderson9/languagePredictR documentation built on June 10, 2021, 11 a.m.