lemmatize: Lemmatize Text
In nlanderson9/languagePredictR: Predict Outcomes from Natural Language


This function lemmatizes input text, reducing each word to its base form (its lemma).

lemmatize(
  inputText,
  method = "direct",
  treetaggerDirectory = NULL,
  progressBar = TRUE
)

`inputText`	A character string or vector of character strings
`method`	Either 'direct' (which uses a predefined list of words and their lemmas) or 'treetagger' (which uses the software `TreeTagger`, implemented through the `koRpus` package)
`treetaggerDirectory`	The file path to your local TreeTagger installation, used when `method` is 'treetagger' (see Details below)
`progressBar`	Show a progress bar. Defaults to TRUE.

When `method` is 'treetagger', this function is essentially a wrapper for the `treetag` function from the `koRpus` package. In turn, `koRpus` calls the TreeTagger software (available here: https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/). TreeTagger must be downloaded and installed on your local computer before this method can be used. Once it is installed, set the `treetaggerDirectory` argument to the path where the software was installed.
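As a rough illustration of what such a wrapper might call, the sketch below uses `koRpus::treetag()` directly. It assumes the `koRpus` and `koRpus.lang.en` packages are installed alongside TreeTagger with its English parameter files; the helper name `tag_with_treetagger` is hypothetical, and this is not necessarily how `lemmatize` is implemented internally.

```r
# Sketch only: a hypothetical helper that calls koRpus::treetag() directly.
# Whether lemmatize() does exactly this internally is an assumption.
tag_with_treetagger <- function(text, treetaggerDirectory) {
  if (!requireNamespace("koRpus", quietly = TRUE) ||
      !requireNamespace("koRpus.lang.en", quietly = TRUE)) {
    stop("This sketch needs the koRpus and koRpus.lang.en packages.")
  }
  # Attaching the language package registers English support with koRpus
  library(koRpus.lang.en)

  tagged <- koRpus::treetag(
    text,
    treetagger = "manual",  # configure TreeTagger through TT.options
    lang       = "en",
    format     = "obj",     # `text` is a character vector, not a file path
    TT.options = list(path = treetaggerDirectory, preset = "en")
  )

  # Data frame with one row per token, including `tag` and `lemma` columns
  koRpus::taggedText(tagged)
}
```

Calling `tag_with_treetagger("The largest geese ran very fast.", "~/path/to/TreeTagger")` would only work on a machine where TreeTagger is actually installed at that path.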

This function performs "lemmatization," which is one form of reducing words to their most basic units. It is more thorough than "stemming," which only removes suffixes. E.g. for the words "walked" and "dogs," both lemmatization and stemming would reduce the words to "walk" and "dog." However, stemming would ignore "ran" and "geese," while lemmatization would properly render these "run" and "goose."
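The difference can be seen in a toy base-R sketch. The lookup table and both helper functions here are hypothetical and purely illustrative; they are not the word list or the code used by the 'direct' method.

```r
# Toy lookup table mapping word forms to lemmas (hypothetical, illustration only)
lemma_table <- c(walked = "walk", dogs = "dog", ran = "run", geese = "goose")

# Naive stemmer: strips a trailing "ed" or "s" and does nothing else
naive_stem <- function(words) {
  sub("(ed|s)$", "", words)
}

# Dictionary lemmatizer: look each word up, fall back to the word itself
toy_lemmatize <- function(words) {
  hits <- lemma_table[words]
  unname(ifelse(is.na(hits), words, hits))
}

words <- c("walked", "dogs", "ran", "geese")
naive_stem(words)     # "walk" "dog" "ran" "geese"
toy_lemmatize(words)  # "walk" "dog" "run" "goose"
```

The stemmer handles only the regular suffixes, while the lookup also resolves the irregular forms, at the cost of needing an entry for every word it should change.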

A data frame containing the lemmatized text (the `lemma_text` column used in the example below), as well as columns with part-of-speech information.

The `treetag` function from the `koRpus` package, as well as the TreeTagger documentation: https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

myStrings = c("I walked in the park with both of my dogs.",
"The largest geese ran very fast.")
## Not run: 
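# Name the arguments: passed positionally, the path would be taken as `method`
lemmatized_data = lemmatize(myStrings,
                            method = "treetagger",
                            treetaggerDirectory = "~/path/to/TreeTagger")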
lemmatized_data$lemma_text
## End(Not run)
# "I walk in the park with both of my dog."
# "The large goose run very fast."