wordStem: Get the common root/stem of words


Description

This function computes the stem of each word in the given vector. Stemming reduces a word to its base component, which makes it easier to compare related words such as win, winning and winner. See http://snowball.tartarus.org/ for more information about the concept of stemming and the algorithms used.

Usage

wordStem(words, language = character(), warnTested = FALSE)

Arguments

words

a character vector of words whose stems are to be computed.

language

the name of a language recognized by the package. This should be either a single string that is an element of the vector returned by getStemLanguages, or a character vector of length 3 giving the names of the routines for creating a Snowball SN_env environment, closing it, and performing the stem (in that order). See the dynamic lookup example in the Examples section.

warnTested

a logical value controlling whether a warning is issued for languages that have not been explicitly tested as part of the package's unit tests. These warnings can usually be ignored, so they are suppressed by default. In the future, this might be controlled by a global option instead.
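As an illustration of the two forms the language argument can take, the sketch below checks a value before it is passed to wordStem. The helper resolve_language is hypothetical and not part of Rstem; only getStemLanguages and wordStem come from the package, and the Rstem calls run only if the package is installed.

```r
# Hypothetical helper (not part of Rstem): validate a `language` value
# against a vector of supported language names.
resolve_language <- function(language, available) {
  stopifnot(is.character(language))
  # A length-3 vector is taken to name the create, close and stem routines.
  if (length(language) == 3L) {
    return(language)
  }
  if (length(language) != 1L) {
    stop("language must be a single name or three routine names")
  }
  if (!(language %in% available)) {
    stop("unsupported language '", language, "'; expected one of: ",
         paste(available, collapse = ", "))
  }
  language
}

# Use the package's own list of languages when Rstem is available.
if (requireNamespace("Rstem", quietly = TRUE)) {
  lang <- resolve_language("english", Rstem::getStemLanguages())
  print(Rstem::wordStem(c("win", "winning", "winner"), language = lang))
}
```

Checking the name up front gives a readable error message listing the supported languages, rather than relying on whatever error the stemmer itself raises.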

Details

This uses Dr. Martin Porter's stemming algorithm and the interface generated by Snowball (http://snowball.tartarus.org/).

Value

A character vector with the same number of elements as the input vector, in which each element is the stem of the corresponding input word.

Author(s)

Duncan Temple Lang <duncan@wald.ucdavis.edu>

References

See http://snowball.tartarus.org/

Examples

  # Simple example. Expected result:
  # "win"    "win"    "winner"
 wordStem(c("win", "winning", "winner"))


  # test the supplied vocabulary.
 testWords = readLines(system.file("words", "english", "voc.txt", package = "Rstem"))
 validate = readLines(system.file("words", "english", "output.txt", package = "Rstem"))

## Not run: 
 # Read the test words directly from the snowball site over the Web
 testWords = readLines(url("http://snowball.tartarus.org/english/voc.txt"))

## End(Not run)


 testOut = wordStem(testWords)
 all(validate == testOut)

  # Specify the language from one of the built-in languages.
 testOut = wordStem(testWords, "english")
 all(validate == testOut)

  # Illustrate the dynamic lookup of symbols, which makes it easy to
  # add new languages or to supply custom routines for creating and
  # closing the environment (for example, to manage a pool of
  # environments if efficiency were an issue).
 testOut = wordStem(testWords, c("testDynCreate", "testDynClose", "testDynStem"))
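
Because the result has one stem per input word, in the same order, it can be used directly as a grouping key. The following is a minimal sketch of that idea, not part of the package's own examples. The helper group_by_stem is hypothetical; it accepts any stemming function, and the call to wordStem runs only if Rstem is installed.

```r
# Hypothetical helper (not part of Rstem): group words that share a stem.
# `stemmer` is any function mapping a character vector to a character
# vector of the same length.
group_by_stem <- function(words, stemmer) {
  stems <- stemmer(words)
  stopifnot(length(stems) == length(words))
  split(words, stems)
}

if (requireNamespace("Rstem", quietly = TRUE)) {
  words <- c("win", "winning", "winner")
  groups <- group_by_stem(words, function(w) Rstem::wordStem(w, language = "english"))
  print(groups)
}
```

Passing the stemmer as an argument keeps the helper independent of any particular language setting.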

Rstem documentation built on May 2, 2019, 6:47 p.m.
