wordStem: Stem Turkish tokens


Description

This function attempts to stem Turkish tokens using a look-up table (derived from Nuve) as a fast substitute for more complex, but more accurate, morphological analysis. If a token contains an apostrophe, only the characters before the apostrophe are stemmed and the remainder is discarded.
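
The look-up approach described above can be illustrated with a minimal sketch in R. It is not the Resha implementation: `toy_stem()` and `toy_table` are hypothetical names, the two table entries are illustrative stand-ins for the Nuve-derived table, and returning the apostrophe-truncated token when no entry is found is an assumption.

```r
# Hypothetical sketch of table-based stemming; NOT the Resha implementation.
toy_stem <- function(x, table) {
  # Keep only the characters before the first apostrophe (if any).
  base <- sub("'.*$", "", x)
  # Look each base form up in a named vector mapping surface form -> stem.
  hit <- unname(table[base])
  # Assumption: fall back to the (truncated) token when no stem is listed.
  ifelse(is.na(hit), base, hit)
}

# Illustrative entries only; the real table is derived from Nuve.
toy_table <- c(evlerde = "ev", kitaplar = "kitap")

toy_stem(c("evlerde", "Ankara'da", "bilinmeyen"), toy_table)
# [1] "ev"         "Ankara"     "bilinmeyen"
```

A table look-up costs one hash or name match per token, which is why it is much faster than full morphological analysis, at the price of missing any word form that is not in the table.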

Usage

wordStem(x, ...)

Arguments

x

A token or a vector of tokens

...

Extra arguments, currently ignored

Details

This code should work the same way as the original Java implementation. The interface, on the other hand, is designed to feel like the SnowballC package.

Note: this function assumes that all proper nouns are capitalized and that other words are not. Because the stemmer is built around a look-up table, you may wish to check that non-proper nouns at the start of sentences are lowercased appropriately in the input.
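
One way to follow this advice is to lowercase sentence-initial tokens before stemming, unless they are known proper nouns. The sketch below assumes hypothetical helpers `lower_tr()` and `lower_sentence_starts()` and a caller-supplied vector of proper nouns; none of these are part of Resha. Dotted and dotless I are mapped explicitly because `tolower()` does not reliably apply Turkish casing rules for I and İ across locales.

```r
# Hypothetical helper, not part of Resha: Turkish-aware lowercasing.
lower_tr <- function(x) {
  # Map capital dotted I -> i and capital dotless I -> dotless i first.
  x <- chartr("\u0130I", "i\u0131", x)
  tolower(x)
}

# Hypothetical helper: lowercase tokens that start a sentence, except
# those listed in `proper_nouns`.
lower_sentence_starts <- function(tokens, sentence_start,
                                  proper_nouns = character()) {
  fix <- sentence_start & !(tokens %in% proper_nouns)
  tokens[fix] <- lower_tr(tokens[fix])
  tokens
}

tok   <- c("Kitap", "okudum", "Ankara", "gittim")
start <- c(TRUE, FALSE, TRUE, FALSE)
lower_sentence_starts(tok, start, proper_nouns = "Ankara")
# [1] "kitap"  "okudum" "Ankara" "gittim"
```

The cleaned tokens can then be passed to wordStem().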

Value

A stemmed token or vector of stemmed tokens, or the original tokens if no stems could be found.

References

Examples

  library(Resha)
  toks <- c("kitapçığında", "kitapçıdaki", "İstanbul'da")
  wordStem(toks)

conjugateprior/Resha documentation built on May 20, 2019, 5:20 p.m.