katadasar: Word stemming for Bahasa Indonesia


Description

Provides a function to retrieve the stem (root word) of a word in Bahasa Indonesia, a process known as word stemming, using Nazief and Andriani's algorithm. It consists of a set of functions that remove prefixes, suffixes, or both; infix removal is not yet supported. This package is ported from C# code provided by csharp-indonesia http://www.csharp-indonesia.com/2014/07/algoritma-stemming-pencarian-kata-dasar.html. Credit goes to the original author(s).
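To make the affix-removal idea concrete, here is a minimal, simplified sketch loosely following the commonly described Nazief-Andriani order: dictionary lookup, then inflectional suffixes, then derivational suffixes, then derivational prefixes. It is not the katadasaR implementation. The function name `strip_affixes` and the toy dictionary are hypothetical, and the sketch omits the algorithm's recoding rules, disallowed prefix-suffix pairs, and backtracking.

```r
# Hypothetical, simplified Nazief-Andriani-style stemmer (NOT katadasaR's code).
strip_affixes <- function(word, dict) {
  # Step 1: a word already in the dictionary is treated as a root word
  if (word %in% dict) return(word)

  # Step 2: inflectional suffixes, particle first (-lah, -kah, -tah, -pun),
  # then possessive pronoun (-ku, -mu, -nya); each one is optional
  w <- sub("(lah|kah|tah|pun)$", "", word)
  w <- sub("(ku|mu|nya)$", "", w)
  if (w %in% dict) return(w)

  # Step 3: candidate bases are w itself and w minus one derivational
  # suffix (-kan, -an, -i)
  bases <- unique(c(w, sub("(kan|an|i)$", "", w)))
  for (b in bases) {
    if (b %in% dict) return(b)
  }

  # Step 4: plain derivational prefixes only; the real algorithm also applies
  # recoding rules that restore a dropped initial letter (menyapu -> sapu),
  # which are omitted here
  prefixes <- c("ber", "ter", "per", "di", "ke", "se", "pe", "me")
  for (b in bases) {
    for (pre in prefixes) {
      if (startsWith(b, pre)) {
        cand <- substring(b, nchar(pre) + 1)
        if (cand %in% dict) return(cand)
      }
    }
  }

  # No root found: return the word unchanged
  word
}

# Usage with a hypothetical toy dictionary
toy_dict <- c("manusia", "guna", "makan", "buku")
strip_affixes("kemanusiaan", toy_dict)  # "manusia"
strip_affixes("digunakan", toy_dict)    # "guna"
```

Because the sketch has no recoding step, words such as "menggurui" or "pelajaran" fall through unchanged; handling them is exactly what the full algorithm's extra rules are for.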

katadasar (or katadasaR) checks whether a word is a root word according to the dictionary and, if it is an affixed word, performs the stemming process.
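The "look up first, stem only if needed" behaviour can be sketched as a small wrapper. This is an illustration under assumptions, not the package's internals: `stem_if_affixed` and its `strip_fn` argument (any affix-stripping function) are hypothetical names.

```r
# Hypothetical wrapper: return dictionary words unchanged, otherwise try to strip affixes.
stem_if_affixed <- function(word, dict, strip_fn) {
  if (word %in% dict) {
    return(word)              # already a root word
  }
  candidate <- strip_fn(word) # attempt affix removal
  # Accept the candidate only if it is a known root; otherwise keep the input
  if (candidate %in% dict) candidate else word
}

# Usage with a toy dictionary and a toy suffix stripper
d <- c("makan", "buku")
f <- function(w) sub("(lah|nya)$", "", w)
stem_if_affixed("makanlah", d, f)
```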

Usage

katadasaR(kata, kamus = NULL)

Arguments

kata

character vector of length 1: the word or token from which the stem is retrieved.
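Since `kata` takes a single string, several words have to be stemmed one at a time. A minimal sketch, assuming any stemmer with a string-in, string-out signature: the helper name `stem_all` is hypothetical, and with the package installed `katadasaR` itself could be passed as `stemmer`. Using `vapply` with `character(1)` makes R check that each call returns exactly one string.

```r
# Hypothetical helper: apply a one-word stemmer to a vector of words.
stem_all <- function(words, stemmer) {
  # FUN.VALUE = character(1) enforces one string per word;
  # USE.NAMES = TRUE keeps the original words as names of the result
  vapply(words, stemmer, FUN.VALUE = character(1), USE.NAMES = TRUE)
}

# Usage with a toy stemmer that only strips the possessive "-nya"
stem_all(c("bukunya", "makan"), function(w) sub("nya$", "", w))
```

Compared with `sapply` (used in the examples), `vapply` fails loudly instead of silently returning a list if a call ever yields something other than one string.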

kamus

character vector of additional root words to include in the dictionary.
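One plausible way an extra dictionary is used is to merge its words with the built-in list before lookup, so that they are recognised as root words. This is an assumption for illustration only; `build_dictionary` is a hypothetical name, not a katadasaR function.

```r
# Hypothetical: combine a base word list with an optional extra dictionary (kamus).
build_dictionary <- function(base, kamus = NULL) {
  # unique() drops words that appear in both lists
  unique(c(base, kamus))
}

base <- c("makan", "minum")
"daring" %in% build_dictionary(base)                    # not a root word yet
"daring" %in% build_dictionary(base, kamus = "daring")  # recognised after merging
```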

Author(s)

Nur Andi Setiabudi nurandi.mail@gmail.com

References

https://en.wikipedia.org/wiki/Stemming
http://crpit.scem.westernsydney.edu.au/confpapers/CRPITV38Asian.pdf

Examples

## Not run:
## Stem one word
katadasaR("kemanusiaan")

## A set of words
words <- c("jakarta", "seminar", "penggunaan", "menggurui", "pelajaran", "dimana")
sapply(words, katadasaR)

## End(Not run)

nurandi/katadasaR documentation built on Feb. 20, 2022, 3:33 p.m.