CorrectS: Function to remove some forms of pluralization.


Description

This function takes a character vector as input and removes some forms of pluralization from the ends of the words.

Usage

CorrectS(term_vec)

Arguments

term_vec

A character vector

Details

The entries of the vector should be single words or short n-grams without punctuation, because the function only looks at the ends of strings. In other words, if an entry is a paragraph of text, only its final word will be de-pluralized. (Even then, if the final character is a period, as would be the case with paragraphs, it is likely that nothing will be de-pluralized.)
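The end-of-string behaviour described above can be illustrated with a minimal sketch. The helper strip_final_s below is hypothetical and is not the CorrectS implementation; it only assumes that matching is anchored at the end of each string, which is what the description implies.

```r
# Hypothetical stand-in for illustration only; NOT the CorrectS implementation.
# It removes a single trailing "s" using a regex anchored at the end of the string.
strip_final_s <- function(x) sub("s$", "", x)

single_word <- strip_final_s("bananas")                   # trailing "s" is removed
short_ngram <- strip_final_s("scientists study bananas")  # only the last word changes
paragraph   <- strip_final_s("We like bananas.")          # ends in ".", so nothing matches

print(c(single_word, short_ngram, paragraph))
```

Because the pattern is anchored with "$", anything that is not at the very end of an entry is left alone, which is why punctuation should be stripped before calling a function of this kind.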

Value

Returns an object of class data.frame with three columns. The first column is the argument term_vec. The second column is the de-pluralized version of the words in term_vec. The third column is a logical vector indicating whether or not each word in term_vec was changed.
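A rough sketch of working with a three-column result of this shape follows. The function toy_depluralize and its column names (term, adjusted, changed) are assumptions made for illustration; the real column names of the CorrectS output are not stated here, so the example indexes columns by position.

```r
# Hypothetical stand-in that mimics the documented return shape;
# it is NOT CorrectS, and the column names are assumed.
toy_depluralize <- function(term_vec) {
  adjusted <- sub("s$", "", term_vec)
  data.frame(
    term = term_vec,
    adjusted = adjusted,
    changed = adjusted != term_vec,
    stringsAsFactors = FALSE
  )
}

res <- toy_depluralize(c("banana", "bananas", "scientists"))

# Index by position, since the real column names may differ.
changed_rows <- res[res[[3]], ]  # rows whose term was altered
new_terms    <- res[[2]]         # the de-pluralized terms
print(changed_rows)
```

Keeping the original terms next to the adjusted ones makes it easy to review exactly which entries were altered.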

Note

WARNING: This does make mistakes for irregular words. You should check its results manually. It tends to fail spectacularly for words ending in "es".
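One possible way to support the manual check is to flag the inputs that end in "es" before reviewing the output. This is only a sketch using base R; the vector terms and the name needs_review are illustrative and are not part of the package.

```r
# Illustrative input; "buses" is added here as an example of an "es" word.
terms <- c("banana", "bananas", "buses", "large_armies")

# Flag entries ending in "es", which the note singles out as error-prone.
needs_review <- terms[grepl("es$", terms)]
print(needs_review)
```

The flagged entries can then be compared by hand against the second column of the returned data.frame.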

Examples

# Example input: a singular word, two plurals, and an n-gram
myvec <- c("banana", "bananas", "scientists", "large_armies")

# Returns a data.frame; see Value
CorrectS(term_vec=myvec)

ChengMengli/topic documentation built on May 31, 2019, 8:44 p.m.