Clean data

Description

Removes punctuation and numbers, converts characters to lower or upper case, removes English stopwords and user-specified words, and stems words.

Usage

cleanAbstracts(abstracts, rmNum = TRUE, tolw = TRUE, toup = FALSE,
  rmWords = TRUE, yrWords = NULL, stemDoc = FALSE)

Arguments

abstracts

Output of getAbstracts, or a character string containing a paragraph of text.

rmNum

Logical; whether to remove numbers from the text.

tolw

Logical; whether to convert characters to lower case.

toup

Logical; whether to convert characters to upper case.

rmWords

Logical; whether to remove a set of English stopwords (e.g., 'the').

yrWords

A character vector listing the words to be removed.

stemDoc

Logical; whether to stem words using Porter's stemming algorithm.
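
To make the effect of these arguments more concrete, here is a rough base-R sketch of the punctuation, number, case, and user-word steps. It is not the package's implementation: sketchClean is a hypothetical helper written for illustration, it omits stopword removal and stemming (which need extra packages), and the real function may differ in details such as its return type.

```r
# Hypothetical helper, NOT part of the package: approximates some of the
# cleaning steps with base R only. Stopword removal and stemming are omitted.
sketchClean <- function(text, rmNum = TRUE, tolw = TRUE, yrWords = NULL) {
  # Replace punctuation with spaces so adjacent words do not merge
  text <- gsub("[[:punct:]]+", " ", text)
  # Optionally drop runs of digits
  if (rmNum) text <- gsub("[[:digit:]]+", " ", text)
  # Optionally convert to lower case
  if (tolw) text <- tolower(text)
  # Split on whitespace and discard empty tokens
  words <- unlist(strsplit(text, "[[:space:]]+"))
  words <- words[nzchar(words)]
  # Optionally remove user-specified words (matched after case conversion)
  if (!is.null(yrWords)) words <- words[!(words %in% yrWords)]
  words
}

text <- "Jobs received a number of honors and public recognition."
sketchClean(text, yrWords = c("a", "of", "and"))
```

Because the sketch compares yrWords after case conversion, the words to remove should be given in lower case when tolw = TRUE.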

See Also

getAbstracts

Examples

# Abs=getAbstracts(c("22693232", "22564732"))
# cleanAbs=cleanAbstracts(Abs)

# text="Jobs received a number of honors and public recognition."
# cleanD=cleanAbstracts(text)