cleanAbstracts: clean data


Description

Cleans text by removing punctuation and numbers, converting characters to lower or upper case, removing English stopwords and user-specified words, and stemming words.
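The cleaning steps listed above can be sketched in base R. This is only an illustrative approximation, not the package's implementation: the names `sketch_clean` and `demo_stopwords` are hypothetical, the stopword list is a tiny stand-in for a full English list, and stemming is left out.

```r
# Illustrative base-R approximation of the cleaning steps; not the package's code.
# `sketch_clean` and `demo_stopwords` are hypothetical names used only here.
demo_stopwords <- c("a", "of", "and", "the")

sketch_clean <- function(text, rm_num = TRUE, to_lower = TRUE,
                         stop_words = demo_stopwords, extra_words = NULL) {
  text <- gsub("[[:punct:]]", "", text)               # remove punctuation
  if (rm_num) text <- gsub("[[:digit:]]+", "", text)  # remove numbers
  if (to_lower) text <- tolower(text)                 # convert to lower case
  words <- unlist(strsplit(text, "[[:space:]]+"))     # split on whitespace
  words <- words[nzchar(words)]                       # drop empty tokens
  words[!(words %in% c(stop_words, extra_words))]     # drop stopwords and user words
}

sketch_clean("Jobs received a number of honors and public recognition.")
# returns: "jobs" "received" "number" "honors" "public" "recognition"
```

The order of operations matters in this sketch: lower-casing happens before stopword removal, so a capitalised "The" would still match the lower-case stopword list.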

Usage

cleanAbstracts(abstracts, rmNum = TRUE, tolw = TRUE, toup = FALSE,
  rmWords = TRUE, yrWords = NULL, stemDoc = FALSE)

Arguments

abstracts

Output of getAbstracts, or a character string containing a paragraph of text.

rmNum

Logical; whether to remove numbers from the text.

tolw

Logical; whether to convert characters to lower case.

toup

Logical; whether to convert characters to upper case.

rmWords

Logical; whether to remove a set of English stopwords (e.g., 'the').

yrWords

A character vector listing the words to be removed.

stemDoc

Logical; whether to stem words using Porter's stemming algorithm.
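As a hedged illustration of how these arguments combine, the call below keeps numbers and removes two user-specified words. It assumes the PubMedWordcloud package is installed; the variable names `text`, `my_words`, and `cleaned` are made up for this example, and the shape of the returned object is not shown because it is not documented here.

```r
# Hypothetical input; `text` and `my_words` are illustrative names.
text <- "Jobs received a number of honors and public recognition in 2011."
my_words <- c("number", "public")

# Run only if the package is available, so the snippet does not fail otherwise.
if (requireNamespace("PubMedWordcloud", quietly = TRUE)) {
  cleaned <- PubMedWordcloud::cleanAbstracts(
    text,
    rmNum   = FALSE,     # keep numbers such as "2011"
    tolw    = TRUE,      # convert to lower case
    rmWords = TRUE,      # remove English stopwords
    yrWords = my_words   # also remove these user-specified words
  )
}
```

Arguments not passed here (toup, stemDoc) keep the defaults shown under Usage.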

See Also

getAbstracts

Examples

# Abs <- getAbstracts(c("22693232", "22564732"))
# cleanAbs <- cleanAbstracts(Abs)

# text <- "Jobs received a number of honors and public recognition."
# cleanD <- cleanAbstracts(text)

