Prepares a data vector for building a corpus. It lowercases the text, strips whitespace, removes punctuation, performs stemming, and filters stopwords. You could also use the tm package to build the corpus from scratch, but this function makes repeated generation of VCorpus objects easy.
corpus_gen(data.vector, lang, furtherStops = NULL)
data.vector | A vector containing one string per document: c("DocA", "DocB", ..., "DocN")
lang | Language of the documents as a string, e.g. "english"; the usage shown declares no default, so pass it explicitly. It also determines which stopwords are filtered and which stemmer is applied in the generation step.
furtherStops | A vector of additional words to filter from the corpus besides the standard stopwords for lang.
Value: a VCorpus object
Author(s): MFinst
## The function is currently defined as
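function (data.vector, lang, furtherStops = NULL)
{
    # Requires tm to be attached; stemDocument also needs SnowballC installed.
    # Build a VCorpus with one document per vector element.
    corpus = VCorpus(VectorSource(as.vector(data.vector)),
                     readerControl = list(language = lang))
    # Lowercase, collapse whitespace, and drop punctuation.
    corpus = tm_map(corpus, content_transformer(tolower))
    corpus = tm_map(corpus, stripWhitespace)
    corpus = tm_map(corpus, removePunctuation)
    # Stem, then remove the stopwords for lang (stemming runs first).
    corpus = tm_map(corpus, stemDocument, lang)
    corpus = tm_map(corpus, removeWords, stopwords(lang))
    # Optionally remove user-supplied stopwords as well.
    if (!is.null(furtherStops)) {
        corpus = tm_map(corpus, removeWords, furtherStops)
    }
    return(corpus)
}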
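The definition above is printed as an anonymous function, so a short usage sketch may help. The snippet below assigns the same body to the name corpus_gen and calls it on two sample documents. The sample sentences and the extra stopword "garden" are hypothetical, and the sketch assumes the tm and SnowballC packages are installed.

```r
library(tm)  # assumption: tm is installed; stemDocument also needs SnowballC

# Same steps as the definition shown in this page, bound to a name.
corpus_gen <- function(data.vector, lang, furtherStops = NULL) {
    corpus <- VCorpus(VectorSource(as.vector(data.vector)),
                      readerControl = list(language = lang))
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, stripWhitespace)
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, stemDocument, lang)
    corpus <- tm_map(corpus, removeWords, stopwords(lang))
    if (!is.null(furtherStops)) {
        corpus <- tm_map(corpus, removeWords, furtherStops)
    }
    corpus
}

# Hypothetical input: one string per document.
docs <- c("The cats are running in the garden.",
          "A dog was barking at the mailman!")

# Build the corpus and additionally drop the (hypothetical) word "garden".
corpus <- corpus_gen(docs, lang = "english", furtherStops = c("garden"))

# Inspect the processed text of each document.
for (i in seq_along(corpus)) {
    print(content(corpus[[i]]))
}
```

Because stemming runs before stopword removal, entries in furtherStops are matched against stemmed tokens, so it is probably safest to pass them in stemmed form.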