corpus_gen: Corpus Generator
In mfinst/TM-CoCit-Support-FM

Builds a VCorpus from a vector of document strings. It converts the text to lower case, strips extra whitespace, removes punctuation, performs stemming, and filters stop words. You could also use the tm package to build the corpus from scratch, but this function makes repeated generation of VCorpus objects easy.

corpus_gen(data.vector, lang, furtherStops = NULL)

`data.vector`	A vector containing one string per document: c("DocA", "DocB", ..., "DocN")
`lang`	Language of the documents, given as a string such as "english". The signature sets no default, so the value has to be supplied. It also determines which stop words are filtered and which stemmer is used in the generation step.
`furtherStops`	A vector of additional words to filter from the corpus besides the standard stop words. Defaults to NULL (no additional words).
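
To illustrate how `lang` and `furtherStops` influence filtering, here is a small sketch that calls the underlying tm functions directly on plain strings. The sample sentence and the extra stop words are made up for illustration and are not part of the package.

```r
library(tm)

# `lang` selects the stop word list that is filtered.
english_stops <- stopwords("english")
german_stops  <- stopwords("german")
head(english_stops)
head(german_stops)

# `furtherStops` is handed to removeWords in the same way as the
# standard stop words. The words below are hypothetical examples.
further_stops <- c("corpus", "text")
cleaned <- removeWords("a corpus of text documents", further_stops)
cleaned
```

Note that removeWords only deletes the matched words and leaves the surrounding spaces in place.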

A VCorpus object containing the preprocessed documents.

MFinst

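library(tm)  # stemDocument additionally relies on the SnowballC package

## The function is currently defined as
corpus_gen <- function (data.vector, lang, furtherStops = NULL)
{
    # One document per vector element, tagged with the given language
    corpus = VCorpus(VectorSource(as.vector(data.vector)), readerControl = list(language = lang))
    # Normalise the text: lower case, collapse whitespace, drop punctuation
    corpus = tm_map(corpus, content_transformer(tolower))
    corpus = tm_map(corpus, stripWhitespace)
    corpus = tm_map(corpus, removePunctuation)
    # Stem every word with the stemmer for `lang`
    corpus = tm_map(corpus, stemDocument, lang)
    # Remove the standard stop words for `lang`; this runs on the stemmed text
    corpus = tm_map(corpus, removeWords, stopwords(lang))
    # Remove the caller-supplied stop words, if any
    if (!is.null(furtherStops)) {
        corpus = tm_map(corpus, removeWords, furtherStops)
    }
    return(corpus)
}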
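
A minimal usage sketch follows, assuming the tm and SnowballC packages are installed. The function body is repeated from the definition above so that the block runs on its own; the sample documents and the extra stop word "dog" are hypothetical.

```r
library(tm)  # stemDocument additionally relies on the SnowballC package

# Same pipeline as the definition above, repeated to keep the block self-contained.
corpus_gen <- function(data.vector, lang, furtherStops = NULL) {
  corpus <- VCorpus(VectorSource(as.vector(data.vector)),
                    readerControl = list(language = lang))
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, stripWhitespace)
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, stemDocument, lang)
  corpus <- tm_map(corpus, removeWords, stopwords(lang))
  if (!is.null(furtherStops)) {
    corpus <- tm_map(corpus, removeWords, furtherStops)
  }
  corpus
}

# Hypothetical sample data: one string per document.
docs <- c("The cats are running quickly.",
          "A dog barked at the running cats!")

corpus <- corpus_gen(docs, lang = "english", furtherStops = c("dog"))

# Inspect the processed text of each document.
processed <- vapply(corpus,
                    function(d) paste(as.character(d), collapse = " "),
                    character(1))
print(processed)
```

Because the same call can be repeated with different `lang` or `furtherStops` values, comparing several preprocessing variants of one data vector takes only one line per variant.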