Description Usage Arguments Value Author(s) Examples
Function to generate a document-term matrix (dtm), a tf-idf weighting, or a related term-frequency structure from a corpus
matrix_gen(corpus = NULL, structKey = "m", dtmTokenizer = NULL)
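corpus | The corpus that is passed to DocumentTermMatrix (tm package).
structKey | Key of the structure to be generated. Possible values are:
    "dtm" for a DocumentTermMatrix (tm package object)
    "m" for a plain matrix of term counts (the default)
    "v" for a vector of total term frequencies, sorted in decreasing order
    "tf" (or "d") for a term-frequency data frame with the columns word and freq
    "tfidf" for a term frequency-inverse document frequency (tf-idf) weighted matrix with very sparse terms removed
dtmTokenizer | Optional tokenizer, used when word combinations (n-grams) matter. You can use the tokenizer already initialized in the package or create your own with the RWeka package.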
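To make "word combinations" concrete, here is a minimal base-R sketch of a bigram tokenizer. The name bigram_tokenizer is hypothetical and is not part of this package; it also assumes the tokenizer receives something that as.character() can turn into text, which may depend on the corpus type you use with tm.

```r
# Hypothetical helper, not part of the package: splits text into lowercase
# words and returns every pair of adjacent words (bigrams).
bigram_tokenizer <- function(x) {
  text <- tolower(paste(as.character(x), collapse = " "))
  words <- unlist(strsplit(text, "[^[:alnum:]]+"))
  words <- words[nzchar(words)]          # drop empty strings left by the split
  if (length(words) < 2) {
    return(character(0))                 # nothing to pair up
  }
  paste(words[-length(words)], words[-1])
}

bigrams <- bigram_tokenizer("The quick brown fox")
print(bigrams)
```

A function like this could then be supplied as the dtmTokenizer argument; an n-gram tokenizer built with RWeka would play the same role.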
Value: The structure specified by structKey.
Author(s): MFinst
##---- Should be DIRECTLY executable !! ----
##-- ==>  Define data, use random,
##--	or do  help(data=index)  for the standard data sets.
## The function is currently defined as
function (corpus = NULL, structKey = "m", dtmTokenizer = NULL)
{
    if (!is.null(dtmTokenizer)) {
        # Use the tokenizer supplied by the caller
        dtm = DocumentTermMatrix(corpus, control = list(tokenize = dtmTokenizer))
    }
    else {
        dtm = DocumentTermMatrix(corpus)
    }
    if (structKey == "dtm") {
        return(dtm)
    }
    m = as.matrix(dtm)
    if (structKey == "m") {
        return(m)
    }
    v = sort(colSums(m), decreasing = TRUE)
    if (structKey == "v") {
        return(v)
    }
    d = data.frame(word = names(v), freq = v)
    if (structKey == "d" || structKey == "tf") {
        return(d)
    }
    dtm.tfidf = weightTfIdf(dtm)
    dtm.tfidf = removeSparseTerms(dtm.tfidf, 0.999)
    tfidf.matrix = as.matrix(dtm.tfidf)
    if (structKey == "tfidf") {
        return(dtm.tfidf)
    }
}
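The steps after as.matrix() can be followed without tm on a small hand-made count matrix. The sketch below uses hypothetical toy data and mirrors the "v" and "tf" branches; the tf-idf part is a hand-rolled approximation that assumes term frequency normalized by document length multiplied by log2(N / df), so the values weightTfIdf produces may differ in detail.

```r
# Hypothetical toy count matrix: rows are documents, columns are terms.
m <- matrix(
  c(2, 1, 0,
    0, 3, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("apple", "banana", "cherry"))
)

# structKey = "v": total frequency of each term, largest first.
v <- sort(colSums(m), decreasing = TRUE)

# structKey = "tf": the same totals as a data frame of word and freq.
d <- data.frame(word = names(v), freq = v)

# Assumed tf-idf form: tf normalized per document, idf = log2(N / df).
tf <- m / rowSums(m)                      # share of each term within its document
idf <- log2(nrow(m) / colSums(m > 0))     # terms found in every document get 0
tfidf <- sweep(tf, 2, idf, `*`)           # scale each term column by its idf

print(v)
print(d)
print(tfidf)
```

In this toy data "banana" occurs in both documents, so its idf and its tf-idf weights are zero even though it is the most frequent term overall.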