matrix_gen: Matrix Generator

View source: R/matrix_gen.R

Description

Function to generate a document-term matrix (DTM), a tf-idf matrix, or a related term structure from a corpus.

Usage

matrix_gen(corpus = NULL, structKey = "m", dtmTokenizer = NULL)

Arguments

corpus

Corpus to process. It is passed to DocumentTermMatrix from the tm package.

structKey

Key of the structure to be generated. Possible values are:
"tf" for a term frequency matrix
"dtm" for a DocumentTermMatrix (tm package object)
"m" for a matrix of terms
"v" for a sorted vector of terms
"tfidf" for a term frequency-inverse document frequency matrix

dtmTokenizer

Used if word combinations matter. You can use the tokenizer that is already initialized in the package or create your own with the RWeka package.
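To illustrate what a tokenizer for word combinations might look like, here is a minimal sketch of a bigram tokenizer in base R. The name bigramTokenizer is hypothetical and not part of this package; it only assumes that as.character() yields the document text, which holds for plain strings and, as far as I know, for tm's plain text documents.

```r
# Hypothetical bigram tokenizer (illustration only, not part of the package).
# Splits the text into lowercase words and pairs each word with its successor.
bigramTokenizer <- function(x) {
  words <- unlist(strsplit(tolower(as.character(x)), "[^[:alnum:]]+"))
  words <- words[nzchar(words)]            # drop empty strings left by the split
  if (length(words) < 2) {
    return(character(0))                   # no pair can be formed
  }
  paste(words[-length(words)], words[-1])  # adjacent word pairs
}

bigramTokenizer("Text mining finds patterns")
```

A function like this could be supplied as the dtmTokenizer argument. An RWeka-based alternative would be a small wrapper around RWeka::NGramTokenizer with RWeka::Weka_control(min = 2, max = 2), which additionally requires a working Java installation.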

Value

The structure specified by structKey.
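The relationship between these structures can be pictured with a small base-R sketch that starts from a toy count matrix instead of a real corpus. The data is invented for illustration, and the tf-idf step assumes the weighting used by tm's weightTfIdf with its default normalization (term frequency divided by document length, multiplied by log2 of the number of documents over the document frequency); the sparse-term removal that the function applies is left out.

```r
# Toy document-term count matrix: rows are documents, columns are terms.
m <- matrix(
  c(2, 1, 0,
    0, 1, 1,
    1, 0, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(Docs = c("d1", "d2", "d3"),
                  Terms = c("text", "mining", "corpus"))
)

# "v": total count per term, sorted from most to least frequent.
v <- sort(colSums(m), decreasing = TRUE)

# "tf": the same totals as a data frame with one row per term.
tf <- data.frame(word = names(v), freq = v)

# "tfidf" (assumed formula): normalized term frequency times inverse
# document frequency.
tf_norm <- m / rowSums(m)                # each row divided by its document length
idf <- log2(nrow(m) / colSums(m > 0))    # log2(N / number of documents with the term)
tfidf <- sweep(tf_norm, 2, idf, `*`)     # scale each term column by its idf

print(v)
print(round(tfidf, 3))
```

In this toy data every term occurs in two of the three documents, so all terms share the same idf and the tf-idf values differ only through the normalized term frequencies.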

Author(s)

MFinst

Examples

##---- Should be DIRECTLY executable !! ----
##-- ==>  Define data, use random,
##--	or do  help(data=index)  for the standard data sets.

## The function is currently defined as
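function (corpus = NULL, structKey = "m", dtmTokenizer = NULL)
{
    # Build the document-term matrix, using the supplied tokenizer if given.
    if (!is.null(dtmTokenizer)) {
        dtm <- DocumentTermMatrix(corpus, control = list(tokenize = dtmTokenizer))
    }
    else {
        dtm <- DocumentTermMatrix(corpus)
    }
    if (structKey == "dtm") {
        return(dtm)
    }
    # Dense matrix of term counts: one row per document, one column per term.
    m <- as.matrix(dtm)
    if (structKey == "m") {
        return(m)
    }
    # Total count per term, most frequent first.
    v <- sort(colSums(m), decreasing = TRUE)
    if (structKey == "v") {
        return(v)
    }
    # Term totals as a data frame; "d" is accepted as an alias for "tf".
    d <- data.frame(word = names(v), freq = v)
    if (structKey == "d" || structKey == "tf") {
        return(d)
    }
    # tf-idf weighting, dropping terms absent from more than 99.9% of documents.
    # The result is a weighted DocumentTermMatrix; as.matrix() converts it.
    dtm.tfidf <- weightTfIdf(dtm)
    dtm.tfidf <- removeSparseTerms(dtm.tfidf, 0.999)
    if (structKey == "tfidf") {
        return(dtm.tfidf)
    }
    # Fail loudly instead of silently returning NULL for an unknown key.
    stop("Unknown structKey: ", structKey)
}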
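A call on a small corpus might look like the following sketch. It assumes that the tm package is installed and that matrix_gen() is available in the session (for example after loading this package); the three documents are invented for illustration, and the calls are guarded so the snippet does nothing if either assumption fails.

```r
# Invented sample documents (illustration only).
docs <- c("text mining finds patterns in text",
          "mining text needs a corpus",
          "a corpus holds documents")

# Run only if tm is installed and matrix_gen() has been loaded.
if (requireNamespace("tm", quietly = TRUE) && exists("matrix_gen")) {
  corpus <- tm::VCorpus(tm::VectorSource(docs))

  dtm <- matrix_gen(corpus, structKey = "dtm")  # DocumentTermMatrix object
  m   <- matrix_gen(corpus, structKey = "m")    # dense count matrix
  v   <- matrix_gen(corpus, structKey = "v")    # sorted term totals

  print(dim(m))
  print(v)
}
```

Requesting "dtm" once and converting it yourself avoids rebuilding the document-term matrix on every call, which may matter for larger corpora.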

mfinst/TM-CoCit-Support-FM documentation built on March 4, 2020, 8:38 p.m.