Description Usage Arguments Value Author(s) Examples
Function to generate a document-term matrix (dtm) and a tf-idf matrix from a corpus
matrix_gen(corpus = NULL, structKey = "m", dtmTokenizer = NULL)
corpus
    Corpus to build the structures from; it is passed to DocumentTermMatrix.
structKey
    Key of the structure to be generated. Possible values are:
    "tf" for a term frequency table (words and their frequencies)
    "dtm" for a DocumentTermMatrix (tm package object)
    "m" for a matrix of terms
    "v" for a sorted vector of terms
    "tfidf" for a term frequency-inverse document frequency matrix
dtmTokenizer
    Used if word combinations matter. You can use the tokenizer already initialized in the package or create your own with the RWeka package.
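As an illustration of what a word-combination tokenizer does, here is a minimal sketch in base R rather than RWeka. The helper name bigram_tokenizer is hypothetical and is not part of this package; it assumes the input can be coerced to a single character string and that words are separated by whitespace.

```r
# Hypothetical helper (not part of the package): pairs each word with its successor.
bigram_tokenizer <- function(x) {
  # Split the text on runs of whitespace and drop empty pieces.
  words <- unlist(strsplit(as.character(x), "[[:space:]]+"))
  words <- words[nzchar(words)]
  # Fewer than two words means there is no bigram to return.
  if (length(words) < 2) {
    return(character(0))
  }
  # Align words 1..n-1 with words 2..n and join each pair with a space.
  paste(head(words, -1), tail(words, -1))
}

bigrams <- bigram_tokenizer("text mining with r")
```

A function of this shape could in principle be supplied as dtmTokenizer, although whether it is honoured depends on how matrix_gen forwards the argument.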
Structure specified by the structKey
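To make the returned structures more concrete, the following base-R sketch uses a small made-up count matrix in place of a real DocumentTermMatrix. The document and term names are assumptions for illustration only; the steps mirror how the "m", "v" and "tf" structures relate to one another.

```r
# Toy document-term counts (hypothetical data): rows are documents, columns are terms.
m <- matrix(c(2, 0, 1,
              1, 4, 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("doc1", "doc2"),
                            c("apple", "banana", "cherry")))

# "v": total count of each term across all documents, most frequent first.
v <- sort(colSums(m), decreasing = TRUE)

# "tf": the same totals arranged as a table of words and frequencies.
d <- data.frame(word = names(v), freq = v)
```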
MFinst
##---- Should be DIRECTLY executable !! ----
##-- ==> Define data, use random,
##-- or do help(data=index) for the standard data sets.
## The function is currently defined as
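function (corpus = NULL, structKey = "m", dtmTokenizer = NULL)
{
    # Build the document-term matrix. Note that when dtmTokenizer is supplied,
    # the package's phraseTokenizer is applied, not the value passed in.
    if (!is.null(dtmTokenizer)) {
        dtm = DocumentTermMatrix(corpus, control = list(tokenize = phraseTokenizer))
    }
    else {
        dtm = DocumentTermMatrix(corpus)
    }
    if (structKey == "dtm") {
        return(dtm)
    }
    # Plain matrix: one row per document, one column per term.
    m = as.matrix(dtm)
    if (structKey == "m") {
        return(m)
    }
    # Total frequency of each term, most frequent first.
    v = sort(colSums(m), decreasing = TRUE)
    if (structKey == "v") {
        return(v)
    }
    # Data frame of words and their frequencies ("d" is accepted as an alias of "tf").
    d = data.frame(word = names(v), freq = v)
    if (structKey == "d" || structKey == "tf") {
        return(d)
    }
    # tf-idf weighting, dropping terms whose sparsity exceeds 0.999.
    dtm.tfidf = weightTfIdf(dtm)
    dtm.tfidf = removeSparseTerms(dtm.tfidf, 0.999)
    # tfidf.matrix is computed, but the weighted DocumentTermMatrix is what gets returned.
    tfidf.matrix = as.matrix(dtm.tfidf)
    if (structKey == "tfidf") {
        return(dtm.tfidf)
    }
    # Any other structKey falls through and yields NULL.
}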
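The tf-idf step at the end of the function can be sketched in base R as follows. This is one common tf-idf formulation applied to made-up counts, not a reproduction of tm's weightTfIdf, which may differ in its normalisation details; all names and values below are assumptions for illustration.

```r
# Toy document-term counts (hypothetical data): rows are documents, columns are terms.
m <- matrix(c(2, 0, 1,
              1, 4, 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("doc1", "doc2"),
                            c("apple", "banana", "cherry")))

# Term frequency: each count divided by the length of its document.
tf <- m / rowSums(m)

# Document frequency: number of documents in which each term occurs.
df <- colSums(m > 0)

# Inverse document frequency: terms found in fewer documents get larger weights.
idf <- log2(nrow(m) / df)

# tf-idf: scale every term column by that term's idf.
tfidf <- sweep(tf, 2, idf, `*`)
```

In this toy data "apple" occurs in every document, so its idf is zero and its tf-idf weight vanishes in both rows.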