Calculates a sparse document-term matrix from a given object of class kRp.corpus and adds it to the object's feature list.
You can also calculate the term frequency-inverse document frequency (tf-idf) value for each term.
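To make the tf-idf idea concrete, here is a minimal sketch on a toy sparse matrix. It assumes raw counts as term frequency and log(N / df) as inverse document frequency; the exact weighting used by docTermMatrix() is not stated here and may differ. All object names (counts, docFreq, idf, tfidfMatrix) and the toy data are made up for illustration.

```r
library(Matrix)

# toy document-term counts: 3 documents (rows) x 4 terms (columns)
# (hypothetical data, not taken from the package)
counts <- sparseMatrix(
  i = c(1, 1, 2, 2, 3, 3),
  j = c(1, 2, 1, 3, 1, 4),
  x = c(2, 1, 1, 3, 1, 2),
  dims = c(3, 4),
  dimnames = list(c("doc1", "doc2", "doc3"), c("the", "cat", "dog", "bird"))
)

# document frequency: number of documents in which each term occurs
docFreq <- colSums(counts > 0)

# inverse document frequency, assumed here to be log(N / df)
idf <- log(nrow(counts) / docFreq)

# weight each term column by its idf; the result stays sparse
tfidfMatrix <- counts %*% Diagonal(x = unname(idf))
dimnames(tfidfMatrix) <- dimnames(counts)
```

Under this weighting, a term that occurs in every document (here "the") gets a weight of zero, while terms confined to a single document are scaled up.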
obj |
An object of class |
terms |
A character string defining the |
case.sens |
Logical, whether terms should be counted case sensitive. |
tfidf |
Logical, if |
as.feature |
Logical, whether the output should be just the sparse matrix or the input object with that matrix added as a feature. Use |
The settings of terms, case.sens, and tfidf will be stored in the object's meta slot, so you can use corpusMeta(..., "doc_term_matrix") to fetch it.
See the examples to learn how to limit the analysis to desired word classes.
Either an object of the input class or a sparse matrix of class dgCMatrix.
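If you are new to the dgCMatrix class, the following sketch shows a few common operations from the Matrix package. The toy matrix is a hypothetical stand-in for a returned document-term matrix, and the layout (documents in rows, terms in columns) is an assumption made for this illustration.

```r
library(Matrix)

# hypothetical stand-in for a sparse document-term matrix
# (assumed layout: documents in rows, terms in columns)
toyDTM <- sparseMatrix(
  i = c(1, 1, 2, 3),
  j = c(1, 2, 2, 3),
  x = c(4, 1, 2, 5),
  dims = c(3, 3),
  dimnames = list(c("docA", "docB", "docC"), c("corpus", "matrix", "term"))
)

# sparseMatrix() with numeric values yields a dgCMatrix
class(toyDTM)

# total frequency of each term across all documents, most frequent first
termTotals <- sort(colSums(toyDTM), decreasing = TRUE)

# convert to a dense base R matrix; only advisable for small matrices
denseDTM <- as.matrix(toyDTM)
```

Keeping the matrix sparse is usually preferable for real corpora, since most terms do not occur in most documents.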
# use readCorpus() to create an object of class kRp.corpus
# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  myCorpus <- readCorpus(
    dir=file.path(path.package("tm.plugin.koRpus"), "examples", "corpus"),
    hierarchy=list(
      Topic=c(
        Winner="Reality Winner",
        Edwards="Natalie Edwards"
      ),
      Source=c(
        Wikipedia_prev="Wikipedia (old)",
        Wikipedia_new="Wikipedia (new)"
      )
    ),
    # use tokenize() so examples run without a TreeTagger installation
    tagger="tokenize",
    lang="en"
  )

  # get the document-term frequencies in a sparse matrix
  myDTMatrix <- docTermMatrix(myCorpus, as.feature=FALSE)

  # combine with filterByClass() to, e.g., exclude all punctuation
  myDTMatrix <- docTermMatrix(filterByClass(myCorpus), as.feature=FALSE)

  # instead of absolute frequencies, get the tf-idf values
  myDTMatrix <- docTermMatrix(
    filterByClass(myCorpus),
    tfidf=TRUE,
    as.feature=FALSE
  )
} else {}