Constructs or coerces to a term-document matrix or a document-term matrix.
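As a rough illustration of what these two matrix layouts hold, the following base R sketch counts terms per document by hand. The three toy texts and the whitespace tokenization are assumptions made for this example; tm applies its own tokenization and filtering, so its results on the same input can differ.

```r
# Toy documents (hypothetical, not taken from any corpus in this page).
docs <- c(d1 = "oil price rises", d2 = "oil output falls", d3 = "price of oil")

# Naive whitespace tokenization, used here only for illustration.
tokens <- strsplit(docs, " ", fixed = TRUE)

# Long format: one row per (term, document) occurrence.
long <- data.frame(
  term = unlist(tokens, use.names = FALSE),
  doc  = rep(names(tokens), lengths(tokens))
)

# Term-document layout: terms in rows, documents in columns.
tdm_dense <- unclass(table(Terms = long$term, Docs = long$doc))

# Document-term layout: the transpose, documents in rows.
dtm_dense <- t(tdm_dense)

print(tdm_dense)
print(dtm_dense)
```

The objects returned by the functions on this page store the same counts sparsely instead of as a dense matrix.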
TermDocumentMatrix(x, control = list())
DocumentTermMatrix(x, control = list())
as.TermDocumentMatrix(x, ...)
as.DocumentTermMatrix(x, ...)
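x: a corpus for the constructors; for the coercing functions, a term-document matrix, a document-term matrix, a simple triplet matrix (package slam), or a term frequency vector.

control: a named list of control options. There are local options, which are evaluated for each document, and global options, which are evaluated once for the constructed matrix. Available local options are documented in termFreq and are internally delegated to a termFreq call. This is different for a SimpleCorpus, where all options are processed in a fixed order in one pass to improve performance. Available global options are:
  bounds: a list with a tag global whose value must be an integer vector of length 2. Terms that appear in fewer documents than the lower bound or in more documents than the upper bound are discarded. Defaults to list(global = c(1, Inf)), i.e., every term is used.
  weighting: a weighting function capable of handling a TermDocumentMatrix. Defaults to weightTf for term frequency weighting.

...: the additional argument weighting (typically a WeightFunction) is allowed when coercing a simple triplet matrix to a term-document or document-term matrix.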
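The split between local and global control options can be sketched as follows. This assumes the tm package is installed, uses three hypothetical toy texts, and relies on the global options `bounds` and `weighting` together with the `weightBin` weighting function as described in the tm documentation. It is an illustration, not the package's own example.

```r
library(tm)

# Hypothetical toy texts; any corpus would do.
texts <- c("oil price rises", "oil output falls", "price of oil")
corp <- VCorpus(VectorSource(texts))

# stopwords is a local option, evaluated for each document.
# bounds and weighting are global options, evaluated once for the matrix.
dtm <- DocumentTermMatrix(corp,
                          control = list(stopwords = TRUE,
                                         bounds = list(global = c(2, Inf)),
                                         weighting = weightBin))

# Terms that survive the lower bound of two documents.
Terms(dtm)
inspect(dtm)
```

With a lower bound of 2, terms that occur in only one document are dropped before the weighting is applied.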
An object of class TermDocumentMatrix or class DocumentTermMatrix (both inheriting from a simple triplet matrix in package slam) containing a sparse term-document matrix or document-term matrix. The attribute weighting contains the weighting applied to the matrix.
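To make the "simple triplet" idea concrete, here is a minimal base R sketch that stores only the non-zero cells of a small matrix as (row, column, value) triplets and rebuilds the dense form from them. The values and the list layout are assumptions chosen for illustration; the actual class in package slam has its own constructor and methods.

```r
# Dense toy matrix (hypothetical values): terms in rows, documents in columns.
dense <- matrix(c(2, 0, 0,
                  0, 1, 0,
                  1, 0, 3),
                nrow = 3, byrow = TRUE,
                dimnames = list(Terms = c("oil", "price", "texas"),
                                Docs  = c("127", "144", "191")))

# Keep only the non-zero cells as (row index, column index, value) triplets.
nz <- which(dense != 0, arr.ind = TRUE)
triplet <- list(i = unname(nz[, 1]), j = unname(nz[, 2]), v = dense[nz],
                nrow = nrow(dense), ncol = ncol(dense),
                dimnames = dimnames(dense))

# Counts of non-zero and zero cells, in the spirit of "Non-/sparse entries".
n_nonzero <- length(triplet$v)
n_zero <- triplet$nrow * triplet$ncol - n_nonzero

# Rebuild the dense matrix from the triplets.
rebuilt <- matrix(0, triplet$nrow, triplet$ncol, dimnames = triplet$dimnames)
rebuilt[cbind(triplet$i, triplet$j)] <- triplet$v
```

Storing triplets pays off when most cells are zero, which is the usual case for term counts across documents.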
See termFreq for available local control options.
library(tm)
data("crude")

# Term-document matrix with punctuation and stopwords removed
tdm <- TermDocumentMatrix(crude,
                          control = list(removePunctuation = TRUE,
                                         stopwords = TRUE))

# Document-term matrix with unnormalized tf-idf weighting
dtm <- DocumentTermMatrix(crude,
                          control = list(weighting =
                                           function(x)
                                             weightTfIdf(x, normalize = FALSE),
                                         stopwords = TRUE))

inspect(tdm[202:205, 1:5])
inspect(tdm[c("price", "prices", "texas"), c("127", "144", "191", "194")])
inspect(dtm[1:5, 273:276])

# The same documents as a SimpleCorpus, with stemming
s <- SimpleCorpus(VectorSource(unlist(lapply(crude, as.character))))
m <- TermDocumentMatrix(s,
                        control = list(removeNumbers = TRUE,
                                       stopwords = TRUE,
                                       stemming = TRUE))
inspect(m[c("price", "texa"), c("127", "144", "191", "194")])
Loading required package: NLP
<<TermDocumentMatrix (terms: 4, documents: 5)>>
Non-/sparse entries: 6/14
Sparsity           : 70%
Maximal term length: 9
Weighting          : term frequency (tf)
Sample             :
           Docs
Terms       127 144 191 194 211
  companies   1   1   0   0   0
  company     1   0   0   1   0
  companys    0   0   1   0   0
  compared    0   0   0   0   1
<<TermDocumentMatrix (terms: 3, documents: 4)>>
Non-/sparse entries: 8/4
Sparsity           : 33%
Maximal term length: 6
Weighting          : term frequency (tf)
Sample             :
        Docs
Terms    127 144 191 194
  price    2   1   2   2
  prices   3   5   0   0
  texas    1   0   0   2
<<DocumentTermMatrix (documents: 5, terms: 4)>>
Non-/sparse entries: 6/14
Sparsity           : 70%
Maximal term length: 9
Weighting          : term frequency - inverse document frequency (tf-idf)
Sample             :
     Terms
Docs  companies  company company's compared
  127  2.736966 2.321928  0.000000 0.000000
  144  2.736966 0.000000  0.000000 0.000000
  191  0.000000 0.000000  4.321928 0.000000
  194  0.000000 2.321928  0.000000 0.000000
  211  0.000000 0.000000  0.000000 2.736966
<<TermDocumentMatrix (terms: 2, documents: 4)>>
Non-/sparse entries: 6/2
Sparsity           : 25%
Maximal term length: 5
Weighting          : term frequency (tf)
Sample             :
       Docs
Terms   127 144 191 194
  price   5   6   2   2
  texa    1   0   0   2
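The tf-idf weights printed for the document-term matrix can be checked by hand. The sketch below assumes that the crude corpus holds 20 documents and that unnormalized tf-idf is the term frequency times log2(N / document frequency). The helper `tf_idf` is hypothetical, and the document frequencies passed to it are inferred from the printed weights rather than read from the corpus.

```r
n_docs <- 20  # assumption: number of documents in the crude corpus

# Unnormalized tf-idf: term frequency times log2(N / document frequency).
tf_idf <- function(tf, df, n = n_docs) tf * log2(n / df)

# Document frequencies below are inferred from the printed output.
round(tf_idf(tf = 1, df = 4), 6)  # 2.321928, matches "company"
round(tf_idf(tf = 1, df = 3), 6)  # 2.736966, matches "companies" and "compared"
round(tf_idf(tf = 1, df = 1), 6)  # 4.321928, matches "company's"
```

A term that occurs in only one document gets the largest weight per occurrence, which is why "company's" scores highest in that block.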