instances_Matrix | R Documentation
Given an instance list, returns a term-document matrix (sparse format).
instances_Matrix(instances, verbose = getOption("dfrtopics.verbose"))
instances | file holding MALLET instances, or an rJava reference to a MALLET instance list |
verbose | if TRUE, print progress messages |
If the matrix is m, then m[i, j] gives the weight of word i in document j. If a different term weighting is desired, this matrix is a convenient starting point for computing it.
For the idea of going sparse, h/t Ben Marwick. The conversion is fairly slow: it copies all the corpus data from Java to R and then commits the Ultimate Sin of using a for loop. Pass verbose = TRUE for progress reports. TODO: make smarter.
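One way a loop could be avoided, sketched under assumptions: if each document's tokens were available in R as an integer vector of 1-based word indices (a hypothetical layout, not necessarily how dfrtopics receives the data from Java), sparseMatrix() can build the count matrix in a single vectorized call, because it sums the x values of repeated (i, j) pairs. The input list and the helper counts_matrix() are illustrative only.

```r
library(Matrix)

# Hypothetical input: one integer vector of 1-based word indices per
# document, in token order.
docs <- list(
  c(1L, 2L, 1L),  # document 1
  c(3L, 1L, 3L),  # document 2
  integer(0),     # document 3 (empty)
  c(4L)           # document 4
)
n_words <- 4L

# Hypothetical helper: words-by-documents count matrix without a for loop.
# Each token becomes one (word, document) pair with value 1; sparseMatrix()
# adds up the values of repeated pairs, producing per-document word counts.
counts_matrix <- function(docs, n_words) {
  sparseMatrix(
    i = unlist(docs, use.names = FALSE),
    j = rep(seq_along(docs), lengths(docs)),
    x = 1,
    dims = c(n_words, length(docs))
  )
}

cm <- counts_matrix(docs, n_words)
```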
The return value is a sparseMatrix with words in rows and documents in columns. Words are ordered as in the vocabulary (instances_vocabulary), and documents are ordered as in the instance list (instances_ids).
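Because rows follow the vocabulary order and columns follow the instance-list order, the vocabulary and document ids can be attached as dimnames, which allows lookups by word and by document id. In this sketch, vocab and ids are hypothetical stand-ins for what instances_vocabulary and instances_ids return (assumed here to be character vectors), and m is a toy matrix.

```r
library(Matrix)

# Hypothetical stand-ins for instances_vocabulary() and instances_ids()
# output (assumed to be character vectors in matching order).
vocab <- c("the", "topic", "model")
ids <- c("doc_a", "doc_b")

# Toy words-by-documents matrix: rows in vocab order, columns in ids order.
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(4, 1, 2),
                  dims = c(length(vocab), length(ids)))

# Attach the names so cells can be addressed by word and document id.
dimnames(m) <- list(vocab, ids)

m["model", "doc_a"]  # weight of "model" in document "doc_a"
colSums(m)           # total weight per document, named by document id
```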
See also: sparseMatrix, instances_vocabulary, instances_ids; read_wordcounts gives access to unprocessed word-count data (i.e., before stopword removal, etc.).