| instances_Matrix | R Documentation | 
Given an instance list, returns a term-document matrix (sparse format).
instances_Matrix(instances, verbose = getOption("dfrtopics.verbose"))
| instances | file holding MALLET instances or rJava reference to a MALLET instance list | 
| verbose | if TRUE, give some progress messaging | 
If the matrix is m, then m[i, j] gives the weight of word
i in document j. If another term-weighting is desired, this
matrix is convenient to operate on.
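As an illustration of re-weighting, here is a minimal sketch that applies one common TF-IDF variant to a words-by-documents sparse matrix. The toy matrix `m` below is a made-up stand-in for the result of instances_Matrix, and the particular IDF formula is an assumption chosen for the example, not something this package prescribes.

```r
library(Matrix)

# Toy stand-in for the output of instances_Matrix():
# 3 words (rows) by 4 documents (columns). Values are invented.
m <- sparseMatrix(
    i = c(1, 1, 2, 3, 3, 3),
    j = c(1, 2, 2, 1, 3, 4),
    x = c(2, 1, 4, 1, 3, 5),
    dims = c(3, 4)
)

# Document frequency: the number of documents each word occurs in.
doc_freq <- rowSums(m > 0)

# Inverse document frequency (one common variant; others exist).
idf <- log(ncol(m) / doc_freq)

# Scale each word's row by its IDF. Multiplying by a Diagonal matrix
# keeps the result sparse instead of densifying it.
m_tfidf <- Diagonal(x = idf) %*% m
```

Because words are in rows, per-word weights multiply from the left; a per-document normalization would instead multiply by a Diagonal matrix from the right.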
For the idea of going sparse, h/t Ben Marwick. The conversion is fairly slow
because it copies all the corpus data from Java to R and then goes
on to commit the Ultimate Sin and use a for loop. Pass
verbose = TRUE for some reports on progress. TODO: make smarter.
a sparseMatrix with documents in columns and
words in rows. The ordering of the words is as in the vocabulary
(instances_vocabulary), and the ordering of documents is as
in the instance list (instances_ids).
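Since the row and column orderings follow the vocabulary and the document ids, the matrix can be labeled for lookup by name. The sketch below uses hypothetical toy vectors `vocab` and `ids` in place of what instances_vocabulary and instances_ids would return (their exact return types are assumed here to be character vectors), and a made-up matrix in place of the real result.

```r
library(Matrix)

# Hypothetical stand-ins: in real use these would presumably come from
# instances_vocabulary() and instances_ids() on the same instance list.
vocab <- c("apple", "banana", "cherry")
ids <- c("doc1", "doc2")

# Toy words-by-documents matrix with invented weights.
m <- sparseMatrix(
    i = c(1, 3, 2),
    j = c(1, 1, 2),
    x = c(2, 1, 5),
    dims = c(length(vocab), length(ids))
)

# Attach labels: rows are words, columns are documents.
dimnames(m) <- list(vocab, ids)

# Weight of one word in one document, looked up by name.
w <- m["cherry", "doc1"]

# All word weights for one document, largest first.
top_words <- sort(m[, "doc1"], decreasing = TRUE)
```

Labeling only makes sense when the vectors have the same lengths as the matrix dimensions, which is why the toy `dims` are derived from them here.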
sparseMatrix,
instances_vocabulary, instances_ids,
read_wordcounts for access to unprocessed wordcounts data (i.e.
before stopword removal, etc.).