rmultinom_sparse | R Documentation |
According to the generative model of LDA, documents are drawn from mixtures of multinomial distributions over the vocabulary. When we simulate from the posterior, our task in practice is: for each document d, given the number of words n allocated to topic k in d, generate the result of n multinomial trials with word probabilities given by topic k. This function tries to do this efficiently given a vector of n values (one for each document) and a vector of the topic's word weights, yielding a simulated term-document matrix of within-topic counts.
rmultinom_sparse(nn, probs)
nn | vector of trial sizes |
probs | vector of word weights |
R's built-in rmultinom has two disadvantages here. First, it is set up to generate many samples, each with the same number of trials. But we need the number of trials to vary with the number of words each document allocates to the given topic, so we would have to call rmultinom once for each document and then cbind the results. Second, because the vocabulary can be large and topics typically allocate most of the probability to only a few words, most elements of each sample vector will be zero. But the built-in function cannot take advantage of this sparsity and will require space for a full simulated term-document matrix. This function, by contrast, returns a sparse Matrix.
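To make the idea concrete, here is a minimal sketch in plain R of sampling with varying trial sizes while storing only nonzero counts. It is not the package's implementation: the helper name rmultinom_sparse_sketch and the toy values of nn and probs are assumptions made for illustration, and the sketch relies only on sample.int, tabulate, and sparseMatrix from the Matrix package.

```r
library(Matrix)

# Hypothetical helper, NOT the package's implementation: draw one
# multinomial sample per entry of nn and keep only the nonzero counts.
rmultinom_sparse_sketch <- function(nn, probs) {
    n_terms <- length(probs)
    i <- integer(0)   # term (row) indices of nonzero cells
    j <- integer(0)   # document (column) indices of nonzero cells
    x <- numeric(0)   # the counts themselves
    for (d in seq_along(nn)) {
        if (nn[d] == 0) next
        # nn[d] categorical draws, tabulated, form one multinomial sample
        draws <- sample.int(n_terms, nn[d], replace = TRUE, prob = probs)
        counts <- tabulate(draws, nbins = n_terms)
        nz <- which(counts > 0)
        i <- c(i, nz)
        j <- c(j, rep(d, length(nz)))
        x <- c(x, counts[nz])
    }
    sparseMatrix(i = i, j = j, x = x, dims = c(n_terms, length(nn)))
}

set.seed(1)
probs <- c(0.6, 0.3, 0.1, rep(0, 997))  # toy topic concentrated on 3 words
nn <- c(5, 0, 12, 40)                    # toy per-document word counts
m <- rmultinom_sparse_sketch(nn, probs)
```

Each column of m sums to the corresponding entry of nn, and only the handful of nonzero cells is stored. Growing the index vectors inside a loop keeps the sketch short but would be slow for a large corpus.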
Note that the parameters are not the same as rmultinom's. The equivalent of rmultinom(n, size, prob) is rmultinom_sparse(rep(size, n), prob).
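The difference in parameters can be illustrated with base R alone. The sketch below does not call rmultinom_sparse itself; it only shows, on made-up values, how a single rmultinom call with a fixed size corresponds to a vector of trial sizes with one entry per document, and how that vector can then vary.

```r
set.seed(42)
prob <- c(0.5, 0.3, 0.2)  # toy word weights (illustrative values)
n <- 4                     # number of samples, i.e. documents
size <- 10                 # trials per sample

# rmultinom: one call, every column has the same number of trials
a <- rmultinom(n, size, prob)

# The same setup written as a vector of trial sizes, one per document
nn <- rep(size, n)
b <- sapply(nn, function(s) rmultinom(1, s, prob))

# With a vector of sizes, each document can have its own trial count
nn_varied <- c(3, 0, 25, 7)
varied <- sapply(nn_varied, function(s) rmultinom(1, s, prob))
```

Here a and b have the same shape and the same column totals (their entries differ only because they are separate random draws), while the columns of varied sum to the entries of nn_varied.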
sparse Matrix of sampled term-document counts, with terms in rows and documents in columns. Notice that this means individual multinomial samples are columns of the returned matrix.
imi_check and mi_check, which use this function