make.heldout | R Documentation |
Tools for making and evaluating heldout datasets.
make.heldout(
documents,
vocab,
N = floor(0.1 * length(documents)),
proportion = 0.5,
seed = NULL
)
documents | the documents to be modeled (see |
vocab | the vocabulary item |
N | number of documents to be partially held out |
proportion | proportion of the words in each of the N selected documents to be held out |
seed | the seed, set for replicability |
These functions create a heldout dataset and evaluate the heldout likelihood using the document completion method. The basic idea is to hold out some fraction of the words in a set of documents, train the model on the remaining words, and then use the document-level latent variables to evaluate the probability of the heldout portion. See the example for the basic workflow.
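To make the hold-out step concrete, the following is a minimal base R sketch of splitting one document's tokens into a training part and a heldout part. It assumes the document is stored as a two-row integer matrix (vocabulary indices in row 1, counts in row 2). The helper name `hold_out_tokens` is hypothetical and this is not the package's own implementation.

```r
# Hypothetical helper, not part of stm: hold out a share of one document's tokens.
# `doc` is a 2-row integer matrix: row 1 = vocabulary indices, row 2 = counts.
hold_out_tokens <- function(doc, proportion = 0.5) {
  # Expand the counts so that every token has its own entry
  tokens <- rep(doc[1, ], times = doc[2, ])
  # Number of tokens to hold out (assumption: always at least one)
  n_held <- max(1, floor(proportion * length(tokens)))
  held_idx <- sample.int(length(tokens), n_held)
  # Collapse a token vector back into the index/count matrix form
  to_matrix <- function(x) {
    tab <- tabulate(x)
    rbind(which(tab > 0), tab[tab > 0])
  }
  list(train = to_matrix(tokens[-held_idx]),
       heldout = to_matrix(tokens[held_idx]))
}

set.seed(1)
doc <- rbind(c(2L, 5L, 9L), c(3L, 1L, 4L))  # 8 tokens over 3 vocabulary entries
parts <- hold_out_tokens(doc, proportion = 0.5)
n_total <- sum(parts$train[2, ]) + sum(parts$heldout[2, ])
```

The model is then fitted on the `train` part only, and the `heldout` part is reserved for scoring. Splitting never changes the total token count, so `n_total` equals the original document length.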
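library(stm)
# Trim the vocabulary and subsample 500 documents from the bundled poliblog5k data
prep <- prepDocuments(poliblog5k.docs, poliblog5k.voc,
                      poliblog5k.meta, subsample = 500,
                      lower.thresh = 20, upper.thresh = 200)
# Partially hold out words from a subset of the documents (default N and proportion)
heldout <- make.heldout(prep$documents, prep$vocab)
documents <- heldout$documents
vocab <- heldout$vocab
meta <- prep$meta
# Fit a 5-topic model on the corpus with the heldout words removed
stm1 <- stm(documents, vocab, 5,
            prevalence = ~ rating + s(day),
            init.type = "Random",
            data = meta, max.em.its = 5)
# Evaluate the heldout words under the fitted model
eval.heldout(stm1, heldout$missing)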
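The evaluation step can be illustrated with a small base R sketch. It assumes the score for a document is the average log probability of its heldout tokens, where each word's probability is the document's topic proportions multiplied by the topic-word probabilities. The names `score_heldout`, `theta`, and `beta` are hypothetical, and the values are made up for illustration. This is not the package's implementation.

```r
# Hypothetical helper, not stm's implementation.
# theta:   topic proportions for one document (length K, sums to 1)
# beta:    K x V matrix of topic-word probabilities (each row sums to 1)
# heldout: 2-row matrix, row 1 = vocabulary indices, row 2 = heldout counts
score_heldout <- function(theta, beta, heldout) {
  # Probability of each heldout word under the document's topic mixture
  word_probs <- as.vector(theta %*% beta[, heldout[1, ], drop = FALSE])
  # Average log probability per heldout token, weighting words by their counts
  sum(heldout[2, ] * log(word_probs)) / sum(heldout[2, ])
}

theta <- c(0.5, 0.5)
beta <- rbind(c(0.50, 0.25, 0.25),
              c(0.25, 0.25, 0.50))
heldout_doc <- rbind(c(1L, 3L), c(2L, 1L))  # word 1 twice, word 3 once
score <- score_heldout(theta, beta, heldout_doc)
```

Higher (less negative) scores mean the fitted topic proportions predict the heldout words better, which is what makes the measure useful for comparing models.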