make.heldout: Heldout Likelihood by Document Completion
In stm: Estimation of the Structural Topic Model

make.heldout

R Documentation

Description

Tools for making and evaluating heldout datasets.

Usage

make.heldout(
  documents,
  vocab,
  N = floor(0.1 * length(documents)),
  proportion = 0.5,
  seed = NULL
)

Arguments

`documents`	the documents to be modeled (see `stm` for format).
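`vocab`	the vocabulary: a character vector of the terms indexed by `documents`.
`N`	number of documents to be partially held out. Defaults to 10% of the documents, rounded down.
`proportion`	proportion of the words in each of the `N` selected documents to be held out.
`seed`	the random seed, set for replicability.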

Details

These functions create heldout datasets and evaluate the heldout likelihood using the document completion method. The basic idea is to hold out some fraction of the words in a set of documents, train the model on the remaining words, and use the document-level latent variables to evaluate the probability of the heldout portion. See the example for the basic workflow.
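The splitting step can be sketched in base R for a single toy document. This is only an illustration, not the package's implementation: `split_document` is a hypothetical helper, and the sketch assumes the two-row document format used by `stm` (row 1 holds vocabulary indices, row 2 holds counts).

```r
# Toy document: row 1 = vocabulary indices, row 2 = counts (8 tokens in total).
doc <- matrix(c(2L, 5L, 9L,
                3L, 1L, 4L), nrow = 2, byrow = TRUE)

# Hypothetical helper (not part of stm): split one document's tokens
# into an observed part and a heldout part.
split_document <- function(doc, proportion = 0.5) {
  # Expand the counts into one entry per token.
  tokens <- rep(doc[1, ], times = doc[2, ])
  # Assumption: always hold out at least one token.
  n_held <- max(1, floor(proportion * length(tokens)))
  held_idx <- sample.int(length(tokens), n_held)
  # Collapse a token vector back into the two-row index/count format.
  to_matrix <- function(x) {
    tab <- table(x)
    rbind(as.integer(names(tab)), as.integer(tab))
  }
  list(observed = to_matrix(tokens[-held_idx]),
       heldout  = to_matrix(tokens[held_idx]))
}

set.seed(1)
parts <- split_document(doc, proportion = 0.5)

# The two parts together account for every token of the original document.
total_tokens <- sum(parts$observed[2, ]) + sum(parts$heldout[2, ])
```

The model would then be trained on `parts$observed`, while `parts$heldout` is kept aside for evaluation.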

Examples


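library(stm)

# Trim the vocabulary and subsample the poliblog5k data (shipped with stm)
prep <- prepDocuments(poliblog5k.docs, poliblog5k.voc,
                      poliblog5k.meta, subsample = 500,
                      lower.thresh = 20, upper.thresh = 200)

# Hold out part of the words in a random subset of the documents
heldout <- make.heldout(prep$documents, prep$vocab)
documents <- heldout$documents
vocab <- heldout$vocab
meta <- prep$meta

# Fit a 5-topic model on the remaining words (few EM iterations, for speed)
stm1 <- stm(documents, vocab, 5,
            prevalence = ~ rating + s(day),
            init.type = "Random",
            data = meta, max.em.its = 5)

# Evaluate the likelihood of the heldout words
eval.heldout(stm1, heldout$missing)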
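
The evaluation step can be illustrated without fitting a model. The sketch below assumes the heldout score for a document is the average log probability per heldout token, where each token's probability mixes the topic-word distributions by the document's topic proportions. The values of `theta` and `beta` are made up, `heldout_loglik` is a hypothetical helper, and the exact computation inside `eval.heldout` may differ.

```r
# Toy quantities (invented for illustration, not produced by stm):
# theta: topic proportions for one document (2 topics)
# beta:  topic-word probabilities (2 topics x 4 terms), each row sums to 1
theta <- c(0.7, 0.3)
beta <- rbind(c(0.4, 0.3, 0.2, 0.1),
              c(0.1, 0.2, 0.3, 0.4))

# Heldout tokens in two-row format: term 1 appears twice, term 4 once.
heldout_doc <- rbind(c(1L, 4L),
                     c(2L, 1L))

# Hypothetical helper: average log probability per heldout token.
heldout_loglik <- function(theta, beta, heldout_doc) {
  # Probability of each heldout term under the document's topic mixture.
  word_probs <- as.numeric(theta %*% beta[, heldout_doc[1, ], drop = FALSE])
  counts <- heldout_doc[2, ]
  sum(counts * log(word_probs)) / sum(counts)
}

ll <- heldout_loglik(theta, beta, heldout_doc)
```

Values closer to zero indicate that the model assigns higher probability to the heldout words.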

stm documentation built on June 24, 2024, 5:18 p.m.