foreign_model | R Documentation |
Besides this package and mallet, which it builds on, there are several
other topic-modeling packages for R. topicmodels provides a
topic-modeling infrastructure as well as supplying functions for estimating
both ordinary LDA and Correlated Topic Models in several ways. I have tried
to make it possible to use at least some of dfrtopics's functions with
results from topicmodels' LDA and CTM functions. I have also wished to make
it possible to interface with the stm package and its Structural Topic
Model (stm). Given a model from one of these two packages, apply
foreign_model to obtain an object that can be used with (some of) the
functions in dfrtopics. Use unwrap to get back the original model object.
foreign_model(x, metadata = NULL)
unwrap(x)
x | model for translation from topicmodels or stm |
metadata | metadata frame to attach to model. For converting from |
Most of this package emerged out of my particular need to wrangle MALLET, and
as a result I did not take account of the topicmodels infrastructure
(which, furthermore, has been refined over time). I wish I had, since that
infrastructure is elegant and extensible, using S4 rather than S3. For now, I
am not going to overhaul my own class structure. As a stopgap, the strategy
adopted here is to provide "wrapper" objects for TopicModel-class and stm
objects that can respond to many of the same messages as mallet_model
does. This is not the best way to do things, but it's straightforward.
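The wrapper strategy can be pictured with a minimal base-R sketch. Every name below (doc_topics_sketch, wrap_sketch, unwrap_sketch, and the two toy classes) is hypothetical and chosen only for illustration; this is not the code dfrtopics uses, only one plausible shape for an S3 "glue" object that answers the same generic as a native model.

```r
# Hypothetical generic standing in for an accessor such as doc_topics.
doc_topics_sketch <- function(m, ...) UseMethod("doc_topics_sketch")

# Hypothetical native model class that stores raw weights directly.
native <- structure(
    list(weights = matrix(c(3, 1, 2, 2), nrow = 2, byrow = TRUE)),
    class = "native_sketch"
)
doc_topics_sketch.native_sketch <- function(m, ...) m$weights

# Wrapper constructor: the foreign object is stored untouched in a list,
# so it can be handed back unchanged later.
wrap_sketch <- function(x) structure(list(x = x), class = "glue_sketch")
unwrap_sketch <- function(w) w$x

# The wrapper answers the same generic by reaching into the foreign object.
doc_topics_sketch.glue_sketch <- function(m, ...) m$x$theta

# A toy "foreign" model: a plain list holding normalized proportions.
foreign <- list(theta = matrix(c(0.75, 0.25, 0.5, 0.5), nrow = 2, byrow = TRUE))
g <- wrap_sketch(foreign)

doc_topics_sketch(native)  # raw weights from the native class
doc_topics_sketch(g)       # proportions pulled out of the wrapped object
```

Because the wrapper only stores a reference to the original object, unwrapping is a matter of returning that stored element.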
Not all functionality is supported. Anything that requires MALLET's
assignments of topics to individual words (the "sampling state") does not at
present work. Note too that doc_topics and topic_words applied to a
TopicModel or an stm return parameter estimates of the probabilities of
topics in documents or words in topics. In MALLET terminology these are
"smoothed and normalized," not raw sampling weights. For this reason
hyperparameters does not return true hyperparameter values for these models
(which are, in any case, defined variously for the various estimation
procedures). Instead, hyperparameters returns dummy values of zero so that
tw_smooth_normalize and dt_smooth_normalize will not incorrectly add
anything to the posteriors. The actual hyperparameters should be retrieved
from the underlying model if needed.
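To see why dummy hyperparameters of zero are harmless, consider a minimal base-R sketch. The helper smooth_normalize_sketch is hypothetical and assumes the simplest form of smoothing (add one scalar to every cell, then rescale each row to sum to 1); the package's own tw_smooth_normalize and dt_smooth_normalize may differ in detail.

```r
# Hypothetical helper: add a scalar hyperparameter to every cell, then
# divide each row by its sum so the row becomes a probability distribution.
smooth_normalize_sketch <- function(m, hyper) {
    smoothed <- m + hyper
    smoothed / rowSums(smoothed)
}

# Raw sampling weights (counts): smoothing and normalizing is meaningful.
raw <- matrix(c(8, 2, 1, 3), nrow = 2, byrow = TRUE)
smooth_normalize_sketch(raw, 0.5)

# Already-normalized estimates, as a wrapped foreign model would supply.
theta <- matrix(c(0.8, 0.2, 0.25, 0.75), nrow = 2, byrow = TRUE)

# With a zero hyperparameter the estimates pass through unchanged.
unchanged <- smooth_normalize_sketch(theta, 0)

# With a nonzero value they would be pulled toward uniform.
distorted <- smooth_normalize_sketch(theta, 0.5)
```

In this toy setup, a zero hyperparameter makes the operation an identity on rows that already sum to 1, which is the behavior the dummy values are meant to guarantee.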
align_topics will work with glue objects and should help compare variant
models and estimation strategies.
It is possible to apply dfr_browser to a glue object to explore a model,
with two caveats. First, the implication of using the normalized posteriors
is that all documents are given equal weight in the display, whereas the
display of a model from mallet by default weights documents by their
lengths; for a more comparable display of a mallet model m, use
dfr_browser(m, proper=T). Second, at present the display of an stm object
will not use any explicit estimates of the effects of time covariates. It
just takes the average estimated topic proportion of all documents in each
year. To examine the actual estimates, together with uncertainties, the
estimateEffect method should be used, or the interactive visualization
provided by the stmBrowser package, for which the kludges here are no
substitute.
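The difference between equal weighting and length weighting can be made concrete with a small base-R sketch. The data are invented for illustration, and the variable names (theta, year, doc_length) are assumptions; this is not the computation dfr_browser performs internally, only an example of the two kinds of yearly average described above.

```r
# Invented data: three documents, two topics; each row of theta sums to 1.
theta <- matrix(c(0.9, 0.1,
                  0.1, 0.9,
                  0.5, 0.5), nrow = 3, byrow = TRUE)
year <- c("1990", "1990", "1991")
doc_length <- c(1000, 100, 500)   # token counts per document

# Equal weighting: the plain mean of the proportions of each year's documents.
equal_wt <- rowsum(theta, year) / as.vector(table(year))

# Length weighting: scale each document's proportions by its token count,
# sum within each year, then divide by the year's total token count.
length_wt <- rowsum(theta * doc_length, year) /
    as.vector(rowsum(doc_length, year))

equal_wt
length_wt
```

In the 1990 row the long first document dominates the length-weighted average, while the equal-weighted average treats both documents alike, so the two displays can differ noticeably when document lengths vary.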
A wrapper object which will work with most functions that accept an object
of class mallet_model.
wordcounts_DocumentTermMatrix and wordcounts_stm_inputs to prepare wordcount
data for input to these other packages' modeling procedures.
## Not run: 
# aligning three models from three packages
counts <- read_wordcounts(...) # etc.
meta <- read_dfr_metadata(...) # etc.

library(stm)
corpus <- wordcounts_stm_inputs(counts, meta)
m_stm <- stm(documents=corpus$documents,
             vocab=corpus$vocab,
             data=corpus$data,
             K=25, prevalence= ~ s(journaltitle))
# the metadata comes from the corpus object built above
m_stm_glue <- foreign_model(m_stm, corpus$data)

library(topicmodels)
dtm <- wordcounts_DocumentTermMatrix(counts)
m_lda <- LDA(dtm, k=25, control=list(alpha=0.1))
m_lda_glue <- foreign_model(m_lda, meta)

insts <- wordcounts_instances(counts)
m_mallet <- train_model(insts, n_topics=25, metadata=meta)

model_distances(list(m_stm_glue, m_lda_glue, m_mallet), 100) %>%
    align_topics() %>%
    alignment_frame()

## End(Not run)