plot.MultimodDiagnostic: Plotting Method for Multimodality Diagnostic Objects
In bstewart/stm: Estimation of the Structural Topic Model

plot.MultimodDiagnostic

R Documentation

Description

Plotting method for objects of the S3 class 'MultimodDiagnostic'. These objects are returned by multiSTM(), which runs a battery of tests to assess the stability of the local modes of an STM model.

Usage

## S3 method for class 'MultimodDiagnostic'
plot(x, ind = NULL, topics = NULL, ...)

Arguments

`x`	An object of S3 class 'MultimodDiagnostic'. See `multiSTM`.
`ind`	An integer or list of integers specifying which plots to generate (see Details). If `NULL` (default), all plots are generated.
`topics`	An integer or vector of integers specifying the topics for which to plot the posterior distribution of covariate effect estimates. If `NULL` (default), plots are generated for every topic in the S3 object.
`...`	Other arguments to be passed to the plotting functions.

Details

This method generates a series of plots, indexed as follows. To produce only a subset of the plots, specify their indexes with the ind argument. Note that not all plot types are available for every object of class 'MultimodDiagnostic':

1. Histogram of Expected Common Words: Generates a 10-bin histogram of the column means of obj$wmat, a K-by-N matrix reporting the number of "top words" shared by the reference model and the candidate model. The "top words" for a given topic are defined as the 10 highest-frequency words.
2. Histogram of Expected Common Documents: Generates a 10-bin histogram of the column means of obj$tmat, a K-by-N matrix reporting the number of "top documents" shared by the reference model and the candidate model. The "top documents" for a given topic are defined as the 10 documents in the reference corpus with highest topical frequency.
3. Distribution of .95 Confidence-Interval Coverage for Regression Estimates: Generates a histogram of obj$confidence.ratings, a vector whose entries specify the proportion of regression coefficient estimates in a candidate model that fall within the .95 confidence interval for the corresponding estimate in the reference model. This can only be generated if obj$confidence.ratings is non-NULL.
4. Posterior Distributions of Covariate Effect Estimates By Topic: Generates a square matrix of plots, each depicting the posterior distribution of the regression coefficients for the covariate specified in obj$reg.parameter.index for one topic. The topics for which the plots are to be generated are specified by the topics argument. If the length of topics is not a perfect square, the plots matrix will include white space. The plots have a dashed black vertical line at zero, and a continuous red vertical line indicating the coefficient estimate in the reference model. This can only be generated if obj$cov.effects is non-NULL.
5. Histogram of Expected L1-Distance From Reference Model: Generates a 10-bin histogram of the column means of obj$lmat, a K-by-N matrix reporting the L1-distance of each topic from the corresponding one in the reference model.
6. L1-distance vs. Top-10 Word Metric: Produces a smoothed color density representation of the scatterplot of obj$lmat and obj$wmat, the metrics for L1-distance and shared top-words, obtained through a kernel density estimate. This can be used to validate the metrics under consideration.
7. L1-distance vs. Top-10 Docs Metric: Produces a smoothed color density representation of the scatterplot of obj$lmat and obj$tmat, the metrics for L1-distance and shared top-documents, obtained through a kernel density estimate. This can be used to validate the metrics under consideration.
8. Top-10 Words vs. Top-10 Docs Metric: Produces a smoothed color density representation of the scatterplot of obj$wmat and obj$tmat, the metrics for shared top-words and shared top-documents, obtained through a kernel density estimate. This can be used to validate the metrics under consideration.
9. Maximized Bound vs. Aggregate Top-10 Words Metric: Generates a scatter plot with linear trendline for the maximized bound vector (obj$lb) and a linear transformation of the top-words metric aggregated by model (obj$wmod/1000).
10. Maximized Bound vs. Aggregate Top-10 Docs Metric: Generates a scatter plot with linear trendline for the maximized bound vector (obj$lb) and a linear transformation of the top-docs metric aggregated by model (obj$tmod/1000).
11. Maximized Bound vs. Aggregate L1-Distance Metric: Generates a scatter plot with linear trendline for the maximized bound vector (obj$lb) and a linear transformation of the L1-distance metric aggregated by model (obj$lmod/1000).
12. Top-10 Docs Metric vs. Semantic Coherence: Generates a scatter plot with linear trendline for the reference-model semantic coherence scores and the column means of obj$tmat.
13. L1-Distance Metric vs. Semantic Coherence: Generates a scatter plot with linear trendline for the reference-model semantic coherence scores and the column means of obj$lmat.
14. Top-10 Words Metric vs. Semantic Coherence: Generates a scatter plot with linear trendline for the reference-model semantic coherence scores and the column means of obj$wmat.
15. Same as 5, but using the limited-mass L1-distance metric. Can only be generated if obj$mass.threshold != 1.
16. Same as 11, but using the limited-mass L1-distance metric. Can only be generated if obj$mass.threshold != 1.
17. Same as 7, but using the limited-mass L1-distance metric. Can only be generated if obj$mass.threshold != 1.
18. Same as 13, but using the limited-mass L1-distance metric. Can only be generated if obj$mass.threshold != 1.
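
As a rough illustration of what the first plot type summarizes, and of how a square plot grid might be sized for the per-topic plots, the following base R sketch uses simulated data instead of a fitted object. The matrix wmat_sim is a hypothetical stand-in with the K-by-N shape described above, and the ceiling(sqrt(...)) grid rule is an assumption about how the white space arises, not a statement about the package's internals. The orientation of the matrices in a real object is worth checking with dim().

```r
# Simulated stand-in for obj$wmat (hypothetical data, not produced by stm):
# K rows by N columns, entries mimic counts of shared top-10 words (0 to 10).
set.seed(1)
K <- 5
N <- 40
wmat_sim <- matrix(sample(0:10, K * N, replace = TRUE), nrow = K, ncol = N)

# Column means: one average per column of the matrix.
expected_common_words <- colMeans(wmat_sim)

# Bin the column means. hist() treats `breaks = 10` as a suggestion, so the
# number of bins is only approximately 10. Drop `plot = FALSE` to draw it.
h <- hist(expected_common_words, breaks = 10, plot = FALSE)

# Assumed grid rule for the per-topic plots: the smallest square that holds
# every requested topic. Unused cells would show up as white space.
topics <- c(1, 2, 3, 4, 5)
grid_side <- ceiling(sqrt(length(topics)))
empty_cells <- grid_side^2 - length(topics)
```

With five topics this gives a 3-by-3 grid with four empty cells, which matches the note above that a non-square number of topics leaves white space.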

Author(s)

Brandon M. Stewart (Princeton University) and Antonio Coppola (Harvard University)

References

Roberts, M., Stewart, B., & Tingley, D. (Forthcoming). "Navigating the Local Modes of Big Data: The Case of Topic Models." In Data Analytics in Social Science, Government, and Industry. New York: Cambridge University Press.

Examples



## Not run: 

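# Example using the Gadarian data
library(stm)

# Preprocess the open-ended responses together with their metadata
temp <- textProcessor(documents = gadarian$open.ended.response,
                      metadata = gadarian)
prep <- prepDocuments(temp$documents, temp$vocab, temp$meta)
docs <- prep$documents
vocab <- prep$vocab
meta <- prep$meta

# Fit candidate models from multiple initializations
set.seed(02138)
mod.out <- selectModel(docs, vocab, K = 3,
                       prevalence = ~ treatment + s(pid_rep),
                       data = meta, runs = 20)

# Run the multimodality diagnostics; `meta` is the metadata aligned
# with the documents that were actually used to fit the models
out <- multiSTM(mod.out, mass.threshold = .75,
                reg.formula = ~ treatment,
                metadata = meta)

plot(out)     # all available plots
plot(out, 1)  # only the first plot type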

## End(Not run)
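
Because not every plot type is available for every object, it can help to work out the valid indexes before choosing ind. The sketch below encodes the availability conditions stated in Details, counting the plot types in the order they are listed. The function available_plot_indices() is a hypothetical helper and is not part of stm. It is exercised here on plain lists that carry only the three fields it inspects, so it runs without fitting any model.

```r
# Hypothetical helper (not part of stm): return the plot indexes that the
# Details section describes as available for a given object.
available_plot_indices <- function(obj) {
  ind <- c(1:2, 5:14)  # plot types with no stated precondition
  if (!is.null(obj$confidence.ratings)) ind <- c(ind, 3L)
  if (!is.null(obj$cov.effects)) ind <- c(ind, 4L)
  # The limited-mass variants require a mass threshold different from 1.
  if (!is.null(obj$mass.threshold) && obj$mass.threshold != 1) {
    ind <- c(ind, 15:18)
  }
  sort(ind)
}

# Mock objects: plain lists, not real 'MultimodDiagnostic' objects.
mock_full <- list(confidence.ratings = c(0.9, 0.8),
                  cov.effects = list(1),
                  mass.threshold = 0.75)
mock_minimal <- list(confidence.ratings = NULL,
                     cov.effects = NULL,
                     mass.threshold = 1)

full_ind <- available_plot_indices(mock_full)
minimal_ind <- available_plot_indices(mock_minimal)
```

For the fully populated mock, every index from 1 to 18 is returned. For the minimal mock, indexes 3, 4, and 15 to 18 are left out. Whether the result can be passed straight to ind for a real object is untested here.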

bstewart/stm documentation built on Jan. 3, 2024, 6:58 p.m.