View source: R/textmodel_wordfish.R
textmodel_wordfish | R Documentation |
Estimate Slapin and Proksch's (2008) "wordfish" Poisson scaling model of one-dimensional document positions using conditional maximum likelihood.
textmodel_wordfish(
x,
dir = c(1, 2),
priors = c(Inf, Inf, 3, 1),
tol = c(1e-06, 1e-08),
dispersion = c("poisson", "quasipoisson"),
dispersion_level = c("feature", "overall"),
dispersion_floor = 0,
abs_err = FALSE,
residual_floor = 0.5
)
x |
the dfm on which the model will be fit |
dir |
set global identification by specifying the indexes for a pair of
documents such that |
priors |
prior precisions for the estimated parameters |
tol |
tolerances for convergence. The first value is a convergence threshold for the log-posterior of the model, the second value is the tolerance in the difference in parameter values from the iterative conditional maximum likelihood (from conditionally estimating document-level, then feature-level parameters). |
dispersion |
sets whether a quasi-Poisson quasi-likelihood should be
used based on a single dispersion parameter ( |
dispersion_level |
sets the unit level for the dispersion parameter,
options are |
dispersion_floor |
constraint for the minimal underdispersion multiplier
in the quasi-Poisson model. Used to minimize the distorting effect of
terms with rare term or document frequencies that appear to be severely
underdispersed. Default is 0, but this only applies if |
abs_err |
specifies how the convergence is considered |
residual_floor |
specifies the threshold for residual matrix when
calculating the svds, only applies when |
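As a rough illustration of what a dispersion floor does, the base R sketch below computes a Pearson-based dispersion estimate per feature and then bounds it from below. The helper `feature_dispersion`, its `df` argument, and the Pearson estimator itself are assumptions made for illustration. They are not the package's internal routine, which may compute the dispersion differently.

```r
# Hypothetical helper, not part of the package.
# Assumed estimator: sum of squared Pearson residuals per feature divided by
# a residual degrees-of-freedom value, then bounded below by `floor`.
feature_dispersion <- function(counts, lambda, df, floor = 0) {
  pearson_sq <- (counts - lambda)^2 / lambda
  phi <- colSums(pearson_sq) / df
  pmax(phi, floor)
}

# Toy data: 3 documents (rows) by 2 features (columns)
counts <- matrix(c(2, 3, 4,     # feature 1: counts close to the expected rate
                   0, 9, 0),    # feature 2: counts far from the expected rate
                 nrow = 3)
lambda <- matrix(3, nrow = 3, ncol = 2)  # assumed expected counts

phi_raw     <- feature_dispersion(counts, lambda, df = 2, floor = 0)
phi_floored <- feature_dispersion(counts, lambda, df = 2, floor = 0.5)
```

In this toy case the first feature looks underdispersed (estimate below 1), so a floor of 0.5 raises its multiplier, while the overdispersed second feature is unchanged.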
The returns match those of Will Lowe's R implementation of wordfish (see the austin package), except that here we have renamed words to be features. (This return list may change.) We have also followed the practice begun with Slapin and Proksch's early implementation of the model that used a regularization parameter of se(\sigma) = 3, through the third element in priors.
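To make the role of the regularization parameter more concrete, the sketch below shows a Gaussian log-prior penalty on the feature parameters in base R. It assumes a zero-mean normal prior whose standard deviation is the value passed in; the helper name `log_prior_penalty` is hypothetical, and the package's internal scaling of the prior may differ.

```r
# Hypothetical helper: zero-mean Gaussian log-prior penalty (up to a constant)
# for a parameter vector `par` with prior standard deviation `sd`.
log_prior_penalty <- function(par, sd) {
  -0.5 * sum(par^2) / sd^2
}

beta <- c(0.5, -0.5, 1, 0)  # toy feature parameters

penalty_sd3  <- log_prior_penalty(beta, sd = 3)    # mild shrinkage towards zero
penalty_flat <- log_prior_penalty(beta, sd = Inf)  # zero penalty: a flat prior
```

Under this assumption an infinite value, as in the first two elements of the default priors, adds nothing to the log-posterior.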
An object of class textmodel_fitted_wordfish. This is a list containing:
dir |
global identification of the dimension |
theta |
estimated document positions |
alpha |
estimated document fixed effects |
beta |
estimated feature marginal effects |
psi |
estimated word fixed effects |
docs |
document labels |
features |
feature labels |
sigma |
regularization parameter for betas in Poisson form |
ll |
log likelihood at convergence |
se.theta |
standard errors for theta-hats |
x |
dfm to which the model was fit |
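As an informal illustration of how theta, alpha, beta and psi relate to the counts, the base R sketch below computes expected counts and a Poisson log-likelihood. It assumes the usual wordfish parameterization, lambda[i, j] = exp(alpha[i] + psi[j] + beta[j] * theta[i]); the helpers `wordfish_rate` and `wordfish_loglik` are hypothetical and not exported by any package.

```r
# Hypothetical helpers, not part of the package.
# Assumed rate: lambda[i, j] = exp(alpha[i] + psi[j] + beta[j] * theta[i])
wordfish_rate <- function(theta, alpha, beta, psi) {
  # documents x features matrix of the linear predictor
  eta <- outer(theta, beta) +
    outer(alpha, rep(1, length(psi))) +
    outer(rep(1, length(theta)), psi)
  exp(eta)
}

wordfish_loglik <- function(counts, theta, alpha, beta, psi) {
  lambda <- wordfish_rate(theta, alpha, beta, psi)
  sum(dpois(counts, lambda, log = TRUE))
}

# Toy parameters: 3 documents, 4 features
theta <- c(-1, 0, 1)
alpha <- c(0, 0.2, -0.1)
beta  <- c(0.5, -0.5, 1, 0)
psi   <- c(1, 1, 0.5, 2)

lambda <- wordfish_rate(theta, alpha, beta, psi)

# Simulate counts from the assumed model and evaluate the log-likelihood
set.seed(1)
counts <- matrix(rpois(length(lambda), lambda), nrow = nrow(lambda))
ll <- wordfish_loglik(counts, theta, alpha, beta, psi)
```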
In the rare case that the warning "The algorithm did not converge." appears, removing some documents may help.
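One way to detect this warning programmatically is sketched below in base R. The wrapper `fit_with_convergence_flag` is hypothetical, and the stand-in fitting functions exist only so that the sketch runs without any packages installed; in practice `fit_fun` would wrap a call to textmodel_wordfish().

```r
# Hypothetical wrapper: run a fitting function and record whether it warned
# about non-convergence. `fit_fun` stands in for something like
# function() textmodel_wordfish(dfmat, dir = c(6, 5)).
fit_with_convergence_flag <- function(fit_fun) {
  converged <- TRUE
  fit <- withCallingHandlers(
    fit_fun(),
    warning = function(w) {
      if (grepl("did not converge", conditionMessage(w), fixed = TRUE)) {
        converged <<- FALSE
        invokeRestart("muffleWarning")  # suppress only this warning
      }
    }
  )
  list(fit = fit, converged = converged)
}

# Stand-in fitters used purely for illustration
ok_fit  <- function() "fitted"
bad_fit <- function() {
  warning("The algorithm did not converge.")
  "fitted"
}

res_ok  <- fit_with_convergence_flag(ok_fit)
res_bad <- fit_with_convergence_flag(bad_fit)
```

A caller could then drop documents and refit only when the `converged` element is FALSE.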
Benjamin Lauderdale, Haiyan Wang, and Kenneth Benoit
Slapin, J. & Proksch, S.O. (2008). A Scaling Model for Estimating Time-Series Party Positions from Texts. doi:10.1111/j.1540-5907.2008.00338.x. American Journal of Political Science, 52(3), 705–722.
Lowe, W. & Benoit, K.R. (2013). Validating Estimates of Latent Traits from Textual Data Using Human Judgment as a Benchmark. doi:10.1093/pan/mpt002. Political Analysis, 21(3), 298–313.
predict.textmodel_wordfish()
(tmod1 <- textmodel_wordfish(quanteda::data_dfm_lbgexample, dir = c(1,5)))
summary(tmod1, n = 10)
coef(tmod1)
predict(tmod1)
predict(tmod1, se.fit = TRUE)
predict(tmod1, interval = "confidence")
## Not run:
library("quanteda")
dfmat <- dfm(tokens(data_corpus_irishbudget2010))
(tmod2 <- textmodel_wordfish(dfmat, dir = c(6,5)))
(tmod3 <- textmodel_wordfish(dfmat, dir = c(6,5),
dispersion = "quasipoisson", dispersion_floor = 0))
(tmod4 <- textmodel_wordfish(dfmat, dir = c(6,5),
dispersion = "quasipoisson", dispersion_floor = .5))
plot(tmod3$phi, tmod4$phi, xlab = "Min underdispersion = 0", ylab = "Min underdispersion = .5",
xlim = c(0, 1.0), ylim = c(0, 1.0))
plot(tmod3$phi, tmod4$phi, xlab = "Min underdispersion = 0", ylab = "Min underdispersion = .5",
xlim = c(0, 1.0), ylim = c(0, 1.0), type = "n")
underdispersedTerms <- sample(which(tmod3$phi < 1.0), 5)
which(featnames(dfmat) %in% names(topfeatures(dfmat, 20)))
text(tmod3$phi, tmod4$phi, tmod3$features,
cex = .8, xlim = c(0, 1.0), ylim = c(0, 1.0), col = "grey90")
text(tmod3$phi[underdispersedTerms], tmod4$phi[underdispersedTerms],
tmod3$features[underdispersedTerms],
cex = .8, xlim = c(0, 1.0), ylim = c(0, 1.0), col = "black")
if (requireNamespace("austin")) {
tmod5 <- austin::wordfish(quanteda::as.wfm(dfmat), dir = c(6, 5))
cor(tmod2$theta, tmod5$theta)
}
## End(Not run)