id_estimate  R Documentation 
idealstan model
This function will take a preprocessed idealdata vote/score dataframe and
run one of the available IRT/latent space ideal point models on the data using
Stan's MCMC engine.
id_estimate(
idealdata = NULL,
model_type = 2,
inflate_zero = FALSE,
vary_ideal_pts = "none",
keep_param = NULL,
grainsize = 1,
mpi_export = NULL,
use_subset = FALSE,
sample_it = FALSE,
subset_group = NULL,
subset_person = NULL,
sample_size = 20,
nchains = 4,
niters = 1000,
use_vb = FALSE,
ignore_db = NULL,
restrict_ind_high = NULL,
fix_high = 1,
fix_low = (-1),
restrict_ind_low = NULL,
fixtype = "prefix",
const_type = "persons",
id_refresh = 0,
prior_fit = NULL,
warmup = 1000,
ncores = 4,
use_groups = FALSE,
discrim_reg_sd = 2,
discrim_miss_sd = 2,
person_sd = 3,
time_fix_sd = 0.1,
time_var = 10,
ar1_up = 1,
ar1_down = 0,
boundary_prior = NULL,
time_center_cutoff = 50,
restrict_var = FALSE,
sample_stationary = FALSE,
ar_sd = 2,
diff_reg_sd = 1,
diff_miss_sd = 1,
restrict_sd_high = 0.01,
restrict_sd_low = 0.01,
tol_rel_obj = 0.001,
gp_sd_par = 0.025,
gp_num_diff = 3,
gp_m_sd_par = 0.3,
gp_min_length = 0,
cmdstan_path_user = NULL,
gpu = FALSE,
map_over_id = "persons",
save_files = NULL,
pos_discrim = FALSE,
het_var = TRUE,
...
)
idealdata 
An object produced by the 
model_type 
An integer reflecting the kind of model to be estimated. See below. 
inflate_zero 
If the outcome is distributed as Poisson (count/unbounded integer),
setting this to

vary_ideal_pts 
Default 
keep_param 
A list with logical values for different categories of parameters which
should/should not be kept following estimation. Can be any/all of 
grainsize 
The grainsize parameter for the 
mpi_export 
If 
use_subset 
Whether a subset of the legislators/persons should be used instead of the full response matrix 
sample_it 
Whether or not to use a random subsample of the response matrix. Useful for testing. 
subset_group 
If person/legislative data was included in the 
subset_person 
A list of character values of names of persons/legislators to use to subset if 
sample_size 
If 
nchains 
The number of chains to use in Stan's sampler. Minimum is one. See 
niters 
The number of iterations to run Stan's sampler. Shouldn't be set much lower than 500. See 
use_vb 
Whether or not to use Stan's variational Bayesian inference engine instead of full Bayesian inference. Pros: it's much faster.
Cons: it's not quite as accurate. See 
ignore_db 
If there are multiple time periods (particularly when there are
very many time periods), you can pass in a data frame
(or tibble) with one row per person per time period and an indicator column

restrict_ind_high 
If 
fix_high 
The value to which the high fixed ideal point/item should be fixed. Default is +1. 
fix_low 
The value to which the low fixed ideal point/item should be fixed. Default is -1. 
restrict_ind_low 
If 
fixtype 
Sets the particular kind of identification used on the model, could be either 'vb_full'
(identification provided exclusively by running a variational identification model with no prior info), or
'prefix' (two indices of ideal points or items to fix are provided to
options 
const_type 
Whether 
id_refresh 
The number of times to report iterations from the variational run used to identify models. Default is 0 (nothing output to console). 
prior_fit 
If a previous 
warmup 
The number of iterations to use to calibrate Stan's sampler on a given model. Shouldn't be less than 100.
See 
ncores 
The number of cores in your computer to use for parallel processing in the Stan engine.
See 
use_groups 
If 
discrim_reg_sd 
Set the prior standard deviation of the bimodal prior for the discrimination parameters for the non-inflated model. 
discrim_miss_sd 
Set the prior standard deviation of the bimodal prior for the discrimination parameters for the inflated model. 
person_sd 
Set the prior standard deviation for the legislators (persons) parameters 
time_fix_sd 
The variance of the over-time component of the first person/legislator is fixed to this value as a reference. Default is 0.1. 
boundary_prior 
If your time series has very low variance (change over time),
you may want to use this option to put a boundary-avoiding inverse gamma prior on
the time series variance parameters if your model has a lot of divergent transitions.
To do so, pass a list with an element called

time_center_cutoff 
The number of time points above which the model will employ a centered time series approach for AR(1) and random walk models. Below this number the model will employ a non-centered approach. The default of 50 time points is relatively arbitrary; higher values may be better if sampling quality is poor above the threshold. 
sample_stationary 
If 
ar_sd 
If an AR(1) model is used, this defines the prior scale of the Normal distribution. A lower number can help identify the model when there are few time points. 
diff_reg_sd 
Set the prior standard deviation for the bill (item) intercepts for the non-inflated model. 
diff_miss_sd 
Set the prior standard deviation for the bill (item) intercepts for the inflated model. 
restrict_sd_high 
Set the prior standard deviation for pinned parameters. This has a default of 0.01, but could be set lower if the data is really large. 
restrict_sd_low 
Set the prior standard deviation for pinned parameters. This has a default of 0.01, but could be set lower if the data is really large. 
tol_rel_obj 
If 
gp_sd_par 
The upper limit on allowed residual variation of the Gaussian process prior. Increasing the limit will permit the GP to more closely follow the time points, resulting in much sharper bends in the function and potentially oscillation. 
gp_num_diff 
The number of time points to use to calculate the lengthscale prior that determines the level of smoothness of the GP time process. Increasing this value will result in greater smoothness/autocorrelation over time by selecting a greater number of time points over which to calculate the lengthscale prior. 
gp_m_sd_par 
The upper limit of the marginal standard deviation of the GP time process. Decreasing this value will result in smoother fits. 
gp_min_length 
The minimum value of the GP lengthscale parameter. This is a hard
lower limit. Increasing this value will force a smoother GP fit. It should always be less than

cmdstan_path_user 
Default is NULL, and so will default to whatever is set in

gpu 
Whether a GPU is available to speed computation (primarily for GP time-varying models). 
map_over_id 
This parameter identifies which ID variable to use to construct the
shards for within-chain parallelization. It defaults to 
save_files 
The location to save CSV files with MCMC draws from 
pos_discrim 
Whether all discrimination parameters should be constrained to be
positive. If so, the model reduces to a traditional IRT model in which all items
positively predict ability. Setting this to 
het_var 
Whether to use a separate variance parameter for each item if using Normal or LogNormal distributions that have variance parameters. Defaults to TRUE and should be set to FALSE only if all items have a similar variance. 
... 
Additional parameters passed on to Stan's sampling engine. See 
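The time_center_cutoff argument above switches between centered and non-centered time series approaches. The sketch below is only a generic base R illustration of that distinction for a random walk, not the Stan code used by the package; the values of sigma, x0 and n_time are hypothetical. Both forms describe the same path and differ only in which quantities would be sampled.

```r
# Generic illustration (assumed values, not idealstan internals):
# a random-walk ideal point for one person over n_time periods.
set.seed(1)
n_time <- 5
sigma  <- 0.5             # hypothetical over-time standard deviation
x0     <- 0               # hypothetical starting ideal point
z      <- rnorm(n_time)   # standard-normal innovations

# Non-centered form: the innovations z are the sampled quantities and
# the path is built from them deterministically.
x_noncentered <- x0 + sigma * cumsum(z)

# Centered form: each value is located around the previous value,
# i.e. x_t ~ Normal(x_{t-1}, sigma). Reusing the same innovations
# shows that the two forms trace out the same path.
x_centered <- numeric(n_time)
prev <- x0
for (t in seq_len(n_time)) {
  x_centered[t] <- prev + sigma * z[t]
  prev <- x_centered[t]
}

same_path <- isTRUE(all.equal(x_centered, x_noncentered))
```

The two forms are mathematically equivalent, so the choice affects only how easily the sampler explores the posterior, which is why the cutoff is described as adjustable when sampling quality is poor.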
To run an IRT ideal point model, you must first preprocess your data using the id_make function. Be sure to specify the correct options for the
kind of model you are going to run: if you want to run an unbounded outcome (i.e. Poisson or continuous),
the data needs to be processed differently. Also, any hierarchical covariates at the person or item level
need to be specified in id_make. If they are specified in id_make, then all
subsequent models fit by this function will have these covariates.
Note that for static ideal point models, the covariates are only defined for those persons who are not being used as constraints.
As of this version of idealstan, the following model types are available. Simply pass
the number of the model in the list to the model_type option to fit the model.
1. IRT 2PL (binary response) ideal point model, no missing-data inflation
2. IRT 2PL (binary response) ideal point model with missing-data inflation
3. Ordinal IRT (rating scale) ideal point model, no missing-data inflation
4. Ordinal IRT (rating scale) ideal point model with missing-data inflation
5. Ordinal IRT (graded response) ideal point model, no missing-data inflation
6. Ordinal IRT (graded response) ideal point model with missing-data inflation
7. Poisson IRT (Wordfish) ideal point model, no missing-data inflation
8. Poisson IRT (Wordfish) ideal point model with missing-data inflation
9. Unbounded (Gaussian) IRT ideal point model with no missing data
10. Unbounded (Gaussian) IRT ideal point model with missing-data inflation
11. Positive-unbounded (Lognormal) IRT ideal point model with no missing data
12. Positive-unbounded (Lognormal) IRT ideal point model with missing-data inflation
13. Latent Space (binary response) ideal point model with no missing data
14. Latent Space (binary response) ideal point model with missing-data inflation
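The list above follows a regular pattern: each outcome family occupies two consecutive numbers, with the odd number being the version without missing-data inflation and the even number the inflated version. The helper below is a hypothetical convenience function written for this illustration (it is not part of idealstan) that turns a model_type number back into a description.

```r
# Hypothetical helper, not an idealstan function: describe a model_type number
# using the ordering of the list in this help page.
model_families <- c(
  "IRT 2PL (binary response)",
  "Ordinal IRT (rating scale)",
  "Ordinal IRT (graded response)",
  "Poisson IRT (Wordfish)",
  "Unbounded (Gaussian) IRT",
  "Positive-unbounded (Lognormal) IRT",
  "Latent Space (binary response)"
)

describe_model_type <- function(model_type) {
  stopifnot(length(model_type) == 1, model_type %in% 1:14)
  family   <- model_families[(model_type + 1) %/% 2]  # 1,2 -> 1; 3,4 -> 2; ...
  inflated <- model_type %% 2 == 0                    # even numbers are inflated
  paste0(family,
         if (inflated) ", with missing-data inflation"
         else ", no missing-data inflation")
}

describe_model_type(2)
describe_model_type(7)
```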
A fitted idealstan object that contains posterior samples of all parameters, obtained either via full Bayesian inference
or via a variational approximation if use_vb is set to TRUE. This object can then be passed to the plotting functions for further analysis.
Identifying IRT models is challenging, and ideal point models are more challenging still
because the discrimination parameters are not constrained.
As a result, more care must be taken to obtain estimates that are the same regardless of starting values.
The parameter fixtype enables you to change the type of identification used. The 'vb_full' option
does not require any further information from you in order for the model to be fit. In this version of identification,
an unidentified model is first run using variational Bayesian inference (see vb). The function then selects two
persons/legislators or items/bills that end up on either end of the ideal point spectrum
and pins their ideal points to those specific values.
To control whether persons/legislators or items/bills are constrained,
const_type can be set to either "persons" or "items" respectively.
In many situations, it is prudent to select those persons or items
ahead of time to pin to specific values. This allows the analyst to
be more specific about what type of latent dimension is to be
estimated. To do so, the fixtype option should be set to "prefix". The values of the persons/items to be pinned can be passed
as character values to restrict_ind_high and restrict_ind_low to pin the high/low ends of the latent
scale respectively. Note that these should be the actual data values
passed to the id_make function. If you don't pass any values,
you will see a prompt asking you to select certain values of persons/items.
The pinned values for persons/items are set by default to +1/-1, though
this can be changed using the fix_high and fix_low options. This pinned range is sufficient to identify
all of the models implemented in idealstan, though fiddling with some parameters may be
necessary in difficult cases. For time-series models, one of the
person ideal point over-time variances is also fixed to 0.1, a value that
can be changed using the option time_fix_sd.
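To see why pinning two persons or items matters, the base R sketch below uses a generic 2PL-style parameterization, Pr(y = 1) = logistic(discrimination * ideal point - difficulty). This form, the numeric values, and the pin() helper are assumptions made for illustration and are not taken from the package's Stan code. With unconstrained discrimination, flipping the sign of every ideal point and every discrimination parameter leaves all predicted probabilities unchanged, so the data alone cannot fix the direction of the scale.

```r
# Assumed 2PL-style form for illustration only.
p_yes <- function(ideal, discrim, diff) plogis(discrim * ideal - diff)

ideal   <- c(A = -1.2, B = 0.3, C = 1.5)  # hypothetical person ideal points
discrim <- c(1.0, -0.7)                   # hypothetical item discriminations
diff    <- c(0.2, -0.4)                   # hypothetical item difficulties

prob_matrix <- function(ideal, discrim, diff) {
  outer(ideal, seq_along(discrim),
        function(x, j) p_yes(x, discrim[j], diff[j]))
}

probs         <- prob_matrix(ideal, discrim, diff)
probs_flipped <- prob_matrix(-ideal, -discrim, diff)  # mirror-image solution
reflection_invariant <- isTRUE(all.equal(probs, probs_flipped))

# Hypothetical helper: rescale so that the chosen "high" person sits at
# fix_high and the chosen "low" person at fix_low.
pin <- function(x, high, low, fix_high = 1, fix_low = -1) {
  fix_low + (x - x[[low]]) * (fix_high - fix_low) / (x[[high]] - x[[low]])
}

# Both mirror images map to the same pinned scale.
pinned_a <- pin(ideal,  high = "C", low = "A")
pinned_b <- pin(-ideal, high = "C", low = "A")
same_after_pinning <- isTRUE(all.equal(pinned_a, pinned_b))
```

Note that the rescaling here is applied after the fact purely as an analogy; per the argument descriptions, the package applies the pins during estimation through priors whose tightness is set by restrict_sd_high and restrict_sd_low.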
Clinton, J., Jackman, S., & Rivers, D. (2004). The Statistical Analysis of Roll Call Data. The American Political Science Review, 98(2), 355-370. doi:10.1017/S0003055404001194
Bafumi, J., Gelman, A., Park, D., & Kaplan, N. (2005). Practical Issues in Implementing and Understanding Bayesian Ideal Point Estimation. Political Analysis, 13(2), 171-187. doi:10.1093/pan/mpi010
Kubinec, R. "Generalized Ideal Point Models for Time-Varying and Missing-Data Inference". Working Paper.
Betancourt, Michael. "Robust Gaussian Processes in Stan". (October 2017). Case Study.
id_make for preprocessing data,
id_plot_legis for plotting results,
summary for obtaining posterior quantiles,
posterior_predict for producing predictive replications.
# First we can simulate data for an IRT 2PL model that is inflated for missing data
library(ggplot2)
library(dplyr)
# This code will take at least a few minutes to run
## Not run:
bin_irt_2pl_abs_sim <- id_sim_gen(model_type='binary',inflate=TRUE)
# Now we can put that directly into the id_estimate function
# to get full Bayesian posterior estimates
# We will constrain discrimination parameters
# for identification purposes based on the true simulated values
bin_irt_2pl_abs_est <- id_estimate(bin_irt_2pl_abs_sim,
                       model_type=2,
                       restrict_ind_high =
                         sort(bin_irt_2pl_abs_sim@simul_data$true_person,
                              decreasing=TRUE,
                              index.return=TRUE)$ix[1],
                       restrict_ind_low =
                         sort(bin_irt_2pl_abs_sim@simul_data$true_person,
                              decreasing=FALSE,
                              index.return=TRUE)$ix[1],
                       fixtype='prefix',
                       ncores=2,
                       nchains=2)
# We can now see how well the model recovered the true parameters
id_sim_coverage(bin_irt_2pl_abs_est) %>%
bind_rows(.id='Parameter') %>%
ggplot(aes(y=avg,x=Parameter)) +
stat_summary(fun.args=list(mult=1.96)) +
theme_minimal()
## End(Not run)
# In most cases, we will use pre-existing data
# and we will need to use the id_make function first
# We will use the full roll-call voting data
# from the 114th Senate as a rollcall object
data('senate114')
# Running this model will take at least a few minutes, even with
# variational inference (use_vb=TRUE) turned on
## Not run:
to_idealstan <- id_make(score_data = senate114,
                        outcome = 'cast_code',
                        person_id = 'bioname',
                        item_id = 'rollnumber',
                        group_id = 'party_code',
                        time_id = 'date',
                        high_val = 'Yes',
                        low_val = 'No',
                        miss_val = 'Absent')
sen_est <- id_estimate(to_idealstan,
                       model_type = 2,
                       use_vb = TRUE,
                       fixtype = 'prefix',
                       restrict_ind_high = "BARRASSO, John A.",
                       restrict_ind_low = "WARREN, Elizabeth")
# After running the model, we can plot
# the results of the person/legislator ideal points
id_plot_legis(sen_est)
## End(Not run)