id_make: Create data to run IRT model

View source: R/Estimate.R

id_make R Documentation


Description

To run an IRT model using idealstan, you must first process your data with the id_make function.

Usage

id_make(
  score_data = NULL,
  outcome_disc = "outcome_disc",
  outcome_cont = "outcome_cont",
  person_id = "person_id",
  item_id = "item_id",
  time_id = "time_id",
  group_id = "group_id",
  model_id = "model_id",
  ordered_id = "ordered_id",
  ignore_id = "ignore_id",
  simul_data = NULL,
  person_cov = NULL,
  item_cov = NULL,
  item_cov_miss = NULL,
  remove_cov_int = FALSE,
  unbounded = FALSE,
  exclude_level = NA,
  simulation = FALSE
)


Arguments

score_data: A data frame in long form (i.e., one row for each measured score or vote) or a rollcall data object from package pscl.


person_id: Column name of the person/legislator ID index in score_data; default is 'person_id'. Should be integer, character or factor.


item_id: Column name of the item/bill ID index in score_data; default is 'item_id'. Should be integer, character or factor.


time_id: Optional column name of the time values in score_data; default is 'time_id'. Should be a date or date-time class, but can also be an integer (e.g., years as whole numbers).


group_id: Optional column name of the person/legislator group IDs (e.g., parties) in score_data; default is 'group_id'. Should be integer, character or factor.


model_id: Column name of the model/response types in the data; default is "model_id". Only necessary when fitting a model with multiple response types (e.g., binary and continuous outcomes). Must be an integer column whose values match the model types in id_estimate, indicating which outcome each row of the data belongs to.


simul_data: Optional data generated by the id_sim_gen function.


person_cov: A one-sided formula that specifies the covariates in score_data used to hierarchically model the person/legislator ideal points.


item_cov: A one-sided formula that specifies the covariates in score_data used to hierarchically model the item/bill discrimination parameters for the regular model.


item_cov_miss: A one-sided formula that specifies the covariates in score_data used to hierarchically model the item/bill discrimination parameters for the missing-data model.


remove_cov_int: Whether to remove the constituent terms of hierarchical covariates that interact covariates with IDs such as person_id or item_id. Set to TRUE if including these constituent terms would cause multicollinearity with other terms in the model (for example, a group-level model with a group-level interaction, or a person-level model with a person-level interaction).
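To make "constituent terms" concrete, the base-R sketch below expands a hypothetical one-sided formula that interacts a covariate (income, an assumed name) with person_id. It then picks out the main effects that also appear inside the interaction. The sketch only illustrates which terms are meant. It does not show how idealstan removes them internally.

```r
# Hypothetical covariate formula interacting 'income' with person_id
f <- ~ income * person_id

# R expands a*b into the main effects plus the interaction
labs <- attr(terms(f), "term.labels")
labs  # "income" "person_id" "income:person_id"

# Constituent terms: main effects that also appear inside an interaction
interactions <- labs[grepl(":", labs, fixed = TRUE)]
constituents <- unique(unlist(strsplit(interactions, ":", fixed = TRUE)))
constituents  # "income" "person_id"

# What would remain if the constituent terms were dropped
labs[!labs %in% constituents]  # "income:person_id"
```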


unbounded: Whether the outcome/response is unbounded (i.e., continuous or Poisson). If TRUE, miss_val is recoded as the maximum of the outcome + 1.
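The recoding rule above reduces to simple arithmetic. The base-R sketch below applies it to hypothetical count data, where -99 is an assumed miss_val code. It illustrates the rule as described and is not idealstan's internal code.

```r
# Hypothetical unbounded (count) outcome; -99 is an assumed miss_val code
outcome_cont <- c(0, 3, -99, 7)
miss_val <- -99

# Maximum of the observed (non-missing) outcome values
observed <- outcome_cont[outcome_cont != miss_val]

# Recode miss_val as max(outcome) + 1, here 7 + 1 = 8
outcome_cont[outcome_cont == miss_val] <- max(observed) + 1
outcome_cont  # 0 3 8 7
```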


exclude_level: A vector of any values that should be treated as NA in the response matrix. Unlike values coded as miss_val, these values are dropped from the data before estimation rather than modeled explicitly.


simulation: If TRUE, simulated values are saved in the idealdata object for later plotting with the id_plot_sims function.


outcome: Column name of the outcome in score_data; default is "outcome".

Details

This function accepts either a rollcall data object from package pscl or a long data frame in which each row is one item-person (bill-legislator) observation with its associated outcome. The long data frame is preferred because it permits a wide range of covariates in the model, such as person-varying and item-varying (bill-varying) covariates. If a rollcall object is passed, it is converted to a long data frame, and data from the matrix is used to determine the dates of bills. If you pass a long data frame, give id_make the names of the columns containing the person, item and group IDs, along with the name of the response/outcome column. Groups are IDs that may have multiple observations per ID, such as political parties or classes. The only required columns are the item/bill ID, the person/legislator ID and an outcome column.
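The long format differs from a vote matrix with persons in rows and items in columns. The base-R sketch below reshapes a small hypothetical matrix into one row per person-item observation. The legislator and bill names are invented. The columns use the default names from the usage signature (person_id, item_id, outcome_disc).

```r
# Hypothetical wide vote matrix: legislators in rows, bills in columns
votes <- matrix(c("Yes", "No", "Yes",
                  "No",  "No", "Yes"),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Leg_A", "Leg_B"),
                                c("Bill_1", "Bill_2", "Bill_3")))

# as.vector() reads a matrix column by column, so person IDs cycle
# fastest ('times') and item IDs repeat in blocks ('each')
long <- data.frame(
  person_id    = rep(rownames(votes), times = ncol(votes)),
  item_id      = rep(colnames(votes), each = nrow(votes)),
  outcome_disc = as.vector(votes),
  stringsAsFactors = FALSE
)

nrow(long)  # 6: one row per legislator-bill pair
```

Because the column names match the defaults, such a frame could be passed as score_data without renaming any ID arguments.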

The preferred format for the outcome column of discrete variables (binary or ordinal) is a factor whose levels are in the correct, ascending order. For example, with legislative data the levels of the factor should be c('No','Yes'). If another kind of variable is passed, such as a character or numeric variable, consider specifying low_val, high_val and middle_val to set the correct order of the discrete outcome. Specifying middle_val is only necessary when estimating an ordinal model.
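Level order matters because factor() sorts levels alphabetically by default, and that order is often not the substantive one. The base-R sketch below uses a hypothetical three-category ordinal vote and treats "Abstain" as the middle category, which is an assumption for illustration. It contrasts the default ordering with explicitly ascending levels.

```r
votes <- c("Yes", "Abstain", "No", "Yes")

# Default: alphabetical levels, which puts "Abstain" below "No"
bad <- factor(votes)
levels(bad)  # "Abstain" "No" "Yes"

# Explicit ascending order: No < Abstain < Yes (hypothetical middle category)
outcome_disc <- factor(votes, levels = c("No", "Abstain", "Yes"))
levels(outcome_disc)      # "No" "Abstain" "Yes"
as.integer(outcome_disc)  # 3 2 1 3
```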

If you do not specify a value for miss_val, any NA values are assumed to be missing. If you do specify miss_val and your data also contain NA (assuming miss_val is not NA), the function treats data coded as miss_val as missing data to be modeled. It treats NA values as ignorable missing data, which is removed (list-wise deletion) before a model is estimated.
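The base-R sketch below illustrates the two kinds of missingness on a hypothetical long data frame, where "Absent" is an assumed miss_val code. Rows with NA are dropped. Rows coded as miss_val stay in the data and are flagged as missing values to be modeled. The flag column modeled_missing is hypothetical, and the sketch mirrors the rule described above rather than idealstan's internal code.

```r
# Hypothetical long data frame; "Absent" stands in for miss_val
long <- data.frame(person_id    = c("A", "A", "B", "B"),
                   item_id      = c("1", "2", "1", "2"),
                   outcome_disc = c("Yes", "Absent", NA, "No"),
                   stringsAsFactors = FALSE)
miss_val <- "Absent"

# NA is ignorable missingness: list-wise deletion before estimation
kept <- long[!is.na(long$outcome_disc), ]

# miss_val rows are kept and flagged as missing data to be modeled
kept$modeled_missing <- kept$outcome_disc == miss_val

nrow(kept)                 # 3
sum(kept$modeled_missing)  # 1
```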

Value

An idealdata object that can then be used in the id_estimate function to fit a model.

Examples

# You can either use a pscl rollcall object or a vote/score matrix 
# where persons/legislators are in the rows
# and items/bills are in the columns


# First, using a rollcall object with the 114th Senate's rollcall votes:


library(idealstan)

to_idealstan <- id_make(score_data = senate114,
                        outcome_disc = 'cast_code',
                        person_id = 'bioname',
                        item_id = 'rollnumber',
                        group_id = 'party_code')

saudiwin/idealstan documentation built on Sept. 2, 2023, 1:29 a.m.