id_make | R Documentation |
To run an IRT model using idealstan, you must first process your data using the id_make function.
id_make(
score_data = NULL,
outcome_disc = "outcome_disc",
outcome_cont = "outcome_cont",
person_id = "person_id",
item_id = "item_id",
time_id = "time_id",
group_id = "group_id",
model_id = "model_id",
ordered_id = "ordered_id",
ignore_id = "ignore_id",
simul_data = NULL,
person_cov = NULL,
item_cov = NULL,
item_cov_miss = NULL,
remove_cov_int = FALSE,
unbounded = FALSE,
exclude_level = NA,
simulation = FALSE
)
score_data |
A data frame in long form, i.e., one row in the data for each
measured score or vote in the data or a |
outcome_disc |
Column name of the outcome with discrete values in |
outcome_cont |
Column name of the outcome with continuous values in |
person_id |
Column name of the person/legislator ID index in |
item_id |
Column name of the item/bill ID index in |
time_id |
Column name of the time values in |
group_id |
Optional column name of person/legislator group IDs (e.g., parties) in |
model_id |
Column name of the model/response types in the data.
Default is |
ordered_id |
Column name of the variable showing the count of categories for ordinal/categorical items (must be at least 3 categories) |
ignore_id |
Optional column for identifying observations that should not be modeled (i.e., not just treated as missing, but removed during estimation). Should be a binary vector (0 for remove and 1 for include). Useful for time-varying models where persons may not be present during particular periods and the missing data are ignorable. |
simul_data |
Optionally, data that has been generated by the |
person_cov |
A one-sided formula that specifies the covariates
in |
item_cov |
A one-sided formula that specifies the covariates
in |
item_cov_miss |
A one-sided formula that specifies the covariates in the dataset that will be used to hierarchically model the item/bill discrimination parameters for the missing data model. |
remove_cov_int |
Whether to remove constituent terms from hierarchical covariates that
interact covariates with IDs like |
unbounded |
Whether or not the outcome/response is unbounded (i.e., continuous or Poisson). If it is, missing values are recoded as the maximum of the outcome + 1. |
exclude_level |
A vector of any values that should be treated as |
simulation |
If |
This function accepts a long data frame where one row equals one item-person (bill-legislator) observation with associated continuous or discrete outcomes/responses. You either need to include columns with the specific names that the id_make function expects, such as person_id for person IDs and item_id for item IDs, or pass the names of the columns containing the IDs to the corresponding id_make arguments (see examples).
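As a minimal sketch of the long format described above, the following base R code reshapes a small wide vote matrix (persons in rows, items in columns) into one row per person-item pair. The matrix and its names are hypothetical; the resulting column names are chosen to match the id_make defaults.

```r
# Hypothetical vote matrix: persons in rows, items in columns (1 = yes, 0 = no).
votes <- matrix(c(1, 0, NA,
                  0, 1, 1),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Adams", "Baker"),
                                c("bill1", "bill2", "bill3")))

# as.table() followed by as.data.frame() yields one row per matrix cell,
# including cells that are NA.
long_votes <- as.data.frame(as.table(votes), stringsAsFactors = FALSE)

# Rename to the default column names listed in the id_make usage.
names(long_votes) <- c("person_id", "item_id", "outcome_disc")
```

With these names in place, the data frame could be passed as score_data without naming each ID column separately.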
The only required columns are the item/bill ID and the person/legislator ID along with an outcome column, outcome_disc for discrete variables and outcome_cont for continuous variables. If both columns are included, then any value can be placed in outcome_disc for rows that have a value in outcome_cont, and vice versa.
If items of multiple types are included, a column model_id must be included with the model type for the response distribution, such as 1 for binary non-inflated (see the id_estimate function documentation for the list of model IDs). If an ordinal outcome is included, an additional column ordered_id must be included that has the total count of categories for that ordinal variable (e.g., 3 for 3 categories).
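One way to build the ordered_id column is to count the categories observed for each item. The sketch below uses base R on hypothetical data and assumes every category of each item appears at least once; if a category can go unobserved, the count should be set by hand instead. The model_id values themselves are not derived here and should be taken from the id_estimate documentation.

```r
# Hypothetical long data with two ordinal items coded from 0.
scores <- data.frame(
  person_id    = rep(c("p1", "p2", "p3", "p4"), times = 2),
  item_id      = rep(c("q1", "q2"), each = 4),
  outcome_disc = c(0, 1, 2, 1,   # q1 uses 3 categories
                   0, 1, 2, 3)   # q2 uses 4 categories
)

# Count the distinct non-missing categories within each item and
# repeat that count on every row belonging to the item.
scores$ordered_id <- ave(
  scores$outcome_disc, scores$item_id,
  FUN = function(x) length(unique(x[!is.na(x)]))
)
```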
For discrete data, it is recommended to include a numeric variable that starts at 0, such as values of 0 and 1 for binary data and 0,1,2 for ordinal/categorical data. For continuous (unbounded) data, it is recommended to standardize the outcome to improve model convergence and fit.
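The recoding recommended above can be sketched in base R as follows. All input vectors are hypothetical: a binary outcome coded 1/2, an ordinal outcome stored as text labels, and a continuous outcome with one missing value.

```r
# Binary outcome coded 1/2: shift so the lowest category is 0.
binary_raw  <- c(1, 2, 2, 1)
binary_zero <- binary_raw - min(binary_raw, na.rm = TRUE)

# Ordinal outcome stored as labels: fix the category order explicitly,
# then convert to integers starting at 0.
ordinal_raw  <- c("low", "high", "mid", "low")
ordinal_zero <- as.integer(factor(ordinal_raw,
                                  levels = c("low", "mid", "high"))) - 1L

# Continuous outcome: centre and scale to unit standard deviation.
# scale() ignores NA when computing the mean and SD and keeps NA in place.
cont_raw <- c(10, 12, NA, 14)
cont_std <- as.numeric(scale(cont_raw))
```

Setting the factor levels explicitly avoids R's default alphabetical ordering, which would otherwise place "high" before "low".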
Missing data should be passed as NA values in either outcome_disc or outcome_cont and will be processed internally.
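If the raw data mark missingness with a sentinel code rather than NA, the code can be converted beforehand. A minimal base R sketch, assuming a hypothetical code of 9 for absences:

```r
# Hypothetical raw votes where 9 means the legislator was absent.
raw_votes <- c(1, 0, 9, 1, 9)

# Replace the sentinel with NA so missingness is explicit.
outcome_disc <- replace(raw_votes, raw_votes == 9, NA)
```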
An idealdata object that can then be used in the id_estimate() function to fit a model.
To run a time-varying model, you need to include the name of a column with dates (or integers) that is passed to the time_id option.
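A time column stored as text can be converted before it is named in time_id. The sketch below uses base R with hypothetical date strings and also derives an integer period index as an alternative; how the package treats each representation internally is not covered here.

```r
# Hypothetical vote dates stored as character strings.
date_strings <- c("2015-01-08", "2015-03-26", "2016-02-10")

# Convert to Date; an unparseable string would become NA.
vote_dates <- as.Date(date_strings, format = "%Y-%m-%d")
stopifnot(!anyNA(vote_dates))

# Alternative: an integer index with one period per calendar year.
period <- as.integer(format(vote_dates, "%Y")) - 2014L
```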
If the outcome is continuous, you need to pass a data frame with one column named "outcome_cont" or pass the name of the column with the continuous data to the outcome_cont argument.
Covariates can be fit on the person-level ideal point parameters as well as item discrimination parameters for either the inflated (missing) or non-inflated (observed) models. These covariates must be columns that were included with the data fed to the id_make() function. The covariate relationships are specified as one-sided formulas, i.e. ~cov1 + cov2 + cov1*cov2. To interact covariates with the person-level ideal points you can use ~cov1 + person_id + cov1*person_id and for group-level ideal points you can use ~cov1 + group_id + cov1*group_id, where group_id or person_id is the same name as the name of the column for these options that you passed to id_make (i.e., the names of the columns in the original data). If you are also going to model these intercepts, i.e. you are interacting the covariate with person_id and the model is estimating ideal points at the person level, then set remove_cov_int to TRUE to avoid multicollinearity with the ideal point intercepts.
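To see which terms a one-sided formula expands to, and to confirm that every variable it names is a column in the data, base R's terms() and all.vars() can be used before the formula is supplied as person_cov. The data frame and the covariate name cov1 below are hypothetical.

```r
# One-sided formula interacting a covariate with the person ID column.
person_formula <- ~ cov1 + person_id + cov1 * person_id

# terms() expands cov1 * person_id into its main effects plus the
# interaction and drops the duplicated main effects.
term_labels <- attr(terms(person_formula), "term.labels")
# term_labels is c("cov1", "person_id", "cov1:person_id")

# Hypothetical data: check that every variable in the formula is a column.
toy <- data.frame(person_id = c("a", "b"), cov1 = c(0.5, -0.2))
missing_cols <- setdiff(all.vars(person_formula), names(toy))
stopifnot(length(missing_cols) == 0)
```

The person_id main effect in the expansion is the constituent term that the remove_cov_int option is described as removing.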
# You can either use a pscl rollcall object or a vote/score matrix
# where persons/legislators are in the rows
# and items/bills are in the columns
library(dplyr)
library(idealstan)
# First, using a rollcall object with the 114th Senate's rollcall votes:
data('senate114')
to_idealstan <- id_make(score_data = senate114,
                        outcome_disc = 'cast_code',
                        person_id = 'bioname',
                        item_id = 'rollnumber',
                        group_id = 'party_code',
                        time_id = 'date')