est_bilog | R Documentation |
The function est_bilog facilitates item calibration through BILOG-MG.
It offers two modes of operation: executing BILOG-MG in batch mode or
processing pre-generated BILOG-MG output files. When using the former, ensure
BILOG-MG is installed in the directory specified by bilog_exe_folder.
In the latter case, if the necessary BILOG-MG files (e.g.,
"<analysis_name>.PAR", "<analysis_name>.PH1", etc.) exist and
overwrite = FALSE, there is no need for the BILOG-MG program itself. This
function is capable of parsing BILOG-MG output without it.
Both BILOG-MG 3.0 and BILOG-MG 4.0 are supported. Refer to the
bilog_exe_folder argument for guidance on selecting the desired version.
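As a rough illustration of the second mode, the sketch below checks whether output files for an analysis are already present before deciding whether BILOG-MG itself is needed. The helper bilog_output_exists() is hypothetical and not part of the package, and the set of extensions it checks (only the two named above) is an assumption; est_bilog may require additional files.

```r
# Hypothetical helper (not part of the package): report whether the
# BILOG-MG output files for an analysis already exist in a directory.
# The default extensions are only the two named in the description;
# the full set est_bilog needs is an assumption.
bilog_output_exists <- function(target_dir, analysis_name,
                                extensions = c("PAR", "PH1")) {
  files <- file.path(target_dir, paste0(analysis_name, ".", extensions))
  all(file.exists(files))
}

# Demonstration with a temporary directory and empty placeholder files.
demo_dir <- tempfile("bilog_demo_")
dir.create(demo_dir)
before <- bilog_output_exists(demo_dir, "my_analysis")
invisible(file.create(file.path(demo_dir,
                                c("my_analysis.PAR", "my_analysis.PH1"))))
after <- bilog_output_exists(demo_dir, "my_analysis")
```

If such a check returns TRUE, calling est_bilog with overwrite = FALSE and the same target_dir and analysis_name would be the natural next step.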
est_bilog(
  x = NULL,
  model = "3PL",
  target_dir = getwd(),
  analysis_name = "bilog_calibration",
  items = NULL,
  examinee_id_var = NULL,
  group_var = NULL,
  logistic = TRUE,
  num_of_alternatives = NULL,
  criterion = 0.01,
  num_of_quadrature = 81,
  max_em_cycles = 100,
  newton = 20,
  reference_group = NULL,
  fix = NULL,
  scoring_options = c("METHOD=1", "NOPRINT"),
  calib_options = c("NORMAL"),
  prior_ability = NULL,
  prior_ip = NULL,
  overwrite = FALSE,
  show_output_on_console = TRUE,
  bilog_exe_folder = file.path("C:/Program Files/BILOGMG")
)
x |
Either a |
model |
Specifies the item model. Options include:
The default is |
target_dir |
The directory where BILOG-MG analysis and data files will
be stored. The default is the current working directory (i.e.,
|
analysis_name |
A concise filename (without extension) used for the data files created for the analysis. |
items |
A vector of column names or numbers in |
examinee_id_var |
The column name or number containing individual
subject IDs. If not provided (i.e., |
group_var |
The column name or number containing group membership
information for multi-group calibration. Ideally, the grouping variable
should be represented by single-digit integers. If other data types are
provided, integer values will be automatically assigned to the variables.
The default is |
logistic |
A logical value indicating whether to use logistic calibration.
The default value is |
num_of_alternatives |
An integer specifying the maximum number of response alternatives in the raw data. This value is used as an automatic starting value for estimating pseudo-guessing parameters. The default value is |
criterion |
The convergence criterion for EM and Newton iterations. The default value is 0.01. |
num_of_quadrature |
The number of quadrature points used in MML
estimation. The default value is 81. This value will be represented in the
BILOG-MG control file as: |
max_em_cycles |
An integer (0, 1, ...) representing the maximum number
of EM cycles. This value will be represented in the BILOG-MG control file
as: |
newton |
An integer (0, 1, ...) representing the number of Gauss-Newton
iterations following EM cycles. This value will be represented in the
BILOG-MG control file as: |
reference_group |
A value indicating which group's ability distribution
will be set to mean = 0 and standard deviation = 1. For example, if the
When groups are assumed to come from a single population, set this value to 0. The default value is 'NULL'. This value will be represented in the BILOG-MG control file as: 'REFERENCE = reference_group'. |
fix |
Specifies whether the parameters of specific items are free to be
estimated or should be held fixed at their starting values. This argument
accepts a |
scoring_options |
A string vector of keywords/options to be included in
the The default value is The primary option to add to this vector is
Additionally, you can include the following keywords:
Refer to the BILOG-MG manual for detailed explanations of these keywords/options. |
calib_options |
A string vector of additional keywords/options for the
The default value is Including Including If you're calibrating items using the Additional keywords/options that can be added to - Refer to the BILOG-MG manual for detailed explanations of these keywords/options. NOTE: Do not add the following keywords to |
prior_ability |
Prior ability refers to the quadrature points and weights representing the discrete finite distribution of ability for the groups. It should be structured as a list in the following format:
Here, <GROUP-NAME-1> refers to the name of the first group, <GROUP-NAME-2> refers to the name of the second group, and so on. Please refer to the examples section for a practical implementation. |
prior_ip |
Specify prior distributions for item parameters. The default
value is
Quoted descriptions were taken from the BILOG-MG manual. Examples:
In general, one can adjust the alpha and beta parameters to achieve a desired outcome, considering that the mode of the beta distribution is calculated as:
Additionally, setting Note: A non-null |
overwrite |
If set to |
show_output_on_console |
A logical value indicating whether to capture
and display the output of the command on the R console. The default is
|
bilog_exe_folder |
The directory containing the BILOG-MG executable
files. This function supports two versions: BILOG-MG 3 and BILOG-MG 4. For
BILOG-MG version 3, the directory should include the files
|
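The prior_ip description above refers to the mode of the beta distribution when choosing alpha and beta. For a standard beta distribution with alpha > 1 and beta > 1, the mode is (alpha - 1) / (alpha + beta - 2). The sketch below uses a hypothetical helper, beta_mode(), which is not part of the package, to evaluate this for the ALPHA and BETA values used in Example 10. It assumes that BILOG-MG's ALPHA and BETA correspond to the usual beta shape parameters.

```r
# Hypothetical helper (not part of the package): mode of a beta
# distribution, valid when both shape parameters exceed 1.
beta_mode <- function(alpha, beta) {
  stopifnot(alpha > 1, beta > 1)
  (alpha - 1) / (alpha + beta - 2)
}

# Values taken from Example 10, where the aim is to hold the guessing
# parameters near 0.25 with a very strong prior.
strong_prior_mode <- beta_mode(alpha = 10000000, beta = 30000000)

# A much weaker prior with the same ratio has a mode further from 0.25.
weak_prior_mode <- beta_mode(alpha = 5, beta = 15)
```

With very large alpha and beta the mode approaches alpha / (alpha + beta), which is why the 1:3 ratio in Example 10 targets a guessing value close to 0.25.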
A list
with the following elements is returned:
An Itempool-class object holding the item parameters. Check ...$converged
to ensure the model has converged before using ip. This element is not
created when model = "CTT".
A data frame object containing information on examinee scores such as items
attempted (tried), items answered correctly (right), estimated examinee
scores (ability), standard errors of ability estimates (se), and response
string probabilities (prob). This element is not created when
model = "CTT".
Classical Test Theory (CTT) statistics, including p-values,
biserial, and point-biserial estimates calculated by BILOG-MG. If there
are groups, group-specific CTT statistics can be found in
ctt$group$GROUP-NAME. Overall statistics for the entire group
are located at ctt$overall.
A data frame containing items that could not be estimated.
The syntax file.
E-M Cycles of the calibration.
Newton Cycles of the calibration.
The number of cycles run before calibration converges or fails to converge.
The largest change observed between the last two cycles.
-2 Log Likelihood value of the last step of the E-M cycles. See also
$em_cycles. This value is NULL when the model does not converge. This
element is not created when model = "CTT".
Posterior quadrature points and weights.
A list object that stores the arguments passed to the function.
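Since the returned item parameters should only be used after checking convergence, a small guard can make that check explicit. The sketch below is illustrative only: get_ip_if_converged() is a hypothetical helper, and the mock lists stand in for real est_bilog output (a plain data frame is used in place of an Itempool object so the code runs without BILOG-MG).

```r
# Hypothetical helper (not part of the package): return the item pool
# only when the calibration reports convergence.
get_ip_if_converged <- function(calib) {
  if (!isTRUE(calib$converged)) {
    stop("Calibration did not converge; item parameters should not be used.")
  }
  calib$ip
}

# Mock results mimicking the 'converged' and 'ip' elements described above.
# A data frame is an assumed stand-in for the Itempool object.
mock_ok  <- list(converged = TRUE, ip = data.frame(a = 1.2, b = -0.3))
mock_bad <- list(converged = FALSE, ip = NULL)

ip_ok <- get_ip_if_converged(mock_ok)
bad_result <- tryCatch(get_ip_if_converged(mock_bad),
                       error = function(e) "error")
```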
Emre Gonulates
## Not run:
#############################################
############## Example 1 - 2PL ##############
#############################################
# IRT Two-parameter Logistic Model Calibration
# Create responses to be used in BILOG-MG estimation
true_theta <- rnorm(4000)
true_ip <- generate_ip(n = 30, model = "2PL")
resp <- sim_resp(true_ip, true_theta)
# The following line will run BILOG-MG, estimate 2PL model and put the
# analysis results under the target directory:
bilog_calib <- est_bilog(x = resp, model = "2PL",
target_dir = "C:/Temp/Analysis",
overwrite = TRUE)
# Check whether the calibration converged
bilog_calib$converged
# Get the estimated item pool
bilog_calib$ip
# See the BILOG-MG syntax
cat(bilog_calib$syntax)
# See the classical test theory statistics estimated by BILOG-MG:
bilog_calib$ctt
# Get -2LogLikelihood for the model (mainly for model comparison purposes):
bilog_calib$neg_2_log_likelihood
# Get estimated scores
head(bilog_calib$score)
# Compare true and estimated abilities
plot(true_theta, bilog_calib$score$ability, xlab = "True Theta",
ylab = "Estimated theta")
abline(a = 0, b = 1, col = "red", lty = 2)
# Compare true item parameters
plot(true_ip$a, bilog_calib$ip$a, xlab = "True 'a'", ylab = "Estimated 'a'")
abline(a = 0, b = 1, col = "red", lty = 2)
plot(true_ip$b, bilog_calib$ip$b, xlab = "True 'b'", ylab = "Estimated 'b'")
abline(a = 0, b = 1, col = "red", lty = 2)
# Note that BILOG-MG centers the ability at mean 0.
mean(bilog_calib$score$ability)
# Quadrature points and posterior weights:
head(bilog_calib$posterior_dist)
#############################################
############## Example 2 - EAP ##############
#############################################
# Getting Expected-a-posteriori theta scores
result <- est_bilog(x = resp, model = "2PL",
scoring_options = c("METHOD=2", "NOPRINT"),
target_dir = "C:/Temp/Analysis",
overwrite = TRUE)
head(result$score)
###############################################
############## Example 3 - Rasch ##############
###############################################
# Rasch Model Calibration
true_theta <- rnorm(400)
true_ip <- generate_ip(n = 30, model = "Rasch")
resp <- sim_resp(true_ip, true_theta)
# Run calibration
bilog_calib <- est_bilog(x = resp, model = "Rasch",
target_dir = "C:/Temp/Analysis",
overwrite = TRUE)
bilog_calib$ip
plot(true_ip$b, bilog_calib$ip$b, xlab = "True 'b'", ylab = "Estimated 'b'")
abline(a = 0, b = 1, col = "red", lty = 2)
# Note that the 'b' parameters are rescaled so that their arithmetic mean
# equals 0.0.
mean(bilog_calib$ip$b)
#############################################
############## Example 4 - 3PL ##############
#############################################
# IRT Three-parameter Logistic Model Calibration
# Create responses to be used in BILOG-MG estimation
true_theta <- rnorm(4000)
true_ip <- generate_ip(n = 30, model = "3PL")
resp <- sim_resp(true_ip, true_theta)
# The following line will run BILOG-MG, estimate 3PL model and put the
# analysis results under the target directory:
bilog_calib <- est_bilog(x = resp, model = "3PL",
target_dir = "C:/Temp/Analysis",
overwrite = TRUE)
# Estimated item pool:
bilog_calib$ip
# Convergence status:
bilog_calib$converged
# Number of EM cycles:
bilog_calib$cycle
# Note that the maximum number of EM cycles was set at:
bilog_calib$input$max_em_cycles
# Largest change at the last cycle (note that convergence criterion is 0.01)
bilog_calib$largest_change
# Estimated Scores:
bilog_calib$score
# CTT stats calculated by BILOG-MG:
bilog_calib$ctt
#############################################
############## Example 5 - 1PL ##############
#############################################
# One-Parameter Logistic Model Calibration
true_theta <- rnorm(800)
true_ip <- generate_ip(n = 30, model = "2PL")
# Set 'a' parameters to a fixed number
true_ip$a <- 1.5
resp <- sim_resp(true_ip, true_theta)
# Run calibration
bilog_calib <- est_bilog(x = resp, model = "1PL",
target_dir = "C:/Temp/Analysis",
overwrite = TRUE)
# Note that all 'a' parameter values and all 'se_a' values are the same:
bilog_calib$ip
plot(true_ip$b, bilog_calib$ip$b, xlab = "True 'b'", ylab = "Estimated 'b'")
abline(a = 0, b = 1, col = "red", lty = 2)
#############################################################
############## Example 6.1 - Multi-group - 3PL ##############
#############################################################
# Multi-group IRT calibration - 3PL
## Generate Data ##
ip <- generate_ip(n = 35, model = "3PL", D = 1.7)
n_upper <- sample(1200:3000, 1)
n_lower <- sample(1900:2800, 1)
theta_upper <- rnorm(n_upper, 1.5, .25)
theta_lower <- rnorm(n_lower)
resp <- sim_resp(ip = ip, theta = c(theta_lower, theta_upper))
# Create response data where the first column holds group information
dt <- data.frame(level = c(rep("Lower", n_lower), rep("Upper", n_upper)),
resp)
## Run Calibration ##
mg_calib <- est_bilog(x = dt, model = "3PL",
group_var = "level",
reference_group = "Lower",
items = 2:ncol(dt), # Exclude the 'group' column
num_of_alternatives = 5,
# Use MAP ability estimation.
# "FIT": calculate GOF for response patterns
scoring_options = c("METHOD=3", "NOPRINT", "FIT"),
target_dir = "C:/Temp/Analysis", overwrite = TRUE,
show_output_on_console = FALSE)
# Estimated item pool
mg_calib$ip
# Print group means
mg_calib$group_info
# Check Convergence
mg_calib$converged
# Print estimated scores of the first few examinees
head(mg_calib$score)
# Posterior distributions of 'Lower' (in red) and 'Upper' group
plot(mg_calib$posterior_dist$Upper$point,
mg_calib$posterior_dist$Upper$weight)
points(mg_calib$posterior_dist$Lower$point,
mg_calib$posterior_dist$Lower$weight, col = "red")
#############################################################
############## Example 6.2 - Multi-group - Response_set #####
#############################################################
# Multi-group IRT calibration - Response_set 2PL
## Generate Data ##
ip <- generate_ip(n = 35, model = "2PL", D = 1.7)
n_upper <- sample(1000:2000, 1)
n_lower <- sample(1000:2000, 1)
resp_set <- generate_resp_set(
ip = ip, theta = c(rnorm(n_lower), rnorm(n_upper, 1.5, .25)))
# Attach the group information
resp_set$mygroup <- c(rep("Lower", n_lower), rep("Upper", n_upper))
## Run Calibration ##
mg_calib <- est_bilog(x = resp_set,
model = "2PL",
group_var = "mygroup",
reference_group = "Lower",
target_dir = "C:/Temp/Analysis",
overwrite = TRUE,
show_output_on_console = FALSE)
# Estimated item pool
mg_calib$ip
# Print group means
mg_calib$group_info
###############################################################
############## Example 6.3 - Multi-group - 1PL ################
###############################################################
# Multi-group IRT calibration - 1PL
## Generate Data ##
n_item <- sample(30:40, 1)
ip <- generate_ip(n = n_item, model = "2PL", D = 1.7)
ip$a <- 1.25
n_upper <- sample(700:1000, 1)
n_lower <- sample(1200:1800, 1)
theta_upper <- rnorm(n_upper, 1.5, .25)
theta_lower <- rnorm(n_lower)
resp <- sim_resp(ip = ip, theta = c(theta_lower, theta_upper))
# Create response data where the first column holds group information
dt <- data.frame(level = c(rep("Lower", n_lower), rep("Upper", n_upper)),
resp)
## Run Calibration ##
mg_calib <- est_bilog(x = dt,
model = "1PL",
group_var = "level",
reference_group = "Lower",
items = 2:ncol(dt), # Exclude the 'group' column
target_dir = "C:/Temp/Analysis",
overwrite = TRUE,
show_output_on_console = FALSE)
# Estimated item pool
mg_calib$ip
# Print group means
mg_calib$group_info
# Check Convergence
mg_calib$converged
# Print estimated scores of the first few examinees
head(mg_calib$score)
###############################################################
############## Example 6.4 - Multi-group - Prior Ability ######
###############################################################
# Multi-group IRT calibration - 3PL with user supplied prior ability
# parameters
n_item <- sample(40:70, 1)
ip <- generate_ip(n = n_item, model = "3PL", D = 1.7)
n_upper <- sample(2000:4000, 1)
n_lower <- sample(3000:5000, 1)
theta_upper <- rgamma(n_upper, shape = 2, rate = 2)
# hist(theta_upper)
theta_lower <- rnorm(n_lower)
true_theta <- c(theta_lower, theta_upper)
resp <- sim_resp(ip = ip, theta = true_theta, prop_missing = .2)
# Create response data where the first column holds group information
dt <- data.frame(level = c(rep("Lower", n_lower), rep("Upper", n_upper)),
resp)
# Set prior ability parameters
points <- seq(-4, 4, .1)
prior_ability <- list(
Lower = list(points = points, weights = dnorm(points)),
# Also try misspecified prior:
# Upper = list(points = points, weights = dnorm(points, 1, .25))
Upper = list(points = points, weights = dgamma(points, 2, 2))
)
mg_calib <- est_bilog(x = dt,
model = "3PL",
group_var = "level",
reference_group = "Lower",
items = 2:ncol(dt), # Exclude the 'group' column
calib_options = c("IDIST = 2"),
prior_ability = prior_ability,
# Use MAP ability estimation.
scoring_options = c("METHOD=3"),
target_dir = "C:/Temp/Analysis",
overwrite = TRUE,
show_output_on_console = FALSE)
# Check whether the model has converged
mg_calib$converged
# Group information
mg_calib$group_info
# Quadrature points and posterior weights:
head(mg_calib$posterior_dist$Lower)
plot(mg_calib$posterior_dist$Lower$point,
mg_calib$posterior_dist$Lower$weight,
xlab = "Quadrature Points",
ylab = "Weights",
xlim = c(min(c(mg_calib$posterior_dist$Lower$point,
mg_calib$posterior_dist$Upper$point)),
max(c(mg_calib$posterior_dist$Lower$point,
mg_calib$posterior_dist$Upper$point))),
ylim = c(min(c(mg_calib$posterior_dist$Lower$weight,
mg_calib$posterior_dist$Upper$weight)),
max(c(mg_calib$posterior_dist$Lower$weight,
mg_calib$posterior_dist$Upper$weight))))
points(mg_calib$posterior_dist$Upper$point,
mg_calib$posterior_dist$Upper$weight, col = "red")
# Comparison of true and estimated item parameters
plot(ip$a, mg_calib$ip$a, xlab = "True 'a'", ylab = "Estimated 'a'")
plot(ip$b, mg_calib$ip$b, xlab = "True 'b'", ylab = "Estimated 'b'")
plot(ip$c, mg_calib$ip$c, xlab = "True 'c'", ylab = "Estimated 'c'")
# Ability parameters
plot(true_theta, mg_calib$score$ability,
xlab = "True Theta",
ylab = "Estimated Theta")
abline(a = 0, b = 1, col = "red")
####################################################################
############## Example 7 - Read BILOG-MG Output without BILOG-MG ###
####################################################################
# To read BILOG-MG output files saved in the "Analysis/" directory with file
# names like "my_analysis.PH1", "my_analysis.PH2", etc., and without
# performing the calibration (no need for an installed BILOG-MG program on
# your computer), use the following syntax:
result <- est_bilog(target_dir = file.path("Analysis/"), model = "3PL",
analysis_name = "my_analysis", overwrite = FALSE)
####################################################################
############## Example 8 - Fixed Item Parameters ###################
####################################################################
# Fixed item calibration involves setting specific item parameters to
# predefined values while allowing other items' parameters to be freely
# estimated.
# If you want to fix all values of a particular item parameter(s), you can
# use strong priors. Refer to the documentation for the "prior_ip" argument
# for more details.
# Create responses to be used in BILOG-MG estimation
true_theta <- rnorm(3000)
true_ip <- generate_ip(n = 30, model = "3PL")
resp <- sim_resp(true_ip, true_theta)
# Setup the data frame that will hold 'item_id's to be fixed, and the
# item parameters to be fixed.
fix_pars <- data.frame(item_id = c("Item_5", "Item_4", "Item_10"),
a = c(1, 1.5, 1.75),
b = c(-1, 0.25, 0.75),
c = c(.15, .25, .35))
fixed_calib <- est_bilog(x = resp, fix = fix_pars,
target_dir = "C:/Temp/Analysis", overwrite = TRUE)
# Check item parameters for Item_4, Item_5, Item_10:
fixed_calib$ip
######### #########
# If only some of the parameters are supplied, the defaults will be used
# for the missing parameters. For example, for the example below, the
# default 'a' parameter value is 1, and the default 'c' parameter value is
# (1/num_of_alternatives) = (1/5) = 0.2.
fix_pars2 <- data.frame(item_id = c("Item_1", "Item_2", "Item_3"),
b = c(-1, 0.25, 0.75))
fixed_calib2 <- est_bilog(x = resp, fix = fix_pars2,
target_dir = "C:/Temp/Analysis", overwrite = TRUE)
# Check item parameters for Item_1, Item_2, Item_3:
fixed_calib2$ip
##################################################################
############## Example 9 - 3PL with Common Guessing ##############
##################################################################
# IRT Three-parameter Logistic Model Calibration with Common Guessing
# Create responses to be used in BILOG-MG estimation
true_theta <- rnorm(4000)
true_ip <- generate_ip(n = 30, model = "3PL")
resp <- sim_resp(true_ip, true_theta)
# Run calibration:
bilog_calib <- est_bilog(x = resp, model = "3PL",
target_dir = "C:/Temp/Analysis",
calib_options = c("NORMAL", "COMMON"),
overwrite = TRUE)
# Note the 'c' parameters
bilog_calib$ip
##################################################################
############## Example 10 - 3PL with Fixed Guessing ##############
##################################################################
# IRT Three-parameter Logistic Model Calibration with Fixed Guessing
# The aim is to fix guessing parameters of all items to a fixed
# number like 0.25
true_theta <- rnorm(3000)
true_ip <- generate_ip(n = 30, model = "3PL")
true_ip$c <- 0.25
resp <- sim_resp(true_ip, true_theta)
prc1 <- est_bilog(x = resp, model = "3PL", target_dir = "C:/Temp/Analysis",
prior_ip = list(ALPHA = 10000000, BETA = 30000000),
overwrite = TRUE)
## End(Not run) # end dontrun
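The prior_ability list used in Example 6.4 can be assembled and inspected without running BILOG-MG. The sketch below builds the same two-group structure (one list per group, each with points and weights) using a hypothetical helper, make_prior(), which is not part of the package. Rescaling the weights to sum to 1 is an assumption made here for illustration; Example 6.4 passes unnormalized densities, and whether est_bilog rescales them internally is not stated.

```r
# Hypothetical helper (not part of the package): evaluate a density on a
# grid of quadrature points and rescale the weights to sum to 1.
# The rescaling step is an assumption, not a documented requirement.
make_prior <- function(points, density_fn) {
  weights <- density_fn(points)
  list(points = points, weights = weights / sum(weights))
}

quad_points <- seq(-4, 4, by = 0.1)

# Group names must match the values of the grouping variable
# ("Lower" and "Upper" in Example 6.4).
prior_ability <- list(
  Lower = make_prior(quad_points, function(x) dnorm(x)),
  Upper = make_prior(quad_points,
                     function(x) dgamma(x, shape = 2, rate = 2))
)

# Quick structural check before passing the list to est_bilog.
str(prior_ability, max.level = 2)
```

Note that the gamma density is zero for negative points, so the "Upper" prior places no weight below 0 on this grid.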