knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE, eval=TRUE, cache=TRUE) library(Rmst)
Rmst allows uers to use the bottom-up or the top-down approach to assemble computerized adaptive multistage tests (MSTs). See more details regarding automated test assembly (ATA) and mixed integer programming (MIP) in Rata. After assembling MST panels, users can simulate the administration of those MST panels, which is a useful means of evaluating the psychometric and content characteristics of those panels.
Install the stable version from CRAN:
install.packages("Rmst")
Install the most recent version from github:
devtools::install_github("xluo11/Rmst")
Users would use the following functions to build an ATA model for assembling MST panels:
mst
: initiate an ATA model for assembling MST panels. mst_route
: add or remove routes from the MST panel designmst_objective
: add an absolute or relative objective to the modelmst_constraint
: add categorical and continuous constraints to the modelmst_stage_length
: set the limits on the module length in a stagemst_rdp
: anchor the TIF intersections between two adjacent modulesmst_module_info
: set the minimum and/or maximum values of the module information functionmst_assemble
: assemble MST panels using MIPmst_get_items
: retrieve items from the solved ATA modelplot
: plot the TIF of assembled modules or routesUse mst(..., method=)
to choose between the top-down and the bottomp-up approach. In the top-down appraoch, objectives and constraints are added directly on routes, and thus the indices
refers to the route index in the top-down approach and the module index in the bottom-up approach. If needed, use mst_objective(..., method=)
and mst_constraint(..., method=)
to override the default method in order to implement a hybrid assembly approach.
First, let's load some packages and write a helper function for generating item pools. By default, the pool includes 400 3PL items, 20 GPCM items, and 20 GRM items. Each item has a categorical item attribute (content area) and a continuous attribute (response time).
library(Rirt) library(dplyr, warn.conflicts=FALSE) library(ggplot2, warn.conflicts=FALSE) item_pool <- function(types=c('3pl', 'gpcm', 'grm'), n_3pl=400, n_gpcm=20, n_grm=20, seed=987653) { set.seed(seed) items <- model_mixed_gendata(1, n_3pl, n_gpcm, n_grm, n_c=3)$items if(n_3pl > 0) items$'3pl' <- cbind(items$'3pl', id=1:n_3pl, type='3PL', content=sample(4, n_3pl, replace=TRUE), time=round(rlnorm(n_3pl, 4.0, .28)), group=sort(sample(n_3pl/2, n_3pl, replace=TRUE))) if(n_gpcm > 0) items$'gpcm' <- cbind(items$'gpcm', id=1:n_gpcm, type='GPCM', content=sample(4, n_gpcm, replace=TRUE), time=round(rlnorm(n_gpcm, 4.0, .28)), group=sort(sample(n_gpcm/2, n_gpcm, replace=TRUE))) if(n_grm > 0) items$'grm' <- cbind(items$'grm', id=1:n_grm, type='GRM', content=sample(4, n_grm, replace=TRUE), time=round(rlnorm(n_grm, 4.0, .28)), group=sort(sample(n_grm/2, n_grm, replace=TRUE))) items[names(items) %in% types] }
Assemble 2 panels of 1-2 MST using the top-down design approach. Each route includes 40 items, and no item reuse is allowed. Maximize TIF over the [-1.64, 0] for the route 1M-2E and over [0, 1.64] for the route 1M-2H in hopes of covering the regioin [-1.64, 1.64] (90% of the population) jointly with adequate test information.
As for non-statistical constraints, each route is required to have 36 3PL items, 2 GPCM items, and 2 GRM items. Also, each route includes 9 to 11 items in each of the four content domains and has an average response time of 60 +/- 4 seconds per item.
Solve the model using lp_solve under a time limit of 5 minutes.
x <- mst(item_pool(), design='1-2', n_panels=2, method='topdown', test_len=40, max_use=1) x <- mst_objective(x, seq(-1.64, 0, length.out=3), 'max', indices=1) x <- mst_objective(x, seq(0, 1.64, length.out=3), 'max', indices=2) x <- mst_constraint(x, 'type', min=36, max=36, level='3PL') x <- mst_constraint(x, 'type', min=2, max=2, level='GPCM') x <- mst_constraint(x, 'type', min=2, max=2, level='GRM') for(i in 1:4) x <- mst_constraint(x, 'content', min=9, max=11, level=i) x <- mst_constraint(x, 'time', min=56*40, max=64*40) x <- mst_assemble(x, 'lpsolve', time_limit=60*5) # Plot the route information functions plot(x, byroute=TRUE, label=TRUE) + geom_vline(xintercept=seq(-1.64, 1.64, length.out=3), linetype=2, color='gray60')
# Plot the module information functions plot(x, byroute=FALSE, label=TRUE) + geom_vline(xintercept=seq(-1.64, 1.64, length.out=3), linetype=2, color='gray60')
To achieve a more balanced routing distribution, anchor the TIF intersection of Module 2E and 2M at 0 and solve the model again.
x <- mst_rdp(x, 0, 2:3, tol=.1) x <- mst_assemble(x, 'lpsolve', time_limit=60*5) # Plot the route information functions plot(x, byroute=TRUE, label=TRUE) + geom_vline(xintercept=seq(-1.64, 1.64, length.out=3), linetype=2, color='gray60')
# Plot the module information functions plot(x, byroute=FALSE, label=TRUE) + geom_vline(xintercept=seq(-1.64, 1.64, length.out=3), linetype=2, color='gray60')
Alternatively, the model can be solved by using GLPK.
x_glpk <- mst_assemble(x, 'glpk', time_limit=60*5) # Plot the route information functions plot(x_glpk, byroute=TRUE, label=TRUE) + geom_vline(xintercept=seq(-1.64, 1.64, length.out=3), linetype=2, color='gray60')
# Plot the module information functions plot(x_glpk, byroute=FALSE, label=TRUE) + geom_vline(xintercept=seq(-1.64, 1.64, length.out=3), linetype=2, color='gray60')
Examine the distribution of item type, content domain and average response time in each panel and route in the lp_solve solution.
rs <- NULL for(p in 1:x$n_panels) for(i in 1:x$n_routes) { items <- mst_get_items(x, panel_ix=p, route_ix=i) item_content <- rowSums(sapply(items, function(x) freq(x$content, 1:4)$freq)) names(item_content) <- paste('content', 1:4, sep='') item_type <- unlist(Map(nrow, items)) item_time <- mean(unlist(sapply(items, function(x) x$time))) rs <- rbind(rs, c(panel=p, route=i, item_content, item_type, time=item_time)) } rs
Examine the number of items in each module.
Map(function(x) mutate(x, module=paste(stage, label, sep='')) %>% select(id, panel, module, type), x$items) %>% Reduce(rbind, .) %>% group_by(panel, module) %>% summarise(n=n(), n_3pl=sum(type=='3PL'), n_gpcm=sum(type=='GPCM'), n_grm=sum(type=='GRM')) %>% as.data.frame()
Let's simulate the administration of these assembled two MST panels to 3,000 students drawn from the standard normal distribution. After finishing Stage 1, students are routed to Stage 2 using a fixed point theta=0.
true_t <- rnorm(3000, 0, 1) sims <- Map(function(t) mst_sim(x, t, rdp=list('stage2'=0)), true_t) rs <- Map(function(xx) { cbind(true=xx$true, est=xx$theta, panel=xx$admin$'3pl'$panel[1], se=xx$stats$se[x$n_stages], info=xx$stats$info[x$n_stages], route=x$module[xx$stats$route, c('stage', 'label')] %>% apply(., 1, paste, collapse='') %>% paste(., collapse='-'), n_items=sum(xx$stats$n_items)) }, sims) %>% Reduce(rbind, .) %>% data.frame(., stringsAsFactors=FALSE) rs$true <- as.numeric(rs$true) %>% round(., 4) rs$est <- as.numeric(rs$est) %>% round(., 4) rs$info <- as.numeric(rs$info) %>% round(., 4) rs$se <- as.numeric(rs$se) %>% round(., 4) rs$n_items <- as.integer(rs$n_items) rs$panel <- paste('Panel', rs$panel) # Panel usage freq(rs$panel) %>% mutate(perc=round(perc, 2), cum_perc=round(cum_perc, 2)) # Route usage freq(rs$route) %>% mutate(perc=round(perc, 2), cum_perc=round(cum_perc, 2)) # Compare true and estimated thetas with(rs, c(corr=cor(true, est), rmse=rmse(true, est))) %>% round(., 2) mutate(rs, lb=est-1.96*se, ub=est+1.96*se) %>% ggplot(aes(true, est, ymin=lb, ymax=ub, color=route)) + geom_linerange(color='skyblue',alpha=.5) + geom_point(alpha=.5) + facet_wrap(~panel) + xlab(expression(paste('True ', theta))) + ylab(expression(paste('Estimated ', theta))) + theme_bw() + scale_color_discrete(guide=guide_legend('Routes'))
Assemble 2 panels of 1-3 MST using the bottom-up design approach. Each module includes 20 items so that each route has 40 items, and no item reuse is allowed. Maximize TIF over the [-1.96, -0.65] for the easy route (1M-2E), over [-0.65, 0.65] for the moderate route (1M-2M), and over [0.65, 1.96] for the hard route (1M-2H). Together, the central region of the scale from -1.96 to 1.96 which includes 95% of the populatoin is supplied with high test information.
In addition, each module is required to comply with the following constraints:
The model is solved using lp_solve under a time limit of 5 minutes.
x <- mst(item_pool(), design='1-3', n_panels=2, method='bottomup', test_len=20, max_use=1) x <- mst_objective(x, seq(-.65, .65, length.out=3), 'max', indices=c(1,3)) x <- mst_objective(x, seq(-1.96, -0.65, length.out=3), 'max', indices=2) x <- mst_objective(x, seq(0.65, 1.96, length.out=3), 'max', indices=4) x <- mst_constraint(x, 'type', min=15, max=15, level='3PL') x <- mst_constraint(x, 'type', min=2, max=3, level='GPCM') x <- mst_constraint(x, 'type', min=2, max=3, level='GRM') for(i in 1:4) x <- mst_constraint(x, 'content', min=4, max=6, level=i) x <- mst_constraint(x, 'time', min=56*20, max=64*20) x <- mst_assemble(x, 'lpsolve', time_limit=60*5) # Plot the route functions plot(x, byroute=TRUE, label=TRUE) + geom_vline(xintercept=seq(-1.96, 1.96, length.out=3), linetype=2, color='gray60')
# Plot the module information functions plot(x, byroute=FALSE, label=TRUE) + geom_vline(xintercept=seq(-1.96, 1.96, length.out=3), linetype=2, color='gray60')
Examine the compliance of the solution with the content constraints.
Map(function(x) mutate(x, module=paste(stage, label, sep='')) %>% select(id, panel, module, type, content, time), x$items) %>% Reduce(rbind, .) %>% group_by(panel, module) %>% summarise(n=n(), n_3pl=sum(type=='3PL'), n_gpcm=sum(type=='GPCM'), n_grm=sum(type=='GRM'), time=round(mean(time), 2), content1=sum(content==1), content2=sum(content==2), content3=sum(content==3), content4=sum(content==4)) %>% as.data.frame()
After test assembly, administer the panels to 3,000 students drawn from the standard normal distribution using the maximum information routing rule.
true_t <- rnorm(3000, 0, 1) sims <- Map(function(t) mst_sim(x, t), true_t) rs <- Map(function(xx) { cbind(true=xx$true, est=xx$theta, panel=xx$admin$'3pl'$panel[1], se=xx$stats$se[x$n_stages], info=xx$stats$info[x$n_stages], route=x$module[xx$stats$route, c('stage', 'label')] %>% apply(., 1, paste, collapse='') %>% paste(., collapse='-'), n_items=sum(xx$stats$n_items)) }, sims) %>% Reduce(rbind, .) %>% data.frame(., stringsAsFactors=FALSE) rs$true <- as.numeric(rs$true) %>% round(., 4) rs$est <- as.numeric(rs$est) %>% round(., 4) rs$info <- as.numeric(rs$info) %>% round(., 4) rs$se <- as.numeric(rs$se) %>% round(., 4) rs$panel <- as.integer(rs$panel) rs$n_items <- as.integer(rs$n_items) # Panel usage freq(rs$panel) %>% mutate(perc=round(perc, 2), cum_perc=round(cum_perc, 2)) # Route usage freq(rs$route) %>% mutate(perc=round(perc, 2), cum_perc=round(cum_perc, 2)) # Compare true and estimated thetas with(rs, c(corr=cor(true, est), rmse=rmse(true, est))) %>% round(., 2) mutate(rs, lb=est-1.96*se, ub=est+1.96*se) %>% ggplot(aes(true, est, ymin=lb, ymax=ub, color=route)) + geom_linerange(color='skyblue', alpha=.5) + geom_point(alpha=.5) + facet_wrap(~panel) + xlab(expression(paste('True ', theta))) + ylab(expression(paste('Estimated ', theta))) + theme_bw() + scale_color_discrete(guide=guide_legend('Routes'))
Assemble 2 panels of 1-2-3 MST using the top-down approach. Two routes with capricious ability change (1M-2E-3H & 1M-2H-3E) are blocked. Items are allowed to be used up to four times. Each route needs to meet the following criteria:
To yield decent routing accuracy, the first two stages are required to include at least 8 items. The model is solved by lp_solve under a time limit of 5 minutes.
x <- mst(item_pool(), design='1-2-3', n_panels=2, method='topdown', test_len=40, max_use=4) x <- mst_route(x, c(1, 2, 6), "-") x <- mst_route(x, c(1, 3, 4), "-") x <- mst_objective(x, seq(-1.96, -0.64, length.out=3), 'max', indices=1) x <- mst_objective(x, seq(-0.64, 0.64, length.out=3), 'max', indices=2:3) x <- mst_objective(x, seq( 0.64, 1.96, length.out=3), 'max', indices=4) x <- mst_constraint(x, 'type', min=34, max=38, level='3PL') x <- mst_constraint(x, 'type', min=1, max=3, level='GPCM') x <- mst_constraint(x, 'type', min=1, max=3, level='GRM') for(i in 1:4) x <- mst_constraint(x, 'content', min=8, max=12, level=i) x <- mst_constraint(x, 'time', min=56*40, max=64*40) x <- mst_stage_length(x, 1:2, min=8) x <- mst_assemble(x, 'lpsolve', time_limit=60*5) # Plot the route information functions plot(x, byroute=TRUE, label=TRUE) + geom_vline(xintercept=seq(-1.96, 1.96, length.out=3), linetype=2, color='gray60') # Plot the module information functions plot(x, byroute=FALSE, label=TRUE) + geom_vline(xintercept=seq(-1.96, 1.96, length.out=3), linetype=2, color='gray60')
Examine the compliance of the assembly results with the content constraints.
rs <- NULL for(p in 1:x$n_panels) for(i in 1:x$n_routes) { items <- mst_get_items(x, panel_ix=p, route_ix=i) item_content <- rowSums(sapply(items, function(x) freq(x$content, 1:4)$freq)) names(item_content) <- paste('content', 1:4, sep='') item_type <- unlist(Map(nrow, items)) item_time <- mean(unlist(sapply(items, function(x) x$time))) rs <- rbind(rs, c(panel=p, route=i, item_content, item_type, time=item_time)) } rs Map(function(x) mutate(x, module=paste(stage, label, sep='')) %>% select(id, panel, module, type), x$items) %>% Reduce(rbind, .) %>% group_by(panel, module) %>% summarise(n=n(), n_3pl=sum(type=='3PL'), n_gpcm=sum(type=='GPCM'), n_grm=sum(type=='GRM')) %>% as.data.frame()
Administer the assembled panels to 3,000 students drawn from the standard normal distribution using the maximum information routing rule.
# simulation true_t <- rnorm(3000, 0, 1) sims <- Map(function(t) mst_sim(x, t), true_t) rs <- Map(function(xx) { cbind(true=xx$true, est=xx$theta, panel=xx$admin$'3pl'$panel[1], se=xx$stats$se[x$n_stages], info=xx$stats$info[x$n_stages], route=x$module[xx$stats$route, c('stage', 'label')] %>% apply(., 1, paste, collapse='') %>% paste(., collapse='-'), n_items=sum(xx$stats$n_items)) }, sims) %>% Reduce(rbind, .) %>% data.frame(., stringsAsFactors=FALSE) rs$true <- as.numeric(rs$true) %>% round(., 4) rs$est <- as.numeric(rs$est) %>% round(., 4) rs$info <- as.numeric(rs$info) %>% round(., 4) rs$se <- as.numeric(rs$se) %>% round(., 4) rs$panel <- as.integer(rs$panel) rs$n_items <- as.integer(rs$n_items) # Panel usage freq(rs$panel) %>% mutate(perc=round(perc, 2), cum_perc=round(cum_perc, 2)) # Route usage freq(rs$route) %>% mutate(perc=round(perc, 2), cum_perc=round(cum_perc, 2)) # Compare true and estimated thetas with(rs, c(corr=cor(true, est), rmse=rmse(true, est))) %>% round(., 2) mutate(rs, lb=est-1.96*se, ub=est+1.96*se) %>% ggplot(aes(true, est, ymin=lb, ymax=ub, color=route)) + geom_linerange(color='skyblue', alpha=.5) + geom_point(alpha=.5) + facet_wrap(~panel) + xlab(expression(paste('True ', theta))) + ylab(expression(paste('Estimated ', theta))) + theme_bw() + scale_color_discrete(guide=guide_legend('Routes'))
Assemble 2 panels of 1-3 MST using a hybrid assembly approach that sets constraints on routes but test information functions on modules. In particular, module 1M is required to have a flat TIF of 6 over the region from -1.28 to 1.28, module 2E, 2M, and 2H to hit the TIF target of 8 at -1.28, 0, and 1.28 respectively. Additionally, each route is required to have 4 to 6 items in each content domain and 20 items in total.
x <- mst(item_pool(), '1-3', n_panels=2, method='topdown', test_len=20, max_use=1) x <- mst_objective(x, seq(-1.28, 1.28, length.out=3), target=6, indices=1, method='bottomup') x <- mst_objective(x, -1.28, target=8, indices=2, method='bottomup') x <- mst_objective(x, 0, target=8, indices=3, method='bottomup') x <- mst_objective(x, 1.28, target=8, indices=4, method='bottomup') for(i in 1:4) x <- mst_constraint(x, 'content', min=4, max=6, level=i) x <- mst_assemble(x, 'lpsolve', time_limit=30) # plot the route information functions plot(x, byroute=FALSE, label=TRUE) # plot the module information functions plot(x, byroute=TRUE, label=TRUE) # check the content distribution rs <- NULL for(p in 1:x$n_panels) for(r in 1:x$n_routes){ x_items <- mst_get_items(x, panel_ix=p, route_ix=r) x_content <- Map(function(x) if(is.null(x)) rep(0, 4) else with(x, freq(content, 1:4)$freq), x_items) x_content <- colSums(Reduce(rbind, x_content)) names(x_content) <- paste('content', 1:4, sep='') rs <- rbind(rs, c(panel=p, route=r, x_content)) } rs
If you encounter a bug, please post a code example that exposes the bug on github. You can post your questions and feature requests on github or to the author.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.