knitr::opts_chunk$set(echo = TRUE)
This code reproduces the simulation study reported in Halpin P. F. and Bergner, Y. (2018) Psychometric models for small group collaborations. Please contact me at peter.halpin@unc.edu
with questions or feedback.
Note that the functions in this package currently do not check for or catch invalid arguments and will break (badly) if the inputs are not formatted as described in this example. Future versions of the package will address these issues, but for now it's user beware.
Set up sesssion.
# devtools::install_github("peterhalpin/scirt") # install package library(scirt)
The data-generating item paramers for the simulation are loaded with the package as the data.frame sim_parms
. The columns are the discrimination (alpha
) and difficulty (beta
) parameters of the 2PL model. These naming conventions are required for all parms
arguments in this package. There are 100 items on the groups test, each including the "COL"
suffix in the row names. There are 100 items on the individual test, each including the "IND"
suffix in the row names. The use of the "IND"
and "COL"
tags for items is required for all functions that expect a combined assessment (they are not necessary for functions that use either the individual test or collaborative test separately).
sim_parms[c(1:2, 100:102, 200), ]
The data.frame sim_data
contains the generated item responses, along with information about the groups. For more information on data generation see help(data_gen)
or inspect the source code of that function.
The first five colums of sim_data
indicate
Columns 6:105
contain the responses to the group test, and columns 106:205
contain the responses to the individual test.
For any function taking item response data as an argument (resp
) in this package, the columns containing response data for each test form need to be in the same order as they appear in the rows of the parms
argument (here, sim_parms
).
Additionally, each odd row of the item reponse data matrix for a combined assessment must contain the data for member 1 of a dyad, and the subsequent even row must contain the data for member 2 of that dyad. Note that for all columns except the responses to the individual assessment, the odd and even rows will be duplicated.
head(sim_data[c(1:7, 105:107, 205)])
Lastly, we have some additional data-generating parameters for the simulation.
n_obs <- nrow(sim_data) # n respondents K <- n_obs/2 # n groups odd <- seq(1, n_obs, by = 2) # indexes odd rows of sim_data n_items <-nrow(sim_parms)/2 # n items per form i <- 20 # n items for short form ind_names <- row.names(sim_parms)[grep("IND", row.names(sim_parms)) ] # individual test ind_names_short <- ind_names[sample(n_items, i)] # short form for individual test col_names <- row.names(sim_parms)[grep("COL", row.names(sim_parms)) ] # group test col_names_short <- col_names[sample(n_items, i)] # short form for group test
The function rsc
estimates the RSC model from a combined assessment. It also returns estimates of individual abiltiy based on the individual component of the assessment. Standard errors are computed by numerically inverting either the expected (default) or observed Hessian at the solution. See help(rsc)
for more details.
The following chunk runs rsc
in each of the four test length conditions obtained by crossing the long (100 items) and short (20 items) forms. Note that for the short forms, results will differ slightly from those reported in the paper due to random sampling of items.
The runtime for the long-long condition (200 items with 1000 respondents) is about 90 seconds for each estimator with a 2.4 GHz Intel Xeon (single core). For this sample size, parallelization (via parallel:mclapply
) is acutally slower than for a single core; however, the parallel
option of rsc
can lead to significant improvements in runtime for very large samples (e.g., when bootstrapping). If you don't want to wait around for this to run, try using a subset of rows from sim_data
(keeping in mind that for every odd row m
the row m + 1
must also be selected).
# long group test, long individual test ll_items <- c(col_names, ind_names) ml_ll <- rsc(sim_data[ll_items], sim_parms[ll_items, ], method = "ML") map_ll <- rsc(sim_data[ll_items], sim_parms[ll_items, ], method = "MAP") # long group test, short individual test ls_items <- c(col_names, ind_names_short) ml_ls <- rsc(sim_data[ls_items], sim_parms[ls_items, ], method = "ML") map_ls <- rsc(sim_data[ls_items], sim_parms[ls_items, ], method = "MAP") # short group test, long individual test sl_items <- c(col_names_short, ind_names) ml_sl <- rsc(sim_data[sl_items], sim_parms[sl_items, ], method = "ML") map_sl <- rsc(sim_data[sl_items], sim_parms[sl_items, ], method = "MAP") # short group test, short individual test ss_items <- c(col_names_short, ind_names_short) ml_ss <- rsc(sim_data[ss_items], sim_parms[ss_items, ], method = "ML") map_ss <- rsc(sim_data[ss_items], sim_parms[ss_items, ], method = "MAP")
The rest of this document sets up ggplot2
to reproduce Figure 1 and
Figure 2 from the paper. Note that for the ML estimator in the short test length conditions,u
can be estiamted very poorly. Those samples are omitted.
library(ggplot2) # format data gg <- rbind(ml_ll, map_ll, ml_sl, map_sl, ml_ls, map_ls, ml_ss, map_ss) gg$ind_form <- rep(c("Individual long", "Indvidual short"), each = K*4) gg$col_form <- rep(c("Group long", "Group short"), each = K*2) %>% rep(times = 2) gg$Method <- rep(c("ML", "MAP"), each = K) %>% rep(times = 4) gg$dgp_u <- rep(sim_data$u[odd], times = 8) gg$sample <- 0 gg$sample[sample(nrow(gg), nrow(gg)/10)] <- 1 # Omit MLEs that diverged to +/- inf # drops <- gg[abs(gg$u) > 5, ] # table(drops$ind_form, drops$col_form, drops$Method) gg <- gg[abs(gg$u) < 5, ] # Plot p <- ggplot(gg, aes(x = dgp_u, y = u, group = Method)) + geom_point(data = gg[gg$sample == 1,], size = 3, aes(shape = Method, color = Method)) + geom_smooth(se = F, size = .8, aes(linetype = Method, color = Method)) + scale_color_manual(values = c("grey10", "grey10")) + scale_shape_discrete(solid=F) + xlab("Data generating values") + ylab("Estimate") + geom_abline(slope = 1, intercept = 0, col = "grey50", size = 1.2) + theme_bw(base_size = 15) p + facet_grid(ind_form ~ col_form)
q <- ggplot(gg, aes(x = u, y = u_se, group = Method)) + geom_point(data = gg[gg$sample == 1,], size = 3, aes(shape = Method, color = Method)) + geom_smooth(se = F, size = .8, aes(linetype = Method, color = Method)) + scale_color_manual(values = c("grey10", "grey10")) + scale_shape_discrete(solid=F) + xlab("Estimate") + ylab("Standard error") + theme_bw(base_size = 15) + coord_cartesian(ylim=c(0,5), xlim=c(-4,4)) q + facet_grid(ind_form ~ col_form)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.