Description Usage Arguments Value Author(s) Examples
Facilitates data analysis using the software Conquest and/or TAM. It automatically
checks data for IRT consistency, generates Conquest syntax, label, anchor and data files or
corresponding TAM call for a single model specified by several arguments in R. Finally, an
R object is created which contain the required input for Conquest or TAM. To start the estimation,
call runModel
with the argument returned by defineModel
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | defineModel (dat, items, id, splittedModels = NULL,
irtmodel = c("1PL", "2PL", "PCM", "PCM2", "RSM", "GPCM", "2PL.groups", "GPCM.design", "3PL"),
qMatrix=NULL, DIF.var=NULL, HG.var=NULL, group.var=NULL, weight.var=NULL, anchor = NULL,
domainCol=NULL, itemCol=NULL, valueCol=NULL, check.for.linking = TRUE, minNperItem = 50, removeMinNperItem = FALSE,
boundary = 6, remove.boundary = FALSE, remove.no.answers = TRUE, remove.no.answersHG = TRUE,
remove.missing.items = TRUE, remove.constant.items = TRUE, remove.failures = FALSE,
remove.vars.DIF.missing = TRUE, remove.vars.DIF.constant = TRUE,
verbose=TRUE, software = c("conquest","tam"), dir = NULL, analysis.name,
schooltype.var = NULL, model.statement = "item", compute.fit = TRUE,
n.plausible=5, seed = NULL, conquest.folder=NULL,constraints=c("cases","none","items"),
std.err=c("quick","full","none"), distribution=c("normal","discrete"),
method=c("gauss", "quadrature", "montecarlo"), n.iterations=2000,
nodes=NULL, p.nodes=2000, f.nodes=2000,converge=0.001,deviancechange=0.0001,
equivalence.table=c("wle","mle","NULL"), use.letters=FALSE,
allowAllScoresEverywhere = TRUE, guessMat = NULL, est.slopegroups = NULL,
fixSlopeMat = NULL, slopeMatDomainCol=NULL, slopeMatItemCol=NULL, slopeMatValueCol=NULL,
progress = FALSE, increment.factor=1 , fac.oldxsi=0,
export = list(logfile = TRUE, systemfile = FALSE, history = TRUE,
covariance = TRUE, reg_coefficients = TRUE, designmatrix = FALSE))
|
dat |
A data frame containing all variables necessary for analysis. |
items |
Names or column numbers of variables with item responses. Item response values must
be numeric (i.e. 0, 1, 2, 3 ... ). Character values (i.e. A, B, C ... or a, b, c, ...)
are not allowed. Class of item columns are expected to be numeric or integer.
Columns of class |
id |
Name or column number of the identifier (ID) variable. |
splittedModels |
Optional: Object returned by 'splitModels'. Definition for multiple model handling. |
irtmodel |
Specification of the IRT model. The argument corresponds to the |
qMatrix |
Optional: A named data frame indicating how items should be grouped to dimensions. The first column contains the names of all items and should be named item. The other columns contain dimension definitions and should be named with the respective dimension names. A positive value (e.g., 1 or 2 or 1.4) indicates the loading weight with which an item loads on the dimension, a value of 0 indicates that the respective item does not load on this dimension. If no q matrix is specified by the user, an unidimensional structure is assumed. |
DIF.var |
Name or column number of one grouping variable for which differential item functioning analysis is to be done. |
HG.var |
Optional: Names or column numbers of one or more context variables (e.g., sex, school). These variables will be used for latent regression model in ConQuest or TAM. |
group.var |
Optional: Names or column numbers of one or more grouping variables. Descriptive statistics for WLEs and Plausible Values will be computed separately for each group in ConQuest. |
weight.var |
Optional: Name or column number of one weighting variable. |
anchor |
Optional: A named data frame with anchor parameters. The first column contains the
names of all items which are used to be anchor items and may be named item.
The second column contains anchor parameters. Anchor items can be a subset of the
items in the dataset and vice versa. If the data frame contains more than two
columns, columns must be named explicitly using the following arguments
|
domainCol |
Optional: Only necessary if the |
itemCol |
Optional: Only necessary if the |
valueCol |
Optional: Only necessary if the |
check.for.linking |
A logical value indicating whether the items in dataset are checked for being connected with each other via design. |
minNperItem |
Numerical: A message is printed on console if an item has less valid values than the number
defined in |
removeMinNperItem |
Logical: Remove items with less valid responses than defined in |
boundary |
Numerical: A message is printed on console if a subject has answered less than the number of items defined in boundary. |
remove.boundary |
Logical: Remove subjects who have answered less items than defined in the |
remove.no.answers |
Logical: Should persons without any item responses being removed prior to analysis? |
remove.no.answersHG |
Logical: Should persons without any responses on any background variable being removed prior to analysis? |
remove.missing.items |
Logical: Should items without any item responses being removed prior to analysis? |
remove.constant.items |
Logical: Should items without variance being removed prior to analysis? |
remove.failures |
Logical: Should persons without any correct item response (i.e., only “0” responses) being removed prior to analysis? |
remove.vars.DIF.missing |
Logical: Applies only in DIF analyses. Should items without any responses in at least one DIF group being removed prior to analyses? Note: Conquest may crash if these items remain in the data. |
remove.vars.DIF.constant |
Logical: Applies only in DIF analyses. Should items without variance in at least one DIF group being removed prior to analyses? Note: Conquest may crash if these items remain in the data. |
verbose |
A logical value indicating whether messages are printed on the R console. |
software |
The desired estimation software for the analysis. |
dir |
The directory in which the output will be written to. If |
analysis.name |
A character string specifying the analysis name. If |
schooltype.var |
Optional: Name or column number of the variable indicating the school type (e.g. academic track, non-academic track). Only necessary if p values should be computed for each school type separately. |
model.statement |
Optional: Applies only if |
compute.fit |
Applies only if |
n.plausible |
The number of plausible values which are to be drawn from the conditioning model. |
seed |
Optional: Set seed value for analysis. |
conquest.folder |
Applies only if |
constraints |
A character string specifying how the scale should be constrained. Possible options
are "cases" (default), "items" and "none". When anchor parameter are specified in
anchor, constraints will be set to "none" automatically. In |
std.err |
Applies only if |
distribution |
Applies only if |
method |
A character string indicating which method should be used for analysis. Possible
options are "gauss" (default), "quadrature" and "montecarlo". See ConQuest manual
pp.225 for details on these methods. When using |
n.iterations |
An integer value specifying the maximum number of iterations for which estimation will proceed without improvement in the deviance. |
nodes |
An integer value specifying the number of nodes to be used in the analysis. The
default value is 20. When using |
p.nodes |
Applies only if |
f.nodes |
Applies only if |
converge |
An integer value specifiying the convergence criterion for parameter estimates. The estimation will terminate when the largest change in any parameter estimate between successive iterations of the EM algorithm is less than converge. The default value is 0.001. |
deviancechange |
An integer value specifiying the convergence criterion for the deviance. The estimation will terminate when the change in the deviance between successive iterations of the EM algorithm is less than deviancechange. The default value is 0.0001. |
equivalence.table |
Applies only if |
use.letters |
Applies only if |
allowAllScoresEverywhere |
Applies only if |
guessMat |
Applies only if |
est.slopegroups |
Applies only if |
fixSlopeMat |
Applies only if |
slopeMatDomainCol |
Optional: Only necessary if the |
slopeMatItemCol |
Optional: Only necessary if the |
slopeMatValueCol |
Optional: Only necessary if the |
progress |
Applies only if |
increment.factor |
Applies only if |
fac.oldxsi |
Applies only if |
export |
Applies only if |
A list which contains information about the desired estimation. The list is intended for
further processing via runModel
. Structure of the list varies depending on
whether multiple models were called using splitModels
or not. If splitModels
was called,
the number of elements in the list equals the number of models defined via splitModels
. Each
element in the list is a list with various elements:
software |
Character string of the software which is intended to use for the further estimation, i.e. "conquest" or "tam" |
qMatrix |
The Q matrix allocating items to dimensions. |
all.Names |
Named list of all relevant variables of the data set. |
dir |
Character string of the directory the results are to be saved. |
analysis.name |
Character string of the analysis' name. |
deskRes |
Data frame with descriptives (e.g., p values) of the test items. |
discrim |
Data frame with item discrimination values. |
perNA |
The person identifiers of examinees which are excluded from the analysis due to solely missing values. |
per0 |
The person identifiers of examinees which have solely false responses. if |
perA |
The person identifiers of examinees which have solely correct responses. |
perExHG |
The person identifiers of examinees which are excluded from the analysis due to missing values on explicit variables. |
itemsExcluded |
Character string of items which were excluded, for example due to zero variance or solely missing values. |
If software == "conquest", the output additionally includes the following elements:
input |
Character string of the path with Conquest input (cqc) file. |
conquest.folder |
Character string of the path of the conquest executable file. |
model.name |
Character string of the model name. |
If software == "tam", the output additionally includes the following elements:
anchor |
Optional: data frame of anchor parameters (if anchor parameters were defined). |
daten |
The prepared data for TAM analysis. |
irtmodel |
Character string of the used IRT model. |
est.slopegroups |
Applies for 2pl modeling. Information about which items share a common slope parameter. |
guessMat |
Applies for 3pl modeling. Information about which items share a common guessing parameter. |
control |
List of control parameters for TAM estimation. |
n.plausible |
Desired number of plausible values. |
Sebastian Weirich
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 | ################################################################################
### Preparation: necessary for all examples ###
################################################################################
# load example data
# (these are simulated achievement test data)
data(sciences)
# first reshape the data set into wide format
datW <- reshape2::dcast(sciences, id+grade+sex~variable, value.var="value")
# second, create the q matrix from the long format data frame
qMat <- sciences[ which( sciences[,"subject"] == "biology") ,c("variable","domain")]
qMat <- qMat[!duplicated(qMat[,1]),]
qMat <- data.frame ( qMat[,1,drop=FALSE], knowledge = as.numeric(qMat[,"domain"] == "knowledge"),
procedural = as.numeric(qMat[,"domain"] == "procedural"))
## Not run:
################################################################################
### Example 1: Unidimensional Rasch Model ###
################################################################################
# Example 1: define and run a unidimensional Rasch model with all variables in dataset
# using "Conquest". Note: if software="conquest", the path of the windows executable
# ConQuest console must be specified by setting conquest.folder = "<path_to_your_conquest.exe>"
# defining the model: specifying q matrix is not necessary
mod1 <- defineModel(dat=datW, items= -c(1:3), id="id", analysis.name = "unidim",
conquest.folder = "N:/console_Feb2007.exe", dir = "N:/test")
# run the model
run1 <- runModel(mod1)
# get the results
res1 <- getResults(run1)
# extract the item parameters from the results object
item <- itemFromRes ( res1 )
################################################################################
### Example 1a: Unidimensional Rasch Model with DIF estimation ###
################################################################################
# nearly the same procedure as in example 1. Using 'sex' as DIF variable
# note that 'sex' is a factor variable here. Conquest needs all explicit variables
# to be numeric. Variables will be automatically transformed to numeric by
# 'defineModels'. However, it might be the better idea to transform the variable
# manually.
datW[,"sexNum"] <- car::recode ( datW[,"sex"] , "'male'=0; 'female'=1",
as.factor.result = FALSE)
# as we have defined a new variable ('sexNum') in the data, it is a good idea
# to explicitly specify item columns ... instead of saying 'items= -c(1:3)' which
# means: Everything except column 1 to 3 are item columns
items<- grep("^Bio|^Che|^Phy", colnames(datW))
# Caution: two items ("ChePro48", "PhyPro01") are excluded because they are
# constant in one of the DIF groups
mod1a<- defineModel(dat=datW, items= items, id="id", DIF.var = "sexNum",
analysis.name = "unidimDIF", conquest.folder = "N:/console_Feb2007.exe",
dir = "N:/temp")
# run the model
run1a<- runModel(mod1a)
# get the results
res1a<- getResults(run1a)
################################################################################
### Example 2a: Multidimensional Rasch Model with anchoring ###
################################################################################
# Example 2a: running a multidimensional Rasch model on a subset of items with latent
# regression (sex). Use item parameter from the first model as anchor parameters
# use only biology items from both domains (procedural/knowledge)
# read in anchor parameters from the results object of the first example
aPar <- itemFromRes ( res1 )
aPar <- aPar[,c("item", "est")]
# defining the model: specifying q matrix now is necessary.
# Please note that all latent regression variables have to be of class numeric.
# If regression variables are factors, dummy variables automatically will be used.
# (This behavior is equivalent as in lm() for example.)
mod2a<- defineModel(dat=datW, items= qMat[,1], id="id", analysis.name = "twodim",
qMatrix = qMat, HG.var = "sex", anchor = aPar, n.plausible = 20,
conquest.folder = "N:/console_Feb2007.exe", dir = "N:/test")
# run the model
run2a<- runModel(mod2a)
# get the results
res2a<- getResults(run2a)
################################################################################
### Example 2b: Multidimensional Rasch Model with equating ###
################################################################################
# Example 2b: running a multidimensional Rasch model on a subset of items
# defining the model: specifying q matrix now is necessary.
mod2b<- defineModel(dat=datW, items= qMat[,1], id="id", analysis.name = "twodim2",
qMatrix = qMat, n.plausible = 20, conquest.folder = "N:/console_Feb2007.exe",
dir = "N:/test")
# run the model
run2b<- runModel(mod2b)
# get the results
res2b<- getResults(run2b)
### equating (wenn nicht verankert)
eq2b <- equat1pl( results = res2b, prmNorm = aPar)
### transformation to the 'bista' metric: needs reference population definition
ref <- data.frame ( domain = c("knowledge", "procedural"), m = c(0.078, -0.175), sd= c(1.219, 0.799))
cuts <- list ( knowledge = list ( values = c(380,540)), procedural = list ( values = c ( 410, 550)))
tf2b <- transformToBista ( equatingList = eq2b, refPop = ref, cuts = cuts)
################################################################################
### Example 3: Multidimensional Rasch Model in TAM ###
################################################################################
# Example 3: the same model in TAM
# we use the same anchor parameters from example 1
# estimate model 2 with latent regression and anchored parameters in TAM
# specification of an output folder (via 'dir' argument) no longer necessary
mod2T<- defineModel(dat=datW, items= qMat[,1], id="id", qMatrix = qMat,
HG.var = "sex", anchor = aPar, software = "tam")
# run the model
run2T<- runModel(mod2T)
# Object 'run2T' is of class 'tam.mml'
class(run2T)
# the class of 'run2T' corresponds to the class defined by the TAM package; all
# functions of the TAM package intended for further processing (e.g. drawing
# plausible values, plotting deviance change etc.) work, for example:
wle <- tam.wle(run2T)
# Finally, the model result are collected in a single data frame
res2T<- getResults(run2T)
## End(Not run)
################################################################################
### Example 4: define und run multiple models defined by 'splitModels' ###
################################################################################
# Example 4: define und run multiple models defined by 'splitModels'
# Model split is possible for different groups of items (i.e. domains) and/or
# different groups of persons (for example, federal states within Germany)
# define person grouping
pers <- data.frame ( idstud = datW[,"id"] , group1 = datW[,"sex"],
group2 = datW[,"grade"], stringsAsFactors = FALSE )
# define 18 models, splitting according to person groups and item groups separately
# by default, multicore processing is applied
l1 <- splitModels ( qMatrix = qMat, person.groups = pers, nCores = 1)
# apply 'defineModel' for each of the 18 models in 'l1'
modMul<- defineModel(dat = datW, items = qMat[,1], id = "id",
check.for.linking = TRUE, splittedModels = l1, software = "tam")
# run all models
runMul<- runModel(modMul)
# get results of all models
resMul<- getResults(runMul)
################################################################################
### Example 5: Linking and equating for multiple models ###
################################################################################
# Example 5: define und run multiple models according to different domains (item groups)
# and further linking/equating. This example mimics the routines necessary for the
# 'Vergleichsarbeiten' at the Institute of Educational Progress (IQB)
# specify two models according to the two domains 'knowledge' and 'procedural'
l2 <- splitModels ( qMatrix = qMat, nCores = 1)
# define 2 models
mods <- defineModel(dat = datW, id = "id", check.for.linking = TRUE,
splittedModels = l2, software = "tam")
# run 2 models
runs <- runModel(mods)
# get the results
ress <- getResults(runs)
# only for illustration, we create arbitrary 'normed' parameters for anchoring
prmNrm<- itemFromRes(ress)[ sample ( 1:56, 31,FALSE) ,c("item", "est")]
prmNrm[,"est"] <- prmNrm[,"est"] - 0.6 + rnorm ( 31, 0, 0.75)
# anchoring without exclusion of linking DIF items (DIF items will only be identified)
anch <- equat1pl ( results = ress, prmNorm = prmNrm, excludeLinkingDif = FALSE,
difBound = 0.6)
# anchoring with exclusion of linking DIF items
anch2 <- equat1pl ( results = ress, prmNorm = prmNrm, excludeLinkingDif = TRUE,
difBound = 0.6, iterativ = FALSE)
# anchoring with iterative exclusion of linking DIF items
anch3 <- equat1pl ( results = ress, prmNorm = prmNrm, excludeLinkingDif = TRUE,
difBound = 0.6, iterativ = TRUE)
# transformation to the Bista metric
# first we arbitrarily define mean and standard deviation of the reference
# population according to both dimensions (defined in the Q matrix):
# procedural and knowledge
# Note that the the first column of the 'refPop' data frame must include the
# domain names. Domain names must match the names defined in the Q matrix
refPop<- data.frame ( domain = c("procedural", "knowledge"), m = c(0.122, -0.047),
sd = c(0.899, 1.121))
# second, we specify a list with cut scores. Values must be in ascending order.
# Labels of the competence stages are optional. If no labels are specified,
# the will be defaulted to 1, 2, 3 ... etc.
# Note: if labels are specified, there must be one label more than cut scores.
# (i.e. 4 cut scores need 5 labels, etc.)
cuts <- list ( procedural = list ( values = c(380, 420, 500, 560)) ,
knowledge = list ( values = 400+0:2*100, labels = c("A1", "A2", "B1", "B2")))
# transformation
dfr <- transformToBista ( equatingList = anch3, refPop = refPop, cuts=cuts )
head(dfr$itempars)
head(dfr$personpars)
################################################################################
### Example 5a: Linking for multiple models, including global domain ###
################################################################################
# Example 5a: define und run multiple models according to different domains (item groups)
# and further linking/equating. Same as example 5, but extended for the 'global'
# domain.
# add the 'global' domain in the Q matrix
qMat2 <- qMat
qMat2[,"global"] <- 1
# specify two models according to the two domains 'knowledge' and 'procedural'
l3 <- splitModels ( qMatrix = qMat2, nCores = 1)
# define 2 models
mods3 <- defineModel(dat = datW, id = "id", check.for.linking = TRUE,
splittedModels = l3, software = "tam")
# run 2 models
runs3 <- runModel(mods3)
# get the results
res3 <- getResults(runs3)
# only for illustration, we create arbitrary 'normed' parameters for anchoring.
# Each item now has to item parameter: one is domain-specific, one is the global
# item parameter. Hence, each item occurs two times in 'prmNrm'
prmNrm<- itemFromRes(ress)[ sample ( 1:56, 31,FALSE) ,c("dimension", "item", "est")]
prmNrm[,"est"] <- prmNrm[,"est"] - 0.6 + rnorm ( 31, 0, 0.75)
prmGlo<- prmNrm
prmGlo[,"dimension"] <- "global"
prmNrm<- rbind ( prmNrm, prmGlo)
# if the item identifier in 'prmNrm' is not unique, 'equat1pl' has to know which
# parameter in 'prmNrm' belongs to which dimension/domain. Hence, the dimension
# is added to 'prmNrm'.
# anchoring: if 'prmNrm' has more than 2 columns, the columns of 'prmNrm' must be
# specified in 'equat1pl'
anch3 <- equat1pl ( results = res3, prmNorm = prmNrm, item = "item", value = "est",
domain = "dimension", excludeLinkingDif = FALSE, difBound = 0.6)
# transformation to the Bista metric
# first we arbitrarily define mean and standard deviation of the reference
# population according to the three dimensions (defined in the Q matrix):
# procedural, knowledge, and global
# Note that the the first column of the 'refPop' data frame must include the
# domain names. Domain names must match the names defined in the Q matrix
refPop<- data.frame ( domain = c("procedural", "knowledge", "global"),
m = c(0.122, -0.047, 0.069), sd = c(0.899, 1.121, 1.015))
# second, we specify a list with cut scores. Values must be in ascending order.
# Labels of the competence stages are optional. If no labels are specified,
# the will be defaulted to 1, 2, 3 ... etc.
# Note: if labels are specified, there must be one label more than cut scores.
# (i.e. 4 cut scores need 5 labels, etc.)
cuts <- list ( procedural = list ( values = c(380, 420, 500, 560)) ,
knowledge = list ( values = 400+0:2*100, labels = c("A1", "A2", "B1", "B2")),
global = list ( values = 400+0:2*100, labels = c("A1", "A2", "B1", "B2")))
# transformation
dfr <- transformToBista ( equatingList = anch3, refPop = refPop, cuts=cuts )
head(dfr$itempars)
head(dfr$personpars)
################################################################################
### Example 6: Linking and equating for multiple models (II) ###
################################################################################
# Example 6 and 6a: define und run multiple models according to different domains
# (item groups) and different person groups. This example mimics the routines
# necessary for the 'Laendervergleich/Bildungstrend' at the Institute for
# Educational Progress (IQB). Example 6 demonstrates routines without trend
# estimation. Example 6a demonstrates routines with trend estimation---hence,
# example 6 mimics time of measurement 't1', example 6a mimics time of measurement
# 't2'.
# Preparation: assume time of measurement 't1' corresponds to the year 2003.
datT1<- reshape2::dcast(subset ( sciences, year == 2003),
formula = id+grade+sex+country~variable, value.var="value")
# First step: item calibration in separate unidimensional models for each domain
modsT1<- splitModels ( qMatrix = qMat, nCores = 1)
# define 2 models. Note: not all items of the Q matrix are present in the data.
# Items which occur only in the Q matrix will be ignored.
defT1 <- defineModel(dat = datT1, id = "id", check.for.linking = TRUE,
splittedModels = modsT1, software = "tam")
# run 2 models
runT1 <- runModel(defT1)
# get the results
resT1 <- getResults(runT1)
# extract item parameters from the 'results' object
itemT1<- itemFromRes(resT1)
# Second step: drawing plausible values. Two-dimensional model is specified for
# each person group with fixed item parameters. Moreover, a latent regression
# model is used (in the actual 'Laendervergleich', regressors are principal
# components).
# create arbitrary principal components
for ( i in c("PC1", "PC2", "PC3") ) {
datT1[,i] <- rnorm( n = nrow(datT1), mean = 0, sd = 1.2)
}
# number of extracted principal components vary: three components for Berlin,
# two for Bavaria. Hence, Bavaria has no valid values on 'PC3'.
datT1[which(datT1[,"country"] == "Bavaria"),"PC3"] <- NA
# define person grouping
pers <- data.frame ( idstud = datT1[,"id"] , country = datT1[,"country"])
# Running second step: split models according to person groups
# ('all.persons' must be FALSE, otherwise the whole group would be treated as
# a separate distinct group.)
modT1P<- eatModel::splitModels ( person.groups = pers , all.persons = FALSE, nCores = 1)
# define the 2 country-specific 2-dimensional models, specifying latent regression
# model and fixed item parameters.
defT1P<- defineModel(dat = datT1, items = itemT1[,"item"], id = "id",
check.for.linking = TRUE, splittedModels = modT1P, qMatrix = qMat,
anchor = itemT1[,c("item", "est")], HG.var = c("PC1", "PC2", "PC3"),
software = "tam")
# run the 2 models
runT1P<- runModel(defT1P)
# get the results
resT1P<- getResults(runT1P)
# equating is not necessary, as the models run with fixed item parameters
# However, to prepare for the transformation on the 'bista' metric, run
# 'equat1pl' with empty arguments
ankT1P<- equat1pl ( results = resT1P)
# transformation to the 'bista' metric
# Note: if the sample was drawn from the reference population, mean and SD
# are not yet known. So we ignore the 'refPop' argument in 'transformToBista'
# and simply define the cut scores.
cuts <- list ( procedural = list ( values = c(380, 500, 620)) ,
knowledge = list ( values = 400+0:2*100, labels = c("A1", "A2", "B1", "B2")))
# transformation
dfrT1P<- transformToBista ( equatingList = ankT1P, cuts=cuts )
################################################################################
### Example 6a: Extend example 6 with trend estimation ###
################################################################################
# Example 6a needs the objects created in example 6
# Preparation: assume time of measurement 't2'.
datT2<- reshape2::dcast(subset ( sciences, year == 2013),
formula = id+grade+country~variable, value.var="value")
# First step: item calibration in separate unidimensional models for each domain
modsT2<- eatModel::splitModels ( qMatrix = qMat, nCores = 1)
# define 2 models. Items which occur only in the Q matrix will be ignored.
defT2 <- defineModel(dat = datT2, id = "id", check.for.linking = TRUE,
splittedModels = modsT2, software = "tam")
# run 2 models
runT2 <- runModel(defT2)
# get the results
resT2 <- getResults(runT2)
# collect item parameters
itemT2<- itemFromRes(resT2)
# Second step: compute linking constant between 't1' and 't2' with the exclusion
# of linking DIF items and computation of linking error according to a jackknife
# procedure. We use the 'itemT1' object created in example 6. To demonstrate the
# jackknife procedure, we create an arbitrary 'testlet' variable in 'itemT1'. The
# testlet variable indicates items which belong to a common stimulus. The linking
# procedure is executed simultaneously for procedural and knowledge.
itemT1[,"testlet"] <- substr(as.character(itemT1[,"item"]), 1, 7)
L.t1t2<- equat1pl ( results = resT2, prmNorm = itemT1, item = "item",
testlet = "testlet", value = "est", excludeLinkingDif = TRUE,
difBound = 1)
# Third step: transform item parameters of 't2' to the metric of 't1'
# We now need to specify the 'refPop' argument. We use the values from 't1' which
# serves as the reference.
ref <- dfrT1P[["refPop"]]
T.t1t2<- transformToBista ( equatingList = L.t1t2, refPop=,ref, cuts = cuts)
# The object 'T.t1t2' now contains transformed person and item parameters with
# original and transformed linking errors. See for example:
head(T.t1t2$personpars)
# Fourth step: drawing plausible values for 't2'. We use the transformed item
# parameters (captured in 'T.t1t2') for anchoring
# create arbitrary principal components
for ( i in c("PC1", "PC2", "PC3", "PC4") ) {
datT2[,i] <- rnorm( n = nrow(datT2), mean = 0, sd = 1.2)
}
# number of extracted principal components vary: four components for Berlin,
# three for Bavaria. Hence, Bavaria has no valid values on 'PC4'.
datT2[which(datT2[,"country"] == "Bavaria"),"PC4"] <- NA
# define person grouping
persT2<- data.frame ( idstud = datT2[,"id"] , country = datT2[,"country"])
# Running second step: split models according to person groups (countries)
# ('all.persons' must be FALSE, otherwise the whole group would be treated as
# a separate distinct group.)
modT2P<- eatModel::splitModels ( person.groups = persT2 , all.persons = FALSE, nCores = 1)
# define the 2 country-specific 2-dimensional models, specifying latent regression
# model and fixed item parameters. We used the transformed item parameters (captured
# in 'T.t1t2[["itempars"]]' --- using the 'estTransf' column) for anchoring.
defT2P<- defineModel(dat = datT2, items = itemT2[,"item"], id = "id",
check.for.linking = TRUE, splittedModels = modT2P, qMatrix = qMat,
anchor = T.t1t2[["itempars"]][,c("item", "estTransf")],
HG.var = c("PC1", "PC2", "PC3", "PC4"), software = "tam")
# run the 2 models
runT2P<- runModel(defT2P)
# get the results
resT2P<- getResults(runT2P)
# equating is not necessary, as the models run with fixed item parameters
# However, to prepare for the transformation on the 'bista' metric, run
# 'equat1pl' with empty arguments
ankT2P<- equat1pl ( results = resT2P)
# transformation to the 'bista' metric, using the previously defined cut scores
dfrT2P<- transformToBista ( equatingList = ankT2P, refPop=ref, cuts=cuts )
# prepare data for jackknifing and trend estimation via 'eatRep'
dTrend<- prepRep ( calibT2 = T.t1t2, bistaTransfT1 = dfrT1P, bistaTransfT2 = dfrT2P,
makeIdsUnique = FALSE)
################################################################################
### Example 6b: trend analyses (jk2.mean) ###
################################################################################
# Example 6b needs the objects created in example 6a
# We use the 'dTrend' object to perform some trend analyses.
# load the 'eatRep' package ... note: needs eatRep version 0.9.2 or higher
library(eatRep)
# merge background variables from original data to the 'dTrend' frame
# first reshape 'sciences' into wide format and create 'class' variable
sw <- reshape2::dcast(sciences, id+year+wgt+jkzone+jkrep+country+grade+sex~1,
value.var="value")
dTrend<- merge(sw, dTrend, by = "id", all.x = FALSE, all.y = TRUE)
dTrend[,"idclass"] <- substr(as.character(dTrend[,"id"]),1,2)
# compute means for both countries without trend, only for domain 'knowledge'
# create subsample
subSam<- dTrend[intersect(which(dTrend[,"dimension"] == "knowledge"),which(dTrend[,"year"] == 2003)),]
m01 <- jk2.mean(datL = subSam, ID="id", imp = "imp", groups = "model",
dependent = "valueTransfBista")
r01 <- report(m01, add = list(domain = "knowledge"))
# same example as before, now additionally using weights
m02 <- jk2.mean(datL = subSam, ID="id", imp = "imp", groups = "model",
wgt = "wgt", dependent = "valueTransfBista")
r02 <- report(m02, add = list(domain = "knowledge"))
# now additionally using replication methods (jk2)
m03 <- jk2.mean(datL = subSam, ID="id", imp = "imp", groups = "model", type = "jk2",
wgt = "wgt", PSU = "jkzone", repInd = "jkrep", dependent = "valueTransfBista")
r03 <- report(m03, add = list(domain = "knowledge"))
# additionally: sex differences in each country, using 'group.differences.by' argument
m04 <- jk2.mean(datL = subSam, ID="id", imp = "imp", groups = c("sex", "model"),
group.differences.by = "sex", type = "jk2",wgt = "wgt", PSU = "jkzone",
repInd = "jkrep", dependent = "valueTransfBista")
r04 <- report(m04, add = list(domain = "knowledge"))
# additionally: differ the sex-specific means in each country from the sex-specific means
# in the whole population? Are the differences (male vs. female) in each country different
# from the difference (male vs. female) in the whole population?
m05 <- jk2.mean(datL = subSam, ID="id", imp = "imp", groups = c("sex", "model"),
group.differences.by = "sex", cross.differences = TRUE, type = "jk2",wgt = "wgt",
PSU = "jkzone", repInd = "jkrep", dependent = "valueTransfBista")
r05 <- report(m05, add = list(domain = "knowledge"))
# additionally: trend estimation for each country- and sex-specific mean, each country-
# specific sex differences and each difference between country-specific sex difference
# and the sex difference in the whole population
# create a new sub sample with both---the data of 2003 and 2013 ... only for domain
# 'knowledge'. Note: if no linking error is defined, linking error of 0 is assumed.
# (Due to unbalanced sample data, we switch to 'jk1' method for the remainder of 6b.)
subS2 <- dTrend[which(dTrend[,"dimension"] == "knowledge"),]
m06 <- jk2.mean(datL = subS2, ID="id", imp = "imp", groups = c("sex", "model"),
group.differences.by = "sex", cross.differences = TRUE, type = "jk1",wgt = "wgt",
PSU = "idclass", trend = "year", linkErr = "trendErrorTransfBista",
dependent = "valueTransfBista")
r06 <- report(m06, trendDiffs = TRUE, add = list(domain = "knowledge"))
# additionally: repeat this analysis for both domains, 'knowledge' and 'procedural',
# using a 'by'-loop. Now we use the whole 'dTrend' data instead of subsamples
m07 <- by ( data = dTrend, INDICES = as.character(dTrend[,"dimension"]),
FUN = function ( subdat ) {
m07a <- jk2.mean(datL = subdat, ID="id", imp = "imp", groups = c("sex", "model"),
group.differences.by = "sex", cross.differences = TRUE, type = "jk1",wgt = "wgt",
PSU = "idclass", trend = "year", linkErr = "trendErrorTransfBista",
dependent = "valueTransfBista")
return(m07a)})
r07 <- lapply(names(m07), FUN = function (domain) {report(m07[[domain]], trendDiffs = TRUE, add = list(domain = domain))})
r07 <- do.call("rbind", r07)
################################################################################
### Example 6c: trend analyses (jk2.table) ###
################################################################################
# Example 6c needs the objects created in example 6a. Additionally, the merged
# 'dTrend' frame created in Example 6a and augmented in 6b is necessary.
# load the 'eatRep' package ... note: needs eatRep version 0.9.2 or higher
library(eatRep)
# compute frequencies for trait levels, only for domain 'knowledge', without trend
# create 'knowledge' subsample
subSam<- dTrend[intersect(which(dTrend[,"dimension"] == "knowledge"),which(dTrend[,"year"] == 2003)),]
freq01<- jk2.table(datL = subSam, ID="id", imp = "imp", groups = "model",
dependent = "traitLevel")
res01 <- report(freq01, add = list(domain = "knowledge"))
# same example as before, now additionally using weights
freq02<- jk2.table(datL = subSam, ID="id", imp = "imp", groups = "model",
wgt = "wgt", dependent = "traitLevel")
res02 <- report(freq02, add = list(domain = "knowledge"))
# now additionally using replication methods (jk2)
freq03<- jk2.table(datL = subSam, ID="id", imp = "imp", groups = "model", type = "jk2",
wgt = "wgt", PSU = "jkzone", repInd = "jkrep", dependent = "traitLevel")
res03 <- report(freq03, add = list(domain = "knowledge"))
# additionally: sex differences in each country, using 'group.differences.by' argument
# Note: for frequency tables group differences may result in a chi square test or in
# a difference of each categories' frequency.
# first: request chi square test
freq04<- jk2.table(datL = subSam, ID="id", imp = "imp", groups = c("model", "sex"),
type = "jk2", group.differences.by = "sex", chiSquare = TRUE, wgt = "wgt",
PSU = "jkzone", repInd = "jkrep", dependent = "traitLevel")
res04 <- report(freq04, add = list(domain = "knowledge"))
# now request differences for each trait level category
freq05<- jk2.table(datL = subSam, ID="id", imp = "imp", groups = c("model", "sex"),
type = "jk2", group.differences.by = "sex", chiSquare = FALSE, wgt = "wgt",
PSU = "jkzone", repInd = "jkrep", dependent = "traitLevel")
res05 <- report(freq05, add = list(domain = "knowledge"))
# additionally: differ the sex-specific means in each country from the sex-specific means
# in the whole population? Are the differences (male vs. female) in each country different
# from the difference (male vs. female) in the whole population?
freq06<- jk2.table(datL = subSam, ID="id", imp = "imp", groups = c("model", "sex"),
type = "jk2", group.differences.by = "sex", cross.differences = TRUE, chiSquare = FALSE,
wgt = "wgt", PSU = "jkzone", repInd = "jkrep", dependent = "traitLevel")
res06 <- report(freq06, add = list(domain = "knowledge"))
# additionally: trend estimation for each country- and sex-specific mean, each country-
# specific sex differences and each difference between country-specific sex difference
# and the sex difference in the whole population
# create a new sub sample with both---the data of 2003 and 2013 ... only for domain
# 'knowledge'. Note: if no linking error is defined, linking error of 0 is assumed.
# (Due to unbalanced sample data, we switch to 'jk1' method for the remainder of 6c.)
subS2 <- dTrend[which(dTrend[,"dimension"] == "knowledge"),]
freq07<- jk2.table(datL = subS2, ID="id", imp = "imp", groups = c("model", "sex"),
type = "jk1", group.differences.by = "sex", cross.differences = TRUE, chiSquare = FALSE,
wgt = "wgt", PSU = "idclass", trend = "trend", linkErr = "trendErrorTraitLevel", dependent = "traitLevel")
res07 <- report(freq07, add = list(domain = "knowledge"))
# additionally: repeat this analysis for both domains, 'knowledge' and 'procedural',
# using a 'by'-loop. Now we use the whole 'dTrend' data instead of subsamples
freq08<- by ( data = dTrend, INDICES = as.character(dTrend[,"dimension"]),
FUN = function ( subdat ) {
f08 <- jk2.table(datL = subdat, ID="id", imp = "imp", groups = c("model", "sex"),
type = "jk1", group.differences.by = "sex", cross.differences = TRUE, chiSquare = FALSE,
wgt = "wgt", PSU = "idclass", trend = "trend",
linkErr = "trendErrorTraitLevel", dependent = "traitLevel")
return(f08)})
res08 <- lapply(names(freq08), FUN = function (domain) { report(freq08[[domain]], add = list(domain = domain))})
res08 <- do.call("rbind", res08)
################################################################################
### Example 6d: trend analyses (jk2.glm) ###
################################################################################
# Example 6c needs the objects created in example 6a. Additionally, the merged
# 'dTrend' frame created in Example 6a and augmented in 6b is necessary.
# load the 'eatRep' package ... note: needs eatRep version 0.9.2 or higher
library(eatRep)
# regress procedural compentence on knowledge competence ... it's necessary to
# reshape the data
datGlm<- reshape2::dcast(dTrend, value.var = "valueTransfBista",
formula = id+imp+wgt+jkzone+jkrep+idclass+model+trend+sex~dimension)
# first example: only for year 2003
dat03 <- datGlm[which(datGlm[,"trend"] == "T1"),]
m08 <- jk2.glm(datL = dat03, ID="id", imp="imp", wgt="wgt", PSU="jkzone",
repInd = "jkrep", type = "jk2", formula = procedural~knowledge)
res08 <- report(m08)
# compute regression with two regressors separately for each country
m09 <- jk2.glm(datL = dat03, ID="id", imp="imp", wgt="wgt", PSU="jkzone",
repInd = "jkrep", type = "jk2", groups = "model",
formula = procedural~sex+knowledge)
res09 <- report(m09)
# differ country-specific regression coefficients from the regression coefficents
# in the whole population?
m10 <- jk2.glm(datL = dat03, ID="id", imp="imp", wgt="wgt", PSU="jkzone",
repInd = "jkrep", type = "jk2", groups = "model", group.splits = 0:1,
cross.differences = TRUE, formula = procedural~sex+knowledge)
res10 <- report(m10)
# differ country-specific regression coefficients from the regression coefficents
# in the whole population? Are these differences different for 2003 vs. 2013?
m11 <- jk2.glm(datL = datGlm, ID="id", imp="imp", wgt="wgt", PSU="jkzone",
repInd = "jkrep", type = "jk2", groups = "model", group.splits = 0:1,
cross.differences = TRUE, trend = "trend", formula = procedural~sex+knowledge)
res11 <- report(m11, trendDiffs = TRUE)
################################################################################
### Example 7: Linking and equating for multiple models (III) ###
################################################################################
# For the purpose of illustration, this example repeats analyses of example 6,
# using a 2pl estimation now. Example 7 demonstrates routines without trend
# estimation.
# Preparation: assume time of measurement 't1' corresponds to the year 2003.
datT1<- reshape2::dcast(subset ( sciences, year == 2003),
formula = id+grade+sex+country~variable, value.var="value")
# First step: item calibration in separate unidimensional models for each domain
# split 2 models. Note: not all items of the Q matrix are present in the data.
# Items which occur only in the Q matrix will be ignored.
modsT1<- splitModels ( qMatrix = qMat, nCores = 1)
# lets specify a 2pl model with constraints: a common discrimination for all
# knowledge items, and a common discrimination for procedural items
slopes<- data.frame ( variable = qMat[,"variable"],
slope = as.numeric(as.factor(substr(as.character(qMat[,"variable"]),4,6))))
# prepare 2pl model
defT1 <- defineModel(dat = datT1, id = "id", check.for.linking = TRUE,
splittedModels = modsT1, irtmodel = "2PL.groups", est.slopegroups = slopes,
software = "tam")
# run 2 models
runT1 <- runModel(defT1)
# get the results
resT1 <- getResults(runT1)
# extract item parameters from the 'results' object
itemT1<- itemFromRes(resT1)
# Second step: drawing plausible values. Two-dimensional model is specified for
# each person group with fixed item parameters. Moreover, a latent regression
# model is used (in the actual 'Laendervergleich', regressors are principal
# components).
# create arbitrary principal components
for ( i in c("PC1", "PC2", "PC3") ) {
datT1[,i] <- rnorm( n = nrow(datT1), mean = 0, sd = 1.2)
}
# number of extracted principal components vary: three components for Berlin,
# two for Bavaria. Hence, Bavaria has no valid values on 'PC3'.
datT1[which(datT1[,"country"] == "Bavaria"),"PC3"] <- NA
# define person grouping
pers <- data.frame ( idstud = datT1[,"id"] , country = datT1[,"country"])
# Running second step: split models according to person groups
# ('all.persons' must be FALSE, otherwise the whole group would be treated as
# a separate distinct group.)
modT1P<- splitModels ( person.groups = pers , all.persons = FALSE, nCores = 1)
# define the 2 country-specific 2-dimensional models, specifying latent regression
# model and fixed item and fixed slope parameters.
defT1P<- defineModel(dat = datT1, items = itemT1[,"item"], id = "id", irtmodel = "2PL",
check.for.linking = TRUE, splittedModels = modT1P, qMatrix = qMat,
anchor = itemT1[,c("item", "est")], fixSlopeMat = itemT1[,c("item", "estSlope")],
HG.var = c("PC1", "PC2", "PC3"), software = "tam")
# run the 2 models
runT1P<- runModel(defT1P)
# get the results
resT1P<- getResults(runT1P, Q3 = FALSE)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.