mml: Marginal Maximum Likelihood Estimation of Linear Models

View source: R/de.R

Description

Implements survey-weighted marginal maximum likelihood estimation, a type of regression in which the outcome is a latent trait (such as student ability). Instead of using a point estimate of ability, the likelihood function marginalizes over student ability. Includes a variety of variance estimation strategies.

Usage

mml(formula, stuItems, stuDat, paramTab, Q = 30, polyModel = c("GPCM",
  "GRM"), regType = c("regression", "popMean"), weightvar = NULL,
  control = list(), idVar = c(), missingCode = 8,
  missingValue = "c", multiCore = FALSE, bobyqaControl = list())

Arguments

formula

a formula object in the style of lm

stuItems

a list where each element is named a student ID and contains a data.frame; see Details for the format

stuDat

a data.frame with a single row per student. Predictors in the formula must be in stuDat.

paramTab

a data.frame with columns shown in Details

Q

the number of integration points

polyModel

polytomous response model; one of GPCM for the Generalized Partial Credit Model or GRM for the Graded Response Model

regType

one of regression or popMean where the latter estimates a population level mean

weightvar

a variable name on stuDat that is the full sample weight

control

a list with four elements that control the fitting process. See Details.

idVar

a variable name on stuDat that is the identifier. Every ID from stuDat must appear on stuItems and vice versa.

missingCode

the score value that indicates a missing item response. Scores equal to missingCode are recoded to missingValue; an item scored as NA is ignored. This argument applies exclusively to binomial items.

missingValue

the value assigned to items scored as missingCode. When set to a number, that value is used for all items. When set to "c", the guessing parameter is used.

multiCore

a logical; when TRUE, the foreach package is used. You should have already called registerDoParallel before calling mml.

bobyqaControl

a list that gets passed to bobyqa
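
As a rough illustration of how missingCode and missingValue are described as interacting for binomial items, the sketch below recodes scores by hand. The helper recodeMissing and the vector guess are hypothetical names introduced only for this example; mml does its own recoding internally, and its actual implementation may differ.

```r
# Hypothetical helper: replace scores equal to missingCode with missingValue.
# When missingValue is "c", each item's guessing parameter is used instead.
recodeMissing <- function(score, guess, missingCode = 8, missingValue = "c") {
  # NA scores are never treated as missingCode; they stay NA
  isMissing <- !is.na(score) & score == missingCode
  if (identical(missingValue, "c")) {
    replacement <- guess
  } else {
    replacement <- rep(as.numeric(missingValue), length(score))
  }
  ifelse(isMissing, replacement, score)
}

score <- c(1, 0, 8, NA, 8)                 # 8 marks an omitted item
guess <- c(0.15, 0.17, 0.22, 0.08, 0.06)   # guessing parameter per item

withGuess <- recodeMissing(score, guess)                    # omitted -> guessing value
withZero  <- recodeMissing(score, guess, missingValue = 0)  # omitted -> incorrect
print(withGuess)
print(withZero)
```

In this sketch the NA score is left alone, matching the statement that an item scored as NA is ignored.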

Details

The mml function models a latent outcome conditioning on student item response data, student covariate data, and item parameter information; these three parts are passed in three separate arguments. Student item response data go into stuItems, whereas student covariates, weights, and sampling information go into stuDat. The paramTab contains item parameter information for each item, which is the result of a separate item parameter scaling. In the case of the National Assessment of Educational Progress (NAEP), the item parameters can be found online, for example, at https://nces.ed.gov/nationsreportcard/tdw/analysis/scaling_irt.aspx. The model for dichotomous response data is by default the three-parameter logistic (3PL) model, unless the item parameter information provided by the user suggests otherwise. For example, if the scaling used a two-parameter logistic (2PL) model, then the guessing parameter can simply be set to zero. For polytomous response data, the model is dictated by the polyModel argument.

Student data are broken up into two parts. The item response data go into stuItems, and the student covariates for the formula go into stuDat. Information about items, such as item difficulties, is in paramTab. All dichotomous items are assumed to be 3PL, though by setting the guessing parameter to zero, the user can use a 2PL model, or a one-parameter logistic (1PL) or Rasch model. The model for polytomous response data is dictated by the polyModel argument.

The marginal maximum likelihood then integrates, over student ability, the product of two terms: the likelihood of the student's item responses given ability (from the assessment data) and the density of ability implied by the linear model, which predicts each student's ability from the formula provided plus a residual standard error term. This integration runs from the minimum node to the maximum node in the control argument (described later in this section) using Q quadrature points.
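
The integration described above can be sketched for a single student with 3PL items. This is a minimal illustration, not the package's implementation: the function names p3pl and marginalLik are hypothetical, the 3PL form shown is the common parameterization (the vignette is the authority on the exact form mml uses), and the equally spaced rectangle rule is an assumption about how the nodes are weighted.

```r
# Common 3PL form: probability of a correct response at ability theta.
# D is the scale constant (1.7 by default in the control argument).
p3pl <- function(theta, a, d, g, D = 1.7) {
  g + (1 - g) / (1 + exp(-D * a * (theta - d)))
}

# Approximate one student's marginal likelihood on an equally spaced grid.
# score: 0/1 item scores; a, d, g: item parameters;
# mu: this student's linear predictor; sigma: residual standard deviation.
marginalLik <- function(score, a, d, g, mu, sigma,
                        Q = 30, min.node = -4, max.node = 4) {
  nodes <- seq(min.node, max.node, length.out = Q)
  width <- nodes[2] - nodes[1]
  # likelihood of the observed responses at every node
  itemLik <- sapply(nodes, function(theta) {
    p <- p3pl(theta, a, d, g)
    prod(p^score * (1 - p)^(1 - score))
  })
  # weight by the ability density implied by the regression, then sum
  sum(itemLik * dnorm(nodes, mean = mu, sd = sigma) * width)
}

a <- c(0.68, 1.22, 1.05)
d <- c(-0.33, 1.81, 1)
g <- c(0.15, 0.17, 0.22)
lik <- marginalLik(score = c(1, 0, 1), a, d, g, mu = 0, sigma = 1)
print(lik)
```

The full estimator would sum the weighted log of this quantity over students and maximize it with respect to the regression coefficients and the residual standard error.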

The stuItems argument has the scored student data. It is a list where each element is named with a student ID and contains a data.frame with at least two columns. The first required column is named key and shows the item name as it appears in paramTab; the second column is named score and shows the score for that item. For binomial items, the score is 0 or 1. For GPCM items, the scores start at zero as well. For GRM, the scores start at 1.
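
A small hand-built stuItems object may make this format concrete. The student IDs and scores below are made up for illustration; only the key and score column names and the list-named-by-ID layout come from the description above.

```r
# long table of scored responses: one row per student-item pair
long <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  key   = rep(c("m066401", "m093701", "m086001"), times = 2),
  score = c(1, 0, 1, 0, NA, 8),   # NA is ignored; 8 is the default missingCode
  stringsAsFactors = FALSE
)

# one data.frame per student; the list names are the student IDs
stuItems <- split(long[, c("key", "score")], long$id)

print(names(stuItems))
print(stuItems[["2"]])
```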

The paramTab argument is a data.frame with a column named ItemID that agrees with the key column in the stuItems argument, and, for a 3PL item, columns P0, P1, and P2 for the “a”, “d”, and “g” parameters, respectively; see the vignette for details of the 3PL model. For a GPCM model, P0 is the “a” parameter, and the other columns are the “d” parameters; see the vignette for details of the GPCM model.
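
To show how the P0, P1, and P2 columns are read for 3PL items, the sketch below evaluates the probability of a correct response at a single ability value. It assumes the common 3PL form with scale constant D; the vignette remains the authority on the exact model, and the variable names theta and pCorrect are introduced only for this example.

```r
# a reduced paramTab with only the columns needed for 3PL items
paramTab <- data.frame(
  ItemID = c("m066401", "m093701", "m086001"),
  P0 = c(0.68, 1.22, 1.05),    # "a": discrimination
  P1 = c(-0.33, 1.81, 1.00),   # "d": difficulty
  P2 = c(0.15, 0.17, 0.22),    # "g": guessing
  stringsAsFactors = FALSE
)

D <- 1.7      # scale constant
theta <- 0    # ability value at which to evaluate the items

# assumed 3PL form: g + (1 - g) / (1 + exp(-D * a * (theta - d)))
pCorrect <- with(paramTab, P2 + (1 - P2) / (1 + exp(-D * P0 * (theta - P1))))
names(pCorrect) <- paramTab$ItemID
print(round(pCorrect, 3))
```

Under this form, setting P2 to zero leaves a 2PL curve, which is the sense in which a 2PL item can be passed as a 3PL item.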

The control argument is a list with the following optional elements: D, the scale parameter, which defaults to 1.7; startVal, the starting values for the coefficients; and min.node and max.node, which set the range of nodes for all students and default to -4 and 4, respectively. The quadrature points then span the range from min.node to max.node with a total of Q nodes.
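
A control list and the resulting grid might look like the following. The equal spacing produced by seq is an assumption; the text above states only the range and the number of nodes.

```r
# optional settings; elements left out fall back to their defaults
control <- list(D = 1.7, min.node = -4, max.node = 4)
Q <- 30

# one way to lay out Q nodes over the range (equal spacing is assumed)
nodes <- seq(control$min.node, control$max.node, length.out = Q)
print(length(nodes))
print(range(nodes))
```

A list like this would be passed as the control argument of mml, with Q passed separately.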

Value

object of class mml.means. This is a list with elements:

call

the call used to generate this mml.means object

coefficients

the marginal maximum likelihood regression coefficients, including the estimated residual standard error

LogLik

the log-likelihood of the fit model

X

the design matrix of the marginal maximum likelihood regression

Convergence

a convergence note from the bobyqa optimizer

location

used for scaling the estimates

scale

used for scaling the estimates

lnlf

the likelihood function

rr1

the density function of each individual, conditional only on item responses in stuItems

stuDat

the stuDat argument

weightvar

the weight variable

nodes

the nodes the likelihood was evaluated on

iterations

the number of iterations required to reach convergence

obs

the number of observations used

Author(s)

Harold Doran, Paul Bailey, Claire Kelley, and Sun-joo Lee

Examples

## Not run: 
# get NAEP Primer data
require(EdSurvey)

# data
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))
cols <- c("m066401", "m093701", "m086001", "m051901", "m067801", "m046501",
          "origwt", "repgrp1", "jkunit", "dsex")
data <- getData(sdf, varnames=cols, addAttributes=TRUE,
                omittedLevels=FALSE, defaultConditions=FALSE,
                returnJKreplicates=FALSE)

# 3PL items only:
# P0 is the discrimination parameter (a),
# P1 is the item difficulty (d),
# P2 is the guessing parameter (g) 
# polytomous responses could use P3-P10 for more difficulties
paramTab <- structure(list(ItemID = c("m066401", "m093701", "m086001",
                                      "m051901", "m067801", "m046501"),
                           P0 = c(0.68, 1.22, 1.05, 1.6, 0.86, 1.03),
                           P1 = c(-0.33, 1.81, 1, 0.61, -1.61, -0.14),
                           P2 = c(0.15, 0.17, 0.22, 0.08, 0.06, 0.37),
                           P3 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
                           P4 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
                           P5 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
                           P6 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
                           P7 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
                           P8 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
                           P9 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
                           P10 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
                           ScorePoints = c(1, 1, 1, 1, 1, 1),
                           MODEL = c("3pl", "3pl", "3pl", "3pl", "3pl", "3pl")),
                      row.names = c(1L, 3L, 4L, 5L, 9L, 13L),
                      class = "data.frame", location = 277.1563, scale = 37.7297)
# scores an item as correct if it contains an asterisk and as skipped if it
# is "Omitted", "Not Reached", or "Multiple". The value NA is left as NA.
# this score function is intended to be simple, not to reflect typical NAEP scoring.
simpleScore <- function(col) {
  score0 <- 0+grepl("*", col, fixed=TRUE)
  score1 <- ifelse(col %in% c("Omitted", "Not Reached", "Multiple"), 8, score0)
  score2 <- ifelse(col %in% NA, NA, score1)
  return(score2)
}

# score each item in paramTab
for(name in paramTab$ItemID){
  # show score output vs input data
  print(table(sdf[,name], simpleScore(sdf[,name]), useNA="ifany"))
  # score item
  data[,name] <- simpleScore(data[,name])  
}

# make stuItems 
data$id <- 1:nrow(data)
# first make a long data.frame of the item score data
stuItems <- reshape(data=data, varying=c(paramTab$ItemID), idvar=c("id"),
                    direction="long", v.names="score", times=paramTab$ItemID,
                    timevar="key")[,c("id", "key", "score")]
# then break it up into a single data.frame per student
# (split on the id column itself; the string "id" would yield a single group)
stuItems <- split(stuItems, stuItems$id)

# stuDat holds the student covariates, weights, and the sampling information
# used for variance estimation
stuDat <- data[, c('origwt', 'repgrp1', 'jkunit', 'dsex', 'id')]

### MML call 
mml1 <- mml(~dsex, stuItems=stuItems, 
            stuDat=stuDat, paramTab=paramTab, 
            regType = 'regression', Q=34, idVar="id", weightvar = "origwt")

# summary, assumes the sample was drawn IID
summary(mml1)
# summary, accounts for correlation between students in the same schools
summary(mml1, varType="Taylor", stratavar="repgrp1", psuvar="jkunit")

## End(Not run)

American-Institutes-for-Research/DE documentation built on Dec. 29, 2019, 12:22 a.m.