deconv_npmle | R Documentation |
Implements a version of the Rabe-Hesketh et al. (2003) algorithm for computing the nonparametric MLE of a univariate latent variable distribution from observed error-prone measures. Allows for normal heteroskedastic measurement error with variance that depends on the latent variable, such as with estimates of latent ability from item response theory models.
deconv_npmle(W, csem, gridspec = list(fixed=FALSE, xmin=-5, xmax=5, numpoints=2000), lambda = 0.0005, lltol = 1e-7, psmall = 0.00005, discrete = FALSE, quietly = FALSE)
W |
Vector of observed measures, where |
csem |
A function of a single variable |
gridspec |
A named list specifying the grid over which the NPMLE will be
computed. It must have a logical component |
lambda |
Step size, on probability scale, in Rabe-Hesketh et al. (2003) algorithm. See reference for details. |
lltol |
Algorithm stops when the improvement to the log likelihood does not
exceed |
psmall |
If a mass point in the estimated latent distribution evolves to have
probability less than |
discrete |
Not currently implemented. |
quietly |
If |
The assumed model is W = X + U
where the conditional
distribution of U
given X = x
is assumed to be normal
with mean zero and standard deviation csem(x)
. The function
uses W
to estimate a discrete latent distribution for X
that maximizes the likelihood of the observed data. The function
optimizes the mass points (among a grid of candidate values) and the
associated probabilities.
In the special case of homoskedastic error, the function csem
must be defined such that when passed a vector of length n
, it
returns a vector of length n
where each element is a common
constant.
The default values of xmin
and xmin
in gridspec
are generally appropriate only for a latent variable on a standardized
scale with mean zero and variance one, and should be set to
appropriate values given the scale of W
.
A list with elements
gridspec: Information about the initial grid
.history: Iteration-by-iteration evolution of the estimated
distribution, if gridspec$fixed
is FALSE. Otherwise it is
an empty list
px: A dataframe providing the final NPMLE distribution. There are as many rows as there are mass points in the estimated distribution; fields described below
reliability: An estimate of the reliability of W
, equal
to the estimated variance of X
divided by the sample variance
of W
simex_varfuncs: A dataframe with as many rows as there are
unique values of W
, providing estimated plug-in variance
functions to use for SIMEX data generation with latent
heteroskedastic error as described in Lockwood and McCaffrey
(forthcoming); see references. Fields described below
The fields of px
are:
x: Location of mass point
csem: Value of function csem
at mass point
p: probability at mass point
ll: log likelihood at solution
ex: Estimate of mean of latent distribution
varx: Estimate of variance of latent distribution
The fields of simex_varfuncs
are:
W: Unique observed values w
of W
gW: The square of csem
evaluated at W = w
gEXW: The square of csem
evaluated at E[X | W=w]
, the
conditional mean of X
given W=w
EgXW: The conditional mean of the square of csem
of
X
given W=w
, equal to E[g(X) | W=w]
J.R. Lockwood jrlockwood@ets.org
Lockwood J.R. and McCaffrey D.F. (2014). “Correcting for test score measurement error in ANCOVA models for estimating treatment effects,” Journal of Educational and Behavioral Statistics 39(1):22-52.
Lockwood J.R. and McCaffrey D.F. (2017). “Simulation-extrapolation with latent heteroskedastic variance,” Psychometrika 82(3):717-736.
Rabe-Hesketh S., Pickles A. and Skrondal A. (2003). “Correcting for covariate measurement error in logistic regression using nonparametric maximum likelihood estimation,” Statistical Modelling 3:215-232.
testscores
,eivreg
data(testscores) ## get the unique values of the lag 1 math score and CSEM ## values and approximate the CSEM function using approxfun() tmp <- unique(testscores[,c("math_lag1","math_lag1_csem")]) print(tmp <- tmp[order(tmp$math_lag1),]) .csem <- approxfun(tmp$math_lag1, tmp$math_lag1_csem, rule=2:2) plot(tmp$math_lag1, tmp$math_lag1_csem) lines(tmp$math_lag1, .csem(tmp$math_lag1), col="blue") ## get NPMLE distribution of latent lag 1 math achievement m <- deconv_npmle(W = testscores$math_lag1, csem = .csem, gridspec = list(fixed = FALSE, xmin = min(testscores$math_lag1), xmax = max(testscores$math_lag1), numpoints = 10000), quietly = TRUE) print(m$px) ## estimated mean is approximately the mean of W, but ## the estimated variance is less than the variance of W, ## as it should be print(c(empirical = mean(testscores$math_lag1), estimated = m$px$ex[1])) print(c(empirical = var(testscores$math_lag1), estimated = m$px$varx[1])) ## estimated reliability of W: print(m$reliability) ## if implementing SIMEX, simex_varfuncs provides plug-in ## options to use for the heteroskedastic error variance ## of each observed W print(m$simex_varfuncs) ## simple "value-added" estimates of school effects on math, ## adjusting for measurement error in the lag 1 math score. testscores$schoolid <- factor(testscores$schoolid) meiv <- eivreg(math ~ math_lag1 + sped + frl + schoolid, data = testscores, reliability = c(math_lag1 = m$reliability), contrasts = list(schoolid = "contr.sum")) print(summary(meiv)) ## alternative deconvolution with fixed grid m <- deconv_npmle(W = testscores$math_lag1, csem = .csem, gridspec = list(fixed = TRUE, xmin = min(testscores$math_lag1), xmax = max(testscores$math_lag1), numpoints = 40), quietly = TRUE) print(m$px)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.