gendata_hetop: Generate count data from Heteroskedastic Ordered Probit...

gendata_hetop    R Documentation

Generate count data from Heteroskedastic Ordered Probit (HETOP) Model

Description

Generates count data for G groups and K ordinal categories under a heteroskedastic ordered probit model, given the total number of units in each group and parameters determining the category probabilities for each group.

Usage

gendata_hetop(G, K, ng, mug, sigmag, cutpoints)

Arguments

G

Number of groups.

K

Number of ordinal categories.

ng

Vector of length G providing the total number of units in each group.

mug

Vector of length G giving the latent variable mean for each group.

sigmag

Vector of length G giving the latent variable standard deviation for each group.

cutpoints

Vector of length (K-1) giving cutpoint locations, held constant across groups, that map the continuous latent variable to the observed categorical variable.
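The arguments above determine the category probabilities for each group. As a restatement in symbols (the notation Y for the observed category and c_1 < ... < c_{K-1} for the entries of cutpoints is introduced here for illustration and is not part of the function's interface; the cutpoints are assumed to be increasing, with the conventions c_0 = -infinity and c_K = +infinity):

```latex
\[
P(Y = k \mid g)
  = \Phi\!\left(\frac{c_k - \mu_g}{\sigma_g}\right)
  - \Phi\!\left(\frac{c_{k-1} - \mu_g}{\sigma_g}\right),
  \qquad k = 1, \dots, K,
\]
```

where \mu_g and \sigma_g correspond to mug[g] and sigmag[g], and \Phi is the standard normal CDF. For each group the K probabilities sum to one.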

Details

For each group g, the function generates ng[g] IID normal random variables with mean mug[g] and standard deviation sigmag[g], and then assigns each to one of K ordered categories according to the interval between cutpoints into which it falls. The resulting data for each group are a vector of K category counts summing to ng[g].
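The mechanism described above can be sketched in base R. This is a minimal illustration, not the package's source: the function name sketch_gendata_hetop is hypothetical, and the sketch assumes that cutpoints is sorted in increasing order and that ties at a cutpoint (which have probability zero) may go to either side.

```r
## Hypothetical re-implementation for illustration only; not the HETOP source.
sketch_gendata_hetop <- function(G, K, ng, mug, sigmag, cutpoints) {
    stopifnot(length(ng) == G, length(mug) == G, length(sigmag) == G,
              length(cutpoints) == K - 1, !is.unsorted(cutpoints))
    rows <- lapply(seq_len(G), function(g) {
        ## latent normal draws for group g
        z <- rnorm(ng[g], mean = mug[g], sd = sigmag[g])
        ## findInterval() counts the cutpoints <= z (0 to K-1); add 1 for category 1..K
        category <- findInterval(z, cutpoints) + 1L
        ## count units per category, keeping empty categories as zeros
        tabulate(category, nbins = K)
    })
    ## stack the per-group count vectors into a G x K matrix
    do.call(rbind, rows)
}

set.seed(42)
sketch_counts <- sketch_gendata_hetop(G = 3, K = 4, ng = c(50, 100, 200),
                                      mug = c(-1, 0, 1), sigmag = c(1, 1.5, 0.8),
                                      cutpoints = c(-1.0, 0.0, 0.8))
print(sketch_counts)
print(rowSums(sketch_counts))  # 50 100 200, i.e. equal to ng by construction
```

Using tabulate() with nbins = K rather than table() keeps a column for every category even when no unit falls into it, so the result always has K columns.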

Value

A G x K matrix where column k of row g provides the number of simulated units from group g falling into category k.
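To make the layout of the returned matrix concrete, the following sketch builds a matrix of the same shape without calling the package. It relies on the fact that counting IID categorized draws yields multinomial counts, so each row can be drawn with rmultinom() from the group's cell probabilities. This is an equivalent-in-distribution shortcut assumed here for illustration; it is not claimed to be how gendata_hetop is implemented, and the object name multinom_counts is hypothetical.

```r
## Illustrative only: draw a G x K count matrix from the implied multinomials.
set.seed(7)
ng        <- c(50, 100, 200)
mug       <- c(-1, 0, 1)
sigmag    <- c(1, 1.5, 0.8)
cutpoints <- c(-1.0, 0.0, 0.8)
G <- length(ng)
K <- length(cutpoints) + 1

multinom_counts <- do.call(rbind, lapply(seq_len(G), function(g) {
    ## cumulative probabilities at the cutpoints, then 1 for the top category
    cum <- c(pnorm(cutpoints, mug[g], sigmag[g]), 1)
    ## successive differences give the K cell probabilities for group g
    p <- c(cum[1], diff(cum))
    ## one multinomial draw of ng[g] units across the K categories
    rmultinom(1, size = ng[g], prob = p)[, 1]
}))

print(dim(multinom_counts))      # 3 4: one row per group, one column per category
print(rowSums(multinom_counts))  # 50 100 200: each row sums to ng[g]
```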

Author(s)

J.R. Lockwood <jrlockwood@ets.org>

References

Reardon S., Shear B.R., Castellano K.E. and Ho A.D. (2017). “Using heteroskedastic ordered probit models to recover moments of continuous test score distributions from coarsened data,” Journal of Educational and Behavioral Statistics 42(1):3–45.

Lockwood J.R., Castellano K.E. and Shear B.R. (2018). “Flexible Bayesian models for inferences from coarsened, group-level achievement data,” Journal of Educational and Behavioral Statistics 43(6):663–692.

Examples

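library(HETOP)
set.seed(1001)

## define true parameters
G         <- 10
K         <- 4
mug       <- seq(from = -2.0, to = 2.0, length.out = G)
sigmag    <- seq(from =  2.0, to = 0.8, length.out = G)
cutpoints <- c(-1.0, 0.0, 0.8)   # K - 1 = 3 cutpoints

## generate data with large counts, so that the empirical proportions
## should be close to the true cell probabilities
ng   <- rep(100000, G)
ngk  <- gendata_hetop(G, K, ng, mug, sigmag, cutpoints)
print(ngk)

## compare theoretical and empirical cell probabilities
phat  <- ngk / ng   # divides row g by ng[g]
ptrue <- t(sapply(seq_len(G), function(g) {
    ## cumulative probabilities at the cutpoints, then 1 for the top category
    tmp <- c(pnorm(cutpoints, mug[g], sigmag[g]), 1)
    ## successive differences give the K cell probabilities
    c(tmp[1], diff(tmp))
}))
## maximum absolute discrepancy across all G x K cells
print(max(abs(phat - ptrue)))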

jrlockwood/HETOP documentation built on April 9, 2022, 4 a.m.