Model the Association Among Two or Three MRCVs

Description

The genloglin function uses a generalized loglinear modeling approach to estimate the association among two or three MRCVs. Standard errors are adjusted using a second-order Rao-Scott approach.

Usage

genloglin(data, I, J, K = NULL, model, add.constant = 0.5, boot = TRUE, 
    B = 1999, B.max = B, print.status = TRUE)

Arguments

data

A data frame containing the raw data where rows correspond to the individual item response vectors, and columns correspond to the binary items, W1, ..., WI, Y1, ..., YJ, and Z1, ..., ZK (in this order).

I

The number of items corresponding to row variable W.

J

The number of items corresponding to column variable Y.

K

The number of items corresponding to strata variable Z.

model

For the two MRCV case, a character string specifying one of the following models: "spmi" (the complete independence model), "homogeneous" (the homogeneous association model), "w.main" (the w-main effects model), "y.main" (the y-main effects model), "wy.main" (the w-main and y-main effects model), or "saturated". Alternatively, a user-supplied formula can be specified, where the formula is limited to the generic variables W, Y, wi, yj, count, W1,..., WI, and Y1,..., YJ. For the three MRCV case, only user-supplied formulas are accepted. In addition to the generic variables defined for two MRCVs, the formula may include the generic variables Z, zk, and Z1,..., ZK.

add.constant

A positive constant to be added to all zero marginal cell counts.

boot

A logical value indicating whether bootstrap resamples should be taken.

B

The desired number of bootstrap resamples.

B.max

The maximum number of bootstrap resamples. Resamples for which at least one item has all positive or all negative responses are discarded; genloglin uses the first B valid resamples, or all valid resamples if fewer than B are valid.

print.status

A logical value indicating whether progress updates should be provided. When print.status = TRUE, the status of the IPF algorithm is printed after every 5 iterations. Upon completion of the IPF algorithm, a progress bar appears that documents progress of the bootstrap.
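The interplay between B and B.max can be sketched in base R. This is a minimal illustration rather than the package's internal code: the data, the helper is.valid, and the variable names are hypothetical, and it assumes that B.max resamples are drawn up front and the first B valid ones are kept.

```r
set.seed(42)

# Hypothetical data: 30 respondents, 5 binary items; the last item is
# rarely selected, so some resamples will contain only 0s for it
n <- 30
dat <- data.frame(matrix(rbinom(n * 4, 1, 0.5), nrow = n),
                  rare = c(1, 1, rep(0, n - 2)))

B <- 20      # desired number of valid resamples
B.max <- 50  # number of resamples drawn in total (assumption, see above)

# A resample is valid only if every item has both 0 and 1 responses
is.valid <- function(d) {
  all(vapply(d, function(x) length(unique(x)) > 1, logical(1)))
}

# Draw B.max resamples of the rows, with replacement
resamples <- lapply(seq_len(B.max), function(b) {
  dat[sample(n, replace = TRUE), , drop = FALSE]
})
valid <- vapply(resamples, is.valid, logical(1))

# Keep the first B valid resamples, or all valid ones if fewer than B
keep <- head(which(valid), B)
B.use <- length(keep)
n.invalid <- sum(!valid)
```

Setting B.max larger than B leaves headroom for discarded resamples, which matters most when some items are selected by very few (or nearly all) respondents.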

Details

The genloglin function first converts the raw data into a form that can be used for estimation. For the two MRCV case, the reformatted data frame contains 2Ix2J rows and 5 columns generically named W, Y, wi, yj, and count. For the three MRCV case, the reformatted data frame contains 2Ix2Jx2K rows and 7 columns generically named W, Y, Z, wi, yj, zk, and count. The model of interest is then estimated by calling the glm function with the family argument specified as poisson. For all predictor variables, the first level is the reference group (i.e., 1 is the reference for variables W, Y, and Z, and 0 is the reference for variables wi, yj, and zk).
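The reformatting and estimation steps for the two MRCV case can be sketched in base R. This is a rough illustration under stated assumptions, not the genloglin source: the simulated data and the object names dat and cells are hypothetical, and the formula is assumed to correspond to the "spmi" model (it is the first three terms of the user-supplied formulas shown in the Examples).

```r
set.seed(1)

# Hypothetical raw data: n respondents, I items for W and J items for Y,
# with columns ordered W1, ..., WI, Y1, ..., YJ
n <- 200; I <- 2; J <- 3
dat <- as.data.frame(matrix(rbinom(n * (I + J), 1, 0.4), nrow = n))
names(dat) <- c(paste0("W", 1:I), paste0("Y", 1:J))

# One row per (W item, Y item, wi response, yj response): 2I x 2J rows
cells <- expand.grid(yj = 0:1, wi = 0:1,
                     Y = paste0("Y", 1:J), W = paste0("W", 1:I),
                     stringsAsFactors = FALSE)
cells$count <- mapply(
  function(W, Y, wi, yj) sum(dat[[W]] == wi & dat[[Y]] == yj),
  cells$W, cells$Y, cells$wi, cells$yj, USE.NAMES = FALSE
)

# Mirror add.constant: replace zero counts by a small positive constant
add.constant <- 0.5
cells$count[cells$count == 0] <- add.constant

# Code predictors as factors so the first level acts as the reference
vars <- c("W", "Y", "wi", "yj")
cells[vars] <- lapply(cells[vars], factor)
cells <- cells[, c(vars, "count")]

# Poisson loglinear model with independence inside each Wi x Yj table
fit <- glm(count ~ -1 + W:Y + wi %in% W:Y + yj %in% W:Y,
           family = poisson, data = cells)
df.residual(fit)
```

Each of the IxJ two-by-two tables contributes four cells and three parameters here, so the residual degrees of freedom equal IxJ. The standard errors from this plain glm fit are not adjusted for the dependence between items answered by the same respondent.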

The boot argument must equal TRUE in order to obtain bootstrap results with the genloglin method functions.

Value

genloglin returns an object of class 'genloglin'. The object is a list containing at least the following objects: original.arg, mod.fit, sum.fit, and rs.results.

original.arg is a list containing the following objects:

  • data: The original data frame supplied to the data argument.

  • I: The original value supplied to the I argument.

  • J: The original value supplied to the J argument.

  • K: The original value supplied to the K argument.

  • nvars: The number of MRCVs.

  • model: The original value supplied to the model argument.

  • add.constant: The original value supplied to the add.constant argument.

  • boot: The original value supplied to the boot argument.

mod.fit is a list containing the same objects returned by glm with a few modifications as described in summary.genloglin.

sum.fit is a list containing the same objects returned by the summary method for class "glm" with a few modifications as described in summary.genloglin.

rs.results is a list containing the following objects (see Appendix A of Bilder and Loughin, 2007, for more detail):

  • cov.mu: The covariance matrix for the estimated cell counts.

  • E: The covariance matrix for the residuals.

  • gamma: Eigenvalues used in computing second-order Rao-Scott adjusted statistics.
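To show how eigenvalues such as gamma typically enter a second-order Rao-Scott adjustment, the sketch below applies the standard Satterthwaite-type correction to a test statistic. The function name rs2.adjust and the input values are hypothetical, and the exact computation used inside the package may differ; see Bilder and Loughin (2007) for the authoritative definition.

```r
# Second-order (Satterthwaite-type) Rao-Scott adjustment, assuming the
# statistic is asymptotically a weighted sum of chi-square(1) variables
# with weights given by the eigenvalues in gamma
rs2.adjust <- function(stat, gamma) {
  stat.adj <- stat * sum(gamma) / sum(gamma^2)  # rescaled statistic
  df.adj <- sum(gamma)^2 / sum(gamma^2)         # adjusted degrees of freedom
  c(statistic = stat.adj,
    df = df.adj,
    p.value = pchisq(stat.adj, df = df.adj, lower.tail = FALSE))
}

# Hypothetical eigenvalues and statistic, for illustration only
gamma.example <- c(2.1, 1.4, 0.9, 0.6)
rs2.adjust(stat = 12.3, gamma = gamma.example)
```

When all eigenvalues equal 1, the adjustment leaves the statistic unchanged and the degrees of freedom equal the number of eigenvalues, which recovers the usual chi-square reference distribution.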

For boot = TRUE, the primary list additionally includes boot.results, a list containing the following objects:

  • B.use: The number of bootstrap resamples used.

  • B.discard: The number of bootstrap resamples discarded due to having at least one item with all positive or all negative responses.

  • model.data.star: For the two MRCV case, a numeric matrix containing 2Ix2J rows and B.use+4 columns, where the first 4 columns correspond to the model variables W, Y, wi, and yj, and the last B.use columns correspond to the observed counts for each resample. For the three MRCV case, a numeric matrix containing 2Ix2Jx2K rows and B.use+6 columns, where the first 6 columns correspond to the model variables W, Y, Z, wi, yj, and zk, and the last B.use columns correspond to the observed counts for each resample.

  • mod.fit.star: For the two MRCV case, a numeric matrix containing B.use rows and 2Ix2J +1 columns, where the first 2Ix2J columns correspond to the model-predicted counts for each resample, and the last column corresponds to the residual deviance for each resample. For the three MRCV case, a numeric matrix containing B.use rows and 2Ix2Jx2K+1 columns, where the first 2Ix2Jx2K columns correspond to the model-predicted counts for each resample, and the last column corresponds to the residual deviance for each resample.

  • chisq.star: A numeric vector of length B.use containing the Pearson statistics (comparing model to the saturated model) calculated for each resample.

  • lrt.star: A numeric vector of length B.use containing the LRT statistics calculated for each resample.

  • residual.star: A numeric matrix with 2Ix2J rows (or 2Ix2Jx2K rows for the three MRCV case) and B.use columns containing the residuals calculated for each resample.

References

Bilder, C. and Loughin, T. (2007) Modeling association between two or more categorical variables that allow for multiple category choices. Communications in Statistics–Theory and Methods, 36, 433–451.

See Also

The genloglin methods summary.genloglin, residuals.genloglin, anova.genloglin, and predict.genloglin, and the corresponding generic functions summary, residuals, anova, and predict.

The glm function for fitting generalized linear models.

The MI.test function for testing for MMI (one MRCV case) or SPMI (two MRCV case).

Examples

# Estimate the y-main effects model for 2 MRCVs
mod.fit <- genloglin(data = farmer2, I = 3, J = 4, model = "y.main", boot = FALSE)
# Summarize model fit information
summary(mod.fit)
# Examine standardized Pearson residuals
residuals(mod.fit)
# Compare the y-main effects model to the saturated model
anova(mod.fit, model.HA = "saturated", type = "rs2")
# Obtain observed and model-predicted odds ratios
predict(mod.fit)

# Estimate a model that is not one of the named models
# Note that this was the final model chosen by Bilder and Loughin (2007)
mod.fit.other <- genloglin(data = farmer2, I = 3, J = 4, model = count ~ -1 + W:Y + 
    wi%in%W:Y + yj%in%W:Y + wi:yj + wi:yj%in%Y + wi:yj%in%W3:Y1, boot = 
    FALSE)
# Compare this model to the y-main effects model
anova(mod.fit, model.HA = count ~ -1 + W:Y + wi%in%W:Y + yj%in%W:Y + wi:yj + 
    wi:yj%in%Y + wi:yj%in%W3:Y1, type = "rs2", gof = TRUE)

# To obtain bootstrap results from the method functions the genloglin() boot 
# argument must be specified as TRUE (the default)
# A small B is used for demonstration purposes; normally, a larger B should be used
## Not run: 
mod.fit <- genloglin(data = farmer2, I = 3, J = 4, model = "y.main", boot = TRUE, 
    B = 99)
residuals(mod.fit)
anova(mod.fit, model.HA = "saturated", type = "all")
predict(mod.fit)

# Estimate a model for 3 MRCVs
mod.fit.three <- genloglin(data = farmer3, I = 3, J = 4, K = 5, model = count ~ 
    -1 + W:Y:Z + wi%in%W:Y:Z + yj%in%W:Y:Z + zk%in%W:Y:Z + wi:yj + 
    wi:yj%in%Y + wi:yj%in%W + wi:yj%in%Y:W + yj:zk + yj:zk%in%Z + 
    yj:zk%in%Y + yj:zk%in%Z:Y, boot = TRUE, B = 99)
residuals(mod.fit.three)
anova(mod.fit.three, model.HA = "saturated", type = "all")
predict(mod.fit.three, pair = "WY")
## End(Not run)