eatRep-package: Statistical analyses in complex survey designs with multiple...

Description

Computes basic statistical operations (means, standard deviations, frequency tables, percentiles and generalized linear models) in complex survey designs comprising multiply imputed variables and/or a clustered sampling structure, both of which require special procedures, at least for estimating standard errors.

For example, computing standard errors for the mean of a multiply imputed variable (e.g. plausible values) involves the formulas provided by Rubin (1987). Computing standard errors for the mean of a nested imputed variable involves the formulas provided by Rubin (2003). Both methods are implemented in the package. R^2 and adjusted R^2 in linear and generalized linear regression models with multiply imputed data sets are estimated using the methods provided in Harel (2009).
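The two pooling rules for non-nested imputations can be sketched in a few lines. The following Python snippet is only an illustration of the formulas in Rubin (1987) and of the Fisher z approach described by Harel (2009); the function names `pool_rubin` and `pool_r2` are hypothetical and are not part of eatRep, whose internal implementation may differ in detail.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool m point estimates and their standard errors (Rubin, 1987)."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations to pool")
    q_bar = mean(estimates)                            # pooled point estimate
    u_bar = mean([se ** 2 for se in standard_errors])  # within-imputation variance
    b = variance(estimates)                            # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b                    # total variance
    return q_bar, math.sqrt(total)


def pool_r2(r_squared_values):
    """Pool R^2 values via Fisher's z transformation of R = sqrt(R^2).

    Assumes every value lies in [0, 1); atanh is undefined for R = 1.
    """
    z_values = [math.atanh(math.sqrt(r2)) for r2 in r_squared_values]
    return math.tanh(mean(z_values)) ** 2


# Toy input: three imputations of a mean, each with standard error 1.
estimate, se = pool_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
print(estimate, se)
print(pool_r2([0.36, 0.64]))
```

In the toy input the between-imputation variance is 1, so the total variance is 1 + (1 + 1/3) * 1, which shows how disagreement between imputations inflates the pooled standard error beyond the average within-imputation standard error.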

Moreover, computing standard errors for the mean of a variable which stems from a clustered design may involve replication methods such as balanced repeated replication (BRR), bootstrap or jackknife methods. See Westat (2000), Foy, Galia & Li (2008), Rust & Rao (1996), and Wolter (1985) for details. To date, the Jackknife-1 (JK1), Jackknife-2 (JK2) and Balanced Repeated Replicates (BRR) procedures are supported.
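To make the replication idea concrete, here is a minimal sketch of a delete-one-PSU jackknife (JK1) for a weighted mean, written in Python for illustration. It assumes the common JK1 convention in which the dropped PSU receives weight zero, the remaining weights are scaled by G/(G-1), and the squared deviations are scaled by (G-1)/G, where G is the number of PSUs. The function names are hypothetical; eatRep itself delegates this step to the survey package.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of `values` using `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jk1_se_of_mean(values, weights, psu):
    """Delete-one-PSU jackknife (JK1) standard error of a weighted mean."""
    full = weighted_mean(values, weights)   # estimate from the full sample
    groups = sorted(set(psu))
    g = len(groups)
    scale = g / (g - 1)                     # re-scaling of the retained weights
    squared_deviations = 0.0
    for dropped in groups:
        # Replicate weights: zero for the dropped PSU, re-scaled elsewhere.
        rep_w = [0.0 if p == dropped else w * scale for w, p in zip(weights, psu)]
        squared_deviations += (weighted_mean(values, rep_w) - full) ** 2
    return full, math.sqrt((g - 1) / g * squared_deviations)


# Toy data: six observations in three PSUs, equal weights.
full, se = jk1_se_of_mean([1, 2, 3, 4, 5, 6], [1.0] * 6, list("AABBCC"))
print(full, se)
```

Each replicate re-estimates the mean with one PSU removed, and the spread of those replicate estimates around the full-sample estimate yields the standard error.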

The package eatRep is designed to combine both methods, which is necessary if (nested) multiply imputed data are used in clustered designs. Accounting for this structure is especially relevant for the estimation of standard errors. The estimation of national trends requires a sequential analysis of both measurements and a comparison of the estimates between them.
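As a rough illustration of the trend comparison, the sketch below computes the difference between two measurement occasions and its standard error. It rests on two assumptions that are not stated in this text: the two samples are independent, and no additional error component (such as a linking error) is added. The function name `trend_difference` is hypothetical.

```python
import math


def trend_difference(est_t1, se_t1, est_t2, se_t2):
    """Difference between two independent estimates and its standard error."""
    diff = est_t2 - est_t1
    # Variances add for independent samples (assumption, see lead-in).
    se_diff = math.sqrt(se_t1 ** 2 + se_t2 ** 2)
    return diff, se_diff


# Toy values: pooled mean and standard error at each measurement occasion.
diff, se_diff = trend_difference(500.0, 3.0, 508.0, 4.0)
print(diff, se_diff)
```

The inputs are meant to be estimates that have already been pooled over imputations and cluster replicates within each measurement.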

Technically, eatRep is a wrapper for the survey package (Lumley, 2004). Each function in eatRep corresponds to a specific function in survey which is called repeatedly during the analysis. Hence, a nested loop is used. We use “trend replicates” in the outer loop, “imputation replicates” in the middle loop to account for multiply imputed data, and “cluster replicates” in the inner loop to account for the clustered sampling structure. While the functional principle of survey is based on replication of standard analyses, eatRep is based on replication of survey analyses to take multiply imputed data into account.
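The loop structure described above can be sketched as follows. This Python skeleton only illustrates the nesting order; the function `run_nested`, its arguments and the toy data layout are hypothetical. In eatRep, the innermost loop is carried out by the survey package rather than written out explicitly.

```python
def run_nested(data, replicate_ids, analyse):
    """Run `analyse` once per (measurement, imputation, cluster replicate).

    data[t][i] is the i-th imputed data set of measurement t.
    """
    results = {}
    for t, imputations in enumerate(data):         # outer loop: trend replicates
        for i, dataset in enumerate(imputations):  # middle loop: imputation replicates
            for r in replicate_ids:                # inner loop: cluster replicates
                results[(t, i, r)] = analyse(dataset, r)
    return results


# Toy layout: 2 measurements x 3 imputed data sets, 4 cluster replicates.
data = [
    [[1, 2], [2, 3], [3, 4]],  # measurement 1
    [[2, 3], [3, 4], [4, 5]],  # measurement 2
]
calls = run_nested(data, replicate_ids=range(4), analyse=lambda ds, r: sum(ds) / len(ds))
print(len(calls))  # 2 * 3 * 4 = 24 analysis calls
```

The number of analysis calls is the product of the three loop lengths, which is why run time grows quickly with the number of imputations and PSUs.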

For each imputed data set in each measurement, i.e. in the inner loop, the eatRep function first creates replicate weights based on the primary sampling unit (PSU) variable and the replication indicator variable. In the jackknife procedure, the former is often referred to as “Jackknife Zone”, whereas the latter is often referred to as “Jackknife Replicate”. The number of distinct units in the PSU variable defines the number of replications that are necessary due to the clustered structure. A design object is created and the appropriate survey function is called. The process is repeated for each imputed data set and the results of the analyses are pooled. The pooling procedure depends on the type of statistic to be pooled. For example, means or regression coefficients are pooled according to Rubin (1987) or Rubin (2003). R^2 is pooled according to Harel (2009), using a Fisher z-transformation. Chi-square distributed values are pooled according to Thomas and Rao (1990) for clustered data and according to Enders (2010) and Allison (2002) for multiply imputed data. For trend analyses, the whole process is repeated twice (once for each of the two measurements) and the differences between the estimates are computed along with their pooled standard errors.
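The construction of replicate weights from a jackknife zone and a jackknife replicate indicator can be sketched as below. The Python code assumes the JK2 convention commonly used in TIMSS-style studies: for the replicate belonging to a zone, units in that zone with indicator 1 get their weight doubled, units with indicator 0 get weight zero, and units in all other zones keep their weight. Function and argument names are hypothetical, and eatRep's internal implementation may differ.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of `values` using `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jk2_replicate_weights(weights, zone, rep_ind):
    """Return one replicate weight vector per jackknife zone.

    rep_ind must contain only 0 and 1.
    """
    replicates = {}
    for z in sorted(set(zone)):
        replicates[z] = [
            w * 2.0 * r if unit_zone == z else w   # double or zero inside the zone
            for w, unit_zone, r in zip(weights, zone, rep_ind)
        ]
    return replicates


def jk2_se_of_mean(values, weights, zone, rep_ind):
    """JK2 standard error: sum of squared deviations over all zones."""
    full = weighted_mean(values, weights)
    reps = jk2_replicate_weights(weights, zone, rep_ind)
    var = sum((weighted_mean(values, rw) - full) ** 2 for rw in reps.values())
    return full, math.sqrt(var)


# Toy data: four units in two zones, equal weights.
full, se = jk2_se_of_mean([1, 2, 3, 4], [1.0] * 4, zone=[1, 1, 2, 2], rep_ind=[1, 0, 1, 0])
print(full, se)
```

With two zones there are two replicate weight vectors, in line with the statement that the number of distinct PSU values determines the number of replications.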

Without trend estimation, the outer loop has only one cycle (instead of two). Without multiple imputations, the middle loop has only one cycle. Without a clustered sampling structure (i.e., in a random sample), the inner loop has only one cycle. Without trend, imputation and clustered structure, no replication is performed at all. To compute simple mean estimates, for example, eatRep then simply calls mean instead of svymean from the survey package. A special case occurs with nested multiple imputation: there are then four loops in a nested structure. Hence, the corresponding analyses may require considerable computational effort.

Important note: The structure of the eatRep functions changed substantially between versions 0.5.0 and 0.6.0. Up to version 0.5.0, the data had to be provided in the wide format. Beginning with version 0.6.0, eatRep functions need the long format. In practice, this distinction means that version 0.5.0 allows analyzing data where, for example, the number of imputations differs between independent and dependent variables, although the latter are not nested within the former. This case is conceptually questionable, and it is not clear how to apply the pooling rules. Hence, it is no longer supported in version 0.6.0 and higher. The number of imputations has to be equal, or a nested structure must be guaranteed.
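The difference between the two layouts can be illustrated with a small reshaping sketch. The Python code below is only meant to show what "wide" and "long" mean here; the column names (`id`, `weight`, `pv1` to `pv3`, `imp`, `value`) and the function `wide_to_long` are hypothetical and do not reflect names required by eatRep.

```python
def wide_to_long(rows, imputed_cols, imp_name="imp", value_name="value"):
    """Stack one column per imputation into one row per person and imputation."""
    long_rows = []
    for row in rows:
        # Columns that are not imputed (e.g. id, weights) are repeated on every row.
        base = {k: v for k, v in row.items() if k not in imputed_cols}
        for imp, col in enumerate(imputed_cols, start=1):
            long_rows.append({**base, imp_name: imp, value_name: row[col]})
    return long_rows


# Wide format: one row per person, one column per plausible value.
wide = [
    {"id": 1, "weight": 1.2, "pv1": 480.0, "pv2": 495.0, "pv3": 470.0},
    {"id": 2, "weight": 0.8, "pv1": 520.0, "pv2": 510.0, "pv3": 530.0},
]
long = wide_to_long(wide, imputed_cols=["pv1", "pv2", "pv3"])
print(len(long))  # 2 persons x 3 imputations = 6 rows
print(long[0])
```

In the long layout every variable shares the same imputation index, which is why differing numbers of imputations across variables can no longer be represented unless they are nested.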

Details

Package: eatRep
Type: Package
Version: 0.9.5
Date: 2018-10-06
License: GPL(>=2)

Author(s)

Author/maintainer: Sebastian Weirich <sebastian.weirich@iqb.hu-berlin.de>

References

Allison, P. D. (2002). Missing data. Newbury Park, CA: Sage.

Enders, C. K. (2010). Applied missing data analysis. Guilford Press.

Foy, P., Galia, J., & Li, I. (2008). Scaling the data from the TIMSS 2007 mathematics and science assessment. In J. F. Olson, M. O. Martin & I. V. S. Mullis (Eds.), TIMSS 2007 Technical Report (pp. 225–280). Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.

Harel, O. (2009). The estimation of R^2 and adjusted R^2 in incomplete data sets using multiple imputation. Journal of Applied Statistics, 36(10), 1109–1118.

Lumley, T. (2004). Analysis of complex survey samples. Journal of Statistical Software, 9(1), 1–19.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

Rubin, D. B. (2003). Nested multiple imputation of NMES via partially incompatible MCMC. Statistica Neerlandica, 57(1), 3–18.

Rust, K., & Rao, J. N. K. (1996). Variance estimation for complex surveys using replication techniques. Statistical Methods in Medical Research, 5, 283–310.

Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis.

Thomas, D. R., & Rao, J. N. K. (1990). Small-sample comparison of level and power for simple goodness-of-fit statistics under cluster sampling. Journal of the American Statistical Association, 82, 630–636.

Westat (2000). WesVar. Rockville, MD: Westat.

Wolter, K. M. (1985). Introduction to variance estimation. New York: Springer.


eatRep documentation built on May 2, 2019, 5:40 p.m.