repGlm  R Documentation 
Compute generalized linear models for complex cluster designs with multiply imputed variables based on the jackknife (JK1, JK2) or balanced repeated replicates (BRR) procedure. Conceptually, the function combines replication methods and methods for multiply imputed data. Technically, this is a wrapper for the svyglm function of the survey package.
repGlm(datL, ID, wgt = NULL, type = c("none", "JK2", "JK1", "BRR", "Fay"), PSU = NULL,
repInd = NULL, repWgt = NULL, nest=NULL, imp=NULL, groups = NULL,
group.splits = length(groups), group.delimiter = "_",
cross.differences = FALSE, trend = NULL, linkErr = NULL, formula,
family=gaussian, forceSingularityTreatment = FALSE,
glmTransformation = c("none", "sdY"), doCheck = TRUE, na.rm = FALSE,
poolMethod = c("mice", "scalar"), useWec = FALSE,
scale = 1, rscales = 1, mse=TRUE, rho=NULL, hetero=TRUE,
se_type = c("HC3", "HC0", "HC1", "HC2", "CR0", "CR2"),
clusters = NULL, crossDiffSE.engine= c("lavaan", "lm"),
stochasticGroupSizes = FALSE, verbose = TRUE, progress = TRUE)
datL 
Data frame in the long format (i.e. each line represents one ID unit in one imputation of one nest) containing all variables for analysis. 
ID 
Variable name or column number of student identifier (ID) variable. ID variable must not contain any missing values. 
wgt 
Optional: Variable name or column number of weighting variable. If no weighting variable is specified, all cases will be equally weighted. 
type 
Defines the replication method for cluster replicates which is to be applied. Depending on 
PSU 
Variable name or column number of variable indicating the primary sampling unit (PSU). When a jackknife procedure is applied,
the PSU is the jackknife zone variable. If 
repInd 
Variable name or column number of variable indicating replicate ID. In a jackknife procedure, this is the jackknife replicate
variable. If 
repWgt 
Normally, replicate weights are created by 
nest 
Optional: name or column number of the nesting variable. Only applies in nested multiple imputed data sets. 
imp 
Optional: name or column number of the imputation variable. Only applies in multiple imputed data sets. 
groups 
Optional: vector of names or column numbers of one or more grouping variables. 
group.splits 
Optional: If groups are defined, 
group.delimiter 
Character string which separates the group names in the output frame. 
cross.differences 
Either a list of vectors, specifying the pairs of levels for which crosslevel differences should be computed.
Alternatively, if 
trend 
Optional: name or column number of the trend variable. Note: The trend variable must have exactly two levels. Levels for grouping variables must be equal in both 'sub populations' partitioned by the trend variable. 
linkErr 
Optional: name or column number of the linking error variable. If 
formula 
Model formula, see help page of 
family 
A description of the error distribution and link function to be used in the model. See help page of 
forceSingularityTreatment 
Logical: Forces the function to use the workaround to handle singularities in regression models. 
glmTransformation 
Optional: Allows for transformation of parameters from linear regression and logistic regression before pooling.
Useful to compare parameters from different glm models, see Mood (2010). Note: This argument applies only if

doCheck 
Logical: Check the data for consistency before analysis? If 
na.rm 
Logical: Should cases with missing values be dropped? 
poolMethod 
Which pooling method should be used? The “mice” method is recommended. 
useWec 
Logical: use weighted effect coding? 
scale 
scaling constant for variance, for details, see help page of 
rscales 
scaling constant for variance, for details, see help page of 
mse 
Logical: If 
rho 
Shrinkage factor for weights in Fay's method. See help page of 
hetero 
Logical: Assume heteroscedastic variance for weighted effect coding? Only applies for random samples, i.e. if no replication analyses are executed. 
se_type 
The sort of standard error sought for cross level differences. Only applies if 
clusters 
Optional: Variable name or column number of cluster variable. Only necessary if weighted effect coding
should be performed using heteroscedastic variances. See the help page of 
crossDiffSE.engine 
Optional: Sort of estimator which should be used for standard error estimation in weighted effect coding regression.
Only applies if 
stochasticGroupSizes 
Logical: Assume stochastic group sizes for using weighted effect coding regression with categorical predictors? Note: To date, only lavaan allows for stochastic group sizes. Stochastic group sizes cannot be assumed if any replication method (jackknife, BRR) is applied. 
verbose 
Logical: Show analysis information on console? 
progress 
Logical: Show progress bar on console? 
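The long format expected by datL can be illustrated with a small toy data set. This is only a sketch: the column names mirror those used in the examples further down (idstud, imp, nest, wgt, jkzone, jkrep, score), but all values are invented and the construction via expand.grid is just one possible way to build such a frame.

```r
# Hypothetical toy data: 3 students, 2 nests, 2 imputations per nest.
# Student-level design variables are constant across imputations.
wide <- data.frame(idstud = c("S1", "S2", "S3"),
                   wgt    = c(1.2, 0.8, 1.0),
                   jkzone = c(1, 1, 2),
                   jkrep  = c(0, 1, 0),
                   stringsAsFactors = FALSE)

# One row per ID unit in one imputation of one nest
grid <- expand.grid(idstud = wide$idstud, imp = 1:2, nest = 1:2,
                    stringsAsFactors = FALSE)
datL <- merge(grid, wide, by = "idstud")

# Imputed outcome: differs between imputations (random values for illustration)
set.seed(1)
datL$score <- round(rnorm(nrow(datL), mean = 500, sd = 100))
datL <- datL[order(datL$nest, datL$imp, datL$idstud), ]

# Each ID should occur exactly once per imputation within each nest
table(datL$idstud, paste(datL$nest, datL$imp, sep = "_"))
```

A frame like this has number of IDs times number of imputations times number of nests rows, and the ID column contains no missing values.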
The function first creates replicate weights based on the PSU and repInd variables according to the JK2 or BRR procedure. For multiply imputed data sets, a workbook with several analyses is created. The function then serves as a wrapper for svyglm, implemented in the survey package.
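The replicate-weight step can be sketched in base R for the JK2 case. This is not the package's internal code. It assumes a common JK2 convention in which, within each jackknife zone, units flagged 1 in the replicate indicator get their weight doubled and units flagged 0 get weight zero. The toy data and all object names are invented, and a weighted mean stands in for the regression coefficient.

```r
# Toy data (invented): 4 jackknife zones with 2 sampled units each, flagged 0/1.
d <- data.frame(
  y      = c(510, 480, 530, 495, 470, 520, 505, 490),
  wgt    = c(1.0, 1.5, 0.8, 1.2, 1.1, 0.9, 1.3, 1.0),
  jkzone = rep(1:4, each = 2),
  jkrep  = rep(c(0, 1), times = 4)
)

# One replicate weight per zone. ASSUMPTION: inside the zone, units with
# jkrep == 1 are doubled and units with jkrep == 0 are set to zero;
# units in all other zones keep their full-sample weight.
zones  <- sort(unique(d$jkzone))
repWgt <- sapply(zones, function(z) {
  w <- d$wgt
  inZone <- d$jkzone == z
  w[inZone] <- w[inZone] * 2 * d$jkrep[inZone]
  w
})

# Full-sample estimate and the same estimate under each replicate weight
est     <- weighted.mean(d$y, d$wgt)
estReps <- apply(repWgt, 2, function(w) weighted.mean(d$y, w))

# JK2 standard error: squared deviations of the replicate estimates
# from the full-sample estimate, summed over replicates
seJK2 <- sqrt(sum((estReps - est)^2))
```

Summing the squared deviations without further rescaling, and centering on the full-sample estimate, is intended to match the defaults scale = 1, rscales = 1 and mse = TRUE shown in the usage section.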
The results of the several analyses are then pooled according to Rubin's rules, which are adapted for nested imputations if the nest argument implies a nested structure.
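For the non-nested case, pooling by Rubin's rules can be sketched as follows. The estimates and standard errors are invented, the object names are hypothetical, and the sketch does not cover the adaptation for nested imputations or whatever the chosen poolMethod does internally.

```r
# Hypothetical estimates and standard errors of one regression coefficient,
# one pair per imputed data set (values invented for illustration).
est <- c(12.1, 11.4, 12.9, 12.3, 11.8)
se  <- c(2.0, 2.2, 1.9, 2.1, 2.0)
m   <- length(est)

pooledEst <- mean(est)                       # pooled point estimate
within    <- mean(se^2)                      # mean within-imputation variance
between   <- var(est)                        # between-imputation variance
total     <- within + (1 + 1 / m) * between  # total variance
pooledSE  <- sqrt(total)
```

The pooled standard error is never smaller than the square root of the average within-imputation variance, because the between-imputation component adds the uncertainty due to the missing data.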
A list of data frames in the long format. The output can be summarized using the report function.
The first element of the list is a list with either one (no trend analyses) or two (trend analyses) data frames with at least six columns each. For each subpopulation denoted by the groups statement, each dependent variable, each parameter and each coefficient the corresponding value is given.
group 
Denotes the group an analysis belongs to. If no groups were specified and/or an analysis for the whole sample was requested, the value of ‘group’ is ‘wholeGroup’. 
depVar 
Denotes the name of the dependent variable in the analysis. 
modus 
Denotes the mode of the analysis. For example, if a JK2 analysis without sampling weights was conducted, ‘modus’ takes the value ‘jk2.unweighted’. If an analysis without any replicates but with sampling weights was conducted, ‘modus’ takes the value ‘weighted’. 
parameter 
Denotes the parameter of the regression model for which the corresponding value is given. Amongst others, the ‘parameter’ column takes the values ‘(Intercept)’ and ‘gendermale’ if ‘gender’ was the independent variable, for instance. See example 1 for further details. 
coefficient 
Denotes the coefficient for which the corresponding value is given further. Takes the values ‘est’ (estimate) and ‘se’ (standard error of the estimate). 
value 
The value of the parameter estimate in the corresponding group. 
If groups were specified, further columns which are denoted by the group names are added to the data frame.
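To illustrate working with this long layout, the sketch below builds a mock data frame with the six documented columns and puts estimates and standard errors side by side. The data frame is invented and is not actual repGlm output. The real return value is a list, so the relevant data frame would first have to be extracted from it.

```r
# Mock result in the documented long layout (all values invented).
out <- data.frame(
  group       = rep(c("countryA", "countryB"), each = 4),
  depVar      = "score",
  modus       = "jk2.unweighted",
  parameter   = rep(rep(c("(Intercept)", "sexmale"), each = 2), times = 2),
  coefficient = rep(c("est", "se"), times = 4),
  value       = c(512.3, 4.1, -11.8, 3.2, 498.7, 3.9, -7.4, 3.5),
  stringsAsFactors = FALSE
)

# Split by coefficient type and merge back into one row per group and parameter
est  <- out[out$coefficient == "est", c("group", "parameter", "value")]
se   <- out[out$coefficient == "se",  c("group", "parameter", "value")]
wide <- merge(est, se, by = c("group", "parameter"), suffixes = c(".est", ".se"))

# Ratio of estimate to standard error, for a quick look at effect sizes
wide$ratio <- wide$value.est / wide$value.se
wide
```

In practice the report function mentioned above is the intended way to summarize results; a manual reshape like this is mainly useful for custom tables.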
te Grotenhuis, M., Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2017). When size matters: advantages of weighted effect coding in observational studies. International Journal of Public Health, 62, 163–167.
### load the package and the example data (long format)
library(eatRep)
data(lsa)
### use only the first nest
bt <- lsa[which(lsa[,"nest"] == 1),]
### use only data from 2010
bt2010 <- bt[which(bt[,"year"] == 2010),]
### use only reading data
bt2010read <- bt2010[which(bt2010[,"domain"] == "reading"),]
### Example 1: Linear regression of reading score on sex, computed separately
### for each country. Assumes no nested structure.
mod1 <- repGlm(datL = bt2010read, ID = "idstud", wgt = "wgt", type = "jk2",
        PSU = "jkzone", repInd = "jkrep", imp = "imp", groups = "country",
        formula = score~sex, family = "gaussian")
res1 <- report(mod1, printGlm = TRUE)
### Example 2: Logistic regression of pass/fail on ses and sex, computed
### separately for each country in a nested structure. Cases are weighted
### equally because the "wgt" argument is omitted.
dat <- lsa[intersect(which(lsa[,"year"] == 2010), which(lsa[,"domain"] == "reading")),]
mod2 <- repGlm(datL = dat, ID = "idstud", type = "JK2", PSU = "jkzone",
        repInd = "jkrep", imp = "imp", nest = "nest", groups = "country",
        formula = passReg~sex*ses, family = quasibinomial(link="logit"))
res2 <- report(mod2, printGlm = TRUE)
### Example 3: Like example 1, but without any replication methods, with
### trend estimation (without linking error) and nested imputation
dat <- lsa[which(lsa[,"domain"] == "reading"),]
mod3 <- repGlm(datL = dat, ID = "idstud", wgt = "wgt", imp = "imp", nest = "nest",
        groups = "country", formula = score~sex, trend = "year")
res3 <- report(mod3, printGlm = TRUE)
### Example 4: Weighted effect coding to estimate whether a specific country's mean
### differs from the overall mean (where the overall population is a composite of
### all countries). The procedure adapts the weighted effect coding procedures
### described in te Grotenhuis et al. (2017) for multiple imputation and replication methods.
mod4 <- repGlm(datL = bt2010read, ID = "idstud", wgt = "wgt", type = "jk2",
        PSU = "jkzone", repInd = "jkrep", imp = "imp", formula = score~country,
        useWec = TRUE)
res4 <- report(mod4, printGlm = FALSE)