repTable  R Documentation 
Compute frequency tables for categorical variables (e.g., dichotomous or polytomous factors) in complex
cluster designs. Estimation of standard errors optionally takes the clustered structure and multiply imputed
variables into account. Currently, the jackknife-1 (JK1), jackknife-2 (JK2), balanced repeated replication (BRR) and Fay methods are
implemented to account for clustered designs. The procedures of Rubin (1987) and Rubin (2003) are implemented to account for
multiply imputed data and nested imputed data, if necessary. Conceptually, the function combines replication and imputation
methods. Technically, it is a wrapper for the svymean function of the survey package.
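To illustrate the imputation half of this combination, the following base-R sketch pools per-imputation estimates with Rubin's (1987) rules: the pooled estimate is the mean of the imputation-specific estimates, and the total variance is the mean within-imputation variance plus (1 + 1/M) times the between-imputation variance. The helper pool_rubin() and the input numbers are hypothetical; the sketch does not cover the nested case of Rubin (2003) and is not the code repTable uses internally.

```r
# Hypothetical helper: pool M per-imputation estimates and standard errors
# with Rubin's (1987) rules. Nested imputation (Rubin, 2003) is NOT handled.
pool_rubin <- function(est, se) {
  m <- length(est)
  stopifnot(m >= 2, length(se) == m)
  q_bar <- mean(est)                 # pooled point estimate
  w_bar <- mean(se^2)                # mean within-imputation variance
  b     <- var(est)                  # between-imputation variance (divisor m - 1)
  total <- w_bar + (1 + 1 / m) * b   # total variance
  c(est = q_bar, se = sqrt(total))
}

# Invented relative frequencies of one category from five imputations
pooled <- pool_rubin(est = c(0.21, 0.23, 0.22, 0.20, 0.24),
                     se  = c(0.010, 0.011, 0.010, 0.012, 0.011))
pooled
```

The between-imputation term inflates the standard error to reflect uncertainty caused by the missing data, which a single completed data set would ignore.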
repTable(datL, ID, wgt = NULL, type = c("none", "JK2", "JK1", "BRR", "Fay"), PSU = NULL, repInd = NULL, repWgt = NULL, nest=NULL, imp=NULL, groups = NULL, group.splits = length(groups), group.differences.by = NULL, cross.differences = FALSE, crossDiffSE = c("wec", "rep","old"), nBoot = 100, chiSquare = FALSE, correct = TRUE, group.delimiter = "_", trend = NULL, linkErr = NULL, dependent, separate.missing.indicator = FALSE, na.rm=FALSE, expected.values = NULL, doCheck = TRUE, forceTable = FALSE, engine = c("survey", "BIFIEsurvey"), scale = 1, rscales = 1, mse=TRUE, rho=NULL, verbose = TRUE, progress = TRUE )
datL 
Data frame in the long format (i.e. each line represents one ID unit in one imputation of one nest) containing all variables for analysis. 
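A minimal sketch of the long format, assuming a toy data frame whose column names (idstud, imp, comp) follow the examples on this page; the values are invented. The hypothetical helper is_long_format() checks that every combination of ID, nest (if any) and imputation occurs exactly once.

```r
# Invented long-format data: 3 students x 2 imputations, no nesting
datL <- data.frame(idstud = rep(c("s1", "s2", "s3"), times = 2),
                   imp    = rep(1:2, each = 3),
                   comp   = c(1, 3, 2, 1, 4, 2))

# Hypothetical check: each (ID, nest, imputation) combination occurs once.
is_long_format <- function(dat, id, imp, nest = NULL) {
  keys <- c(id, nest, imp)             # a NULL nest is dropped by c()
  anyDuplicated(dat[, keys]) == 0
}

is_long_format(datL, "idstud", "imp")
```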
ID 
Variable name or column number of the student identifier (ID) variable. The ID variable must not contain any missing values. 
wgt 
Optional: Variable name or column number of the weighting variable. If no weighting variable is specified, all cases are weighted equally. 
type 
Defines the replication method for cluster replicates which is to be applied. Depending on 
PSU 
Variable name or column number of variable indicating the primary sampling unit (PSU). When a jackknife procedure is applied,
the PSU is the jackknife zone variable. If 
repInd 
Variable name or column number of variable indicating replicate ID. In a jackknife procedure, this is the jackknife replicate
variable. If 
repWgt 
Normally, replicate weights are created by 
nest 
Optional: name or column number of the nesting variable. Only applies to nested multiply imputed data sets. 
imp 
Optional: name or column number of the imputation variable. Only applies to multiply imputed data sets. 
groups 
Optional: vector of names or column numbers of one or more grouping variables. 
group.splits 
Optional: If groups are defined, 
group.differences.by 
Optional: Specifies one grouping variable for which a chi-square test should be applied.
The corresponding variable must be included in the 
cross.differences 
Either a list of vectors specifying the pairs of levels for which cross-level differences should be computed.
Alternatively, if TRUE, cross-level differences for all pairs of levels are computed. If FALSE, no cross-level
differences are computed (see examples 2a, 3, and 4 in the help file of the
crossDiffSE 
Method for standard error estimation of cross-level differences, where groups are dependent. One of "wec", "rep" or "old".
nBoot 
Without replicates (i.e., for completely random samples), the 
chiSquare 
Logical. Applies only if 
correct 
Logical. Applies only if 'group.differences.by' is requested without cluster replicates. Indicates whether to apply a continuity correction when computing the test statistic for 2 x 2 tables. See the help page of 'chisq.test' for further details. 
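The effect of the continuity correction can be seen directly with base R's chisq.test() on a 2 x 2 table of invented counts: Yates' correction shrinks each |observed - expected| by 0.5 and therefore yields a smaller statistic. This only illustrates the correct argument of chisq.test() and says nothing about how repTable assembles the table internally.

```r
# 2 x 2 table of invented counts: group (rows) by a dichotomous category (columns)
tab <- matrix(c(12, 18, 20, 10), nrow = 2,
              dimnames = list(group = c("A", "B"), category = c("0", "1")))

with_yates    <- chisq.test(tab, correct = TRUE)    # Yates' continuity correction
without_yates <- chisq.test(tab, correct = FALSE)   # plain Pearson statistic

c(corrected   = unname(with_yates$statistic),
  uncorrected = unname(without_yates$statistic))
```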
group.delimiter 
Character string which separates the group names in the output frame. 
trend 
Optional: name or column number of the trend variable. Note: the trend variable must have exactly two levels. The levels of the grouping variables must be identical in both subpopulations defined by the trend variable. 
linkErr 
Optional: Either the name or column number of the linking error variable. If 'NULL', a linking error of 0 is assumed in trend estimation.
Alternatively, the linking error may be given as a single scalar value (e.g., 'linkErr = 1.225'). Alternatively, linking errors may be
given as a data.frame with the following specifications: Two columns, named 
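A common way to incorporate a linking error into a trend estimate is to add its square to the sampling variances of both measurement occasions. The sketch below assumes independent samples at the two occasions and a linking error added in quadrature; the helper trend_with_linking_error() and the values are hypothetical, and the exact computation inside repTable may differ.

```r
# Hypothetical helper: trend in a relative frequency between two occasions.
# Assumes independent samples; the linking error enters in quadrature.
trend_with_linking_error <- function(est1, se1, est2, se2, linkErr = 0) {
  diff <- est2 - est1
  se   <- sqrt(se1^2 + se2^2 + linkErr^2)
  c(est = diff, se = se)
}

# Invented values for one competence level, 2010 vs. 2015
tr <- trend_with_linking_error(est1 = 0.20, se1 = 0.012,
                               est2 = 0.25, se2 = 0.015, linkErr = 0.016)
tr
```

With linkErr = 0 (the documented default when 'NULL' is given), the standard error reduces to that of a difference between two independent estimates.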
dependent 
Variable name or column number of the dependent variable. 
separate.missing.indicator 
Logical. Should frequencies of missing values in the dependent variable be included? Note: this is only useful if missing values occur as 
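The difference between dropping missing values and counting them as a category of their own can be illustrated with base R's table() and its useNA argument. Whether separate.missing.indicator = TRUE produces exactly this kind of table internally is an assumption; the data are invented.

```r
# Invented responses with missing values
comp <- c(1, 2, 2, NA, 3, NA, 1, 2)

# Missing values excluded: proportions refer to the 6 observed responses
p_drop <- prop.table(table(comp))

# Missing values kept as a separate category: proportions refer to all 8 cases
p_keep <- prop.table(table(comp, useNA = "ifany"))

p_drop
p_keep
```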
na.rm 
Logical: Should cases with missing values be dropped? 
expected.values 
Optional. A vector of values expected in the dependent variable. It is recommended to leave this argument empty. 
doCheck 
Logical: Check the data for consistency before analysis? If 
forceTable 
Logical: Function decides internally whether the table or the mean function of 
engine 
Which package should be used for estimation? Either "survey" or "BIFIEsurvey". 
scale 
Scaling constant for variance; for details, see the help page of svrepdesign from the survey package. 
rscales 
Scaling constant for variance; for details, see the help page of svrepdesign from the survey package. 
mse 
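Logical: If TRUE, variances are computed from the squared deviations of the replicate estimates around the full-sample point estimate rather than around the mean of the replicates. See the help page of svrepdesign from the survey package for details. 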
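For orientation, the following sketch reproduces, for a single statistic, the replicate-variance formula used by the survey package: scale times the sum of rscales-weighted squared deviations of the replicate estimates from a centre, where the centre is the full-sample estimate if mse = TRUE and the mean of the replicates otherwise. The function replicate_variance() and the replicate estimates are hypothetical; this is not repTable's own code.

```r
# Hypothetical helper: replicate variance of a scalar statistic.
# var = scale * sum(rscales * (theta_r - centre)^2), with centre being the
# full-sample estimate (mse = TRUE) or the mean of the replicates (mse = FALSE).
replicate_variance <- function(theta_full, theta_rep, scale = 1, rscales = 1,
                               mse = TRUE) {
  centre <- if (mse) theta_full else mean(theta_rep)
  scale * sum(rscales * (theta_rep - centre)^2)
}

theta_full <- 0.30
theta_rep  <- c(0.31, 0.29, 0.30, 0.32)   # invented replicate estimates
v_mse  <- replicate_variance(theta_full, theta_rep, mse = TRUE)
v_mean <- replicate_variance(theta_full, theta_rep, mse = FALSE)
c(mse = v_mse, mean_centred = v_mean)
```

Because the mean of the replicates minimises the sum of squared deviations, mse = TRUE never yields a smaller variance than mse = FALSE, which makes it the more conservative choice.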
rho 
Shrinkage factor for weights in Fay's method. See the help page of svrepdesign from the survey package. 
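In Fay's method, each replicate perturbs rather than drops half-samples: units in the selected half receive the factor 2 - rho and the remaining units the factor rho, so rho = 0 reduces to classical BRR. The variance over R such replicates is then scaled by 1 / (R (1 - rho)^2). The helpers below are hypothetical and only sketch this weighting scheme; how repTable passes rho on to the survey package is not shown.

```r
# Hypothetical helper: Fay-perturbed weights for one replicate.
# Selected half-sample gets factor (2 - rho), the other half gets rho.
fay_replicate_weights <- function(wgt, in_half, rho) {
  stopifnot(rho >= 0, rho < 1, length(wgt) == length(in_half))
  wgt * ifelse(in_half, 2 - rho, rho)
}

# Hypothetical helper: variance scale for R Fay replicates
fay_scale <- function(R, rho) 1 / (R * (1 - rho)^2)

w  <- c(10, 10, 12, 12)
rw <- fay_replicate_weights(w, in_half = c(TRUE, FALSE, TRUE, FALSE), rho = 0.5)
rw
fay_scale(R = 4, rho = 0.5)
```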
verbose 
Logical: Show analysis information on console? 
progress 
Logical: Show progress bar on console? 
The function first creates replicate weights based on the PSU and repInd variables according to the JK2 procedure
implemented in WesVar. For multiply imputed data sets, a workbook with several analyses (one per imputed data set) is created.
The function then serves as a wrapper for svymean, called by svyby, from the survey package.
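The JK2 replicate weights can be sketched in base R under two assumptions: repInd is coded 0/1 within each jackknife zone, and one replicate is formed per zone. In replicate h, units of zone h with repInd = 1 receive double weight, units of zone h with repInd = 0 receive weight 0, and all other units keep their weight; the JK2 sampling variance is the sum of squared deviations of the replicate estimates from the full-sample estimate. The helper jk2_replicate_weights() and the indicator column level2 are hypothetical, and the data are invented.

```r
# Hypothetical helper: JK2 replicate weights, one replicate per jackknife zone.
jk2_replicate_weights <- function(wgt, PSU, repInd) {
  zones <- sort(unique(PSU))
  rw <- sapply(zones, function(h) {
    # factor 1 outside zone h; inside zone h: 2 if repInd == 1, else 0
    fac <- ifelse(PSU != h, 1, ifelse(repInd == 1, 2, 0))
    wgt * fac
  })
  rw <- matrix(rw, nrow = length(wgt))   # keep a matrix even for one zone
  colnames(rw) <- paste0("rep", zones)
  rw
}

# Invented data: 3 zones with 2 units each; level2 is a 0/1 category indicator
dat <- data.frame(jkzone = c(1, 1, 2, 2, 3, 3),
                  jkrep  = c(0, 1, 0, 1, 0, 1),
                  wgt    = rep(1, 6),
                  level2 = c(1, 0, 0, 0, 1, 1))

rw <- jk2_replicate_weights(dat$wgt, dat$jkzone, dat$jkrep)
theta_full <- weighted.mean(dat$level2, dat$wgt)
theta_rep  <- apply(rw, 2, function(w) weighted.mean(dat$level2, w))
se_jk2     <- sqrt(sum((theta_rep - theta_full)^2))   # JK2 standard error
se_jk2
```

Some large-scale assessments form two replicates per zone and halve the sum of squares; this sketch covers only the one-replicate-per-zone variant.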
Relative frequencies of the categories of the dependent variable are computed as the means of the dichotomous indicators
(i.e., dummy variables) of each category. The results of the individual analyses are then pooled according to Rubin's rules,
which are adapted for nested imputations if the nest argument implies a nested structure.
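The idea of computing relative frequencies as means of dummy variables can be sketched for the point estimate alone: create one 0/1 indicator per category and take its weighted mean. The data are invented, and replicate-based standard errors and imputation pooling are omitted; in repTable these steps are delegated to the survey package.

```r
# Invented data: categorical dependent variable and sampling weights
comp <- c(1, 2, 2, 3, 1, 2)
wgt  <- c(1.0, 2.0, 1.0, 1.0, 0.5, 0.5)

# One 0/1 indicator column per category
cats    <- sort(unique(comp))
dummies <- sapply(cats, function(k) as.numeric(comp == k))
colnames(dummies) <- cats

# Relative frequency of each category = weighted mean of its indicator
rel_freq <- apply(dummies, 2, weighted.mean, w = wgt)
rel_freq
```

The result coincides with a weighted frequency table, e.g. prop.table(xtabs(wgt ~ comp)); the dummy formulation is useful because any routine for means and their standard errors can then be reused for frequencies.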
A list of data frames in the long format. The output can be summarized using the report function.
The first element of the list is a list with either one (no trend analyses) or two (trend analyses)
data frames with at least six columns each. For each subpopulation denoted by the groups statement, each
dependent variable, each parameter (i.e., the values of the corresponding categories of the dependent variable)
and each coefficient (i.e., the estimate and the corresponding standard error), the corresponding value is given.
group 
Denotes the group an analysis belongs to. If no groups were specified and/or an analysis for the whole sample was requested, the value of ‘group’ is ‘wholeGroup’. 
depVar 
Denotes the name of the dependent variable in the analysis. 
modus 
Denotes the mode of the analysis. For example, if a JK2 analysis without sampling weights was conducted, ‘modus’ takes the value ‘jk2.unweighted’. If an analysis without any replicates but with sampling weights was conducted, ‘modus’ takes the value ‘weighted’. 
parameter 
Denotes the parameter of the regression model for which the corresponding value is given. For frequency tables, this is the category of the dependent variable whose relative frequency is given. 
coefficient 
Denotes the coefficient for which the corresponding value is given. Takes the values ‘est’ (estimate) and ‘se’ (standard error of the estimate). 
value 
The value of the parameter, i.e. the relative frequency or its standard error. 
If groups were specified, further columns which are denoted by the group names are added to the data frame.
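To show how this long layout can be turned into a more familiar table, the sketch below builds a small data frame with the columns described above and reshapes it with base R's reshape() so that estimates and standard errors sit side by side. All values are invented for illustration, and the report function may present results differently.

```r
# Invented values, only to illustrate the long output layout
res <- data.frame(group = "wholeGroup", depVar = "comp", modus = "weighted",
                  parameter   = rep(c("1", "2", "3"), each = 2),
                  coefficient = rep(c("est", "se"), times = 3),
                  value       = c(0.25, 0.02, 0.55, 0.03, 0.20, 0.02),
                  stringsAsFactors = FALSE)

# One row per category, with columns value.est and value.se
wide <- reshape(res[, c("parameter", "coefficient", "value")],
                idvar = "parameter", timevar = "coefficient",
                direction = "wide")
wide
```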
Rubin, D.B. (2003): Nested multiple imputation of NMES via partially incompatible MCMC. Statistica Neerlandica 57, 1, 3–18.
library(eatRep)   # provides repTable(), report(), checkLEs() and the 'lsa' example data
data(lsa)

### Data preparation for examples 1 and 2
### subsetting: We only consider domain 'reading'
rd <- lsa[which(lsa[,"domain"] == "reading"),]

### We only consider the first "nest".
rdN1 <- rd[which(rd[,"nest"] == 1),]

### First, we only consider year 2010
rdN1y10 <- rdN1[which(rdN1[,"year"] == 2010),]

### First example: Computes frequencies of polytomous competence levels (1, 2, 3, 4, 5)
### conditionally on country, using a chi-square test to decide whether the distribution
### varies between countries (it's an overall test, i.e. with three groups, df1=8).
freq.tab1 <- repTable(datL = rdN1y10, ID = "idstud", wgt = "wgt", imp = "imp",
                      type = "JK2", PSU = "jkzone", repInd = "jkrep",
                      groups = "country", group.differences.by = "country",
                      dependent = "comp", chiSquare = TRUE)
res1 <- report(freq.tab1, add = list(domain = "reading"))

### Second example: Computes frequencies of polytomous competence levels (1, 2, 3, 4, 5)
### conditionally on country. Now we test whether the frequency of each single category
### differs between pairs of countries (it's not an overall test ... repTable now
### calls repMean internally, using dummy variables)
freq.tab2 <- repTable(datL = rdN1y10, ID = "idstud", wgt = "wgt", imp = "imp",
                      type = "JK2", PSU = "jkzone", repInd = "jkrep",
                      groups = "country", group.differences.by = "country",
                      dependent = "comp", chiSquare = FALSE)
res2 <- report(freq.tab2, add = list(domain = "reading"))

### Third example: trend estimation and nested imputation and 'by' loop
### (to date, only crossDiffSE = "old" works)
freq.tab3 <- by(data = lsa, INDICES = lsa[,"domain"], FUN = function(subdat) {
    repTable(datL = subdat, ID = "idstud", wgt = "wgt", imp = "imp", nest = "nest",
             type = "JK2", PSU = "jkzone", repInd = "jkrep", groups = "country",
             group.differences.by = "country", group.splits = 0:1,
             cross.differences = TRUE, crossDiffSE = "old", dependent = "comp",
             chiSquare = FALSE, trend = "year", linkErr = "leComp")
})
res3 <- do.call("rbind", lapply(names(freq.tab3), FUN = function(domain) {
    report(freq.tab3[[domain]], trendDiffs = TRUE, add = list(domain = domain))
}))

### Fourth example: similar to example 3. Trend estimation using a linking
### error data.frame
linkErrs <- data.frame(trendLevel1 = 2010, trendLevel2 = 2015, depVar = "comp",
                       unique(lsa[,c("domain", "comp", "leComp")]),
                       stringsAsFactors = FALSE)
colnames(linkErrs) <- car::recode(colnames(linkErrs),
                                  "'comp'='parameter'; 'leComp'='linkingError'")
freq.tab4 <- by(data = lsa, INDICES = lsa[,"domain"], FUN = function(subdat) {
    repTable(datL = subdat, ID = "idstud", wgt = "wgt", type = "none", imp = "imp",
             nest = "nest", groups = "country", group.differences.by = "country",
             group.splits = 0:1, cross.differences = FALSE, dependent = "comp",
             chiSquare = FALSE, trend = "year",
             linkErr = linkErrs[which(linkErrs[,"domain"] == subdat[1,"domain"]),])
})
res4 <- do.call("rbind", lapply(names(freq.tab4), FUN = function(domain) {
    report(freq.tab4[[domain]], trendDiffs = TRUE, add = list(domain = domain))
}))

### Fifth example: minimal example for three measurement occasions
### borrow data from the eatGADS package
trenddat1 <- system.file("extdata", "trend_gads_2010.db", package = "eatGADS")
trenddat2 <- system.file("extdata", "trend_gads_2015.db", package = "eatGADS")
trenddat3 <- system.file("extdata", "trend_gads_2020.db", package = "eatGADS")
trenddat <- eatGADS::getTrendGADS(filePaths = c(trenddat1, trenddat2, trenddat3),
                                  years = c(2010, 2015, 2020), fast = FALSE)
dat <- eatGADS::extractData(trenddat)

### use template linking error object
load(system.file("extdata", "linking_error.rda", package = "eatRep"))

### check consistency of data and linking error object
check1 <- checkLEs(c(trenddat1, trenddat2, trenddat3), lErr)

### Analysis for reading comprehension
freq.tab5 <- repTable(datL = dat[which(dat[,"dimension"] == "reading"),],
                      ID = "idstud", type = "none", imp = "imp",
                      dependent = "traitLevel", chiSquare = FALSE, trend = "year",
                      linkErr = lErr[which(lErr[,"domain"] == "reading"),])
res5 <- report(freq.tab5, trendDiffs = TRUE, add = list(domain = "reading"))