aggregateData: Aggregate Datasets With Several Kinds of Missing Values
In eatPrep: Prepare Data for IRT Analyses


Aggregate datasets with constraints on missing values


aggregateData(dat, subunits, units, aggregatemissings = NULL, 
    rename = FALSE, recodedData = TRUE, suppressErr = FALSE, 
    recodeErr = "mci", verbose = FALSE)

`dat`	A data frame containing the data to be aggregated.
`subunits`	A data frame with subunit information. See ‘Details’.
`units`	A data frame with unit information. See ‘Details’.
`aggregatemissings`	Optional: a symmetrical n x n matrix (or a data frame, see ‘Details’) specifying how missing values are aggregated. If no matrix is given, the default is used. See ‘Examples’.
`rename`	Logical indicating whether units with only one subunit should be renamed to their unit name. Default is `FALSE`.
`recodedData`	Logical indicating whether the colnames in `dat` are the subunit names (as in `subunits$subunit`) or the recoded subunit names (as in `subunits$subunitRecoded`). Default is `TRUE`, meaning that the colnames are recoded subunit names.
`suppressErr`	Logical indicating whether aggregated cells containing `err` (see ‘Details’) should be recoded to another value. Default is `FALSE`.
`recodeErr`	Character vector of length 1 giving the value to which `err` should be recoded. Default is `"mci"`. This argument is only evaluated when `suppressErr = TRUE`.
`verbose`	Logical. If `TRUE` additional information is printed.

aggregateData aggregates the subunits in a data frame into units, with special treatment of missing values. How missing values are aggregated is specified in the argument aggregatemissings. The rownames and colnames of this n x n matrix correspond to the missing codes in the data (see collapseMissings for supported missing values), plus two additional codes: vc (valid code) and err (error). If aggregatemissings is a data frame, it is coerced to a matrix, with the first column of the data frame used as the rownames of the matrix. A warning is given if the matrix is not symmetrical.
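The coercion and symmetry check described above can be sketched in base R as follows. This is a minimal illustration, assuming that the first column of the data frame holds the row codes and that the remaining column names are the column codes; `toMissingMatrix` is a hypothetical helper, not an eatPrep function, and the package's own implementation may differ.

```r
# Hypothetical helper, not part of eatPrep: illustrates the coercion of a
# data frame to a matrix and the symmetry check described above.
toMissingMatrix <- function(am) {
  if (is.data.frame(am)) {
    # Assumption: the first column holds the row codes.
    rowCodes <- as.character(am[[1]])
    am <- as.matrix(am[, -1, drop = FALSE])
    rownames(am) <- rowCodes
  }
  # Symmetric here means: same codes on both margins and am[i, j] == am[j, i].
  symmetric <- identical(rownames(am), colnames(am)) &&
    identical(unname(am), t(unname(am)))
  if (!symmetric) warning("aggregatemissings is not symmetrical")
  am
}

amDf <- data.frame(code = c("vc", "mbd"),
                   vc   = c("vc", "err"),
                   mbd  = c("err", "mbd"),
                   stringsAsFactors = FALSE)
amMat <- toMissingMatrix(amDf)
amMat["vc", "mbd"]  # "err"
```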

aggregateData combines the subunits one by one: it aggregates the first two subunits of a unit, then combines the result with the third subunit, and continues in this manner until all subunits of the unit are aggregated. At every step, the value of the first variable (the first subunit in the first step, and the aggregated variable from the previous step thereafter) is matched with the rownames of aggregatemissings, and the corresponding value of the second variable (the next subunit to be aggregated) is matched with its colnames. The new value of the aggregated variable is therefore aggregatemissings[firstVar, secondVar]. If the final aggregated value is vc, either the sum or the mean of the subunits is calculated, depending on the rule given in units$unitAggregateRule; SUM is the default if this column is empty.
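The stepwise lookup can be illustrated with a small base R sketch. `aggregateUnitSketch` and `amSmall` are hypothetical names used only here; the sketch handles one person's values for a single unit and assumes that any value not found among the rownames of the matrix counts as vc. How aggregateData scores subunits that carry a missing code when the final result is still vc is not described above, so the sketch simply leaves them out of the sum or mean.

```r
# Hypothetical sketch, not eatPrep's implementation: aggregate one person's
# subunit values for one unit by folding them pairwise through `am`.
aggregateUnitSketch <- function(values, am, rule = "SUM") {
  # Assumption: any value that is not a missing code counts as "vc".
  status <- ifelse(values %in% rownames(am), values, "vc")
  # Step by step: result <- am[result so far, next subunit].
  result <- Reduce(function(first, second) am[first, second], status)
  if (result != "vc") return(result)
  # Assumption: subunits carrying a missing code are left out of the score.
  scores <- as.numeric(values[status == "vc"])
  # An empty or NA rule falls back to SUM.
  useMean <- !is.na(rule) && toupper(rule) == "MEAN"
  as.character(if (useMean) mean(scores) else sum(scores))
}

codes <- c("vc", "mnr", "mbd")
amSmall <- matrix(c("vc",  "vc",  "err",
                    "vc",  "mnr", "err",
                    "err", "err", "mbd"),
                  nrow = 3, byrow = TRUE, dimnames = list(codes, codes))

aggregateUnitSketch(c("1", "0", "1"), amSmall)   # "2"
aggregateUnitSketch(c("1", "mbd"), amSmall)      # "err"
```

Folding with Reduce mirrors the left-to-right order described above, so the result of each lookup becomes the row index of the next one.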

The user can mark combinations of missing values that cannot occur together within one unit by setting the respective cell in aggregatemissings to err. For example, it is implausible that one subunit was not administered (missing by design, mbd) while another subunit of the same unit was intentionally left blank by the person working on the test booklet (missing by intention, mbi). Therefore, the default matrix maps this combination to err. If the aggregation produces err at any point, a warning is issued. Cells containing err can be recoded to a different value with the arguments suppressErr and recodeErr.
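A minimal sketch of the warning and recoding step, assuming the aggregated values are held in a character vector; `handleErr` is a hypothetical helper, not part of eatPrep, and only mirrors the behaviour of suppressErr and recodeErr as described above.

```r
# Hypothetical helper, not part of eatPrep: warn about "err" cells and,
# when suppressErr = TRUE, replace them with recodeErr.
handleErr <- function(x, suppressErr = FALSE, recodeErr = "mci") {
  isErr <- !is.na(x) & x == "err"
  if (any(isErr)) {
    # The warning is issued whether or not the cells are recoded.
    warning(sum(isErr), " aggregated value(s) are 'err'")
    if (suppressErr) x[isErr] <- recodeErr
  }
  x
}

aggregated <- c("2", "err", "mbd")
handleErr(aggregated, suppressErr = TRUE)  # warns; returns "2" "mci" "mbd"
```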

Example data frames subunits and units are available via data(inputList), as inputList$subunits and inputList$units.

A data frame with aggregated units and, if rename = TRUE, renamed subunits.

Missing values are aggregated correctly only if their codes match the rownames and colnames of aggregatemissings. aggregateData does not check value types or whether codes are valid. Running checkData and recodeData before aggregateData is therefore strongly recommended.
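Because aggregateData does not validate codes, a quick pre-check can list values that are neither numeric nor among the codes of aggregatemissings. The sketch below is hypothetical (`unknownCodes` is not an eatPrep function) and assumes that valid responses are numeric; it complements rather than replaces checkData.

```r
# Hypothetical pre-check, not part of eatPrep: list values in `dat` that are
# neither numeric nor one of the given codes (e.g. rownames(aggregatemissings)).
unknownCodes <- function(dat, codes) {
  vals <- unique(unlist(lapply(dat, as.character), use.names = FALSE))
  vals <- vals[!is.na(vals)]
  # Assumption: valid responses are numeric, so they need no code.
  isNumeric <- !is.na(suppressWarnings(as.numeric(vals)))
  setdiff(vals[!isNumeric], codes)
}

codes <- c("vc", "mvi", "mnr", "mci", "mbd", "mir", "mbi", "err")
dat <- data.frame(I1a = c("1", "mbd"), I1b = c("mnr", "mbx"),
                  stringsAsFactors = FALSE)
unknownCodes(dat, codes)  # "mbx"
```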

Author(s): Nicole Haag, Anna Lenski

See Also: recodeData, checkData

library(eatPrep)

data(inputDat)
data(inputList)

dat1 <- inputDat[[1]]  # get first dataset from inputDat

# recode data
datRec <- recodeData(dat1, inputList$values, inputList$subunits)
  
# define matrix for missing aggregation (note: this is the default matrix)
am <- matrix(c(
    "vc" , "mvi", "vc" , "mci", "err", "vc" , "vc" , "err",
    "mvi", "mvi", "err", "mci", "err", "err", "err", "err",
    "vc" , "err", "mnr", "mci", "err", "mir", "mnr", "err",
    "mci", "mci", "mci", "mci", "err", "mci", "mci", "err",
    "err", "err", "err", "err", "mbd", "err", "err", "err",
    "vc" , "err", "mir", "mci", "err", "mir", "mir", "err",
    "vc" , "err", "mnr", "mci", "err", "mir", "mbi", "err",
    "err", "err", "err", "err", "err", "err", "err", "err" ),
    nrow = 8, ncol = 8, byrow = TRUE) 

dimnames(am) <- list(
    c("vc" ,"mvi", "mnr", "mci",  "mbd", "mir", "mbi", "err"),
    c("vc" ,"mvi", "mnr", "mci",  "mbd", "mir", "mbi", "err"))
  
print(am)
  
# aggregate subunits into units; cells that end up as err are recoded to "mci"
datAggr <- aggregateData(datRec, inputList$subunits, inputList$units,
    aggregatemissings = am, rename = TRUE, recodedData = TRUE,
    suppressErr = TRUE, recodeErr = "mci", verbose = TRUE)