impimp: Imprecise Imputation for Statistical Matching

Description

Impute a data frame imprecisely

Usage

impimp(recipient, donor, method = c("variable_wise", "case_wise",
  "domain"), matchvars = NULL, vardomains = NULL)

## S3 method for class 'impimp'
print(x, ...)

is.impimp(z)

Arguments

recipient

a data.frame acting as recipient; see details.

donor

a data.frame acting as donor; see details.

method

a single character string giving the desired imputation method. Possible values are "variable_wise" (default), "case_wise" and "domain"; see details for an explanation.

matchvars

a character vector containing the names of the variables to be used as matching variables. If NULL (default), all variables present in both donor and recipient are used as matching variables.

vardomains

a named list containing the possible values of all variables in donor that are not present in recipient.
If set to NULL (default), the list is generated by first coercing all those variables to type factor and then storing their levels.

x

object of class 'impimp'

...

further arguments passed down to print.data.frame

z

object to test for class "impimp"
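
The documented defaults of matchvars and vardomains can be sketched in base R as follows. This only illustrates the behaviour described above and is not the package's internal code; the helper names default_matchvars and default_vardomains are hypothetical.

```r
# Hypothetical helpers illustrating the documented defaults (not package code).
default_matchvars <- function(recipient, donor) {
  # All variable names present in both data frames
  intersect(names(recipient), names(donor))
}

default_vardomains <- function(recipient, donor) {
  # Variables that occur only in the donor
  donor_only <- setdiff(names(donor), names(recipient))
  # Coerce each of them to factor and keep its levels as the domain
  lapply(donor[donor_only], function(v) levels(as.factor(v)))
}

A <- data.frame(x1 = c(1, 0), x2 = c(0, 0), y1 = c(1, 0), y2 = c(2, 2))
B <- data.frame(x1 = c(1, 1, 0), x2 = c(0, 0, 0),
                z1 = c(0, 1, 1), z2 = c(0, 1, 2))

mv <- default_matchvars(A, B)   # "x1" "x2"
vd <- default_vardomains(A, B)  # named list with entries z1 and z2
```

Because the domains are derived from factor levels, they are character vectors even when the underlying variables are numeric.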

Details

As usual in the context of statistical matching, the data.frames recipient and donor are assumed to contain an overlapping set of variables.

For the approaches based on donation classes, the missing values in recipient are substituted with observed values in donor; otherwise they are substituted with the set of all possible values of the variable in question.

For method = "domain" a missing value of a variable in recipient is imputed by the set of all possible values of that variable.
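
A minimal base R sketch of the idea behind method = "domain", under stated assumptions: the helper impute_domain is hypothetical, and the separator "|" used to write a set of values into a single cell is an assumption, not necessarily what the package uses internally.

```r
# Sketch only: impute_domain() and the separator "|" are assumptions.
impute_domain <- function(recipient, vardomains, sep = "|") {
  for (v in names(vardomains)) {
    # Every row receives the whole domain of v as one set-valued entry
    set_value <- paste(vardomains[[v]], collapse = sep)
    recipient[[v]] <- rep(set_value, nrow(recipient))
  }
  recipient
}

A <- data.frame(x1 = c(1, 0), x2 = c(0, 0))
res_domain <- impute_domain(A, list(z1 = 0:5))
```

In this sketch the donor is not consulted at all: the imputed set depends only on the supplied domain, so every recipient row receives the same entry.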

The other methods are based on donation classes, which are formed from the matching variables whose names are provided by matchvars. These variables need to be present in both recipient and donor. For method = "variable_wise" a missing value of a variable in recipient is imputed by the set of all values of that variable observed in donor within the corresponding donation class. For method = "case_wise" the variables only present in donor are represented as tuples. A missing tuple in recipient is then imputed by the set of all tuples observed in donor within the corresponding donation class.
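
The difference between the two donation-class methods can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helpers donation_class and impute_sketch are hypothetical, and the separators "|" (between elements of a set) and "," (between components of a tuple) are assumed for display only.

```r
# Sketch only; helper names and the separators "|" and "," are assumptions.
donation_class <- function(row, donor, matchvars) {
  # Donor rows that agree with the recipient row on all matching variables
  keep <- rep(TRUE, nrow(donor))
  for (m in matchvars) keep <- keep & donor[[m]] == row[[m]]
  donor[keep, , drop = FALSE]
}

impute_sketch <- function(recipient, donor, matchvars,
                          method = c("variable_wise", "case_wise")) {
  method <- match.arg(method)
  donor_only <- setdiff(names(donor), names(recipient))
  out <- lapply(seq_len(nrow(recipient)), function(i) {
    cls <- donation_class(recipient[i, , drop = FALSE], donor, matchvars)
    if (method == "variable_wise") {
      # One set per variable: all values observed in the donation class
      vapply(cls[donor_only],
             function(v) paste(sort(unique(v)), collapse = "|"),
             character(1))
    } else {
      # One set of tuples: all combinations observed in the donation class
      tuples <- unique(do.call(paste, c(cls[donor_only], sep = ",")))
      setNames(paste(tuples, collapse = "|"),
               paste(donor_only, collapse = ","))
    }
  })
  cbind(recipient, as.data.frame(do.call(rbind, out)))
}

A <- data.frame(x1 = c(1, 0), x2 = c(0, 0), y1 = c(1, 0), y2 = c(2, 2))
B <- data.frame(x1 = c(1, 1, 0), x2 = c(0, 0, 0),
                z1 = c(0, 1, 1), z2 = c(0, 1, 2))

res_vw <- impute_sketch(A, B, c("x1", "x2"), "variable_wise")
res_cw <- impute_sketch(A, B, c("x1", "x2"), "case_wise")
```

For the first row of A, the variable-wise sketch yields the sets {0, 1} for both z1 and z2, which admits four combinations, whereas the case-wise sketch keeps only the two tuples actually observed together in the donation class, (0, 0) and (1, 1).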

Value

The data.frame resulting from an imprecise imputation of donor into recipient. It is additionally of class "impimp" and carries three attributes: "impmethod" stores the imputation method, "imputedvarnames" the names of the variables in the resulting object that contain imputed values, and "varlevels" the list of (guessed) levels of each underlying variable.

Reserved characters

The variable names and observations in recipient and donor must not contain characters that are reserved for internal purposes. The characters actually used internally are stored in the options "impimp.obssep" and "impimp.varssep", accessible via options("impimp.obssep") and options("impimp.varssep"). The former separates the values of a set-valued observation, while the latter is used for a concise tuple representation.
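
A data frame could be screened for such characters before calling impimp, for example along the following lines. This is a sketch: the helper has_reserved is hypothetical, and the separators "|" and "," are assumed values, not necessarily the package defaults. With the package loaded, the actual characters could be read via getOption("impimp.obssep") and getOption("impimp.varssep").

```r
# Sketch only: has_reserved() is hypothetical and the default separators
# below are assumed values, not necessarily those used by the package.
has_reserved <- function(df, seps = c("|", ",")) {
  # Collect variable names and all observations as character strings
  strings <- c(names(df), unlist(lapply(df, as.character), use.names = FALSE))
  # TRUE if any string contains any separator (literal match, no regex)
  any(vapply(seps, function(s) any(grepl(s, strings, fixed = TRUE)),
             logical(1)))
}

ok  <- data.frame(x1 = c("a", "b"))
bad <- data.frame(x1 = c("a|b", "c"))

ok_flag  <- has_reserved(ok)
bad_flag <- has_reserved(bad)
```

Matching with fixed = TRUE matters here, because a character such as "|" would otherwise be interpreted as a regular expression operator.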

Note

This method does not require all variables in recipient and donor to be factor variables. However, the imputation methods coerce variables to factor, so purely numerical variables are eventually treated as factors. The method assumes, and tests, that no missing values are present in the matching variables.
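
Both points can be illustrated in base R. The snippet is a sketch: check_matchvars is a hypothetical helper that mimics the kind of pre-check described above and is not a function of the package.

```r
# Numeric values become factor levels, i.e. plain labels without any
# notion of distance between them
z <- c(0, 1, 1, 10)
lv <- levels(as.factor(z))

# Hypothetical pre-check: TRUE if no matching variable contains an NA
check_matchvars <- function(df, matchvars) {
  !any(vapply(df[matchvars], anyNA, logical(1)))
}

complete   <- data.frame(x1 = c(1, 0), x2 = c(0, 0))
incomplete <- data.frame(x1 = c(1, NA), x2 = c(0, 0))

complete_ok   <- check_matchvars(complete, c("x1", "x2"))
incomplete_ok <- check_matchvars(incomplete, c("x1", "x2"))
```

Since the levels are labels only, a numeric variable with many distinct values yields a correspondingly large domain.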

References

Endres, E., Fink, P. and Augustin, T. (2018), Imprecise Imputation: A Nonparametric Micro Approach Reflecting the Natural Uncertainty of Statistical Matching with Categorical Data, Department of Statistics (LMU Munich): Technical Reports, No. 214. URL https://epub.ub.uni-muenchen.de/42423/.

See Also

impest and impestcond for the estimation of probabilities; rbindimpimp for joining two impimp objects

Examples

## Recipient A and donor B share the variables 'x1' and 'x2'
A <- data.frame(x1 = c(1,0), x2 = c(0,0),
                y1 = c(1,0), y2 = c(2,2))
B <- data.frame(x1 = c(1,1,0), x2 = c(0,0,0),
                z1 = c(0,1,1), z2 = c(0,1,2))
## Impute 'z1' and 'z2' into A, separately for each variable
impimp(A, B, method = "variable_wise")

## Specifically setting the possible levels of 'z1'
impimp(A, B, method = "domain", vardomains = list(z1 = 0:5))

impimp documentation built on May 1, 2019, 10:13 p.m.