rankNND.hotdeck | R Documentation |
This function implements the rank hot deck distance method. For each recipient record, the closest donor is chosen by considering the distance between the percentage points of the empirical cumulative distribution function.
rankNND.hotdeck(data.rec, data.don, var.rec, var.don=var.rec,
don.class=NULL, weight.rec=NULL, weight.don=NULL,
constrained=FALSE, constr.alg="Hungarian",
keep.t=FALSE)
data.rec |
A numeric matrix or data frame that plays the role of recipient. This data frame must contain the variable Missing values ( |
data.don |
A matrix or data frame that plays the role of donor. This data frame must contain the variable |
var.rec |
A character vector with the name of the variable in |
var.don |
A character vector with the name of the variable |
don.class |
A character vector with the names of the variables (columns in both the data frames) that identify donation classes. In each donation class the computation of percentage points is carried out independently. Then only distances between percentage points of the units in the same donation class are computed. The case of empty donation classes should be avoided. It would be preferable that the variables used to form donation classes are defined as When not specified (default), no donation classes are used. |
weight.rec |
Eventual name of the variable in |
weight.don |
Eventual name of the variable in |
constrained |
Logical. When |
constr.alg |
A string that has to be specified when |
keep.t |
Logical, when donation classes are used by setting |
This function finds a donor record for each record in the recipient data set. The chosen donor is the one at the closest distance in terms of empirical cumulative distribution (Singh et al., 1990). In practice, the distance is computed by considering the estimated empirical cumulative distribution of the reference variable (var.rec and var.don) in data.rec and data.don. The empirical cumulative distribution function is estimated by:
\hat{F}(y) = \frac{1}{n} \sum_{i=1}^{n} I(y_i\leq y)
where I(y_i\leq y)=1 if y_i\leq y and 0 otherwise.
In the presence of weights, the empirical cumulative distribution function is estimated by:
\hat{F}(y) = \frac{\sum_{i=1}^{n} w_i I(y_i\leq y)}{\sum_{i=1}^{n} w_i}
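The two estimators can be illustrated with a minimal base R sketch. The helper name wecdf and the toy vectors are hypothetical and are not part of StatMatch; the code evaluates the formulas directly rather than reproducing the package internals.

```r
# Hypothetical helper (not part of StatMatch): weighted ECDF evaluated at
# each observed value, F(y_j) = sum_i w_i * I(y_i <= y_j) / sum_i w_i.
wecdf <- function(y, w = rep(1, length(y))) {
  stopifnot(length(y) == length(w), all(w >= 0), sum(w) > 0)
  vapply(y, function(v) sum(w[y <= v]) / sum(w), numeric(1))
}

y <- c(20, 35, 35, 50)  # made-up values of the reference variable
w <- c(1, 2, 1, 4)      # made-up case weights

F_unw <- wecdf(y)       # equal weights reproduce the unweighted estimator
F_w   <- wecdf(y, w)    # weighted percentage points
```

With equal weights the helper agrees with stats::ecdf(y)(y); with the weights above, the tied values at 35 share the percentage point 0.5.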
In the unconstrained case, when several donors are at the same minimum distance from a recipient, one of them is chosen at random.
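The unconstrained search can be sketched in base R as follows. This is an illustration only: the data are made up, the distance is assumed to be the absolute difference between percentage points, and pick_one is a hypothetical helper, not a StatMatch function.

```r
set.seed(1)  # only to make the random tie-breaking reproducible

rec <- c(22, 40, 61)              # recipient values (made-up data)
don <- c(18, 25, 33, 47, 58, 70)  # donor values (made-up data)

# Percentage points: each file's ECDF evaluated at its own observations
F_rec <- ecdf(rec)(rec)
F_don <- ecdf(don)(don)

# Assumed distance: absolute difference of percentage points,
# recipients in rows and donors in columns
d <- abs(outer(F_rec, F_don, "-"))

# For each recipient, pick one donor at random among those at minimum distance
pick_one <- function(row) {
  cand <- which(row == min(row))
  if (length(cand) == 1L) cand else sample(cand, 1L)
}
donor_idx <- apply(d, 1, pick_one)
```

Here each recipient has a single donor at distance zero (donors 2, 4 and 6), so the random step is never triggered; the explicit length check avoids the behaviour of sample() when given a single integer.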
When donation classes are introduced, the empirical cumulative distribution function is estimated independently in each donation class, and the search for a donor is restricted to the donors in the same donation class as the recipient.
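A minimal base R sketch of the class-wise computation is given below. The data frames, the class variable area and the helper match_in_class are hypothetical; the distance is again assumed to be the absolute difference between percentage points, and ties are resolved by taking the first minimum rather than at random.

```r
# Made-up data; "area" is a hypothetical donation-class variable
rec <- data.frame(area = c("N", "N", "S"), age = c(30, 50, 40))
don <- data.frame(area = c("N", "N", "S", "S"), age = c(28, 55, 35, 60))

# ave() applies the ECDF separately within each level of 'area'
rec$pp <- ave(rec$age, rec$area, FUN = function(v) ecdf(v)(v))
don$pp <- ave(don$age, don$area, FUN = function(v) ecdf(v)(v))

# Search donors only inside the recipient's own donation class
match_in_class <- function(i) {
  pool <- which(as.character(don$area) == as.character(rec$area[i]))
  # first donor at minimum distance; no random tie-breaking in this sketch
  pool[which.min(abs(don$pp[pool] - rec$pp[i]))]
}
donor_row <- vapply(seq_len(nrow(rec)), match_in_class, integer(1))
```

The recipient in class "S" is the only unit of its class, so its percentage point is 1 and it is matched to the donor with the largest age in "S", not to the donor with the closest raw age.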
By default, a donor can be chosen more than once. To avoid this, set constrained=TRUE. In that case each donor can be chosen only once, and the donors are selected by solving a transportation problem whose objective is to minimize the overall matching distance (the sum of the recipient-donor distances).
An R list with the following components:
mtc.ids |
A matrix with the same number of rows of |
dist.rd |
A vector with the distances between each recipient unit and the corresponding donor. |
noad |
The number of available donors at the minimum distance for each recipient unit (only in the unconstrained case). |
call |
How the function has been called. |
Marcello D'Orazio mdo.statmatch@gmail.com
D'Orazio, M., Di Zio, M. and Scanu, M. (2006). Statistical Matching: Theory and Practice. Wiley, Chichester.
Singh, A.C., Mantel, H., Kinack, M. and Rowe, G. (1993). “Statistical matching: use of auxiliary information as an alternative to the conditional independence assumption”. Survey Methodology, 19, 59–79.
NND.hotdeck
data(samp.A, samp.B, package="StatMatch") #loads data sets
# samp.A plays the role of recipient
?samp.A
# samp.B plays the role of donor
?samp.B
# rankNND.hotdeck()
# donation classes formed using "area5"
# ecdf computed on "age"
# UNCONSTRAINED case
out.1 <- rankNND.hotdeck(data.rec=samp.A, data.don=samp.B, var.rec="age",
don.class="area5")
fused.1 <- create.fused(data.rec=samp.A, data.don=samp.B,
mtc.ids=out.1$mtc.ids, z.vars="labour5")
head(fused.1)
# as before but ecdf estimated using weights
# UNCONSTRAINED case
out.2 <- rankNND.hotdeck(data.rec=samp.A, data.don=samp.B, var.rec="age",
don.class="area5",
weight.rec="ww", weight.don="ww")
fused.2 <- create.fused(data.rec=samp.A, data.don=samp.B,
mtc.ids=out.2$mtc.ids, z.vars="labour5")
head(fused.2)