create.fused: Creates a matched (synthetic) dataset

create.fused R Documentation

Creates a matched (synthetic) dataset

Description

Creates a synthetic data frame after the statistical matching of two data sources at micro level.

Usage

create.fused(data.rec, data.don, mtc.ids, 
                z.vars, dup.x=FALSE, match.vars=NULL)  

Arguments

data.rec

A matrix or data frame that plays the role of recipient in the statistical matching application.

data.don

A matrix or data frame that plays the role of donor in the statistical matching application.

mtc.ids

A matrix with two columns. Each row must contain the name or the index of the recipient record (row) in data.rec and the name or the index of the corresponding donor record (row) in data.don. Note that this type of matrix is returned by the functions NND.hotdeck, RANDwNND.hotdeck, rankNND.hotdeck, and mixed.mtc.

z.vars

A character vector with the names of the variables available only in data.don that should be “donated” to data.rec.

dup.x

Logical. When TRUE the values of the matching variables in data.don are also “donated” to data.rec. The names of the matching variables have to be specified with the argument match.vars. To avoid confusion, the matching variables added to data.rec are renamed by adding the suffix “don”. By default dup.x=FALSE.

match.vars

A character vector with the names of the matching variables. It has to be specified only when dup.x=TRUE.
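
To illustrate the shape expected for mtc.ids, the following base R sketch builds a small two-column matrix by hand. The row labels and the column names rec.id and don.id are invented for this example; in practice the matrix would normally come from one of the matching functions listed under mtc.ids.

```r
# Hypothetical recipient and donor row names, invented for illustration
rec.id <- c("r1", "r2", "r3")
don.id <- c("d7", "d2", "d7")   # the same donor can be matched to several recipients

# Two columns: recipient row name first, matched donor row name second
mtc.ids <- cbind(rec.id = rec.id, don.id = don.id)

dim(mtc.ids)   # one row per recipient, two columns
```

A matrix of row indexes instead of row names would follow the same layout.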

Details

This function creates the synthetic (or fused) data set once statistical matching has been carried out at micro level, i.e. once a donor record has been identified for each recipient record. For details see D'Orazio et al. (2006).
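
The idea can be sketched in base R as follows. This is only a conceptual illustration under simplifying assumptions, not the code used by the package: fuse_sketch is a hypothetical helper, and the toy data frames are invented for the example.

```r
# Conceptual sketch only; NOT the StatMatch implementation.
# fuse_sketch is a hypothetical helper written for this illustration.
fuse_sketch <- function(data.rec, data.don, mtc.ids, z.vars) {
  rec <- data.rec[mtc.ids[, 1], , drop = FALSE]        # recipients, in matched order
  don <- data.don[mtc.ids[, 2], z.vars, drop = FALSE]  # donated variables only
  rownames(don) <- rownames(rec)                       # keep the recipient labels
  cbind(rec, don)
}

# Toy data, invented for illustration
toy.rec <- data.frame(x = c(1, 2, 3), row.names = c("r1", "r2", "r3"))
toy.don <- data.frame(x = c(1.1, 2.9), z = c(10, 30),
                      row.names = c("d1", "d2"))
toy.ids <- cbind(rec.id = c("r1", "r2", "r3"),
                 don.id = c("d1", "d2", "d2"))

toy.fused <- fuse_sketch(toy.rec, toy.don, toy.ids, z.vars = "z")
toy.fused
```

Each recipient row keeps its own variables and gains the z value observed on its matched donor.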

Value

The data frame data.rec with the z.vars filled in and, when dup.x=TRUE, with the values of the matching variables match.vars observed on the donor records.
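
As a rough illustration of how the duplicated matching variables might be told apart from the recipient's own columns when dup.x=TRUE, the sketch below renames a set of donor values in base R. The values are invented, and the "." separator before the suffix is an assumption of this sketch rather than something stated on this page.

```r
# Hypothetical donor values for two matching variables
match.vars <- c("Sepal.Length", "Sepal.Width")
don.x <- data.frame(Sepal.Length = c(5.0, 6.1),
                    Sepal.Width  = c(3.4, 2.8))

# Rename so the donor copies cannot clash with the recipient columns.
# The "." separator is an assumption made for this illustration.
names(don.x) <- paste(match.vars, "don", sep = ".")
names(don.x)
```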

Author(s)

Marcello D'Orazio mdo.statmatch@gmail.com

References

D'Orazio, M., Di Zio, M. and Scanu, M. (2006). Statistical Matching: Theory and Practice. Wiley, Chichester.

See Also

NND.hotdeck, RANDwNND.hotdeck, rankNND.hotdeck

Examples


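library(StatMatch)  # provides NND.hotdeck() and create.fused()

# rows used as recipients: 15 from each iris species
lab <- c(1:15, 51:65, 101:115)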
iris.rec <- iris[lab, c(1:3,5)]  # recipient data.frame
iris.don <- iris[-lab, c(1:2,4:5)] # donor data.frame

# Now iris.rec and iris.don have the variables
# "Sepal.Length", "Sepal.Width"  and "Species"
# in common.
#  "Petal.Length" is available only in iris.rec
#  "Petal.Width"  is available only in iris.don

# find the closest donors using NND hot deck;
# distances are computed on "Sepal.Length" and "Sepal.Width"

out.NND <- NND.hotdeck(data.rec=iris.rec, data.don=iris.don,
            match.vars=c("Sepal.Length", "Sepal.Width"), 
            don.class="Species")

# create the synthetic data set, without
# duplicating the matching variables

fused.0 <- create.fused(data.rec=iris.rec, data.don=iris.don, 
            mtc.ids=out.NND$mtc.ids, z.vars="Petal.Width")

# create the synthetic data set, with the "duplication"
# of the matching variables

fused.1 <- create.fused(data.rec=iris.rec, data.don=iris.don,
            mtc.ids=out.NND$mtc.ids, z.vars="Petal.Width",
            dup.x=TRUE, match.vars=c("Sepal.Length", "Sepal.Width"))

StatMatch documentation built on March 18, 2022, 6:55 p.m.