stat_match: Statistical matching with StatMatch.

stat_match R Documentation

Statistical matching with StatMatch.

Description

Statistical matching with StatMatch.

Usage

stat_match(data.rec = data.rec, data.don = data.don, rec.id = rec.id,
  don.id = don.id, match.vars = match.vars, don.class = NULL,
  by.var = NULL, method = method, verbose = TRUE, parallel = FALSE, ...)

Arguments

data.rec

Recipient data. This data frame must contain the variables (columns) that are used, directly or indirectly, in the matching application. Missing values (NA) are allowed.

data.don

Donor data. The variables (columns) involved, directly or indirectly, in the computation of distance must be the same and of the same type as those in data.rec.

rec.id

Recipient id in recipient data.

don.id

Donor id in donor data.

match.vars

A character vector with the names of the matching variables (the columns in both the data frames) that have to be used to compute distances among records (rows) in data.rec and those in data.don.

don.class

A character vector with the names of the variables (columns in both data frames) that identify the donation classes. When donation classes are used, distances are computed only between units of data.rec and data.don that belong to the same donation class. Empty donation classes should be avoided. The variables chosen to create the donation classes should not contain missing values (NAs).

by.var

A variable that segments both recipients and donors into the same groups; statistical matching is then conducted separately within each group. The by.var variable must be present in both data.rec and data.don. If by.var is NULL, statistical matching runs without grouping first. If by.var has many levels (for example, more than 20), parallel processing is recommended.

parallel

Logical. When TRUE, statistical matching runs in parallel (by calling the parLapply function from the parallel package). Default is FALSE.

don.class2

The second level of don.class. If statistical matching fails using don.class, the function reruns the match using don.class2, if provided.

don.class3

The third level of don.class. If statistical matching fails using both don.class and don.class2, the function reruns the match using don.class3, if provided.

dist.fun

Distance function. The following distances are allowed: "Manhattan" (aka "City block"; default), "Euclidean", "Mahalanobis", "exact" or "exact matching", "Gower", "minimax", or one of the distance functions available in the package proxy.

k

The number of times that a unit in data.don can be selected as a donor when constrained=TRUE. Default value is 15.

constrained

Logical. When constrained=FALSE (default), each record in data.don can be used as a donor more than once. By contrast, when constrained=TRUE, each record in data.don can be used as a donor at most k times. In this case, the set of donors is selected by solving an optimization problem that minimizes the overall matching distance. See the description of the argument constr.alg for details. Set this option to FALSE if statistical matching keeps failing.

constr.alg

A string that has to be specified when constrained=TRUE. Two choices are available: "lpSolve" and "hungarian". Note that the Hungarian algorithm is faster and more efficient than constr.alg="lpSolve", but it allows a donor to be selected just once, i.e. k = 1. Default: "lpSolve".
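The by.var and parallel arguments describe running the match separately within each group, optionally through parLapply. The following is only a rough sketch of that pattern under stated assumptions, not the package's implementation: the toy data, the `match_group` helper, and the simple nearest-donor rule on a single variable `x` are all hypothetical and chosen for illustration.

```r
library(parallel)

# Toy recipient and donor data sharing a grouping column (illustrative only).
rec <- data.frame(id = 1:4, MAG = c("A", "A", "B", "B"), x = c(1, 4, 2, 7))
don <- data.frame(id = 101:104, MAG = c("A", "A", "B", "B"), x = c(0, 5, 3, 6))

# Hypothetical per-group worker: for each recipient in group g, pick the
# donor in the same group with the smallest absolute difference on x.
match_group <- function(g, rec, don) {
  r <- rec[rec$MAG == g, ]
  d <- don[don$MAG == g, ]
  nearest <- vapply(r$x, function(v) which.min(abs(d$x - v)), integer(1))
  data.frame(rec.id = r$id, don.id = d$id[nearest], MAG = g)
}

groups <- unique(rec$MAG)

# One task per group; extra arguments are forwarded to the worker function.
cl <- makeCluster(2)
pieces <- parLapply(cl, groups, match_group, rec = rec, don = don)
stopCluster(cl)

out <- do.call(rbind, pieces)
```

With only a few groups the cluster start-up cost usually outweighs the gain, which is consistent with the advice to reserve parallel processing for a by.var with many levels.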

Value

A synthetic data frame resulting from the statistical matching of the two data sources. The data frame includes all the columns in data.rec plus the don.id from data.don and a variable called "MatchLevel". MatchLevel can be 1 (matched at the don.class level), 2 (matched at the don.class2 level), 3 (matched at the don.class3 level) or 0 (statistical matching unsuccessful).
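To make the MatchLevel values concrete, here is a minimal base R sketch of a fallback cascade over successively coarser donation classes. It is an illustration under assumptions, not the function's actual code: the toy data, the `match_one` helper, and the use of a single matching variable `x` with Manhattan distance are hypothetical, and only two class levels are shown.

```r
# Toy data; all names below are illustrative and not taken from the package.
rec <- data.frame(id = 1:3,
                  age = c("18-34", "35-54", "55+"),
                  gender = c("F", "M", "F"),
                  x = c(1.0, 5.0, 9.0),
                  stringsAsFactors = FALSE)
don <- data.frame(id = 101:103,
                  age = c("18-34", "18-34", "35-54"),
                  gender = c("F", "F", "F"),
                  x = c(0.5, 3.0, 8.0),
                  stringsAsFactors = FALSE)

class_levels <- list(c("age", "gender"),  # plays the role of don.class
                     "gender")            # plays the role of don.class2

# For one recipient row, try each class definition in turn and return the
# nearest donor (Manhattan distance on match_vars) within the first
# non-empty donation class. Level 0 means no donor was found at any level.
match_one <- function(r, don, class_levels, match_vars) {
  for (lvl in seq_along(class_levels)) {
    same <- rep(TRUE, nrow(don))
    for (v in class_levels[[lvl]]) same <- same & don[[v]] == r[[v]]
    if (any(same)) {
      cand <- don[same, , drop = FALSE]
      d <- rowSums(abs(sweep(as.matrix(cand[match_vars]), 2,
                             unlist(r[match_vars]))))
      return(data.frame(don.id = cand$id[which.min(d)], MatchLevel = lvl))
    }
  }
  data.frame(don.id = NA_integer_, MatchLevel = 0)
}

res <- do.call(rbind, lapply(seq_len(nrow(rec)), function(i)
  match_one(rec[i, ], don, class_levels, "x")))
matched <- cbind(rec, res)
```

In this toy run the first recipient finds a donor in its age-by-gender class (level 1), the third has to fall back to gender alone (level 2), and the second has no donor of the same gender at all, so it ends with level 0.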

Examples

library(dplyr)  # provides %>% and filter()

data(mag)  # loads the data set as an object named mag

rec <- mag %>% filter(type == 1)
don <- mag %>% filter(type == 0)

don.class  = c("age", "gender", "WhiteNH")
don.class2 = c("age", "gender")
don.class3 = c("gender")

X.mtc   =  c("NFAC1_2", "NFAC2_2", "NFAC3_2", "NFAC4_2", "NFAC5_2", "NFAC6_2", "NFAC7_2", "childhh", "agemid", "incmid", "ethnic", "maritalstat",
             "educat", "homestat", "employstat", "dvryes", "cabdsl")

out.nnd.c <- stat_match(data.rec = rec,
                        data.don = don,
                        rec.id = "BOOK_ID",
                        don.id = "RESPID",
                        match.vars = X.mtc,
                        don.class  = don.class,
                        don.class2 = don.class2,
                        don.class3 = don.class3,
                        by.var = "MAG",
                        k = 50,
                        constrained = TRUE)
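
Once a result is available, the MatchLevel column gives a quick check of how often the coarser donation classes were needed. The sketch below uses a small made-up data frame standing in for the function's output, so the values are hypothetical; only the column names BOOK_ID and MatchLevel come from this page.

```r
# Hypothetical stand-in for a stat_match() result (values are made up).
out <- data.frame(BOOK_ID = 1:6,
                  MatchLevel = c(1, 1, 0, 2, 3, 0))

# Count recipients at every level, including levels that do not occur.
level_counts <- table(factor(out$MatchLevel, levels = 0:3))

# Recipients for which matching was unsuccessful at all levels.
unmatched <- out[out$MatchLevel == 0, ]
```

A large share of level 0 rows may be a hint to coarsen the donation classes or to try constrained = FALSE, as suggested in the argument descriptions.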


yangx227/SimmonsResearchR documentation built on April 24, 2022, 6:44 a.m.