otmatch: Statistical Matching using Optimal transport

View source: R/otmatch.R

Description

This function performs statistical matching between two complex survey samples, each with its own weighting scheme, by solving an optimal transport problem. It relies on the function transport of the package transport.
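The optimal transport step can be sketched on small samples as follows. This is a minimal sketch under stated assumptions, not the source of otmatch. It builds a cost matrix with proxy::dist and solves the transport problem with transport::transport. It assumes the cost is the plain distance between units, although the package may transform it. It also omits the EPS rounding and the assembly of the output data.frame. The toy matrices and weights are hypothetical. The weights are chosen so that both samples carry the same total mass.

```r
library(proxy)      # provides dist() for cross-distances between two matrices
library(transport)  # provides transport() for discrete optimal transport

set.seed(1)
X1 <- matrix(rnorm(4 * 2), nrow = 4)  # hypothetical sample 1: 4 units, 2 matching variables
X2 <- matrix(rnorm(6 * 2), nrow = 6)  # hypothetical sample 2: 6 units, same variables
w1 <- rep(15, 4)                      # hypothetical weights of sample 1 (total 60)
w2 <- rep(10, 6)                      # hypothetical weights of sample 2 (total 60)

# Cost matrix: distance between every unit of sample 1 and every unit of sample 2
C <- as.matrix(proxy::dist(X1, X2, method = "Euclidean"))

# Optimal transport plan moving the weights w1 onto the weights w2 at minimal total cost
plan <- transport::transport(w1, w2, costm = C, method = "shortsimplex")
print(plan)  # one row per matched pair: columns from, to, mass
```

Each row of plan links a unit of sample 1 (from) to a unit of sample 2 (to) with a transported weight (mass). Because the plan is a transport plan, summing mass by from recovers w1, and summing it by to recovers w2.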

Usage

otmatch(
  X1,
  id1,
  X2,
  id2,
  w1,
  w2,
  dist_method = "Euclidean",
  transport_method = "shortsimplex",
  EPS = 1e-09
)

Arguments

X1

A matrix, the matching variables of sample 1.

id1

A character or numeric vector that contains the labels of the units in sample 1.

X2

A matrix, the matching variables of sample 2.

id2

A character or numeric vector that contains the labels of the units in sample 2.

w1

A numeric vector that contains the weights of sample 1, harmonized by the function harmonize.

w2

A numeric vector that contains the weights of sample 2, harmonized by the function harmonize.

dist_method

A string specifying the distance passed to the function dist of the package proxy. Default "Euclidean".

transport_method

A string specifying the algorithm used by the function transport of the package transport. Default "shortsimplex".

EPS

A numeric scalar tolerance used to decide whether a value is rounded to 0. Default 1e-09.

Details

All details of the method can be found in Raphaël Jauslin and Yves Tillé (2021) <arXiv:2105.08379>.

Value

A data.frame that contains the matching. The first two columns contain the unit identifiers of the two samples, and the third column (weight) contains the final weights. All remaining columns contain the matching variables.
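The following sketch shows how the returned data.frame can be used. It builds a hypothetical toy object with the layout described above and sums the final weights by unit. Because the matching comes from a transport plan, these per-unit sums should give back the harmonized weights w1 and w2, up to any values rounded to 0 by EPS; this is an assumption about the output, not a documented guarantee. The column names id1, id2 and x and all values are illustrative only. The code accesses the identifier and weight columns by position.

```r
# Hypothetical toy result with the documented layout:
# column 1 = unit of sample 1, column 2 = unit of sample 2,
# column 3 = final weight, remaining columns = matching variables
toy <- data.frame(id1    = c(1, 1, 2, 3),
                  id2    = c(10, 20, 20, 30),
                  weight = c(5, 5, 10, 10),
                  x      = c(0.5, 0.5, 1.2, -0.3))

# Total final weight received by each unit of sample 1 and of sample 2
w1_back <- tapply(toy[[3]], toy[[1]], sum)
w2_back <- tapply(toy[[3]], toy[[2]], sum)
print(w1_back)
print(w2_back)

# Weighted totals of the matching variables, computed as in the Examples section
tot <- colSums(toy$weight * toy[, 4:ncol(toy), drop = FALSE])
print(tot)
```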

Examples


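#--- SET UP
library(StratifiedSampling)
set.seed(1)  # reproducible sample draws

N <- 1000                           # population size
p <- 5                              # number of matching variables
X <- array(rnorm(N * p), c(N, p))   # population matrix of matching variables

n1 <- 100                           # size of sample 1
n2 <- 200                           # size of sample 2

# simple random sampling without replacement (0/1 indicator vectors of length N)
s1 <- sampling::srswor(n1, N)
s2 <- sampling::srswor(n2, N)

# labels of the selected units
id1 <- (1:N)[s1 == 1]
id2 <- (1:N)[s2 == 1]

# design weights N/n of each sample
d1 <- rep(N / n1, n1)
d2 <- rep(N / n2, n2)

# matching variables of each sample
X1 <- X[s1 == 1, ]
X2 <- X[s2 == 1, ]

#--- HARMONIZATION

re <- harmonize(X1, d1, id1, X2, d2, id2)
w1 <- re$w1
w2 <- re$w2

#--- STATISTICAL MATCHING WITH OT

object <- otmatch(X1, id1, X2, id2, w1, w2)

# compare the weighted totals of the matching variables:
# matched data.frame, sample 1, and sample 2
round(colSums(object$weight * object[, 4:ncol(object)]), 3)
round(colSums(w1 * X1), 3)
round(colSums(w2 * X2), 3)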

StratifiedSampling documentation built on Oct. 26, 2022, 5:09 p.m.