getdist: L1 Distance
In kbal: Kernel Balancing

View source: R/functions.R


Description

Calculates the L1 distance between the treated or population units and the kernel balanced control or sampled units.

Usage

getdist(
  target,
  observed,
  K,
  w.pop = NULL,
  w = NULL,
  numdims = NULL,
  ebal.tol = 1e-06,
  ebal.maxit = 500,
  svd.U = NULL
)

Arguments

`target`	a numeric vector of length equal to the total number of units where population/treated units take a value of 1 and sample/control units take a value of 0.
`observed`	a numeric vector of length equal to the total number of units where sampled/control units take a value of 1 and population/treated units take a value of 0.
`K`	the kernel matrix
`w.pop`	an optional vector input to specify population weights. Must be of length equal to the total number of units (rows in `svd.U`) with all sampled units receiving a weight of 1. The sum of the weights for population units must be either 1 or the number of population units.
`w`	an optional numeric vector of weights for every observation. These weights should sum to the total number of units: treated or population units have a weight of 1, and control or sample units have weights derived from kernel balancing with mean 1, consistent with the output of `getw()`. If unspecified, these weights are found internally with `ebalance_custom()`, using the first `numdims` dimensions of `svd.U`, the left singular vectors of the kernel matrix.
`numdims`	an optional numeric input specifying the number of columns of the singular value decomposition of the kernel matrix to use for finding weights when `w` is not specified.
`ebal.tol`	an optional numeric input specifying the tolerance level used by the custom entropy balancing function `ebalance_custom()` when `w` is not specified. Default is `1e-6`.
`ebal.maxit`	the maximum number of iterations in the optimization search used by `ebalance_custom()` when `w` is not specified. Default is `500`.
`svd.U`	an optional matrix of left singular vectors from performing `svd()` on the kernel matrix, used when `w` is unspecified. If neither `w` nor `svd.U` is specified, the SVD of `K` is computed internally.
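
As a small illustration of how `target`, `observed`, and `w.pop` relate, the following base R sketch builds the two indicator vectors from a hypothetical treatment vector `treat` (not part of kbal) and checks the constraints described above. The uniform `w.pop` is an assumption chosen so that the population weights sum to the number of population units.

```r
# Hypothetical treatment indicator for 6 units (1 = treated/population)
treat <- c(1, 0, 0, 1, 0, 1)

target   <- as.numeric(treat == 1)  # 1 for treated/population units
observed <- 1 - target              # 1 for control/sampled units

# Every unit belongs to exactly one group
stopifnot(all(target + observed == 1))

# Uniform population weights (an assumption): sampled units get 1, and
# the population weights sum to the number of population units
w.pop <- rep(1, length(target))
stopifnot(all(w.pop[observed == 1] == 1))
stopifnot(sum(w.pop[target == 1]) == sum(target))
```

Deriving `observed` as `1 - target` mirrors the call pattern used in the Examples section.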

Value

`L1`	a numeric scalar giving the L1 distance, computed from the absolute differences between the entries of `pX_D1` and `pX_D0w`
`w`	numeric vector of weights used
`pX_D1`	a numeric vector of length equal to the total number of observations where the nth entry is the sum of the kernel distances from the nth unit to every treated or population unit. If population units are specified, this sum is weighted by `w.pop` accordingly.
`pX_D0`	a numeric vector of length equal to the total number of observations where the nth entry is the sum of the kernel distances from the nth unit to every control or sampled unit.
`pX_D0w`	a numeric vector of length equal to the total number of observations where the nth entry is the weighted sum of the kernel distances from the nth unit to every control or sampled unit. The weights are given by entropy balancing and produce mean balance on `\phi(X)`, the expanded features of `X` using a given kernel `\phi(.)`, between the control or sample group and the treated group or target population.
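
To make the returned quantities more concrete, here is a rough base R sketch of how values like these could be computed on toy data. It is not the package's implementation: the toy data, the bandwidth `b`, the uniform stand-in weights, the rescaling of each vector to sum to 1, and the factor of 0.5 are all assumptions made for illustration, and `getdist()` may scale things differently.

```r
set.seed(1)

# Toy data (hypothetical): 8 units, 2 covariates; first 3 units are treated
X <- matrix(rnorm(16), nrow = 8, ncol = 2)
target   <- c(1, 1, 1, 0, 0, 0, 0, 0)
observed <- 1 - target

# Gaussian kernel with an arbitrary bandwidth b (assumption)
b <- 2
K <- exp(-as.matrix(dist(X))^2 / b)

# Stand-in weights: 1 for every unit, so control weights have mean 1
# (real weights would come from entropy balancing)
w <- rep(1, nrow(K))

K_t <- K[target == 1, , drop = FALSE]    # rows for treated units
K_c <- K[observed == 1, , drop = FALSE]  # rows for control units

# nth entry: sum of kernel values from unit n to each group
pX_D1  <- colSums(K_t)
pX_D0  <- colSums(K_c)
pX_D0w <- as.vector(w[observed == 1] %*% K_c)

# Assumed normalization: rescale each vector to sum to 1, then take
# half the summed absolute difference
normalize <- function(v) v / sum(v)
L1 <- 0.5 * sum(abs(normalize(pX_D1) - normalize(pX_D0w)))
L1
```

With uniform stand-in weights, `pX_D0w` equals `pX_D0`; weights from kernel balancing would instead move `pX_D0w` toward `pX_D1` and reduce the distance.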

Examples


#loading and cleaning lalonde data
library(kbal)
set.seed(123)
data("lalonde")
# Select a random subset of 500 rows
lalonde_sample <- sample(1:nrow(lalonde), 500, replace = FALSE)
lalonde <- lalonde[lalonde_sample, ]

xvars <- c("age", "black", "educ", "hisp", "married",
           "re74", "re75", "nodegr", "u74", "u75")

#need to first build gaussian kernel matrix
K_pass <- makeK(allx = lalonde[,xvars])
#also need the SVD of this matrix
svd_pass <- svd(K_pass)

#running without passing weights in directly, using numdims=33
l1_lalonde <- getdist(target = lalonde$nsw,
                      observed = 1-lalonde$nsw,
                      K = K_pass,
                      svd.U = svd_pass$u,
                      numdims = 33)

#alternatively, we can get the weights ourselves and pass them in directly
#using the first 33 dims of svd_pass$u to match the above
w_opt <- getw(target = lalonde$nsw,
              observed = 1-lalonde$nsw,
              svd.U = svd_pass$u[,1:33])$w
l1_lalonde2 <- getdist(target = lalonde$nsw,
                       observed = 1-lalonde$nsw,
                       K = K_pass,
                       w = w_opt)

kbal documentation built on April 3, 2025, 6:04 p.m.