geoweight: Split weights across geographies
In donboyd5/microweight: Tools for Weighting Microdata Files

geoweight calculates state weights for each household in a microdata file. Each household's state weights add up to its total household weight, and the weights are chosen so that weighted state totals for selected characteristics hit, or come close to, desired targets.
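To make the two requirements concrete, here is a minimal base-R sketch with made-up numbers. The `shares` matrix is an arbitrary assumption for illustration, not output from geoweight; the sketch only shows what "state weights that add up to the household weight" and "weighted state totals" mean.

```r
# Hypothetical toy data: 4 households (h), 2 states (s), 2 characteristics (k)
wh <- c(10, 20, 30, 40)                      # household total weights
xmat <- matrix(c(1, 0, 1, 1,                 # column x1
                 5, 2, 0, 3),                # column x2
               nrow = 4,
               dimnames = list(NULL, c("x1", "x2")))

# Arbitrary split of each household across the two states (rows sum to 1)
shares <- matrix(c(0.50, 0.50,
                   0.25, 0.75,
                   0.80, 0.20,
                   0.10, 0.90),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(NULL, c("geo1", "geo2")))

# h x s matrix of state weights: row i of shares scaled by wh[i]
whs <- wh * shares

# Requirement 1: each household's state weights sum to its total weight
all.equal(unname(rowSums(whs)), wh)

# Requirement 2 is about these s x k weighted state totals, which
# geoweight tries to bring close to the targets
state_totals <- t(whs) %*% xmat
state_totals
```

In this sketch the shares are fixed by hand, so only the first requirement is guaranteed; geoweight's job is to find weights that also bring `state_totals` close to `targets`.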

geoweight(
  wh,
  xmat,
  targets,
  dweights = get_dweights(targets),
  betavec = rep(0, length(targets)),
  method = "LM",
  maxiter = NULL,
  optlist = NULL,
  quiet = TRUE
)

`wh`	Household weights, 1 per household, numeric vector length h. Each household's geography weights must sum to its household weight.
`xmat`	Data for households. Matrix with 1 row per household and 1 column per characteristic (h x k matrix). Columns can be named.
`targets`	Targeted values. Matrix with 1 row per geographic area and 1 column per characteristic. If columns are named, names must match column names of `xmat`. Rownames can be used to identify geographic areas. If unnamed, rows will be named geo1, geo2, ..., geo_s
`dweights`	Difference weights: weighting factors applied to the differences between calculated and targeted values, one per target (s x k, matching `targets`). Defaults to get_dweights(targets).
`betavec`	optional vector of initial guess at parameters, length s * k; default is zero for all
`method`	optional parameter for approach to use; must be one of c('LM', 'Broyden', 'Newton'); default is 'LM'
`maxiter`	maximum number of iterations; integer. Defaults vary by method: LM (the default method): 200; Broyden: 2000; Newton: 200.
`optlist`	list of options used to update the nleqslv or nls.lm options, depending on the method chosen
`quiet`	logical, TRUE or FALSE; TRUE is the default, as shown in the usage. Set to FALSE to print nleqslv or nls.lm output.
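The shape requirements in the argument list can be checked before calling geoweight. The sketch below is a hypothetical helper, `check_inputs`, which is not part of microweight; it assumes that matching column names also appear in the same order, which is stricter than the description above requires.

```r
# Hypothetical helper (not part of microweight): verify that the inputs
# have compatible shapes and return the problem dimensions h, s, k.
check_inputs <- function(wh, xmat, targets) {
  stopifnot(
    is.numeric(wh), is.matrix(xmat), is.matrix(targets),
    length(wh) == nrow(xmat),     # one weight per household (h)
    ncol(targets) == ncol(xmat)   # one target column per characteristic (k)
  )
  # If both matrices have column names, they should agree
  # (assumption in this sketch: same names in the same order)
  if (!is.null(colnames(xmat)) && !is.null(colnames(targets))) {
    stopifnot(identical(colnames(xmat), colnames(targets)))
  }
  c(h = nrow(xmat), s = nrow(targets), k = ncol(xmat))
}

# Made-up inputs: 5 households, 3 geographic areas, 2 characteristics
h <- 5; s <- 3; k <- 2
wh <- rep(100, h)
xmat <- matrix(seq_len(h * k), nrow = h,
               dimnames = list(NULL, c("income", "wages")))
targets <- matrix(1, nrow = s, ncol = k,
                  dimnames = list(NULL, c("income", "wages")))

dims <- check_inputs(wh, xmat, targets)
dims

# The default initial guess from the usage: one zero per target (s * k)
betavec <- rep(0, length(targets))
length(betavec) == s * k
```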

geoweight uses either the nleqslv solver (methods 'Broyden' and 'Newton') or the nls.lm solver (method 'LM'), depending on the `method` argument.

The default method, LM, uses nls.lm because it appears to be the most robust of the methods: it rarely fails and often produces a better optimum than Broyden or Newton. However, in some circumstances one of the latter may work better, and it is hard to give guidelines for when a particular method will be better. The Broyden method can be faster or more robust than the Newton method. It generally requires many more iterations than Newton, but each iteration is faster.
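For readers unfamiliar with the two solvers, the following sketch calls them directly on a small toy system. It is not how geoweight invokes them internally; the residual function `resid_fn` and the starting point are assumptions made up for illustration, and the iteration limits simply echo the `maxiter` defaults listed above. The calls are guarded so the sketch runs even if the nleqslv or minpack.lm packages are not installed.

```r
# Toy residual function (illustrative assumption): both residuals are
# zero at b = c(log(2), 1 - log(2))
resid_fn <- function(b) {
  c(exp(b[1]) - 2,
    b[1] + b[2] - 1)
}
start <- c(0, 0)

# nleqslv solves resid_fn(b) = 0 with Newton or Broyden updates
if (requireNamespace("nleqslv", quietly = TRUE)) {
  newton <- nleqslv::nleqslv(start, resid_fn, method = "Newton",
                             control = list(maxit = 200))
  broyden <- nleqslv::nleqslv(start, resid_fn, method = "Broyden",
                              control = list(maxit = 2000))
  print(newton$x)
  print(broyden$message)
}

# nls.lm minimises the sum of squared residuals (Levenberg-Marquardt)
if (requireNamespace("minpack.lm", quietly = TRUE)) {
  lm_fit <- minpack.lm::nls.lm(
    par = start, fn = resid_fn,
    control = minpack.lm::nls.lm.control(maxiter = 200)
  )
  print(lm_fit$par)
  print(lm_fit$message)
}
```

The two interfaces differ in emphasis: nleqslv looks for a root of the residual function, while nls.lm minimises the sum of squared residuals, so nls.lm can still return a best-fit answer when the residuals cannot all be driven to zero.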

A list with the following elements:

h: number of households (or individuals, records, tax returns, etc.)
s: number of states (or other geographies or subgroups)
k: number of characteristics each household has
solver_message: message from the solver that was used
etime: elapsed time
beta_opt_mat: s x k matrix of optimal parameters
whs: h x s matrix of state weights for each household, computed using the optimal parameters
wh: the input vector of household total weights, length h
xmat: matrix of data for households, h x k
dweights: optional vector of weighting factors for targets, length s * k
output: list of output from the solver that was used
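The returned elements are enough to check fit quality by hand. The sketch below defines a hypothetical helper, `check_fit`, which is not part of microweight. It assumes only a list with the `wh`, `xmat`, and `whs` elements described above, and it is demonstrated on a hand-built mock list so that it runs without the package.

```r
# Hypothetical helper (not part of microweight): summarise fit quality
# from a result-like list that has elements wh, xmat, and whs.
check_fit <- function(res, targets) {
  calc <- t(res$whs) %*% res$xmat   # s x k weighted state totals
  list(
    # largest violation of "state weights sum to the household weight"
    max_rowsum_error = max(abs(rowSums(res$whs) - res$wh)),
    # s x k percent differences between calculated totals and targets
    pct_diff = (calc - targets) / targets * 100
  )
}

# Mock result built by hand (2 households, 2 states, 2 characteristics)
mock <- list(
  wh = c(10, 20),
  xmat = matrix(c(1, 2, 3, 4), nrow = 2),
  whs = matrix(c(4, 5, 6, 15), nrow = 2)
)

# Made-up targets set 1% above the calculated totals
mock_targets <- t(mock$whs) %*% mock$xmat * 1.01

fit <- check_fit(mock, mock_targets)
fit$max_rowsum_error
fit$pct_diff
```

With an actual result, the same helper could be called as `check_fit(res, targets)`, where `res` is the list returned by geoweight and `targets` is the matrix passed to it.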

library(microweight)

# Example 1: Determine state weights for a simple problem with random data
p <- make_problem(h=10, s=3, k=2)
dw <- get_dweights(p$targets)

res1 <- geoweight(wh = p$wh, xmat = p$xmat, targets = p$targets,
  dweights = dw)

res2 <- geoweight(wh = p$wh, xmat = p$xmat, targets = p$targets,
  dweights = dw, method = 'Newton')

res3 <- geoweight(wh = p$wh, xmat = p$xmat, targets = p$targets,
  dweights = dw, method = 'Broyden')

res1
res2
res3
c(res1$sse_unweighted, res2$sse_unweighted, res3$sse_unweighted)

# verify that the state weights produce the desired targets
t(res2$whs) %*% p$xmat
p$targets