Calculate new weights for each household in a microdata file so that (1) selected variables, weighted with the new weights and summed, hit or come close to desired targets, and (2) a measure of distortion based on how much the new weights differ from an initial set of weights is minimized.
iweights |
Initial weights, 1 per household, numeric vector length h. |
xmat |
Data for households. Matrix with 1 row per household and 1 column per target (h x k matrix). Columns must be named. Each cell is the amount that a household, when weighted, will add to the corresponding target. For example, if the column corresponds to the target for total income, the cell value would be the income for the household. If the column corresponds to the target for number of married households, then the cell value would be 1 if the household is married, and 0 otherwise. |
targets |
Named numeric vector of length k. Each element must correspond
to a column of |
tol |
Additive tolerances, 1 per target. Numeric vector length k.
|
xlb |
Lower bounds for the ratio of new weights to initial weights. Either a vector of 1 value per household or a scalar that will be used for all households. Default is 0. |
xub |
Upper bounds for the ratio of new weights to initial weights. Either a vector of 1 value per household or a scalar that will be used for all households. Default is 50. |
method |
=c("auglag", "ipopt"). auglag is default. ipopt requires
installation of |
optlist |
Named list of allowable options:
If method = "auglag" (default) @seealso |
quiet |
TRUE (default) or FALSE. |
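To make the relationship between the arguments concrete, here is a minimal base R sketch using made-up toy data (4 households, 2 targets; none of these numbers come from the package). It does not call reweight; it only shows how weighted column sums of xmat are compared with targets +/- tol, and how the xlb and xub bounds apply to the ratio of new to initial weights. The candidate new weights are an arbitrary illustration, not a solver result.

```r
# Hypothetical toy inputs (assumed for illustration only)
iweights <- c(10, 20, 30, 40)                    # 1 initial weight per household
xmat <- cbind(income  = c(100, 200, 0, 50),      # amount each household adds
              married = c(1, 0, 1, 1))           # indicator: 1 if married
targets <- c(income = 7500, married = 85)        # named, 1 per column of xmat
tol <- 0.005 * abs(targets)                      # additive tolerances (0.5%)

# Value of each target at the initial weights: weighted column sums
initial_values <- colSums(xmat * iweights)

# An arbitrary candidate set of new weights, expressed as ratios to iweights
ratios <- c(1.1, 1.05, 1.0, 1.1)
new_weights <- iweights * ratios
new_values <- colSums(xmat * new_weights)

# Constraint check: each target must be within +/- tol
within_tol <- abs(new_values - targets) <= tol

# Bounds check on the ratio of new to initial weights (documented defaults)
xlb <- 0
xub <- 50
ratios_ok <- all(ratios >= xlb & ratios <= xub)

print(rbind(initial_values, new_values, targets))
print(within_tol)
```

At the initial weights the toy targets are missed by several percent, while the candidate weights bring both within tolerance without violating the ratio bounds.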
reweight constructs a nonlinear program from the provided inputs. It minimizes a distortion function based on how much the weights change from the initial weights, while seeking to satisfy constraints that are the targets +/- tolerances. The default distortion function is given below, where w_i is an initial weight and w_n is a new weight:
∑ w_i * (w_n / w_i - 1)^2
Thus, we minimize the sum of squared differences between 1 and the ratio of each new weight to its corresponding initial weight, with each term weighted by the initial weight. Future versions will allow alternative distortion functions.
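The default distortion function can be written directly in base R. The sketch below is an illustration under assumed names (distortion and distortion_grad are hypothetical helpers, not functions exported by the package), with the gradient taken with respect to the new weights.

```r
# Hypothetical helper: default distortion, sum of w_i * (w_n / w_i - 1)^2
distortion <- function(new_weights, iweights) {
  ratio <- new_weights / iweights
  sum(iweights * (ratio - 1)^2)
}

# Hypothetical helper: gradient with respect to each new weight.
# d/dw_n [ w_i * (w_n / w_i - 1)^2 ] = 2 * (w_n / w_i - 1)
distortion_grad <- function(new_weights, iweights) {
  2 * (new_weights / iweights - 1)
}

# Toy data (assumed for illustration only)
iweights <- c(10, 20, 30, 40)
new_weights <- c(11, 21, 30, 44)

d_same <- distortion(iweights, iweights)     # no change in weights
d_new  <- distortion(new_weights, iweights)  # weights moved by 0% to 10%
g_new  <- distortion_grad(new_weights, iweights)

print(c(unchanged = d_same, changed = d_new))
print(g_new)
```

The distortion is zero when the weights are unchanged and grows with the square of each relative change, so a household with a large initial weight contributes more for the same percentage adjustment.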
The default is to use an augmented Lagrangian approach implemented with the nloptr function in the nloptr package, using the L-BFGS algorithm. This usually works well on reasonably sized problems but may have difficulty with very large or very difficult problems.
If you have ipoptr installed, which uses the IPOPT solver, you will be able to solve much larger problems much more quickly by specifying method = "ipopt" in the reweight call.
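For readers who want to see what an augmented Lagrangian setup with L-BFGS looks like in nloptr, here is a rough sketch on made-up toy data. It is not the package's implementation: the choice to optimize over weight ratios, the scaling of constraints by the targets, the option values, and all variable names are assumptions made for illustration. The solver call is skipped if nloptr is not installed.

```r
# Hypothetical toy data (not from the package): 4 households, 2 targets
iweights <- c(10, 20, 30, 40)
xmat <- cbind(income  = c(100, 200, 0, 50),
              married = c(1, 0, 1, 1))
targets <- c(income = 7500, married = 85)
rel_tol <- 0.005  # targets must be hit within +/- 0.5%

# Optimize over x = ratio of new weight to initial weight, starting at 1
x0 <- rep(1, length(iweights))

# Distortion function and its gradient with respect to x
eval_f <- function(x) sum(iweights * (x - 1)^2)
eval_grad_f <- function(x) 2 * iweights * (x - 1)

# Weighted contributions, scaled so that each target equals 1
A <- sweep(xmat * iweights, 2, targets, "/")

# Inequality constraints in nloptr's g(x) <= 0 form
eval_g_ineq <- function(x) {
  v <- as.vector(crossprod(A, x))          # achieved value / target
  c(v - (1 + rel_tol), (1 - rel_tol) - v)  # upper side, then lower side
}
eval_jac_g_ineq <- function(x) rbind(t(A), -t(A))

if (requireNamespace("nloptr", quietly = TRUE)) {
  fit <- nloptr::nloptr(
    x0 = x0, eval_f = eval_f, eval_grad_f = eval_grad_f,
    lb = rep(0, length(x0)), ub = rep(50, length(x0)),
    eval_g_ineq = eval_g_ineq, eval_jac_g_ineq = eval_jac_g_ineq,
    opts = list(algorithm = "NLOPT_LD_AUGLAG", xtol_rel = 1e-8,
                maxeval = 5000,
                local_opts = list(algorithm = "NLOPT_LD_LBFGS",
                                  xtol_rel = 1e-8)))
  new_weights <- fit$solution * iweights
  print(colSums(xmat * new_weights) / targets)  # achieved / target
}
```

Scaling each constraint by its target keeps the constraint values on a comparable footing regardless of units, which tends to help penalty-based methods; whether reweight scales its problem the same way is not stated here.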
A list with the following elements:
The message produced by IPOPT. See IPOPT output.
Elapsed time.
The objective function value at the solution.
Numeric vector of new weights.
Data frame with target names and values, values at the starting point, values at the solution, tolerances, differences, and percent differences. The suffix diff indicates the difference from a target and the suffix pdiff indicates the percent difference from a target.
List with output from the solver that was used.
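The diff and pdiff columns described above can be reproduced by hand. The sketch below builds a small comparison table from made-up numbers; the column names (init_diff, sol_pdiff, and so on) and the sign convention (value minus target) are assumptions for illustration and may not match the columns that reweight returns.

```r
# Hypothetical values (assumed for illustration only)
targets         <- c(income = 7500, married = 85)
tol             <- 0.005 * abs(targets)
initial_values  <- c(income = 7000, married = 80)    # at the starting point
solution_values <- c(income = 7470, married = 84.6)  # at the solution

report <- data.frame(
  targname  = names(targets),
  target    = unname(targets),
  init_val  = unname(initial_values),
  sol_val   = unname(solution_values),
  tol       = unname(tol)
)

# diff: difference from the target; pdiff: percent difference from the target
report$init_diff  <- report$init_val - report$target
report$sol_diff   <- report$sol_val - report$target
report$init_pdiff <- report$init_diff / report$target * 100
report$sol_pdiff  <- report$sol_diff / report$target * 100

# A target is satisfied when the absolute difference is within its tolerance
report$within_tol <- abs(report$sol_diff) <= report$tol

print(report)
```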
# Determine new weights for a small problem using ACS data
library(microweight)
library(tidyverse)
data(acs)
# let's focus on income group 5 and create and then try to hit targets for:
# number of records (nrecs -- to be created based on the weight, pwgtp)
# personal income (pincp)
# wages (wagp)
# number of people with wages (wagp_nnz -- to be created)
# supplemental security income (ssip)
# number of people with supplemental security income (ssip_nnz -- to
# be created)
# we also need to get pwgtp - the person weight for each record, which
# will be our initial weight
# for each "number of" variable we need to create an indicator variable that
# defines whether it is true for that record
# get the data and prepare it
data_df <- acs %>%
  filter(incgroup == 5) %>%
  select(pwgtp, pincp, wagp, ssip) %>%
  # create the indicator variables
  mutate(nrecs = 1, # indicator used for number of records
         wagp_nnz = (wagp != 0) * 1.,
         ssip_nnz = (ssip != 0) * 1.)
data_df # 1,000 records
iweights <- data_df$pwgtp # initial weights
# prepare targets: in practice we would get them from an external source but
# in this case we'll get actual sums on the file and perturb them randomly
# so that we will need new weights to hit these targets.
set.seed(1234)
targets_df <- data_df %>%
  pivot_longer(-pwgtp) %>%
  mutate(wtd_value = value * pwgtp) %>%
  group_by(name) %>%
  summarise(wtd_value = sum(wtd_value), .groups = "drop") %>%
  # n() gives one random perturbation per target row
  mutate(target = wtd_value * (1 + rnorm(n(), mean = 0, sd = .02)))
# in practice we'd make sure that targets make sense (e.g., not negative)
targets_df
targets <- targets_df$target
names(targets) <- targets_df$name
targets
tol <- .005 * abs(targets) # use 0.5% as our tolerance
# Prepare the matrix of characteristics of each household. These
# characteristics must correspond to the targets. Columns must either be in
# the same order as the targets, or must have the same names.
xmat <- data_df %>% # important that columns be in the same order as the targets
  select(all_of(names(targets))) %>%
  as.matrix()
res <- reweight(iweights = iweights,
                xmat = xmat,
                targets = targets,
                tol = tol)
names(res)
res$etime
res$objective_unscaled
res$targets_df
quantile(res$weights)
quantile(iweights)
quantile(res$weights / iweights)
## Not run:
# This example is not run because it uses `ipoptr`, which is not a
# requirement for `microweight`.
# You can write ipopt output to a text file and even monitor results of a
# long-running optimization by opening the file with a text editor, as long
# as the editor does not lock the file for writing. Normally you would specify
# the path to the output file in a location on your system but for this
# example we'll use a temporary file, which we'll call tfile.
tfile <- tempfile("tfile", fileext = ".txt")
opts <- list(output_file = tfile,
             file_print_level = 5,
             linear_solver = "ma27")
res2 <- reweight(iweights = iweights,
                 xmat = xmat,
                 targets = targets,
                 tol = tol,
                 method = "ipopt",
                 optlist = opts,
                 quiet = TRUE) # write progress to tfile, not console
names(res2)
res2$solver_message
res2$etime
res2$objective_unscaled
res2$targets_df
# Normally you would examine the optimization output file in a text editor
# For this example, we display it in the console:
writeLines(readLines(tfile))
unlink(tfile) # delete the temporary file in this example
## End(Not run)