reweight: Reweight a microdata file.


View source: R/reweight.r

Description

Calculate new weights for each household in a microdata file so that (1) selected variables, weighted with the new weights and summed, hit or come close to desired targets, and (2) a measure of distortion based on how much the new weights differ from an initial set of weights is minimized.

Usage

reweight(
  iweights,
  xmat,
  targets,
  tol,
  xlb = 0,
  xub = 50,
  method = "auglag",
  optlist = NULL,
  quiet = TRUE
)

Arguments

iweights

Initial weights, 1 per household, numeric vector length h.

xmat

Data for households. Matrix with 1 row per household and 1 column per target (h x k matrix). Columns must be named. Each cell is the amount that a household, when weighted, will add to the corresponding target. For example, if the column corresponds to the target for total income, the cell value would be the income for the household. If the column corresponds to the target for number of married households, then the cell value would be 1 if the household is married, and 0 otherwise.
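As a minimal sketch of how such a matrix might be laid out, the toy data below (made-up values, not taken from the package) has one row per household, an income column, and a 0/1 indicator column for married households. The weighted column sums are the quantities that get compared with the targets.

```r
# Toy data: 4 hypothetical households (values are illustrative only)
income   <- c(50000, 0, 72000, 31000)
married  <- c(TRUE, FALSE, TRUE, FALSE)
iweights <- c(100, 150, 80, 120)

# One row per household, one named column per target
xmat <- cbind(income  = income,
              married = as.numeric(married))  # 1 if married, 0 otherwise

# Weighted totals implied by the initial weights:
# each row of xmat is multiplied by that household's weight, then summed
wtd_totals <- colSums(xmat * iweights)
wtd_totals
```

Here the married column sums to the weighted number of married households, and the income column to weighted total income.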

targets

Named numeric vector of length k. Each element must correspond to a column of xmat.

tol

Additive tolerances, 1 per target. Numeric vector length k. reweight will seek to hit the targets, plus or minus tol.
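Put differently, a target counts as met when the weighted sum lands in the band from target - tol to target + tol. The sketch below checks that condition on made-up numbers; `within_tol` is a hypothetical helper written for this illustration, not a function in the package.

```r
# Hypothetical helper: TRUE for each target whose weighted sum is within tol
within_tol <- function(weights, xmat, targets, tol) {
  achieved <- colSums(xmat * weights)
  abs(achieved[names(targets)] - targets) <= tol
}

# Made-up inputs: 3 households, 2 targets
xmat    <- cbind(nrecs = c(1, 1, 1), income = c(10, 20, 30))
targets <- c(nrecs = 6, income = 130)
tol     <- 0.005 * abs(targets)   # additive tolerance equal to 0.5% of each target
weights <- c(2, 2, 2)

ok <- within_tol(weights, xmat, targets, tol)
ok
```

With these weights the nrecs target is met exactly, while weighted income (120) falls outside the band around 130.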

xlb

Lower bounds for the ratio of new weights to initial weights. Either a vector of 1 value per household or a scalar that will be used for all households. Default is 0.

xub

Upper bounds for the ratio of new weights to initial weights. Either a vector of 1 value per household or a scalar that will be used for all households. Default is 50.
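Because xlb and xub bound the ratio of new to initial weights, the implied bounds on the weights themselves are the ratios multiplied by the initial weights. The sketch below shows only that arithmetic with made-up values; how reweight expands a scalar internally is not documented here, so the `rep_len` step is an assumption.

```r
iweights <- c(100, 150, 80)   # made-up initial weights

xlb <- 0.5           # scalar: same lower ratio bound for every household
xub <- c(2, 2, 10)   # vector: one upper ratio bound per household

# Expand the scalar to one value per household (assumed behavior)
xlb_vec <- rep_len(xlb, length(iweights))

# Implied bounds on the new weights
w_min <- xlb_vec * iweights
w_max <- xub * iweights
cbind(iweights, w_min, w_max)
```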

method

One of "auglag" (the default) or "ipopt". "ipopt" requires the ipoptr package, which can be difficult to install. See Details.

optlist

Named list of solver options. If method = "auglag" (the default), see nloptr::nloptr(), or run nloptr::nloptr.print.options() to list all nloptr options. If method = "ipopt", see ipoptr::ipoptr() and the IPOPT options documentation.

quiet

Logical, TRUE (default) or FALSE. When TRUE, solver progress is not written to the console.

Details

reweight constructs a nonlinear program from the provided inputs. It minimizes a distortion function that is based upon how much weights change from the initial weights, while seeking to satisfy constraints that are the targets +/- tolerances. The default distortion function is given below, where w_i is an initial weight and w_n is a new weight:

∑ w_i * (w_n / w_i - 1)^2

Thus, for each household we take the ratio of the new weight to the initial weight, subtract 1, and square the result. We then minimize the sum of these squared differences, with each term weighted by the initial weight. Future versions will allow alternative distortion functions.
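The default distortion function can be written directly in R. This is a sketch of the formula as stated above, not the package's internal code; `distortion` is a name chosen for this illustration, and any scaling the solver may apply is ignored.

```r
# Distortion: sum over households of w_i * (w_n / w_i - 1)^2
distortion <- function(new_weights, iweights) {
  ratio <- new_weights / iweights
  sum(iweights * (ratio - 1)^2)
}

iweights    <- c(100, 200, 50)   # made-up initial weights
new_weights <- c(110, 180, 50)   # made-up new weights

d0 <- distortion(iweights, iweights)      # unchanged weights: no distortion
d1 <- distortion(new_weights, iweights)   # 100 * 0.1^2 + 200 * 0.1^2 + 0
c(d0, d1)
```

Because each term is scaled by the initial weight, a 10% change to a household with a large initial weight is penalized more than the same percentage change to a household with a small one.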

The default is an augmented Lagrangian approach, implemented with the nloptr function in the nloptr package and using the L-BFGS algorithm. This usually works well on reasonably sized problems but may have difficulty with very large or very difficult problems.

If you have ipoptr installed, which uses the IPOPT solver, you will be able to solve much larger problems much more quickly by specifying method = "ipopt" in the reweight call.
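One way to choose the method at run time is to test whether ipoptr is available and fall back to the default otherwise. This is a sketch using base R's `requireNamespace`; the variable name `method` is just a local choice.

```r
# Use IPOPT when the ipoptr package is installed, otherwise the default solver
method <- if (requireNamespace("ipoptr", quietly = TRUE)) "ipopt" else "auglag"
method
```

The resulting string can then be passed as the method argument in the reweight call.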

Value

A list with the following elements:

solver_message

The message produced by IPOPT. See the IPOPT output documentation.

etime

Elapsed time.

objective

The objective function value at the solution.

weights

Numeric vector of new weights.

targets_df

Data frame with target names and values, values at the starting point, values at the solution, tolerances, differences, and percent differences. The suffix diff indicates the difference from a target and the suffix pdiff indicates the percent difference from a target.
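To make the diff and pdiff suffixes concrete, the sketch below builds a small comparison table by hand. The column names and the percent-difference definition (difference as a percentage of the target) are assumptions for illustration; the actual columns returned in targets_df may be named or computed differently.

```r
# Made-up targets and achieved weighted sums
target   <- c(nrecs = 1000, income = 5.0e6)
achieved <- c(nrecs = 1002, income = 4.9e6)

check_df <- data.frame(
  name     = names(target),
  target   = unname(target),
  achieved = unname(achieved)
)

# Difference from target, and the same difference as a percent of the target
check_df$diff  <- check_df$achieved - check_df$target
check_df$pdiff <- 100 * check_df$diff / check_df$target
check_df
```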

result

List with output from the solver that was used.

Examples

# Determine new weights for a small problem using ACS data
library(microweight) # provides reweight() and the acs data
library(tidyverse)
data(acs)
# let's focus on income group 5 and create and then try to hit targets for:
#    number of records (nrecs -- to be created based on the weight, pwgtp)
#    personal income (pincp)
#    wages (wagp)
#    number of people with wages (wagp_nnz -- to be created)
#    supplemental security income (ssip)
#    number of people with supplemental security income (ssip_nnz -- to
#       be created)
# we also need to get pwgtp - the person weight for each record, which
#       will be our initial weight
# for each "number of" variable we need to create an indicator variable that
# defines whether it is true for that record

# get the data and prepare it
data_df <- acs %>%
  filter(incgroup == 5) %>%
  select(pwgtp, pincp, wagp, ssip) %>%
  # create the indicator variables
  mutate(nrecs = 1, # indicator used for number of records
         wagp_nnz = (wagp != 0) * 1.,
         ssip_nnz = (ssip != 0) * 1.)
data_df # 1,000 records

iweights <- data_df$pwgtp # initial weights

# prepare targets: in practice we would get them from an external source but
# in this case we'll get actual sums on the file and perturb them randomly
# so that we will need new weights to hit these targets.
set.seed(1234)
targets_df <- data_df %>%
  pivot_longer(-pwgtp) %>%
  mutate(wtd_value = value * pwgtp) %>%
  group_by(name) %>%
  summarise(wtd_value = sum(wtd_value), .groups = "drop") %>%
  # n() gives one random draw per target; length(.) would count columns
  mutate(target = wtd_value * (1 + rnorm(n(), mean = 0, sd = .02)))
# in practice we'd make sure that targets make sense (e.g., not negative)
targets_df

targets <- targets_df$target
names(targets) <- targets_df$name
targets

tol <- .005 * abs(targets) # use 0.5% as our tolerance

# Prepare the matrix of characteristics of each household. These
# characteristics must correspond to the targets. Columns must either be in
# the same order as the targets, or must have the same names.
xmat <- data_df %>% # important that columns be in the same order as the targets
  select(all_of(names(targets))) %>% as.matrix

res <- reweight(iweights = iweights,
                xmat = xmat,
                targets = targets,
                tol = tol)
names(res)
res$etime
res$objective_unscaled
res$targets_df
quantile(res$weights)
quantile(iweights)
quantile(res$weights / iweights)

#' \dontrun{
# This example is not run because it uses `ipoptr`, which is not a
# requirement for `microweight`.

# You can write ipopt output to a text file and even monitor results of a
# long-running optimization by opening the file with a text editor, as long
# as the editor does not lock the file for writing. Normally you would specify
# the path to the output file in a location on your system but for this
# example we'll use a temporary file, which we'll call tfile.

tfile <- tempfile("tfile",fileext=".txt")

opts <- list(output_file = tfile,
             file_print_level = 5,
             linear_solver = "ma27")

res2 <- reweight(iweights = iweights,
                xmat = xmat,
                targets = targets,
                tol = tol,
                method = "ipopt",
                optlist = opts,
                quiet = TRUE) # write progress to tfile, not console
names(res2)
res2$solver_message
res2$etime
res2$objective_unscaled
res2$targets_df

# Normally you would examine the optimization output file in a text editor
# For this example, we display it in the console:
writeLines(readLines(tfile))
unlink(tfile) # delete the temporary file in this example
# } end dontrun

donboyd5/microweight documentation built on Aug. 17, 2020, 4:48 p.m.