calibration: Calibration on margins

View source: R/calibration.R

calibration    R Documentation

Description

Performs calibration on margins, with a choice of several methods and customizable parameters.

Usage

calibration(
  data,
  marginMatrix,
  colWeights,
  method = "linear",
  bounds = NULL,
  q = NULL,
  costs = NULL,
  gap = NULL,
  popTotal = NULL,
  pct = FALSE,
  scale = NULL,
  description = TRUE,
  maxIter = 2500,
  check = TRUE,
  calibTolerance = 1e-06,
  uCostPenalized = 1,
  lambda = NULL,
  precisionBounds = 1e-04,
  forceSimplex = FALSE,
  forceBisection = FALSE,
  colCalibratedWeights,
  exportDistributionImage = NULL,
  exportDistributionTable = NULL
)

Arguments

data

The data frame containing the survey data

marginMatrix

The matrix giving the margins for each column variable included in the calibration problem

colWeights

The name of the column containing the initial weights in the survey dataframe

method

The calibration method. One of "linear", "raking" or "logit"

bounds

Two-element vector containing the lower and upper bounds for bounded methods ("logit")

q

Vector of q_k weights described in Deville and Sarndal (1992)

costs

If set, the penalized calibration method is used, with costs defined by this vector. Its length must match the number of rows of marginMatrix. Negative or non-finite costs are treated as infinite costs (the corresponding coefficient of the C^-1 matrix is 0)

gap

Only useful for penalized calibration. Sets the maximum gap between the largest and smallest values of the ratio calibrated weights / initial weights (and is thus similar to the "bounds" parameter used in regular calibration)

popTotal

Specifies the population total when margins are given as relative values (percentages) in marginMatrix

pct

If TRUE, margins for categorical variables are considered to be entered as percentages. popTotal must then be set. (FALSE by default)

scale

If TRUE, statistics (including bounds) on the ratio calibrated weights / initial weights are computed on a vector multiplied by the weighted non-response ratio (population total / total of initial weights). Behaves like "ECHELLE=0" in Calmar.

description

If TRUE, outputs statistics about the calibration process, as well as a density plot of the ratio calibrated weights / initial weights

maxIter

The maximum number of iterations before stopping

check

If TRUE, performs a few checks on the data frame. TRUE by default

calibTolerance

Tolerance for the distance to an exact solution. Can be useful when there is a very large number of margins, since the risk of inadvertently setting incompatible constraints is then higher. Set to 1e-06 by default.

uCostPenalized

Unary cost by which every cost in the "costs" vector is multiplied

lambda

The initial ridge lambda used in penalized calibration. By default, the initial lambda is chosen automatically by the algorithm, but you can speed up the search for the optimum if you already know a lambda close to the lambda_opt corresponding to the gap you set. Be careful: the search zone is reduced when lambda is set by the user, so the program may not converge if the value is too far from lambda_opt.

precisionBounds

Only used for calibration on minimum bounds. Desired precision for lower and upper reweighting factor, both bounds being as close to 1 as possible

forceSimplex

Only used for calibration on tight bounds. The bisection algorithm is used for matrices whose size exceeds 1e8. forceSimplex = TRUE forces the use of the simplex algorithm whatever the size of the problem (you might want to set this parameter to TRUE if you have a large amount of memory)

forceBisection

Only used for calibration on tight bounds. Forces the use of the bisection algorithm to solve calibration on tight bounds

colCalibratedWeights

Deprecated. Only used within the scope of the calibration function

exportDistributionImage

File name to which the density plot shown when description is TRUE is exported. Requires package "ggplot2"

exportDistributionTable

File name to which the distribution table of before/after weights shown when description is TRUE is exported. Requires package "xtable"
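
To make the role of the "linear" method concrete, here is a minimal base R sketch of linear calibration in the sense of Deville and Sarndal (1992), with all q_k equal to 1. It is not the code used by this package: the toy data, the variable names and the direct call to solve() are assumptions made for illustration only.

```r
# Toy sample (hypothetical data): an intercept column plus one numeric variable.
X <- cbind(intercept = 1, salary = c(1200, 1500, 1800, 2100, 2600))
d <- c(10, 12, 8, 11, 9)                        # initial weights
totals <- c(intercept = 55, salary = 100000)    # known population margins

# Linear method with q_k = 1: w_k = d_k * (1 + x_k' lambda),
# where lambda solves (X' D X) lambda = totals - X' d.
lambda <- solve(t(X) %*% (d * X), totals - colSums(d * X))
w <- d * (1 + as.vector(X %*% lambda))

# The calibrated weights reproduce the margins up to rounding error.
calibrated_totals <- colSums(w * X)
max_abs_error <- max(abs(calibrated_totals - totals))
```

The linear method has this closed form but can produce weights far from the initial ones, or even negative ones, which is one reason to consider the "raking" method or the bounded "logit" method instead.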

Value

Column containing the final calibrated weights

References

Deville, Jean-Claude, and Carl-Erik Sarndal. "Calibration estimators in survey sampling." Journal of the American Statistical Association 87.418 (1992): 376-382.

Bocci, J., and C. Beaumont. "Another look at ridge calibration." Metron 66.1 (2008): 5-20.

Vanderhoeft, Camille. Generalised calibration at statistics Belgium: SPSS Module G-CALIB-S and current practices. Inst. National de Statistique, 2001.

Le Guennec, Josiane, and Olivier Sautory. "Calmar 2: Une nouvelle version de la macro calmar de redressement d'echantillon par calage." Journees de Methodologie Statistique, Paris. INSEE (2002).

Examples

N <- 300 ## population total
## Horvitz-Thompson estimator of the mean: 1.666667
weightedMean(data_employees$movies, data_employees$weight, N)
## Enter calibration margins:
mar1 <- c("category",3,80,90,60)
mar2 <- c("sex",2,140,90,0)
mar3 <- c("department",2,100,130,0)
mar4 <- c("salary", 0, 470000,0,0)
margins <- rbind(mar1, mar2, mar3, mar4)
## Compute calibrated weights with the raking ratio method
wCal <- calibration(data = data_employees, marginMatrix = margins,
                    colWeights = "weight", method = "raking",
                    description = FALSE)
## Calibrated estimate: 2.471917
weightedMean(data_employees$movies, wCal, N)
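
The margin rows in the example above appear to follow a fixed layout: variable name, number of categories (0 for a numeric variable), then the margin values, padded with zeros so that all rows have the same length. The sketch below rebuilds the same matrix with a small helper. padMargin is a hypothetical function written for this illustration and is not part of the package.

```r
# Hypothetical helper (not part of icarus): build one margin row,
# padding the margin values with zeros up to a common width.
padMargin <- function(name, nCategories, values, width) {
  stopifnot(length(values) <= width)
  c(name, nCategories, values, rep(0, width - length(values)))
}

width <- 3  # largest number of categories among the calibration variables
marginsPadded <- rbind(
  padMargin("category",   3, c(80, 90, 60), width),
  padMargin("sex",        2, c(140, 90),    width),
  padMargin("department", 2, c(100, 130),   width),
  padMargin("salary",     0, 470000,        width)  # 0 categories: numeric total
)
```

Because each row mixes a name with numbers, the result is a character matrix, as is the matrix produced by rbind(mar1, mar2, mar3, mar4) in the example.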


icarus documentation built on May 31, 2023, 9:43 p.m.