define_statistic_wrapper: Define a statistic wrapper
In martinchevalier/gustave: A User-Oriented Statistical Toolkit for Analytical Variance Estimation

Description

define_statistic_wrapper defines statistic wrappers to be used together with variance estimation wrappers. A statistic wrapper produces both the point estimator and the linearized variable associated with the statistic whose variance is to be estimated (Deville, 1999). define_statistic_wrapper is intended for advanced use only; standard statistic wrappers are included in the gustave package (see standard statistic wrappers).

Usage

define_statistic_wrapper(
  statistic_function,
  arg_type,
  arg_not_affected_by_domain = NULL,
  display_function = standard_display
)

Arguments

`statistic_function`	An R function specific to the statistic to calculate. It should produce at least the point estimator and the linearized variable associated with the statistic (see Details).
`arg_type`	A named list with three character vectors describing the type of each argument of `statistic_function` (see Details).
`arg_not_affected_by_domain`	A character vector indicating the arguments which should not be affected by domain-splitting. Such arguments may appear in some complex linearization formulae, for instance when the At-Risk of Poverty Rate (ARPR) is estimated by region but with a poverty line calculated at the national level.
`display_function`	An R function which produces, for each variance estimation, the data.frame to be displayed by the variance estimation wrapper. The default display function (`standard_display`) uses standard metadata to display the usual variance indicators (point estimate, variance, standard deviation, coefficient of variation, confidence interval) broken down by statistic wrapper, domain (if any) and level (if the variable is a factor).

Details

When the statistic to estimate is not a total, the application of analytical variance estimation formulae developed for the estimator of a total is not straightforward (Deville, 1999). An asymptotically unbiased variance estimator can nonetheless be obtained if the estimation of variance is performed on a variable obtained from the original data through a linearization step.

define_statistic_wrapper is the function used to create, for a given statistic, an easy-to-use function which calculates both the point estimator and the linearized variable associated with the statistic. These operations are implemented by the statistic_function, which can have any needed input (for example num and denom for a ratio estimator) and should output a list with at least two named elements:

point: the point estimator of the statistic
lin: the linearized variable to be passed on to the variance estimation formula. If several variables are to be associated with the statistic, lin can itself be a list.

All other named elements in the output of statistic_function are treated as metadata (that may be used later on by display_function).
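As an illustration of the expected output structure, here is a minimal base-R sketch of a statistic function for a ratio of two totals. The name ratio_function, the extra metadata element n and the toy data are assumptions made for this example, not the package's own implementation; the linearized variable uses the usual formula for a ratio, (num - point * denom) / estimated total of denom.

```r
# Hypothetical statistic function for a ratio of two weighted totals.
# It returns the two required elements (point, lin) plus one extra
# element (n), which would be treated as metadata.
ratio_function <- function(num, denom, weight) {
  total_num <- sum(num * weight)
  total_denom <- sum(denom * weight)
  point <- total_num / total_denom
  # Usual linearized variable of a ratio: (num - R * denom) / total_denom
  lin <- (num - point * denom) / total_denom
  list(point = point, lin = lin, n = length(num))
}

# Toy data (made up for illustration only)
num <- c(2, 4, 6)
denom <- c(1, 2, 4)
weight <- c(10, 10, 5)
res <- ratio_function(num, denom, weight)
str(res)
```

A quick sanity check for such a function is that the weighted total of lin is zero (up to rounding), as expected for the linearized variable of a ratio.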

arg_type is a named list of three elements that describes the nature of each argument of statistic_function:

data: data argument(s), numerical vector(s) to be used to calculate the point estimate and the linearized variable associated with the statistic
weight: weight argument, numerical vector to be used as row weights
param: parameters, non-data arguments to be used to control some aspect of the computation
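The three argument types can be sketched with a hypothetical statistic function that has one argument of each kind. The function share_below, its threshold parameter and the toy data are assumptions made for this example only; the last line checks that arg_type describes every formal of the function exactly once.

```r
# Hypothetical statistic function: weighted share of units whose value
# is below a fixed threshold (the threshold is a parameter, not data).
share_below <- function(y, weight, threshold) {
  below <- as.numeric(y < threshold)
  point <- sum(below * weight) / sum(weight)
  # Linearized variable of a mean, applied to the indicator variable
  lin <- (below - point) / sum(weight)
  list(point = point, lin = lin)
}

# One entry per kind of argument: data, weight and param
arg_type <- list(data = "y", weight = "weight", param = "threshold")

# Every argument of the function should be described by arg_type
all_described <- setequal(unlist(arg_type), names(formals(share_below)))

# Toy call (made-up values)
res <- share_below(y = c(1, 5, 10), weight = c(1, 1, 2), threshold = 6)
```

Such a function and list would then be passed as the statistic_function and arg_type arguments of define_statistic_wrapper.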

Statistic wrappers are flexible tools for applying a variance function to an estimator requiring a linearization step (i.e. all estimators except the estimator of a total) with virtually no additional complexity for the end-user.

Standard statistic wrappers are included within the gustave package and automatically added to the variance estimation wrappers. New statistic wrappers can be defined using define_statistic_wrapper and then explicitly added to the variance estimation wrappers using the objects_to_include argument.

Note: To some extent, statistic wrappers can be compared to ggplot2 geom_ and stat_ functions: they help the end-user write down what he or she wants without having to go too deep into the details of the corresponding layers.

Value

A function to be used within a variance estimation wrapper to estimate a specific statistic (see examples). Its formals are the ones of statistic_function with the addition of by and where (for domain estimation, set to NULL by default).

Author(s)

Martin Chevalier

References

Deville J.-C. (1999), "Variance estimation for complex statistics and estimators: linearization and residual techniques", Survey Methodology, 25:193–203

Examples

### Example from the Information and communication technologies (ICT) survey

# Let's define a variance wrapper for the ICT survey,
# as in the examples of the qvar function:
precision_ict <- qvar(
  data = ict_sample,
  dissemination_dummy = "dissemination",
  dissemination_weight = "w_calib",
  id = "firm_id",
  scope_dummy = "scope",
  sampling_weight = "w_sample", 
  strata = "strata",
  nrc_weight = "w_nrc", 
  response_dummy = "resp", 
  hrg = "hrg",
  calibration_weight = "w_calib",
  calibration_var = c(paste0("N_", 58:63), paste0("turnover_", 58:63)),
  define = TRUE
)
precision_ict(ict_survey, mean(speed_quanti))

# Let's now redefine the mean statistic wrapper
mean2 <- define_statistic_wrapper(
  statistic_function = function(y, weight){
    point <- sum(y * weight) / sum(weight)
    lin <- (y - point) / sum(weight)
    list(point = point, lin = lin, metadata = list(n = length(y)))
  },
  arg_type = list(data = "y", weight = "weight")
)

# mean2 can now be used inside precision_ict (and yields
# the same results as the mean statistic wrapper)
precision_ict(ict_survey, mean(speed_quanti), mean2(speed_quanti))

martinchevalier/gustave documentation built on Jan. 15, 2024, 11:56 p.m.