kott.quantile: Estimation of quantiles
In DiegoZardetto/EVER: Estimation of Variance by Efficient Replication


Calculates estimates, standard errors and confidence intervals for quantiles in subpopulations.


kott.quantile(deskott, y, probs = c(0.25,0.50,0.75), by = NULL,
              vartype = c("se", "cv", "cvpct", "var"), 
              conf.int = FALSE, conf.lev = 0.95)

`deskott`	Object of class `kott.design` containing the replicated survey data.
`y`	Formula defining the variable of interest.
`probs`	Vector of probabilities for which quantile estimates are calculated. The default selects the quartiles.
`by`	Formula specifying the variables that define the "estimation domains". If `NULL` (the default option) estimates refer to the whole population.
`vartype`	`character` vector specifying the desired variability estimators. It is possible to choose one or more of: standard error (the default), coefficient of variation, percent coefficient of variation, or variance.
`conf.int`	Boolean (`logical`) value to request confidence intervals for the estimates: the default is `FALSE`.
`conf.lev`	Probability specifying the desired confidence level: the default value is `0.95`.
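
The four `vartype` options are related by simple identities. The following is a minimal sketch, assuming the usual definitions (in particular, the coefficient of variation taken as the standard error divided by the absolute value of the estimate). `variability_sketch` is a hypothetical helper written for illustration, not a function of the EVER package.

```r
# Hypothetical helper (not an EVER function): derive the requested
# variability measures from an estimate and its standard error.
variability_sketch <- function(est, se, vartype = "se") {
  choices <- c("se", "cv", "cvpct", "var")
  stopifnot(all(vartype %in% choices))
  measures <- list(
    se    = se,
    cv    = se / abs(est),        # assumed definition of the cv
    cvpct = 100 * se / abs(est),  # cv expressed as a percentage
    var   = se^2
  )
  as.data.frame(measures[vartype])
}

variability_sketch(est = 200, se = 10, vartype = c("se", "cvpct", "var"))
```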

This function calculates weighted estimates for the quantiles of a quantitative variable using suitable weights depending on the class of deskott: calibrated weights for class kott.cal.design and direct weights otherwise. Standard errors are calculated using the extended DAGJK method [Kott 99-01].

The mandatory argument y identifies the variable of interest, that is the variable for which quantile estimates are to be calculated. The deskott variable referenced by y must be numeric and must not contain any missing values (NA).

The optional argument probs specifies the probabilities (0<=probs[i]<=1) for which quantile estimates must be calculated; the default selects the quartiles. If probs[i] is equal to 0 (respectively 1), the corresponding "estimate" produced by kott.quantile coincides with the smallest (respectively largest) observed value of the y variable.

The optional argument by specifies the variables that define the "estimation domains", that is the subpopulations for which the estimates are to be calculated. If by=NULL (the default option), the estimates produced by kott.quantile refer to the whole population. Estimation domains must be defined by a formula: for example, the statement by=~B1:B2 selects as estimation domains the subpopulations determined by crossing the levels of variables B1 and B2. The deskott variables referenced by by (if any) must be factors and must not contain any missing values (NA).
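
To see which domains a formula such as by=~B1:B2 describes, the crossing can be reproduced in base R. This is only an illustrative sketch on toy data: the data frame `toy` and its values are invented, and the checks mirror the stated requirements (factor variables, no NA) rather than the package's internal validation.

```r
# Toy data; B1 and B2 stand in for the factors named in the text.
toy <- data.frame(
  B1 = factor(c("a", "a", "b", "b", "b")),
  B2 = factor(c("x", "y", "x", "x", "y"))
)

by <- ~B1:B2
by_vars <- all.vars(by)  # names of the variables referenced by the formula

# Requirements stated above: every 'by' variable is a factor without NA.
by_ok <- all(vapply(toy[by_vars],
                    function(v) is.factor(v) && !anyNA(v),
                    logical(1)))

# One domain per observed combination of the levels of B1 and B2.
domains <- interaction(toy[by_vars], sep = ":", drop = TRUE)
table(domains)
```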

The conf.int argument requests confidence intervals for the estimates. By default conf.int=FALSE, that is, confidence intervals are not provided.

Whenever confidence intervals are requested (i.e. conf.int=TRUE), the desired confidence level can be specified by means of the conf.lev argument. The conf.lev value must represent a probability (0<=conf.lev<=1) and defaults to 0.95. Given an input kott.design object with nrg random groups, kott.quantile builds the confidence intervals using a t distribution with nrg-1 degrees of freedom.
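
As a rough illustration of how nrg and conf.lev enter the interval, the sketch below builds a symmetric interval around an estimate from its standard error. The symmetric form (estimate plus or minus a t quantile times the standard error) is an assumption made here for illustration; only the use of a t distribution with nrg-1 degrees of freedom comes from the text above. `ci_sketch` is a hypothetical helper, not an EVER function.

```r
# Hypothetical helper: symmetric t-based confidence interval.
ci_sketch <- function(est, se, nrg, conf.lev = 0.95) {
  stopifnot(nrg > 1, conf.lev > 0, conf.lev < 1)
  # Two-sided critical value from a t distribution with nrg - 1 df.
  tq <- qt(1 - (1 - conf.lev) / 2, df = nrg - 1)
  data.frame(lower = est - tq * se, upper = est + tq * se)
}

# Invented numbers: an estimate of 1500 with standard error 40, nrg = 15.
ci <- ci_sketch(est = 1500, se = 40, nrg = 15)
ci
```

With few random groups the t quantile is noticeably larger than the normal one, so the intervals widen as nrg decreases.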

The return value depends on the value of the input parameters. In the most general case, the function returns an object of class list (typically a list made up of data frames).

It may happen that, in certain subpopulations, some of the nrg replicate weights turn out to be all zero: for these replicates it is not possible to provide quantile estimates. In these cases, kott.quantile (i) returns NaN for the corresponding standard errors and (ii) prints a warning message.
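
The way a single undefined replicate estimate propagates to the standard error can be sketched with the basic delete-a-group jackknife formula of Kott (2001), in which the variance is (nrg-1)/nrg times the sum of squared deviations of the replicate estimates from the full-sample estimate. This is an illustration under that assumption, not the package's code; `dagjk_se_sketch` and the numbers are invented.

```r
# Hypothetical helper: delete-a-group jackknife standard error from a
# full-sample estimate and the nrg replicate estimates.
dagjk_se_sketch <- function(est, rep_est) {
  nrg <- length(rep_est)
  sqrt((nrg - 1) / nrg * sum((rep_est - est)^2))
}

# All replicates defined: a finite standard error.
se_ok <- dagjk_se_sketch(est = 10, rep_est = c(9, 11, 10, 10))

# One replicate undefined (NaN), as when its weights are all zero in a
# domain: the NaN propagates to the standard error.
se_nan <- dagjk_se_sketch(est = 10, rep_est = c(9, NaN, 10, 10))
```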

Let \hat{F}_y be the estimate of the cumulative distribution of the y variable. If an observed value y^* exists such that \hat{F}_y(y^*)=probs[i], then the i-th quantile estimate provided by kott.quantile equals y^*. If this is not the case, the kott.quantile function (i) finds the two observed values y^- and y^+ (y^- < y^+) such that the corresponding values \hat{F}_y(y^-) and \hat{F}_y(y^+) are the closest to probs[i], (ii) linearly interpolates \hat{F}_y between \hat{F}_y(y^-) and \hat{F}_y(y^+) and (iii) estimates the i-th quantile by inverting the linear approximation at the point probs[i].
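
The interpolation rule described in this note can be sketched in base R as follows. This is not the EVER implementation: `wtd_quantile_sketch` is a hypothetical helper, it assumes strictly positive weights, and its treatment of probabilities below the smallest value of \hat{F}_y (returning the smallest observed value) is an assumption made for the sketch.

```r
# Hypothetical helper: weighted quantiles obtained by inverting a linear
# interpolation of the estimated cumulative distribution.
wtd_quantile_sketch <- function(y, w, probs) {
  stopifnot(is.numeric(y), !anyNA(y), length(y) == length(w),
            all(w > 0), all(probs >= 0 & probs <= 1))
  ys  <- sort(unique(y))                        # distinct observed values
  idx <- factor(match(y, ys), levels = seq_along(ys))
  wy  <- as.numeric(tapply(w, idx, sum))        # total weight per value
  if (length(ys) == 1L) return(rep(ys, length(probs)))
  Fhat <- cumsum(wy) / sum(wy)                  # estimated cumulative distribution
  Fhat[length(Fhat)] <- 1                       # guard against rounding error
  # Invert the piecewise-linear approximation of Fhat at each probability.
  # rule = 2 returns the extreme observed values outside the range of Fhat
  # (an assumption of this sketch for probabilities below Fhat[1]).
  approx(x = Fhat, y = ys, xout = probs, rule = 2)$y
}

# Equal weights on four values: 0.5 is attained exactly, 0.6 is interpolated.
q <- wtd_quantile_sketch(y = c(30, 10, 40, 20), w = rep(1, 4),
                         probs = c(0, 0.5, 0.6, 1))
q
```

Probabilities 0 and 1 return the smallest and largest observed values, matching the behaviour described for probs in the Details.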

The rigorous results of [Kott 99-01] show that the DAGJK variance estimator for a given estimator \hat{θ} is correct provided that PSUs are sampled with replacement and that \hat{θ} is a smooth function of total estimators. Quantiles are not smooth functions of totals, so it is not possible to guarantee that the DAGJK quantile variance estimator provided by kott.quantile is unbiased.

Diego Zardetto

Kott, Phillip S. (1999) "The Extended Delete-A-Group Jackknife". Bulletin of the International Statistical Institute. 52nd Session. Contributed Papers. Book 2, pp. 167-168.

Kott, Phillip S. (2001) "The Delete-A-Group Jackknife". Journal of Official Statistics, Vol.17, No.4, pp. 521-526.

kottby for estimating totals and means, kott.ratio for estimating ratios between totals, kott.regcoef for estimating regression coefficients and kottby.user for calculating estimates based on user-defined estimators.

data(data.examples)

# Creation of a kott.design object:
kdes<-kottdesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
      weights=~weight,nrg=15)

# Estimate of the deciles of the income variable for
# the whole population:
kott.quantile(kdes,~income,probs=seq(0.1,0.9,0.1))

# Estimate of the median of income by age5c:
kott.quantile(kdes,~income,probs=0.5,by=~age5c,conf.int=TRUE)

# "Estimate" of the minimum and maximum of income by sex
# (notice the value of SE): 
kott.quantile(kdes,~income,probs=c(0,1),by=~sex)