RRTCS-package: Randomized Response Techniques for Complex Surveys

Description

This package computes point and interval estimates of linear parameters from data obtained in randomized response (RR) surveys. Twenty-one RR methods are implemented for complex surveys:

- Randomized response procedures to estimate parameters of a qualitative stigmatizing characteristic: Christofides model, Devore model, Forced-Response model, Horvitz model, Horvitz model with unknown B, Kuk model, Mangat model, Mangat model with unknown B, Mangat-Singh model, Mangat-Singh-Singh model, Mangat-Singh-Singh model with unknown B, Singh-Joarder model, SoberanisCruz model and Warner model.

- Randomized response procedures to estimate parameters of a quantitative stigmatizing characteristic: BarLev model, Chaudhuri-Christofides model, Diana-Perri-1 model, Diana-Perri-2 model, Eichhorn-Hayre model, Eriksson model and Saha model.

Using the usual notation in survey sampling, we consider a finite population U=\{1,…,i,…,N\}, consisting of N different elements. Let y_i be the value of the sensitive characteristic under study for the ith population element. Our aim is to estimate the finite population total Y=∑_{i=1}^N y_i of the variable of interest y or the population mean \bar{Y}=\frac{1}{N}∑_{i=1}^N y_i. To estimate the proportion of the population presenting a certain stigmatized behaviour A, the variable y_i takes the value 1 if i\in G_A (the group with the stigmatized behaviour) and the value 0 otherwise, so that \bar{Y} is the proportion of interest. Some qualitative models use an innocuous or related attribute B whose population proportion may be known or unknown.

Assume that a sample s is chosen according to a general design p with inclusion probabilities π_i=∑_{s\ni i}p(s),i\in U.

In order to include a wide variety of RR procedures, we consider the unified approach given by Arnab (1994). The interviews of individuals in the sample s are conducted in accordance with the RR model. For each i\in s the RR device induces a random response z_i (called the scrambled response), from which a revised randomized response r_i (Chaudhuri and Christofides, 2013) is constructed so that r_i is an unbiased estimator of y_i. Then, an unbiased estimator for the population total of the sensitive characteristic y is given by

\widehat{Y}_R=∑_{i\in s}\frac{r_i}{π_i}
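As a minimal sketch of this estimator (not the package's own code), the following Python function weights each revised response by the inverse of its inclusion probability. The function name `ht_total` and the sample values are made up for illustration.

```python
def ht_total(r, pi):
    """Estimate the population total as the sum of r_i / pi_i over the sample."""
    if len(r) != len(pi):
        raise ValueError("r and pi must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(r_i / pi_i for r_i, pi_i in zip(r, pi))


# Hypothetical sample of four respondents (illustrative values only).
r = [1.5, -0.5, 1.5, 1.5]      # revised randomized responses r_i
pi = [0.5, 0.5, 0.25, 0.25]    # first-order inclusion probabilities pi_i
total_hat = ht_total(r, pi)
```

Note that individual r_i values may fall outside the range of y_i (here, below 0 or above 1); only their expectation under the randomization device equals y_i.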

The variance of this estimator is given by:

V(\widehat{Y}_R)=∑_{i\in U}\frac{V_R(r_i)}{π_i}+V_{HT}(r)

where V_R(r_i) is the variance of r_i under the randomization device and V_{HT}(r) is the design variance of the Horvitz-Thompson estimator of the r_i values.

This variance is estimated by:

\widehat{V}(\widehat{Y}_R)=∑_{i\in s}\frac{\widehat{V}_R(r_i)}{π_i}+\widehat{V}(r)

where \widehat{V}_R(r_i) varies with the RR device and the estimation of the design-variance, \widehat{V}(r), is obtained using Deville's method (Deville, 1993).

The confidence interval at the 100(1-α) % level is given by

ci=\left(\widehat{Y}_R-z_{1-\frac{α}{2}}\sqrt{\widehat{V}(\widehat{Y}_R)},\widehat{Y}_R+z_{1-\frac{α}{2}}\sqrt{\widehat{V}(\widehat{Y}_R)}\right)

where z_{1-\frac{α}{2}} denotes the (1-\frac{α}{2}) quantile of the standard normal distribution.
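The interval can be sketched in a few lines of Python, assuming the point estimate and its estimated variance have already been obtained by some other means. The function name `normal_ci` and the input values are illustrative and are not part of the package.

```python
from math import sqrt
from statistics import NormalDist


def normal_ci(estimate, variance, alpha=0.05):
    """Normal-approximation confidence interval: estimate -/+ z * sqrt(variance)."""
    if variance < 0.0:
        raise ValueError("the variance estimate must be non-negative")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Quantile of order 1 - alpha/2 of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sqrt(variance)
    return estimate - half_width, estimate + half_width


# Hypothetical estimate of a total and its estimated variance.
lower, upper = normal_ci(120.0, 25.0, alpha=0.05)
```

With alpha = 0.05 the quantile is roughly 1.96, so the half-width here is about 1.96 times the standard error of 5.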

Similarly, an unbiased estimator for the population mean \bar{Y} is given by

\widehat{\bar{Y}}_R= \frac{1}{N}∑_{i\in s}\frac{r_i}{π_i}

and an unbiased estimator for its variance is calculated as:

\widehat{V}(\widehat{\bar{Y}}_R)=\frac{1}{N^2}\left(∑_{i\in s}\frac{\widehat{V}_R(r_i)}{π_i}+\widehat{V}(r)\right)

In cases where the population size N is unknown, we consider Hájek-type estimators for the mean, in which N is replaced by its estimate ∑_{i\in s}\frac{1}{π_i}:

\widehat{\bar{Y}}_{RH}=\frac{∑_{i\in s}\frac{r_i}{π_i}}{∑_{i\in s}\frac{1}{π_i}}
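A minimal sketch of a Hájek-type mean in its usual form, where each r_i is weighted by 1/π_i and the sum of the weights stands in for the unknown N. The function name `hajek_mean` and the data are assumptions made for illustration.

```python
def hajek_mean(r, pi):
    """Ratio of the weighted sum of r_i to the sum of the weights 1 / pi_i."""
    if len(r) != len(pi) or not r:
        raise ValueError("r and pi must be non-empty and of equal length")
    weights = [1.0 / pi_i for pi_i in pi]
    n_hat = sum(weights)  # estimate of the population size N
    return sum(w * r_i for w, r_i in zip(weights, r)) / n_hat


# Same hypothetical sample as a Horvitz-Thompson total would use.
mean_hat = hajek_mean([1.5, -0.5, 1.5, 1.5], [0.5, 0.5, 0.25, 0.25])
```

Because it is a ratio of two estimators, this mean is only approximately unbiased, which is why a linearization variance estimator is used for it.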

and Taylor-series linearization variance estimation of the ratio (Wolter, 2007) is used.

In qualitative models, the values r_i and \widehat{V}_R(r_i) for i\in s are described in each model.
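As one concrete instance, the textbook form of the Warner model can be sketched as follows. It assumes that, with probability p, the respondent is asked whether they belong to group A and, with probability 1-p, whether they belong to its complement, and that z_i = 1 records a "yes". The function names are hypothetical and the package's own Warner function may be parameterized differently.

```python
def warner_revised_response(z, p):
    """Return (r, v): revised responses and their device variances, Warner model."""
    if not 0.0 <= p <= 1.0 or p == 0.5:
        raise ValueError("p must lie in [0, 1] and differ from 0.5")
    # E(z_i) = p*y_i + (1-p)*(1-y_i), so solving for y_i gives r_i.
    r = [(z_i - (1.0 - p)) / (2.0 * p - 1.0) for z_i in z]
    # z_i is Bernoulli with success probability p or 1-p, so Var(z_i) = p(1-p).
    v = p * (1.0 - p) / (2.0 * p - 1.0) ** 2
    return r, [v] * len(z)


def expected_r(y, p):
    """Exact expectation of r_i over the device for a true value y in {0, 1}."""
    prob_yes = p if y == 1 else 1.0 - p
    (r_yes, r_no), _ = warner_revised_response([1, 0], p)
    return prob_yes * r_yes + (1.0 - prob_yes) * r_no


r_values, v_values = warner_revised_response([1, 0, 1], p=0.7)
```

The helper `expected_r` averages r_i over the two possible answers and returns the true y_i, which is the unbiasedness property the unified approach relies on.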

In some quantitative models, the values r_i and \widehat{V}_R(r_i) for i\in s are calculated in a general form (Arcos et al., 2015) as follows:

The randomized response given by person i is

z_i=\left\{\begin{array}{ll} y_i & \textrm{with probability } p_1\\ y_iS_1+S_2 & \textrm{with probability } p_2\\ S_3 & \textrm{with probability } p_3 \end{array} \right.

with p_1+p_2+p_3=1 and where S_1, S_2 and S_3 are scrambling variables whose distributions are assumed to be known. We denote by μ_j and σ_j, respectively, the mean and standard deviation of the variable S_j (j=1,2,3).

The transformed variable is

r_i=\frac{z_i-p_2μ_2-p_3μ_3}{p_1+p_2μ_1},

its variance is

V_R(r_i)=\frac{1}{(p_1+p_2μ_1)^2}(y_i^2A+y_iB+C)

where

A=p_1(1-p_1)+σ_1^2p_2+μ_1^2p_2-μ_1^2p_2^2-2p_1p_2μ_1

B=2p_2μ_1μ_2-2μ_1μ_2p_2^2-2p_1p_2μ_2-2μ_3p_1p_3-2μ_1μ_3p_2p_3

C=(σ_2^2+μ_2^2)p_2+(σ_3^2+μ_3^2)p_3-(μ_2p_2+μ_3p_3)^2

and the estimated variance is

\widehat{V}_R(r_i)=\frac{1}{(p_1+p_2μ_1)^2}(r_i^2A+r_iB+C).
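The formulas above can be transcribed term by term into a short Python sketch. The function names `abc_coefficients` and `revised_response` and all numeric values are illustrative assumptions, not the package's interface.

```python
def abc_coefficients(p, mu, sigma):
    """Coefficients A, B and C of the variance formula, written term by term."""
    p1, p2, p3 = p
    mu1, mu2, mu3 = mu
    s1, s2, s3 = sigma
    a = (p1 * (1 - p1) + s1**2 * p2 + mu1**2 * p2
         - mu1**2 * p2**2 - 2 * p1 * p2 * mu1)
    b = (2 * p2 * mu1 * mu2 - 2 * mu1 * mu2 * p2**2 - 2 * p1 * p2 * mu2
         - 2 * mu3 * p1 * p3 - 2 * mu1 * mu3 * p2 * p3)
    c = ((s2**2 + mu2**2) * p2 + (s3**2 + mu3**2) * p3
         - (mu2 * p2 + mu3 * p3) ** 2)
    return a, b, c


def revised_response(z, p, mu, sigma):
    """Return (r_i, estimated V_R(r_i)) for a single scrambled response z."""
    p1, p2, p3 = p
    mu1, mu2, mu3 = mu
    if abs(p1 + p2 + p3 - 1.0) > 1e-9:
        raise ValueError("p1 + p2 + p3 must equal 1")
    denom = p1 + p2 * mu1
    if denom == 0.0:
        raise ValueError("p1 + p2*mu1 must be non-zero")
    r = (z - p2 * mu2 - p3 * mu3) / denom
    a, b, c = abc_coefficients(p, mu, sigma)
    # Plug r in place of the unknown y in the variance formula.
    v_hat = (r * r * a + r * b + c) / denom**2
    return r, v_hat


# Purely multiplicative scrambling: p2 = 1, S2 = 0, S1 with mean 2 and sd 0.5.
r_mult, v_mult = revised_response(10.0, (0.0, 1.0, 0.0), (2.0, 0.0, 0.0), (0.5, 0.0, 0.0))
# Direct response: p1 = 1, so r equals z and the device adds no variance.
r_direct, v_direct = revised_response(10.0, (1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0))
```

In the multiplicative case the result reduces to r_i = z_i/μ_1 with estimated variance r_i^2 σ_1^2/μ_1^2, which illustrates how a specific technique arises as a particular case of the general procedure.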

Some of the quantitative techniques considered can be viewed as particular cases of the procedure described above. The other models are described in the documentation of the respective functions.

Alternatively, the variance can be estimated using certain resampling methods.

Author(s)

Beatriz Cobo Rodríguez, Department of Statistics and Operations Research, University of Granada, beacr@ugr.es

María del Mar Rueda García, Department of Statistics and Operations Research, University of Granada, mrueda@ugr.es

Antonio Arcos Cebrián, Department of Statistics and Operations Research, University of Granada, arcos@ugr.es

Maintainer: Beatriz Cobo Rodríguez, beacr@ugr.es

References

Arcos, A., Rueda, M., Singh, S. (2015). A generalized approach to randomised response for quantitative variables. Quality and Quantity 49, 1239-1256.

Arnab, R. (1994). Non-negative variance estimator in randomized response surveys. Comm. Stat. Theo. Meth. 23, 1743-1752.

Chaudhuri, A., Christofides, T.C. (2013). Indirect Questioning in Sample Surveys. Springer-Verlag, Berlin Heidelberg.

Deville, J.C. (1993). Estimation de la variance pour les enquêtes en deux phases. Manuscript, INSEE, Paris.

Wolter, K.M. (2007). Introduction to Variance Estimation. 2nd Edition. Springer.


RRTCS documentation built on April 21, 2021, 9:06 a.m.