expgreg: Expected variance of the general regression estimator

View source: R/expgreg.R

expgreg    R Documentation

Expected variance of the general regression estimator

Description

Compute the expected design variance of the general regression estimator of the total of a study variable under different sampling designs.

Usage

expgreg(x, b11, b12, b21, b22, d12, Rfy, n, design = NULL, 
        stratum = NULL, x_des = NULL, inc.p = NULL, ...)

Arguments

x

design matrix with the variables to be used in the GREG estimator.

b11

a numeric vector of length equal to the number of variables in x giving the coefficients of the trend term in the true superpopulation model (see ‘Details’).

b12

a numeric vector of length equal to the number of variables in x giving the exponents of the trend term in the true superpopulation model (see ‘Details’).

b21

a numeric vector of length equal to the number of variables in x giving the coefficients of the spread term in the true superpopulation model (see ‘Details’).

b22

a numeric vector of length equal to the number of variables in x giving the exponents of the spread term in the true superpopulation model (see ‘Details’).

d12

a numeric vector of length equal to the number of variables in x giving the exponents of the trend term in the assumed superpopulation model (see ‘Details’).

Rfy

a number giving the square root of the coefficient of determination between the auxiliary variables and the study variable.

n

either a positive number indicating the (expected) sample size (when design is one of 'srs', 'poi', 'pips' or NULL) or a numeric vector indicating the sample size of the strata to which each element belongs (when design is 'stsi') (see ‘Examples’).

design

a character string giving the sampling design. It must be one of 'srs' (simple random sampling without replacement), 'poi' (Poisson sampling), 'stsi' (stratified simple random sampling), 'pips' (Pareto πps sampling) or NULL (see ‘Details’).

stratum

a vector indicating the stratum to which every unit belongs. Only used if design is 'stsi'.

x_des

a positive numeric vector giving the values of the auxiliary variable that is used for defining the inclusion probabilities. Only used if design is 'poi' or 'pips'.

inc.p

a matrix giving the first and second order inclusion probabilities. Only used if design is NULL.

...

other arguments passed to lm (see ‘Details’).

Details

The expected variance of the general regression estimator under different sampling designs is computed.

It is assumed that the underlying superpopulation model is of the form

Y_{k} = f(x_{k}|\delta_{1}) + \epsilon_{k}

with E\epsilon_{k}=0, V\epsilon_{k}= \sigma_{0}^{2}g^{2}(x_{k}|\delta_{2}) and Cov(\epsilon_{k},\epsilon_{l})=0.

But the true generating model is in fact of the form

Y_{k} = f(x_{k}|\beta_{1}) + \epsilon_{k}

with E\epsilon_{k}=0, V\epsilon_{k}= \sigma^{2}g^{2}(x_{k}|\beta_{2}) and Cov(\epsilon_{k},\epsilon_{l})=0.

Where

f(x_{k}|\delta_{1}) = \sum_{j=1}^{J}\delta_{1,j}x_{jk}^{\delta_{1,J+j}},

g(x_{k}|\delta_{2}) = \sum_{j=1}^{J}\delta_{2,j}x_{jk}^{\delta_{2,J+j}},

f(x_{k}|\beta_{1}) = \sum_{j=1}^{J}\beta_{1,j}x_{jk}^{\beta_{1,J+j}},

g(x_{k}|\beta_{2}) = \sum_{j=1}^{J}\beta_{2,j}x_{jk}^{\beta_{2,J+j}}.

  • the coefficients \beta_{1,j} (j=1,\cdots,J) are given by b11;

  • the exponents \beta_{1,j} (j=J+1,\cdots,2J) are given by b12;

  • the coefficients \beta_{2,j} (j=1,\cdots,J) are given by b21;

  • the exponents \beta_{2,j} (j=J+1,\cdots,2J) are given by b22;

  • the exponents \delta_{1,j} (j=J+1,\cdots,2J) are given by d12.
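As an illustration of how the argument vectors map onto the trend and spread terms, the following sketch evaluates f(x_{k}|\beta_{1}) and g(x_{k}|\beta_{2}) in base R. The helper power_sum and the small matrix x are hypothetical and are not part of optimStrat; they only mirror the formulas above.

```r
# Hypothetical helper (not part of optimStrat): evaluates
# sum_j coef[j] * x[, j]^expo[j] for every row k of x.
power_sum <- function(x, coef, expo) {
  x <- as.matrix(x)
  # sweep() raises column j of x to the power expo[j]
  xp <- sweep(x, 2, expo, `^`)
  drop(xp %*% coef)
}

# Toy population with J = 3 auxiliary variables (illustrative values)
x <- cbind(x1 = c(1, 2, 3), x2 = c(2, 4, 8), x3 = c(4, 9, 16))

# Trend term with b11 = c(1, -1, 0) and b12 = c(1, 1, 0): f = x1 - x2
f_true <- power_sum(x, coef = c(1, -1, 0), expo = c(1, 1, 0))

# Spread term with b21 = c(0, 0, 1) and b22 = c(0, 0, 0.5): g = sqrt(x3)
g_true <- power_sum(x, coef = c(0, 0, 1), expo = c(0, 0, 0.5))
```

A variable whose coefficient is 0 drops out of the sum, whatever its exponent.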

The expected variance of the GREG estimator is approximated by

E\left(V\left(\hat{t}\right)\right) = V\left(\hat{t}_{z}\right) + \hat{\sigma}^{2}\sum_{k=1}^{N}\left(\frac{1}{\pi_{k}}-1\right)g^{2}(x_{k}|\beta_{2})

where

V\left(\hat{t}_{z}\right) = \sum_{k=1}^{N}\sum_{l=1}^{N}\pi_{kl}\frac{z_{k}}{\pi_{k}}\frac{z_{l}}{\pi_{l}} - \left(\sum_{k=1}^{N}z_{k}\right)^{2}

and

\hat{\sigma}^{2} = \frac{S^{2}_{f}}{\bar{g^{2}}}\left(\frac{1}{R^{2}_{fy}}-1\right),

z_{k} = \left(x_{k}^{\beta}-x_{k}^{\delta}A\right)\beta_{1}^{**},

S^{2}_{f} = \sum_{k=1}^{N}(f(x_{k}|\beta_{1})-\bar{f})^{2}/N,

\bar{g^{2}} = \sum_{k=1}^{N}g(x_{k}|\beta_{2})^{2}/N,

x_{k}^{\beta} = \left(x_{1k}^{\beta_{1,J+1}},\cdots,x_{Jk}^{\beta_{1,2J}}\right),

x_{k}^{\delta} = \left(x_{1k}^{\delta_{1,J+1}},\cdots,x_{Jk}^{\delta_{1,2J}}\right),

\beta_{1}^{**} = (\beta_{1,1},\cdots,\beta_{1,J})',

A = \left(\sum_{k=1}^{N}w_{k}x_{k}^{\delta'}x_{k}^{\delta}\right)^{-1}\sum_{k=1}^{N}w_{k}x_{k}^{\delta'}x_{k}^{\beta}.

N is the population size, and \pi_{k} and \pi_{kl} are, respectively, the first and second order inclusion probabilities. w_{k} is a weight associated with each element; it represents the inverse of the conditional variance (up to a scalar) of the assumed superpopulation model and is supplied through the weights argument passed to lm (see ‘Examples’).
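The quantities defined above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's source: the population, the parameter values and the weights are invented, and simple random sampling (\pi_{k} = n/N) is used only because V(\hat{t}_{z}) then reduces to the familiar N^{2}(1/n - 1/N)S^{2}_{z}. The sketch also checks that z_{k} coincides with the residuals of a weighted least-squares fit of f on x^{\delta}, which follows algebraically from the definition of A.

```r
set.seed(1)
N <- 200
x <- cbind(x1 = 1 + rgamma(N, shape = 2), x2 = 1 + rgamma(N, shape = 2))

# Illustrative parameter values (assumptions, not taken from the package)
b11 <- c(1, 2); b12 <- c(1.5, 1)   # true trend: coefficients, exponents
b21 <- c(0, 1); b22 <- c(0, 0.5)   # true spread: coefficients, exponents
d12 <- c(1, 1)                     # exponents of the assumed trend
Rfy <- 0.8
w   <- 1 / x[, "x1"]               # example weights w_k

X_beta  <- sweep(x, 2, b12, `^`)   # rows are x_k^beta
X_delta <- sweep(x, 2, d12, `^`)   # rows are x_k^delta
f <- drop(X_beta %*% b11)
g <- drop(sweep(x, 2, b22, `^`) %*% b21)

# A = (sum_k w_k x_k^delta' x_k^delta)^{-1} sum_k w_k x_k^delta' x_k^beta
A <- solve(crossprod(X_delta, w * X_delta), crossprod(X_delta, w * X_beta))
z <- drop((X_beta - X_delta %*% A) %*% b11)

# The same z, obtained as residuals of a weighted regression of f on x^delta
z_lm <- unname(residuals(lm(f ~ X_delta - 1, weights = w)))

# sigma^2 hat = S_f^2 / mean(g^2) * (1 / Rfy^2 - 1)
sigma2 <- mean((f - mean(f))^2) / mean(g^2) * (1 / Rfy^2 - 1)

# Expected variance under simple random sampling of n units
n    <- 20
V_tz <- N^2 * (1 / n - 1 / N) * var(z)
EV   <- V_tz + sigma2 * sum((N / n - 1) * g^2)
```

When d12 equals b12 the matrix A is the identity, every z_{k} is zero, and only the second term of the expected variance remains.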

If design=NULL, the matrix of inclusion probabilities is taken proportional to the matrix inc.p. If design is other than NULL, the formula for the variance is simplified in such a way that the inclusion probabilities matrix is no longer necessary. In particular:

  • if design='srs', only the sample size n is required;

  • if design='stsi', both the stratum ID stratum and the sample size per stratum n are required;

  • if design is either 'pips' or 'poi', the inclusion probabilities are obtained proportional to the values of x_des, corrected if necessary.
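To make the last case concrete, the sketch below computes inclusion probabilities proportional to x_des for an expected sample size n. It assumes that "corrected if necessary" means the usual treatment of probabilities above 1: such units are included with certainty and the remaining probabilities are recomputed. The helper pips_incl_prob is hypothetical and is not the package's implementation.

```r
# Hypothetical helper (not part of optimStrat): first order inclusion
# probabilities proportional to x_des that sum to n, with units whose
# probability would reach or exceed 1 set to 1 (assumed correction rule).
pips_incl_prob <- function(x_des, n) {
  stopifnot(all(x_des > 0), n > 0, n <= length(x_des))
  pik     <- rep(NA_real_, length(x_des))
  certain <- rep(FALSE, length(x_des))
  repeat {
    # Spread the sample size still available over the non-certain units
    n_left <- n - sum(certain)
    pik[!certain] <- n_left * x_des[!certain] / sum(x_des[!certain])
    pik[certain]  <- 1
    new_certain <- !certain & pik >= 1
    if (!any(new_certain)) break
    certain <- certain | new_certain
  }
  pik
}

# The last unit is large enough to be selected with certainty
pik <- pips_incl_prob(c(1, 1, 1, 1, 16), n = 2)
pik
# 0.25 0.25 0.25 0.25 1.00
```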

Value

A numeric value giving the expected variance of the general regression estimator for the desired design under the working and true models.

References

Bueno, E. (2018). A Comparison of Stratified Simple Random Sampling and Probability Proportional-to-size Sampling. Research Report, Department of Statistics, Stockholm University 2018:6. http://gauss.stat.su.se/rr/RR2018_6.pdf.

See Also

expvar for the simultaneous calculation of the expected variance of five sampling strategies under a superpopulation model; vargreg for the variance of the GREG estimator; desvar for the simultaneous calculation of the variance of six sampling strategies; optimApp for an interactive application of expgreg.

Examples

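library(optimStrat)

# Three skewed auxiliary variables for a population of N = 5000 units
x1 <- 1 + sort(rgamma(5000, shape = 4/9, scale = 108))
x2 <- 1 + sort(rgamma(5000, shape = 4/9, scale = 108))
x3 <- 1 + sort(rgamma(5000, shape = 4/9, scale = 108))
x <- cbind(x1, x2, x3)

# Pareto pips sampling with inclusion probabilities proportional to x3
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = 150, design = "pips", x_des = x3)
# Same model, inclusion probabilities proportional to x2
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = 150, design = "pips", x_des = x2)
# As above, with weights w_k = 1/x1 passed to lm through '...'
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = 150, design = "pips", x_des = x2, weights = 1/x1)

# Stratified simple random sampling: H = 6 strata on x3, n = 150 allocated by optiallo()
st1 <- optiallo(n = 150, x = x3, H = 6)
# Assumed exponents d12 equal the true exponents b12
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = st1$nh, design = "stsi", stratum = st1$stratum)
# Assumed exponents d12 differ from b12 (misspecified working model)
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 0, 1), Rfy = 0.8, n = st1$nh, design = "stsi", stratum = st1$stratum)
# Misspecified working model with weights w_k = 1/x1 passed to lm
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 0, 1), Rfy = 0.8, n = st1$nh, design = "stsi", stratum = st1$stratum, weights = 1/x1)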

optimStrat documentation built on Aug. 24, 2023, 9:09 a.m.