# expgreg: Expected variance of the general regression estimator (package optimStrat: Choosing the Sample Strategy)

## Description

Compute the expected design variance of the general regression estimator of the total of a study variable under different sampling designs.

## Usage

```
expgreg(x, b11, b12, b21, b22, d12, Rfy, n, design = NULL, stratum = NULL,
        x_des = NULL, inc.p = NULL, ...)
```

## Arguments

• `x`: design matrix with the variables to be used in the GREG estimator.

• `b11`: a numeric vector of length equal to the number of variables in `x` giving the coefficients of the trend term in the true superpopulation model (see ‘Details’).

• `b12`: a numeric vector of length equal to the number of variables in `x` giving the exponents of the trend term in the true superpopulation model (see ‘Details’).

• `b21`: a numeric vector of length equal to the number of variables in `x` giving the coefficients of the spread term in the true superpopulation model (see ‘Details’).

• `b22`: a numeric vector of length equal to the number of variables in `x` giving the exponents of the spread term in the true superpopulation model (see ‘Details’).

• `d12`: a numeric vector of length equal to the number of variables in `x` giving the exponents of the trend term in the assumed superpopulation model (see ‘Details’).

• `Rfy`: a number giving the square root of the coefficient of determination between the auxiliary variables and the study variable.

• `n`: either a positive number giving the (expected) sample size (when `design` is 'srs', 'poi', 'pips' or `NULL`) or a numeric vector giving, for each element, the sample size of the stratum to which it belongs (when `design` is 'stsi') (see ‘Examples’).

• `design`: a character string giving the sampling design. It must be one of 'srs' (simple random sampling without replacement), 'poi' (Poisson sampling), 'stsi' (stratified simple random sampling), 'pips' (Pareto πps sampling) or `NULL` (see ‘Details’).

• `stratum`: a vector indicating the stratum to which each unit belongs. Only used if `design` is 'stsi'.

• `x_des`: a positive numeric vector giving the values of the auxiliary variable used to define the inclusion probabilities. Only used if `design` is 'poi' or 'pips'.

• `inc.p`: a matrix giving the first- and second-order inclusion probabilities. Only used if `design` is `NULL`.

• `...`: other arguments passed to `lm`, such as `weights` (see ‘Details’ and ‘Examples’).

## Details

The expected variance of the general regression estimator under different sampling designs is computed.

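It is assumed that the underlying (working) superpopulation model is of the form

$$Y_k = f(x_k \mid \delta_1) + \varepsilon_k$$

with $E(\varepsilon_k) = 0$, $V(\varepsilon_k) = \sigma_0^2\, g(x_k \mid \delta_2)^2$ and $\operatorname{Cov}(\varepsilon_k, \varepsilon_l) = 0$ for $k \neq l$.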

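However, the data are in fact generated by a true model of the form

$$Y_k = f(x_k \mid \beta_1) + \varepsilon_k$$

with $E(\varepsilon_k) = 0$, $V(\varepsilon_k) = \sigma^2\, g(x_k \mid \beta_2)^2$ and $\operatorname{Cov}(\varepsilon_k, \varepsilon_l) = 0$ for $k \neq l$.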

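Here $J$ is the number of variables (columns) in `x` and

$$f(x_k \mid \delta_1) = \sum_{j=1}^{J} \delta_{1,j}\, x_{j,k}^{\delta_{1,J+j}},$$

$$g(x_k \mid \delta_2) = \sum_{j=1}^{J} \delta_{2,j}\, x_{j,k}^{\delta_{2,J+j}},$$

$$f(x_k \mid \beta_1) = \sum_{j=1}^{J} \beta_{1,j}\, x_{j,k}^{\beta_{1,J+j}},$$

$$g(x_k \mid \beta_2) = \sum_{j=1}^{J} \beta_{2,j}\, x_{j,k}^{\beta_{2,J+j}}.$$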

• the coefficients $\beta_{1,j}$ ($j = 1, \ldots, J$) are given by `b11`;

• the exponents $\beta_{1,j}$ ($j = J+1, \ldots, 2J$) are given by `b12`;

• the coefficients $\beta_{2,j}$ ($j = 1, \ldots, J$) are given by `b21`;

• the exponents $\beta_{2,j}$ ($j = J+1, \ldots, 2J$) are given by `b22`;

• the exponents $\delta_{1,j}$ ($j = J+1, \ldots, 2J$) are given by `d12`.
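To make the mapping between the arguments and the model terms concrete, the R sketch below evaluates $f(x_k \mid \beta_1)$ and $g(x_k \mid \beta_2)$ from coefficient and exponent vectors and then draws one population of study values from the true model. The helper `power_sum()` is hypothetical (it is not an optimStrat function), the parameter values are borrowed from the ‘Examples’, and the normal error distribution and `sigma = 2` are illustrative assumptions: the model itself only fixes the first two moments of $\varepsilon_k$.

```r
# Hypothetical helper (not part of optimStrat): for each unit k returns
# sum_j coef[j] * x[k, j]^expo[j].
power_sum <- function(x, coef, expo) {
  x <- as.matrix(x)
  stopifnot(ncol(x) == length(coef), length(coef) == length(expo))
  drop(sweep(x, 2, expo, `^`) %*% coef)
}

set.seed(1)
N <- 1000
x <- cbind(x1 = 1 + rgamma(N, shape = 4/9, scale = 108),
           x2 = 1 + rgamma(N, shape = 4/9, scale = 108),
           x3 = 1 + rgamma(N, shape = 4/9, scale = 108))

# True model parameters (values taken from the Examples)
b11 <- c(1, -1, 0); b12 <- c(1, 1, 0)   # trend: coefficients and exponents
b21 <- c(0, 0, 1);  b22 <- c(0, 0, 0.5) # spread: coefficients and exponents

f_true <- power_sum(x, b11, b12)  # f(x_k | beta_1) = x1 - x2
g_true <- power_sum(x, b21, b22)  # g(x_k | beta_2) = sqrt(x3)

# One draw of Y_k = f(x_k | beta_1) + eps_k with V(eps_k) = sigma^2 g(x_k | beta_2)^2.
# Normal errors and sigma = 2 are assumptions made only for this illustration.
sigma <- 2
y <- f_true + rnorm(N, mean = 0, sd = sigma * g_true)
```

Note that an exponent of 0 turns a column into a constant 1, which is how an intercept (or an unused variable, when its coefficient is 0) enters the model.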

The expected variance of the GREG estimator $\hat t$ is approximated by

$$E\left[V(\hat t)\right] \approx V(\hat t^{*}) + \sigma^{*2} \sum_{k=1}^{N} \left(\frac{1}{\pi_k} - 1\right) g(x_k \mid \beta_2)^2$$

where

$$V(\hat t^{*}) = \sum_{k=1}^{N} \sum_{l=1}^{N} \pi_{kl}\, \frac{z_k z_l}{\pi_k \pi_l} - \left(\sum_{k=1}^{N} z_k\right)^2,$$

with $\pi_{kk} = \pi_k$,
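When the full $N \times N$ matrix of inclusion probabilities is available (as with `inc.p` when `design = NULL`), the double sum for $V(\hat t^{*})$ can be evaluated directly as a quadratic form. The sketch below assumes the matrix holds $\pi_{kl}$ off the diagonal and $\pi_k$ on the diagonal; this is the usual convention, but it is an assumption here, and `ht_var_full()` is a hypothetical name. As a check, it compares the result with the textbook variance under simple random sampling without replacement, $N^2 (1 - n/N) S_z^2 / n$.

```r
# Hypothetical helper: sum_k sum_l pi_kl z_k z_l / (pi_k pi_l) - (sum_k z_k)^2,
# where pikl is N x N with pi_kl off the diagonal and pi_k on the diagonal.
ht_var_full <- function(z, pikl) {
  u <- z / diag(pikl)                    # z_k / pi_k
  drop(t(u) %*% pikl %*% u) - sum(z)^2
}

# Check with simple random sampling without replacement, n = 3 out of N = 8
N <- 8; n <- 3
pikl <- matrix(n * (n - 1) / (N * (N - 1)), N, N)  # pi_kl for k != l
diag(pikl) <- n / N                                # pi_kk = pi_k
z <- c(2, 5, 1, 7, 3, 4, 6, 2)                     # toy values of z_k
v_full <- ht_var_full(z, pikl)

# Textbook SRSWOR variance of the estimated total; var() uses divisor N - 1
v_srs <- N^2 * (1 - n / N) * var(z) / n
```

With these toy values both expressions equal 60.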

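and

$$\sigma^{*2} = \frac{S_f^2}{\overline{g^2}} \left(\frac{1}{R_{fy}^2} - 1\right),$$

$$z_k = \left(x_k^{\beta} - x_k^{\delta} A\right) \beta_1^{**},$$

$$S_f^2 = \frac{1}{N} \sum_{k=1}^{N} \left(f(x_k \mid \beta_1) - \bar f\right)^2,$$

$$\overline{g^2} = \frac{1}{N} \sum_{k=1}^{N} g(x_k \mid \beta_2)^2,$$

$$x_k^{\beta} = \left(x_{1,k}^{\beta_{1,J+1}}, \ldots, x_{J,k}^{\beta_{1,2J}}\right),$$

$$x_k^{\delta} = \left(x_{1,k}^{\delta_{1,J+1}}, \ldots, x_{J,k}^{\delta_{1,2J}}\right),$$

$$\beta_1^{**} = \left(\beta_{1,1}, \ldots, \beta_{1,J}\right)',$$

$$A = \left(\sum_{k=1}^{N} w_k\, x_k^{\delta\prime} x_k^{\delta}\right)^{-1} \sum_{k=1}^{N} w_k\, x_k^{\delta\prime} x_k^{\beta}.$$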

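$N$ is the population size, $\pi_k$ and $\pi_{kl}$ are, respectively, the first- and second-order inclusion probabilities, and $\bar f = \sum_{k=1}^{N} f(x_k \mid \beta_1) / N$. $w_k$ is a weight associated with each element; it is the inverse of the conditional variance (up to a constant) under the assumed superpopulation model, that is, $w_k \propto 1 / g(x_k \mid \delta_2)^2$, and it can be supplied through the `weights` argument passed to `lm` (see ‘Examples’).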
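The pieces above can be put together in a short R sketch for an arbitrary design, given its matrix of inclusion probabilities. This is a hypothetical re-implementation of the displayed formulas, not the optimStrat source: the function names `power_terms()` and `expgreg_sketch()`, the convention that `pikl` carries $\pi_k$ on its diagonal, and passing the weights $w_k$ explicitly as `w` (rather than through `lm`) are all assumptions. The usage part checks one property that follows directly from the formulas: when the working trend exponents equal the true ones (`d12 = b12`), $A$ is the identity, every $z_k$ is 0, and only the residual term remains.

```r
# Hypothetical sketch of the approximation in 'Details' (not the optimStrat source).
# Column j of the result is x_j raised to expo[j].
power_terms <- function(x, expo) sweep(as.matrix(x), 2, expo, `^`)

expgreg_sketch <- function(x, b11, b12, b21, b22, d12, Rfy, pikl,
                           w = rep(1, nrow(x))) {
  N  <- nrow(x)
  xb <- power_terms(x, b12)                 # row k is x_k^beta
  xd <- power_terms(x, d12)                 # row k is x_k^delta
  f  <- drop(xb %*% b11)                    # f(x_k | beta_1)
  g  <- drop(power_terms(x, b22) %*% b21)   # g(x_k | beta_2)

  # A = (sum_k w_k x_k^delta' x_k^delta)^{-1} sum_k w_k x_k^delta' x_k^beta
  A <- solve(crossprod(xd, w * xd), crossprod(xd, w * xb))
  z <- drop((xb - xd %*% A) %*% b11)        # z_k

  # sigma*^2 = S_f^2 / mean(g^2) * (1 / R_fy^2 - 1)
  S2f    <- sum((f - mean(f))^2) / N
  sigma2 <- S2f / mean(g^2) * (1 / Rfy^2 - 1)

  # V(t*_hat) from the full matrix (pi_k assumed on the diagonal of pikl)
  pik    <- diag(pikl)
  u      <- z / pik
  v_star <- drop(t(u) %*% pikl %*% u) - sum(z)^2

  v_star + sigma2 * sum((1 / pik - 1) * g^2)
}

# Toy population and SRSWOR with n = 10 out of N = 50
set.seed(2)
N <- 50; n <- 10
x <- cbind(1 + rexp(N), 1 + rexp(N), 1 + rexp(N))
pikl <- matrix(n * (n - 1) / (N * (N - 1)), N, N)
diag(pikl) <- n / N
b11 <- c(1, -1, 0); b12 <- c(1, 1, 0); b21 <- c(0, 0, 1); b22 <- c(0, 0, 0.5)

# Trend correctly specified (d12 = b12): z_k = 0, only the residual term is left
v_correct <- expgreg_sketch(x, b11, b12, b21, b22, d12 = b12, Rfy = 0.8, pikl = pikl)
f <- x[, 1] - x[, 2]; g <- sqrt(x[, 3])
expected_correct <- mean((f - mean(f))^2) / mean(g^2) * (1 / 0.8^2 - 1) *
  sum((N / n - 1) * g^2)

# Misspecified trend exponents in the working model
v_misspec <- expgreg_sketch(x, b11, b12, b21, b22, d12 = c(1, 0, 1), Rfy = 0.8,
                            pikl = pikl)
```

Under SRSWOR $V(\hat t^{*})$ is non-negative, so a misspecified working trend can only add to the residual term here, which is why `v_misspec` exceeds `v_correct`.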

If `design = NULL`, the first- and second-order inclusion probabilities are obtained from the matrix `inc.p`. If `design` is not `NULL`, the variance formula is simplified so that the full matrix of inclusion probabilities is no longer needed. In particular:

• if `design = 'srs'`, only the sample size `n` is required;

• if `design = 'stsi'`, both the stratum IDs `stratum` and the per-stratum sample sizes `n` are required;

• if `design` is either `'pips'` or `'poi'`, the inclusion probabilities are computed proportional to the values of `x_des`, corrected if necessary.
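The simplifications rest on standard closed forms of the double sum. The R sketch below checks two of them numerically against the full double sum: Poisson sampling, where $\pi_{kl} = \pi_k \pi_l$ for $k \neq l$ and the variance reduces to $\sum_k (1/\pi_k - 1) z_k^2$, and stratified simple random sampling, where it becomes $\sum_h N_h^2 (1 - n_h/N_h) S_{z,h}^2 / n_h$. The helper name and toy data are hypothetical, and the toy `x_des` values are chosen so that no $\pi_k$ exceeds 1; the correction step that `expgreg` applies is not reproduced here.

```r
# Full double sum (pikl holds pi_kl off the diagonal and pi_k on the diagonal)
ht_var_full <- function(z, pikl) {
  u <- z / diag(pikl)
  drop(t(u) %*% pikl %*% u) - sum(z)^2
}

z <- c(2, 5, 1, 7, 3, 4, 6, 2)     # toy values of z_k

# Poisson sampling with pi_k proportional to x_des (expected sample size 3)
x_des <- c(3, 6, 2, 9, 4, 5, 8, 3)
pik <- 3 * x_des / sum(x_des)      # all values are below 1 for these toy data
pikl_poi <- outer(pik, pik)        # independent selections: pi_kl = pi_k pi_l
diag(pikl_poi) <- pik
v_poi_full   <- ht_var_full(z, pikl_poi)
v_poi_closed <- sum((1 / pik - 1) * z^2)

# Stratified SRS: two strata of 4 units, with n_h = 2 and n_h = 1
stratum <- c(1, 1, 1, 1, 2, 2, 2, 2)
n_h <- c(2, 1); N_h <- as.vector(table(stratum))
nk <- n_h[stratum]; Nk <- N_h[stratum]   # per-unit n_h and N_h (cf. argument 'n')
same <- outer(stratum, stratum, "==")
pikl_st <- ifelse(same,
                  outer(nk * (nk - 1), Nk * (Nk - 1), "/"),  # same stratum, k != l
                  outer(nk / Nk, nk / Nk))                   # different strata
diag(pikl_st) <- nk / Nk
v_st_full   <- ht_var_full(z, pikl_st)
S2_h <- tapply(z, stratum, var)          # within-stratum variances, divisor N_h - 1
v_st_closed <- sum(N_h^2 * (1 - n_h / N_h) * S2_h / n_h)
```

In both cases the closed form agrees with the double sum, which is why only `n`, `stratum` or `x_des` are needed once the design is known.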

## Value

A numeric value giving the expected variance of the general regression estimator for the desired design under the working and true models.

## References

Bueno, E. (2018). A Comparison of Stratified Simple Random Sampling and Probability Proportional-to-size Sampling. Research Report, Department of Statistics, Stockholm University 2018:6. http://gauss.stat.su.se/rr/RR2018_6.pdf.

## See Also

`expvar` for the simultaneous calculation of the expected variance of five sampling strategies under a superpopulation model; `vargreg` for the variance of the GREG estimator; `desvar` for the simultaneous calculation of the variance of six sampling strategies; `optimApp` for an interactive application of `expgreg`.
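## Examples

```
library(optimStrat)

# Population of N = 5000 units with three skewed auxiliary variables
x1 <- 1 + sort(rgamma(5000, shape = 4/9, scale = 108))
x2 <- 1 + sort(rgamma(5000, shape = 4/9, scale = 108))
x3 <- 1 + sort(rgamma(5000, shape = 4/9, scale = 108))
x <- cbind(x1, x2, x3)

# Pareto pips sampling with inclusion probabilities proportional to x3
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = 150, design = "pips", x_des = x3)

# Same, with inclusion probabilities proportional to x2
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = 150, design = "pips", x_des = x2)

# Same, with weights 1/x1 passed on to lm
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = 150, design = "pips", x_des = x2,
        weights = 1/x1)

# Stratified SRS: strata and allocation from optiallo() using x3, H = 6 and n = 150
st1 <- optiallo(n = 150, x = x3, H = 6)
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = st1$nh, design = "stsi", stratum = st1$stratum)

# Working model with trend exponents c(1, 0, 1), without and with weights
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 0, 1), Rfy = 0.8, n = st1$nh, design = "stsi", stratum = st1$stratum)
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 0, 1), Rfy = 0.8, n = st1$nh, design = "stsi", stratum = st1$stratum,
        weights = 1/x1)
```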