expgreg: Expected variance of the general regression estimator


View source: R/expgreg.R

Description

Compute the expected design variance of the general regression estimator of the total of a study variable under different sampling designs.

Usage

expgreg(x, b11, b12, b21, b22, d12, Rfy, n, design = NULL, 
        stratum = NULL, x_des = NULL, inc.p = NULL, ...)

Arguments

x

design matrix with the variables to be used in the GREG estimator.

b11

a numeric vector of length equal to the number of variables in x giving the coefficients of the trend term in the true superpopulation model (see ‘Details’).

b12

a numeric vector of length equal to the number of variables in x giving the exponents of the trend term in the true superpopulation model (see ‘Details’).

b21

a numeric vector of length equal to the number of variables in x giving the coefficients of the spread term in the true superpopulation model (see ‘Details’).

b22

a numeric vector of length equal to the number of variables in x giving the exponents of the spread term in the true superpopulation model (see ‘Details’).

d12

a numeric vector of length equal to the number of variables in x giving the exponents of the trend term in the assumed superpopulation model (see ‘Details’).

Rfy

a number giving the square root of the coefficient of determination between the auxiliary variables and the study variable (R_fy in ‘Details’).

n

either a positive number giving the (expected) sample size (when design is one of 'srs', 'poi', 'pips' or NULL) or a numeric vector giving, for each element, the sample size of the stratum it belongs to (when design is 'stsi') (see ‘Examples’).
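With design = 'stsi', n therefore holds one value per population element rather than one value per stratum. The base-R sketch below builds such a vector from a per-stratum allocation. The stratum labels and sizes are made-up illustration values. In ‘Examples’, the vector comes from optiallo as st1$nh.

```r
# Made-up population of 9 units in 3 strata (illustration only)
stratum <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)

# Made-up sample size allocated to each stratum, named by stratum label
n_h <- c(`1` = 2, `2` = 1, `3` = 2)

# Expand to one entry per element: the sample size of the element's stratum
n_vec <- unname(n_h[as.character(stratum)])
print(n_vec)
```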

design

a character string giving the sampling design. It must be one of 'srs' (simple random sampling without replacement), 'poi' (Poisson sampling), 'stsi' (stratified simple random sampling), 'pips' (Pareto πps sampling) or NULL (see ‘Details’).

stratum

a vector indicating the stratum to which every unit belongs. Only used if design is 'stsi'.

x_des

a positive numeric vector giving the values of the auxiliary variable that is used for defining the inclusion probabilities. Only used if design is 'poi' or 'pips'.
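As an illustration of how first-order inclusion probabilities can be defined from x_des, here is a minimal sketch. It assumes the probabilities are proportional to x_des with expected sample size n. Units whose probability would exceed 1 are fixed at 1 and the remaining units are rescaled, which is a common convention. The rule used inside expgreg is an assumption here, and the helper name pi_prop_to_size is hypothetical.

```r
# Hypothetical helper (not part of optimStrat): inclusion probabilities
# proportional to a positive size variable, capped at 1 with rescaling.
pi_prop_to_size <- function(x_des, n) {
  stopifnot(all(x_des > 0), n > 0, n <= length(x_des))
  certain <- rep(FALSE, length(x_des))   # units forced to probability 1
  repeat {
    pik  <- rep(1, length(x_des))
    rest <- !certain
    # spread the remaining expected sample size proportionally to x_des
    pik[rest] <- (n - sum(certain)) * x_des[rest] / sum(x_des[rest])
    now_certain <- pik >= 1
    if (identical(now_certain, certain)) break   # no new certainty units
    certain <- now_certain
  }
  pik
}

pik <- pi_prop_to_size(c(1, 2, 3, 4, 90), n = 2)
print(pik)   # the largest unit is taken with certainty
```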

inc.p

a matrix giving the first- and second-order inclusion probabilities. Only used if design is NULL.

...

other arguments passed to lm (see ‘Details’).

Details

expgreg computes the expected variance of the general regression (GREG) estimator under different sampling designs.

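It is assumed that the underlying (working) superpopulation model is of the form

$$Y_k = f(x_k \mid \delta_1) + \varepsilon_k$$

with $E(\varepsilon_k) = 0$, $V(\varepsilon_k) = \sigma_0^2\, g(x_k \mid \delta_2)^2$ and $\operatorname{Cov}(\varepsilon_k, \varepsilon_l) = 0$ for $k \neq l$.

However, the true generating model is of the form

$$Y_k = f(x_k \mid \beta_1) + \varepsilon_k$$

with $E(\varepsilon_k) = 0$, $V(\varepsilon_k) = \sigma^2\, g(x_k \mid \beta_2)^2$ and $\operatorname{Cov}(\varepsilon_k, \varepsilon_l) = 0$ for $k \neq l$,

where

$$f(x_k \mid \delta_1) = \sum_{j=1}^{J} \delta_{1,j}\, x_{jk}^{\delta_{1,J+j}},$$

$$g(x_k \mid \delta_2) = \sum_{j=1}^{J} \delta_{2,j}\, x_{jk}^{\delta_{2,J+j}},$$

$$f(x_k \mid \beta_1) = \sum_{j=1}^{J} \beta_{1,j}\, x_{jk}^{\beta_{1,J+j}},$$

$$g(x_k \mid \beta_2) = \sum_{j=1}^{J} \beta_{2,j}\, x_{jk}^{\beta_{2,J+j}}.$$

Here $J$ is the number of columns of x, and $x_{jk}$ is the value of the $j$-th auxiliary variable for element $k$. The arguments map onto these parameters as follows:

- b11 and b12 hold $\beta_{1,1},\ldots,\beta_{1,J}$ and $\beta_{1,J+1},\ldots,\beta_{1,2J}$.
- b21 and b22 hold $\beta_{2,1},\ldots,\beta_{2,J}$ and $\beta_{2,J+1},\ldots,\beta_{2,2J}$.
- d12 holds $\delta_{1,J+1},\ldots,\delta_{1,2J}$.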
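All four terms above share one power-sum form. The minimal R sketch below evaluates it for every row of x. The helper name power_sum is hypothetical and is not part of optimStrat. With the arguments used in ‘Examples’, f(x_k|β_1) uses coef = b11 and expo = b12, while g(x_k|β_2) uses coef = b21 and expo = b22.

```r
# Hypothetical helper: for each row k of x, returns sum_j coef[j] * x[k, j]^expo[j]
power_sum <- function(x, coef, expo) {
  x <- as.matrix(x)
  stopifnot(ncol(x) == length(coef), length(coef) == length(expo))
  # sweep() raises column j to expo[j]; %*% then weights and sums over j
  drop(sweep(x, 2, expo, `^`) %*% coef)
}

x <- cbind(x1 = c(1, 4, 9), x2 = c(2, 2, 2), x3 = c(4, 9, 16))
f_true <- power_sum(x, coef = c(1, -1, 0), expo = c(1, 1, 0))    # x1 - x2
g_true <- power_sum(x, coef = c(0, 0, 1), expo = c(0, 0, 0.5))   # sqrt(x3)
print(f_true)
print(g_true)
```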

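The expected variance of the GREG estimator is approximated by

$$E\big(V(\hat t)\big) = V(\hat t^{*}) + \sigma^{*2} \sum_{k=1}^{N} \left(\frac{1}{\pi_k} - 1\right) g(x_k \mid \beta_2)^2,$$

where

$$V(\hat t^{*}) = \sum_{k=1}^{N} \sum_{l=1}^{N} \pi_{kl}\, \frac{z_k z_l}{\pi_k \pi_l} - \Big(\sum_{k=1}^{N} z_k\Big)^2$$

and

$$\sigma^{*2} = \frac{S_f^2}{\overline{g^2}} \left(\frac{1}{R_{fy}^2} - 1\right),$$

$$z_k = \big(x_k^{\beta} - x_k^{\delta} A\big)\, \beta_1^{**},$$

$$S_f^2 = \frac{1}{N} \sum_{k=1}^{N} \big(f(x_k \mid \beta_1) - \bar f\big)^2, \qquad \bar f = \frac{1}{N} \sum_{k=1}^{N} f(x_k \mid \beta_1),$$

$$\overline{g^2} = \frac{1}{N} \sum_{k=1}^{N} g(x_k \mid \beta_2)^2,$$

$$x_k^{\beta} = \big(x_{1k}^{\beta_{1,J+1}}, \ldots, x_{Jk}^{\beta_{1,2J}}\big),$$

$$x_k^{\delta} = \big(x_{1k}^{\delta_{1,J+1}}, \ldots, x_{Jk}^{\delta_{1,2J}}\big),$$

$$\beta_1^{**} = (\beta_{1,1}, \ldots, \beta_{1,J})',$$

$$A = \Big(\sum_{k=1}^{N} w_k\, x_k^{\delta\prime} x_k^{\delta}\Big)^{-1} \sum_{k=1}^{N} w_k\, x_k^{\delta\prime} x_k^{\beta}.$$

Here $N$ is the population size, and $\pi_k$ and $\pi_{kl}$ are the first- and second-order inclusion probabilities, respectively, with $\pi_{kk} = \pi_k$. The vectors $x_k^{\beta}$ and $x_k^{\delta}$ are $1 \times J$ row vectors, so $A$ is a $J \times J$ matrix. $w_k$ is a weight associated with each element; it is the inverse of the conditional variance (up to a scalar) of the underlying superpopulation model (see ‘Examples’, where weights = 1/x1 is passed on to lm).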
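The approximation can be assembled step by step. The sketch below follows the formulas above under the following assumptions:

- A full N × N matrix pikl of inclusion probabilities, with π_k on the diagonal, is available (the situation that design = NULL describes).
- The weights w_k are supplied directly.

The function name exp_var_greg is hypothetical, and expgreg's internal handling may differ. Because Aβ**_1 is the weighted least-squares coefficient of f(x_k|β_1) regressed on x_k^δ without an intercept, z_k is simply the residual of that fit.

```r
# Hypothetical sketch of the approximation in 'Details'; not optimStrat code.
exp_var_greg <- function(x, b11, b12, b21, b22, d12, Rfy, pikl,
                         w = rep(1, nrow(x))) {
  x  <- as.matrix(x)
  xb <- sweep(x, 2, b12, `^`)           # row k is x_k^beta
  xd <- sweep(x, 2, d12, `^`)           # row k is x_k^delta
  # A = (sum_k w_k xd_k' xd_k)^{-1} sum_k w_k xd_k' xb_k
  A  <- solve(crossprod(xd, w * xd), crossprod(xd, w * xb))
  z  <- drop((xb - xd %*% A) %*% b11)   # z_k
  f  <- drop(xb %*% b11)                # f(x_k | beta_1)
  g2 <- drop(sweep(x, 2, b22, `^`) %*% b21)^2   # g(x_k | beta_2)^2
  S2f    <- mean((f - mean(f))^2)       # divisor N, as in S_f^2
  sigma2 <- S2f / mean(g2) * (1 / Rfy^2 - 1)
  pik    <- diag(pikl)                  # first-order probabilities
  v_star <- sum(pikl * outer(z / pik, z / pik)) - sum(z)^2
  list(z = z, sigma2 = sigma2, v_star = v_star,
       expected = v_star + sigma2 * sum((1 / pik - 1) * g2))
}

# Small made-up population; SRS of n = 2 out of N = 6
x <- cbind(x1 = 1:6, x2 = c(2, 5, 1, 7, 3, 4), x3 = c(3, 1, 4, 1, 5, 9))
N <- nrow(x)
n <- 2
pikl <- matrix(n * (n - 1) / (N * (N - 1)), N, N)
diag(pikl) <- n / N
res <- exp_var_greg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0),
                    b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
                    d12 = c(1, 0, 1), Rfy = 0.8, pikl = pikl)
print(res$expected)
```

Here z_k coincides with the residuals of lm(f ~ xd - 1, weights = w). When d12 equals b12, the working trend contains the true trend, so z_k = 0 and only the model term σ*² Σ(1/π_k − 1)g² remains.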

If design = NULL, the matrix of inclusion probabilities is taken to be proportional to the matrix inc.p. If design is not NULL, the variance formula is simplified so that the matrix of inclusion probabilities is no longer needed. In particular:
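The reason the matrix can be dropped is visible in V(t*_hat) above. The sketch below checks two textbook identities:

- Under Poisson sampling, π_kl = π_k π_l for k ≠ l, so V(t*_hat) = Σ_k (1/π_k − 1) z_k².
- Under simple random sampling without replacement, V(t*_hat) = N²(1 − n/N) S_z²/n, where S_z² is the variance of the z_k with divisor N − 1.

These are standard results, not a reproduction of the simplified formulas expgreg uses; for 'pips' the package may rely on an approximation instead. The z_k and π_k values are made up.

```r
# Full double-sum form of V(t*_hat); pikl holds pi_k on the diagonal
v_full <- function(z, pikl) {
  pik <- diag(pikl)
  sum(pikl * outer(z / pik, z / pik)) - sum(z)^2
}

z <- c(2.5, -1, 0.3, 4, -3.2, 0.7)   # made-up z_k
N <- length(z)

# Poisson sampling: independent selections, so pi_kl = pi_k * pi_l (k != l)
pik  <- c(0.1, 0.4, 0.25, 0.8, 0.5, 0.3)
pikl <- outer(pik, pik)
diag(pikl) <- pik
v_poi <- sum((1 / pik - 1) * z^2)

# SRS without replacement of size n: pi_k = n/N, pi_kl = n(n-1)/(N(N-1))
n <- 3
pikl_srs <- matrix(n * (n - 1) / (N * (N - 1)), N, N)
diag(pikl_srs) <- n / N
v_srs <- N^2 * (1 - n / N) / n * var(z)   # var() uses divisor N - 1

print(c(poisson = v_poi, srs = v_srs))
```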

Value

A numeric value giving the expected variance of the general regression estimator for the desired design under the working and true models.

References

Bueno, E. (2018). A Comparison of Stratified Simple Random Sampling and Probability Proportional-to-size Sampling. Research Report, Department of Statistics, Stockholm University 2018:6. http://gauss.stat.su.se/rr/RR2018_6.pdf.

See Also

expvar for the simultaneous calculation of the expected variance of five sampling strategies under a superpopulation model; vargreg for the variance of the GREG estimator; desvar for the simultaneous calculation of the variance of six sampling strategies; optimApp for an interactive application of expgreg.

Examples

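library(optimStrat)  # provides expgreg() and optiallo()

# Population of 5000 units with three skewed auxiliary variables
x1 <- 1 + sort(rgamma(5000, shape = 4/9, scale = 108))
x2 <- 1 + sort(rgamma(5000, shape = 4/9, scale = 108))
x3 <- 1 + sort(rgamma(5000, shape = 4/9, scale = 108))
x <- cbind(x1, x2, x3)

# Pareto πps sampling with inclusion probabilities based on x3, then on x2;
# the third call also passes weights = 1/x1 on to lm()
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = 150, design = "pips", x_des = x3)
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = 150, design = "pips", x_des = x2)
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = 150, design = "pips", x_des = x2,
        weights = 1/x1)

# Stratified simple random sampling with 6 strata built on x3 by optiallo();
# n is the per-element vector of stratum sample sizes
st1 <- optiallo(n = 150, x = x3, H = 6)
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 1, 0), Rfy = 0.8, n = st1$nh, design = "stsi", stratum = st1$stratum)
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 0, 1), Rfy = 0.8, n = st1$nh, design = "stsi", stratum = st1$stratum)
expgreg(x, b11 = c(1, -1, 0), b12 = c(1, 1, 0), b21 = c(0, 0, 1), b22 = c(0, 0, 0.5),
        d12 = c(1, 0, 1), Rfy = 0.8, n = st1$nh, design = "stsi", stratum = st1$stratum,
        weights = 1/x1)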

optimStrat documentation built on Nov. 11, 2020, 5:07 p.m.