htestimate: Horvitz-Thompson Estimator

Description

Calculates the Horvitz-Thompson estimate of the population mean, with a choice of variance estimation methods: Yates and Grundy, Horvitz-Thompson, Hansen-Hurwitz, and Hajek.

Usage

htestimate(y, N, PI, pk, pik, method = 'yg')

Arguments

y

vector of observations

N

integer for population size

PI

square matrix of second-order inclusion probabilities with n rows and columns. Required for variance estimation by methods 'ht' and 'yg'.

pk

vector of first-order inclusion probabilities of length n for the sample elements. Required for variance estimation by methods 'hh' and 'ha'.

pik

an optional vector of first-order inclusion probabilities of length N for the population elements. It can be used for variance estimation by method 'ha'.

method

method to be used for variance estimation. Exact options are 'yg' (Yates and Grundy) and 'ht' (Horvitz-Thompson); approximate options are 'hh' (Hansen-Hurwitz) and 'ha' (Hajek).

Details

Methods 'yg' and 'ht' require the matrix PI; methods 'hh' and 'ha' require the vector pk of inclusion probabilities. For the Hajek method 'ha', pik can be supplied in addition; if it is omitted, an approximate Hajek method is used.
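
For orientation, the textbook forms of the mean estimator and of the two exact variance estimators are sketched below. This is the standard notation from the survey sampling literature, not a transcription of the package source, so the implementation may differ in detail. The symbols s (the sample), \pi_k (entries of pk) and \pi_{kl} (entries of PI, with \pi_{kk} = \pi_k) are notation introduced here; the Yates-Grundy form assumes a fixed sample size.

```latex
% Horvitz-Thompson estimator of the population mean
\hat{\bar{y}}_{HT} = \frac{1}{N} \sum_{k \in s} \frac{y_k}{\pi_k}

% Horvitz-Thompson variance estimator (method 'ht')
\widehat{V}_{HT}\left(\hat{\bar{y}}_{HT}\right)
  = \frac{1}{N^2} \sum_{k \in s} \sum_{l \in s}
    \frac{\pi_{kl} - \pi_k \pi_l}{\pi_{kl}} \,
    \frac{y_k}{\pi_k} \, \frac{y_l}{\pi_l}

% Yates-Grundy variance estimator (method 'yg'), fixed sample size
\widehat{V}_{YG}\left(\hat{\bar{y}}_{HT}\right)
  = \frac{1}{N^2} \cdot \frac{1}{2}
    \sum_{k \in s} \sum_{\substack{l \in s \\ l \neq k}}
    \frac{\pi_k \pi_l - \pi_{kl}}{\pi_{kl}}
    \left( \frac{y_k}{\pi_k} - \frac{y_l}{\pi_l} \right)^2
```

All methods share the same point estimator and differ only in the variance estimate, which is why the mean is identical across methods while the standard error varies.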

Value

The function htestimate returns a list with the following components:

call

a list of the call components: y (observations), N (population size), PI (second-order inclusion probabilities), pk (inclusion probabilities of the sample elements), pik (inclusion probabilities of all population elements) and method (the variance estimation method)

mean

mean estimate

se

standard error of the mean estimate
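
To make the mean and se components concrete, here is a minimal base-R sketch that computes a Horvitz-Thompson mean and a Hansen-Hurwitz-style standard error by hand. The toy values of y, pk and N are made up for illustration, and the variance formula is the textbook with-replacement approximation; it is an assumption that method 'hh' follows the same form, so small differences from the package output are possible.

```r
# Toy inputs, made up for illustration only (not taken from the influenza data)
y  <- c(12, 30, 7, 45)           # observed values in the sample
pk <- c(0.10, 0.25, 0.05, 0.40)  # first-order inclusion probabilities of the sample
N  <- 50                         # population size
n  <- length(y)

# Horvitz-Thompson estimate of the population total and mean
t_ht    <- sum(y / pk)
mean_ht <- t_ht / N

# Hansen-Hurwitz-style approximation: treat the sample as n independent
# with-replacement draws with selection probabilities pk / n
z      <- n * y / pk                         # per-draw estimates of the total
var_hh <- sum((z - t_ht)^2) / (n * (n - 1))  # estimated variance of the total
se_hh  <- sqrt(var_hh) / N                   # standard error on the mean scale

c(mean = mean_ht, se = se_hh)
```

Multiplying both values by N gives the estimate and standard error for the population total, which is the scale used for a confidence interval on the number of cases.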

Author(s)

Juliane Manitz

References

Kauermann, Goeran/Kuechenhoff, Helmut (2010): Stichproben. Methoden und praktische Umsetzung mit R. Springer.

See Also

pps.sampling

Examples

library(samplingbook)
data(influenza)
summary(influenza)

# pps.sampling()
set.seed(108506)
pps <- pps.sampling(z=influenza$population,n=20,method='midzuno')
sample <- influenza[pps$sample,]
# htestimate()
N <- nrow(influenza)
# exact variance estimate
PI <- pps$PI
htestimate(sample$cases, N=N, PI=PI, method='yg')
htestimate(sample$cases, N=N, PI=PI, method='ht')
# approximate variance estimate
pk <- pps$pik[pps$sample]
htestimate(sample$cases, N=N, pk=pk, method='hh')
pik <- pps$pik
htestimate(sample$cases, N=N, pk=pk, pik=pik, method='ha')
# without pik, only an approximate Hajek calculation is possible (issues a warning)
htestimate(sample$cases, N=N, pk=pk, method='ha') 
# 95% confidence interval for the total number of cases, based on the normal distribution
est.ht <- htestimate(sample$cases, N=N, PI=PI, method='ht')
est.ht$mean*N  
lower <- est.ht$mean*N - qnorm(0.975)*N*est.ht$se
upper <- est.ht$mean*N + qnorm(0.975)*N*est.ht$se
c(lower,upper) 
# true number of influenza cases
sum(influenza$cases)

Example output

Loading required package: pps
Loading required package: sampling
Loading required package: survey
Loading required package: grid
Loading required package: Matrix
Loading required package: survival

Attaching package: 'survival'

The following objects are masked from 'package:sampling':

    cluster, strata


Attaching package: 'survey'

The following object is masked from 'package:graphics':

    dotchart

       id                        district     population          cases       
 Min.   : 1001   LK Aachen           :  1   Min.   :  34719   Min.   :  0.00  
 1st Qu.: 5877   LK Ahrweiler        :  1   1st Qu.: 104553   1st Qu.:  9.00  
 Median : 8331   LK Aichach-Friedberg:  1   Median : 145130   Median : 27.00  
 Mean   : 8468   LK Alb-Donau-Kreis  :  1   Mean   : 193910   Mean   : 44.58  
 3rd Qu.: 9778   LK Altenburger Land :  1   3rd Qu.: 244154   3rd Qu.: 59.00  
 Max.   :16077   LK Altenkirchen     :  1   Max.   :1770629   Max.   :410.00  
                 (Other)             :418                                     

htestimate object: Estimator for samples with probabilities proportional to size
Method of Yates and Grundy:

Mean estimator: 40.36766
Standard Error: 8.059507


htestimate object: Estimator for samples with probabilities proportional to size
Method of Horvitz-Thompson:

Mean estimator: 40.36766
Standard Error: 8.227719


htestimate object: Estimator for samples with probabilities proportional to size
Method of Hansen-Hurwitz (approximate variance):

Mean estimator: 40.36766
Standard Error: 8.534792


htestimate object: Estimator for samples with probabilities proportional to size
Method of Hajek (approximate variance):

Mean estimator: 40.36766
Standard Error: 8.262482


htestimate object: Estimator for samples with probabilities proportional to size
Method of Hajek (approximate variance):

Mean estimator: 40.36766
Standard Error: 8.244296

Warning message:
In htestimate(sample$cases, N = N, pk = pk, method = "ha") :
  Without input of 'pik' just approximative calculation of Hajek method is possible.
[1] 17115.89
[1] 10278.45 23953.33
[1] 18900

samplingbook documentation built on April 3, 2021, 1:06 a.m.