postStratify: Post-stratify a survey

Description Usage Arguments Details Value Note References See Also Examples

View source: R/surveyrep.R

Description

Post-stratification adjusts the sampling and replicate weights so that the joint distribution of a set of post-stratifying variables matches the known population joint distribution. Use rake when the full joint distribution is not available.

Usage

1
2
3
4
5
postStratify(design, strata, population, partial = FALSE, ...)
## S3 method for class 'svyrep.design'
postStratify(design, strata, population, partial = FALSE, compress=NULL,...)
## S3 method for class 'survey.design'
postStratify(design, strata, population, partial = FALSE, ...)

Arguments

design

A survey design with replicate weights

strata

A formula or data frame of post-stratifying variables, which must not contain missing values.

population

A table, xtabs or data.frame with population frequencies

partial

if TRUE, ignore population strata not present in the sample

compress

Attempt to compress the replicate weight matrix? When NULL will attempt to compress if the original weight matrix was compressed

...

arguments for future expansion

Details

The population totals can be specified as a table with the strata variables in the margins, or as a data frame where one column lists frequencies and the other columns list the unique combinations of strata variables (the format produced by as.data.frame acting on a table object). A table must have named dimnames to indicate the variable names.

Compressing the replicate weights will take time and may even increase memory use if there is actually little redundancy in the weight matrix (in particular if the post-stratification variables have many values and cut across PSUs).

If a svydesign object is to be converted to a replication design the post-stratification should be performed after conversion.

The variance estimate for replication designs follows the same procedure as Valliant (1993) described for estimating totals. Rao et al (2002) describe this procedure for estimating functions (and also the GREG or g-calibration procedure, see calibrate)

Value

A new survey design object.

Note

If the sampling weights are already post-stratified there will be no change in point estimates after postStratify but the standard error estimates will decrease to correctly reflect the post-stratification.

References

Valliant R (1993) Post-stratification and conditional variance estimation. JASA 88: 89-96

Rao JNK, Yung W, Hidiroglou MA (2002) Estimating equations for the analysis of survey data using poststratification information. Sankhya 64 Series A Part 2, 364-378.

See Also

rake, calibrate for other things to do with auxiliary information

compressWeights for information on compressing weights

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
rclus1<-as.svrepdesign(dclus1)

svymean(~api00, rclus1)
svytotal(~enroll, rclus1)

# post-stratify on school type
pop.types <- data.frame(stype=c("E","H","M"), Freq=c(4421,755,1018))
#or: pop.types <- xtabs(~stype, data=apipop)
#or: pop.types <- table(stype=apipop$stype)

rclus1p<-postStratify(rclus1, ~stype, pop.types)
summary(rclus1p)
svymean(~api00, rclus1p)
svytotal(~enroll, rclus1p)


## and for svydesign objects
dclus1p<-postStratify(dclus1, ~stype, pop.types)
summary(dclus1p)
svymean(~api00, dclus1p)
svytotal(~enroll, dclus1p)

Example output

Loading required package: grid
Loading required package: Matrix
Loading required package: survival

Attaching package: 'survey'

The following object is masked from 'package:graphics':

    dotchart

        mean     SE
api00 644.17 26.329
         total     SE
enroll 3404940 932235
Call: postStratify(rclus1, ~stype, pop.types)
Unstratified cluster jacknife (JK1) with 15 replicates.
Variables: 
 [1] "cds"      "stype"    "name"     "sname"    "snum"     "dname"   
 [7] "dnum"     "cname"    "cnum"     "flag"     "pcttest"  "api00"   
[13] "api99"    "target"   "growth"   "sch.wide" "comp.imp" "both"    
[19] "awards"   "meals"    "ell"      "yr.rnd"   "mobility" "acs.k3"  
[25] "acs.46"   "acs.core" "pct.resp" "not.hsg"  "hsg"      "some.col"
[31] "col.grad" "grad.sch" "avg.ed"   "full"     "emer"     "enroll"  
[37] "api.stu"  "fpc"      "pw"      
        mean     SE
api00 642.31 26.934
         total     SE
enroll 3680893 473431
1 - level Cluster Sampling design
With (15) clusters.
postStratify(dclus1, ~stype, pop.types)
Probabilities:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.01854 0.03257 0.03257 0.03040 0.03257 0.03257 
Population size (PSUs): 757 
Data variables:
 [1] "cds"      "stype"    "name"     "sname"    "snum"     "dname"   
 [7] "dnum"     "cname"    "cnum"     "flag"     "pcttest"  "api00"   
[13] "api99"    "target"   "growth"   "sch.wide" "comp.imp" "both"    
[19] "awards"   "meals"    "ell"      "yr.rnd"   "mobility" "acs.k3"  
[25] "acs.46"   "acs.core" "pct.resp" "not.hsg"  "hsg"      "some.col"
[31] "col.grad" "grad.sch" "avg.ed"   "full"     "emer"     "enroll"  
[37] "api.stu"  "fpc"      "pw"      
        mean     SE
api00 642.31 23.921
         total     SE
enroll 3680893 406293

survey documentation built on July 19, 2021, 9:06 a.m.