Survey sample analysis.
Description
Specify a complex survey design.
Usage
svydesign(ids, probs=NULL, strata = NULL, variables = NULL, fpc=NULL,
data = NULL, nest = FALSE, check.strata = !nest, weights=NULL,pps=FALSE,...)
## Default S3 method:
svydesign(ids, probs=NULL, strata = NULL, variables = NULL,
fpc=NULL,data = NULL, nest = FALSE, check.strata = !nest, weights=NULL,
pps=FALSE,variance=c("HT","YG"),...)
## S3 method for class 'imputationList'
svydesign(ids, probs = NULL, strata = NULL, variables = NULL,
fpc = NULL, data, nest = FALSE, check.strata = !nest, weights = NULL, pps=FALSE,
...)
## S3 method for class 'character'
svydesign(ids, probs = NULL, strata = NULL, variables = NULL,
fpc = NULL, data, nest = FALSE, check.strata = !nest, weights = NULL, pps=FALSE,
dbtype = "SQLite", dbname, ...)

Arguments
ids 
Formula or data frame specifying cluster ids from largest
level to smallest level; ~0 or ~1 is a formula for no clusters.
probs 
Formula or data frame specifying cluster sampling probabilities 
strata 
Formula or vector specifying strata; use NULL for no strata.
variables 
Formula or data frame specifying the variables
measured in the survey. If NULL, the data argument is used.
fpc 
Finite population correction: see Details below 
weights 
Formula or vector specifying sampling weights as an
alternative to probs.
data 
Data frame to look up variables in the formula
arguments, or database table name, or imputationList object; see Details below.
nest 
If TRUE, relabel cluster ids to enforce nesting within strata.
check.strata 
If TRUE, check that clusters are nested in strata.
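pps 
FALSE (the default), "brewer", or a specification of joint sampling probabilities such as ppsmat, for PPS sampling without replacement; see Details below.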
dbtype 
name of database driver to pass to dbDriver
dbname 
name of database (eg file name for SQLite) 
variance 
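For PPS sampling without replacement, "HT" for the Horvitz-Thompson variance estimator or "YG" for the Yates-Grundy estimator.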
... 
for future expansion 
Details
The svydesign object combines a data frame and all the survey
design information needed to analyse it. These objects are used by
the survey modelling and summary functions. The id argument is always
required; the strata, fpc, weights and probs arguments are
optional. If these variables are specified they must not have any
missing values.
By default, svydesign assumes that all PSUs, even those in
different strata, have a unique value of the id
variable. This allows some data errors to be detected. If your PSUs
reuse the same identifiers across strata then set nest=TRUE.
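A minimal sketch of the nesting check described above. The data frame toy and its columns are hypothetical, made up purely for illustration; both strata label their PSUs 1 and 2, so the default check should reject the design and nest=TRUE should accept it.

```r
library(survey)

# Hypothetical data: two strata, each with PSUs labelled 1 and 2.
toy <- data.frame(
  stratum = rep(c("A", "B"), each = 4),
  psu     = rep(c(1, 2, 1, 2), each = 2),
  w       = 10,
  y       = c(3, 5, 4, 6, 8, 7, 9, 10)
)

# Default (nest = FALSE): PSU labels 1 and 2 appear in both strata,
# so the nesting check stops with an error, which is captured here.
res <- tryCatch(
  svydesign(ids = ~psu, strata = ~stratum, weights = ~w, data = toy),
  error = function(e) e
)
inherits(res, "error")

# nest = TRUE treats PSU labels as nested within strata,
# so the same call is accepted.
d_nested <- svydesign(ids = ~psu, strata = ~stratum, weights = ~w,
                      data = toy, nest = TRUE)
```

If the identifiers really are unique across strata in your data, leaving nest at its default keeps this check available as a guard against coding errors.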
The finite population correction (fpc) is used to reduce the variance when a substantial fraction of the total population of interest has been sampled. It may not be appropriate if the target of inference is the process generating the data rather than the statistics of a particular finite population.
The finite population correction can be specified either as the total population size in each stratum or as the fraction of the total population that has been sampled. In either case the relevant population size is the number of sampling units. For example, sampling 100 units from a population stratum of size 500 can be specified as 500 or as 100/500=0.2. The exception is for PPS sampling without replacement, where the sampling probability (which will be different for each PSU) must be used.
If population sizes are specified but not sampling probabilities or weights, the sampling probabilities will be computed from the population sizes assuming simple random sampling within strata.
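The two ways of giving fpc, and the weights derived from it, can be sketched on a small made-up stratified sample. The data frame toy and its columns N and frac are hypothetical names chosen for this illustration: stratum A has 4 of 20 units sampled and stratum B has 2 of 6.

```r
library(survey)

# Hypothetical stratified simple random sample:
# stratum A: 4 of 20 units sampled; stratum B: 2 of 6 units sampled.
toy <- data.frame(
  stratum = c("A", "A", "A", "A", "B", "B"),
  y       = c(3, 5, 4, 6, 8, 7),
  N       = c(20, 20, 20, 20, 6, 6),          # population size per stratum
  frac    = c(rep(4 / 20, 4), rep(2 / 6, 2))  # sampling fraction per stratum
)

# fpc given as population sizes, then as sampling fractions.
d_size <- svydesign(ids = ~1, strata = ~stratum, fpc = ~N,    data = toy)
d_frac <- svydesign(ids = ~1, strata = ~stratum, fpc = ~frac, data = toy)

# Neither probs nor weights was supplied, so the weights are derived
# from the population sizes: N_h / n_h within each stratum.
weights(d_size)

# Both fpc forms describe the same design, so the standard errors agree.
all.equal(as.numeric(SE(svymean(~y, d_size))),
          as.numeric(SE(svymean(~y, d_frac))))
```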
For multistage sampling the id argument should specify a
formula with the cluster identifiers at each stage. If subsequent
stages are stratified, strata should also be specified as a
formula with stratum identifiers at each stage. The population size
for each level of sampling should also be specified in fpc.
If fpc is not specified then sampling is assumed to be with
replacement at the top level and only the first stage of clustering is
used in computing variances. If fpc is specified but for fewer
stages than id, sampling is assumed to be complete for
subsequent stages. The variance calculations for
multistage sampling assume simple or stratified random sampling
within clusters at each stage except possibly the last.
For PPS sampling without replacement it is necessary to specify the
probabilities for each stage of sampling using the fpc
argument, and an overall weight argument should not be
given. At the moment, multistage or stratified PPS sampling without
replacement is supported only with pps="brewer", or by
giving the full joint probability matrix using
ppsmat. [Cluster sampling is supported by all
methods, but not subsampling within clusters].
The dim, "[", "[<-" and na.action methods for
survey.design objects operate on the data frame specified by
variables and ensure that the design information is properly
updated to correspond to the new data frame. With the "[<-"
method the new value can be a survey.design object instead of a
data frame, but only the data frame is used. See also
subset.survey.design for a simple way to select
subpopulations.
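A short sketch of these methods on a made-up design. The data frame toy and its columns are hypothetical; the point is only that dim, "[" and subset act on the rows of the underlying data frame while keeping the design information in step.

```r
library(survey)

# Hypothetical stratified sample with explicit weights.
toy <- data.frame(
  stratum = c("A", "A", "A", "A", "B", "B"),
  w       = c(5, 5, 5, 5, 3, 3),
  y       = c(3, 5, 4, 6, 8, 7)
)
d <- svydesign(ids = ~1, strata = ~stratum, weights = ~w, data = toy)

dim(d)                      # dimensions of the underlying data frame

d_head <- d[1:3, ]          # keep the first three observations
d_big  <- subset(d, y > 4)  # keep observations with y above 4

dim(d_head)
dim(d_big)
```

For domain (subpopulation) estimates, subset is usually the more readable choice because the condition is written in terms of the survey variables.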
The model.frame method extracts the observed data.
If the strata with only one PSU are not self-representing (or they are,
but svydesign cannot tell based on fpc) then the handling
of these strata for variance computation is determined by
options("survey.lonely.psu"). See svyCprod for
details.
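A minimal sketch of how this option changes behaviour, using a made-up sample in which stratum B contains a single PSU. The data frame toy is hypothetical, and "adjust" is just one of the settings documented in svyCprod; which setting is appropriate depends on the survey.

```r
library(survey)

# Hypothetical sample: stratum B has only one PSU and no fpc is given,
# so svydesign cannot treat it as self-representing.
toy <- data.frame(
  stratum = c("A", "A", "A", "B"),
  w       = 10,
  y       = c(3, 5, 4, 8)
)
d <- svydesign(ids = ~1, strata = ~stratum, weights = ~w, data = toy)

# With "fail", variance computation stops with an error (captured here).
old <- options(survey.lonely.psu = "fail")
res_fail <- tryCatch(svymean(~y, d), error = function(e) e)
inherits(res_fail, "error")

# With "adjust", the lonely stratum contributes to the variance by being
# centred at the overall mean rather than its own stratum mean.
options(survey.lonely.psu = "adjust")
est <- svymean(~y, d)
SE(est)

options(old)  # restore the previous setting
```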
data may be a character string giving the name of a table or view
in a relational database that can be accessed through the DBI
or ODBC interfaces. For DBI interfaces dbtype
should be the name of the database
driver and dbname should be the name by which the driver identifies
the specific database (eg file name for SQLite). For ODBC databases
dbtype should be "ODBC" and dbname should be the
registered DSN for the database. On the Windows GUI, dbname="" will
produce a dialog box for interactive selection.
The appropriate database interface package must already be loaded (eg
RSQLite for SQLite, RODBC for ODBC). The survey design
object will contain only the design metadata, and actual variables will
be loaded from the database as needed. Use
close to close the database connection and
open to reopen the connection, eg, after
loading a saved object.
The database interface does not attempt to modify the underlying database and so can be used with read-only permissions on the database.
If data is an imputationList object (from the "mitools"
package), svydesign will return a svyimputationList object
containing a set of designs. Use with.svyimputationList to
do analyses on these designs and MIcombine to combine the results.
Value
An object of class survey.design.
Author(s)
Thomas Lumley
See Also
as.svrepdesign for converting to replicate weight designs,
subset.survey.design for domain estimates,
update.survey.design to add variables.
mitools package for using multiple imputations
svyrecvar and svyCprod for details of variance estimation
election for examples of PPS sampling without replacement.
http://faculty.washington.edu/tlumley/survey/ for examples of database-backed objects.
Examples
library(survey)
data(api)
# stratified sample
dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
# one-stage cluster sample
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
# two-stage cluster sample: weights computed from population sizes.
dclus2<-svydesign(id=~dnum+snum, fpc=~fpc1+fpc2, data=apiclus2)
## multistage sampling has no effect when fpc is not given, so
## these are equivalent.
dclus2wr<-svydesign(id=~dnum+snum, weights=weights(dclus2), data=apiclus2)
dclus2wr2<-svydesign(id=~dnum, weights=weights(dclus2), data=apiclus2)
## syntax for stratified cluster sample
## (though the data weren't really sampled this way)
svydesign(id=~dnum, strata=~stype, weights=~pw, data=apistrat,
nest=TRUE)
## PPS sampling without replacement
data(election)
dpps<- svydesign(id=~1, fpc=~p, data=election_pps, pps="brewer")
## database example: requires RSQLite
## Not run:
library(RSQLite)
dbclus1<-svydesign(id=~dnum, weights=~pw, fpc=~fpc,
data="apiclus1",dbtype="SQLite", dbname=system.file("api.db",package="survey"))
## End(Not run)
