svyby: Survey statistics on subsets

View source: R/surveyby.R

svybyR Documentation

Survey statistics on subsets

Description

Compute survey statistics on subsets of a survey defined by factors.

Usage

svyby(formula, by ,design,...)
## Default S3 method:
svyby(formula, by, design, FUN, ..., deff=FALSE,keep.var = TRUE,
keep.names = TRUE,verbose=FALSE, vartype=c("se","ci","ci","cv","cvpct","var"),
 drop.empty.groups=TRUE, covmat=FALSE, return.replicates=FALSE,
 na.rm.by=FALSE, na.rm.all=FALSE, stringsAsFactors=TRUE,
multicore=getOption("survey.multicore"))
## S3 method for class 'survey.design2'
svyby(formula, by, design, FUN, ..., deff=FALSE,keep.var = TRUE,
keep.names = TRUE,verbose=FALSE, vartype=c("se","ci","ci","cv","cvpct","var"),
 drop.empty.groups=TRUE, covmat=FALSE, influence=covmat, 
 na.rm.by=FALSE, na.rm.all=FALSE, stringsAsFactors=TRUE,
 multicore=getOption("survey.multicore"))

## S3 method for class 'svyby'
SE(object,...)
## S3 method for class 'svyby'
deff(object,...)
## S3 method for class 'svyby'
coef(object,...)
## S3 method for class 'svyby'
confint(object,  parm, level = 0.95,df =Inf,...)
unwtd.count(x, design, ...)
svybys(formula,  bys,  design, FUN, ...)

Arguments

formula,x

A formula specifying the variables to pass to FUN (or a matrix, data frame, or vector)

by

A formula specifying factors that define subsets, or a list of factors.

design

A svydesign or svrepdesign object

FUN

A function taking a formula and survey design object as its first two arguments.

...

Other arguments to FUN. NOTE: if any of the names of these are partial matches to formula,by, or design, you must specify the formula,by, or design argument by name, not just by position.

deff

Request a design effect from FUN

keep.var

If FUN returns a svystat object, extract standard errors from it

keep.names

Define row names based on the subsets

verbose

If TRUE, print a label for each subset as it is processed.

vartype

Report variability as one or more of standard error, confidence interval, coefficient of variation, percent coefficient of variation, or variance

drop.empty.groups

If FALSE, report NA for empty groups, if TRUE drop them from the output

na.rm.by

If true, omit groups defined by NA values of the by variables

.

na.rm.all

If true, check for groups with no non-missing observations for variables defined by formula and treat these groups as empty. Doesn't make much sense without na.rm=TRUE

covmat

If TRUE, compute covariances between estimates for different subsets. Allows svycontrast to be used on output. Requires that FUN supports either return.replicates=TRUE or influence=TRUE

return.replicates

Only for replicate-weight designs. If TRUE, return all the replicates as the "replicates" attribute of the result

influence

Return the influence functions of the result

multicore

Use multicore package to distribute subsets over multiple processors?

stringsAsFactors

Convert any string variables in formula to factors before calling FUN, so that the factor levels will be the same in all groups (See Note below). Potentially slow.

parm

a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.

level

the confidence level required.

df

degrees of freedom for t-distribution in confidence interval, use degf(design) for number of PSUs minus number of strata

object

An object of class "svyby"

bys

one-sided formula with each term specifying a grouping (rather than being combined to give a grouping

Details

The variance type "ci" asks for confidence intervals, which are produced by confint. In some cases additional options to FUN will be needed to produce confidence intervals, for example, svyquantile needs ci=TRUE or keep.var=FALSE.

unwtd.count is designed to be passed to svyby to report the number of non-missing observations in each subset. Observations with exactly zero weight will also be counted as missing, since that's how subsets are implemented for some designs.

Parallel processing with multicore=TRUE is useful only for fairly large problems and on computers with sufficient memory. The multicore package is incompatible with some GUIs, although the Mac Aqua GUI appears to be safe.

The variant svybys creates a separate table for each term in bys rather than creating a joint table.

Value

An object of class "svyby": a data frame showing the factors and the results of FUN.

For unwtd.count, the unweighted number of non-missing observations in the data matrix specified by x for the design.

Note

The function works by making a lot of calls of the form FUN(formula, subset(design, by==i)), where formula is re-evaluated in each subset, so it is unwise to use data-dependent terms in formula. In particular, svyby(~factor(a), ~b, design=d, svymean), will create factor variables whose levels are only those values of a present in each subset. If a is a character variable then svyby(~a, ~b, design=d, svymean) creates factor variables implicitly and so has the same problem. Either use update.survey.design to add variables to the design object instead or specify the levels explicitly in the call to factor. The stringsAsFactors=TRUE option converts all character variables to factors, which can be slow, set it to FALSE if you have predefined factors where necessary.

Note

Asking for a design effect (deff=TRUE) from a function that does not produce one will cause an error or incorrect formatting of the output. The same will occur with keep.var=TRUE if the function does not compute a standard error.

See Also

svytable and ftable.svystat for contingency tables, ftable.svyby for pretty-printing of svyby

Examples

data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

svyby(~api99, ~stype, dclus1, svymean)
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5,ci=TRUE,vartype="ci")
## without ci=TRUE svyquantile does not compute standard errors
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5, keep.var=FALSE)
svyby(~api99, list(school.type=apiclus1$stype), dclus1, svymean)
svyby(~api99+api00, ~stype, dclus1, svymean, deff=TRUE,vartype="ci")
svyby(~api99+api00, ~stype+sch.wide, dclus1, svymean, keep.var=FALSE)
## report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, dclus1, unwtd.count, keep.var=FALSE)

rclus1<-as.svrepdesign(dclus1)

svyby(~api99, ~stype, rclus1, svymean)
svyby(~api99, ~stype, rclus1, svyquantile, quantiles=0.5)
svyby(~api99, list(school.type=apiclus1$stype), rclus1, svymean, vartype="cv")
svyby(~enroll,~stype, rclus1,svytotal, deff=TRUE)
svyby(~api99+api00, ~stype+sch.wide, rclus1, svymean, keep.var=FALSE)
##report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, rclus1, unwtd.count, keep.var=FALSE)

## comparing subgroups using covmat=TRUE
mns<-svyby(~api99, ~stype, rclus1, svymean,covmat=TRUE)
vcov(mns)
svycontrast(mns, c(E = 1, M = -1))

str(svyby(~api99, ~stype, rclus1, svymean,return.replicates=TRUE))

tots<-svyby(~enroll, ~stype, dclus1, svytotal,covmat=TRUE)
vcov(tots)
svycontrast(tots, quote(E/H))


## comparing subgroups uses the delta method unless replicates are present
meanlogs<-svyby(~log(enroll),~stype,svymean, design=rclus1,covmat=TRUE)
svycontrast(meanlogs, quote(exp(E-H)))
meanlogs<-svyby(~log(enroll),~stype,svymean, design=rclus1,covmat=TRUE,return.replicates=TRUE)
svycontrast(meanlogs, quote(exp(E-H)))


## extractor functions
(a<-svyby(~enroll, ~stype, rclus1, svytotal, deff=TRUE, verbose=TRUE, 
  vartype=c("se","cv","cvpct","var")))
deff(a)
SE(a)
cv(a)
coef(a)
confint(a, df=degf(rclus1))

## ratio estimates
svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio)

ratios<-svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio,covmat=TRUE)
vcov(ratios)

## empty groups
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean)
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean,drop.empty.groups=FALSE)

## Multiple tables
svybys(~api00,~comp.imp+sch.wide,design=dclus1,svymean)




survey documentation built on May 3, 2023, 9:12 a.m.