# svyby: Survey statistics on subsets

In survey: Analysis of Complex Survey Samples

## Description

Compute survey statistics on subsets of a survey defined by factors.

## Usage

```
svyby(formula, by ,design,...)
## Default S3 method:
svyby(formula, by, design, FUN, ..., deff=FALSE,keep.var = TRUE,
      keep.names = TRUE,verbose=FALSE,
      vartype=c("se","ci","ci","cv","cvpct","var"),
      drop.empty.groups=TRUE, covmat=FALSE, return.replicates=FALSE,
      na.rm.by=FALSE, na.rm.all=FALSE,
      multicore=getOption("survey.multicore"))
## S3 method for class 'survey.design2'
svyby(formula, by, design, FUN, ..., deff=FALSE,keep.var = TRUE,
      keep.names = TRUE,verbose=FALSE,
      vartype=c("se","ci","ci","cv","cvpct","var"),
      drop.empty.groups=TRUE, covmat=FALSE, influence=covmat,
      na.rm.by=FALSE, na.rm.all=FALSE,
      multicore=getOption("survey.multicore"))
## S3 method for class 'svyby'
SE(object,...)
## S3 method for class 'svyby'
deff(object,...)
## S3 method for class 'svyby'
coef(object,...)
## S3 method for class 'svyby'
confint(object, parm, level = 0.95,df =Inf,...)
unwtd.count(x, design, ...)
svybys(formula, bys, design, FUN, ...)
```

## Arguments

| Argument | Description |
| --- | --- |
| `formula,x` | A formula specifying the variables to pass to `FUN` (or a matrix, data frame, or vector) |
| `by` | A formula specifying factors that define subsets, or a list of factors. |
| `design` | A `svydesign` or `svrepdesign` object |
| `FUN` | A function taking a formula and survey design object as its first two arguments. |
| `...` | Other arguments to `FUN`. NOTE: if any of the names of these are partial matches to `formula`, `by`, or `design`, you must specify the `formula`, `by`, or `design` argument by name, not just by position. |
| `deff` | Request a design effect from `FUN` |
| `keep.var` | If `FUN` returns a `svystat` object, extract standard errors from it |
| `keep.names` | Define row names based on the subsets |
| `verbose` | If `TRUE`, print a label for each subset as it is processed. |
| `vartype` | Report variability as one or more of standard error, confidence interval, coefficient of variation, percent coefficient of variation, or variance |
| `drop.empty.groups` | If `FALSE`, report `NA` for empty groups, if `TRUE` drop them from the output |
| `na.rm.by` | If true, omit groups defined by `NA` values of the `by` variables. |
| `na.rm.all` | If true, check for groups with no non-missing observations for variables defined by `formula` and treat these groups as empty |
| `covmat` | If `TRUE`, compute covariances between estimates for different subsets. Allows `svycontrast` to be used on output. Requires that `FUN` supports either `return.replicates=TRUE` or `influence=TRUE` |
| `return.replicates` | Only for replicate-weight designs. If `TRUE`, return all the replicates as the "replicates" attribute of the result |
| `influence` | Return the influence functions of the result |
| `multicore` | Use `multicore` package to distribute subsets over multiple processors? |
| `parm` | a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
| `level` | the confidence level required. |
| `df` | degrees of freedom for t-distribution in confidence interval, use `degf(design)` for number of PSUs minus number of strata |
| `object` | An object of class `"svyby"` |
| `bys` | one-sided formula with each term specifying a grouping (rather than being combined to give a grouping) |

## Details

The variance type "ci" asks for confidence intervals, which are produced by `confint`. In some cases additional options to `FUN` will be needed to produce confidence intervals, for example, `svyquantile` needs `ci=TRUE` or `keep.var=FALSE`.
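As a minimal sketch of the two routes to confidence intervals described above, the following requests intervals directly through `vartype="ci"` and then computes them afterwards with `confint`, using the `api` data shipped with the survey package. The object names `means_ci`, `ci_z`, and `ci_t` are illustrative only.

```r
library(survey)
data(api)

# One-stage cluster sample of school districts from the api data
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Route 1: ask svyby for confidence limits instead of standard errors
means_ci <- svyby(~api99, ~stype, dclus1, svymean, vartype = "ci")
print(means_ci)

# Route 2: keep the default standard errors and call confint() afterwards
means <- svyby(~api99, ~stype, dclus1, svymean)
ci_z <- confint(means)                     # default df = Inf (normal quantiles)
ci_t <- confint(means, df = degf(dclus1))  # t quantiles with design degrees of freedom
print(ci_t)
```

With a finite `df`, the intervals are based on a t-distribution and so are somewhat wider than the default normal-based ones.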

`unwtd.count` is designed to be passed to `svyby` to report the number of non-missing observations in each subset. Observations with exactly zero weight will also be counted as missing, since that's how subsets are implemented for some designs.

Parallel processing with `multicore=TRUE` is useful only for fairly large problems and on computers with sufficient memory. The `multicore` package is incompatible with some GUIs, although the Mac Aqua GUI appears to be safe.

The variant `svybys` creates a separate table for each term in `bys` rather than creating a joint table.

## Value

An object of class `"svyby"`: a data frame showing the factors and the results of `FUN`.

For `unwtd.count`, the unweighted number of non-missing observations in the data matrix specified by `x` for the design.

## Note

The function works by making a lot of calls of the form `FUN(formula, subset(design, by==i))`, where `formula` is re-evaluated in each subset, so it is unwise to use data-dependent terms in `formula`. In particular, `svyby(~factor(a), ~b, design=d, svymean)` will create factor variables whose levels are only those values of `a` present in each subset. To avoid this, either use `update.survey.design` to add the variables to the design object beforehand, or specify the levels explicitly in the call to `factor`.
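The two workarounds in this note can be sketched as follows, again with the `api` data from the survey package. The variable name `awards_f` and the result names `by_upd` and `by_lvl` are made up for illustration; the level labels `"No"` and `"Yes"` are assumed to match the coding of `awards` in `apiclus1`.

```r
library(survey)
data(api)

dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Workaround 1: create the factor once on the full design, so every
# subset inherits the same set of levels
dclus1 <- update(dclus1, awards_f = factor(awards))
by_upd <- svyby(~awards_f, ~stype, dclus1, svymean)
print(by_upd)

# Workaround 2: state the levels explicitly inside the formula, so
# re-evaluating it in each subset cannot drop a level
by_lvl <- svyby(~factor(awards, levels = c("No", "Yes")), ~stype,
                dclus1, svymean)
print(by_lvl)
```

Both calls should give the same estimates. The first usually produces more readable column names, because the names are built from the variable name rather than from the whole `factor(...)` expression.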

## Note

Asking for a design effect (`deff=TRUE`) from a function that does not produce one will cause an error or incorrect formatting of the output. The same will occur with `keep.var=TRUE` if the function does not compute a standard error.

## See Also

`svytable` and `ftable.svystat` for contingency tables, `ftable.svyby` for pretty-printing of `svyby` output.

## Examples

```
library(survey)
data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

svyby(~api99, ~stype, dclus1, svymean)
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5,ci=TRUE,vartype="ci")
## without ci=TRUE svyquantile does not compute standard errors
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5, keep.var=FALSE)
svyby(~api99, list(school.type=apiclus1$stype), dclus1, svymean)
svyby(~api99+api00, ~stype, dclus1, svymean, deff=TRUE,vartype="ci")
svyby(~api99+api00, ~stype+sch.wide, dclus1, svymean, keep.var=FALSE)
## report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, dclus1, unwtd.count, keep.var=FALSE)

rclus1<-as.svrepdesign(dclus1)

svyby(~api99, ~stype, rclus1, svymean)
svyby(~api99, ~stype, rclus1, svyquantile, quantiles=0.5)
svyby(~api99, list(school.type=apiclus1$stype), rclus1, svymean, vartype="cv")
svyby(~enroll,~stype, rclus1,svytotal, deff=TRUE)
svyby(~api99+api00, ~stype+sch.wide, rclus1, svymean, keep.var=FALSE)
##report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, rclus1, unwtd.count, keep.var=FALSE)

## comparing subgroups using covmat=TRUE
mns<-svyby(~api99, ~stype, rclus1, svymean,covmat=TRUE)
vcov(mns)
svycontrast(mns, c(E = 1, M = -1))

str(svyby(~api99, ~stype, rclus1, svymean,return.replicates=TRUE))

tots<-svyby(~enroll, ~stype, dclus1, svytotal,covmat=TRUE)
vcov(tots)
svycontrast(tots, quote(E/H))

## comparing subgroups uses the delta method unless replicates are present
meanlogs<-svyby(~log(enroll),~stype,svymean, design=rclus1,covmat=TRUE)
svycontrast(meanlogs, quote(exp(E-H)))
meanlogs<-svyby(~log(enroll),~stype,svymean, design=rclus1,covmat=TRUE,return.replicates=TRUE)
svycontrast(meanlogs, quote(exp(E-H)))

## extractor functions
(a<-svyby(~enroll, ~stype, rclus1, svytotal, deff=TRUE, verbose=TRUE,
  vartype=c("se","cv","cvpct","var")))
deff(a)
SE(a)
cv(a)
coef(a)
confint(a, df=degf(rclus1))

## ratio estimates
svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio)

ratios<-svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio,covmat=TRUE)
vcov(ratios)

## empty groups
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean)
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean,drop.empty.groups=FALSE)

## Multiple tables
svybys(~api00,~comp.imp+sch.wide,design=dclus1,svymean)
```