In benjaminrich/table1: Tables of Descriptive Statistics in HTML

library(table1, quietly=TRUE)
try(detach('package:printr', unload = TRUE), silent=TRUE) # Make sure printr is not loaded

Introduction

Weighted descriptive statistics are required in some contexts, for instance, in the analysis of survey data. This can be accomplished by using the provided weighted wrapper class. Internally, this will cause the functions wtd.mean, wtd.var, wtd.quantile and wtd.table from the Hmisc package (which is optional in general but required to use this functionality) to be used in place of their standard non-weighted counterparts.

Example

We take an example from the survey package (note that this is for illustration purposes only, it is not meant to be a real application):

library(survey, quietly=TRUE)
data(myco)

myco$Leprosy <- factor(myco$leprosy, levels=1:0, labels=c("Leprosy Cases", "Controls"))

myco$AgeCat <- factor(myco$Age,
    levels=c(7.5,      12.5,       17.5,       22.5,       27.5,       32.5      ),
    labels=c("5 to 9", "10 to 14", "15 to 19", "20 to 24", "25 to 29", "30 to 34")
)

myco$ScarL <- as.logical(myco$Scar)

label(myco$Age) <- "Age"
units(myco$Age) <- "years"
label(myco$AgeCat) <- "Age Group"
label(myco$ScarL) <- "BCG vaccination scar"

table1(~ ScarL + Age + AgeCat | Leprosy, data=weighted(myco, wt), big.mark=",")

It also works in "transpose" mode:

table1(~ Age + ScarL | Leprosy, data=weighted(myco, wt), transpose=T, big.mark=",")

For more flexibility, we may not want the weighting to be applied globally, but only to some of the variables? We can do this as well, by using weighted on individual variables:

table1(~ weighted(ScarL, wt) + Age + AgeCat | Leprosy, data=myco, big.mark=",")

This implementation allows for simple weighted statistics, but does not currently support more complex designs from the survey package like stratified sampling or cluster sampling.

The `weighted` and `indexed` classes

The weighted class is just a wrapper around a vector or data.frame that adds a vector of weights as an attribute. These weights are carried along or subsetted appropriately during operations like slicing or subsetting. See ?weighted for some examples.

The indexed class is similar, but it simply maintains the indices of a vector (row indices for a data.frame) when a subset or slide is taken. This leeds to some interesting possibilities when we want to do more complex things.

The following example also comes from the survey package:

data(api)

dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

svyby(~api99+api00, ~stype, dclus1, svymean)

svytable(~sch.wide+stype, dclus1)

Using table1, the same results can be presented more beautifully:

myrender <- function(x, name, ...) {
    if (is.numeric(x)) {
        r <- svymean(as.formula(paste0("~", name)), subset(dclus1, (1:nrow(dclus1)) %in% indices(x)))
        r <- c(Mean=as.numeric(r), SE=sqrt(attr(r, "var", exact=T)))
        r <- unlist(stats.apply.rounding(as.list(r), big.mark=","))
    } else {
        r <- svytable(as.formula(paste0("~", name)), subset(dclus1, (1:nrow(dclus1)) %in% indices(x)))
        r <- unlist(stats.apply.rounding(as.list(r), big.mark=",", digits=1, rounding.fn=round_pad))
    }
    c("", r)
}

apiclus1$stype2 <- factor(apiclus1$stype, levels=c("E", "M", "H"),
    labels=c("Elementary", "Middle School", "High School"))

label(apiclus1$api99)    <- "API in 1999"
label(apiclus1$api00)    <- "API in 2000"
label(apiclus1$sch.wide) <- "Met school-wide growth target?"

table1(~ api99 + api00 + sch.wide | stype2, indexed(apiclus1), render=myrender,
    render.strat=names)