Create an object summarizing both continuous and categorical variables

Description

Create an object summarizing all baseline variables (both continuous and categorical) optionally stratifying by one or more startifying variables and performing statistical tests. The object gives a table that is easy to use in medical research papers.

Usage

1
2
3
4
5
CreateTableOne(vars, strata, data, factorVars, includeNA = FALSE,
  test = TRUE, testApprox = chisq.test, argsApprox = list(correct = TRUE),
  testExact = fisher.test, argsExact = list(workspace = 2 * 10^5),
  testNormal = oneway.test, argsNormal = list(var.equal = TRUE),
  testNonNormal = kruskal.test, argsNonNormal = list(NULL), smd = TRUE)

Arguments

vars

Variables to be summarized given as a character vector. Factors are handled as categorical variables, whereas numeric variables are handled as continuous variables. If empty, all variables in the data frame specified in the data argument are used.

strata

Stratifying (grouping) variable name(s) given as a character vector. If omitted, the overall results are returned.

data

A data frame in which these variables exist. All variables (both vars and strata) must be in this data frame.

factorVars

Numerically coded variables that should be handled as categorical variables given as a character vector. If omitted, only factors are considered categorical variables. If all categorical variables in the dataset are already factors, this option is not necessary. The variables specified here must also be specified in the vars argument.

includeNA

If TRUE, NA is handled as a regular factor level rather than missing. NA is shown as the last factor level in the table. Only effective for categorical variables.

test

If TRUE, as in the default and there are more than two groups, groupwise comparisons are performed.

testApprox

A function used to perform the large sample approximation based tests. The default is chisq.test. This is not recommended when some of the cell have small counts like fewer than 5.

argsApprox

A named list of arguments passed to the function specified in testApprox. The default is list(correct = TRUE), which turns on the continuity correction for chisq.test.

testExact

A function used to perform the exact tests. The default is fisher.test. If the cells have large numbers, it will fail because of memory limitation. In this situation, the large sample approximation based should suffice.

argsExact

A named list of arguments passed to the function specified in testExact. The default is list(workspace = 2*10^5), which specifies the memory space allocated for fisher.test.

testNormal

A function used to perform the normal assumption based tests. The default is oneway.test. This is equivalent of the t-test when there are only two groups.

argsNormal

A named list of arguments passed to the function specified in testNormal. The default is list(var.equal = TRUE), which makes it the ordinary ANOVA that assumes equal variance across groups.

testNonNormal

A function used to perform the nonparametric tests. The default is kruskal.test (Kruskal-Wallis Rank Sum Test). This is equivalent of the wilcox.test (Man-Whitney U test) when there are only two groups.

argsNonNormal

A named list of arguments passed to the function specified in testNonNormal. The default is list(NULL), which is just a placeholder.

smd

If TRUE, as in the default and there are more than two groups, standardized mean differences for all pairwise comparisons are calculated.

Details

The definitions of the standardized mean difference (SMD) are available in Flury et al 1986 for the univariate case and the multivariate case (essentially the square root of the Mahalanobis distance). Extension to binary variables is discussed in Austin 2009 and extension to multinomival variables is suggested in Yang et al 2012. This multinomial extesion treats a single multinomial variable as multiple non-redundant dichotomous variables and use the Mahalanobis distance. The off diagonal elements of the covariance matrix on page 3 have an error, and need negation. In weighted data, the same definitions can be used except that the mean and standard deviation estimates are weighted estimates (Li et al 2013 and Austin et al 2015). In tableone, all weighted estimates are calculated by weighted estimation functions in the survey package.

Value

An object of class TableOne, which is a list of three objects.

ContTable

object of class ContTable, containing continuous variables only

CatTable

object of class CatTable, containing categorical variables only

MetaData

list of metadata regarding variables

Author(s)

Kazuki Yoshida, Justin Bohn

References

Flury, BK. and Riedwyl, H. (1986). Standard distance in univariate and multivariate analysis. The American Statistician, 40, 249-251.

Austin, PC. (2009). Using the Standardized Difference to Compare the Prevalence of a Binary Variable Between Two Groups in Observational Research. Communications in Statistics - Simulation and Computation, 38, 1228-1234.

Yang, D. and Dalton, JE. (2012). A unified approach to measuring the effect size between two groups using SAS. SAS Global Forum 2012, Paper 335-2012.

Li, L. and Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. International Journal of Biostatistics, 9, 215-234.

Austin, PC. and Stuart, EA. (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine, Online on August 3, 2015.

See Also

print.TableOne, summary.TableOne

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
## Load
library(tableone)

## Load Mayo Clinic Primary Biliary Cirrhosis Data
library(survival)
data(pbc)
## Check variables
head(pbc)

## Make categorical variables factors
varsToFactor <- c("status","trt","ascites","hepato","spiders","edema","stage")
pbc[varsToFactor] <- lapply(pbc[varsToFactor], factor)

## Create a variable list
dput(names(pbc))
vars <- c("time","status","age","sex","ascites","hepato",
          "spiders","edema","bili","chol","albumin",
          "copper","alk.phos","ast","trig","platelet",
          "protime","stage")

## Create Table 1 stratified by trt
tableOne <- CreateTableOne(vars = vars, strata = c("trt"), data = pbc)

## Just typing the object name will invoke the print.TableOne method
tableOne

## Specifying nonnormal variables will show the variables appropriately,
## and show nonparametric test p-values. Specify variables in the exact
## argument to obtain the exact test p-values. cramVars can be used to
## show both levels for a 2-level categorical variables.
print(tableOne, nonnormal = c("bili","chol","copper","alk.phos","trig"),
      exact = c("status","stage"), cramVars = "hepato", smd = TRUE)

## Use the summary.TableOne method for detailed summary
summary(tableOne)

## See the categorical part only using $ operator
tableOne$CatTable
summary(tableOne$CatTable)

## See the continuous part only using $ operator
tableOne$ContTable
summary(tableOne$ContTable)

## If your work flow includes copying to Excel and Word when writing manuscripts,
## you may benefit from the quote argument. This will quote everything so that
## Excel does not mess up the cells.
print(tableOne, nonnormal = c("bili","chol","copper","alk.phos","trig"),
      exact = c("status","stage"), quote = TRUE)

## If you want to center-align values in Word, use noSpaces option.
print(tableOne, nonnormal = c("bili","chol","copper","alk.phos","trig"),
      exact = c("status","stage"), quote = TRUE, noSpaces = TRUE)

## If SMDs are needed as numericals, use ExtractSmd()
ExtractSmd(tableOne)