Create an object summarizing all baseline variables (both continuous and categorical), optionally stratifying by one or more stratifying variables and performing statistical tests. The object gives a table that is easy to use in medical research papers.

```
CreateTableOne(vars, strata, data, factorVars, includeNA = FALSE,
  test = TRUE, testApprox = chisq.test, argsApprox = list(correct = TRUE),
  testExact = fisher.test, argsExact = list(workspace = 2 * 10^5),
  testNormal = oneway.test, argsNormal = list(var.equal = TRUE),
  testNonNormal = kruskal.test, argsNonNormal = list(NULL), smd = TRUE)
```

`vars`
Variables to be summarized, given as a character vector. Factors are handled as categorical variables, whereas numeric variables are handled as continuous variables. If empty, all variables in the data frame specified in the `data` argument are used.

`strata`
Stratifying (grouping) variable name(s), given as a character vector. If omitted, the overall results are returned.

`data`
A data frame in which these variables exist. All variables (both `vars` and `strata`) must be in this data frame.

`factorVars`
Numerically coded variables that should be handled as categorical variables, given as a character vector. If omitted, only factors are considered categorical variables. If all categorical variables in the dataset are already factors, this option is not necessary. The variables specified here must also be specified in the `vars` argument.

`includeNA`
If TRUE, NA is handled as a regular factor level rather than missing. NA is shown as the last factor level in the table. Only effective for categorical variables.

`test`
If TRUE (the default) and there are two or more groups, groupwise comparisons are performed.

`testApprox`
A function used to perform the large sample approximation based tests. The default is `chisq.test`.

`argsApprox`
A named list of arguments passed to the function specified in `testApprox`. The default is `list(correct = TRUE)`.

`testExact`
A function used to perform the exact tests. The default is `fisher.test`.

`argsExact`
A named list of arguments passed to the function specified in `testExact`. The default is `list(workspace = 2 * 10^5)`.

`testNormal`
A function used to perform the normal assumption based tests. The default is `oneway.test`.

`argsNormal`
A named list of arguments passed to the function specified in `testNormal`. The default is `list(var.equal = TRUE)`.

`testNonNormal`
A function used to perform the nonparametric tests. The default is `kruskal.test`.

`argsNonNormal`
A named list of arguments passed to the function specified in `testNonNormal`. The default is `list(NULL)`.

`smd`
If TRUE (the default) and there are two or more groups, standardized mean differences for all pairwise comparisons are calculated.
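As a rough illustration of how several of these arguments combine, the sketch below builds a small simulated data frame and passes `factorVars`, `includeNA`, and a non-default `argsNormal`. The data frame `toy` and its column names are made up for this example; only the argument names come from the usage above.

```r
library(tableone)

## Toy data; `toy`, `arm`, `age`, and `stage` are illustrative names only.
set.seed(1)
toy <- data.frame(
  arm   = rep(c("control", "treated"), each = 50),
  age   = rnorm(100, mean = 60, sd = 10),
  stage = sample(c(1, 2, 3, NA), 100, replace = TRUE)  # numerically coded
)

tab <- CreateTableOne(
  vars       = c("age", "stage"),
  strata     = "arm",
  data       = toy,
  factorVars = "stage",                  # treat the numeric codes as categories
  includeNA  = TRUE,                     # show NA as its own factor level
  argsNormal = list(var.equal = FALSE)   # oneway.test without equal variances
)

print(tab)
```

Setting `var.equal = FALSE` makes `oneway.test` use the Welch-type test instead of the equal-variance ANOVA; whether that is preferable depends on the data.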

The definitions of the standardized mean difference (SMD) are available in Flury *et al* 1986 for the univariate case and the multivariate case (essentially the square root of the Mahalanobis distance). Extension to binary variables is discussed in Austin 2009, and extension to multinomial variables is suggested in Yang *et al* 2012. This multinomial extension treats a single multinomial variable as multiple non-redundant dichotomous variables and uses the Mahalanobis distance. The off-diagonal elements of the covariance matrix on page 3 of that paper have an error and need negation. In weighted data, the same definitions can be used except that the mean and standard deviation estimates are weighted estimates (Li *et al* 2013 and Austin *et al* 2015). In tableone, all weighted estimates are calculated by weighted estimation functions in the `survey` package.
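The definitions above can be sketched in base R for the unweighted, two-group case. This is a minimal illustration, not the package's own code: the helper names (`smd_continuous`, `smd_binary`, `smd_multinomial`) are hypothetical, the continuous and binary versions use the common form that averages the two group variances, and the multinomial version assumes the averaged covariance matrix is invertible (it will fail if a level is empty in both groups).

```r
## Hypothetical helpers for illustration; none of these are tableone functions.

## Continuous variable: |mean1 - mean2| / sqrt((var1 + var2) / 2)
smd_continuous <- function(x, g) {
  g <- factor(g)
  stopifnot(nlevels(g) == 2)
  m <- tapply(x, g, mean, na.rm = TRUE)
  v <- tapply(x, g, var, na.rm = TRUE)
  unname(abs(m[1] - m[2]) / sqrt((v[1] + v[2]) / 2))
}

## Binary (0/1) variable: |p1 - p2| / sqrt((p1(1 - p1) + p2(1 - p2)) / 2)
smd_binary <- function(x, g) {
  g <- factor(g)
  stopifnot(nlevels(g) == 2)
  p <- tapply(x, g, mean, na.rm = TRUE)
  unname(abs(p[1] - p[2]) /
           sqrt((p[1] * (1 - p[1]) + p[2] * (1 - p[2])) / 2))
}

## Multinomial variable: Mahalanobis-type distance between the vectors of
## level proportions, after dropping one (redundant) level.
smd_multinomial <- function(x, g) {
  x <- factor(x)
  g <- factor(g)
  stopifnot(nlevels(g) == 2, nlevels(x) >= 2)
  props <- prop.table(table(g, x), margin = 1)  # rows: groups, cols: levels
  p1 <- props[1, -1]
  p2 <- props[2, -1]
  ## Multinomial covariance: p_k(1 - p_k) on the diagonal and
  ## -p_k * p_l off the diagonal (note the negative sign).
  cov_multinom <- function(p) diag(p, nrow = length(p)) - outer(p, p)
  S <- (cov_multinom(p1) + cov_multinom(p2)) / 2
  d <- p1 - p2
  sqrt(drop(t(d) %*% solve(S) %*% d))
}

## Small made-up example
g     <- rep(c("a", "b"), each = 4)
x     <- c(1, 2, 3, 4, 3, 4, 5, 6)
b     <- c(1, 0, 0, 0, 1, 1, 1, 0)
stage <- c(1, 1, 2, 3, 1, 2, 3, 3)

smd_continuous(x, g)
smd_binary(b, g)
smd_multinomial(b, g)      # agrees with smd_binary() for a two-level variable
smd_multinomial(stage, g)
```

For a two-level variable the multinomial form reduces to the binary form, which is a convenient sanity check on the sign of the off-diagonal covariance terms.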

An object of class `TableOne`, which is a list of three objects.

`ContTable`
object of class

`CatTable`
object of class

`MetaData`
list of metadata regarding variables

Kazuki Yoshida, Justin Bohn

Flury, BK. and Riedwyl, H. (1986). Standard distance in univariate and multivariate analysis. *The American Statistician*, **40**, 249-251.

Austin, PC. (2009). Using the Standardized Difference to Compare the Prevalence of a Binary Variable Between Two Groups in Observational Research. *Communications in Statistics - Simulation and Computation*, **38**, 1228-1234.

Yang, D. and Dalton, JE. (2012). A unified approach to measuring the effect size between two groups using SAS. SAS Global Forum 2012, Paper 335-2012.

Li, L. and Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. *International Journal of Biostatistics*, **9**, 215-234.

Austin, PC. and Stuart, EA. (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. *Statistics in Medicine*, Online on August 3, 2015.

`print.TableOne`, `summary.TableOne`

```
## Load
library(tableone)

## Load Mayo Clinic Primary Biliary Cirrhosis Data
library(survival)
data(pbc)
## Check variables
head(pbc)

## Make categorical variables factors
varsToFactor <- c("status","trt","ascites","hepato","spiders","edema","stage")
pbc[varsToFactor] <- lapply(pbc[varsToFactor], factor)

## Create a variable list
dput(names(pbc))
vars <- c("time","status","age","sex","ascites","hepato",
          "spiders","edema","bili","chol","albumin",
          "copper","alk.phos","ast","trig","platelet",
          "protime","stage")

## Create Table 1 stratified by trt
tableOne <- CreateTableOne(vars = vars, strata = c("trt"), data = pbc)

## Just typing the object name will invoke the print.TableOne method
tableOne

## Specifying nonnormal variables will show the variables appropriately,
## and show nonparametric test p-values. Specify variables in the exact
## argument to obtain the exact test p-values. cramVars can be used to
## show both levels for 2-level categorical variables.
print(tableOne, nonnormal = c("bili","chol","copper","alk.phos","trig"),
      exact = c("status","stage"), cramVars = "hepato", smd = TRUE)

## Use the summary.TableOne method for detailed summary
summary(tableOne)

## See the categorical part only using $ operator
tableOne$CatTable
summary(tableOne$CatTable)

## See the continuous part only using $ operator
tableOne$ContTable
summary(tableOne$ContTable)

## If your workflow includes copying to Excel and Word when writing manuscripts,
## you may benefit from the quote argument. This will quote everything so that
## Excel does not mess up the cells.
print(tableOne, nonnormal = c("bili","chol","copper","alk.phos","trig"),
      exact = c("status","stage"), quote = TRUE)

## If you want to center-align values in Word, use noSpaces option.
print(tableOne, nonnormal = c("bili","chol","copper","alk.phos","trig"),
      exact = c("status","stage"), quote = TRUE, noSpaces = TRUE)

## If SMDs are needed as numeric values, use ExtractSmd()
ExtractSmd(tableOne)
```
