tabmeans: Generate Summary Tables of Mean Comparisons for Statistical...
In tab: Functions for Creating Summary Tables for Statistical Reports

This function compares the mean of a continuous variable across levels of a categorical variable and summarizes the results in a clean table (or figure) for a statistical report.

tabmeans(x, y, latex = FALSE, variance = "unequal", xname = NULL,
         xlevels = NULL, yname = NULL, quantiles = NULL, quantile.vals = FALSE,
         parenth = "sd", text.label = NULL, parenth.sep = "-", decimals = NULL,
         p.include = TRUE, p.decimals = c(2, 3), p.cuts = 0.01,
         p.lowerbound = 0.001, p.leading0 = TRUE, p.avoid1 = FALSE,
         overall.column = TRUE, n.column = FALSE, n.headings = TRUE,
         bold.colnames = TRUE, bold.varnames = FALSE,
         variable.colname = "Variable", fig = FALSE, fig.errorbars = "z.ci",
         fig.title = NULL, print.html = FALSE, html.filename = "table1.html")

`x`	Vector of values for the categorical `x` variable.
`y`	Vector of values for the continuous `y` variable.
`latex`	If `TRUE`, object returned is formatted for printing in LaTeX using `xtable` [1]; if `FALSE`, formatted for copy-and-pasting from RStudio into a word processor.
`variance`	Controls whether equal variance t-test or unequal variance t-test is used when `x` has two levels. Possible values are `"equal"` for equal variance, `"unequal"` for unequal variance, and `"ftest"` for F test to determine which version of the t-test to use. Note that unequal variance t-test is less restrictive than equal variance t-test, and the F test is only valid when `y` is normally distributed in both `x` groups.
`xname`	Label for the categorical variable. Only used if `fig = TRUE`.
`xlevels`	Optional character vector to label the levels of `x`, used in the column headings. If unspecified, the function uses the values that `x` takes on.
`yname`	Optional label for the continuous `y` variable. If unspecified, variable name of `y` is used.
`quantiles`	If specified, function compares means of the `y` variable across quantiles of the `x` variable. For example, if `x` contains continuous BMI values and `y` contains continuous HDL cholesterol levels, setting `quantiles = 3` would result in mean HDL being compared across tertiles of BMI.
`quantile.vals`	If `TRUE`, labels for `x` show quantile number and corresponding range of the `x` variable, e.g. Q1 [0.00, 0.25). If `FALSE`, labels for quantiles just show quantile number, e.g. Q1. Only used if `xlevels` is not specified.
`parenth`	Controls what values (if any) are placed in parentheses after the means in each cell. Possible values are `"none"`, `"sd"` for standard deviation, `"se"` for standard error, `"t.ci"` for 95% confidence interval for population mean based on t distribution, and `"z.ci"` for 95% confidence interval for population mean based on z distribution.
`text.label`	Optional text to put after the `y` variable name, identifying what cell values and parentheses indicate in the table. If unspecified, function uses default labels based on `parenth`, e.g. M (SD) if `parenth = "sd"`. Set to `"none"` for no text labels.
`parenth.sep`	Optional character specifying the separator between lower and upper bound of confidence interval (when requested). Usually either `"-"` or `", "` depending on user preference.
`decimals`	Number of decimal places for numeric values in the table (except p-values). If unspecified, function uses 0 decimal places if the largest mean (in magnitude) is in [1,000, Inf), 1 decimal place if [10, 1,000), 2 decimal places if [0.1, 10), 3 decimal places if [0.01, 0.1), 4 decimal places if [0.001, 0.01), 5 decimal places if [0.0001, 0.001), and 6 decimal places if [0, 0.0001).
`p.include`	If `FALSE`, t-test is not performed and p-value is not returned.
`p.decimals`	Number of decimal places for p-values. If a vector is provided rather than a single value, number of decimal places will depend on what range the p-value lies in. See `p.cuts`.
`p.cuts`	Cut-point(s) to control number of decimal places used for p-values. For example, by default `p.cuts = 0.01` and `p.decimals = c(2, 3)`. This means that p-values in the range [0.01, 1] will be printed to two decimal places, while p-values in the range [0, 0.01) will be printed to three decimal places.
`p.lowerbound`	Controls cut-point at which p-values are no longer printed as their value, but rather <lowerbound. For example, by default `p.lowerbound = 0.001`. Under this setting, p-values less than 0.001 are printed as `<0.001`.
`p.leading0`	If `TRUE`, p-values are printed with 0 before decimal place; if `FALSE`, the leading 0 is omitted.
`p.avoid1`	If `TRUE`, p-values rounded to 1 are not printed as 1, but as `>0.99` (or similarly depending on `p.decimals` and `p.cuts`).
`overall.column`	If `FALSE`, column showing mean of `y` in full sample is suppressed.
`n.column`	If `TRUE`, the table will have a column for sample size.
`n.headings`	If `TRUE`, the table will indicate the sample size overall and in each group in parentheses after the column headings.
`bold.colnames`	If `TRUE`, column headings are printed in bold font. Only applies if `latex = TRUE`.
`bold.varnames`	If `TRUE`, variable name in the first column of the table is printed in bold font. Only applies if `latex = TRUE`.
`variable.colname`	Character string with desired heading for first column of table, which shows the `y` variable name.
`fig`	If `TRUE`, a figure is returned rather than a table. The figure shows the mean for each level of `x`, with error bars controlled by `fig.errorbars` (95% confidence interval by default).
`fig.errorbars`	Controls error bars around mean when `fig = TRUE`. Possible values are `"sd"` for +/- 1 standard deviation, `"se"` for +/- 1 standard error, `"t.ci"` for 95% confidence interval based on t distribution, `"z.ci"` for 95% confidence interval based on z distribution, and `"none"` for no error bars.
`fig.title`	Title of figure. If unspecified, title is set to `"Mean yname by xname"`.
`print.html`	If `TRUE`, function prints a .html file to the current working directory.
`html.filename`	Character string indicating the name of the .html file that gets printed if `print.html = TRUE`.
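The interaction between `p.decimals`, `p.cuts`, and `p.lowerbound` can be easier to follow in code. The following is a minimal sketch of the documented rule for a single cut-point, written in base R; `format_p` is a hypothetical helper for illustration and is not the function's actual implementation.

```r
# Hypothetical helper (not part of tab): format one p-value using the
# documented defaults for p.decimals, p.cuts, and p.lowerbound.
format_p <- function(p, decimals = c(2, 3), cuts = 0.01, lowerbound = 0.001) {
  # Below the lower bound, print "<lowerbound" instead of the value
  if (p < lowerbound) {
    return(paste0("<", lowerbound))
  }
  # At or above the cut-point use decimals[1]; below it use decimals[2]
  d <- if (p >= cuts) decimals[1] else decimals[2]
  formatC(p, format = "f", digits = d)
}

format_p(0.2468)   # "0.25"
format_p(0.0042)   # "0.004"
format_p(0.00003)  # "<0.001"
```

The sketch ignores `p.leading0` and `p.avoid1`, which adjust the printed string further.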

If `x` has two levels, a t-test is used to test for a difference in means. If `x` has more than two levels, a one-way analysis of variance is used to test for a difference in means across the groups.
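The choice of test described above can be sketched with base R functions. This is only an illustration under stated assumptions: `compare_means_p` is a hypothetical helper, the data are simulated, and the `"ftest"` option of `variance` is not handled.

```r
set.seed(1)
# Simulated data: three groups of 20 observations each
y <- c(rnorm(20, mean = 25), rnorm(20, mean = 27), rnorm(20, mean = 26))
x <- factor(rep(c("A", "B", "C"), each = 20))

# Hypothetical helper (not part of tab): p-value for a difference in means
compare_means_p <- function(x, y, variance = "unequal") {
  x <- factor(x)  # re-factoring drops unused levels
  if (nlevels(x) == 2) {
    # Two levels: t-test, with equal or unequal variances
    t.test(y ~ x, var.equal = (variance == "equal"))$p.value
  } else {
    # More than two levels: one-way analysis of variance
    summary(aov(y ~ x))[[1]][["Pr(>F)"]][1]
  }
}

keep <- x != "C"
p_two   <- compare_means_p(x[keep], y[keep])  # Welch t-test, A vs. B
p_three <- compare_means_p(x, y)              # one-way ANOVA, A vs. B vs. C
```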

Both `x` and `y` can have missing values. The function drops observations with missing `x` or `y`.
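As a rough illustration of what dropping incomplete observations means, the base R snippet below keeps only the pairs where both values are present. The vectors are made-up toy data, and this shows the general idea rather than the function's internal code.

```r
# Toy data with one missing x and one missing y
x <- c("Control", "Treatment", NA, "Control", "Treatment")
y <- c(24.1, NA, 27.3, 22.8, 26.0)

# Keep only observations where both x and y are observed
keep <- complete.cases(x, y)
x_obs <- x[keep]
y_obs <- y[keep]

length(y_obs)               # 3 observations remain
tapply(y_obs, x_obs, mean)  # group means computed on complete pairs only
```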

A character matrix with the requested table comparing mean `y` across levels of `x`. If `latex = TRUE`, the character matrix is formatted for inserting into an R Markdown/Sweave/knitr report using the xtable package [1]. If `fig = TRUE`, a figure is returned instead of a table.

If you wish to paste your tables into Word, you can use either of these approaches:

1. Use the `write.cb` function in the Kmisc package [2]. If your table is stored in a character matrix named `table1`, use `write.cb(table1)` to copy the table to your clipboard. Paste the result into Word, then highlight the text, go to Insert > Table > Convert Text to Table..., and click OK.

2. Set `print.html = TRUE`. This writes a .html file to your current working directory. When you open this file, you will see a formatted table that you can copy and paste into Word. You can control the name of this file with `html.filename`.

If you wish to use LaTeX, R Markdown, knitr, Sweave, etc., set `latex = TRUE` and then use xtable [1]. You may have to set `sanitize.text.function = identity` when calling `print.xtable`.
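A sketch of the xtable step might look like the following. The matrix `tbl` is a made-up stand-in for a table returned with `latex = TRUE`, and the LaTeX markup in its last cell is an assumption for illustration. The `include.rownames` setting is an optional extra, and the call is skipped if xtable is not installed.

```r
# Placeholder character matrix (values are invented, not real output)
tbl <- matrix(c("BMI, M (SD)", "26.1 (4.2)", "27.0 (4.5)", "$<$0.001"),
              nrow = 1,
              dimnames = list(NULL, c("Variable", "Control", "Treatment", "P")))

if (requireNamespace("xtable", quietly = TRUE)) {
  # identity leaves LaTeX markup in the cells untouched instead of escaping it
  print(xtable::xtable(tbl), include.rownames = FALSE,
        sanitize.text.function = identity)
}
```

Without `sanitize.text.function = identity`, `print.xtable` escapes special characters such as `$`, so any markup in the cells would appear literally in the compiled document.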

If you have suggestions for additional options or features, or if you would like some help using any function in tab, please e-mail me at vandomed@gmail.com. Thanks!

Dane R. Van Domelen

1. Dahl DB (2013). xtable: Export tables to LaTeX or HTML. R package version 1.7-1, https://cran.r-project.org/package=xtable.

2. Ushey K (2013). Kmisc: Kevin Miscellaneous. R package version 0.5.0, https://CRAN.R-project.org/package=Kmisc.

Acknowledgment: This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-0940903.

tabfreq
tabmedians
tabmulti
tabglm
tabcox
tabgee
tabfreq.svy
tabmeans.svy
tabmedians.svy
tabmulti.svy
tabglm.svy

# Load in sample dataset d and drop rows with missing values
data(d)
d <- d[complete.cases(d), ]

# Compare mean BMI in control group vs. treatment group - table and figure
meanstable1 <- tabmeans(x = d$Group, y = d$BMI)
meansfig1 <- tabmeans(x = d$Group, y = d$BMI, fig = TRUE)

# Compare mean BMI by race - table and figure
meanstable2 <- tabmeans(x = d$Race, y = d$BMI)
meansfig2 <- tabmeans(x = d$Race, y = d$BMI, fig = TRUE)

# Compare mean baseline systolic BP across tertiles of BMI - table and figure
meanstable3 <- tabmeans(x = d$BMI, y = d$bp.1, yname = "Systolic BP",
                        quantiles = 3)
meansfig3 <- tabmeans(x = d$BMI, y = d$bp.1, quantiles = 3, fig = TRUE,
                      yname = "Systolic BP", xname = "BMI Tertile")

# Create single table comparing mean BMI and mean age in control vs. treatment
# group
meanstable4 <- rbind(tabmeans(x = d$Group, y = d$BMI),
                     tabmeans(x = d$Group, y = d$Age))

# An easier way to make the above table is to call the tabmulti function
meanstable5 <- tabmulti(dataset = d, xvarname = "Group",
                        yvarnames = c("BMI", "Age"))

# meanstable4 and meanstable5 are equivalent
all(meanstable4 == meanstable5)