Generate Summary Tables of Mean Comparisons for Statistical Reports

Description

This function compares the mean of a continuous variable across levels of a categorical variable and summarizes the results in a clean table (or figure) for a statistical report.

Usage

tabmeans(x, y, latex = FALSE, variance = "unequal", xname = NULL, xlevels = NULL, 
         yname = NULL, quantiles = NULL, quantile.vals = FALSE, parenth = "sd", 
         text.label = NULL, parenth.sep = "-", decimals = NULL, p.include = TRUE, 
         p.decimals = c(2, 3), p.cuts = 0.01, p.lowerbound = 0.001, p.leading0 = TRUE,
         p.avoid1 = FALSE, overall.column = TRUE, n.column = FALSE, n.headings = TRUE,
         bold.colnames = TRUE, bold.varnames = FALSE, variable.colname = "Variable", 
         fig = FALSE, fig.errorbars = "z.ci", fig.title = NULL, print.html = FALSE,
         html.filename = "table1.html")

Arguments

x

Vector of values for the categorical x variable.

y

Vector of values for the continuous y variable.

latex

If TRUE, the object returned is formatted for printing in LaTeX using xtable [1]; if FALSE, it is formatted for copying and pasting from RStudio into a word processor.

variance

Controls whether an equal-variance or an unequal-variance t-test is used when x has two levels. Possible values are "equal" for the equal-variance t-test, "unequal" for the unequal-variance t-test, or "ftest" to let an F test for equality of variances determine which version of the t-test to use. Note that the unequal-variance t-test is less restrictive than the equal-variance t-test, and the F test is only valid when y is normally distributed in both x groups.
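
A minimal base-R sketch of how the three variance settings could map onto t.test(). The helper choose_ttest is hypothetical (not part of tab), and the 0.05 significance level for the F test is an assumption; the threshold tabmeans actually uses is not stated on this page.

```r
# Hypothetical helper, not a tab function: run a two-group t-test whose
# variance assumption is chosen by `variance` ("equal", "unequal", "ftest").
# ASSUMPTION: for "ftest", the equal-variance t-test is kept unless the
# F test rejects equal variances at level alpha = 0.05.
choose_ttest <- function(x, y, variance = "unequal", alpha = 0.05) {
  x <- factor(x)
  stopifnot(nlevels(x) == 2)
  var.equal <- switch(variance,
    equal   = TRUE,
    unequal = FALSE,
    # var.test() is the F test for a ratio of two variances
    ftest   = var.test(y ~ x)$p.value >= alpha,
    stop("variance must be 'equal', 'unequal', or 'ftest'")
  )
  t.test(y ~ x, var.equal = var.equal)
}

# Simulated data: the treatment group is more variable than the control group
set.seed(1)
grp <- rep(c("Control", "Treatment"), each = 30)
bmi <- c(rnorm(30, mean = 27, sd = 3), rnorm(30, mean = 25, sd = 6))
res <- choose_ttest(grp, bmi, variance = "ftest")
res$method
```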

xname

Label for the categorical variable. Only used if fig is TRUE.

xlevels

Optional character vector to label the levels of x, used in the column headings. If unspecified, the function uses the values that x takes on.

yname

Optional label for the continuous y variable. If unspecified, the variable name of y is used.

quantiles

If specified, the function compares means of the y variable across quantiles of the x variable. For example, if x contains continuous BMI values and y contains continuous HDL cholesterol levels, setting quantiles to 3 results in mean HDL being compared across tertiles of BMI.

quantile.vals

If TRUE, labels for x show the quantile number and the corresponding range of the x variable, e.g. Q1 [0.00, 0.25). If FALSE, labels show only the quantile number (e.g. Q1). Only used if xlevels is not specified.
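
The grouping behind quantiles and quantile.vals can be sketched with quantile() and cut(). The helper quantile_groups is hypothetical, and treating each interval as closed on the left and open on the right (with the top interval closed) is an assumption based on the example label above.

```r
# Hypothetical helper, not a tab function: split continuous x into
# `quantiles` groups and label them "Q1", "Q2", ... or, when
# quantile.vals = TRUE, "Q1 [lower, upper)".
# ASSUMPTION: intervals are [lower, upper), with the last one closed.
quantile_groups <- function(x, quantiles = 3, quantile.vals = FALSE, digits = 2) {
  cuts <- quantile(x, probs = seq(0, 1, length.out = quantiles + 1), na.rm = TRUE)
  labs <- paste0("Q", seq_len(quantiles))
  if (quantile.vals) {
    lo <- formatC(cuts[-length(cuts)], format = "f", digits = digits)
    hi <- formatC(cuts[-1], format = "f", digits = digits)
    right <- c(rep(")", quantiles - 1), "]")
    labs <- paste0(labs, " [", lo, ", ", hi, right)
  }
  # right = FALSE gives [a, b); include.lowest then closes the top interval
  cut(x, breaks = cuts, labels = labs, right = FALSE, include.lowest = TRUE)
}

bmi <- c(18.5, 21.0, 22.3, 24.1, 25.7, 27.2, 29.9, 31.4, 35.0)
tertile <- quantile_groups(bmi, quantiles = 3)
table(tertile)  # three observations per tertile
```

Note that cut() stops with an error if tied quantiles produce duplicate break points, so heavily discretized x values may need special handling.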

parenth

Controls what values (if any) are placed in parentheses after the means in each cell. Possible values are "none", "sd" for standard deviation, "se" for standard error, "t.ci" for a 95% confidence interval for the population mean based on the t distribution, and "z.ci" for a 95% confidence interval for the population mean based on the z distribution.
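
The quantities behind each parenth option can be computed in base R. In the sketch below, mean_cell is a hypothetical helper, and the exact cell layout ("M (SD)", "M (lower-upper)") is an assumption modeled on the descriptions of text.label and parenth.sep.

```r
# Hypothetical helper, not a tab function: format one table cell as the
# mean followed by the quantity requested by `parenth`.
# ASSUMPTION: the spacing of the cell text mirrors the "M (SD)" style.
mean_cell <- function(y, parenth = "sd", decimals = 1, parenth.sep = "-") {
  y <- y[!is.na(y)]
  n <- length(y)
  m <- mean(y)
  s <- sd(y)
  se <- s / sqrt(n)  # standard error of the mean
  f <- function(v) formatC(v, format = "f", digits = decimals)
  switch(parenth,
    none = f(m),
    sd   = paste0(f(m), " (", f(s), ")"),
    se   = paste0(f(m), " (", f(se), ")"),
    t.ci = {
      q <- qt(0.975, df = n - 1)  # t critical value for a 95% CI
      paste0(f(m), " (", f(m - q * se), parenth.sep, f(m + q * se), ")")
    },
    z.ci = {
      q <- qnorm(0.975)  # z critical value, about 1.96
      paste0(f(m), " (", f(m - q * se), parenth.sep, f(m + q * se), ")")
    },
    stop("unknown parenth value")
  )
}

y <- c(22, 25, 27, 30, 31)
mean_cell(y, "sd")                       # "27.0 (3.7)"
mean_cell(y, "t.ci", parenth.sep = ", ") # "27.0 (22.4, 31.6)"
mean_cell(y, "z.ci")                     # "27.0 (23.8-30.2)"
```

With only five observations the t-based interval is noticeably wider than the z-based one; the two converge as the group size grows.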

text.label

Optional text to put after the y variable name, identifying what the cell values and parentheses indicate in the table. If unspecified, the function uses default labels based on parenth, e.g. M (SD) if parenth is "sd". Set to "none" for no text labels.

parenth.sep

Optional character string specifying the separator between the lower and upper bounds of confidence intervals (when requested). Usually either "-" or ", " depending on user preference.

decimals

Number of decimal places for means and standard deviations/standard errors/confidence intervals. If unspecified, the function chooses the number of decimal places based on the largest mean in magnitude: 0 if it is in [1,000, Inf), 1 if in [10, 1,000), 2 if in [0.1, 10), 3 if in [0.01, 0.1), 4 if in [0.001, 0.01), 5 if in [0.0001, 0.001), and 6 if in [0, 0.0001).
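
The default rule can be written as a short lookup on the largest absolute mean. This is a sketch of the documented rule only; default_decimals is a hypothetical helper, not the package's internal code.

```r
# Hypothetical helper, not a tab function: number of decimal places implied
# by the default rule, based on the largest mean in magnitude.
default_decimals <- function(means) {
  m <- max(abs(means))
  if (m >= 1000) 0
  else if (m >= 10) 1
  else if (m >= 0.1) 2
  else if (m >= 0.01) 3
  else if (m >= 0.001) 4
  else if (m >= 0.0001) 5
  else 6
}

default_decimals(c(27.3, 25.1))    # 1
default_decimals(c(1520, 980))     # 0
default_decimals(c(0.004, 0.002))  # 4
```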

p.include

If FALSE, no hypothesis test (t-test or one-way ANOVA; see Details) is performed and no p-value is returned.

p.decimals

Number of decimal places for p-values. If a vector is provided rather than a single value, number of decimal places will depend on what range the p-value lies in. See p.cuts.

p.cuts

Cut-point(s) controlling the number of decimal places used for p-values. For example, by default p.cuts is 0.01 and p.decimals is c(2, 3). This means that p-values in the range [0.01, 1] are printed to two decimal places, while p-values in the range [0, 0.01) are printed to three decimal places.

p.lowerbound

Controls the cut-point below which p-values are no longer printed as their value, but rather as <lowerbound. For example, by default p.lowerbound is 0.001. Under this setting, p-values less than 0.001 are printed as <0.001.

p.leading0

If TRUE, p-values are printed with 0 before decimal place; if FALSE, the leading 0 is omitted.

p.avoid1

If TRUE, p-values that round to 1 are not printed as 1, but as >0.99 (or the analogous value, depending on p.decimals and p.cuts).
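
The five p-value arguments interact, so a sketch may help. format_p below is a hypothetical helper that follows the descriptions above, using the defaults from Usage (p.decimals = c(2, 3), p.cuts = 0.01, p.lowerbound = 0.001); it is not the package's own formatting code, and details such as how the lower bound itself is printed are assumptions.

```r
# Hypothetical helper, not a tab function: format one p-value following the
# documented p.decimals, p.cuts, p.lowerbound, p.leading0, p.avoid1 rules.
format_p <- function(p, p.decimals = c(2, 3), p.cuts = 0.01,
                     p.lowerbound = 0.001, p.leading0 = TRUE,
                     p.avoid1 = FALSE) {
  if (p < p.lowerbound) {
    # Very small p-values are reported as "<lowerbound"
    out <- paste0("<", format(p.lowerbound, scientific = FALSE))
  } else {
    # Each cut-point that p falls below moves to the next entry of p.decimals
    k <- min(sum(p < p.cuts) + 1, length(p.decimals))
    dec <- p.decimals[k]
    out <- formatC(p, format = "f", digits = dec)
    # Optionally avoid reporting a p-value that rounds to exactly 1
    if (p.avoid1 && out == formatC(1, format = "f", digits = dec)) {
      out <- paste0(">", formatC(1 - 10^(-dec), format = "f", digits = dec))
    }
  }
  # Optionally drop the 0 before the decimal point (keeping any < or >)
  if (!p.leading0) out <- sub("^([<>]?)0\\.", "\\1.", out)
  out
}

format_p(0.2345)                     # "0.23"
format_p(0.00456)                    # "0.005"
format_p(0.0004)                     # "<0.001"
format_p(0.998, p.avoid1 = TRUE)     # ">0.99"
format_p(0.0456, p.leading0 = FALSE) # ".05"
```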

overall.column

If FALSE, column showing mean of y in full sample is suppressed.

n.column

If TRUE, the table will have a column for (unweighted) sample size.

n.headings

If TRUE, the table will indicate the (unweighted) sample size overall and in each group in parentheses after the column headings.

bold.colnames

If TRUE, column headings are printed in bold font. Only applies if latex = TRUE.

bold.varnames

If TRUE, variable name in the first column of the table is printed in bold font. Only applies if latex = TRUE.

variable.colname

Character string with desired heading for first column of table, which shows the y variable name.

fig

If TRUE, a figure is returned rather than a table. The figure shows the mean of y for each level of x, with error bars controlled by fig.errorbars (by default, a 95% confidence interval based on the z distribution).

fig.errorbars

Controls error bars around mean when fig is TRUE. Possible values are "sd" for +/- 1 standard deviation, "se" for +/- 1 standard error, "t.ci" for 95% confidence interval based on t distribution, "z.ci" for 95% confidence interval based on z distribution, and "none" for no error bars.

fig.title

Title of figure. If unspecified, title is set to "Mean yname by xname".

print.html

If TRUE, the function writes a .html file containing the table to the current working directory.

html.filename

Character string giving the name of the .html file that is written if print.html is set to TRUE.

Details

If x has two levels, a t-test is used to test for a difference in means. If x has more than two levels, a one-way analysis of variance is used to test for a difference in means across the groups.

Both x and y can have missing values. The function drops observations with missing x or y.
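
A minimal sketch of the logic described in Details, assuming the two-group case uses the unequal-variance t-test (matching the default variance = "unequal") and the multi-group case uses a standard one-way ANOVA F test. compare_means_p is a hypothetical helper, not a tab function.

```r
# Hypothetical helper, not a tab function: drop observations with missing
# x or y, then test for a difference in means.
# ASSUMPTION: two groups -> unequal-variance t-test; more -> one-way ANOVA.
compare_means_p <- function(x, y) {
  keep <- !is.na(x) & !is.na(y)
  x <- factor(x[keep])  # factor() also drops levels left empty
  y <- y[keep]
  if (nlevels(x) == 2) {
    t.test(y ~ x, var.equal = FALSE)$p.value
  } else {
    # One-way ANOVA: F test from a linear model with x as the only factor
    anova(lm(y ~ x))[["Pr(>F)"]][1]
  }
}

set.seed(2)
race <- c(rep(c("White", "Black", "Other"), each = 20), NA)  # one missing x
bmi  <- c(rnorm(60, mean = 27, sd = 4), 30)
compare_means_p(race, bmi)
```

The ANOVA branch gives the same p-value as oneway.test(y ~ x, var.equal = TRUE).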

Value

A character matrix with the requested table comparing mean y across levels of x. If latex is set to TRUE, the character matrix will be formatted for inserting into a Markdown/Sweave/knitr report using the xtable package [1].

Note

If you wish to paste your tables into Word, you can use either of these approaches:

1. Use the write.cb function in the Kmisc package [2]. If your table is stored in a character matrix named table1, use write.cb(table1) to copy the table to your clipboard. Paste the result into Word, then highlight the text and go to Insert - Table - Convert Text to Table... OK.

2. Set the print.html input to TRUE. This results in a .html file being written to your current working directory. When you open this file in a web browser, you will see a formatted table that you can copy and paste into Word. You can control the name of this file with the html.filename input.
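
For readers curious how a character matrix becomes a pasteable HTML table, here is a bare-bones base-R sketch. write_html_table is a hypothetical helper that only illustrates the idea; it is not the code behind print.html, and the matrix contents below are placeholder values rather than real results.

```r
# Hypothetical helper, not a tab function: write a character matrix with
# column names to a minimal HTML table that a browser can display.
write_html_table <- function(tab, file = "table1.html") {
  # Escape characters that HTML would otherwise interpret (e.g. "<0.001")
  esc <- function(s) {
    s <- gsub("&", "&amp;", s, fixed = TRUE)
    s <- gsub("<", "&lt;", s, fixed = TRUE)
    gsub(">", "&gt;", s, fixed = TRUE)
  }
  row_html <- function(cells, tag) {
    paste0("<tr>", paste0("<", tag, ">", esc(cells), "</", tag, ">", collapse = ""), "</tr>")
  }
  body <- apply(tab, 1, row_html, tag = "td")  # one <tr> per matrix row
  lines <- c("<table border=\"1\">", row_html(colnames(tab), "th"), body, "</table>")
  writeLines(lines, file)
  invisible(file)
}

# Placeholder table with the same general shape as a tabmeans result
tab <- matrix(c("BMI, M (SD)", "27.1 (3.9)", "25.8 (4.2)", "<0.001"), nrow = 1,
              dimnames = list(NULL, c("Variable", "Control", "Treatment", "P")))
f <- write_html_table(tab, file.path(tempdir(), "table1.html"))
```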

If you wish to use LaTeX, R Markdown, knitr, Sweave, etc., please see the package vignette for examples. In most cases, you have to set the latex input to TRUE and then use the xtable package [1].

If you have suggestions for additional options or features, or if you would like some help using any function in the tab package, please e-mail me at vandomed@gmail.com. Thanks!

Author(s)

Dane R. Van Domelen

References

1. Dahl DB (2013). xtable: Export tables to LaTeX or HTML. R package version 1.7-1, https://cran.r-project.org/package=xtable.

2. Ushey K (2013). Kmisc: Kevin Miscellaneous. R package version 0.5.0, https://CRAN.R-project.org/package=Kmisc.

Acknowledgment: This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-0940903.

See Also

tabfreq, tabmedians, tabmulti, tabglm, tabcox, tabgee, tabfreq.svy, tabmeans.svy, tabmedians.svy, tabmulti.svy, tabglm.svy

Examples

# Load the tab package and its sample dataset d, then drop rows with missing values
library(tab)
data(d)
d <- d[complete.cases(d), ]

# Compare mean BMI in control group vs. treatment group - table and figure
meanstable1 <- tabmeans(x = d$Group, y = d$BMI)
meansfig1 <- tabmeans(x = d$Group, y = d$BMI, fig = TRUE)

# Compare mean BMI by race - table and figure
meanstable2 <- tabmeans(x = d$Race, y = d$BMI)
meansfig2 <- tabmeans(x = d$Race, y = d$BMI, fig = TRUE)

# Compare mean baseline systolic BP across tertiles of BMI - table and figure
meanstable3 <- tabmeans(x = d$BMI, y = d$bp.1, yname = "Systolic BP", quantiles = 3)
meansfig3 <- tabmeans(x = d$BMI, y = d$bp.1, quantiles = 3, fig = TRUE, 
                      yname = "Systolic BP", xname = "BMI Tertile")

# Create single table comparing mean BMI and mean age in control vs. treatment group
meanstable4 <- rbind(tabmeans(x = d$Group, y = d$BMI), tabmeans(x = d$Group, y = d$Age))
                     
# An easier way to make the above table is to call the tabmulti function
meanstable5 <- tabmulti(dataset = d, xvarname = "Group", yvarnames = c("BMI", "Age"))
                        
# meanstable4 and meanstable5 are equivalent
all(meanstable4 == meanstable5)

# To move meanstable1 into Word, run write.cb(meanstable1) (from the Kmisc package) to
# copy the table onto your clipboard. Paste into Word, highlight the table and go to
# Insert - Table - Convert Text to Table... OK. Alternatively, if you set print.html to
# TRUE, the function will write an HTML file named html.filename to your current
# working directory. You can open this file, copy the table, and paste it into Word.