knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
In health science research, producing tables of baseline characteristics (commonly called Table 1) of patient cohorts is a routine procedure. Much of the analysis-specific data cleaning work is required in the steps leading up to table formation, but without a streamlined process for manipulating and organizing values of resulting calculations, considerable time can be spent pasting results together. The make_table_one
function attempts to make this calculation, manipulation, and organization of summary values as streamlined as possible.
Pre-function data cleaning:
Using make_table_one
:
make_table_one
functionmake_table_one
make_table_one(df, grouping_var, num_vars = NULL, num_display = "PM", binary_cat_vars = NULL, multiple_cat_vars = NULL, cat_display = "CP", subgroups_m = NULL, mean_vars_for_subgroups = NULL, subgroups_c = NULL, count_vars_for_subgroups = NULL, order_of_vars = NULL, export_rtf = FALSE, rtf_filename = NULL, show_pval = TRUE, digits = 2)
df
: Dataset containing covariatesgrouping_var
: Variable to group by (will be columns of table). Must be included (ie. one column summary tables currently not allowed).num_vars
: Vector of numeric variablesnum_display
: How should numeric variable results be displayed? ('PM'
for mean +- sd, 'PRS'
for mean (sd), defaults to 'PM'
)binary_cat_vars
: Vector of binary variables (in 0-1 format)multiple_cat_vars
: Vector of multiple level categorical variablescat_display
: How should categorical results be displayed? ('CP'
= counts and proportions, 'C'
= counts,
'P'
= proportions, defaults to 'CP'
)subgroups_m
: Vector of subgroup variables (must be factors) of interest for calculating means of a numeric variable. NOTE: If you plan to analyze a categorical variable by itself and then also use it for subgroup analysis, it is advised to copy and rename the columns of the categorical variables you wish to use as subgroups (ie. rename variable to something like 'variable_sub_m") so Table One can be properly sorted to reflect your specification in 'order_of_vars'. If the subgroup does not have a unique name, it will be sorted directly below the categorical variable.mean_vars_for_subgroups
: Vector of numeric variables from which to calculate mean and sd (must match vector position of corresponding subgroups_m
variable)subgroups_c
: Vector of subgroup variables (must be factors) of interest for calculating counts of a BINARY variable. NOTE: Follow same naming convention of subgroups as that listed in subgroups_m
.count_vars_for_subgroups
: Vector of BINARY variables from which to calculate mean and sd (must match vector position of corresponding subgroups_c
variable.) Results are displayed for the case where the binary count variable is equal to 1.order_of_vars
: A vector of all variables listed in the order of the rows you desire for your Table One.export_rtf
: Logical. Do you want to export an RTF to location specified in 'rtf_filename'
? (Defaults to FALSE
)rtf_filename
: Must be quoted and include '.rtf' extension show_pval
: Logical. Should the p-value results be displayed? Defaults to TRUE
.digits
: Number of digits to round decimals. Defaults to 2
.Once you have imported your data, clean it appropriately:
baseline <- read_sas("my_sas_file.sas7bdat") my_cleaned_baseline <- baseline %>% mutate(var1 = if_else(ex_var1, 0, 1), var2 = if_else(ex_var1, 0, 1), var3 = if_else(ex_var3 == 1, 1, 0)) %>% select(grouping_var, some_vector_of_other_variables)
Now make vectors of the names of your different types of variables. NOTE: If you plan to analyze a categorical variable by itself and then also use it for subgroup analysis, it is advised to copy and rename the columns of the categorical variables you wish to use as subgroups (ie. rename variable to something like 'variable_sub") so make_table_one
can be properly sorted to reflect your specification in 'order_of_vars'. If the subgroup does not have a unique name, it will be sorted directly below the categorical variable.
num_vars <- c('num_var1', 'numvar2', ...) binary_cat_vars <- c('bin_var1', 'bin_var2', ...) multiple_cat_vars <- c('mc_var1', 'mc_var2', ...) subgroups_m <- c('subgroup_m_1', subgroup_m_2', ...) mean_vars <- c('calulate_mean_of_this_for_subgroup_m_1', 'calulate_mean_of_this_for_subgroup_m_2', ...) subgroups_c <- c('subgroup_c_1', subgroup_c_2', ...) count_vars <- c('calulate_count_of_this_for_subgroup_c_1', 'calulate_count_of_this_for_subgroup_c_2', ...))
Convert your categorical variables into factors (VERY IMPORTANT):
my_cleaned_baseline[binary_cat_vars] <- lapply(my_cleaned_baseline[binary_cat_vars], factor) my_cleaned_baseline[multiple_cat_vars] <- lapply(my_cleaned_baseline[multiple_cat_vars], factor) my_cleaned_baseline[subgroups_m] <- lapply(my_cleaned_baseline[subgroups_m], factor) my_cleaned_baseline[subgroups_c] <- lapply(my_cleaned_baseline[subgroups_c], factor)
Create a vector containing the order of variables as you wish them to be displayed in your final table:
var_order <- c('var_for_first_row', 'var_for_second_row', ...)
make_table_one
Once you have your data cleaned and vectors of variable names as above, substitute into make_table_one
and save as an object for viewing within R.
table_1 <- make_table_one(df = my_cleaned_baseline, grouping_var = random_assignment_group, num_vars = num_vars, num_display = 'PM', binary_cat_vars = binary_cat_vars, multiple_cat_vars = multiple_cat_vars, cat_display = 'CP', subgroups_m = subgroups_m, mean_vars_for_subgroups = mean_vars, subgroups_c = subgroups_c, count_vars_for_subgroups = count_vars, order_of_vars = var_order, export_rtf = TRUE, rtf_filename = 'my_location/my_filename.rtf', show_pval = TRUE, digits = 2)
Notice that export_rtf = TRUE
so an RTF will be generated at the location specified in rtf_filename = 'my_location/my_filename.rtf'
.
In the future, I hope to add the following:
make_table_one
!Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.