# summary_factorlist: Summarise a set of factors (or continuous variables) by a... In finalfit: Quickly Create Elegant Regression Results Tables and Plots when Modelling

 summary_factorlist R Documentation

## Summarise a set of factors (or continuous variables) by a dependent variable

### Description

A function that takes a single dependent variable with a vector of explanatory variable names (continuous or categorical variables) to produce a summary table.

### Usage

```r
summary_factorlist(
  .data,
  dependent = NULL,
  explanatory = NULL,
  formula = NULL,
  cont = "mean",
  cont_nonpara = NULL,
  cont_cut = 5,
  cont_range = TRUE,
  p = FALSE,
  p_cont_para = "aov",
  p_cat = "chisq",
  column = TRUE,
  total_col = FALSE,
  orderbytotal = FALSE,
  digits = c(1, 1, 3, 1, 0),
  na_include = FALSE,
  na_include_dependent = FALSE,
  na_complete_cases = FALSE,
  na_to_p = FALSE,
  na_to_prop = TRUE,
  fit_id = FALSE,
  add_dependent_label = FALSE,
  dependent_label_prefix = "Dependent: ",
  dependent_label_suffix = "",
  add_col_totals = FALSE,
  include_col_totals_percent = TRUE,
  col_totals_rowname = NULL,
  col_totals_prefix = "",
  add_row_totals = FALSE,
  include_row_totals_percent = TRUE,
  include_row_missing_col = TRUE,
  row_totals_colname = "Total N",
  row_missing_colname = "Missing N",
  catTest = NULL,
  weights = NULL
)
```

### Arguments

- `.data`: Dataframe.
- `dependent`: Character vector of length 1: name of dependent variable (2 to 5 factor levels).
- `explanatory`: Character vector of any length: name(s) of explanatory variables.
- `formula`: An object of class "formula" (or one that can be coerced to that class). Optional alternative to the standard dependent/explanatory format. Do not include if using dependent/explanatory.
- `cont`: Summary for continuous explanatory variables: "mean" (standard deviation) or "median" (interquartile range). If "median", a non-parametric hypothesis test is performed (see below).
- `cont_nonpara`: Numeric vector of form e.g. `c(1,2)`. Specify which variables to perform non-parametric hypothesis tests on and summarise with "median".
- `cont_cut`: Numeric: number of unique values in continuous variable at which to consider it a factor.
- `cont_range`: Logical. Median is shown with 1st and 3rd quartiles.
- `p`: Logical: include null hypothesis statistical test.
- `p_cont_para`: Character. Continuous variable parametric test. One of either "aov" (analysis of variance) or "t.test" for Welch two sample t-test. Note that the continuous non-parametric test is always Kruskal-Wallis (`kruskal.test`), which in the two-group setting is equivalent to the Mann-Whitney U / Wilcoxon rank sum test. For a continuous dependent and continuous explanatory variable, the parametric p-value returned is for the Pearson correlation coefficient. The non-parametric equivalent is the p-value for the Spearman correlation coefficient.
- `p_cat`: Character. Categorical variable test. One of either "chisq" or "fisher".
- `column`: Logical: compute margins by column rather than row.
- `total_col`: Logical: include a total column summing across factor levels.
- `orderbytotal`: Logical: order final table by total column, high to low.
- `digits`: Number of digits to round to: (1) mean/median, (2) standard deviation / interquartile range, (3) p-value, (4) count percentage, (5) weighted count.
- `na_include`: Logical: make explanatory variables missing data explicit (`NA`).
- `na_include_dependent`: Logical: make dependent variable missing data explicit.
- `na_complete_cases`: Logical: include only rows with complete data.
- `na_to_p`: Logical: include missing as group in statistical test.
- `na_to_prop`: Logical: include missing in calculation of column proportions.
- `fit_id`: Logical: allows merging via `finalfit_merge`.
- `add_dependent_label`: Add the dependent variable label to the top left of the table.
- `dependent_label_prefix`: Add text before dependent label.
- `dependent_label_suffix`: Add text after dependent label.
- `add_col_totals`: Logical. Include column total n.
- `include_col_totals_percent`: Include column percentage of total.
- `col_totals_rowname`: Character. Row name for column totals.
- `col_totals_prefix`: Character. Prefix to column totals, e.g. "N=".
- `add_row_totals`: Logical. Include row totals. Note this differs from `total_col` above, particularly for continuous explanatory variables.
- `include_row_totals_percent`: Include row percentage of total.
- `include_row_missing_col`: Logical. Include missing data total for each row. Only used when `add_row_totals` is `TRUE`.
- `row_totals_colname`: Character. Column name for row totals.
- `row_missing_colname`: Character. Column name for missing data totals for each row.
- `catTest`: Deprecated. See `p_cat` above.
- `weights`: Character vector of length 1: name of column to use for weights. Explanatory continuous variables are multiplied by weights. Explanatory categorical variables are counted with a frequency weight (sum(weights)).
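As an illustration of how several of these arguments combine, the call below is a minimal sketch using the `colon_s` dataset that ships with the package. The particular combination of options is chosen for illustration only and is not a recommended default.

```r
library(finalfit)

# Example dataset shipped with finalfit (modified version of survival::colon)
data(colon_s)

explanatory <- c("age", "sex.factor", "obstruct.factor")
dependent <- "perfor.factor"

tab <- summary_factorlist(
  colon_s, dependent, explanatory,
  cont = "median",        # summarise continuous variables as median (IQR)
  p = TRUE,               # add a hypothesis test column
  p_cat = "fisher",       # Fisher's exact test for categorical variables
  na_include = TRUE,      # show missing explanatory data explicitly
  add_col_totals = TRUE   # add a row of column totals
)
tab
```

Because `cont = "median"` is set, the p-value for `age` comes from the non-parametric test described under `p_cont_para` rather than from `aov`.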

### Details

This function aims to produce publication-ready summary tables for categorical or continuous dependent variables. It usually takes a categorical dependent variable to produce a cross table of counts and proportions expressed as percentages or summarised continuous explanatory variables. However, it will take a continuous dependent variable to produce mean (standard deviation) or median (interquartile range) for use with linear regression models.
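To make the idea of a cross table of counts and column percentages concrete, here is a base R sketch of the kind of calculation involved. It is not finalfit's implementation; it uses the built-in `infert` dataset, and the variable roles (dependent `case`, explanatory `induced` and `age`) are assumptions made for illustration.

```r
# Base R only: a conceptual sketch, not the package's internal code
dat <- infert
dat$case <- factor(dat$case, levels = c(0, 1), labels = c("Control", "Case"))
dat$induced <- factor(dat$induced)

# Categorical explanatory variable: rows = explanatory levels, columns = dependent levels
counts <- table(dat$induced, dat$case)
col_pct <- prop.table(counts, margin = 2) * 100   # percentages within each column

# Format cells as "n (%)", the layout used in a typical Table 1
cells <- matrix(
  sprintf("%d (%.1f)", as.vector(counts), as.vector(col_pct)),
  nrow = nrow(counts),
  dimnames = dimnames(counts)
)
print(cells)

# Chi-squared test of association, comparable in spirit to p_cat = "chisq"
p_cat_value <- chisq.test(counts)$p.value

# Continuous explanatory variable: mean and standard deviation by group
age_mean <- tapply(dat$age, dat$case, mean)
age_sd <- tapply(dat$age, dat$case, sd)
print(round(rbind(mean = age_mean, sd = age_sd), 1))
```

Switching `margin = 2` to `margin = 1` gives row percentages instead, which is the distinction the `column` argument controls.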

### Value

Returns a `factorlist` dataframe.
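Since the result is a dataframe, it can be inspected and exported with ordinary base R tools. The following is a small sketch; the file name `table1.csv` and the use of a temporary directory are illustrative choices.

```r
library(finalfit)
data(colon_s)

tab <- summary_factorlist(colon_s, "mort_5yr", c("age.factor", "sex.factor"))

# Inspect the structure of the returned table
class(tab)
names(tab)

# Export for use elsewhere (hypothetical file name, written to a temp directory)
out_file <- file.path(tempdir(), "table1.csv")
write.csv(tab, out_file, row.names = FALSE)
```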

### See Also

`fit2df`, `ff_column_totals`, `ff_row_totals`, `ff_label`, `ff_glimpse`, `ff_percent_only`. For lots of examples, see https://finalfit.org/

### Examples

```r
library(finalfit)
library(dplyr)
# Load example dataset, modified version of survival::colon
data(colon_s)

# Table 1 - Patient demographics ----
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"
colon_s %>%
  summary_factorlist(dependent, explanatory, p = TRUE)

# summary_factorlist() is also commonly used to summarise any number of
# variables by an outcome variable (say dead yes/no).

# Table 2 - 5 yr mortality ----
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
  summary_factorlist(dependent, explanatory)
```
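The Details section notes that a continuous dependent variable is also accepted. A minimal sketch of that usage follows, assuming the `nodes` column of `colon_s` (a count of lymph nodes) is treated as continuous, which it should be while its number of unique values exceeds `cont_cut`.

```r
library(finalfit)
data(colon_s)

# Continuous dependent variable summarised across categorical explanatory variables
explanatory <- c("age.factor", "sex.factor", "obstruct.factor")
dependent <- "nodes"

tab_cont <- summary_factorlist(colon_s, dependent, explanatory, p = TRUE)
tab_cont
```

Setting `cont = "median"` in the same call would summarise the dependent variable as median (interquartile range) instead of mean (standard deviation).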

finalfit documentation built on Jan. 14, 2023, 5:07 p.m.