# summary_factorlist: Summarise a set of factors (or continuous variables) by a...

In finalfit: Quickly Create Elegant Regression Results Tables and Plots when Modelling

## Description

A function that takes a single dependent variable with a vector of explanatory variable names (continuous or categorical variables) to produce a summary table.

## Usage

```
summary_factorlist(
  .data,
  dependent = NULL,
  explanatory,
  cont = "mean",
  cont_nonpara = NULL,
  cont_cut = 5,
  cont_range = TRUE,
  p = FALSE,
  p_cont_para = "aov",
  p_cat = "chisq",
  column = TRUE,
  total_col = FALSE,
  orderbytotal = FALSE,
  digits = c(1, 1, 3, 1),
  na_include = FALSE,
  na_include_dependent = FALSE,
  na_complete_cases = FALSE,
  na_to_p = FALSE,
  na_to_prop = TRUE,
  fit_id = FALSE,
  add_dependent_label = FALSE,
  dependent_label_prefix = "Dependent: ",
  dependent_label_suffix = "",
  add_col_totals = FALSE,
  include_col_totals_percent = TRUE,
  col_totals_rowname = NULL,
  col_totals_prefix = "",
  add_row_totals = FALSE,
  include_row_totals_percent = TRUE,
  include_row_missing_col = TRUE,
  row_totals_colname = "Total N",
  row_missing_colname = "Missing N",
  catTest = NULL
)
```

## Arguments

| Argument | Description |
|---|---|
| `.data` | Dataframe. |
| `dependent` | Character vector of length 1: name of dependent variable (2 to 5 factor levels). |
| `explanatory` | Character vector of any length: name(s) of explanatory variables. |
| `cont` | Summary for continuous explanatory variables: "mean" (standard deviation) or "median" (interquartile range). If "median", a non-parametric hypothesis test is performed (see `p_cont_para`). |
| `cont_nonpara` | Numeric vector of the form e.g. `c(1,2)`. Specifies which variables to summarise with "median" and test with a non-parametric hypothesis test. |
| `cont_cut` | Numeric: number of unique values in a continuous variable at which to consider it a factor. |
| `cont_range` | Logical. Median is shown with 1st and 3rd quartiles. |
| `p` | Logical: include null hypothesis statistical test. |
| `p_cont_para` | Character. Continuous variable parametric test. One of "aov" (analysis of variance) or "t.test" (Welch two-sample t-test). Note that the continuous non-parametric test is always Kruskal-Wallis (`kruskal.test`), which in the two-group setting is equivalent to the Mann-Whitney U / Wilcoxon rank sum test. For a continuous dependent and a continuous explanatory variable, the parametric p-value returned is for the Pearson correlation coefficient; the non-parametric equivalent is the p-value for the Spearman correlation coefficient. |
| `p_cat` | Character. Categorical variable test. One of "chisq" or "fisher". |
| `column` | Logical: compute margins by column rather than row. |
| `total_col` | Logical: include a total column summing across factor levels. |
| `orderbytotal` | Logical: order final table by total column, high to low. |
| `digits` | Number of digits to round to: (1) mean/median, (2) standard deviation / interquartile range, (3) p-value, (4) count percentage. |
| `na_include` | Logical: make missing data in explanatory variables explicit (`NA`). |
| `na_include_dependent` | Logical: make missing data in the dependent variable explicit. |
| `na_complete_cases` | Logical: include only rows with complete data. |
| `na_to_p` | Logical: include missing as a group in the statistical test. |
| `na_to_prop` | Logical: include missing in the calculation of column proportions. |
| `fit_id` | Logical: allows merging via `finalfit_merge`. |
| `add_dependent_label` | Add the name of the dependent label to the top left of the table. |
| `dependent_label_prefix` | Add text before the dependent label. |
| `dependent_label_suffix` | Add text after the dependent label. |
| `add_col_totals` | Logical. Include column total n. |
| `include_col_totals_percent` | Include column percentage of total. |
| `col_totals_rowname` | Character. Row name for column totals. |
| `col_totals_prefix` | Character. Prefix to column totals, e.g. "N=". |
| `add_row_totals` | Logical. Include row totals. Note this differs from `total_col` above, particularly for continuous explanatory variables. |
| `include_row_totals_percent` | Include row percentage of total. |
| `include_row_missing_col` | Logical. Include missing data total for each row. Only used when `add_row_totals` is `TRUE`. |
| `row_totals_colname` | Character. Column name for row totals. |
| `row_missing_colname` | Character. Column name for missing data totals for each row. |
| `catTest` | Deprecated. See `p_cat` above. |
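
The hypothesis tests named under `p_cont_para` can be tried directly in base R. The following is a minimal sketch, assuming the built-in `mtcars` data as a stand-in (it is not part of finalfit) and calling the `stats` functions directly; it does not claim to reproduce how finalfit invokes them internally. It checks the stated two-group equivalence between Kruskal-Wallis and the Wilcoxon rank sum test, which holds when the latter uses the normal approximation without continuity correction.

```r
# Illustrative only: base R (stats) calls on the built-in mtcars data,
# not finalfit's internal implementation.
grp <- factor(mtcars$am)  # two-level grouping variable (stand-in for a dependent)

# Non-parametric, two groups: Kruskal-Wallis vs Wilcoxon rank sum (Mann-Whitney U).
kw_p <- kruskal.test(mtcars$mpg, grp)$p.value
wilcox_p <- wilcox.test(mtcars$mpg ~ grp, exact = FALSE, correct = FALSE)$p.value
same_p <- isTRUE(all.equal(kw_p, wilcox_p))
print(same_p)

# Parametric options named by p_cont_para: analysis of variance or Welch t-test.
aov_p <- summary(aov(mtcars$mpg ~ grp))[[1]][["Pr(>F)"]][1]
welch_p <- t.test(mtcars$mpg ~ grp)$p.value

# Continuous dependent with continuous explanatory: correlation tests.
pearson_p <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")$p.value
spearman_p <- cor.test(mtcars$mpg, mtcars$wt, method = "spearman",
                       exact = FALSE)$p.value
```

With only two groups, the analysis of variance and Welch p-values will generally differ slightly, because the Welch test does not assume equal variances.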

## Details

This function aims to produce publication-ready summary tables for categorical or continuous dependent variables. It usually takes a categorical dependent variable to produce a cross table of counts and proportions expressed as percentages or summarised continuous explanatory variables. However, it will take a continuous dependent variable to produce mean (standard deviation) or median (interquartile range) for use with linear regression models.
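
To make the cell contents concrete, here is a rough base R sketch of a cross table of counts with percentages, and of a mean (SD) summary by group. It assumes the built-in `mtcars` data as a stand-in and is not finalfit's implementation; the `margin` choice loosely mirrors the `column` argument described above.

```r
# Illustrative only: base R on the built-in mtcars data, not finalfit internals.
counts <- table(cyl = mtcars$cyl, am = mtcars$am)

# Column percentages: each column sums to 100 (compare column = TRUE).
col_pct <- prop.table(counts, margin = 2) * 100
# Row percentages: each row sums to 100 (compare column = FALSE).
row_pct <- prop.table(counts, margin = 1) * 100

# Format each cell as "n (%)".
cells <- matrix(
  sprintf("%d (%.1f)", as.vector(counts), as.vector(col_pct)),
  nrow = nrow(counts),
  dimnames = dimnames(counts)
)
print(cells, quote = FALSE)

# Continuous explanatory variable: mean (SD) within each group.
mean_sd <- sprintf("%.1f (%.1f)",
                   tapply(mtcars$mpg, mtcars$am, mean),
                   tapply(mtcars$mpg, mtcars$am, sd))
print(mean_sd)
```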

## Value

Returns a `factorlist` dataframe.

## See Also

`fit2df`, `ff_column_totals`, `ff_row_totals`, `ff_label`, `ff_glimpse`, `ff_percent_only`

## Examples

```
library(finalfit)
library(dplyr)

# Load example dataset, modified version of survival::colon
data(colon_s)

# Table 1 - Patient demographics ----
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"
colon_s %>%
  summary_factorlist(dependent, explanatory, p=TRUE)

# summary_factorlist() is also commonly used to summarise any number of
# variables by an outcome variable (say dead yes/no).

# Table 2 - 5 yr mortality ----
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
  summary_factorlist(dependent, explanatory)
```

### Example output

```
Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

       label      levels          No         Yes     p
 Age (years)   Mean (SD) 59.8 (11.9) 58.4 (13.3) 0.542
         Age   <40 years    68 (7.5)     2 (7.4) 1.000
             40-59 years  334 (37.0)   10 (37.0)
               60+ years  500 (55.4)   15 (55.6)
         Sex      Female  432 (47.9)   13 (48.1) 1.000
                    Male  470 (52.1)   14 (51.9)
 Obstruction          No  715 (81.2)   17 (63.0) 0.035
                     Yes  166 (18.8)   10 (37.0)
Warning message:
In chisq.test(age.factor, perfor.factor) :
  Chi-squared approximation may be incorrect

Note: dependent includes missing data. These are dropped.
       label      levels      Alive       Died
         Age   <40 years   31 (6.1)   36 (8.9)
             40-59 years 208 (40.7) 131 (32.4)
               60+ years 272 (53.2) 237 (58.7)
         Sex      Female 243 (47.6) 194 (48.0)
                    Male 268 (52.4) 210 (52.0)
 Obstruction          No 408 (82.1) 312 (78.6)
                     Yes  89 (17.9)  85 (21.4)
 Perforation          No 497 (97.3) 391 (96.8)
                     Yes   14 (2.7)   13 (3.2)
```
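
The `chisq.test` warning in the first table arises because some expected cell counts are small. The sketch below re-enters the Age by Perforation counts printed above as a plain matrix (an assumption for illustration; it does not use `colon_s` itself) and compares the chi-squared test with Fisher's exact test in base R. Within `summary_factorlist()`, the documented way to request the latter is `p_cat = "fisher"`.

```r
# Illustrative only: counts copied from the Table 1 output above,
# analysed with base R (stats) rather than through finalfit.
age_by_perfor <- matrix(
  c(68, 334, 500,   # Perforation: No
    2, 10, 15),     # Perforation: Yes
  ncol = 2,
  dimnames = list(
    Age = c("<40 years", "40-59 years", "60+ years"),
    Perforation = c("No", "Yes")
  )
)

# chisq.test() warns when expected counts are small; inspect them directly.
chisq_res <- suppressWarnings(chisq.test(age_by_perfor))
min_expected <- min(chisq_res$expected)
print(min_expected)  # below 5, which is what triggers the warning

# Fisher's exact test does not rely on the large-sample approximation.
fisher_p <- fisher.test(age_by_perfor)$p.value
print(round(c(chisq = chisq_res$p.value, fisher = fisher_p), 3))
```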

finalfit documentation built on June 11, 2021, 5:17 p.m.