tabler_stat: Description statistics 'tabler'

View source: R/utils2.R

tabler_stat    R Documentation

Description statistics tabler

Description

Wrapper function to create a table of descriptive statistics with optional tests of association.

Usage

tabler_stat(
  data,
  varname,
  byvar = NULL,
  digits = 0L,
  FUN = NULL,
  format_pval = TRUE,
  color_pval = TRUE,
  color_missing = TRUE,
  dagger = TRUE,
  color_cell_by = c("none", "value", "pct"),
  cell_color = palette()[1:2],
  confint = FALSE,
  survmedian = FALSE,
  survtime = FALSE,
  time = 0,
  include_na_in_prop = TRUE,
  iqr = FALSE,
  total = TRUE,
  continuous_fn = function(...) Gmisc::describeMedian(..., iqr = iqr),
  factor_fn = if (!include_na_in_prop) describeFactors else Gmisc::describeFactors,
  ...
)

Arguments

data

a matrix or data frame with variables varname and byvar

varname, byvar

the row and column variable, respectively

digits

number of digits past the decimal point to keep

FUN

a function performing the test of association between varname and byvar; FALSE will suppress the test but keep a column for p-values; NA will suppress the test and drop the column for p-values; or a character string; see details

format_pval

logical; if TRUE, p-values will be formatted using pvalr; alternatively, a function may be used which will be applied to each p-value

color_pval

logical; if TRUE, p-values will be colored by significance; see color_pval; alternatively, a vector of colors passed to color_pval

color_missing

logical; if TRUE, rows summarizing missing values will be shown in light grey; alternatively, a color string can be used for a custom color

dagger

logical or a character string giving the character to associate with FUN; if FALSE, none are used; if TRUE, the defaults are used ("*" is used if FUN is given)

color_cell_by

apply a color gradient to each cell (for html output); one of "none" for no coloring, "value" to color by numeric summary (e.g., for continuous variables), or "pct" to color by proportions (e.g., for factors)

cell_color

a vector of colors used for color_cell_by

confint

logical or varname; if TRUE (or varname) rows will be formatted as confidence intervals; see binconr

include_na_in_prop

logical; if TRUE (default), the number of missing values is included when calculating proportions for factor levels; if FALSE, only non-missing levels count towards proportions

iqr

logical; if TRUE, the interquartile range is used instead of the full range (default) for continuous variables

total

logical; if TRUE, total column will be shown

continuous_fn, factor_fn

functions to describe continuous and factor-like variables (default is to show median and range for continuous); see getDescriptionStatsBy

...

additional arguments passed to getDescriptionStatsBy
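
The effect of include_na_in_prop on factor proportions can be illustrated with base R alone. This is a minimal sketch of the arithmetic only, not the package's implementation; the vector x and the object names are made up for illustration.

```r
## made-up factor with one missing value (illustration only)
x <- factor(c("a", "a", "b", NA))

## analogous to include_na_in_prop = TRUE:
## missing values count towards the denominator
with_na <- table(x, useNA = "always") / length(x)

## analogous to include_na_in_prop = FALSE:
## only non-missing values count towards the denominator
without_na <- prop.table(table(x))

round(with_na, 2)
round(without_na, 2)
```

With missing values included, level "a" is 2 of 4 observations; with missing values excluded, it is 2 of 3.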

Details

If FUN is FALSE, no test will be performed but a column for p-values will still be added; FUN = NA will prevent both the test from being performed and the column from being added to the output.

For FUN = NULL (default), a test is guessed based on the row and column data. The current options are fisher.test, wilcox.test, and kruskal.test. If the row data is continuous, one of the latter two tests will be used based on the number of unique values in the column data.
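
The selection rule described above might be sketched roughly as follows. This is not the package's actual code: guess_test is a hypothetical helper, and the min_unique cutoff used to decide whether the row data is "continuous" is an assumption made only for this illustration.

```r
## hypothetical helper, NOT part of rawr; min_unique is an assumed cutoff
guess_test <- function(x, by, min_unique = 10L) {
  n_by <- length(unique(stats::na.omit(by)))
  n_x  <- length(unique(stats::na.omit(x)))

  ## treat the row data as continuous only if numeric with enough distinct values
  continuous <- is.numeric(x) && n_x >= min_unique

  if (!continuous) {
    "fisher"
  } else if (n_by == 2L) {
    "wilcoxon"   # two groups in the column data
  } else {
    "kruskal"    # more than two groups in the column data
  }
}

guess_test(mtcars$mpg, mtcars$cyl)         # three groups
guess_test(mtcars$mpg, mtcars$am)          # two groups
guess_test(factor(mtcars$vs), mtcars$am)   # categorical row data
```

A cutoff of this kind is also one way a numeric variable with few distinct values could end up with a Fisher test, which is when naming the test explicitly helps.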

For special cases, the function is not always guessed correctly (e.g., if the row data contains few unique values, a Fisher test may be used where not appropriate). One of the default tests can be given explicitly with a character string, one of "fisher", "wilcoxon", "ttest", "kruskal", "chisq", "anova", "cuzick", "jt", "ca", or "kw" (can be abbreviated).
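
Abbreviated test names behave like ordinary partial matching in R. The sketch below uses base::match.arg to show which abbreviations are unambiguous; whether tabler_stat resolves names in exactly this way is an assumption, and resolve_test is a hypothetical helper.

```r
## test names as listed above
tests <- c("fisher", "wilcoxon", "ttest", "kruskal", "chisq",
           "anova", "cuzick", "jt", "ca", "kw")

## hypothetical helper: resolve a possibly abbreviated name by partial matching
resolve_test <- function(FUN) match.arg(FUN, tests)

resolve_test("wil")   # unique prefix of "wilcoxon"
resolve_test("ca")    # exact match

## "k" is a prefix of both "kruskal" and "kw", so partial matching fails
ambiguous <- try(resolve_test("k"), silent = TRUE)
inherits(ambiguous, "try-error")
```

Spelling out enough characters to be unambiguous (e.g., "kr" or "kw") avoids the error.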

If FUN is a function, it must take two vector arguments, the row variable vector, data[[varname]], and the column variable vector, data[[byvar]], in this order, and return a single numeric p-value only.
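
A function meeting this contract can be written and checked on its own before passing it to tabler_stat. The sketch below uses stats::kruskal.test from base R; kw_pvalue is a hypothetical name chosen for illustration.

```r
## hypothetical custom test: takes the row vector, then the column vector,
## and returns a single numeric p-value
kw_pvalue <- function(x, y) {
  stats::kruskal.test(x, as.factor(y))$p.value
}

## check the contract on the same vectors tabler_stat would pass
p <- kw_pvalue(mtcars$mpg, mtcars$cyl)
is.numeric(p) && length(p) == 1L

## with rawr installed, the function could then be supplied as:
## rawr::tabler_stat(mtcars, 'mpg', 'cyl', FUN = kw_pvalue)
```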

Value

A matrix with additional attributes:

attr(,"FUN")

the test passed to FUN or the test selected based on varname and byvar if FUN = NULL

attr(,"p.value")

the numeric p-value returned by FUN

attr(,"fnames")

a vector of the default FUN options with names to match the appropriate dagger character; see examples; if FUN is given, the function name will be added with a new dagger symbol ("*" by default or dagger if given)

attr(,"tfoot")

a footnote for the table using each dagger and corresponding test name
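
When several tables are stacked, their "tfoot" attributes can be merged into one footnote without repeats. The sketch below uses made-up footnote strings in place of real attr(x, "tfoot") values and assumes, as in the examples for this function, that individual footnotes are separated by ", ".

```r
## made-up footnote strings standing in for attr(tbl[[i]], "tfoot")
tfoots <- c("a: test one, b: test two", "a: test one", "c: test three")

## split each string into single footnotes, drop repeats, and rejoin
combined <- toString(unique(unlist(strsplit(tfoots, ", ", fixed = TRUE))))
combined
```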

See Also

fisher.test; wilcox.test; t.test; kruskal.test; chisq.test; anova; cuzick.test; jt.test; kw.test; ca.test

Other tabler: tabler_by(), tabler_resp(), tabler_stat2(), tabler()

Examples

library('rawr')

tabler_stat(mtcars, 'mpg', 'cyl')                 # picks kruskal-wallis
tabler_stat(mtcars, 'mpg', 'cyl', FUN = NA)       # no test, no p-value column
tabler_stat(mtcars, 'mpg', 'cyl', FUN = FALSE)    # no test, p-value column
tabler_stat(mtcars, 'mpg', 'cyl', FUN = 'fisher') # force fisher test
tabler_stat(mtcars, 'mpg', 'cyl', FUN = 'anova')  # force anova test

## use of a custom function - see ?rawr::cuzick.test
tabler_stat(mtcars, 'mpg', 'cyl',
  continuous_fn = Gmisc::describeMean,
  FUN = function(x, y)
    cuzick.test(x ~ y, data.frame(x, y))$p.value)

## "cuzick" is also an option for FUN
tabler_stat(mtcars, 'mpg', 'cyl', FUN = 'cuzick')


## typical usage
mt <- within(mtcars, {
  mpg[1:5] <- carb[1:5] <- drat[1:20] <- NA
  carb <- factor(carb, ordered = TRUE)
  cyl  <- factor(cyl)
})

tbl <- lapply(names(mt)[-10L], function(x)
  tabler_stat(mt, x, 'gear', percentage_sign = FALSE,
              color_cell_by = ifelse(is.factor(mt[, x]), 'pct', 'none'),
              continuous_fn = Gmisc::describeMean))

ht <- htmlTable::htmlTable(
  do.call('rbind', tbl),
  cgroup = c('', 'Gear', ''), n.cgroup = c(1, 3, 1),
  rgroup = names(mt)[-10L], n.rgroup = sapply(tbl, nrow),
  tfoot = toString(unique(unlist(strsplit(sapply(seq_along(tbl), function(ii)
    attr(tbl[[ii]], 'tfoot')), ', '))))
)
structure(ht, class = 'htmlTable')


## survival object (median, 95% CI, log-rank test)
library('survival')
mt <- within(mt, {
  surv <- Surv(wt, vs)
})
tabler_stat(mt, 'surv', 'gear')


## use the tabler_stat2 wrapper for convenience
tabler_stat2(mt, names(mt)[-10L], 'gear')

tabler_stat2(mt, names(mt)[-10L], 'gear', FUN = c(cyl = 'jt'))

mt$gear <- factor(mt$gear, ordered = TRUE)
tabler_stat2(mt, names(mt)[-10L], 'gear',
  format_pval = function(x) format.pval(x, digits = 3))


raredd/rawr documentation built on April 29, 2024, 10:29 a.m.