report_sample: Sample Description
In report: Automated Reporting of Results and Statistical Models

Description

Create a sample description table (also referred to as "Table 1").

Usage

report_sample(
  data,
  by = NULL,
  centrality = "mean",
  ci = NULL,
  ci_method = "wilson",
  ci_correct = FALSE,
  select = NULL,
  exclude = NULL,
  weights = NULL,
  total = TRUE,
  digits = 2,
  n = FALSE,
  group_by = NULL,
  ...
)

Arguments

`data`	A data frame for which descriptive statistics should be created.
`by`	Character vector, indicating the column(s) for possible grouping of the descriptive table. Note that weighting (see `weights`) does not work with more than one grouping column.
`centrality`	Character, indicating the summary statistics calculated for numeric variables. May be `"mean"` (mean and standard deviation) or `"median"` (median and median absolute deviation).
`ci`	Level of confidence interval for relative frequencies (proportions). If not `NULL`, confidence intervals are shown for proportions of factor levels.
`ci_method`	Character, indicating the method used to calculate confidence intervals for proportions. Currently implemented methods are `"wald"` and `"wilson"`. Note that `"wald"` can produce intervals outside the plausible range of [0, 1], so the `"wilson"` method is recommended. The formulae for the confidence intervals are, for `"wald"`: `p \pm z \sqrt{\frac{p (1 - p)}{n}}`; and for `"wilson"`: `\frac{2np + z^2 \pm z \sqrt{z^2 + 4npq}}{2(n + z^2)}`; where `p` is the proportion (of a factor level), `q` is `1 - p`, `z` is the critical z-score based on the interval level, and `n` is the length of the vector (cf. Newcombe 1998, Wilson 1927).
`ci_correct`	Logical, if `TRUE`, applies continuity correction. See Newcombe 1998 for different correction methods based on the chosen `ci_method`.
`select`	Character vector, with column names that should be included in the descriptive table.
`exclude`	Character vector, with column names that should be excluded from the descriptive table.
`weights`	Character vector, indicating the name of a potential weight variable. Reported descriptive statistics will be weighted by `weights`.
`total`	Logical, if `TRUE`, adds a `Total` column.
`digits`	Number of decimal places.
`n`	Logical, if `TRUE`, reports the actual sample size used in the calculation of the reported descriptive statistics (i.e., excluding missing values).
`group_by`	Deprecated. Use `by` instead.
`...`	Arguments passed to or from other methods.
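
The two formulae given under `ci_method` can be written out in base R. This is a minimal sketch for illustration only: `prop_ci()` is a hypothetical helper, not a function of the report package, and it does not implement the continuity correction controlled by `ci_correct`.

```r
# Hypothetical helper (not part of the report package): evaluates the
# "wald" and "wilson" formulae from the `ci_method` description.
prop_ci <- function(p, n, ci = 0.95, method = c("wilson", "wald")) {
  method <- match.arg(method)
  z <- stats::qnorm((1 + ci) / 2) # critical z-score for the interval level
  q <- 1 - p
  if (method == "wald") {
    # p +/- z * sqrt(p * (1 - p) / n)
    half <- z * sqrt(p * q / n)
    c(lower = p - half, upper = p + half)
  } else {
    # (2np + z^2 +/- z * sqrt(z^2 + 4npq)) / (2 * (n + z^2))
    centre <- 2 * n * p + z^2
    half <- z * sqrt(z^2 + 4 * n * p * q)
    denom <- 2 * (n + z^2)
    c(lower = (centre - half) / denom, upper = (centre + half) / denom)
  }
}

# A rare factor level: 1 observation out of 100
wald <- prop_ci(p = 0.01, n = 100, method = "wald")
wilson <- prop_ci(p = 0.01, n = 100, method = "wilson")
wald
wilson
```

For a proportion this close to zero, the Wald lower bound falls below 0, while the Wilson bounds stay inside [0, 1], which is the reason `"wilson"` is the default.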

Value

A data frame of class report_sample with variable names and their related summary statistics.

References

Newcombe, R. G. (1998). Two-sided confidence intervals for the single proportion: comparison of seven methods. Statistics in Medicine. 17 (8): 857–872
Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association. 22 (158): 209–212

Examples

library(report)

report_sample(iris[, 1:4])
report_sample(iris, select = c("Sepal.Length", "Petal.Length", "Species"))
report_sample(iris, by = "Species")
report_sample(airquality, by = "Month", n = TRUE, total = FALSE)

# confidence intervals for proportions
set.seed(123)
d <- data.frame(x = factor(sample(letters[1:3], 100, TRUE, c(0.01, 0.39, 0.6))))
report_sample(d, ci = 0.95, ci_method = "wald") # oops, negative CI
report_sample(d, ci = 0.95, ci_method = "wilson") # negative CI fixed
report_sample(d, ci = 0.95, ci_correct = TRUE) # continuity correction
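
As a rough cross-check of the intervals above, base R's `prop.test()` returns a Wilson score interval for a single proportion, with or without continuity correction. The sketch below recreates the same data and computes both variants per factor level. Whether `report_sample()` reproduces these numbers exactly is an assumption, since Newcombe (1998) describes several correction variants.

```r
# Recreate the example data
set.seed(123)
d <- data.frame(x = factor(sample(letters[1:3], 100, TRUE, c(0.01, 0.39, 0.6))))

counts <- table(d$x) # observed frequency of each factor level
n <- sum(counts)     # total number of observations

# Wilson score interval per level, without and with continuity correction;
# each result is a matrix with one row per level (lower, upper)
wilson_plain <- t(sapply(counts, function(k) prop.test(k, n, correct = FALSE)$conf.int))
wilson_cc <- t(sapply(counts, function(k) prop.test(k, n, correct = TRUE)$conf.int))

wilson_plain
wilson_cc
```

The continuity-corrected intervals are at least as wide as the uncorrected ones, and both stay within [0, 1].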

report documentation built on April 3, 2025, 7:34 p.m.