summarize_data: Summarize Data
In metrumresearchgroup/pmforest: Create forest plots

Description

Summarize input data in preparation for passing it to plot_forest(). Takes a data.frame or tibble, calculates the relevant confidence intervals, and returns a tibble that can be passed directly to plot_forest(). See the Details section for the data specification and format.

Usage

summarize_data(
  data,
  value,
  group,
  group_level = NULL,
  metagroup = NULL,
  replicate = NULL,
  probs = c(0.05, 0.95),
  statistic = c("median", "mean", "geo_mean"),
  rep_probs = c(0.025, 0.975),
  rep_statistic = c("median", "mean", "geo_mean")
)

Arguments

`data`	A data.frame or tibble to summarize. See the Details section for the required format.
`value`	name of the column in `data` to perform calculations on (i.e. median/mean, lower, and upper CI)
`group`	name of the column in `data` that defines groups within the data. Often, this will contain the names of the covariates you are grouping by.
`group_level`	(optional) name of the column in `data` that contains subgroups to group by. For example, if your `group` column contains covariates like `WEIGHT` and `AGE`, this column could contain categories like `underweight`, `average`, `overweight`, `young`, `mid`, `elderly`, etc.
`metagroup`	(optional) name of the column in `data` that contains `metagroups`. Similar to facet wrap, if passed, this will cause `plot_forest()` to produce independent plots per metagroup.
`replicate`	(optional) name of the column in `data` that contains an index of replicates, for example from multiple simulations or bootstrapping. If specified, `plot_forest()` will draw additional CIs for the individual statistics, as small lines above each primary line.
`probs`	numeric vector of length two, both values between 0 and 1, giving the lower and upper tail probabilities. Defaults to `c(0.05, 0.95)`.
`statistic`	the statistic to report as the point estimate: one of `"median"`, `"mean"`, or `"geo_mean"`.
`rep_probs`	same as `probs`, but used only when `replicate` is passed. Applies to the minor intervals (i.e. the small lines) drawn above the major interval (i.e. the big lines).
`rep_statistic`	same as `statistic`, but used only when `replicate` is passed. Applies to the minor intervals (i.e. the small lines) drawn above the major interval (i.e. the big lines).
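
To make the roles of `statistic` and `probs` concrete, here is a minimal base R sketch of a point estimate with lower and upper quantiles. The helper `summarize_vec()` is hypothetical and is not part of pmforest. The geometric mean formula and the use of the default `stats::quantile()` algorithm are assumptions, not confirmed details of the package.

```r
# Hypothetical helper (not part of pmforest): point estimate plus lower/upper quantiles.
summarize_vec <- function(x,
                          statistic = c("median", "mean", "geo_mean"),
                          probs = c(0.05, 0.95)) {
  # match.arg() picks the first choice ("median") when none is supplied
  statistic <- match.arg(statistic)
  x <- unname(x)
  mid <- switch(statistic,
    median   = stats::median(x),
    mean     = mean(x),
    geo_mean = exp(mean(log(x)))  # assumed definition; needs strictly positive values
  )
  # Assumption: default quantile algorithm (type 7)
  qs <- stats::quantile(x, probs = probs, names = FALSE)
  c(mid = mid, lo = qs[1], hi = qs[2])
}

x <- c(1, 2, 4, 8, 16)
summarize_vec(x)                          # mid = 4, lo = 1.2, hi = 14.4
summarize_vec(x, statistic = "geo_mean")  # mid = 4 (geometric mean), same lo/hi
```

Changing `probs` only moves `lo` and `hi`. Changing `statistic` only moves `mid`.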

Details

Input Data

The data.frame or tibble passed to data must be in "long" format and have two to five columns: value, group, and optionally any of group_level, metagroup, and/or replicate. Each of these is described in detail in the Arguments section.
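
As a rough illustration of the long format described above, the following base R sketch builds a small input table with one row per value. The column names (`stat`, `GROUP`, `LVL`, `nsim`) and the simulated values are invented for this example.

```r
# Hypothetical long-format input: one row per observed or simulated value.
set.seed(1)

long_data <- expand.grid(
  nsim  = 1:3,                  # replicate index (optional column)
  LVL   = c("low", "high"),     # subgroup within each covariate (optional column)
  GROUP = c("WEIGHT", "AGE"),   # covariate being grouped by
  draw  = 1:5,                  # five values per GROUP/LVL/nsim cell
  stringsAsFactors = FALSE
)

# The value column that the summaries would be computed on
long_data$stat <- rlnorm(nrow(long_data), meanlog = 0, sdlog = 0.2)

# 'draw' was only a helper index and is not part of the input format
long_data$draw <- NULL

str(long_data)
```

Each covariate, subgroup, and replicate combination spans several rows rather than several columns, which is what "long" means here.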

Output Data

The tibble output from this function has one of two formats, depending on whether replicate was passed (details below).

Either way, the output tibble has a column named group, containing the values in the column you passed to the group argument, and optionally analogous columns for group_level and metagroup if those were passed.

Without replicate: If replicate is not passed, the output data has three additional columns, mid, lo, and hi, containing the summarized values corresponding to what was passed to statistic (mid) and probs (lo/hi).
With replicate: If replicate is passed, the output data has nine additional columns: mid_mid, mid_lo, and mid_hi, plus three more each for lo_* and hi_*, containing the summarized values. In this case, mid_mid, lo_mid, and hi_mid correspond to the values of the major interval (i.e. the big lines and data point), and the *_mid, *_lo, and *_hi columns correspond to the values for each minor interval (i.e. the small lines).
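
The two output layouts can be mimicked in base R as a sketch. This is not the package's implementation. It assumes that the first part of each nine-column name refers to the summary within a replicate and the second part to the summary across replicates, which is one reading of the description above. The median is hard-coded for both levels, and the data and the helper `mlh()` are invented.

```r
# Illustrative only: reproduces the described column layout, not pmforest internals.
set.seed(42)
d <- data.frame(
  group     = rep(c("WEIGHT", "AGE"), each = 40),
  replicate = rep(rep(1:4, each = 10), times = 2),
  value     = rlnorm(80, sdlog = 0.3)
)

# Hypothetical helper: median plus lower/upper quantiles of one vector
mlh <- function(x, probs) {
  x <- unname(x)
  q <- stats::quantile(x, probs, names = FALSE)
  c(mid = stats::median(x), lo = q[1], hi = q[2])
}

# Without replicate: one row per group with mid, lo, hi
by_group <- split(d$value, d$group)
no_rep <- data.frame(
  group = names(by_group),
  t(sapply(by_group, mlh, probs = c(0.05, 0.95))),
  row.names = NULL
)

# With replicate: summarize within each replicate, then across replicates
with_rep <- do.call(rbind, lapply(split(d, d$group), function(g) {
  # 3 x n_replicates matrix; rows are the within-replicate mid/lo/hi
  per_rep <- sapply(split(g$value, g$replicate), mlh, probs = c(0.05, 0.95))
  # 3 x 3 matrix; rows = across-replicate summary, columns = within-replicate summary
  across_rep <- apply(per_rep, 1, mlh, probs = c(0.025, 0.975))
  out <- as.vector(across_rep)  # column-major: mid_*, then lo_*, then hi_*
  names(out) <- paste(
    rep(colnames(across_rep), each = 3),
    rep(rownames(across_rep), times = 3),
    sep = "_"
  )
  data.frame(group = g$group[1], t(out))
}))
rownames(with_rep) <- NULL

names(no_rep)    # group, mid, lo, hi
names(with_rep)  # group, mid_mid, mid_lo, mid_hi, lo_mid, ..., hi_hi
```

Under this reading, mid_mid, lo_mid, and hi_mid give the point and the ends of the big line, while each *_lo and *_hi pair gives the ends of the small line drawn above the corresponding value.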