aggregate_df: Aggregate data across an entire data frame using sufficient...
In MAPCtools: Multivariate Age-Period-Cohort (MAPC) Modeling for Health Data

View source: R/export-DataPreprocessing.R

aggregate_df

R Documentation

Aggregate data across an entire data frame using sufficient statistics

Description

Aggregates the specified columns of a data frame into summary statistics, preserving the potentially complex structure returned by the aggregator functions (such as data frames or `inla.mdata` objects) within list-columns. Aggregation uses the sufficient statistics of the distribution specified for each column. Supported distributions: Gaussian and binomial. The entire data frame is aggregated into a single-row result.
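As a rough illustration of what aggregating by sufficient statistics means, the base R sketch below collapses a toy data frame into a handful of numbers and shows that the sample mean and variance can be recovered from them. It does not call `aggregate_df` and does not reproduce the `inla.mdata` layout used for the `agaussian` family; the column names `y` and `event` and the names of the summaries are assumptions made for the example, and unit precision scales are assumed.

```r
# Toy data; the column names are hypothetical.
toy <- data.frame(
  y     = c(1.2, 0.8, 1.5, 1.1),  # Gaussian observations
  event = c(1, 0, 1, 1)           # binary outcomes (1 = success)
)

# Gaussian, unit precision scales: the count, the sum and the sum of
# squares are sufficient for the mean and the variance.
gaussian_stats <- c(
  n      = nrow(toy),
  sum_y  = sum(toy$y),
  sum_y2 = sum(toy$y^2)
)

# Binomial: the number of successes and the number of trials.
binomial_stats <- c(
  successes = sum(toy$event),
  trials    = nrow(toy)
)

# The sample mean and variance are recoverable from the summaries alone.
n      <- gaussian_stats[["n"]]
mean_y <- gaussian_stats[["sum_y"]] / n
var_y  <- (gaussian_stats[["sum_y2"]] - n * mean_y^2) / (n - 1)
```

The point of the sketch is that no information relevant to the likelihood is lost by replacing the rows with these summaries, which is what makes single-row aggregation possible.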

Usage

aggregate_df(
  data,
  gaussian = NULL,
  gaussian.precision.scales = NULL,
  binomial = NULL
)

Arguments

`data`	A data frame.
`gaussian`	Gaussian columns in `data` to be aggregated. The Gaussian observations are collapsed into an `inla.mdata` object compatible with the `agaussian` family; see the documentation for the `agaussian` family in `INLA` for details. Defaults to `NULL` (optional).
`gaussian.precision.scales`	Scales for the precision of Gaussian observations. Must be one of: (1) `NULL`: use the default scale of 1 for all observations in all `gaussian` columns; (2) a single numeric vector: applied only if exactly one column is specified in `gaussian`, and its length must match `nrow(data)`; (3) a named list: `names(gaussian.precision.scales)` are the names of the Gaussian columns (they must match the columns specified in `gaussian`), and each list element must be a numeric vector of scales for that column, with length matching `nrow(data)`. Defaults to `NULL` (optional).
`binomial`	Binomial columns in `data` to be aggregated. Defaults to `NULL` (optional).
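
The three accepted forms of `gaussian.precision.scales` can be sketched as a small normalization step. The helper below, `normalize_scales`, is hypothetical and not part of MAPCtools; it only mirrors the rules stated above, and it assumes that `gaussian` is a character vector of column names and that a named list must cover exactly the columns in `gaussian`.

```r
# Hypothetical helper (not part of MAPCtools): expands the three accepted
# forms of `gaussian.precision.scales` into a named list with one numeric
# vector of length `n_rows` per Gaussian column.
normalize_scales <- function(scales, gaussian, n_rows) {
  if (is.null(scales)) {
    # Default: scale 1 for every observation in every Gaussian column
    scales <- lapply(gaussian, function(col) rep(1, n_rows))
    names(scales) <- gaussian
  } else if (is.numeric(scales)) {
    # A bare vector is only unambiguous with exactly one Gaussian column
    if (length(gaussian) != 1) {
      stop("a single numeric vector requires exactly one Gaussian column")
    }
    scales <- stats::setNames(list(scales), gaussian)
  } else if (is.list(scales)) {
    # Assumption: the list names must be exactly the Gaussian columns
    if (is.null(names(scales)) || !setequal(names(scales), gaussian)) {
      stop("list names must match the Gaussian columns")
    }
  } else {
    stop("scales must be NULL, a numeric vector, or a named list")
  }

  # Every element must be numeric with one scale per row of the data
  ok <- vapply(
    scales,
    function(s) is.numeric(s) && length(s) == n_rows,
    logical(1)
  )
  if (!all(ok)) {
    stop("each scale vector must be numeric with length nrow(data)")
  }
  scales[gaussian]
}

# Usage with hypothetical column names
normalize_scales(NULL, c("height", "weight"), n_rows = 3)
normalize_scales(c(1, 2, 0.5), "height", n_rows = 3)
```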

Value

A single-row data frame (tibble) containing:

A column `n` with the total number of rows in the input data.
For each column specified in `gaussian` or `binomial`, a corresponding list-column (named e.g. `colname_gaussian`, `colname_binomial`). Each element of these list-columns can be accessed by using the `$` operator twice, e.g. through `data$colname_gaussian$Y1` for the first element of the Gaussian summary.
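
The double `$` access can be illustrated with a stand-in object built in base R. This is only a sketch of the access pattern: it assumes the summary behaves like a one-row data frame nested inside the result, the column name `y_gaussian` is hypothetical, and the values are placeholders rather than real `agaussian` output.

```r
# Stand-in for a single-row result; values are placeholders.
result <- data.frame(n = 4L)

# Nest a one-row data frame as a column (hypothetical name `y_gaussian`).
result$y_gaussian <- data.frame(Y1 = 0.5, Y2 = 0)

# First `$` selects the nested summary, second `$` selects one element.
first_element <- result$y_gaussian$Y1
```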