View source: R/export-DataPreprocessing.R

aggregate_df    R Documentation

Aggregate data across an entire data frame using sufficient statistics

Description

Aggregates the specified columns of a data frame into summary statistics. The potentially complex objects returned by the aggregator functions (such as data frames or inla.mdata objects) are preserved in list-columns. Each column is aggregated using the sufficient statistics of the distribution specified for it; the supported distributions are Gaussian and binomial. The entire data frame is collapsed into a single-row result.
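
To make the idea of aggregating by sufficient statistics concrete, the following base R sketch computes them by hand for one Gaussian and one binomial column. It does not call aggregate_df and does not reproduce the layout of the inla.mdata object. The helper names gaussian_suff_stats and binomial_suff_stats are hypothetical, and treating the binomial column as 0/1 outcomes is an assumption.

```r
# Sufficient statistics for y_i ~ N(mu, 1 / (s_i * tau)) with known scales s_i.
# Hypothetical helper for illustration; not part of MAPCtools.
gaussian_suff_stats <- function(y, s = rep(1, length(y))) {
  stopifnot(is.numeric(y), is.numeric(s), length(y) == length(s), all(s > 0))
  m    <- sum(s)                     # total precision scale
  ybar <- sum(s * y) / m             # scale-weighted mean
  v    <- sum(s * y^2) / m - ybar^2  # scale-weighted variance around ybar
  list(n = length(y), m = m, ybar = ybar, v = v,
       half_sum_log_s = 0.5 * sum(log(s)))
}

# Sufficient statistics for independent 0/1 outcomes with a common probability.
# Assumes the column holds only 0/1 values; hypothetical helper.
binomial_suff_stats <- function(x) {
  stopifnot(all(x %in% c(0, 1)))
  list(successes = sum(x), trials = length(x))
}

toy <- data.frame(
  height = c(1.2, 0.8, 1.0, 1.4),
  smoker = c(1, 0, 0, 1)
)

g <- gaussian_suff_stats(toy$height)
b <- binomial_suff_stats(toy$smoker)
```

The Gaussian log-likelihood depends on the observations only through these quantities, which is why a whole column can be collapsed into one summary without losing information about the mean and the precision.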

Usage

aggregate_df(
  data,
  gaussian = NULL,
  gaussian.precision.scales = NULL,
  binomial = NULL
)

Arguments

data

A data frame.

gaussian

Gaussian columns in data to be aggregated. The Gaussian observations are collapsed into an inla.mdata object compatible with the agaussian family; see the documentation for the agaussian family in INLA for details. Defaults to NULL (optional).

gaussian.precision.scales

Scales for the precision of Gaussian observations.
Must be one of:

  • NULL: Use default scales of 1 for all observations in all gaussian columns.

  • A single numeric vector: Applied only if exactly one column is specified in gaussian. Length must match nrow(data).

  • A named list: names(gaussian.precision.scales) are the names of the Gaussian columns and must match the columns specified in gaussian. Each list element must be a numeric vector of scales for that column, with length matching nrow(data).

Defaults to NULL (optional).
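
The three accepted forms can be thought of as shorthand for one canonical structure: a named list with one scale vector per Gaussian column. The base R sketch below expands each form into that structure and applies the length checks described above. normalize_scales is a hypothetical helper written for illustration, not the package's own validation code.

```r
# Hypothetical helper: expand the three accepted forms into a named list
# with one numeric vector of length n_rows per Gaussian column.
normalize_scales <- function(scales, gaussian, n_rows) {
  if (is.null(scales)) {
    # Default: scale 1 for every observation in every Gaussian column.
    scales <- lapply(gaussian, function(col) rep(1, n_rows))
    names(scales) <- gaussian
  } else if (is.numeric(scales)) {
    # A bare vector is only unambiguous when there is exactly one column.
    if (length(gaussian) != 1) {
      stop("A single numeric vector requires exactly one Gaussian column.")
    }
    scales <- stats::setNames(list(scales), gaussian)
  } else if (is.list(scales)) {
    if (is.null(names(scales)) || !setequal(names(scales), gaussian)) {
      stop("List names must match the Gaussian columns.")
    }
    scales <- scales[gaussian]  # reorder to follow `gaussian`
  } else {
    stop("Scales must be NULL, a numeric vector, or a named list.")
  }
  ok <- vapply(scales,
               function(s) is.numeric(s) && length(s) == n_rows,
               logical(1))
  if (!all(ok)) stop("Each scale vector must be numeric with length n_rows.")
  scales
}

default_scales <- normalize_scales(NULL, c("a", "b"), n_rows = 3)
single_scales  <- normalize_scales(c(1, 2, 0.5), "a", n_rows = 3)
named_scales   <- normalize_scales(list(b = c(1, 1, 2), a = c(2, 2, 2)),
                                   c("a", "b"), n_rows = 3)
```

Passing a bare numeric vector together with two or more Gaussian columns stops with an error in this sketch, since it would be unclear which column the scales belong to.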

binomial

Binomial columns in data to be aggregated. Defaults to NULL (optional).

Value

A single-row data frame (tibble) containing:

  • A column n with the total number of rows in the input data.

  • For each column specified in gaussian or binomial, a corresponding list-column (named, e.g., colname_gaussian or colname_binomial). Each element of these list-columns can be accessed by using the $ operator twice, e.g. through data$colname_gaussian$Y1 for the first element of the Gaussian summary.
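
A minimal usage sketch on toy data follows. It assumes that columns are passed to gaussian and binomial as character names, that the binomial column holds 0/1 outcomes, and that INLA must be installed to build the Gaussian summary; none of these details is confirmed by this page. The toy data frame and its column names are made up for illustration.

```r
# Toy data: one Gaussian column and one 0/1 column (illustrative values).
toy <- data.frame(
  height = c(1.2, 0.8, 1.0, 1.4),
  smoker = c(1, 0, 0, 1)
)

# Run only if the packages are available; building the inla.mdata summary
# is assumed to need INLA.
if (requireNamespace("MAPCtools", quietly = TRUE) &&
    requireNamespace("INLA", quietly = TRUE)) {
  res <- MAPCtools::aggregate_df(
    toy,
    gaussian = "height",  # assumption: columns are given by name
    binomial = "smoker"   # assumption: 0/1 outcomes
  )
  print(res$n)       # total number of rows in the input
  print(names(res))  # names of the summary list-columns
}
```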


MAPCtools documentation built on June 25, 2025, 5:09 p.m.