prep_groups: Prepare Groups for Imputation

View source: R/group_imp.R

Description

Normalize and validate a grouping specification for use with group_imp(). Converts long-format or canonical list-column input into a validated slideimp_tbl, enforcing set relationships, pruning dropped columns, and optionally padding small groups.

Usage

prep_groups(
  obj_cn,
  group,
  subset = NULL,
  min_group_size = 0,
  allow_unmapped = FALSE,
  seed = NULL
)

Arguments

obj_cn

Character vector of column names from the data matrix (e.g., colnames(obj)). Every element must appear in group$feature unless allow_unmapped = TRUE.

group

Specification of how features should be grouped for imputation. Accepts three formats:

  • character: string naming a supported Illumina platform; see the Note section.

  • data.frame (Long format):

    • group: Column identifying the group for each feature.

    • feature: Character column of individual feature names.

  • data.frame (List-column format):

    • feature: List-column of character vectors to impute. A row is a group.

    • aux: (Optional) List-column of auxiliary names used for context.

    • parameters: (Optional) List-column of group-specific parameter lists.
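The two data.frame formats can be illustrated with a small base R sketch. The group labels and feature names below are hypothetical, and the conversion with split() only shows how the two shapes relate; it is not necessarily how prep_groups() converts them internally.

```r
# Long format: one row per (group, feature) pair.
# Group labels and feature names are made up for illustration.
long <- data.frame(
  group   = c("chr1", "chr1", "chr2", "chr2", "chr2"),
  feature = c("cg01", "cg02", "cg03", "cg04", "cg05"),
  stringsAsFactors = FALSE
)

# List-column format: one row per group, where `feature` holds a
# character vector of feature names for that group.
by_group <- split(long$feature, long$group)
wide <- data.frame(group = names(by_group), stringsAsFactors = FALSE)
wide$feature <- unname(by_group)  # assigned separately so it stays a list-column

str(wide)
```

Either shape describes the same grouping; the long format is usually easier to build from an annotation table, while the list-column format leaves room for the optional aux and parameters columns.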

subset

Character vector of feature names to impute. The default, NULL, imputes all features. Every element must be in obj_cn (colnames(obj)) and must appear in at least one group's feature. Features that belong to a group but are not in subset are demoted to auxiliary columns for that group. Groups left with zero features after demotion are dropped with a message.
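The demotion rule for subset can be sketched in base R as follows. The group names, feature names, and the use of plain named lists are assumptions made for illustration; the sketch mirrors the behaviour described above rather than the package's actual implementation.

```r
# Hypothetical groups, held as plain named lists for brevity.
feature <- list(g1 = c("cg01", "cg02"), g2 = c("cg03", "cg04"), g3 = "cg05")
aux     <- list(g1 = character(0),      g2 = "cg06",            g3 = character(0))
subset  <- c("cg01", "cg03", "cg04")

# Features outside `subset` are demoted to auxiliary columns of the same group.
demoted <- lapply(feature, setdiff, y = subset)
feature <- lapply(feature, intersect, y = subset)
aux     <- Map(union, aux, demoted)

# Groups left with zero features after demotion are dropped with a message.
empty <- lengths(feature) == 0L
if (any(empty)) {
  message("Dropping groups with no features: ",
          paste(names(feature)[empty], collapse = ", "))
}
feature <- feature[!empty]
aux     <- aux[!empty]
```

Here cg02 stays in group g1 as an auxiliary column, and g3 is dropped because its only feature is not in subset.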

min_group_size

Integer or NULL. Minimum column count (features + aux) per group. Groups smaller than this are padded with randomly sampled columns from obj_cn. With the default of 0, no group is padded.
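A minimal sketch of the padding idea in base R is shown below. It assumes that padding columns are drawn without replacement from columns not already in the group and are added as auxiliary columns; the documentation does not state either detail, and the column names are hypothetical.

```r
set.seed(1)  # fixes the random draw, analogous in spirit to the `seed` argument

obj_cn  <- sprintf("cg%02d", 1:10)  # hypothetical column names
feature <- c("cg01", "cg02")
aux     <- character(0)
min_group_size <- 5L

# Pad until the group reaches min_group_size, or until no unused columns remain.
shortfall <- min_group_size - length(feature) - length(aux)
if (shortfall > 0L) {
  pool <- setdiff(obj_cn, c(feature, aux))  # assumption: sample only unused columns
  n_pad <- min(shortfall, length(pool))
  pad <- pool[sample.int(length(pool), n_pad)]
  aux <- c(aux, pad)                        # assumption: padding is stored as aux
}
```

Indexing through sample.int() avoids the surprising behaviour of sample() when the pool has length one.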

allow_unmapped

Logical. If FALSE, every element of obj_cn (colnames(obj)) must appear in group; otherwise an error is raised. If TRUE, columns with no group assignment are left untouched (neither imputed nor used as auxiliary columns) and a message is issued instead of an error.

seed

Numeric or NULL. Random seed that makes the random sampling used to pad small groups reproducible.

Details

Set Validation

Let A = obj_cn and B = the union of all feature and auxiliary names in group. Unless allow_unmapped = TRUE, the function enforces A ⊆ B: every column in the matrix must appear somewhere in the grouping specification.

  • Pruning: Elements in B but not in A (e.g., QC-dropped probes) are silently pruned from each group.

  • Dropping: Groups left with zero features after pruning are removed entirely with a diagnostic message.
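The validation, pruning, and dropping steps can be sketched with base R set operations. All names below are hypothetical, and the error and message wording is invented for illustration; only the set logic follows the description above.

```r
obj_cn  <- c("cg01", "cg02", "cg03")  # A: columns present in the matrix
feature <- list(g1 = c("cg01", "cg02"), g2 = c("cg03", "cg98"), g3 = "cg99")
aux     <- list(g1 = character(0), g2 = character(0), g3 = character(0))
allow_unmapped <- FALSE

# B: every name mentioned anywhere in the grouping specification.
B <- unique(c(unlist(feature, use.names = FALSE),
              unlist(aux, use.names = FALSE)))

# Enforce A subset-of B: matrix columns missing from the specification are an
# error, or only a message when allow_unmapped is TRUE.
unmapped <- setdiff(obj_cn, B)
if (length(unmapped) > 0L) {
  txt <- paste("Columns without a group:", paste(unmapped, collapse = ", "))
  if (allow_unmapped) message(txt) else stop(txt)
}

# Pruning: names in B but not in A are removed from each group.
feature <- lapply(feature, intersect, y = obj_cn)
aux     <- lapply(aux, intersect, y = obj_cn)

# Dropping: groups left with zero features are removed with a message.
keep <- lengths(feature) > 0L
if (!all(keep)) {
  message("Dropping empty groups: ", paste(names(feature)[!keep], collapse = ", "))
}
feature <- feature[keep]
aux     <- aux[keep]
```

In this example cg98 and cg99 are pruned because they are absent from obj_cn, which leaves g3 empty, so g3 is dropped.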

Value

A data.frame of class slideimp_tbl containing:

  • group: Original group labels (if provided) or sequential group labels.

  • feature: A list-column of character vectors (feature names).

  • aux: A list-column of character vectors (auxiliary names).

  • parameters: A list-column of per-group configuration lists.

See Also

group_imp()


slideimp documentation built on April 17, 2026, 1:07 a.m.