
Source file: R/simplicaCV.R

simplicaCV    R Documentation

Test Simplivariate Components with Cross-Validation Pattern Selection

Description

This function performs cross-validation-based pattern testing for the Simplivariate Components in a SIMPLICA object. For each component, it evaluates the candidate pattern functions by repeated cross-validation and selects the best-performing pattern. When requireFitters = TRUE (the default), every pattern function must have a corresponding fitter; no fallback fitting method is used.
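
The fitter requirement can be pictured as a name-matching check between the two lists. The sketch below is hypothetical: checkFitters() is not part of SIMPLICA, and the pattern names "constant" and "additive" are placeholders, not necessarily the names used by defaultPatternFunctions().

```r
# Hypothetical sketch (not SIMPLICA code): stop if any pattern function
# lacks a fitter with the same list name. Pattern names are placeholders.
checkFitters <- function(patternFunctions, patternFitters) {
  missing <- setdiff(names(patternFunctions), names(patternFitters))
  if (length(missing) > 0) {
    stop("No fitter supplied for pattern(s): ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

patternFunctions <- list(constant = function(x) x, additive = function(x) x)
patternFitters   <- list(constant = function(x) x)   # "additive" has no fitter

res <- tryCatch(checkFitters(patternFunctions, patternFitters),
                error = function(e) conditionMessage(e))
print(res)   # "No fitter supplied for pattern(s): additive"
```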

Usage

simplicaCV(
  foundObject,
  df,
  patternFunctions = defaultPatternFunctions(),
  patternFitters = defaultPatternFitters(),
  preferenceOrder = names(patternFunctions),
  nRepeats = 40,
  testFraction = 0.2,
  minCellsForModels = 25,
  parsimonyMargin = 0.05,
  requireFitters = TRUE,
  updateObject = TRUE,
  verbose = FALSE,
  ignoreNaComponents = TRUE
)

Arguments

foundObject

A simplica object containing Simplivariate Components

df

Data frame or matrix with the original data

patternFunctions

List of pattern functions to evaluate (default: defaultPatternFunctions())

patternFitters

List of pattern fitting functions (default: defaultPatternFitters())

preferenceOrder

Character vector specifying preference order for pattern selection (default: names(patternFunctions))

nRepeats

Integer, number of cross-validation repeats (default: 40)

testFraction

Numeric, fraction of the data held out for testing in each cross-validation repeat (default: 0.2)

minCellsForModels

Integer, minimum number of cells required for model fitting (default: 25)

parsimonyMargin

Numeric, margin for parsimony-based model selection (default: 0.05)

requireFitters

Logical, whether fitters are required for all patterns (default: TRUE)

updateObject

Logical, whether to update and return the input object (default: TRUE)

verbose

Logical, whether to print progress messages (default: FALSE)

ignoreNaComponents

Logical, whether to skip components with NA patterns (default: TRUE)
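
To show how nRepeats, testFraction and minCellsForModels could interact, here is a minimal sketch of repeated random holdout on one component's cells. It assumes, without confirmation from the package source, that each repeat hides a random testFraction of the component's cells, fits each pattern on the remaining cells, and scores the hidden cells by RMSE. The helpers fitConstant(), fitAdditive() and cvComponent() are hypothetical stand-ins, not SIMPLICA's fitters.

```r
# Hypothetical fitters: take a matrix with NA at held-out cells and
# return a full matrix of fitted values.
fitConstant <- function(m) {
  matrix(mean(m, na.rm = TRUE), nrow(m), ncol(m))
}
fitAdditive <- function(m) {
  mu <- mean(m, na.rm = TRUE)
  r <- rowMeans(m, na.rm = TRUE) - mu    # row effects
  k <- colMeans(m, na.rm = TRUE) - mu    # column effects
  r[is.nan(r)] <- 0                      # guard: fully hidden row
  k[is.nan(k)] <- 0                      # guard: fully hidden column
  mu + outer(r, k, "+")
}

# Repeated random holdout: returns an nRepeats x nPatterns RMSE matrix,
# or NULL when the component is too small to model.
cvComponent <- function(block, fitters, nRepeats = 40, testFraction = 0.2,
                        minCellsForModels = 25) {
  nCells <- length(block)
  if (nCells < minCellsForModels) return(NULL)
  nTest <- max(1, round(testFraction * nCells))
  rmse <- matrix(NA_real_, nRepeats, length(fitters),
                 dimnames = list(NULL, names(fitters)))
  for (i in seq_len(nRepeats)) {
    testIdx <- sample(nCells, nTest)
    train <- block
    train[testIdx] <- NA                 # hide the test cells
    for (p in names(fitters)) {
      fitted <- fitters[[p]](train)
      rmse[i, p] <- sqrt(mean((block[testIdx] - fitted[testIdx])^2))
    }
  }
  rmse
}

set.seed(42)
block <- outer(1:8, 1:6, "+") + matrix(rnorm(48, sd = 0.1), 8, 6)
rmse <- cvComponent(block, list(constant = fitConstant, additive = fitAdditive))
colMeans(rmse)   # the additive fit has the much lower mean RMSE here
```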

Details

The function performs the following steps:

  • Validates the input simplica object and data dimensions

  • Checks that all pattern functions have corresponding fitters

  • For each Simplivariate Component, evaluates every candidate pattern by cross-validation

  • Selects the best-performing pattern based on RMSE and parsimony

  • Updates the component patterns and records detailed per-component test results
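
The page does not spell out how parsimonyMargin and preferenceOrder combine. One plausible rule, shown here purely as an assumption, treats the margin as relative: every pattern whose mean CV RMSE is within (1 + parsimonyMargin) times the best mean qualifies, and the earliest qualifying pattern in preferenceOrder wins. selectPattern() and the pattern names are hypothetical.

```r
# Hypothetical selection rule (an assumption, not SIMPLICA's documented rule):
# patterns within a relative parsimonyMargin of the best mean RMSE qualify;
# among them, the earliest in preferenceOrder is chosen.
selectPattern <- function(cvMeans, preferenceOrder = names(cvMeans),
                          parsimonyMargin = 0.05) {
  cvMeans <- cvMeans[!is.na(cvMeans)]
  if (length(cvMeans) == 0) return(NA_character_)
  threshold <- min(cvMeans) * (1 + parsimonyMargin)
  candidates <- names(cvMeans)[cvMeans <= threshold]
  ordered <- intersect(preferenceOrder, candidates)
  if (length(ordered) > 0) ordered[1] else candidates[which.min(cvMeans[candidates])]
}

cvMeans <- c(constant = 1.00, additive = 0.97, multiplicative = 0.90)
# threshold = 0.945, so only "multiplicative" qualifies
selectPattern(cvMeans, c("constant", "additive", "multiplicative"))
# threshold = 0.9555, both qualify, so the preferred "constant" is returned
selectPattern(c(constant = 0.94, additive = 0.91), c("constant", "additive"))
```

Under this rule a simpler, preferred pattern wins whenever it is nearly as accurate as the best one, which is the usual intent of a parsimony margin.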

Value

If updateObject = TRUE, returns the input simplica object with two new fields:

componentPatternsUpdated

Character vector with the selected pattern per component after cross-validation. If a component is skipped or empty, the entry is NA.

componentAudit

Data frame containing detailed cross-validation results for each component, with the following columns:

componentId

Numeric ID of the component.

originalPattern

Pattern label originally assigned.

selectedPattern

Pattern chosen after CV-based evaluation.

reason

Explanation of why a pattern was selected or skipped.

nRows, nCols, nCells

Number of rows, columns, and cells in the component.

nRepeats, testFraction, parsimonyMargin

CV settings used.

cvMean_<pattern>

Mean test-set RMSE across CV repeats for each tested pattern.

cvSd_<pattern>

Standard deviation of the test-set RMSE across CV repeats.

winFrac_<pattern>

Fraction of CV repeats where the pattern was the best performer.

If updateObject = FALSE, returns a list with the same two elements (componentPatternsUpdated, componentAudit).
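
As an illustration of how the per-pattern audit columns relate to the raw CV results, the sketch below condenses a matrix of RMSE values (rows = CV repeats, columns = patterns) into one row of cvMean_, cvSd_ and winFrac_ columns. summariseCv() and the pattern names are hypothetical; in componentAudit such a row would sit alongside the other columns for each component.

```r
# Hypothetical sketch: summarise per-repeat RMSE values into audit columns.
summariseCv <- function(rmse) {
  winners <- colnames(rmse)[apply(rmse, 1, which.min)]   # best pattern per repeat
  out <- list()
  for (p in colnames(rmse)) {
    out[[paste0("cvMean_", p)]]  <- mean(rmse[, p])
    out[[paste0("cvSd_", p)]]    <- sd(rmse[, p])
    out[[paste0("winFrac_", p)]] <- mean(winners == p)    # share of repeats won
  }
  as.data.frame(out)
}

rmse <- cbind(constant = c(1.0, 1.2, 0.9, 1.1),
              additive = c(0.8, 1.3, 0.7, 1.0))
summariseCv(rmse)
# cvMean_constant = 1.05, cvMean_additive = 0.95,
# winFrac_constant = 0.25, winFrac_additive = 0.75
```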


SIMPLICA documentation built on Sept. 11, 2025, 1:08 a.m.