svem_significance_test: SVEM Significance Test with Mixture Support


svem_significance_test    R Documentation

SVEM Significance Test with Mixture Support

Description

Performs a whole-model significance test using the SVEM framework and allows the user to specify mixture factor groups. Mixture factors are sets of continuous variables that are constrained to sum to a constant (the mixture total) and have optional lower and upper bounds. When mixture groups are supplied, the grid of evaluation points is generated by sampling Dirichlet variates over the mixture simplex rather than by independently sampling each continuous predictor. Non-mixture continuous predictors are sampled via a maximin Latin hypercube over their observed ranges, and categorical predictors are sampled from their observed levels.
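As a rough illustration of the mixture sampling described above, the sketch below draws Dirichlet variates in base R (gamma draws normalized by their row sums), rescales them to the mixture total, and keeps only the draws that respect the bounds. The helper name sample_mixture_sketch, the flat Dirichlet (all shape parameters equal to 1), and the rejection step are assumptions made for illustration; the package's internal sampler may work differently.

```r
# Hypothetical helper, not part of SVEMnet: rejection-sample points on a
# bounded mixture simplex using Dirichlet(1, ..., 1) draws.
sample_mixture_sketch <- function(n, lower, upper, total = 1, max_tries = 1000) {
  k <- length(lower)
  stopifnot(length(upper) == k, sum(lower) <= total, sum(upper) >= total)
  out <- matrix(numeric(0), nrow = 0, ncol = k)
  tries <- 0
  while (nrow(out) < n && tries < max_tries) {
    tries <- tries + 1
    # Dirichlet draws: independent Gamma(1) variates normalized by row sums
    g <- matrix(rgamma(n * k, shape = 1), nrow = n, ncol = k)
    x <- total * g / rowSums(g)
    # Keep rows where every component lies within its bounds
    ok <- rowSums(sweep(x, 2, lower, ">=") & sweep(x, 2, upper, "<=")) == k
    out <- rbind(out, x[ok, , drop = FALSE])
  }
  if (nrow(out) < n) stop("could not find enough feasible points")
  out[seq_len(n), , drop = FALSE]
}

set.seed(1)
pts <- sample_mixture_sketch(5, lower = c(0.1, 0.1, 0.1), upper = c(0.7, 0.7, 0.7))
```

Each row of pts sums to the mixture total and lies within the stated bounds. Rejection sampling is simple but can become slow when the bounds leave only a small feasible region.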

Usage

svem_significance_test(
  formula,
  data,
  mixture_groups = NULL,
  nPoint = 2000,
  nSVEM = 10,
  nPerm = 150,
  percent = 90,
  nBoot = 100,
  glmnet_alpha = c(1),
  weight_scheme = c("SVEM"),
  objective = c("auto", "wAIC", "wBIC", "wGIC", "wSSE"),
  auto_ratio_cutoff = 1.3,
  gamma = 2,
  relaxed = FALSE,
  verbose = TRUE,
  ...
)

Arguments

formula

A formula specifying the model to be tested.

data

A data frame containing the variables in the model.

mixture_groups

Optional list describing one or more mixture factor groups. Each element of the list should be a list with components vars (character vector of column names), lower (numeric vector of lower bounds of the same length as vars), upper (numeric vector of upper bounds of the same length), and total (scalar specifying the sum of the mixture variables). All mixture variables must be included in vars, and no variable can appear in more than one mixture group. Defaults to NULL (no mixtures).
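A minimal sketch of the structure described above, for a single three-component mixture. The column names A, B, and C and the helper check_mixture_groups are hypothetical; the feasibility checks on the bounds are an added assumption rather than documented behavior.

```r
# Illustrative column names (A, B, C); substitute the mixture columns in your data.
mixture_groups <- list(
  list(
    vars  = c("A", "B", "C"),
    lower = c(0.1, 0.0, 0.2),
    upper = c(0.6, 0.5, 0.8),
    total = 1
  )
)

# Hypothetical sanity check, not part of SVEMnet.
check_mixture_groups <- function(groups) {
  all_vars <- unlist(lapply(groups, function(g) g$vars))
  stopifnot(!anyDuplicated(all_vars))  # no variable may be listed twice
  for (g in groups) {
    stopifnot(
      length(g$lower) == length(g$vars),  # documented: same length as vars
      length(g$upper) == length(g$vars),  # documented: same length as vars
      length(g$total) == 1,               # documented: scalar total
      all(g$lower <= g$upper),            # assumed: bounds are ordered
      sum(g$lower) <= g$total,            # assumed: total is reachable
      sum(g$upper) >= g$total
    )
  }
  invisible(TRUE)
}

check_mixture_groups(mixture_groups)
```

The resulting list would be passed as the mixture_groups argument; a second mixture would be added as another inner list with its own vars, lower, upper, and total.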

nPoint

Number of random points in the factor space (default: 2000).

nSVEM

Number of SVEM fits on the original data (default: 10).

nPerm

Number of SVEM fits on permuted responses for the reference distribution (default: 150).

percent

Percentage of variance to capture in the SVD (default: 90).
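One plausible reading of percent is sketched below: keep the smallest number of leading singular vectors whose squared singular values account for at least that share of the total. The helper n_components_for and the column centering are assumptions for illustration, not a description of the package internals.

```r
# Hypothetical helper: number of leading singular vectors needed to capture
# at least `percent` of the total variance (sum of squared singular values).
n_components_for <- function(M, percent = 90) {
  d <- svd(scale(M, center = TRUE, scale = FALSE))$d
  cum_var <- cumsum(d^2) / sum(d^2)
  which(cum_var >= percent / 100)[1]
}

set.seed(1)
M <- matrix(rnorm(200), nrow = 20, ncol = 10)
k <- n_components_for(M, percent = 90)
```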

nBoot

Number of bootstrap iterations within each SVEM fit (default: 100).

glmnet_alpha

The alpha parameter(s) for glmnet (default: c(1)).

weight_scheme

Weighting scheme for SVEM (default: "SVEM").

objective

Objective used inside SVEMnet() to pick the bootstrap path solution. One of "auto", "wAIC", "wBIC", "wGIC", "wSSE" (default: "auto").

auto_ratio_cutoff

Single cutoff for the automatic rule when objective = "auto" (default: 1.3). With r = n_X/p_X, wAIC is used if r >= auto_ratio_cutoff; otherwise wBIC is used. Passed to SVEMnet().
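The automatic rule can be written out in a few lines. The helper choose_objective_sketch is hypothetical, and treating n_X and p_X as the row and column counts of the model matrix is an assumption, since they are not defined here.

```r
# Hypothetical helper illustrating the documented rule; n_X and p_X are
# taken as the row and column counts of the model matrix (an assumption).
choose_objective_sketch <- function(n_X, p_X, auto_ratio_cutoff = 1.3) {
  r <- n_X / p_X
  if (r >= auto_ratio_cutoff) "wAIC" else "wBIC"
}

choose_objective_sketch(n_X = 30, p_X = 10)  # r = 3.0, at or above the cutoff
choose_objective_sketch(n_X = 12, p_X = 10)  # r = 1.2, below the cutoff
```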

gamma

Penalty weight used only when objective = "wGIC" (default 2). Passed to SVEMnet().

relaxed

Logical; default FALSE. When TRUE, inner SVEMnet() fits use glmnet's relaxed elastic net path and select both lambda and relaxed gamma on each bootstrap. When FALSE, the standard glmnet path is used. This value is passed through to SVEMnet(). Note: if relaxed = TRUE and glmnet_alpha includes 0, ridge (alpha = 0) is dropped by SVEMnet() for relaxed fits.

verbose

Logical; if TRUE, displays progress messages (default: TRUE).

...

Additional arguments passed to SVEMnet() and then to glmnet() (for example: penalty.factor, offset, lower.limits, upper.limits, standardize.response). The relaxed setting is controlled by the relaxed argument of this function; any relaxed value passed via ... is ignored with a warning.

Details

If no mixture groups are supplied, this function behaves identically to a standard SVEM-based whole-model test, sampling non-mixture continuous variables via a maximin Latin hypercube within their observed ranges, and categorical variables from their observed levels.

Internally, predictions at evaluation points use predict.svem_model() with se.fit = TRUE. Rows with unseen categorical levels are returned as NA and are excluded from distance summaries via complete.cases().
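The NA handling described above can be illustrated with a toy table. The data frame pred and its values are made up for this sketch; only complete.cases() is taken from the text.

```r
# Toy prediction table: the second row stands in for an evaluation point with
# an unseen categorical level, so its prediction and standard error are NA.
pred <- data.frame(
  fit    = c(1.2, NA, 0.8, 1.5),
  se.fit = c(0.10, NA, 0.20, 0.15)
)

keep <- complete.cases(pred)             # FALSE for rows containing any NA
pred_used <- pred[keep, , drop = FALSE]  # rows that enter the summaries
nrow(pred_used)
```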

Value

A list of class svem_significance_test containing:

  • p_value: median p-value across evaluation points.

  • p_values: vector of per-point p-values.

  • d_Y: distances for original fits.

  • d_pi_Y: distances for permutation fits.

  • distribution_fit: fitted SHASHo distribution object.

  • data_d: data frame combining distances and labels.

See Also

SVEMnet(), predict.svem_model()


SVEMnet documentation built on Sept. 9, 2025, 5:38 p.m.