cv.hfr: Cross validation for a hierarchical feature regression

Description

HFR (hierarchical feature regression) is a regularized regression estimator that decomposes a least squares regression along a supervised hierarchical graph and shrinks the edges of the estimated graph to regularize the parameters. This produces group shrinkage in the regression parameters and reduces the effective degrees of freedom of the model. cv.hfr fits HFR over a grid of kappa values and selects kappa by cross validation.

Usage

cv.hfr(
  x,
  y,
  weights = NULL,
  kappa = seq(0, 1, by = 0.1),
  q = NULL,
  intercept = TRUE,
  standardize = TRUE,
  nfolds = 10,
  foldid = NULL,
  partial_method = c("pairwise", "shrinkage"),
  ridge_lambda = 0,
  ...
)

Arguments

x

Input matrix or data.frame, of dimension (N x p); each row is an observation vector.

y

Response variable: a numeric vector of length N.

weights

An optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used for the level-specific regressions.
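For illustration, a minimal base-R sketch of what weighted least squares means for a single regression: the closed form (X'WX)^{-1}X'Wy agrees with lm(..., weights = w). The simulated data and the weights w are hypothetical, and this is not the hfr code for the level-specific regressions.

```r
# Illustrative weighted least squares for one regression (base R only;
# not the hfr internals). Data and weights are simulated/hypothetical.
set.seed(1)
n <- 50
X <- cbind(1, matrix(rnorm(n * 3), n, 3))     # design matrix with intercept column
y <- drop(X %*% c(0.5, 1, -2, 0)) + rnorm(n)
w <- runif(n, 0.5, 2)                         # hypothetical observation weights

# Closed form: beta = (X' W X)^{-1} X' W y, with W = diag(w)
beta_wls <- solve(crossprod(X, w * X), crossprod(X, w * y))

# The same estimate through lm() and its `weights` argument
beta_lm <- coef(lm(y ~ X - 1, weights = w))

print(cbind(closed_form = drop(beta_wls), lm = unname(beta_lm)))
```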

kappa

A vector of target effective degrees of freedom of the regression.
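Effective degrees of freedom are commonly defined as the trace of the hat matrix. The sketch below illustrates that quantity with ridge regression as a stand-in estimator, not HFR. Reading kappa as a share of the number of predictors (suggested by the default grid spanning 0 to 1) is an assumption, shown here only through the hypothetical edf_share row.

```r
# Effective degrees of freedom = trace of the hat matrix, illustrated with
# ridge regression (a stand-in, not the HFR estimator).
set.seed(2)
n <- 40; p <- 5
Xmat <- scale(matrix(rnorm(n * p), n, p))

ridge_edf <- function(Xmat, lambda) {
  d <- svd(Xmat)$d
  sum(d^2 / (d^2 + lambda))    # equals trace of X (X'X + lambda I)^{-1} X'
}

lambdas <- c(0, 1, 10, 100, 1000)
edf <- vapply(lambdas, function(l) ridge_edf(Xmat, l), numeric(1))
# edf_share (edf / p) is a hypothetical analogue of a kappa-like proportion
print(round(rbind(lambda = lambdas, edf = edf, edf_share = edf / p), 3))
```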

q

Thinning parameter representing the quantile cut-off (in terms of contributed variance) above which to consider levels in the hierarchy. This can be used to reduce the number of levels in high-dimensional problems. Default is q = NULL (no thinning).
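A hypothetical sketch of quantile thinning, assuming each level has a single "contributed variance" value and only levels above the q-quantile of those values are kept. The helper thin_levels and the level_variance numbers are made up for illustration; how hfr computes the contributions is not reproduced.

```r
# Hypothetical quantile thinning: keep levels whose contributed variance
# lies above the q-quantile of all contributions. `thin_levels` and
# `level_variance` are illustrative only, not part of hfr.
thin_levels <- function(level_variance, q = NULL) {
  if (is.null(q)) return(seq_along(level_variance))   # default: no thinning
  cutoff <- quantile(level_variance, probs = q, names = FALSE)
  which(level_variance > cutoff)
}

level_variance <- c(0.40, 0.25, 0.12, 0.08, 0.06, 0.04, 0.03, 0.02)
thin_levels(level_variance)            # all levels kept
thin_levels(level_variance, q = 0.5)   # only levels above the median contribution
```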

intercept

Should an intercept be fitted? Default is intercept=TRUE.

standardize

Logical flag for x variable standardization prior to fitting the model. The coefficients are always returned on the original scale. Default is standardize=TRUE.
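Returning coefficients on the original scale amounts to undoing the centering and scaling after the fit. A minimal sketch with ordinary least squares via lm() in place of HFR; the exact scaling convention hfr uses internally is an assumption here (sample standard deviation).

```r
# Fit on standardized predictors, then map coefficients back to the
# original scale (ordinary least squares via lm(), not HFR).
set.seed(3)
n <- 60
x <- cbind(a = rnorm(n, 10, 2), b = rnorm(n, -5, 0.5))
y <- 3 + 1.5 * x[, "a"] - 4 * x[, "b"] + rnorm(n)

mu <- colMeans(x)
s  <- apply(x, 2, sd)                          # assumed scaling: sample sd
xs <- scale(x, center = mu, scale = s)         # standardized predictors

b_std     <- coef(lm(y ~ xs))                  # intercept + standardized slopes
slopes    <- b_std[-1] / s                     # undo the scaling
intercept <- b_std[1] - sum(slopes * mu)       # undo the centering

b_orig <- coef(lm(y ~ x))                      # direct fit for comparison
print(rbind(back_transformed = unname(c(intercept, slopes)), direct = unname(b_orig)))
```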

nfolds

The number of folds for k-fold cross validation. Default is nfolds=10.

foldid

An optional vector of values between 1 and nfolds identifying what fold each observation is in. If supplied, nfolds can be missing.
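A minimal sketch of building a balanced, shuffled fold assignment in base R that could be supplied through foldid. The helper make_foldid is hypothetical and not part of hfr.

```r
# Balanced, shuffled fold assignment (values 1..nfolds) for `foldid`.
# `make_foldid` is a hypothetical helper, not part of hfr.
make_foldid <- function(n, nfolds = 10) {
  stopifnot(nfolds >= 2, nfolds <= n)
  sample(rep(seq_len(nfolds), length.out = n))
}

set.seed(4)
foldid <- make_foldid(100, nfolds = 5)
table(foldid)   # 20 observations per fold
# e.g. cv.hfr(x, y, foldid = foldid)  (x, y as in the Examples section)
```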

partial_method

Indicates whether to use pairwise partial correlations or shrinkage partial correlations.
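For orientation, a sketch of partial correlations computed from the inverse of a correlation matrix, plus a simple linear shrinkage of that matrix toward the identity before inversion. The fixed shrinkage weight lambda and the helper names are hypothetical, and the estimators behind hfr's "pairwise" and "shrinkage" options may differ from both.

```r
# Partial correlations from the precision (inverse correlation) matrix,
# with an optional linear shrinkage toward the identity. Helper names and
# the fixed weight `lambda` are hypothetical, not hfr's estimators.
partial_cor <- function(R) {
  P  <- solve(R)                               # precision matrix
  pc <- -P / sqrt(outer(diag(P), diag(P)))
  diag(pc) <- 1
  pc
}

shrink_cor <- function(R, lambda) {
  (1 - lambda) * R + lambda * diag(nrow(R))    # shrink toward identity
}

set.seed(5)
n <- 200
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n), x2 = z + rnorm(n), x3 = z)
R <- cor(X)

pc_plain     <- partial_cor(R)                         # no shrinkage
pc_shrinkage <- partial_cor(shrink_cor(R, lambda = 0.2))
round(pc_plain, 3)
round(pc_shrinkage, 3)
```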

ridge_lambda

Optional penalty for the level-specific regressions (useful in the high-dimensional case). Default is ridge_lambda=0 (no penalty).
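Why a penalty helps in the high-dimensional case, assuming ridge_lambda acts as an L2 (ridge) penalty as its name suggests: with p > N the matrix X'X is singular, so plain least squares has no unique solution, while X'X + lambda I is invertible for any lambda > 0. The helper ridge_fit is hypothetical.

```r
# With p > n, X'X is singular; adding lambda * I makes the system solvable.
# `ridge_fit` is a hypothetical helper, not hfr code.
set.seed(6)
n <- 20; p <- 50
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

XtX <- crossprod(X)
qr(XtX)$rank                          # at most n = 20 < p = 50: singular

ridge_fit <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}
beta <- ridge_fit(X, y, lambda = 0.1) # well-defined for any lambda > 0
dim(beta)
```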

...

Additional arguments passed to hclust.
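As an example of the kind of argument that ... would forward, stats::hclust accepts a method argument (default "complete"). The sketch clusters predictors on an illustrative correlation distance; the distance hfr constructs internally is not reproduced, and whether a particular hclust argument is honored by cv.hfr is an assumption.

```r
# hclust's `method` argument, shown on an illustrative correlation
# distance between predictors (not the distance hfr builds internally).
set.seed(7)
x <- matrix(rnorm(100 * 6), 100, 6)
d <- as.dist(1 - abs(cor(x)))

tree_complete <- hclust(d)                     # default method = "complete"
tree_average  <- hclust(d, method = "average")
cutree(tree_average, k = 3)                    # group labels for the 6 predictors
```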

Details

This function fits an HFR for a grid of kappa hyperparameter values. The result is a matrix of coefficients with one column for each kappa value. Because the level-specific regressions are estimated only once and reused for every kappa value, evaluating the whole grid in a single call makes the cross validation substantially faster.
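The same computational idea (do the expensive step once, then evaluate every grid value cheaply) can be sketched with ridge regression as a stand-in for HFR: the SVD of X is computed once, and the output is a coefficient matrix with one column per hyperparameter value. The helper ridge_path is hypothetical.

```r
# Ridge regression as a stand-in for HFR: one SVD, reused for the whole
# grid; returns one coefficient column per lambda. `ridge_path` is hypothetical.
ridge_path <- function(X, y, lambdas) {
  s   <- svd(X)                                # computed only once
  uty <- crossprod(s$u, y)
  B <- sapply(lambdas, function(l) s$v %*% (s$d / (s$d^2 + l) * uty))
  colnames(B) <- paste0("lambda=", lambdas)
  B
}

set.seed(8)
X <- matrix(rnorm(50 * 4), 50, 4)
y <- X %*% c(1, 0, -1, 2) + rnorm(50)
B <- ridge_path(X, y, lambdas = c(0, 1, 10))
print(round(B, 3))
```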

When nfolds > 1, cross validation is performed on randomly shuffled folds. Alternatively, the fold assignment can be passed to the function through the foldid argument. The kappa value selected by cross validation is returned as best_kappa in the output object.
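A minimal sketch of k-fold cross validation over a hyperparameter grid, again with ridge regression standing in for HFR. The returned element best plays a role analogous to best_kappa; cv_ridge and all other names are hypothetical.

```r
# k-fold cross validation over a grid (ridge regression as a stand-in for
# HFR). `cv_ridge` and its outputs are hypothetical illustrations.
cv_ridge <- function(X, y, lambdas, nfolds = 5, foldid = NULL) {
  n <- nrow(X)
  if (is.null(foldid)) foldid <- sample(rep(seq_len(nfolds), length.out = n))
  nfolds <- max(foldid)
  mse <- matrix(NA_real_, nfolds, length(lambdas))
  for (k in seq_len(nfolds)) {
    test <- foldid == k
    Xtr <- X[!test, , drop = FALSE]
    ytr <- y[!test]
    for (j in seq_along(lambdas)) {
      beta <- solve(crossprod(Xtr) + lambdas[j] * diag(ncol(X)), crossprod(Xtr, ytr))
      pred <- X[test, , drop = FALSE] %*% beta
      mse[k, j] <- mean((y[test] - pred)^2)    # held-out error for fold k
    }
  }
  cv_mse <- colMeans(mse)
  list(cv_mse = cv_mse, best = lambdas[which.min(cv_mse)])
}

set.seed(9)
X <- matrix(rnorm(100 * 10), 100, 10)
y <- drop(X[, 1:3] %*% c(2, -1, 1)) + rnorm(100)
res <- cv_ridge(X, y, lambdas = c(0, 0.1, 1, 10, 100))
res$best
```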

Value

A 'cv.hfr' regression object.

Author(s)

Johann Pfitzinger

References

Pfitzinger, J. (2022). Cluster Regularization via a Hierarchical Feature Regression. arXiv:2107.04831 [stat.ML].

See Also

hfr, coef, plot and predict methods

Examples

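library(hfr)
set.seed(123)                               # reproducible simulated data
x = matrix(rnorm(100 * 20), 100, 20)        # 100 observations, 20 predictors
y = rnorm(100)
fit = cv.hfr(x, y, kappa = seq(0, 1, by = 0.1))
fit$best_kappa                              # kappa selected by cross validation
coef(fit)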


hfr documentation built on Jan. 22, 2023, 1:46 a.m.