| cv.savvyPR | R Documentation |
Performs k-fold cross-validation for Parity Regression (PR) models to select optimal tuning parameters. The underlying PR methodology distributes the total prediction error evenly across all parameters, ensuring stability in the presence of high multicollinearity and substantial noise (such as time series data with structural changes and evolving trends). This function supports both Budget-based and Target-based parameterizations and evaluates models across a variety of loss metrics.
cv.savvyPR(
x,
y,
method = c("budget", "target"),
vals = NULL,
nval = 100,
lambda_vals = NULL,
nlambda = 100,
folds = 10,
model_type = c("PR3", "PR1", "PR2"),
measure_type = c("mse", "mae", "rmse", "mape"),
foldid = FALSE,
use_feature_selection = FALSE,
standardize = FALSE,
intercept = TRUE,
exclude = NULL
)
x |
A matrix of predictors with rows as observations and columns as variables. Must not contain missing values. |
y |
A numeric vector of the response variable; should have the same number of observations as the rows of x. |
method |
Character string specifying the parameterization method to use: "budget" or "target". Defaults to "budget". |
vals |
Optional; a numeric vector of values for tuning the PR model (acts as the budget or target parameter, depending on method). If NULL, a sequence of nval values is generated. |
nval |
Numeric value specifying the number of tuning values to try in the optimization process if vals is NULL. Defaults to 100. |
lambda_vals |
Optional; a numeric vector of \lambda (ridge) values to use in the cross-validation. If NULL, a sequence of nlambda values is generated. |
nlambda |
Numeric value specifying the number of \lambda values to try if lambda_vals is NULL. Defaults to 100. |
folds |
The number of folds to be used in the cross-validation; default is 10. |
model_type |
Character string specifying the type of model to fit. Defaults to "PR3"; alternatives are "PR1" and "PR2". See Details for how each handles the tuning parameters. |
measure_type |
Character vector specifying the measure to use for model evaluation. Defaults to "mse"; alternatives are "mae", "rmse", and "mape". |
foldid |
Logical indicating whether to return fold assignments. Defaults to FALSE. |
use_feature_selection |
Logical indicating whether to perform feature selection during the model fitting process. Defaults to FALSE. |
standardize |
Logical indicating whether to standardize predictor variables. Defaults to FALSE. |
intercept |
Logical indicating whether to include an intercept in the model. Defaults to TRUE. |
exclude |
Optional; indices of variables to be excluded from the model fitting process. Defaults to NULL (no exclusions). |
Cross-Validation for Parity Regression Model Estimation
This function facilitates cross-validation for parity regression models across a range
of tuning values (val) and regularization values (\lambda), depending
on the model type specified. Each model type handles the parameters differently:
PR1: Performs cross-validation only over the val sequence while
fixing \lambda = 0. This model type is primarily used when the focus is on
understanding how different levels of the risk parity constraint impact
model performance purely through the parity mechanism, without the influence
of ridge \lambda shrinkage.
PR2: Uses a fixed \lambda value determined by performing a ridge
regression (\lambda optimization) with cv.glmnet
on the dataset, then performs cross-validation over the val sequence
using this optimized \lambda. This approach is useful when
one wishes to maintain a stable amount of standard shrinkage while exploring
the impact of varying levels of the proportional contribution constraint.
PR3: First determines an optimal val using the same method as PR1.
Then, keeping this val fixed, it conducts cross-validation over all
candidate \lambda values. This dual-stage optimization can be particularly
effective when the initial parity regularization needs further refinement
via \lambda adjustment.
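As a rough illustration of the ridge step described above (a sketch only, not the package's internal implementation), the fixed \lambda used by PR2 can be obtained with cv.glmnet before the parity tuning value is cross-validated:

```r
# Sketch of the ridge-lambda step described above; savvyPR's actual
# internals may differ. In glmnet, alpha = 0 selects ridge regression.
library(glmnet)

set.seed(1)
x <- matrix(rnorm(50 * 5), 50, 5)
y <- rnorm(50)

ridge_cv <- cv.glmnet(x, y, alpha = 0)
fixed_lambda <- ridge_cv$lambda.min  # held fixed while val is cross-validated
```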
The function supports several types of loss metrics for assessing model performance:
Mean Squared Error: Measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values.
Mean Absolute Error: Measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average, over the test sample, of the absolute differences between prediction and actual observation, where all individual differences have equal weight.
Root Mean Squared Error: It is the square root of the mean of the squared
errors. RMSE is a good measure of how accurately the model predicts the response,
and it is the most important criterion for fit if the main purpose of the model is prediction.
Mean Absolute Percentage Error: Measures the size of the error in percentage terms, calculated as the average of the unsigned percentage errors. Because it is based on relative errors, it is highly sensitive to deviations on observations with small true values.
The choice of measure impacts how the model's performance is assessed during cross-validation. Users should select the measure that best reflects the requirements of their specific analytical context.
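Using their standard definitions (the package's own loss computation lives in calcLoss and may differ in detail), the four measures can be reproduced by hand:

```r
# Standard definitions of the four loss measures, computed on toy values
actual <- c(2, 4, 6, 8)
pred   <- c(2.5, 3.5, 6.5, 7.0)
err    <- actual - pred

mse  <- mean(err^2)                    # Mean Squared Error: 0.4375
mae  <- mean(abs(err))                 # Mean Absolute Error: 0.625
rmse <- sqrt(mse)                      # Root Mean Squared Error: ~0.661
mape <- 100 * mean(abs(err / actual))  # Mean Absolute Percentage Error: ~14.58%
```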
A list of class "cv.savvyPR" containing the following components based on the specified model_type:
call |
The matched call used to invoke the function. |
coefficients |
The optimal coefficients results of the final fitted model. |
mean_error_cv |
A vector of computed error values across all tested parameters. |
model_type |
The type of PR model used: "PR1", "PR2", or "PR3". |
measure_type |
The loss measure used for evaluation, with a descriptive name. |
method |
The parameterization method used: "budget" or "target". |
PR_fit |
The final fitted model object from the underlying savvyPR fit. |
coefficients_cv |
A matrix of average coefficients across all cross-validation folds for each tuning parameter. |
vals |
The tuning values (acting as c or t) used in the cross-validation process. |
lambda_vals |
The \lambda values used in the cross-validation process. |
optimal_val |
The optimal tuning value found from cross-validation, applicable to PR1 and PR2. |
fixed_val |
The fixed tuning value used in PR3. |
optimal_lambda_val |
The optimal \lambda value found from cross-validation, applicable to PR3. |
fixed_lambda_val |
The fixed \lambda value used in PR2. |
optimal_index |
A list detailing the indices of the optimal parameters within the cross-validation matrix. |
fold_assignments |
(Optional) The fold assignments used during the cross-validation, provided if foldid = TRUE. |
Ziwei Chen, Vali Asimit and Pietro Millossovich
Maintainer: Ziwei Chen <ziwei.chen.3@citystgeorges.ac.uk>
Asimit, V., Chen, Z., Ichim, B., & Millossovich, P. (2026). Parity Regression Estimation. Retrieved from https://openaccess.city.ac.uk/id/eprint/37017/
The optimization technique employed follows the algorithm described by: F. Spinu (2013). An Algorithm for Computing Risk Parity Weights. SSRN Preprint. doi:10.2139/ssrn.2297383
savvyPR, glmnet, cv.glmnet,
calcLoss, getMeasureName, optimizeRiskParityBudget, optimizeRiskParityTarget
# Generate synthetic data
set.seed(123)
n <- 100 # Number of observations
p <- 12 # Number of variables
x <- matrix(rnorm(n * p), n, p)
beta <- matrix(rnorm(p), p, 1)
y <- x %*% beta + rnorm(n, sd = 0.5)
# Example 1: PR1 with "budget" method (focusing on c values with MSE)
result_pr1_budget <- cv.savvyPR(x, y, method = "budget", model_type = "PR1")
print(result_pr1_budget)
# Example 2: PR1 with "target" method
result_pr1_target <- cv.savvyPR(x, y, method = "target", model_type = "PR1")
print(result_pr1_target)
# Example 3: PR3 (default model_type) exploring budget parameter
result_pr3 <- cv.savvyPR(x, y, method = "budget", folds = 5)
print(result_pr3)
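The further examples below exercise additional documented arguments (lambda_vals, measure_type, foldid) on the same synthetic data defined above; the returned components follow the Value section.

```r
# Example 4: PR3 with a custom lambda grid and RMSE as the loss measure
result_pr3_rmse <- cv.savvyPR(x, y, method = "target", model_type = "PR3",
                              lambda_vals = 10^seq(-3, 1, length.out = 50),
                              measure_type = "rmse")
print(result_pr3_rmse)

# Example 5: retain the fold assignments used during cross-validation
result_folds <- cv.savvyPR(x, y, model_type = "PR1", foldid = TRUE)
result_folds$fold_assignments
```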