spar: Sparse Projected Averaged Regression

spar	R Documentation

Sparse Projected Averaged Regression

Description

Apply Sparse Projected Averaged Regression (SPAR) to high-dimensional data by building an ensemble of generalized linear models. In each model, the high-dimensional predictors can first be screened using a screening coefficient and are then projected to a lower dimension using data-agnostic or data-informed random projection matrices. The function performs the procedure for a given grid of thresholds \nu and a grid of numbers of marginal models to be employed in the ensemble. It is also used in the cross-validated procedure spar.cv.
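To make the steps in the description concrete, here is a minimal base-R sketch of the general idea (screen, project, fit, map back, threshold, average) for a Gaussian response. It is an illustration under simplifying assumptions, not the code used by spar(): the names n_keep, m, fit_one and beta_avg are hypothetical, the screening rule (sampling predictors with probability proportional to the absolute marginal coefficient) and the fixed reduced dimension are assumptions, and the projection is a simple one-nonzero-per-column matrix filled with the screening coefficients.

```r
# Simplified illustration of the SPAR idea; NOT the spareg implementation.
set.seed(1)
n <- 100; p <- 300      # observations and predictors
m <- 20                 # reduced dimension (assumed fixed here)
n_keep <- 2 * n         # number of predictors kept after screening (assumption)

x <- matrix(rnorm(n * p), n, p)
beta_true <- c(rep(2, 5), rep(0, p - 5))
y <- drop(x %*% beta_true + rnorm(n))

xs <- scale(x)                          # standardized predictors
yc <- y - mean(y)                       # centered response
scr <- drop(crossprod(xs, yc)) / n      # marginal coefficients used for screening

fit_one <- function() {
  # Screening: sample predictors with probability proportional to |scr|
  keep <- sort(sample.int(p, n_keep, prob = abs(scr)))
  # Projection: assign each kept predictor to one of m groups; the single
  # nonzero entry per column is the (data-informed) screening coefficient
  grp <- sample.int(m, n_keep, replace = TRUE)
  phi <- matrix(0, m, n_keep)
  phi[cbind(grp, seq_len(n_keep))] <- scr[keep]
  z <- xs[, keep] %*% t(phi)            # n x m reduced predictors
  gamma <- coef(lm(yc ~ z - 1))         # marginal model in the reduced space
  gamma[is.na(gamma)] <- 0              # guard against empty groups
  beta <- numeric(p)
  beta[keep] <- drop(crossprod(phi, gamma))  # map back to the p predictors
  beta
}

nummod <- 10
betas <- replicate(nummod, fit_one())   # p x nummod matrix of coefficients
nu <- 0.05
betas[abs(betas) < nu] <- 0             # threshold small coefficients at nu
beta_avg <- rowMeans(betas)             # average over the ensemble
```

In spar() the threshold \nu and the number of models are not fixed as above but chosen from the grids nus and nummods using the validation measure.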

Usage

spar(
  x,
  y,
  family = gaussian("identity"),
  model = NULL,
  rp = NULL,
  screencoef = NULL,
  xval = NULL,
  yval = NULL,
  nnu = 20,
  nus = NULL,
  nummods = c(20),
  measure = c("deviance", "mse", "mae", "class", "1-auc"),
  avg_type = c("link", "response"),
  parallel = FALSE,
  inds = NULL,
  RPMs = NULL,
  seed = NULL,
  ...
)

spareg(
  x,
  y,
  family = gaussian("identity"),
  model = NULL,
  rp = NULL,
  screencoef = NULL,
  xval = NULL,
  yval = NULL,
  nnu = 20,
  nus = NULL,
  nummods = c(20),
  measure = c("deviance", "mse", "mae", "class", "1-auc"),
  avg_type = c("link", "response"),
  parallel = FALSE,
  inds = NULL,
  RPMs = NULL,
  seed = NULL,
  ...
)

Arguments

`x`	n x p numeric matrix of predictor variables.
`y`	quantitative response vector of length n.
`family`	a family object used for the marginal generalized linear model, default `gaussian("identity")`.
`model`	function creating a `'sparmodel'` object; defaults to `spar_glm()` for gaussian family with identity link and to `spar_glmnet()` for all other family-link combinations.
`rp`	function creating a `'randomprojection'` object. Defaults to NULL, in which case `rp_cw(data = TRUE)` is used.
`screencoef`	function creating a `'screeningcoef'` object. Defaults to NULL, in which case no screening is used.
`xval`	optional matrix of predictor variable observations used for validation of threshold nu and number of models; `x` is used if not provided.
`yval`	optional response observations used for validation of threshold nu and number of models; `y` is used if not provided.
`nnu`	number of different threshold values `\nu` to consider for thresholding; ignored when `nus` is given; defaults to 20.
`nus`	optional vector of `\nu`'s to consider for thresholding; if not provided, `nnu` values ranging from 0 to the maximum absolute marginal coefficient are used.
`nummods`	vector of numbers of marginal models to consider for validation; defaults to `c(20)`.
`measure`	loss to use for validation; defaults to `"deviance"`, which is available for all families. Other options are `"mse"` and `"mae"` (between responses and predicted means, for all families), as well as `"class"` (misclassification error) and `"1-auc"` (one minus area under the ROC curve), both available only for the binomial family.
`avg_type`	type of averaging the marginal models; either on link (default) or on response level. This is used in computing the validation measure.
`parallel`	assuming a parallel backend is loaded and available, a logical indicating whether the function should use it in parallelizing the estimation of the marginal models. Defaults to FALSE.
`inds`	optional list of length `max(nummods)` containing index vectors of the variables kept after screening in each marginal model; dimensions need to fit those of `RPMs`.
`RPMs`	optional list of length `max(nummods)` containing the projection matrices used in each marginal model; diagonal elements will be overwritten with a coefficient depending only on the given `x` and `y`.
`seed`	integer seed to be set at the beginning of the SPAR algorithm. Defaults to NULL, in which case no seed is set.
`...`	further arguments, mainly to ensure backward compatibility.
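As an illustration of how family, measure and avg_type interact, the following sketch fits a binomial model and validates by misclassification error. It uses only arguments documented above; the simulated data are made up for the example, and it assumes the default model for non-Gaussian families (spar_glmnet()) and its dependencies are installed. Because xval and yval are not supplied, validation uses the training data.

```r
library(spareg)

set.seed(123)
n <- 100; p <- 200
x <- matrix(rnorm(n * p), n, p)
# Binary response driven by the first three predictors (illustrative data)
eta <- drop(x[, 1:3] %*% c(1.5, -1.5, 1))
y <- rbinom(n, size = 1, prob = plogis(eta))

fit <- spar(
  x, y,
  family = binomial("logit"),
  nummods = c(5, 10),       # grid of ensemble sizes to validate
  nnu = 10,                 # number of thresholds in the nu grid
  measure = "class",        # misclassification error (binomial only)
  avg_type = "response",    # average the models on the response scale
  seed = 123
)
fit$val_res                 # validation results over the nu/nummods grid
```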

Value

An object of class 'spar' with elements:

`betas`	p x max(nummods) sparse matrix of class 'Matrix::dgCMatrix' containing the standardized coefficients from each marginal model.
`intercepts`	intercepts used in each marginal model.
`scr_coef`	vector of length p with coefficients used for screening the standardized predictors.
`inds`	list of length max(nummods) containing index vectors of the variables kept after screening in each marginal model.
`RPMs`	list of length max(nummods) containing the projection matrices used in each marginal model.
`val_res`	data.frame with validation results (validation measure and number of active variables) for each element of nus and nummods.
`val_set`	logical flag indicating whether validation data were provided; if FALSE, training data were used for validation.
`family`	character string describing the family object used for the marginal generalized linear model, e.g., "gaussian(identity)".
`nus`	vector of \nu's considered for thresholding.
`nummods`	vector of numbers of marginal models considered for validation.
`ycenter`	empirical mean of the initial response vector.
`yscale`	empirical standard deviation of the initial response vector.
`xcenter`	p-vector of empirical means of the initial predictor variables.
`xscale`	p-vector of empirical standard deviations of the initial predictor variables.
`avg_type`	character, averaging type used for computing the validation measure.
`measure`	character, type of validation measure used.
`rp`	an object of class "randomprojection".
`screencoef`	an object of class "screeningcoef".
`x_rows_for_fitting_marginal_models`	vector of row indicators of x used for fitting the marginal models when screening was performed using screencoef with the split_data_prop argument; NULL otherwise.
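The elements listed above can be inspected directly on the fitted object. The following short sketch reuses the simulated data from the Examples section; the comments restate what the Value section says about each element and do not show actual output.

```r
library(spareg)

example_data <- simulate_spareg_data(n = 200, p = 400, ntest = 100)
spar_res <- spar(example_data$x, example_data$y,
  xval = example_data$xtest, yval = example_data$ytest,
  nummods = c(5, 10))

dim(spar_res$betas)      # p x max(nummods) sparse coefficient matrix
head(spar_res$val_res)   # validation measure and number of active variables
spar_res$val_set         # TRUE here, since xval and yval were supplied
length(spar_res$inds)    # one index vector per marginal model
length(spar_res$RPMs)    # one projection matrix per marginal model
```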

If a parallel backend is registered and parallel = TRUE, the foreach function is used to estimate the marginal models in parallel.
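A possible way to register a backend before calling spar() is sketched below. The choice of the doParallel package and of two workers is an assumption made for illustration; any backend that works with foreach should serve the same purpose.

```r
library(spareg)
library(doParallel)   # assumed backend; other foreach backends may be used

# Register two workers for foreach (worker count chosen for illustration)
registerDoParallel(cores = 2)

example_data <- simulate_spareg_data(n = 200, p = 400, ntest = 100)
spar_par <- spar(example_data$x, example_data$y,
  xval = example_data$xtest, yval = example_data$ytest,
  nummods = c(10, 20), parallel = TRUE)

# Release the implicitly created workers when done
stopImplicitCluster()
```

With parallel = FALSE (the default), the same call estimates the marginal models sequentially.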

Examples

# Simulate training data (n = 200, p = 400) and 100 test observations
example_data <- simulate_spareg_data(n = 200, p = 400, ntest = 100)

# Fit SPAR; nu and the number of models are validated on the test data
spar_res <- spar(example_data$x, example_data$y, xval = example_data$xtest,
  yval = example_data$ytest, nummods = c(5, 10, 15, 20, 25, 30))

# Extract coefficients and predict for the training predictors
coefs <- coef(spar_res)
pred <- predict(spar_res, xnew = example_data$x)

# Validation measure and number of active variables along nummod or nu
plot(spar_res)
plot(spar_res, plot_type = "val_measure", plot_along = "nummod", nu = 0)
plot(spar_res, plot_type = "val_measure", plot_along = "nu", nummod = 10)
plot(spar_res, plot_type = "val_numactive", plot_along = "nummod", nu = 0)
plot(spar_res, plot_type = "val_numactive", plot_along = "nu", nummod = 10)

# Residuals vs. fitted values on the test data, and a coefficient plot
plot(spar_res, plot_type = "res_vs_fitted", xfit = example_data$xtest,
  yfit = example_data$ytest)
plot(spar_res, plot_type = "coefs", prange = c(1, 400))

# spareg() takes the same arguments as spar()
spar_res <- spareg(example_data$x, example_data$y, xval = example_data$xtest,
  yval = example_data$ytest, nummods = c(5, 10, 15, 20, 25, 30))

spareg documentation built on Aug. 8, 2025, 6:46 p.m.