spar: Sparse Projected Averaged Regression

Description

Apply Sparse Projected Averaged Regression (SPAR) to high-dimensional data by building an ensemble of generalized linear models. In each marginal model, the high-dimensional predictors can first be screened using a screening coefficient and are then projected to a lower dimension using data-agnostic or data-informed random projection matrices. This function performs the procedure for a given grid of thresholds ν and a grid of numbers of marginal models in the ensemble. It is also used in the cross-validated procedure spar.cv.
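
The steps described above can be illustrated with the following self-contained base-R sketch for a Gaussian response. It is not the spareg implementation: the function name spar_sketch, the ridge-type screening coefficient, the screening size min(p, 2n), the projection dimension m = n/2 and the use of the screening coefficients as nonzero projection entries are all assumptions chosen for illustration.

```r
# Minimal base-R sketch of the SPAR idea for a Gaussian response.
# Assumptions (not taken from the spareg source): a ridge-type screening
# coefficient, probabilistic screening of min(p, 2n) variables, and a
# sparse projection whose single nonzero entry per column is the
# screening coefficient. Coefficients stay on the standardized scale.
spar_sketch <- function(x, y, nummod = 20, nu = 0,
                        m = floor(nrow(x) / 2), lambda = 1) {
  n <- nrow(x); p <- ncol(x)
  xs <- scale(x)                          # standardized predictors
  ys <- as.numeric(scale(y))              # standardized response
  # Ridge-type screening coefficient (p-vector)
  scr <- as.numeric(t(xs) %*% solve(tcrossprod(xs) + lambda * diag(n), ys))
  nscr <- min(p, 2 * n)                   # variables kept per marginal model
  betas <- matrix(0, p, nummod)
  for (k in seq_len(nummod)) {
    # Screening: sample variables with probability proportional to |scr|
    ind <- sample.int(p, nscr, prob = abs(scr))
    # Sparse projection: exactly one nonzero per column, in a random row
    phi <- matrix(0, m, nscr)
    phi[cbind(sample.int(m, nscr, replace = TRUE), seq_len(nscr))] <- scr[ind]
    z <- xs[, ind, drop = FALSE] %*% t(phi)      # n x m reduced predictors
    # Least-squares fit on the reduced predictors; all-zero rows of phi
    # give aliased columns of z, whose NA coefficients are set to 0
    coef_z <- lm.fit(cbind(1, z), ys)$coefficients[-1]
    coef_z[is.na(coef_z)] <- 0
    beta_k <- numeric(p)
    beta_k[ind] <- as.numeric(t(phi) %*% coef_z) # back to screened variables
    beta_k[abs(beta_k) < nu] <- 0                # threshold at nu
    betas[, k] <- beta_k
  }
  rowMeans(betas)                                # average the marginal models
}

set.seed(42)
n <- 50; p <- 200
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1:5] %*% rep(2, 5) + rnorm(n)
beta_hat <- spar_sketch(x, y, nummod = 10)
```

Thresholding is applied to each marginal model before averaging, so a larger nu yields a sparser averaged coefficient vector.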

Usage

spar(
  x,
  y,
  family = gaussian("identity"),
  model = NULL,
  rp = NULL,
  screencoef = NULL,
  xval = NULL,
  yval = NULL,
  nnu = 20,
  nus = NULL,
  nummods = c(20),
  measure = c("deviance", "mse", "mae", "class", "1-auc"),
  avg_type = c("link", "response"),
  parallel = FALSE,
  inds = NULL,
  RPMs = NULL,
  seed = NULL,
  ...
)

spareg(
  x,
  y,
  family = gaussian("identity"),
  model = NULL,
  rp = NULL,
  screencoef = NULL,
  xval = NULL,
  yval = NULL,
  nnu = 20,
  nus = NULL,
  nummods = c(20),
  measure = c("deviance", "mse", "mae", "class", "1-auc"),
  avg_type = c("link", "response"),
  parallel = FALSE,
  inds = NULL,
  RPMs = NULL,
  seed = NULL,
  ...
)

Arguments

x

n x p numeric matrix of predictor variables.

y

quantitative response vector of length n.

family

a family object used for the marginal generalized linear models; defaults to gaussian("identity").

model

function creating a 'sparmodel' object; defaults to spar_glm() for gaussian family with identity link and to spar_glmnet() for all other family-link combinations.

rp

function creating a 'randomprojection' object. Defaults to NULL, in which case rp_cw(data = TRUE) is used.
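
The name rp_cw and the reference to Clarkson and Woodruff (2013) suggest a sparse embedding matrix with exactly one nonzero entry per column. The base-R sketch below builds such a matrix; the helper cw_matrix is hypothetical, and the data-informed variant (replacing the random ±1 entries with a data-driven coefficient per column) is an assumption about what data = TRUE means, not a description of rp_cw() itself.

```r
# Hypothetical helper: sparse Clarkson-Woodruff-type m x p matrix.
# data-agnostic (coef = NULL): each column has one random +1/-1 entry.
# data-informed (assumption): the nonzero entry of column j is coef[j].
cw_matrix <- function(m, p, coef = NULL) {
  rows <- sample.int(m, p, replace = TRUE)            # one row per column
  vals <- if (is.null(coef)) sample(c(-1, 1), p, replace = TRUE) else coef
  phi <- matrix(0, m, p)
  phi[cbind(rows, seq_len(p))] <- vals
  phi
}

set.seed(1)
phi_agnostic <- cw_matrix(m = 3, p = 8)
phi_informed <- cw_matrix(m = 3, p = 8, coef = seq(0.1, 0.8, by = 0.1))
colSums(phi_agnostic != 0)  # 1 1 1 1 1 1 1 1
```

Because every column has a single nonzero entry, multiplying a data matrix by t(phi) only sums (weighted) columns, which keeps the projection cheap for large p.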

screencoef

function creating a 'screeningcoef' object. Defaults to NULL, in which case no screening is used.

xval

optional matrix of observations of the predictor variables used for validating the threshold ν and the number of models; x is used if not provided.

yval

optional vector of response observations used for validating the threshold ν and the number of models; y is used if not provided.

nnu

number of different threshold values ν to consider for thresholding; ignored when nus is provided; defaults to 20.

nus

optional vector of values of ν to consider for thresholding; if not provided, nnu values ranging from 0 to the maximum absolute marginal coefficient are used.
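
A default grid of this kind, and its effect on sparsity, can be sketched in base R as follows. The helpers nu_grid and threshold_average are hypothetical, and the even spacing between 0 and the maximum is an assumption: spar() may place the nnu values differently (for example at quantiles).

```r
# Hypothetical helpers: a grid of nnu thresholds from 0 up to the largest
# absolute marginal coefficient (even spacing assumed), and the averaged
# coefficients after hard-thresholding each marginal model at nu.
nu_grid <- function(betas, nnu = 20) {
  seq(0, max(abs(betas)), length.out = nnu)
}
threshold_average <- function(betas, nu) {
  b <- betas
  b[abs(b) < nu] <- 0       # threshold each marginal model
  rowMeans(b)               # average over the marginal models
}

betas <- cbind(c(0.9, -0.05, 0.3), c(0.7, 0.02, -0.4))  # p = 3, 2 models
nus <- nu_grid(betas, nnu = 5)                           # 0 0.225 0.45 0.675 0.9
active <- sapply(nus, function(nu) sum(threshold_average(betas, nu) != 0))
active                                                   # 3 2 1 1 1
```

The number of active variables decreases as ν grows, which is the trade-off recorded in val_res.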

nummods

vector of numbers of marginal models to consider for validation; defaults to c(20).
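
One plausible way to validate several ensemble sizes at once is to average only the first k marginal models for each k in nummods, which is consistent with betas having max(nummods) columns. The sketch below assumes exactly that and uses the validation MSE; the helper val_by_nummod is hypothetical and ignores intercepts and thresholding.

```r
# Hypothetical helper: validation MSE of the ensemble built from the first
# k marginal models, for each k in nummods. betas is p x max(nummods).
val_by_nummod <- function(betas, xval, yval, nummods) {
  sapply(nummods, function(k) {
    b <- rowMeans(betas[, seq_len(k), drop = FALSE])  # average first k models
    mean((yval - xval %*% b)^2)                       # validation MSE
  })
}

betas <- cbind(c(1, 0), c(0, 1), c(1, 1))   # p = 2, three marginal models
res <- val_by_nummod(betas, xval = diag(2), yval = c(1, 1), nummods = 1:3)
res                                         # 0.5, 0.25, 1/9
```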

measure

loss to use for validation; defaults to "deviance", which is available for all families. Other options are "mse" or "mae" (computed between responses and predicted means, for all families), as well as "class" (misclassification error) and "1-auc" (one minus the area under the ROC curve), both only for the binomial family.
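
For a binomial response the listed losses can be illustrated with the base-R sketch below, where y holds 0/1 labels and mu the predicted probabilities. These are illustrative re-implementations, not spareg's internal code; the 0.5 cut-off in val_class and summing (rather than averaging) the deviance residuals are assumptions.

```r
# Illustrative validation losses for a binomial response.
val_deviance <- function(y, mu, family = binomial()) {
  sum(family$dev.resids(y, mu, rep(1, length(y))))  # summed deviance
}
val_mse   <- function(y, mu) mean((y - mu)^2)
val_mae   <- function(y, mu) mean(abs(y - mu))
val_class <- function(y, mu) mean((mu > 0.5) != y)  # 0.5 cut-off assumed
val_1auc  <- function(y, mu) {
  # AUC via the Mann-Whitney rank statistic
  r  <- rank(mu)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  1 - auc
}

y  <- c(0, 0, 1, 1, 1)
mu <- c(0.2, 0.6, 0.7, 0.4, 0.9)
val_mse(y, mu)    # 0.172
val_mae(y, mu)    # 0.36
val_class(y, mu)  # 0.4
val_1auc(y, mu)   # 1/6, about 0.167
```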

avg_type

level at which the marginal models are averaged: either the link level (default) or the response level. This is used in computing the validation measure.
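
For a non-identity link the two averaging levels give different predictions. The base-R illustration below assumes a binomial family with logit link and three hypothetical linear predictors from three marginal models; for gaussian("identity") both levels coincide.

```r
# Averaging three marginal models' linear predictors eta (logit link).
fam <- binomial()
eta <- c(-2, 0.5, 3)                    # linear predictors of 3 marginal models
p_link     <- fam$linkinv(mean(eta))    # average on link level, then invert
p_response <- mean(fam$linkinv(eta))    # invert each model, then average
c(p_link, p_response)                   # about 0.622 and 0.565
```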

parallel

logical indicating whether the estimation of the marginal models should be parallelized, assuming a parallel backend is registered and available. Defaults to FALSE.

inds

optional list of length max(nummods) of index vectors giving the variables kept after screening in each marginal model; their dimensions must fit those of RPMs.

RPMs

optional list of length max(nummods) of projection matrices used in each marginal model; diagonal elements will be overwritten with a coefficient depending only on the given x and y.

seed

integer seed set at the beginning of the SPAR algorithm. Defaults to NULL, in which case no seed is set.

...

further arguments, mainly to ensure backward compatibility.

Value

object of class 'spar' with elements

  • betas p x max(nummods) sparse matrix of class 'Matrix::dgCMatrix' containing the standardized coefficients from each marginal model

  • intercepts used in each marginal model

  • scr_coef vector of length p with coefficients used for screening the standardized predictors

  • inds list of length max(nummods) of index vectors giving the variables kept after screening in each marginal model

  • RPMs list of length max(nummods) of projection matrices used in each marginal model

  • val_res data.frame with validation results (validation measure and number of active variables) for each element of nus and nummods

  • val_set logical flag, whether validation data were provided; if FALSE, training data were used for validation

  • family character string describing the family object used for the marginal generalized linear models, e.g., "gaussian(identity)"

  • nus vector of values of ν considered for thresholding

  • nummods vector of numbers of marginal models considered for validation

  • ycenter empirical mean of initial response vector

  • yscale empirical standard deviation of initial response vector

  • xcenter p-vector of empirical means of initial predictor variables

  • xscale p-vector of empirical standard deviations of initial predictor variables

  • avg_type character, averaging type for computing the validation measure

  • measure character, type of validation measure used

  • rp an object of class "randomprojection"

  • screencoef an object of class "screeningcoef"

  • x_rows_for_fitting_marginal_models vector of row indicators of x used for fitting the marginal models if screening was performed using screencoef with the split_data_prop argument; NULL otherwise.
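
Since betas holds standardized coefficients, xcenter, xscale, ycenter and yscale are what is needed to express a coefficient vector on the original scale. The base-R sketch below shows the affine back-transformation, assuming the model (y - ycenter)/yscale = a + sum_j b_j (x_j - xcenter_j)/xscale_j. The helper unstandardize is hypothetical, and whether coef.spar() proceeds exactly this way is an assumption; for non-Gaussian families one would presumably keep ycenter = 0 and yscale = 1.

```r
# Hypothetical helper: map intercept a and slopes b fitted on standardized
# data back to the original scale of x and y.
unstandardize <- function(a, b, xcenter, xscale, ycenter = 0, yscale = 1) {
  beta <- yscale * b / xscale                       # original-scale slopes
  intercept <- ycenter + yscale * a - sum(beta * xcenter)
  list(intercept = intercept, beta = beta)
}

# Check: a fit on standardized data maps back to the fit on raw data
set.seed(3)
x <- matrix(rnorm(60, mean = 5, sd = 2), 20, 3)
y <- as.numeric(1 + x %*% c(2, -1, 0.5) + rnorm(20))
xcenter <- colMeans(x); xscale <- apply(x, 2, sd)
ycenter <- mean(y);     yscale <- sd(y)
xs <- scale(x, center = xcenter, scale = xscale)
ys <- (y - ycenter) / yscale
fit_std <- unname(coef(lm(ys ~ xs)))
orig <- unstandardize(fit_std[1], fit_std[-1], xcenter, xscale, ycenter, yscale)
```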

If a parallel backend is registered and parallel = TRUE, the foreach function is used to estimate the marginal models in parallel.

References

  • parzer2024lmspareg

  • parzer2024glmsspareg

  • Clarkson2013LowRankApproxspareg

  • ACHLIOPTAS2003JLspareg

See Also

spar.cv, coef.spar, predict.spar, plot.spar, print.spar

Examples

library(spareg)

# Simulate training (x, y) and test (xtest, ytest) data
example_data <- simulate_spareg_data(n = 200, p = 400, ntest = 100)
# Fit SPAR, validating nu and the number of models on the test data
spar_res <- spar(example_data$x, example_data$y, xval = example_data$xtest,
  yval = example_data$ytest, nummods = c(5, 10, 15, 20, 25, 30))
coefs <- coef(spar_res)
# Predictions for the training predictors
pred <- predict(spar_res, xnew = example_data$x)
plot(spar_res)
# Validation measure and number of active variables along nummod and nu
plot(spar_res, plot_type = "val_measure", plot_along = "nummod", nu = 0)
plot(spar_res, plot_type = "val_measure", plot_along = "nu", nummod = 10)
plot(spar_res, plot_type = "val_numactive", plot_along = "nummod", nu = 0)
plot(spar_res, plot_type = "val_numactive", plot_along = "nu", nummod = 10)
# Residuals versus fitted values on the test data, and coefficient plot
plot(spar_res, plot_type = "res_vs_fitted", xfit = example_data$xtest,
  yfit = example_data$ytest)
plot(spar_res, plot_type = "coefs", prange = c(1, 400))

# spareg() takes the same arguments as spar()
spar_res <- spareg(example_data$x, example_data$y, xval = example_data$xtest,
  yval = example_data$ytest, nummods = c(5, 10, 15, 20, 25, 30))

spareg documentation built on Aug. 8, 2025, 6:46 p.m.