spar | R Documentation |
Apply Sparse Projected Averaged Regression to high-dimensional data by
building an ensemble of generalized linear models, where the high-dimensional
predictors can be screened using a screening coefficient and then projected
using data-agnostic or data-informed random projection matrices.
This function performs the procedure for a given grid of thresholds ν
and a grid of numbers of marginal models to be employed in the ensemble.
This function is also used in the cross-validated procedure spar.cv.
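The procedure described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the screening coefficient (marginal correlations), the dense Gaussian projection, the number of screened variables (n_keep), the projected dimension (m) and the helper fit_one_marginal() are all assumptions chosen for illustration, and thresholding by ν is left out.

```r
set.seed(1)
n <- 50; p <- 200; m <- 10            # observations, predictors, projected dimension
x <- matrix(rnorm(n * p), n, p)
beta_true <- c(rep(2, 5), rep(0, p - 5))
y <- drop(x %*% beta_true) + rnorm(n)

# Standardize the predictors and center the response
xs <- scale(x)
yc <- y - mean(y)

# Assumed screening coefficient: marginal correlation with the response
scr_coef <- drop(cor(xs, yc))

# Hypothetical helper: fit one marginal model of the ensemble
fit_one_marginal <- function(n_keep = 2 * n) {
  # Screening: keep the predictors with the largest absolute coefficient
  keep <- order(abs(scr_coef), decreasing = TRUE)[seq_len(n_keep)]
  # Data-agnostic random projection: dense Gaussian matrix (m x n_keep)
  rp <- matrix(rnorm(m * n_keep), m, n_keep)
  z <- xs[, keep] %*% t(rp)           # n x m reduced predictors
  gamma <- coef(lm(yc ~ z - 1))       # linear model in the reduced space
  beta <- numeric(p)
  beta[keep] <- drop(t(rp) %*% gamma) # map back to the p original predictors
  beta
}

betas <- replicate(20, fit_one_marginal())  # p x 20, one column per marginal model
beta_avg <- rowMeans(betas)                 # ensemble: average over the models
```

Each column of betas is zero outside the screened variables, which is why the package can store the corresponding matrix in sparse form.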
spar(
x,
y,
family = gaussian("identity"),
model = NULL,
rp = NULL,
screencoef = NULL,
xval = NULL,
yval = NULL,
nnu = 20,
nus = NULL,
nummods = c(20),
measure = c("deviance", "mse", "mae", "class", "1-auc"),
avg_type = c("link", "response"),
parallel = FALSE,
inds = NULL,
RPMs = NULL,
seed = NULL,
...
)
spareg(
x,
y,
family = gaussian("identity"),
model = NULL,
rp = NULL,
screencoef = NULL,
xval = NULL,
yval = NULL,
nnu = 20,
nus = NULL,
nummods = c(20),
measure = c("deviance", "mse", "mae", "class", "1-auc"),
avg_type = c("link", "response"),
parallel = FALSE,
inds = NULL,
RPMs = NULL,
seed = NULL,
...
)
x |
n x p numeric matrix of predictor variables. |
y |
quantitative response vector of length n. |
family |
a family object used for the marginal generalized linear model;
default gaussian("identity"). |
model |
function creating a |
rp |
function creating a |
screencoef |
function creating a |
xval |
optional matrix of predictor observations used for
validation of the threshold ν and the number of models; if not provided, the training data are used. |
yval |
optional response observations used for validation of
the threshold ν and the number of models; if not provided, the training data are used. |
nnu |
number of different threshold values ν to consider; defaults to 20. |
nus |
optional vector of ν's to consider for thresholding |
nummods |
vector of numbers of marginal models to consider for
validation; defaults to c(20). |
measure |
loss to use for validation; defaults to "deviance". |
avg_type |
type of averaging of the marginal models; either on the link level (default) or on the response level. This is used in computing the validation measure. |
parallel |
assuming a parallel backend is loaded and available, a logical indicating whether the function should use it in parallelizing the estimation of the marginal models. Defaults to FALSE. |
inds |
optional list of index-vectors corresponding to variables kept
after screening in each marginal model of length max(nummods) |
RPMs |
optional list of projection matrices used in each
marginal model of length max(nummods) |
seed |
integer seed to be set at the beginning of the SPAR algorithm. Defaults to NULL, in which case no seed is set. |
... |
further arguments, mainly to ensure backward compatibility |
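The two options of avg_type can be illustrated with a small base-R sketch. The linear predictors below are made-up numbers for three hypothetical marginal logistic models, and inv_logit() is a helper defined only for this illustration; the sketch shows the general difference between the two averaging levels, not the package's internal code.

```r
# Linear predictors (link scale): rows = 2 observations, columns = 3 models
eta <- cbind(c(-2, 0.5), c(0, 1.5), c(4, 1))

# Inverse of the logit link (hypothetical helper for this example)
inv_logit <- function(eta) 1 / (1 + exp(-eta))

# Averaging on the link level: average the linear predictors,
# then apply the inverse link once
pred_link <- inv_logit(rowMeans(eta))

# Averaging on the response level: apply the inverse link per model,
# then average the predicted means
pred_resp <- rowMeans(inv_logit(eta))

round(cbind(pred_link, pred_resp), 3)
```

For a nonlinear link the two predictions differ in general; with the identity link of the default gaussian family they coincide.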
object of class 'spar' with elements
betas
p x max(nummods) sparse matrix of class 'Matrix::dgCMatrix' containing the standardized coefficients from each marginal model
intercepts
used in each marginal model
scr_coef
vector of length p with coefficients used for screening the standardized predictors
inds
list of index-vectors corresponding to variables kept after screening in each marginal model of length max(nummods)
RPMs
list of projection matrices used in each marginal model of length max(nummods)
val_res
data.frame with validation results (validation measure and number of active variables) for each element of nus and nummods
val_set
logical flag, whether validation data were provided; if FALSE, training data were used for validation
family
a character corresponding to the family object used for the marginal generalized linear model, e.g., "gaussian(identity)"
nus
vector of ν's considered for thresholding
nummods
vector of numbers of marginal models considered for validation
ycenter
empirical mean of initial response vector
yscale
empirical standard deviation of initial response vector
xcenter
p-vector of empirical means of initial predictor variables
xscale
p-vector of empirical standard deviations of initial predictor variables
avg_type
character, averaging type for computing the validation measure
measure
character, type of validation measure used
rp
an object of class "randomprojection"
screencoef
an object of class "screeningcoef"
x_rows_for_fitting_marginal_models
vector of row indicators from x which were used for fitting the marginal models, if screening was performed using screencoef with the split_data_prop argument; NULL otherwise.
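How the betas matrix, a threshold ν and a number of models relate to a single coefficient vector and its number of active variables can be sketched in base R. This is one plausible reading of the elements above, not a documented guarantee: the helper combine_betas(), the toy matrix, and the choice of taking the first nummod columns are assumptions made for illustration.

```r
# Toy p x M matrix of standardized coefficients (p = 4 predictors, M = 3 models)
betas <- cbind(c(0.8, 0.05, 0, -0.4),
               c(0.6, 0, 0.02, -0.5),
               c(0.7, -0.03, 0, -0.3))

# Hypothetical helper: threshold at nu, then average over nummod models
combine_betas <- function(betas, nu, nummod) {
  b <- betas[, seq_len(nummod), drop = FALSE]  # assumption: first nummod models
  b[abs(b) < nu] <- 0                          # zero out coefficients below nu
  rowMeans(b)                                  # average across the models
}

beta_hat <- combine_betas(betas, nu = 0.1, nummod = 3)
num_active <- sum(beta_hat != 0)               # number of active variables
```

In this toy example a larger ν removes the two weak predictors, so the averaged vector keeps only the first and the last coefficient.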
If a parallel backend is registered and parallel = TRUE, the foreach function is used to estimate the marginal models in parallel.
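A parallel backend for foreach could, for example, be registered with the doParallel package before calling spar. The sketch below assumes that both doParallel and spareg are installed and that two worker processes are appropriate; any other foreach backend should work in the same way.

```r
# Runs only if the assumed packages are available
if (requireNamespace("doParallel", quietly = TRUE) &&
    requireNamespace("spareg", quietly = TRUE)) {
  library(spareg)
  cl <- parallel::makeCluster(2)        # two workers (an arbitrary choice)
  doParallel::registerDoParallel(cl)    # register the backend for foreach

  example_data <- simulate_spareg_data(n = 200, p = 400, ntest = 100)
  spar_par <- spar(example_data$x, example_data$y, parallel = TRUE)

  parallel::stopCluster(cl)             # release the workers
  foreach::registerDoSEQ()              # fall back to sequential execution
}
```

Without a registered backend, leaving parallel at its default FALSE avoids relying on foreach for the estimation.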
spar.cv, coef.spar, predict.spar, plot.spar, print.spar
# Simulate training data (n = 200, p = 400) and 100 test observations
example_data <- simulate_spareg_data(n = 200, p = 400, ntest = 100)
# Fit SPAR, validating on the test data over several numbers of models
spar_res <- spar(example_data$x, example_data$y, xval = example_data$xtest,
  yval = example_data$ytest, nummods = c(5, 10, 15, 20, 25, 30))
# Extract coefficients and predict on the training predictors
coefs <- coef(spar_res)
pred <- predict(spar_res, xnew = example_data$x)
# Validation measure and number of active variables along nummod and nu
plot(spar_res)
plot(spar_res, plot_type = "val_measure", plot_along = "nummod", nu = 0)
plot(spar_res, plot_type = "val_measure", plot_along = "nu", nummod = 10)
plot(spar_res, plot_type = "val_numactive", plot_along = "nummod", nu = 0)
plot(spar_res, plot_type = "val_numactive", plot_along = "nu", nummod = 10)
# Residuals versus fitted values on the test data
plot(spar_res, plot_type = "res_vs_fitted", xfit = example_data$xtest,
  yfit = example_data$ytest)
# Coefficients of predictors 1 to 400
plot(spar_res, plot_type = "coefs", prange = c(1, 400))
# The same call using spareg()
spar_res <- spareg(example_data$x, example_data$y, xval = example_data$xtest,
  yval = example_data$ytest, nummods = c(5, 10, 15, 20, 25, 30))