cvdglars | R Documentation |
Uses the k-fold cross-validation deviance to estimate the solution point of the dgLARS solution curve.
cvdglars(formula, family = gaussian, g, unpenalized,
b_wght, data, subset, contrasts = NULL, control = list())
cvdglars.fit(X, y, family = gaussian, g, unpenalized,
b_wght, control = list())
formula |
an object of class “ |
family |
a description of the error distribution and link
function used to specify the model. This can be a character string
naming a family function or the result of a call to a family function
(see |
g |
argument available only for |
unpenalized |
a vector used to specify the unpenalized estimators;
|
b_wght |
a vector, with length equal to the number of columns of
the matrix |
data |
an optional data frame, list or environment (or object coercible by ‘as.data.frame’ to a data frame) containing the variables in the model. If not found in ‘data’, the variables are taken from ‘environment(formula)’. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
contrasts |
an optional list. See the ‘contrasts.arg’ of ‘model.matrix.default’. |
control |
a list of control parameters. See ‘Details’. |
X |
design matrix of dimension |
y |
response vector. When the |
The cvdglars function runs dglars nfold + 1 times. The deviance is stored, and the average and its standard deviation over the folds are computed.
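The fold-wise bookkeeping described here can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: stats::glm stands in for dglars, the names fold_label, binom_dev, dev_fold, dev_mean and dev_sd are hypothetical, and reading "nfold + 1" as one fit on the full data plus one fit per fold is an assumption.

```r
# Stand-in illustration of k-fold cross-validation deviance in base R.
# stats::glm replaces dglars here; this is not the package's implementation.
set.seed(1)
n <- 60
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + x))
dat <- data.frame(x = x, y = y)

nfold <- 5
# Hypothetical fold labels: each observation is assigned to one of nfold groups
fold_label <- sample(rep(seq_len(nfold), length.out = n))

# Binomial deviance of 0/1 responses given fitted probabilities
binom_dev <- function(y, mu) -2 * sum(y * log(mu) + (1 - y) * log(1 - mu))

# Assumed reading of "nfold + 1": one fit on the full data, then one per fold
fit_full <- glm(y ~ x, family = binomial, data = dat)

dev_fold <- numeric(nfold)
for (k in seq_len(nfold)) {
  held_out <- fold_label == k
  fit_k <- glm(y ~ x, family = binomial, data = dat[!held_out, ])
  mu_k <- predict(fit_k, newdata = dat[held_out, ], type = "response")
  dev_fold[k] <- binom_dev(dat$y[held_out], mu_k)
}

dev_mean <- mean(dev_fold)  # average held-out deviance
dev_sd <- sd(dev_fold)      # its standard deviation over the folds
```

In cvdglars the same kind of summary is computed along a grid of tuning-parameter values rather than for a single model.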
cvdglars.fit is the workhorse function: it is more efficient when the design matrix has already been calculated. For this reason we suggest using this function when the dgLARS method is applied in a high-dimensional setting, i.e. when p > n.
The control argument is a list that can supply any of the following components:
algorithm: a string specifying the algorithm used to compute the solution curve. The predictor-corrector algorithm is used when algorithm = "pc" (default), while the cyclic coordinate descent method is used when algorithm = "ccd";
method: a string used to specify the kind of solution curve. If method = "dgLASSO" (default), the algorithm computes the solution curve defined by the differential geometric generalization of the LASSO estimator; otherwise, if method = "dgLARS", the differential geometric generalization of the least angle regression method is used;
nfold: a non-negative integer used to specify the number of folds. Although nfold can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. Default is nfold = 10;
foldid: an n-dimensional vector of integers, between 1 and n, used to define the folds for the cross-validation. By default foldid is randomly generated;
ng: number of values of the tuning parameter used to compute the cross-validation deviance. Default is ng = 100;
nv: control parameter for the pc algorithm. An integer value belonging to the interval [1, min(n, p)] (default is nv = min(n-1, p)) used to specify the maximum number of variables included in the final model;
np: control parameter for the pc/ccd algorithm. A non-negative integer used to define the maximum number of points of the solution curve. For the predictor-corrector algorithm np is set to 50 · min(n-1, p) (default), while for the cyclic coordinate descent method it is set to 100 (default), i.e. the number of values of the tuning parameter γ;
g0: control parameter for the pc/ccd algorithm. Sets the smallest value for the tuning parameter γ. Default is g0 = ifelse(p<n, 1.0e-06, 0.05);
dg_max: control parameter for the pc algorithm. A non-negative value used to specify the maximum length of the step size. Setting dg_max = 0 (default), the predictor-corrector algorithm uses the optimal step size (see Augugliaro et al. (2013) for more details) to approximate the value of the tuning parameter corresponding to the inclusion/exclusion of a variable from the model;
nNR: control parameter for the pc algorithm. A non-negative integer used to specify the maximum number of iterations of the Newton-Raphson algorithm used in the corrector step. Default is nNR = 200;
NReps: control parameter for the pc algorithm. A non-negative value used to define the convergence criterion of the Newton-Raphson algorithm. Default is NReps = 1.0e-06;
ncrct: control parameter for the pc algorithm. When the Newton-Raphson algorithm does not converge, the step size (dγ) is reduced by dγ = cf · dγ and the corrector step is repeated. ncrct is a non-negative integer used to specify the maximum number of trials for the corrector step. Default is ncrct = 50;
cf: control parameter for the pc algorithm. The contraction factor is a real value belonging to the interval [0,1] used to reduce the step size as previously described. Default is cf = 0.5;
nccd: control parameter for the ccd algorithm. A non-negative integer used to specify the maximum number of steps of the cyclic coordinate descent algorithm. Default is nccd = 1.0e+05;
eps: control parameter for the pc/ccd algorithm. The meaning of this parameter is related to the algorithm used to estimate the solution curve:
i. if algorithm = "pc" then eps is used
a. to identify a variable that will be included in the active set (absolute value of the corresponding Rao's score test statistic belongs to [γ - eps, γ + eps]);
b. to establish if the corrector step must be repeated;
c. to define the convergence of the algorithm, i.e., the actual value of the tuning parameter belongs to the interval [g0 - eps, g0 + eps];
ii. if algorithm = "ccd" then eps is used to define the convergence for a single solution point, i.e., each inner coordinate-descent loop continues until the maximum change in the Rao's score test statistic, after any coefficient update, is less than eps.
Default is eps = 1.0e-05.
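As a hedged illustration of the control list, the sketch below builds a list from components documented above and passes it to cvdglars.fit. The simulated data and the name ctrl are assumptions made for the example, and the call is guarded so that the block also runs when the dglars package is not installed. foldid is left at its random default, so set.seed is used for reproducibility.

```r
# Simulated logistic-regression data (illustrative only)
set.seed(123)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, binomial()$linkinv(1 + 2 * X[, 1]))

# Control components taken from the list documented above;
# anything not supplied keeps its default value
ctrl <- list(
  algorithm = "pc",                   # predictor-corrector (the default)
  method = "dgLASSO",                 # the default solution curve
  nfold = 5,                          # 5 folds instead of the default 10
  ng = 50,                            # 50 tuning-parameter values instead of 100
  g0 = ifelse(p < n, 1.0e-06, 0.05),  # documented default rule, written out
  eps = 1.0e-05                       # documented default
)

# Guarded call: only runs if the dglars package is available
if (requireNamespace("dglars", quietly = TRUE)) {
  fit_cv <- dglars::cvdglars.fit(X, y, family = binomial, control = ctrl)
  print(fit_cv)
}
```

With p equal to n, as here, the documented rule for g0 gives 0.05 rather than 1.0e-06.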
cvdglars returns an object with S3 class “cvdglars”, i.e. a list containing the following components:
call |
the call that produced this object; |
formula_cv |
if the model is fitted by |
family |
a description of the error distribution used in the model; |
var_cv |
a character vector with the name of variables selected by cross-validation; |
beta |
the vector of the coefficients estimated by cross-validation; |
phi |
the cross-validation estimate of the dispersion parameter; |
dev_m |
a vector of length |
dev_v |
a vector of length |
g |
the value of the tuning parameter corresponding to the minimum of the cross-validation deviance; |
g0 |
the smallest value for the tuning parameter; |
g_max |
the value of the tuning parameter corresponding to the starting point of the dgLARS solution curve; |
X |
the used design matrix; |
y |
the used response vector; |
w |
the vector of weights used to compute the adaptive dglars method; |
conv |
an integer value used to encode the warnings and the errors related to the algorithm used to fit the dgLARS solution curve. The values returned are:
|
control |
the list of control parameters used to compute the cross-validation deviance. |
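A short sketch of how the returned components might be inspected, assuming the dglars package is installed. The simulated data mirror the example on this page, the component names are those listed above, and the call is guarded so the block does nothing when the package is missing.

```r
# Guarded: the whole block is skipped if dglars is not installed
if (requireNamespace("dglars", quietly = TRUE)) {
  set.seed(123)
  n <- 100
  p <- 100
  X <- matrix(rnorm(n * p), n, p)
  y <- rbinom(n, 1, binomial()$linkinv(1 + 2 * X[, 1]))

  fit_cv <- dglars::cvdglars.fit(X, y, family = binomial)

  # Variables selected by cross-validation
  print(fit_cv$var_cv)

  # Smallest, selected and starting values of the tuning parameter
  print(c(g0 = fit_cv$g0, g = fit_cv$g, g_max = fit_cv$g_max))

  # Cross-validation estimate of the dispersion parameter
  print(fit_cv$phi)

  # Coefficients through the coef method listed for this class
  print(coef(fit_cv))
}
```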
Luigi Augugliaro
Maintainer: Luigi Augugliaro luigi.augugliaro@unipa.it
Augugliaro L., Mineo A.M. and Wit E.C. (2014) <doi:10.18637/jss.v059.i08> dglars: An R Package to Estimate Sparse Generalized Linear Models, Journal of Statistical Software, Vol 59(8), 1-40. https://www.jstatsoft.org/v59/i08/.
Augugliaro L., Mineo A.M. and Wit E.C. (2013) <doi:10.1111/rssb.12000> dgLARS: a differential geometric approach to sparse generalized linear models, Journal of the Royal Statistical Society, Series B, Vol 75(3), 471-498.
coef.cvdglars, print.cvdglars and plot.cvdglars methods.
###########################
# Logistic regression model
# y ~ Binomial
set.seed(123)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), n, p)
b <- 1:2
eta <- b[1] + X[, 1] * b[2]
mu <- binomial()$linkinv(eta)
y <- rbinom(n, 1, mu)
fit_cv <- cvdglars.fit(X, y, family = binomial)
fit_cv