Uses the k-fold cross-validation deviance to estimate the solution point of the dgLARS solution curve.
formula 
an object of class “ 
family 
a description of the error distribution and link
function used to specify the model. This can be a character string
naming a family function or the result of a call to a family function
(see 
g 
argument available only for 
unpenalized 
a vector used to specify the unpenalized estimators;

b_wght 
a vector, with length equal to the number of columns of
the matrix 
data 
an optional data frame, list or environment (or object coercible by ‘as.data.frame’ to a data frame) containing the variables in the model. If not found in ‘data’, the variables are taken from ‘environment(formula)’. 
subset 
an optional vector specifying a subset of observations to be used in the fitting process. 
contrast 
an optional list. See the ‘contrasts.arg’ of ‘model.matrix.default’. 
control 
a list of control parameters. See ‘Details’. 
X 
design matrix of dimension n × p. 
y 
response vector. When the 
The cvdglars function runs dglars nfold + 1 times. The deviance is stored, and its average and standard deviation over the folds are computed.
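The fold-wise bookkeeping described above can be sketched in base R. This is only an illustration of k-fold cross-validation deviance, not the package's implementation: it swaps in glm for dglars, uses simulated data, and assumes that the extra run (the "+ 1") is a fit on the full data set. The names dat, binom_dev, dev_fold, dev_mean and dev_sd are hypothetical.

```r
set.seed(1)
n <- 60
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + x))
dat <- data.frame(x = x, y = y)

nfold <- 5
# Randomly assign each observation to one of nfold folds of equal size
foldid <- sample(rep(seq_len(nfold), length.out = n))

# Binomial deviance of binary responses y given fitted probabilities mu
binom_dev <- function(y, mu) -2 * sum(y * log(mu) + (1 - y) * log(1 - mu))

dev_fold <- numeric(nfold)
for (k in seq_len(nfold)) {
  train <- dat[foldid != k, ]
  test <- dat[foldid == k, ]
  # glm stands in for dglars here (assumption made for illustration)
  fit <- glm(y ~ x, family = binomial, data = train)
  mu <- predict(fit, newdata = test, type = "response")
  dev_fold[k] <- binom_dev(test$y, mu)
}

# Average and standard deviation of the deviance over the folds
dev_mean <- mean(dev_fold)
dev_sd <- sd(dev_fold)

# One additional fit on all observations: nfold + 1 fits in total (assumed)
fit_full <- glm(y ~ x, family = binomial, data = dat)
```

In cvdglars the deviance is tracked along a grid of tuning-parameter values rather than for a single fixed model, so dev_mean and dev_sd would be vectors rather than scalars.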
cvdglars.fit is the workhorse function: it is more efficient when the design matrix has already been calculated. For this reason we suggest using this function when the dgLARS method is applied in a high-dimensional setting, i.e. when p > n.
The control argument is a list that can supply any of the following components:
algorithm: a string specifying the algorithm used to compute the solution curve. The predictor-corrector algorithm is used when algorithm = "pc" (default), while the cyclic coordinate descent method is used when algorithm = "ccd";
method: a string specifying the kind of solution curve. If method = "dgLASSO" (default), the algorithm computes the solution curve defined by the differential geometric generalization of the LASSO estimator; otherwise, if method = "dgLARS", the differential geometric generalization of the least angle regression method is used;
nfold: a non-negative integer used to specify the number of folds. Although nfold can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. Default is nfold = 10;
foldid: an n-dimensional vector of integers, between 1 and n, used to define the folds for the cross-validation. By default foldid is randomly generated;
ng: number of values of the tuning parameter used to compute the cross-validation deviance. Default is ng = 100;
nv: control parameter for the pc algorithm. An integer value belonging to the interval [1, min(n, p)] (default is nv = min(n - 1, p)) used to specify the maximum number of variables included in the final model;
np: control parameter for the pc/ccd algorithm. A non-negative integer used to define the maximum number of points of the solution curve. For the predictor-corrector algorithm np is set to 50 · min(n - 1, p) (default), while for the cyclic coordinate descent method it is set to 100 (default), i.e. the number of values of the tuning parameter g;
g0: control parameter for the pc/ccd algorithm. Sets the smallest value for the tuning parameter g. Default is g0 = ifelse(p < n, 1.0e-06, 0.05);
dg_max: control parameter for the pc algorithm. A non-negative value used to specify the maximum length of the step size. Setting dg_max = 0 (default), the predictor-corrector algorithm uses the optimal step size (see Augugliaro et al. (2013) for more details) to approximate the value of the tuning parameter corresponding to the inclusion/exclusion of a variable from the model;
nNR: control parameter for the pc algorithm. A non-negative integer used to specify the maximum number of iterations of the Newton-Raphson algorithm used in the corrector step. Default is nNR = 200;
NReps: control parameter for the pc algorithm. A non-negative value used to define the convergence criterion of the Newton-Raphson algorithm. Default is NReps = 1.0e-06;
ncrct: control parameter for the pc algorithm. When the Newton-Raphson algorithm does not converge, the step size (dg) is reduced by dg = cf * dg and the corrector step is repeated. ncrct is a non-negative integer used to specify the maximum number of trials for the corrector step. Default is ncrct = 50;
cf: control parameter for the pc algorithm. The contraction factor is a real value belonging to the interval [0, 1] used to reduce the step size as previously described. Default is cf = 0.5;
nccd: control parameter for the ccd algorithm. A non-negative integer used to specify the maximum number of steps of the cyclic coordinate descent algorithm. Default is nccd = 1.0e+05;
eps: control parameter for the pc/ccd algorithm. The meaning of this parameter depends on the algorithm used to estimate the solution curve:
i. if algorithm = "pc", then eps is used
   a. to identify a variable that will be included in the active set (the absolute value of the corresponding Rao's score test statistic belongs to [g - eps, g + eps]);
   b. to establish whether the corrector step must be repeated;
   c. to define the convergence of the algorithm, i.e., the actual value of the tuning parameter belongs to the interval [g0 - eps, g0 + eps];
ii. if algorithm = "ccd", then eps is used to define the convergence for a single solution point, i.e., each inner coordinate-descent loop continues until the maximum change in the Rao's score test statistic, after any coefficient update, is less than eps.
Default is eps = 1.0e-05.
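As a minimal sketch of how these components might be supplied, the following R code builds a control list with an explicit foldid vector and passes it to cvdglars.fit. The simulated data and the names ctrl and fit_cv are illustrative assumptions; the call itself is guarded so that the snippet still runs when the dglars package is not installed. Components left out of the list are assumed to keep the defaults listed above.

```r
set.seed(123)
n <- 100
p <- 10
# Simulated design matrix and binary response (illustrative data only)
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(1 + 2 * X[, 1]))

# Balanced random fold assignment: each observation gets a label in 1..nfold
nfold <- 5
foldid <- sample(rep(seq_len(nfold), length.out = n))

# Control list using only component names documented above
ctrl <- list(
  algorithm = "pc",      # predictor-corrector algorithm
  method    = "dgLASSO", # kind of solution curve
  nfold     = nfold,
  foldid    = foldid,
  ng        = 50,        # number of tuning-parameter values
  eps       = 1.0e-05
)

# Only call the package if it is available in this R installation
if (requireNamespace("dglars", quietly = TRUE)) {
  fit_cv <- dglars::cvdglars.fit(X, y, family = binomial(), control = ctrl)
  print(fit_cv$g)      # tuning parameter minimising the cross-validation deviance
  print(fit_cv$var_cv) # names of the variables selected by cross-validation
}
```

Fixing foldid by hand, rather than relying on the random default, makes the cross-validation split reproducible across runs and comparable across different control settings.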
cvdglars returns an object with S3 class “cvdglars”, i.e. a list containing the following components:
call 
the call that produced this object; 
formula_cv 
if the model is fitted by 
family 
a description of the error distribution used in the model; 
var_cv 
a character vector with the names of the variables selected by cross-validation; 
beta 
the vector of the coefficients estimated by cross-validation; 
phi 
the cross-validation estimate of the dispersion parameter; 
dev_m 
a vector of length 
dev_v 
a vector of length 
g 
the value of the tuning parameter corresponding to the minimum of the cross-validation deviance; 
g0 
the smallest value for the tuning parameter; 
g_max 
the value of the tuning parameter corresponding to the starting point of the dgLARS solution curve; 
X 
the used design matrix; 
y 
the used response vector; 
w 
the vector of weights used to compute the adaptive dglars method; 
conv 
an integer value used to encode the warnings and the errors related to the algorithm used to fit the dgLARS solution curve. The values returned are:

control 
the list of control parameters used to compute the cross-validation deviance. 
Luigi Augugliaro
Maintainer: Luigi Augugliaro
Augugliaro L., Mineo A.M. and Wit E.C. (2014) dglars: An R Package to Estimate Sparse Generalized Linear Models, Journal of Statistical Software, Vol 59(8), 1-40. http://www.jstatsoft.org/v59/i08/.
Augugliaro L., Mineo A.M. and Wit E.C. (2013) dgLARS: a differential geometric approach to sparse generalized linear models, Journal of the Royal Statistical Society. Series B., Vol 75(3), 471-498.
Augugliaro L., Mineo A.M. and Wit E.C. (2012) Differential geometric LARS via cyclic coordinate descent method, in Proceedings of COMPSTAT 2012, pp. 67-79. Limassol, Cyprus.
coef.cvdglars, print.cvdglars and plot.cvdglars methods.