grcv: General Refitted Cross-Validation Estimator
In dglars: Differential Geometric Least Angle Regression


grcv	R Documentation

General Refitted Cross-Validation Estimator

Description

grcv computes the estimate of the dispersion parameter using the general refitted cross-validation method.

Usage

grcv(object, type = c("BIC", "AIC"), nit = 10L, trace = FALSE,
     control = list(), ...)

Arguments

`object`	fitted `dglars` object.
`type`	the measure of goodness-of-fit used in Step 2 to select the two sets of variables (see section Details for more details). Default is `type = "BIC"`.
`control`	a list of control parameters passed to the function `dglars`.
`nit`	integer specifying the number of times that the general refitted cross-validation method is repeated (see section Details for more details). Default is `nit = 10L`.
`trace`	flag used to print out information about the algorithm. Default is `trace = FALSE`.
`...`	further arguments passed to the functions `AIC.dglars` or `BIC.dglars`.

Details

The general refitted cross-validation (grcv) estimator (Pazira et al., 2018) is an estimator of the dispersion parameter of the exponential family based on the following four-step procedure:

Step	Description
1.	randomly split the data set `D = (y, X)` into two subsets of equal size, denoted by `D_1` and `D_2`.
2.	fit a dglars model to the dataset `D_1` to select a set of variables `A_1`.
	fit a dglars model to the dataset `D_2` to select a set of variables `A_2`.
3.	fit the glm model to the dataset `D_1` using the variables that are in `A_2`; then estimate the
	dispersion parameter using the Pearson method. Denote by `\hat{\phi}_1(A_2)` the resulting estimate.
	fit the glm model to the dataset `D_2` using the variables that are in `A_1`; then estimate the
	dispersion parameter using the Pearson method. Denote by `\hat{\phi}_2(A_1)` the resulting estimate.
4.	estimate `\phi` using the following estimator: `\hat{\phi}_{grcv} = (\hat{\phi}_1(A_2) + \hat{\phi}_2(A_1)) / 2`.
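
For reference, the Pearson estimate named in Step 3 usually takes the standard GLM form below. The symbols `n_1` (number of observations in `D_1`), `|A_2|` (number of variables in `A_2`), `\hat{\mu}_i` (fitted mean) and `V(.)` (variance function of the family) are notation introduced here for illustration. The degrees-of-freedom correction, which assumes a model with an intercept, is likewise an assumption and is not stated on this page.

```latex
\hat{\phi}_1(A_2) = \frac{1}{n_1 - |A_2| - 1}
    \sum_{i \in D_1} \frac{(y_i - \hat{\mu}_i)^2}{V(\hat{\mu}_i)}
```

The estimate `\hat{\phi}_2(A_1)` is obtained in the same way with the roles of the two halves exchanged.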

To reduce the random variability due to the splitting of the dataset (Step 1), the previous procedure is repeated ‘nit’ times; the median of the resulting estimates is used as the final estimate of the dispersion parameter. In Step 2, the two sets of variables are selected using AIC.dglars or BIC.dglars, according to the argument ‘type’; in this step, the Pearson method is used to obtain a first estimate of the dispersion parameter. Furthermore, if the function glm does not converge in Step 3, the function dglars is used to compute the maximum likelihood estimates.
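
The four steps and the repetition over random splits can be sketched in base R. This is a minimal illustration and not the code used by grcv. The helper names (`select_vars`, `pearson_phi`, `grcv_once`, `grcv_sketch`) are hypothetical, and the variable selection of Step 2 is replaced by a simple correlation screen that keeps `k` predictors, standing in for the dglars fit with AIC or BIC selection. The fallback to dglars when glm does not converge is not reproduced.

```r
# ASSUMPTION: stand-in for Step 2. The real procedure selects variables with
# dglars and AIC/BIC; here we keep the k predictors most correlated with y.
select_vars <- function(y, X, k = 3L) {
  score <- abs(drop(cor(X, y)))
  sort(order(score, decreasing = TRUE)[seq_len(k)])
}

# Pearson estimate of the dispersion from a fitted glm object:
# sum of squared Pearson residuals divided by the residual degrees of freedom.
pearson_phi <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

# One pass through Steps 1-4.
grcv_once <- function(y, X, family, k = 3L) {
  n <- length(y)
  id1 <- sample.int(n, size = floor(n / 2))   # Step 1: random split in two halves
  id2 <- setdiff(seq_len(n), id1)
  A1 <- select_vars(y[id1], X[id1, , drop = FALSE], k)   # Step 2 on D_1
  A2 <- select_vars(y[id2], X[id2, , drop = FALSE], k)   # Step 2 on D_2
  # Step 3: refit each half with the variables selected on the other half
  fit1 <- glm(y[id1] ~ X[id1, A2, drop = FALSE], family = family)
  fit2 <- glm(y[id2] ~ X[id2, A1, drop = FALSE], family = family)
  # Step 4: average the two Pearson estimates
  (pearson_phi(fit1) + pearson_phi(fit2)) / 2
}

# Repeat the procedure nit times and take the median of the estimates.
grcv_sketch <- function(y, X, family, nit = 10L, k = 3L) {
  median(replicate(nit, grcv_once(y, X, family, k)))
}

# Simulated Gamma data with true dispersion 1 / shape = 2
set.seed(321)
n <- 200
p <- 20
X <- matrix(abs(rnorm(n * p)), n, p)
mu <- exp(1 + 2 * X[, 1])
shape <- 0.5
y <- rgamma(n, shape = shape, scale = mu / shape)

phi_hat <- grcv_sketch(y, X, Gamma("log"), nit = 10L)
phi_hat
```

Taking the median rather than the mean over the `nit` repetitions keeps a single unlucky split from dominating the final value.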

Value

grcv returns the estimate of the dispersion parameter.

Author(s)

Luigi Augugliaro and Hassan Pazira
Maintainer: Luigi Augugliaro luigi.augugliaro@unipa.it

References

Pazira H., Augugliaro L. and Wit E.C. (2018) <doi:10.1007/s11222-017-9761-7> Extended differential-geometric LARS for high-dimensional GLMs with general dispersion parameter, Statistics and Computing, Vol 28(4), 753-774.

Examples

############################
# y ~ Gamma
set.seed(321)
n <- 100
p <- 50
X <- matrix(abs(rnorm(n*p)),n,p)
eta <- 1 + 2 * X[,1]
# use the log link here so that the simulated mean matches the fitted model
mu <- drop(Gamma("log")$linkinv(eta))
shape <- 0.5
phi <- 1 / shape
y <- rgamma(n, scale = mu / shape, shape = shape)
fit <- dglars(y ~ X, Gamma("log"))

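phi                      # true dispersion parameter
grcv(fit, type = "AIC")  # estimate with AIC-based variable selection
grcv(fit, type = "BIC")  # estimate with BIC-based variable selection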
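
The example above sets `phi <- 1 / shape` because a Gamma variable drawn with `shape = shape` and `scale = mu / shape` has mean `mu` and variance `mu^2 / shape`. The following base R check is a small sketch of that relation; the values of `mu` and `phi` and the names `mean_y` and `phi_check` are chosen here for illustration only.

```r
# Check the Gamma parameterization: mean mu, variance phi * mu^2 with phi = 1 / shape
set.seed(1)
mu <- 3
phi <- 2
shape <- 1 / phi
y <- rgamma(1e5, shape = shape, scale = mu / shape)

mean_y <- mean(y)                      # should be close to mu
# Pearson-type quantity with the true mean: average of (y - mu)^2 / V(mu), V(mu) = mu^2
phi_check <- mean((y - mu)^2 / mu^2)   # should be close to phi

c(mean = mean_y, phi = phi_check)
```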

dglars documentation built on Oct. 10, 2023, 1:08 a.m.