cv.lqa: Finding Optimal Tuning Parameter via Cross-Validation or Validation Data



Description

This function computes optimal tuning parameters for penalized GLMs that can be fitted by lqa. The optimal tuning parameter minimizes the loss function specified in the argument loss.func. To find it, this function evaluates the model performance of each tuning parameter candidate given in the argument lambda.candidates.

If you only provide training data, then cross-validation is applied. If you additionally provide validation data (y.vali and x.vali), then these are used for measuring model performance, while your training data (y.train and x.train) are used entirely for model fitting.

The cv.lqa function also returns best.obj, that is, the lqa object returned by lqa when called with the chosen penalty family and the optimal tuning parameter.

Usage

cv.lqa(y.train, x.train, intercept = TRUE, y.vali = NULL,
       x.vali = NULL, lambda.candidates, family, penalty.family,
       standardize = TRUE, n.fold, cv.folds,
       loss.func = aic.loss, control = lqa.control(), ...)

## S3 method for class 'cv.lqa'
print(x, ...)

Arguments

y.train

the vector of response training data.

x.train

the design matrix of training data. If intercept = TRUE then it does not matter whether a column of ones is already included in x.train or not. The function adjusts it if necessary.

intercept

logical. If ‘TRUE’ then an intercept is included in the model (this is recommended).

y.vali

an additional vector of response validation data. If given, the validation data are used for evaluating the loss function.

x.vali

an additional design matrix of validation data. If given, the validation data are used for evaluating the loss function. If intercept = TRUE then it does not matter whether a column of ones is already included in x.vali or not; the function adjusts it if necessary.

lambda.candidates

a list containing the tuning parameter candidates. The number of list elements must correspond to the dimension of the tuning parameter. See the accompanying ‘User's Guide’ for further details.

family

identifies the exponential family of the response and the link function of the model. See the description of the R function family() for further details.

penalty.family

a function or character argument identifying the penalty family. See examples below.

standardize

logical. If ‘TRUE’ the data are standardized (this is recommended).

n.fold

number of folds in cross-validation. This argument can be omitted if a validation set is used.

cv.folds

a list assigning the indices of y.train to the particular cross-validation folds. This argument can be omitted if a validation set is used. It is also optional when no validation set is given, in which case the folds are generated internally; see the sketch after this list for one way to build it by hand.

loss.func

a character indicating the loss function to be used in evaluating the model performance for the tuning parameter candidates. If it is missing then the aic.loss() function will be used. See details below.

control

a list of parameters for controlling the fitting process. See the documentation of lqa.control for details.

x

used in the ‘print’ method: a cv.lqa object as returned by cv.lqa.

...

Further arguments.
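
For illustration, here is one way to construct such a cv.folds list by hand (a minimal sketch; y.train, idx and the five-fold split are stand-ins, not part of the package):

set.seed (42)
y.train <- rnorm (40)                  # stand-in training response
n.fold <- 5
idx <- sample (length (y.train))       # shuffled observation indices
cv.folds <- split (idx, rep (1:n.fold, length.out = length (y.train)))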

Details

This function can be used to evaluate model performance for different tuning parameter candidates. If you only provide training data, cross-validation is applied. If you additionally provide validation data, those data are used for measuring the performance and the training data are used entirely for model fitting.

You must specify a penalty family. This can be done by giving its name as a character (e.g. penalty.family = "lasso") or as a function call (e.g. penalty.family = lasso).

The tuning parameter candidates are given in the argument lambda.candidates. Usually one generates an equidistant sequence of points a priori and then uses them as exponents of Euler's number, so that the candidates are spaced on a log scale; see the sketch below and the Examples section. Note that lambda.candidates must be a list in order to cope with different numbers of candidates among the elements of the tuning parameter vector.
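
For instance (a sketch; the range and number of candidates are arbitrary choices):

## 20 equidistant exponents on [-5, 3], mapped through exp ()
lambda.candidates <- list (exp (seq (-5, 3, length = 20)))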

For evaluation you must specify a loss function. The default is aic.loss, i.e. the AIC is used to find an optimal tuning parameter. Other loss functions already implemented are bic.loss, gcv.loss, squared.loss (quadratic loss function) and dev.loss (deviance as loss function).
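
For instance, switching from the AIC to generalized cross-validation only requires changing this argument (a sketch, assuming y and X as generated in the Examples section):

cv.gcv <- cv.lqa (y, X, intercept = TRUE,
   lambda.candidates = list (exp (seq (-5, 2, length = 10))),
   family = gaussian (), penalty.family = lasso,
   n.fold = 5, loss.func = "gcv.loss")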

Value

The function cv.lqa returns an object of class cv.lqa, which is a list with the following components:

lambda.opt

the optimal tuning parameter(s).

beta.opt

the MLE corresponding to the optimal tuning parameter(s).

best.pos

the positions of the optimal tuning parameter(s) in the lambda.candidates argument.

loss.mat

the array containing the loss function values of all tuning parameter candidates (rows) and all folds (columns).

best.obj

a member of the class lqa in which the optimal tuning parameters are used.

loss.func

the loss function used.

exist.vali

logical; whether or not a validation data set has been given as an argument.

cv.folds

the cv.folds argument used, i.e. the list containing the indices of the training data assigned to the cross-validation folds.

n.fold

number of folds.

mean.array

the array containing the mean performances of all tuning parameter candidates.

lambda.candidates

the original lambda.candidates argument.
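
For instance, the components can be accessed directly (a sketch, assuming the cv.obj1 object from the Examples section):

cv.obj1$lambda.opt    # the selected tuning parameter(s)
cv.obj1$beta.opt      # coefficient estimate at the optimum
cv.obj1$mean.array    # mean loss of each candidate across the folds
cv.obj1$best.obj      # refitted lqa object at the optimal tuning parameter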

Author(s)

Jan Ulbricht

See Also

lqa, predict.lqa

Examples

## Gaussian response + lasso penalty + aic.loss:

set.seed (1111)

n <- 200
p <- 5
X <- matrix (rnorm (n * p), ncol = p)
X[,2] <- X[,1] + rnorm (n, sd = 0.1)
X[,3] <- X[,1] + rnorm (n, sd = 0.1)
true.beta <- c (1, 2, 0, 0, -1)
y <- drop (X %*% true.beta) + rnorm (n)

cv.obj1 <- cv.lqa (y, X, intercept = TRUE, 
   lambda.candidates = list (c (0.001, 0.05, 1, 5, 10)), family = gaussian (), 
   penalty.family = lasso, n.fold = 5, 
   loss.func = "aic.loss")
cv.obj1
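
## Gaussian response + lasso penalty + validation data + squared.loss:
## A sketch, assuming y and X from above; the 150/50 split is an
## arbitrary choice. With validation data, n.fold can be omitted.

train.idx <- 1:150
vali.idx <- 151:200
cv.obj1b <- cv.lqa (y[train.idx], X[train.idx, ], intercept = TRUE,
   y.vali = y[vali.idx], x.vali = X[vali.idx, ],
   lambda.candidates = list (c (0.001, 0.05, 1, 5, 10)),
   family = gaussian (), penalty.family = lasso,
   loss.func = "squared.loss")
cv.obj1b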


## Binary response + fused.lasso penalty + dev.loss:

n <- 100
p <- 5

set.seed (1234)
x <- matrix (rnorm (n * p), ncol = p)
x[,2] <- x[,1] + rnorm (n, sd = 0.01)
x[,3] <- x[,1] + rnorm (n, sd = 0.1)
beta <- c (1, 2, 0, 0, -1)
prob1 <- 1 / (1 + exp (drop (-x %*% beta)))
y <- sapply (prob1, function (p1) rbinom (1, 1, p1))

cv.obj2 <- cv.lqa (y, x, family = binomial (), penalty.family = fused.lasso,
   lambda.candidates = list (c (0.001, 0.05, 0.5, 1, 5),
   c (0.001, 0.01, 0.5)), n.fold = 5, loss.func = "dev.loss")
cv.obj2
