penAFT.cv: Cross-validation function for fitting a regularized semiparametric accelerated failure time model

View source: R/penAFT.cv.R

penAFT.cv R Documentation

Cross-validation function for fitting a regularized semiparametric accelerated failure time model

Description

A function to perform cross-validation and compute the solution path for the regularized semiparametric accelerated failure time model estimator.

Usage

penAFT.cv(X, logY, delta, nlambda = 50, 
  lambda.ratio.min = 0.1, lambda = NULL, 
  penalty = NULL, alpha = 1, weight.set = NULL, 
  groups = NULL, tol.abs = 1e-8, tol.rel = 2.5e-4, 
  standardize = TRUE, nfolds = 5, cv.index = NULL, 
  admm.max.iter = 1e4, quiet = TRUE)

Arguments

X

An n \times p matrix of predictors. Observations should be organized by row.

logY

An n-dimensional vector of log-survival or log-censoring times.

delta

An n-dimensional binary vector indicating whether the jth component of logY is an observed log-survival time (\delta_j = 1) or a log-censoring time (\delta_j = 0) for j=1, \dots, n.

nlambda

The number of candidate tuning parameters to consider.

lambda.ratio.min

The ratio of the minimum to maximum candidate tuning parameter value, i.e., the smallest candidate \lambda is lambda.ratio.min times the largest. As a default, we suggest 0.1, but standard model selection procedures should be applied to select \lambda.

lambda

An optional (not recommended) prespecified vector of candidate tuning parameters. Should be in descending order.

penalty

Either "EN" or "SG" for elastic net or sparse group lasso penalties.

alpha

The tuning parameter \alpha. See documentation.

weight.set

A list of weights. For both penalties, w is a p-dimensional vector of nonnegative weights applied elementwise to the coefficients. For the "SG" penalty, the list can also include v, a nonnegative vector whose length equals the number of groups. See documentation for a usage example.

groups

When using penalty "SG", a p-dimensional vector of integers corresponding to the group assignment of each predictor (i.e., column of X).

tol.abs

Absolute convergence tolerance.

tol.rel

Relative convergence tolerance.

standardize

Should predictors be standardized (i.e., scaled to have unit variance) for model fitting?

nfolds

The number of folds to be used for cross-validation. Default is five. Ten is recommended when sample size is especially small.

cv.index

A list of length nfolds containing the indices of subjects belonging to each fold, to be used, e.g., when performing cross-validation over both \alpha and \lambda. Use with extreme caution: this overrides nfolds.

admm.max.iter

Maximum number of ADMM iterations.

quiet

TRUE or FALSE variable indicating whether progress should be printed.

Details

Given (\log y_1 , x_1, \delta_1),\dots,(\log y_n , x_n, \delta_n) where for subject i (i = 1, \dots, n), y_i is the minimum of the survival time and censoring time, x_i is a p-dimensional predictor, and \delta_i is the indicator of censoring, penAFT.cv performs nfolds cross-validation for selecting the tuning parameter to be used in the argument minimizing

\frac{1}{n^2}\sum_{i=1}^n \sum_{j=1}^n \delta_i \{ \log y_i - \log y_j - (x_i - x_j)'\beta \}^{-} + \lambda g(\beta)

where \{a \}^{-} := \max(-a, 0) , \lambda > 0, and g is either the weighted elastic net penalty (penalty = "EN") or weighted sparse group lasso penalty (penalty = "SG"). The weighted elastic net penalty is defined as

\alpha \| w \circ \beta\|_1 + \frac{(1-\alpha)}{2}\|\beta\|_2^2

where w is a set of non-negative weights (which can be specified in the weight.set argument). The weighted sparse group-lasso penalty we consider is

\alpha \| w \circ \beta\|_1 + (1-\alpha)\sum_{l=1}^G v_l\|\beta_{\mathcal{G}_l}\|_2

where again, w is a set of non-negative weights and v_l are weights applied to each of the G groups.

Next, we define the cross-validation errors. Let \mathcal{V}_1, \dots, \mathcal{V}_K be a random nfolds = K element partition of [n] (the subjects), with the cardinality of each \mathcal{V}_k (the "kth fold") approximately equal for k = 1, \dots, K. Let {\hat{\beta}}_{\lambda(-\mathcal{V}_k)} be the solution with tuning parameter \lambda using only the data indexed by [n] \setminus \mathcal{V}_k (i.e., outside the kth fold). Then, defining e_i(\beta) := \log y_i - x_i'\beta for i = 1, \dots, n, we call

\sum_{k=1}^K \left[\frac{1}{|\mathcal{V}_k|^2} \sum_{i \in \mathcal{V}_k} \sum_{j \in \mathcal{V}_k} \delta_i \{e_i({\hat{\beta}}_{\lambda(-\mathcal{V}_k)}) - e_{j}({\hat{\beta}}_{\lambda(-\mathcal{V}_k)})\}^{-}\right],

the cross-validated Gehan loss at \lambda; the kth bracketed summand is the Gehan loss at \lambda in the kth fold. Similarly, letting

\tilde{e}_i({\hat{\beta}}_\lambda) = \sum_{k = 1}^K (\log y_i - x_i'{\hat{\beta}}_{\lambda(-\mathcal{V}_k)}) \mathbf{1}(i \in \mathcal{V}_k)

for each i \in [n], we call

\left[\sum_{i = 1}^n \sum_{j = 1}^n \delta_i \{\tilde{e}_i({\hat{\beta}}_\lambda) - \tilde{e}_j({\hat{\beta}}_\lambda)\}^{-}\right]

the cross-validated linear predictor score at \lambda.
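As a hedged illustration (this helper is not part of the package), the Gehan loss appearing in the displays above can be computed directly in base R from a vector of residuals e and censoring indicators delta:

```r
# Illustrative base-R sketch (not a penAFT function): computes
# (1/n^2) * sum_i sum_j delta_i * {e_i - e_j}^- as in the display above.
gehan.loss <- function(e, delta) {
  n <- length(e)
  diffs <- outer(e, e, "-")   # diffs[i, j] = e_i - e_j
  # delta recycles down columns, so delta[i] multiplies row i, as required
  sum(delta * pmax(-diffs, 0)) / n^2
}
```

The per-fold losses in the cross-validated Gehan loss have the same form, with the sums and the squared denominator restricted to indices in \mathcal{V}_k.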

Value

full.fit

A model fit with the same output as a model fit using penAFT. See documentation for penAFT for more.

cv.err.linPred

An nlambda-dimensional vector of cross-validated linear predictor scores.

cv.err.obj

An nfolds \times nlambda matrix of cross-validated Gehan losses.

cv.index

A list of length nfolds. Each element contains the indices for subjects belonging to that particular fold.
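For example, a tuning parameter can be selected from these components. This is a sketch assuming only the output structure documented above (these helper functions are not part of the package):

```r
# Sketch: pick the tuning parameter minimizing the cross-validated
# linear predictor score, using only the components documented above.
select.lambda <- function(fit) {
  fit$full.fit$lambda[which.min(fit$cv.err.linPred)]
}

# Averaging the per-fold Gehan losses in cv.err.obj over folds
# (columns index tuning parameters) gives an alternative criterion.
select.lambda.gehan <- function(fit) {
  fit$full.fit$lambda[which.min(colMeans(fit$cv.err.obj))]
}
```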

Examples

# --------------------------------------
# Generate data  
# --------------------------------------
set.seed(1)
genData <- genSurvData(n = 50, p = 50, s = 10, mag = 2,  cens.quant = 0.6)
X <- genData$X
logY <- genData$logY
delta <- genData$status
p <- dim(X)[2]

# -----------------------------------------------
# Fit elastic net penalized estimator
# -----------------------------------------------
fit.en <- penAFT.cv(X = X, logY = logY, delta = delta,
                   nlambda = 10, lambda.ratio.min = 0.1,
                   penalty = "EN", nfolds = 5,
                   alpha = 1)
# ---- coefficients at tuning parameter minimizing cross-validation error
coef.en <- penAFT.coef(fit.en)

# ---- predict at 8th tuning parameter from full fit
Xnew <- matrix(rnorm(10*p), nrow=10)
predict.en <- penAFT.predict(fit.en, Xnew = Xnew, lambda = fit.en$full.fit$lambda[8])


# -----------------------------------------------
# Fit sparse group penalized estimator
# -----------------------------------------------
groups <- rep(1:5, each = 10)
fit.sg <- penAFT.cv(X = X, logY = logY, delta = delta,
                   nlambda = 50, lambda.ratio.min = 0.01,
                   penalty = "SG", groups = groups, nfolds = 5,
                   alpha = 0.5)

# -----------------------------------------------
# Pass fold indices
# -----------------------------------------------
groups <- rep(1:5, each = 10)
cv.index <- list()
for (k in 1:5) {
  cv.index[[k]] <- which(rep(1:5, length = 50) == k)
}
fit.sg.cvIndex <- penAFT.cv(X = X, logY = logY, delta = delta,
                   nlambda = 50, lambda.ratio.min = 0.01,
                   penalty = "SG", groups = groups, 
                   cv.index = cv.index,
                   alpha = 0.5)
# ---- compare cv indices
## Not run: fit.sg.cvIndex$cv.index == cv.index
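A hedged sketch of constructing the weight.set argument for the simulated data above (the particular weight values here are illustrative choices, not package defaults):

```r
# ---- construct a weight.set (illustrative values, not package defaults)
p <- 50
groups <- rep(1:5, each = 10)

w <- rep(1, p)                        # elementwise coefficient weights
w[1:2] <- 0                           # e.g., leave first two unpenalized

v <- as.numeric(sqrt(table(groups)))  # per-group weights, e.g., sqrt of sizes
weight.set <- list("w" = w, "v" = v)

## Not run: penAFT.cv(X = X, logY = logY, delta = delta, penalty = "SG",
##   groups = groups, weight.set = weight.set, alpha = 0.5)
```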


penAFT documentation built on April 18, 2023, 9:10 a.m.