pcoxtimecv: Cross-validation for pcoxtime


pcoxtimecv    R Documentation

Cross-validation for pcoxtime

Description

Performs k-fold cross-validation for pcoxtime, plots the solution path, and returns the optimal value of lambda (and the optimal alpha if more than one is given).

Usage

pcoxtimecv(
  formula,
  data,
  alphas = 1,
  lambdas = NULL,
  nlambdas = 100,
  lammin_fract = NULL,
  lamfract = 0.6,
  nfolds = 10,
  foldids = NULL,
  devtype = "vv",
  refit = FALSE,
  maxiter = 1e+05,
  tol = 1e-08,
  quietly = FALSE,
  seed = NULL,
  nclusters = 1,
  na.action = na.omit,
  ...
)

Arguments

formula

object of class formula describing the model. The response is specified as in the Surv function from the survival package. The terms (predictors) are specified on the right of "~" in the formula.

data

optional data frame containing variables specified in the formula.

alphas

elastic net mixing parameter, with 0 <= alphas <= 1. If a vector of alphas is supplied, cross-validation is performed for each alpha and the optimal value is returned. The default is 1.

lambdas

optional user-supplied sequence of lambda values. If lambdas = NULL (default, highly recommended), the algorithm chooses its own sequence.

nlambdas

the number of lambda values. Default is 100.

lammin_fract

smallest value of lambda, as a fraction of the maximum lambda. If NULL (the default), it depends on the number of observations (n) relative to the number of variables (p): if n > p, the default is 0.0001, otherwise 0.01. Increasing this value may lead to faster convergence.

lamfract

proportion of the regularization path to consider. If lamfract = 1, the complete regularization path is considered. If 0.5 <= lamfract < 1, only that proportion of the nlambdas values is considered. Choosing a smaller lamfract reduces computational time and may give more stable estimates for models with a large number of predictors. See Details.

nfolds

number of folds. Default is 10. The smallest allowable is nfolds = 3.

foldids

an optional sequence of values between 1 and nfolds specifying what fold each observation is in. This is important when comparing performance across models. If specified, nfolds can be missing.

devtype

loss to use for cross-validation. Two options are currently available; future versions will also implement a concordScore.pcoxtime loss. The default, devtype = "vv", is the Verweij and Van Houwelingen partial-likelihood deviance; devtype = "basic" is the basic cross-validated partial likelihood. See Dai, B., and Breheny, P. (2019) for details.

refit

logical. Whether to return solution path based on optimal lambda and alpha picked by the model. Default is refit = FALSE.

maxiter

maximum number of iterations to convergence. Default is 1e5. Consider increasing it if the model does not converge.

tol

convergence threshold for proximal gradient descent. Each proximal update continues until the relative change in all the coefficients (i.e. √{∑(β_{k+1} - β_k)^2}/stepsize) is less than tol. The default value is 1e-8.

quietly

logical. If TRUE, refit progress is printed.

seed

random seed. Default is NULL, which generates the seed internally.

nclusters

number of cores to use to run the cross-validation in parallel. Default is nclusters = 1, which runs serially.

na.action

a function which indicates what should happen when the data contain NAs.

...

additional arguments not implemented.
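As an illustration of the foldids argument, the following is a minimal sketch of one way to build a balanced, reproducible fold assignment in base R. The helper make_foldids is hypothetical and not part of pcoxtime; the value n = 137 is only an example sample size.

```r
# Hypothetical helper (not part of pcoxtime): balanced, reproducible fold ids.
make_foldids <- function(n, nfolds = 10, seed = NULL) {
  # The documentation states that the smallest allowable nfolds is 3.
  stopifnot(nfolds >= 3, n >= nfolds)
  if (!is.null(seed)) set.seed(seed)
  # Repeat 1..nfolds up to length n, then shuffle, so that the folds are
  # random but their sizes differ by at most one observation.
  sample(rep(seq_len(nfolds), length.out = n))
}

ids <- make_foldids(n = 137, nfolds = 10, seed = 2022)
table(ids)  # number of observations assigned to each fold
```

Passing the same vector as foldids = ids to every pcoxtimecv call keeps the folds identical across the models being compared.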

Details

The function fits pcoxtime nfolds + 1 times (if refit = FALSE) or nfolds + 2 times (if refit = TRUE). In the former case, the solution path displayed by plot.pcoxtimecv is randomly picked from the cross-validation runs. In the latter case, the solution path plot is based on the model refitted using the optimal parameters. In both cases, the function first runs pcoxtime to compute the lambda sequence and then performs cross-validation on nfolds folds.

If more than one alpha is specified, say c(0.2, 0.5, 1), pcoxtimecv will search (experimental) for the optimal alpha with respect to the corresponding lambda values. In this case, the optimal alpha and its lambda sequence are returned, i.e., the (alphas, lambdas) pair that corresponds to the lowest cross-validated error (likelihood deviance).
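The selection of the (alpha, lambda) pair can be pictured with a small base R sketch. The data frame cv_grid, its column names, and all error values are made up for illustration; they are not output from pcoxtimecv, and the package's internal bookkeeping may differ.

```r
# Toy cross-validated errors for illustration only (made-up numbers).
cv_grid <- data.frame(
  alpha  = rep(c(0.2, 0.5, 1), each = 3),
  lambda = rep(c(0.10, 0.05, 0.01), times = 3),
  cvm    = c(12.4, 11.9, 12.1,
             12.2, 11.6, 12.0,
             12.3, 11.8, 12.5)
)

# Pick the row with the lowest mean cross-validated error.
best <- cv_grid[which.min(cv_grid$cvm), ]
best$alpha   # plays the role of alpha.optimal
best$lambda  # plays the role of lambda.min

# The lambda sequence belonging to the selected alpha.
lambdas_for_best <- cv_grid$lambda[cv_grid$alpha == best$alpha]
```

In this toy grid the minimum error sits at alpha = 0.5 and lambda = 0.05, so the lambda sequence reported alongside it is the one evaluated for alpha = 0.5.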

For data sets with a very large number of predictors, it is recommended to calculate only partial paths by lowering the value of lamfract. In other words, for p > n problems, the solution near lambda = 0 is poorly behaved and may account for over 99% of the function's runtime. We therefore recommend always specifying lamfract < 1 and increasing it if the optimal lambda falls toward the lower end of the sequence.
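A rough sketch of what a partial path means, in base R. Two parts are assumptions rather than documented behaviour: that the lambda sequence is log-spaced from the maximum lambda down to lammin_fract times the maximum, and that a partial path keeps the largest lambdas. The values of n, p, and lambda_max are placeholders.

```r
n <- 50; p <- 200        # hypothetical p > n problem
nlambdas <- 100
lamfract <- 0.6

# Default rule described for lammin_fract: 0.0001 if n > p, otherwise 0.01.
lammin_fract <- if (n > p) 0.0001 else 0.01

lambda_max <- 1          # placeholder; the real value depends on the data

# Assumption: log-spaced sequence from lambda_max down to lambda_max * lammin_fract.
lambdas <- exp(seq(log(lambda_max), log(lambda_max * lammin_fract),
                   length.out = nlambdas))

# Assumption: the partial path keeps the first (largest) lamfract share of lambdas,
# which skips the expensive region near lambda = 0.
partial <- lambdas[seq_len(round(lamfract * nlambdas))]

length(partial)
range(partial)
```

Under these assumptions, lamfract = 0.6 evaluates 60 of the 100 lambdas and never reaches the smallest values in the sequence.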

Value

An S3 object of class pcoxtimecv:

lambda.min

the value of lambda that gives minimum cross-validated error.

lambda.1se

largest value of lambda such that error is within 1 standard error of the minimum.

alpha.optimal

optimal alpha corresponding to lambda.min.

lambdas.optimal

the sequence of lambdas containing lambda.min.

foldids

the fold assignment used.

dfs

list of data frames containing mean cross-validated error summaries and estimated coefficients in each fold.

fit

if refit = TRUE, summaries corresponding to the optimal alpha and lambdas. This is used to plot the solution path.
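To make the definitions of lambda.min and lambda.1se concrete, here is a minimal base R sketch of the one-standard-error rule as described above. The vectors lambdas, cvm, and cvse hold made-up values for illustration; they are not produced by pcoxtimecv, and the variable names are assumptions.

```r
# Toy values for illustration only (made-up numbers).
lambdas <- c(0.20, 0.10, 0.05, 0.02, 0.01)
cvm     <- c(12.9, 12.1, 11.8, 11.7, 12.0)   # mean cross-validated error
cvse    <- c(0.30, 0.25, 0.20, 0.20, 0.25)   # standard error of cvm

# lambda.min: the lambda with the smallest mean cross-validated error.
i_min      <- which.min(cvm)
lambda_min <- lambdas[i_min]

# lambda.1se: the largest lambda whose error is within one standard error
# of the minimum error.
threshold  <- cvm[i_min] + cvse[i_min]
lambda_1se <- max(lambdas[cvm <= threshold])

c(lambda.min = lambda_min, lambda.1se = lambda_1se)
```

Because lambda.1se is never smaller than lambda.min, it corresponds to a model that is at least as heavily penalized.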

References

Dai, B., and Breheny, P. (2019). Cross validation approaches for penalized Cox regression. arXiv preprint arXiv:1905.10432.

Simon, N., Friedman, J., Hastie, T., Tibshirani, R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol. 39(5) 1-13 doi: 10.18637/jss.v039.i05.

See Also

plot.pcoxtimecv, pcoxtime

Examples


# Time-independent covariates
if (packageVersion("survival")>="3.2.9") {
   data(cancer, package="survival")
} else {
   data(veteran, package="survival")
}

cv1 <- pcoxtimecv(Surv(time, status) ~ factor(trt) + karno + diagtime + age + prior
	, data = veteran
	, alphas = 1
	, refit = FALSE
	, lamfract = 0.6
)
print(cv1)

# Train model using optimal alpha and lambda
fit1 <- pcoxtime(Surv(time, status) ~ factor(trt) + karno + diagtime + age + prior
	, data = veteran
	, alpha = cv1$alpha.optimal
	, lambda = cv1$lambda.min
)
print(fit1)

# Time-varying covariates
data(heart, package="survival")
cv2 <- pcoxtimecv(Surv(start, stop, event) ~ age + year + surgery + transplant
	, data = heart
	, alphas = 1
	, refit = FALSE
	, lamfract = 0.6
)
print(cv2)

# Train model
fit2 <- pcoxtime(Surv(start, stop, event) ~ age + year + surgery + transplant
	, data = heart
	, alpha = cv2$alpha.optimal
	, lambda = cv2$lambda.min
)
print(fit2)


pcoxtime documentation built on May 13, 2022, 1:05 a.m.