cv_loglik: Performs cross-validation to calculate the average predicted...

View source: R/cv_loglik.R

cv_loglik    R Documentation

Description

Performs cross-validation to calculate the average predicted log likelihood for the logreg2ph method. This function can be used to select the B-spline basis that yields the largest average predicted log likelihood.

Usage

cv_loglik(
  seed = 1,
  interp = TRUE,
  nfolds = 5,
  Y_unval = NULL,
  Y_val = NULL,
  X_unval = NULL,
  X_val = NULL,
  C = NULL,
  Validated = NULL,
  Bspline = NULL,
  data,
  theta_pred = NULL,
  gamma_pred = NULL,
  TOL = 1e-04,
  MAX_ITER = 1000
)
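
The signature above takes column names rather than vectors, so the B-spline basis has to be stored as columns of data before the call. The following is a minimal sketch under stated assumptions, not the author's own example: the simulated data, the column names bs1 to bs5, and the namespace name logreg2ph are all hypothetical, and the call is skipped if no package of that name is installed.

```r
library(splines)  # ships with R; provides bs()

set.seed(1)
n <- 200

# Hypothetical simulated two-phase data (not from the package)
X_val     <- rnorm(n)                        # true covariate
X_unval   <- X_val + rnorm(n, sd = 0.5)      # error-prone version of the covariate
Y_val     <- rbinom(n, 1, plogis(-1 + X_val))
Validated <- rbinom(n, 1, 0.3)               # 1 = validated, 0 = not validated
X_val[Validated == 0] <- NA                  # X_val is unobserved without validation

# Cubic B-spline basis on the error-prone covariate, 5 basis functions
B <- as.data.frame(unclass(bs(X_unval, df = 5, degree = 3, intercept = TRUE)))
names(B) <- paste0("bs", seq_len(ncol(B)))

dat <- cbind(data.frame(Y_val, X_val, X_unval, Validated), B)

# Assumption: the package namespace is called "logreg2ph"
if (requireNamespace("logreg2ph", quietly = TRUE)) {
  cv <- logreg2ph::cv_loglik(
    nfolds = 5,
    Y_val = "Y_val",            # Y_unval left NULL: outcome treated as error-free
    X_unval = "X_unval",
    X_val = "X_val",
    Validated = "Validated",
    Bspline = names(B),
    data = dat
  )
  print(cv$avg_pred_loglike)
}
```

Whether theta_pred and gamma_pred can be left at NULL for this setup is not stated on this page, so they may need to be supplied explicitly.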

Arguments

seed

An integer seed for the random number generator, used so that the assignment of subjects to folds is reproducible. Defaults to 1.

interp

Indicator of whether the B-spline coefficients in the testing data should be linearly interpolated from the training data. Defaults to TRUE.

nfolds

Number of cross-validation folds. Defaults to 5. nfolds can be as large as the sample size (leave-one-out cross-validation), but this is not recommended for large datasets. The smallest allowable value is 3.

Y_unval

Column name with the unvalidated outcome. If Y_unval is NULL, the outcome is assumed to be error-free.

Y_val

Column name with the validated outcome.

X_unval

Column name(s) with the unvalidated predictors. If X_unval and X_val are NULL, all predictors are assumed to be error-free.

X_val

Column name(s) with the validated predictors. If X_unval and X_val are NULL, all predictors are assumed to be error-free.

C

(Optional) Column name(s) with additional error-free covariates.

Validated

Column name with the validation indicator. The validation indicator can be defined as Validated = 1 or TRUE if the subject was validated and Validated = 0 or FALSE otherwise.

Bspline

Vector of column names containing the B-spline basis functions.

data

A data frame with one row per subject, containing the columns named in Y_unval, Y_val, X_unval, X_val, C, Validated, and Bspline.

theta_pred

Vector of columns in data that pertain to the predictors in the analysis model.

gamma_pred

Vector of columns in data that pertain to the predictors in the outcome error model.

TOL

Tolerance between iterations in the EM algorithm used to define convergence. Defaults to 1E-4.

MAX_ITER

Maximum number of iterations allowed in the EM algorithm. Defaults to 1000.
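
The seed and nfolds arguments together determine how subjects are split into folds. How cv_loglik does this internally is not documented here; the sketch below only illustrates one common scheme (balanced random assignment), and the helper assign_folds is hypothetical.

```r
# Illustrative fold assignment; not taken from the package source
assign_folds <- function(n, nfolds = 5, seed = 1) {
  stopifnot(nfolds >= 3, nfolds <= n)   # mirrors the documented bounds on nfolds
  set.seed(seed)                        # same seed gives the same folds
  # Repeat 1..nfolds to length n, then shuffle the labels
  sample(rep(seq_len(nfolds), length.out = n))
}

folds <- assign_folds(n = 103, nfolds = 5, seed = 1)
table(folds)  # fold sizes differ by at most one
```

Under this scheme, rerunning with the same seed reproduces the split exactly, which is what makes average predicted log likelihoods comparable across candidate B-spline bases.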

Value

avg_pred_loglike

Stores the average predicted log likelihood.

pred_loglike

Stores the predicted log likelihood in each fold.

converged

Stores the convergence status of the EM algorithm in each run.
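
Since the function is meant for choosing the B-spline basis with the largest average predicted log likelihood, the returned values can be compared across candidate bases. This is a rough sketch, assuming one cv_loglik result per candidate collected in a named list and assuming converged is a logical vector; the helper select_basis and the placeholder numbers are hypothetical, not real output.

```r
# Pick the candidate with the largest avg_pred_loglike among those
# whose EM runs all converged (assumes `converged` is logical)
select_basis <- function(cv_results) {
  ok <- vapply(cv_results, function(r) all(r$converged), logical(1))
  scores <- vapply(cv_results[ok], function(r) r$avg_pred_loglike, numeric(1))
  names(scores)[which.max(scores)]
}

# Placeholder objects mimicking the documented return structure;
# the numbers are made up for illustration only
cv_results <- list(
  bs_df4 = list(avg_pred_loglike = -120.5, converged = rep(TRUE, 5)),
  bs_df6 = list(avg_pred_loglike = -118.2, converged = rep(TRUE, 5)),
  bs_df8 = list(avg_pred_loglike = -117.9,
                converged = c(TRUE, FALSE, TRUE, TRUE, TRUE))
)

best <- select_basis(cv_results)
best
```

Excluding candidates with a non-converged fold is a design choice of this sketch; the page itself does not say how non-convergence should be handled when comparing bases.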


sarahlotspeich/logreg2ph_R_only documentation built on Jan. 20, 2025, 6:20 p.m.