en_kfold_model_grid_lim: en_kfold_model_grid_lim function
In clmacleod/highlandr: Random Useful Functions

en_kfold_model_grid_lim	R Documentation

en_kfold_model_grid_lim function

Description

Runs k-fold cross-validated elastic net regression (ENR) models with simultaneous estimation of alpha, lambda, and the classification cutoff via a randomized grid. (Documentation in progress: the current documentation is incorrect.)
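The description above can be illustrated with a minimal sketch. It is not the package's implementation. It assumes the glmnet package is installed, draws alpha values uniformly on [0, 1] as one possible reading of "randomized grid", lets cv.glmnet tune lambda for each alpha, and then scores each cutoff by out-of-fold accuracy. The function name sketch_grid_cv() and its arguments are hypothetical.

```r
# Minimal sketch, NOT the package's implementation: k-fold CV elastic net over a
# randomized alpha grid, with lambda tuned by cv.glmnet and each cutoff scored
# by out-of-fold accuracy. sketch_grid_cv() is a hypothetical name.
library(glmnet)

sketch_grid_cv <- function(x, y, k = 10, num_alpha = 20, cutoffs = 0.5,
                           type_meas = "deviance", seed = 123) {
  set.seed(seed)
  alphas <- runif(num_alpha)                      # assumed: alphas drawn uniformly on [0, 1]
  foldid <- sample(rep_len(seq_len(k), nrow(x)))  # identical folds for every alpha
  rows <- list()
  for (a in alphas) {
    # cv.glmnet selects lambda for this alpha using the type.measure loss
    cvfit <- cv.glmnet(x, y, family = "binomial", alpha = a,
                       foldid = foldid, type.measure = type_meas)
    lam <- cvfit$lambda.min
    # out-of-fold predicted probabilities at the selected lambda
    prob <- numeric(length(y))
    for (f in seq_len(k)) {
      test <- foldid == f
      fit <- glmnet(x[!test, , drop = FALSE], y[!test],
                    family = "binomial", alpha = a)
      prob[test] <- predict(fit, x[test, , drop = FALSE], s = lam,
                            type = "response")
    }
    # score every candidate cutoff on the out-of-fold predictions
    for (cut in cutoffs) {
      rows[[length(rows) + 1]] <- data.frame(
        alpha = a, lambda = lam, cutoff = cut,
        accuracy = mean(as.integer(prob >= cut) == y))
    }
  }
  res <- do.call(rbind, rows)
  res[order(-res$accuracy), ]                     # best combination first
}
```

Reusing one foldid vector across every alpha follows the glmnet documentation's advice for cross-validating alpha, since it keeps all alpha values on identical folds. Because the same folds are used both to pick lambda and to score the cutoff, the reported accuracy is somewhat optimistic.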

Usage

en_kfold_model_grid_lim(
  ddata,
  response_var,
  iter = 10,
  k = 10,
  num_alpha = 20,
  num_lambda = 100,
  seed = 123,
  fit_met = "accuracy",
  loo = FALSE,
  up_dn_samp = "none",
  eq_wt = FALSE,
  type_meas = "deviance",
  na_rm = TRUE,
  lr_cutoff = c(0.5),
  accuracy_modeling = FALSE,
  ties_measure = "mode",
  save_results = FALSE,
  writeout = T,
  writeout_num = num_alpha * 5,
  writeout_path =
    "I:/Lagisetty SDR Misuse/5. Identifiable Data/E. Database/treatment arm creation/treatment arm creation/enr grid writeout directory/",
  restarting = FALSE
)

Arguments

`ddata`	data frame containing the data to be modeled
`response_var`	string identifying the name of the outcome variable
`iter`	the number of iterations to use
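`k`	the number of folds to use
`num_alpha`	the number of alpha values to test in the grid
`num_lambda`	the number of lambda values to test in the grid
`seed`	the random seed, set so that results can be reproduced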
`fit_met`	string indicating the fit metric used to evaluate model performance. Options are 'accuracy' (default), 'auroc', 'logloss', 'f1', 'ppv', 'npv', 'sens', 'spec', and 'bal_acc'
`loo`	boolean indicating whether leave-one-out cross-validation should be used
`up_dn_samp`	string indicating whether imbalanced classes should be balanced by upsampling the smaller class to the size of the larger class ('upsamp') or by downsampling the larger class to the size of the smaller class ('downsamp'). Can be 'upsamp', 'downsamp', or 'none' (default)
`eq_wt`	boolean indicating whether the 0/1 classes should be balanced with weights. You may want to use this when there is a severe class imbalance
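`type_meas`	the 'type measure' passed to the type.measure argument of cv.glmnet, which sets the loss used to choose lambda during cross-validation. This must be a value cv.glmnet accepts, e.g. 'deviance', 'class', 'auc', 'mse', or 'mae' for a binary outcome. Default is 'deviance'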
`na_rm`	boolean indicating whether missing values should be removed. Default is TRUE
`lr_cutoff`	vector of cutoff values to test/tune during optimization. Default is c(0.5), i.e. a cutoff equidistant from both classes, which is typical in standard analyses
`accuracy_modeling`	boolean switch determining whether ties between optimal solutions need to be broken
`ties_measure`	string indicating the method for breaking ties. Default is 'mode', meaning that when model results are tied, the model with the best performance across all listed fit metrics wins
`save_results`	boolean indicating whether the unaggregated results should be saved and returned. Default is FALSE, as these results are typically too large to keep and could crash the R session
`writeout`	boolean indicating whether the function's progress should be written out in real time so that it can be restarted if need be. The file will contain the aggregated results up to the last iteration written out, so the process can be restarted without returning to i = 1
`writeout_num`	the number of iterations between write-outs of the results. A higher number runs faster (because results are written out less often), while a lower number runs slower but will probably mean fewer reruns in the event of an interruption. Default is num_alpha * 5, so when checking 20 alpha values the function writes results every 100 iterations
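`writeout_path`	string giving the path of the directory where results are written out. The default is a Windows path specific to the author's environment, so this will usually need to be set explicitly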
`restarting`	boolean indicating whether the process is restarting. If so, the function looks for a file in writeout_path, reads it in, and continues from where it left off
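The fit_met options can be made concrete with a base-R sketch of how each metric might be computed from out-of-fold predicted probabilities and 0/1 outcomes at a given cutoff. The helper fit_metrics() is hypothetical and uses the standard textbook definitions; the package may compute them differently (for example, in how it handles a zero denominator).

```r
# Hypothetical helper: standard definitions of the fit_met options, computed from
# predicted probabilities `prob` and observed 0/1 outcomes `y` at one cutoff.
fit_metrics <- function(prob, y, cutoff = 0.5) {
  pred <- as.integer(prob >= cutoff)             # classify at the cutoff
  tp <- sum(pred == 1 & y == 1); tn <- sum(pred == 0 & y == 0)
  fp <- sum(pred == 1 & y == 0); fn <- sum(pred == 0 & y == 1)
  sens <- tp / (tp + fn)                         # sensitivity (true positive rate)
  spec <- tn / (tn + fp)                         # specificity (true negative rate)
  ppv  <- tp / (tp + fp)                         # positive predictive value
  npv  <- tn / (tn + fn)                         # negative predictive value
  p <- pmin(pmax(prob, 1e-15), 1 - 1e-15)        # clip so log(0) cannot occur
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  # AUROC via the Mann-Whitney rank-sum identity (ties get average ranks)
  auroc <- (sum(rank(prob)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  c(accuracy = (tp + tn) / length(y),
    auroc    = auroc,
    logloss  = -mean(y * log(p) + (1 - y) * log(1 - p)),
    f1       = 2 * ppv * sens / (ppv + sens),
    ppv = ppv, npv = npv, sens = sens, spec = spec,
    bal_acc  = (sens + spec) / 2)
}
```

Of these metrics, logloss is the only one where lower is better, so any rule that selects the "best" model has to flip its direction for that metric.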
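The k and loo arguments can be illustrated with a sketch of fold assignment: with loo = TRUE every row becomes its own fold, which is k-fold cross-validation with k equal to the number of rows. make_folds() is a hypothetical helper, not part of highlandr; its output has the form of the foldid vector that cv.glmnet accepts.

```r
# Hypothetical helper: returns one fold label per row. loo = TRUE gives
# leave-one-out CV (each row is its own fold, so k equals n).
make_folds <- function(n, k = 10, loo = FALSE, seed = 123) {
  if (loo) return(seq_len(n))
  set.seed(seed)
  sample(rep_len(seq_len(k), n))   # fold sizes differ by at most one
}

# e.g. cv.glmnet(x, y, family = "binomial", foldid = make_folds(nrow(x), k = 10))
```

Leave-one-out fits one model per row for every candidate alpha, so its cost grows quickly with the size of the data.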
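The up_dn_samp options can be sketched in base R. balance_classes() is a hypothetical helper that follows the documented behavior: 'upsamp' resamples the smaller class with replacement up to the size of the larger class, and 'downsamp' subsamples the larger class down to the size of the smaller one.

```r
# Hypothetical helper mirroring the documented up_dn_samp options.
balance_classes <- function(df, response_var, up_dn_samp = "none", seed = 123) {
  up_dn_samp <- match.arg(up_dn_samp, c("none", "upsamp", "downsamp"))
  if (up_dn_samp == "none") return(df)
  set.seed(seed)
  idx <- split(seq_len(nrow(df)), df[[response_var]])  # row indices per class
  sizes <- lengths(idx)
  target <- if (up_dn_samp == "upsamp") max(sizes) else min(sizes)
  keep <- unlist(lapply(idx, function(i) {
    if (length(i) == target) return(i)
    # sample positions (not values) so a one-row class is handled correctly;
    # replacement is only needed when growing a class
    i[sample.int(length(i), target, replace = length(i) < target)]
  }), use.names = FALSE)
  df[keep, , drop = FALSE]
}
```

Resampling is usually applied to the training folds only, so that duplicated rows never appear in both the training data and the held-out data.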
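For eq_wt, one common way to balance the 0/1 classes with weights is to weight each observation inversely to its class frequency, so that both classes carry equal total weight. That this is what eq_wt does is an assumption; class_weights() is hypothetical, while the weights argument it would feed exists in both glmnet and cv.glmnet.

```r
# Hypothetical helper: inverse-frequency weights. Each class gets total
# weight n / (number of classes), and the weights sum to n.
class_weights <- function(y) {
  counts <- table(y)
  as.numeric(length(y) / (length(counts) * counts[as.character(y)]))
}

# e.g. cv.glmnet(x, y, family = "binomial", weights = class_weights(y))
```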
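One reading of ties_measure = 'mode' is a vote: among models tied on the primary fit metric, each remaining metric votes for the model that scores best on it, and the model with the most votes (the mode) wins. The sketch below implements that reading in base R; break_ties_mode() and its higher_better argument are hypothetical, and the package's actual rule may differ.

```r
# Hypothetical sketch of ties_measure = "mode".
# metrics: numeric matrix, one row per tied model, one named column per metric.
# higher_better: named logical vector, TRUE when larger values are better.
break_ties_mode <- function(metrics, higher_better) {
  winners <- vapply(colnames(metrics), function(m) {
    v <- metrics[, m]
    if (higher_better[[m]]) which.max(v) else which.min(v)
  }, integer(1))
  votes <- tabulate(winners, nbins = nrow(metrics))
  which.max(votes)   # row index of the modal winner; vote ties go to the first row
}
```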
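The writeout, writeout_num, writeout_path, and restarting arguments describe a checkpoint-and-resume pattern. A minimal base-R sketch of that pattern follows. The function run_with_checkpoints(), the checkpoint file name, the saveRDS()/readRDS() format, and the stand-in work done per iteration are all assumptions, not the package's actual file layout.

```r
# Hypothetical checkpointing loop: the aggregated state is saved every
# writeout_num iterations, and restarting = TRUE resumes from the last save
# instead of i = 1. The file name "checkpoint.rds" is an assumption.
run_with_checkpoints <- function(iter, writeout_num, writeout_path,
                                 restarting = FALSE, writeout = TRUE) {
  ckpt <- file.path(writeout_path, "checkpoint.rds")
  state <- list(i = 0L, total = 0)
  if (restarting && file.exists(ckpt)) state <- readRDS(ckpt)
  if (state$i < iter) {
    for (i in seq.int(state$i + 1L, iter)) {
      state$total <- state$total + i       # stand-in for one iteration's work
      state$i <- i
      if (writeout && i %% writeout_num == 0) saveRDS(state, ckpt)
    }
  }
  state
}
```

If a run with writeout_num = 3 is interrupted after iteration 8, the last save is at iteration 6, so a restart repeats only iterations 7 and 8.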