en_kfold_model_grid_lim: en_kfold_model_grid_lim function

View source: R/enr functions.R

en_kfold_model_grid_lim    R Documentation

en_kfold_model_grid_lim function

Description

Runs k-fold cross-validation of elastic net regression (ENR) models, simultaneously estimating alpha, lambda, and the classification cutoff via a randomized grid. (DOCUMENTATION COMING; CURRENT DOCUMENTATION INCORRECT)
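As a rough illustration of what a randomized grid over tuning values combined with k-fold assignment might look like, here is a minimal base-R sketch. It is not the package's implementation: the object names (`grid`, `fold_id`), the example cutoffs, and the choice to shuffle the grid rows are all assumptions made for illustration.

```r
# Hypothetical sketch: build a shuffled (alpha, cutoff) grid and k-fold labels.
# None of these object names come from the package.
set.seed(123)

num_alpha <- 20
lr_cutoff <- c(0.3, 0.5, 0.7)  # example cutoffs, chosen arbitrarily
k <- 10
n <- 95  # number of rows in an imaginary data set

# Every combination of alpha (elastic net mixing value in [0, 1]) and cutoff
grid <- expand.grid(
  alpha  = seq(0, 1, length.out = num_alpha),
  cutoff = lr_cutoff
)

# Visit the combinations in random order
grid <- grid[sample(nrow(grid)), ]

# Assign each data row to one of k folds, as evenly as possible
fold_id <- sample(rep(seq_len(k), length.out = n))

nrow(grid)      # 60 combinations (20 alphas x 3 cutoffs)
table(fold_id)  # fold sizes differ by at most one
```

In a full tuning loop, each row of `grid` would be paired with a lambda search on the training folds and scored on the held-out fold; that part is omitted here.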

Usage

en_kfold_model_grid_lim(
  ddata,
  response_var,
  iter = 10,
  k = 10,
  num_alpha = 20,
  num_lambda = 100,
  seed = 123,
  fit_met = "accuracy",
  loo = FALSE,
  up_dn_samp = "none",
  eq_wt = FALSE,
  type_meas = "deviance",
  na_rm = TRUE,
  lr_cutoff = c(0.5),
  accuracy_modeling = FALSE,
  ties_measure = "mode",
  save_results = FALSE,
  writeout = T,
  writeout_num = num_alpha * 5,
  writeout_path =
    "I:/Lagisetty SDR Misuse/5. Identifiable Data/E. Database/treatment arm creation/treatment arm creation/enr grid writeout directory/",
  restarting = FALSE
)

Arguments

ddata

data frame containing the data to be modeled

response_var

string identifying the name of the outcome variable

iter

the number of iterations to use

k

the number of folds to use

seed

the seed value for allowing results to be reproduced

fit_met

string indicating the fit metric used to evaluate model performance. options are 'accuracy' (default), 'auroc', 'logloss', 'f1', 'ppv', 'npv', 'sens', 'spec', and 'bal_acc'

loo

boolean indicating whether 'leave one out' cross validation should be used

up_dn_samp

string indicating whether unbalanced classes should be balanced by upsampling the smaller class to the size of the larger class, or by downsampling the larger class to the size of the smaller class. can be 'upsamp', 'downsamp', or 'none' (default)

eq_wt

boolean indicating whether the 0/1 classes should be balanced with weights. you may want to use this if there is a severe class imbalance

type_meas

the 'type measure' passed to cv.glmnet (its type.measure argument), which sets the loss used for cross-validation when tuning lambda. this must be a value that cv.glmnet accepts

na_rm

boolean indicating whether missing values should be removed. default is TRUE

lr_cutoff

vector of cutoff values to test/tune for optimization. the default is 'c(0.5)', i.e. a cutoff equidistant from the two classes, which is typical in standard analyses

accuracy_modeling

switch determining if we need to break ties between optimal solutions

ties_measure

string indicating the method for breaking ties. default is 'mode', indicating that the model with the best performance across all fit metrics listed will win when model results are tied.

save_results

boolean indicating whether the unaggregated results should be saved and returned. default is FALSE, as this is typically too much material to save and could crash the R session

writeout

boolean indicating whether the progress of the function should be written out in real time so that it can be restarted if need be. the file will contain the aggregated results up to some iteration so that the process may be restarted without having to return to i = 1

writeout_num

the number of iterations between writeouts of the results. a higher number will run faster (because results are written out less often), while a lower number will run slower but will likely mean fewer iterations to rerun in the event of an interruption. default is num_alpha * 5, so when checking 20 alphas the function writes results every 100 iterations

writeout_path

string of the path directory for writing out results

restarting

boolean indicating whether the process is restarting. if it is, the function looks for a file in 'writeout_path', reads it in, and continues from where it left off
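The interplay of writeout, writeout_num, writeout_path, and restarting described above can be sketched in base R as follows. This is only an assumed pattern, not the function's actual code: the checkpoint file name, the `state` list layout, and the helper `run_grid()` are hypothetical, and `i^2` stands in for a real model fit.

```r
# Hypothetical checkpointing sketch; file name and object layout are assumptions.
writeout_path   <- tempdir()
checkpoint_file <- file.path(writeout_path, "grid_checkpoint.rds")
writeout_num    <- 5
total_iter      <- 12

run_grid <- function(restarting = FALSE, stop_at = total_iter) {
  if (restarting && file.exists(checkpoint_file)) {
    # Resume from the last state written to disk
    state <- readRDS(checkpoint_file)
  } else {
    state <- list(last_i = 0, results = numeric(0))
  }
  i <- state$last_i
  while (i < stop_at) {
    i <- i + 1
    state$results[i] <- i^2  # stand-in for one model fit
    state$last_i <- i
    # Write progress to disk every writeout_num iterations
    if (i %% writeout_num == 0) saveRDS(state, checkpoint_file)
  }
  state
}

interrupted <- run_grid(stop_at = 7)     # simulated interruption after i = 7
resumed     <- run_grid(restarting = TRUE)  # resumes from the i = 5 checkpoint
resumed$last_i
```

With writeout_num = 5, the simulated interruption at iteration 7 loses only iterations 6 and 7, which are recomputed on restart. That is the trade-off the writeout_num description refers to.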

Examples

en_kfold_model_grid_lim()
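The bare call above omits ddata and response_var, which have no defaults in the Usage signature. A fuller call might look like the sketch below. The toy data and its column names are made up, the argument values are arbitrary small settings, and it is an assumption that the function is exported from the highlandr package and accepts this input; given the note that the current documentation is incorrect, treat this as a starting point rather than a verified example. Setting writeout = FALSE avoids the default writeout_path, which points to a machine-specific directory.

```r
# Toy binary-outcome data; column names are made up for this illustration.
set.seed(123)
toy <- data.frame(
  x1 = rnorm(200),
  x2 = rnorm(200),
  x3 = rnorm(200)
)
toy$y <- rbinom(200, size = 1, prob = plogis(toy$x1 - 0.5 * toy$x2))

# Only attempt the call when the package is installed.
# Assumes the function is exported; this has not been verified.
if (requireNamespace("highlandr", quietly = TRUE)) {
  fit <- highlandr::en_kfold_model_grid_lim(
    ddata        = toy,
    response_var = "y",
    iter         = 2,
    k            = 5,
    num_alpha    = 5,
    lr_cutoff    = c(0.4, 0.5, 0.6),
    writeout     = FALSE
  )
}
```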

clmacleod/highlandr documentation built on April 17, 2025, 3:30 a.m.