stackBagg-internal: Algorithm 2: Procedure to obtain optimally the coefficients...
In pablogonzalezginestet/EnsBagg: Stacked IPCW Bagging

ipcw_ensbagg

R Documentation

Algorithm 2: Procedure to obtain the optimal coefficients to be used in Algorithm 1

Description

Obtain predictions

Compute weighted Brier Loss function for a single marker or a linear weighted combination of markers

Compute weighted Cross Entropy Loss function for a single marker or a linear weighted combination of markers
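As an illustration of the two weighted losses above, the following is a minimal sketch and not the package's own code. It assumes each loss is the IPC-weighted average of the pointwise loss, evaluated at the linear combination `Z %*% par`. The normalization by `sum(wts)` and the clipping constant `eps` are assumptions, and the names `weighted_brier_sketch` and `weighted_crossentropy_sketch` are hypothetical.

```r
# Hypothetical helpers for illustration; NOT the package's ipcw_brier() or
# ipcw_crossentropy(). Normalizing by sum(wts) is an assumption.
weighted_brier_sketch <- function(par, Z, y, wts) {
  stopifnot(length(par) == ncol(Z), length(y) == nrow(Z), length(wts) == nrow(Z))
  pred <- as.vector(Z %*% par)          # linear combination of the markers
  sum(wts * (y - pred)^2) / sum(wts)    # weighted mean squared error
}

weighted_crossentropy_sketch <- function(par, Z, y, wts, eps = 1e-8) {
  stopifnot(length(par) == ncol(Z), length(y) == nrow(Z), length(wts) == nrow(Z))
  pred <- as.vector(Z %*% par)
  pred <- pmin(pmax(pred, eps), 1 - eps)  # assumed clipping to keep log() finite
  -sum(wts * (y * log(pred) + (1 - y) * log(1 - pred))) / sum(wts)
}

# Toy data: two markers (columns of Z) for four individuals
Z   <- cbind(m1 = c(0.9, 0.2, 0.7, 0.1), m2 = c(0.8, 0.4, 0.6, 0.3))
y   <- c(1, 0, 1, 0)
wts <- c(1.0, 1.2, 0.0, 1.5)  # a zero weight drops that individual from the loss

weighted_brier_sketch(c(1, 0), Z, y, wts)             # first marker alone
weighted_brier_sketch(c(0.5, 0.5), Z, y, wts)         # equal-weight combination
weighted_crossentropy_sketch(c(0.5, 0.5), Z, y, wts)
```

Because both sketches take `par` as their first argument, they could be passed to a general-purpose optimizer such as `stats::optim` to search for combination weights.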

Obtain the lambda hyperparameter for the LASSO using cross-validation

Internal stackBagg helper functions

Compute the risk of misclassifying an individual (1-AUC), using as a marker a single prediction or a weighted linear combination of several predictions
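The quantity 1-AUC can be illustrated with a simplified sketch. It is not the package's `risk_auc`: it assumes a binary status vector `y` in place of the `ttilde`, `delta` and `tao` inputs, omits the `lambda` penalization, and weights each case-control pair by the product of the two individuals' weights. The name `one_minus_auc_sketch` is hypothetical.

```r
# Hypothetical helper for illustration; NOT the package's risk_auc().
# Assumptions: binary outcome y, no penalization, pair weight = w_case * w_control.
one_minus_auc_sketch <- function(par, Z, y, wts) {
  stopifnot(length(par) == ncol(Z), length(y) == nrow(Z), length(wts) == nrow(Z))
  score    <- as.vector(Z %*% par)   # marker: linear combination of predictions
  cases    <- which(y == 1)
  controls <- which(y == 0)
  # Compare every case with every control; ties count as one half
  diff       <- outer(score[cases], score[controls], "-")
  concordant <- (diff > 0) + 0.5 * (diff == 0)
  pair_wts   <- outer(wts[cases], wts[controls])
  1 - sum(pair_wts * concordant) / sum(pair_wts)
}

# Toy data: two markers for four individuals
Z   <- cbind(m1 = c(0.9, 0.2, 0.7, 0.1), m2 = c(0.8, 0.4, 0.6, 0.3))
y   <- c(1, 0, 1, 0)
wts <- c(1.0, 1.2, 0.5, 1.5)

one_minus_auc_sketch(c(1, 0), Z, y, wts)   # marker ranks all cases above controls
one_minus_auc_sketch(c(-1, 0), Z, y, wts)  # reversed marker
one_minus_auc_sketch(c(0, 0), Z, y, wts)   # constant marker, all pairs tied
```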

Predictions based on a library of Machine Learning procedures

Library of Machine Learning procedures

Predictions based on those Machine Learning procedures in the library that allow for weights to be specified as an argument of the R function. No bagging occurs. This group of algorithms is denoted as Native Weights

Library of Machine Learning procedures that allows for weights

A grid of values for hyperparameters used in the Real Data Application: InfCareHIV Register. This grid of values is an argument of the tuning parameter function in tune_parameter_ml.R

Usage

ipcw_ensbagg(folds, MLprocedures, fmla, tuneparams, tao, B = NULL, A,
  data, xnam, xnam.factor, xnam.cont, xnam.cont.gam, ens.library)

ipcw_genbagg(fmla, tuneparams, MLprocedures, traindata, testdata, B, A,
  xnam, xnam.factor, xnam.cont, xnam.cont.gam, ens.library)

ipcw_brier(par, Z, y, wts)

ipcw_crossentropy(par, Z, y, wts)

tune_lasso(folds, fmla, tao, data, xnam)

optimun_auc_coef(coef_init, lambda, data, Z, tao)

risk_auc(par, lambda, Z, data, tao)

MLprocedures(traindata, testdata, fmla, xnam, xnam.factor, xnam.cont,
  xnam.cont.gam, tuneparams, ens.library, i)

ML_list

MLprocedures_natively(traindata, testdata, fmla, xnam, xnam.factor,
  xnam.cont, xnam.cont.gam, tuneparams)

ML_list_natively

grid_parametersDataHIV(xnam, data, tao)

Arguments

`folds`	Number of folds
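`MLprocedures`	function that computes the predictions of the library of machine learning procedures (see `MLprocedures` under Usage)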
`fmla`	formula object, e.g. "E ~ x1+x2"
`tuneparams`	a list of tune parameters for each machine learning procedure
`tao`	time point of interest
`B`	number of bootstrap samples
`data`	a training data set
`xnam`	all covariates in the model
`xnam.factor`	categorical variables included in the model
`xnam.cont`	continuous variables included in the model
`xnam.cont.gam`	continuous variables to be included in the smoothing operator gam::s(,df)
`ens.library`	algorithms in the library
`traindata`	a training data set
`testdata`	a test data set
`par`	a vector of weights. Its length must be equal to the number of predictions included in Z
`Z`	a matrix that contains the predictions. Each column represents a single marker.
`y`	vector of response variable (binary).
`wts`	IPC weights
`coef_init`	starting values for the coefficients
`lambda`	penalization term. It is a positive scalar.
`i`	sample selected by bootstrap
`data`	a data frame that contains at least: ttilde = time to event, delta = event type, wts = IPC weights
`xnam`	a vector with the names of the covariates considered in the modeling

Format

An object of class list of length 8.

Details

These functions are internal helpers and are not intended to be called directly by users.

Value

a list with the predictions of each machine learning algorithm (id, predictions), the average AUC across folds for each of them, the optimal coefficients, an indicator of whether the optimization procedure has converged, and the value of the penalization term chosen

a matrix with the predictions on the test data set of each machine learning algorithm considered in MLprocedures

lambda to be used in the glmnet function
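For orientation, a cross-validated choice of lambda with the glmnet package might look like the sketch below. This is not the package's `tune_lasso`: the simulated data, the binomial family, the use of observation weights, and the choice of `lambda.min` are all assumptions made for illustration. The code only runs if glmnet is installed.

```r
# Illustrative sketch only; NOT the package's tune_lasso().
# Assumptions: binary outcome, weights passed to cv.glmnet, lambda.min as the choice.
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(1)
  n <- 200
  # Simulated covariates and binary outcome (hypothetical data)
  x   <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
  y   <- rbinom(n, 1, plogis(x[, 1] - 0.5 * x[, 2]))
  wts <- runif(n, 0.5, 2)  # stand-in for IPC weights

  # 5-fold cross-validation over glmnet's default lambda path (alpha = 1 is the LASSO)
  cv_fit <- glmnet::cv.glmnet(x, y, family = "binomial", weights = wts,
                              nfolds = 5, alpha = 1)
  lambda_cv <- cv_fit$lambda.min  # lambda with the smallest cross-validated error
  print(lambda_cv)
}
```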

a vector with the optimal AUC value and the optimal coefficient

1-AUC

a matrix of predictions where each column is the prediction of each algorithm based on the testdata

a list of Machine Learning functions

a matrix of predictions where each column is the prediction of each algorithm based on the testdata

a list of Machine Learning functions

a list with a grid of values for each hyperparameter:
`gam_param`	a vector containing the degrees of freedom 3 and 4
`lasso_param`	a grid of values for the shrinkage term lambda
`randomforest_param`	a two-column matrix: the first column denotes the num_trees parameter and the second column the mtry parameter
`knn_param`	a grid of positive integer values
`svm_param`	a three-column matrix: the first column denotes the cost parameter, the second column the gamma parameter and the third column the kernel. kernel=1 denotes "radial" and kernel=2 denotes "linear"
`nn_param`	a grid of positive integer values for the neurons
`bart_param`	a three-column matrix: the first column denotes the num_tree parameter, the second column the k parameter and the third column the q parameter
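A list with the structure described above could be assembled as in the following sketch. The element names and column layouts follow the description, but every numeric value is a placeholder chosen for illustration, not the grid used in the InfCareHIV application, and the object name `grid_sketch` is hypothetical.

```r
# Illustrative structure only; all grid values below are placeholders.
grid_sketch <- list(
  gam_param   = c(3, 4),                              # degrees of freedom
  lasso_param = seq(0.001, 0.1, length.out = 10),     # shrinkage term lambda
  randomforest_param = as.matrix(
    expand.grid(num_trees = c(50, 500), mtry = c(1, 2))
  ),
  knn_param = seq(5, 50, by = 5),                     # positive integers
  svm_param = as.matrix(
    # kernel: 1 = "radial", 2 = "linear"
    expand.grid(cost = c(0.01, 0.1, 1), gamma = c(0.01, 0.1), kernel = c(1, 2))
  ),
  nn_param   = 1:5,                                   # number of neurons
  bart_param = as.matrix(
    expand.grid(num_tree = c(50, 200), k = c(1, 2, 3), q = c(0.9, 0.99))
  )
)

str(grid_sketch, max.level = 1)
```

Using `expand.grid` gives one row per combination of values, which matches the matrix layouts described for `randomforest_param`, `svm_param` and `bart_param`.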

pablogonzalezginestet/EnsBagg documentation built on Aug. 25, 2023, 3:22 a.m.