View source: R/fx_modelResample.R
fx_modelResample | R Documentation |
Apply machine learning framework to specified dataset
fx_modelResample(
  df0,
  cv.type = NULL,
  covar = NULL,
  voi = NULL,
  outcome = NULL,
  model.type = NULL,
  nresample = 1,
  dthresh = 0.5,
  z.pred = F,
  n.cores = 20,
  balance.col = NULL,
  partitions = NULL
)
df0: data frame including all observations (data frame)
cv.type: cross-validation type ('loocv', 'ltocv', 'n-fold', 'numeric') (string)
covar: list of df0 column names for "covariate" (not of specific interest) features (string/list)
voi: list of df0 column names for variables/features of interest (string/list)
outcome: df0 column name for outcome measure to be predicted (string)
model.type: machine learning model ('rf', 'logistic', 'regression', 'rf.regression', 'svm') (string)
nresample: number of resamples (numeric)
dthresh: decision threshold (numeric)
z.pred: standardize predictive features (boolean)
n.cores: number of cores (parallel processes) (numeric/integer)
balance.col: df0 column name used for ensuring balanced columns
partitions: pre-defined train/test partitions
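To make the dthresh and z.pred arguments more concrete, here is a minimal base-R sketch. It does not call the package. The variable names, the toy values, the use of a strict ">" comparison against the threshold, and the choice to standardize test features with training-set mean and standard deviation are all assumptions made for illustration, not confirmed behaviour of fx_modelResample.

```r
## Hypothetical predicted probabilities of the positive class (illustrative only)
pred.prob <- c(0.12, 0.48, 0.50, 0.73, 0.91)
dthresh <- 0.5

## Assumption: probabilities strictly above dthresh map to the positive class
pred.class <- ifelse(pred.prob > dthresh, 'positive', 'negative')

## Hypothetical train/test values for one feature
train.f1 <- c(1, 2, 3, 4, 5)
test.f1 <- c(3, 6)

## Assumption: z-scoring uses the mean and sd estimated on the training data,
## so that no information from the test data leaks into the scaling
mu <- mean(train.f1)
sigma <- sd(train.f1)
train.z <- (train.f1 - mu) / sigma
test.z <- (test.f1 - mu) / sigma

print(pred.class)
print(round(test.z, 3))
```

Raising dthresh in this sketch makes the positive class harder to assign, which trades sensitivity for specificity.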
A list of length five, containing the following elements:
"perfMetrics": model performance metrics for each individual fold, plus the "across" and "within" summaries.
"across": sum or mean of metric across folds
"within": mean of metric across folds
TP: true positive
FP: false positive
TN: true negative
FN: false negative
sens: sensitivity
spec: specificity
ppv: positive predictive value
npv: negative predictive value
acc: accuracy
auc.ROC: area under the ROC curve
optThresh: optimal decision threshold determined from training data
"cmat.covar": confusion matrix of covariate model (at "dthresh" decision threshold)
"cmat.full": confusion matrix of full model (at "dthresh" decision threshold)
"df.allfolds": data frame for test-related model predictions
orig.df.row: row in original data frame for specific observation
fold: fold assignment
pred.prob.covar: predicted probability of class membership from covariate model
pred.prob.full: predicted probability of class membership from full model
pred.class.covar: predicted class from covariate model
pred.class.full: predicted class from full model
actual.class: actual class membership
"parameters": list of relevant specified parameters
"sample.type": cross-validation sampling procedure
"class.levels": class levels
"model.type": machine learning model framework
"covar": specified covariates
"voi": specified variables of interest
"outcome": name of class being predicted
"formula.covar": formula object for covariate model
"formula.full": formula object for full model
"data.frame": data frame specified (CURRENTLY NOT CORRECTLY SPECIFIED)
"cmat.descrip": key for how to understand confusion matrices
"negative.class": class assigned to probability = 0
"positive.class": class assigned to probability = 1
"dthresh": decision threshold
"z.pred": whether z-scoring of features is specified
"nresample": number of resamples
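The performance metrics listed under "perfMetrics" follow the usual confusion-matrix definitions. The sketch below computes them in base R from toy data. It is not the package's implementation: the data, the variable names, the strict ">" threshold rule, and the rank-based (Mann-Whitney) AUC formula are assumptions chosen for illustration.

```r
## Toy data (illustrative only): actual classes and predicted probabilities
actual <- c('HC', 'HC', 'HC', 'MDD', 'MDD', 'MDD', 'MDD', 'HC')
prob   <- c(0.2, 0.6, 0.3, 0.8, 0.7, 0.4, 0.9, 0.1)
positive <- 'MDD'
negative <- 'HC'
dthresh <- 0.5

## Assumption: probability above dthresh is classified as the positive class
pred <- ifelse(prob > dthresh, positive, negative)

## Confusion-matrix counts
TP <- sum(pred == positive & actual == positive)
FP <- sum(pred == positive & actual == negative)
TN <- sum(pred == negative & actual == negative)
FN <- sum(pred == negative & actual == positive)

## Derived metrics
sens <- TP / (TP + FN)   # sensitivity
spec <- TN / (TN + FP)   # specificity
ppv  <- TP / (TP + FP)   # positive predictive value
npv  <- TN / (TN + FN)   # negative predictive value
acc  <- (TP + TN) / length(actual)

## Rank-based AUC: proportion of positive/negative pairs ranked correctly
n.pos <- sum(actual == positive)
n.neg <- sum(actual == negative)
auc.ROC <- (sum(rank(prob)[actual == positive]) - n.pos * (n.pos + 1) / 2) /
  (n.pos * n.neg)

print(c(TP = TP, FP = FP, TN = TN, FN = FN))
print(c(sens = sens, spec = spec, ppv = ppv, npv = npv, acc = acc, auc.ROC = auc.ROC))
```

Note that the threshold-dependent metrics (sens, spec, ppv, npv, acc) change with dthresh, whereas the AUC depends only on the ranking of the predicted probabilities.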
#### Generate data ####
n <- 100
set.seed(1)
group <- factor(sample(c('MDD','HC'),n,replace=T))
age <- rnorm(n,25,5)
sex <- factor(sample(c('male','female'),n,replace=T))
rand.vals1 <- rnorm(n,0,0.75)
set.seed(2)
rand.vals2 <- rnorm(n,0,0.75)
dd <- data.frame(group = group,
                 age = age,
                 sex = sex,
                 f1 = rand.vals1 + as.numeric(group),
                 f2 = rand.vals2)

#### MODEL EXAMPLE 1 #####

## covariates
covar <- c('age','sex')

## variables of interest
voi <- c('f1','f2')

## class outcome
y <- 'group'

## resamples and permutations
nresample <- 10
nperm <- 10
n.cores <- 1 ## 10

## fit classification model
modelObj <- fx_modelResample(df0 = dd,
                             cv.type = '5-fold',
                             covar = covar,
                             voi = voi,
                             outcome = y,
                             model.type = 'rf',
                             nresample = nresample,
                             dthresh = 0.5,
                             z.pred = F,
                             balance.col = y,
                             n.cores = n.cores)

## determine overall model performance
modelPerfObj <- fx_modelResamplePerf(modelResampleObj = modelObj)

## permutation testing
permObj <- fx_perm(df0 = dd,
                   modelObj = modelObj,
                   nperm = nperm,
                   n.cores = n.cores)

## determine permutation test performance
permPerfObj <- fx_permPerf(permObj = permObj,
                           modelResamplePerf = modelPerfObj)

## Summary of performance measures based on observed data
modelPerfObj$df.summary

## Outcome metrics for each resample
modelPerfObj$df.iter

## Summary of permutation test outcomes
permPerfObj$df.summary

## Outcome metrics for each permutation
permPerfObj$df.iter

## create roc curve plot
fx_rocPlot(modelObj = modelObj,
           modelPerfObj = modelPerfObj,
           permPerfObj = permPerfObj,
           title.text = 'My Title')
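The example passes balance.col = y together with cv.type = '5-fold'. One plausible reading is that folds are balanced with respect to that column. The base-R sketch below shows a generic stratified fold assignment under that reading. How fx_modelResample actually builds its partitions is not documented here, so the approach, the variable names (k, fold, idx), and the seed are assumptions for illustration only.

```r
## Illustrative class labels, generated the same way as in the example data
set.seed(1)
group <- factor(sample(c('MDD', 'HC'), 100, replace = TRUE))
k <- 5   # number of folds (hypothetical)

## Assign folds separately within each class so every fold receives
## a near-equal share of each class
fold <- integer(length(group))
for (lv in levels(group)) {
  idx <- which(group == lv)
  idx <- idx[sample.int(length(idx))]              # shuffle within the class
  fold[idx] <- rep_len(seq_len(k), length(idx))    # deal out folds 1..k in turn
}

## Class counts per fold differ by at most one within each class
fold.table <- table(fold, group)
print(fold.table)
```

Dealing fold labels out cyclically within each class keeps the class proportions in every test fold close to those of the full sample, which matters most when one class is rare.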