View source: R/fx_modelResample.R
| fx_modelResample | R Documentation |
Apply machine learning framework to specified dataset
fx_modelResample( df0, cv.type = NULL, covar = NULL, voi = NULL, outcome = NULL, model.type = NULL, nresample = 1, dthresh = 0.5, z.pred = F, n.cores = 20, balance.col = NULL, partitions = NULL )
| df0 | data frame including all observations (data frame) |
| cv.type | cross-validation type ('loocv', 'ltocv', 'n-fold', 'numeric') (string) |
| covar | list of df0 column names for "covariate" (not of specific interest) features (string/list) |
| voi | list of df0 column names for variables/features of interest (string/list) |
| outcome | df0 column name for outcome measure to be predicted (string) |
| model.type | machine learning model ('rf', 'logistic', 'regression', 'rf.regression', 'svm') (string) |
| nresample | number of resamples (numeric) |
| dthresh | decision threshold (numeric) |
| z.pred | standardize predictive features (boolean) |
| n.cores | number of cores (parallel processes) (numeric/integer) |
| balance.col | df0 column name used for ensuring balanced columns |
| partitions | pre-defined train/test partitions |
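The 'n-fold' setting of cv.type together with balance.col suggests fold assignments that keep the levels of one column evenly spread across folds. The following is a minimal sketch of that general idea in base R. The helper `assign_balanced_folds` is hypothetical and not part of this package, and the format that the partitions argument expects is not documented here, so the result should not be passed to fx_modelResample as-is.

```r
# Hypothetical helper (NOT part of the package): assign each row to one of
# k folds so that every level of a chosen column is spread evenly over folds.
assign_balanced_folds <- function(df, balance.col, k = 5) {
  fold <- integer(nrow(df))
  for (lev in unique(df[[balance.col]])) {
    idx <- which(df[[balance.col]] == lev)
    # shuffle the rows of this level, then deal them out to folds 1..k in turn
    idx <- idx[sample.int(length(idx))]
    fold[idx] <- rep_len(seq_len(k), length(idx))
  }
  fold
}

# Toy data: 40 'MDD' and 60 'HC' observations (made up for illustration)
toy <- data.frame(group = factor(rep(c('MDD', 'HC'), times = c(40, 60))))
set.seed(1)
toy$fold <- assign_balanced_folds(toy, balance.col = 'group', k = 5)

# Cross-tabulate class by fold to check the balance
fold.table <- table(toy$group, toy$fold)
print(fold.table)
```

Dealing shuffled rows out level by level keeps the class proportions in every fold close to those of the full data set, which matters most when one class is much smaller than the other.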
A list of length five, containing the following elements:
"perfMetrics": model performance metrics for each individual fold, plus "across" and "within" summaries.
  "across": sum or mean of metric across folds
  "within": mean of metric across folds
  TP: true positive
  FP: false positive
  TN: true negative
  FN: false negative
  sens: sensitivity
  spec: specificity
  ppv: positive predictive value
  npv: negative predictive value
  acc: accuracy
  auc.ROC: area under the ROC curve
  optThresh: optimal decision threshold determined from training data
"cmat.covar": confusion matrix of covariate model (at "dthresh" decision threshold)
"cmat.full": confusion matrix of full model (at "dthresh" decision threshold)
"df.allfolds": data frame of test-set model predictions
  orig.df.row: row in original data frame for specific observation
  fold: fold assignment
  pred.prob.covar: predicted probability of class membership from covariate model
  pred.prob.full: predicted probability of class membership from full model
  pred.class.covar: predicted class from covariate model
  pred.class.full: predicted class from full model
  actual.class: actual class membership
"parameters": list of relevant specified parameters
  "sample.type": cross-validation sampling procedure
  "class.levels": class levels
  "model.type": machine learning model framework
  "covar": specified covariates
  "voi": specified variables of interest
  "outcome": name of class being predicted
  "formula.covar": formula object for covariate model
  "formula.full": formula object for full model
  "data.frame": data frame specified (CURRENTLY NOT CORRECTLY SPECIFIED)
  "cmat.descrip": key for how to understand confusion matrices
  "negative.class": class assigned to probability = 0
  "positive.class": class assigned to probability = 1
  "dthresh": decision threshold
  "z.pred": whether z-scoring of features is specified
  "nresample": number of resamples
#### Generate data ####
n <- 100
set.seed(1)
group <- factor(sample(c('MDD', 'HC'), n, replace = TRUE))
age <- rnorm(n, 25, 5)
sex <- factor(sample(c('male', 'female'), n, replace = TRUE))
rand.vals1 <- rnorm(n, 0, 0.75)
set.seed(2)
rand.vals2 <- rnorm(n, 0, 0.75)
## f1 depends on group (informative feature); f2 is pure noise
dd <- data.frame(group = group,
                 age = age,
                 sex = sex,
                 f1 = rand.vals1 + as.numeric(group),
                 f2 = rand.vals2)
#### MODEL EXAMPLE 1 #####
## covariates
covar <- c('age','sex')
## variables of interest
voi <- c('f1','f2')
## class outcome
y <- 'group'
## resamples and permutations
nresample <- 10
nperm <- 10
n.cores <- 1 ## 10
## fit classification model
modelObj <- fx_modelResample(df0 = dd,
                             cv.type = '5-fold',
                             covar = covar,
                             voi = voi,
                             outcome = y,
                             model.type = 'rf',
                             nresample = nresample,
                             dthresh = 0.5,
                             z.pred = FALSE,
                             balance.col = y,
                             n.cores = n.cores)
## determine overall model performance
modelPerfObj <- fx_modelResamplePerf(modelResampleObj = modelObj)
## permutation testing
permObj <- fx_perm(df0 = dd, modelObj = modelObj, nperm = nperm, n.cores = n.cores)
## determine permutation test performance
permPerfObj <- fx_permPerf(permObj = permObj, modelResamplePerf = modelPerfObj)
## Summary of performance measures based on observed data
modelPerfObj$df.summary
## Outcome metrics for each resample
modelPerfObj$df.iter
## Summary of permutation test outcomes
permPerfObj$df.summary
## Outcome metrics for each permutation
permPerfObj$df.iter
## create roc curve plot
fx_rocPlot(modelObj = modelObj, modelPerfObj = modelPerfObj, permPerfObj = permPerfObj, title.text = 'My Title')
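The example above compares observed performance against a permutation null distribution. One common way to summarize such a comparison is an empirical p-value, sketched below in base R. The numbers are made up for illustration, and whether fx_permPerf computes its summary in exactly this way is not stated in this documentation.

```r
# Made-up values for illustration only (10 permutations, as in nperm <- 10)
obs.acc  <- 0.60
perm.acc <- c(0.48, 0.52, 0.55, 0.45, 0.61, 0.50, 0.47, 0.58, 0.53, 0.49)

# Empirical one-sided p-value: share of permutations performing at least as
# well as the observed model; the +1 terms count the observed value itself
# so the estimate can never be exactly zero
p.val <- (sum(perm.acc >= obs.acc) + 1) / (length(perm.acc) + 1)
print(p.val)
```

With only 10 permutations the smallest attainable p-value under this formula is 1/11, so a larger nperm is needed before small p-values can be resolved.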