fx_modelResample: Apply Machine Learning Framework
In fishpm/nruPredict: Miscellaneous function for assessing predictive accuracy

fx_modelResample

R Documentation

Description

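Applies a machine learning framework to the specified dataset: a covariate model and a full model (covariates plus variables of interest) are fit and evaluated under cross-validation, optionally repeated over multiple resamples.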

Usage

fx_modelResample(
  df0,
  cv.type = NULL,
  covar = NULL,
  voi = NULL,
  outcome = NULL,
  model.type = NULL,
  nresample = 1,
  dthresh = 0.5,
  z.pred = F,
  n.cores = 20,
  balance.col = NULL,
  partitions = NULL
)

Arguments

`df0`	data frame including all observations (data frame)
`cv.type`	cross-validation type ('loocv', 'ltocv', 'n-fold' such as '5-fold', 'numeric') (string)
`covar`	list of df0 column names for "covariate" (not of specific interest) features (string/list)
`voi`	list of df0 column names for variables/features of interest (string/list)
`outcome`	df0 column name for outcome measure to be predicted (string)
`model.type`	machine learning model ('rf', 'logistic', 'regression', 'rf.regression', 'svm') (string)
`nresample`	number of resamples (numeric)
`dthresh`	decision threshold (numeric)
`z.pred`	standardize predictive features (boolean)
`n.cores`	number of cores (parallel processes) (numeric/integer)
`balance.col`	df0 column name used for ensuring balanced columns
`partitions`	pre-defined train/test partitions
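
The page does not describe how `balance.col` is used internally. One common reading is that fold assignment is stratified, so that each level of the balancing column is spread evenly across folds. The sketch below illustrates that idea in base R only; `assign_balanced_folds` and the `toy` data frame are hypothetical names that are not part of nruPredict, and the package's actual procedure may differ.

```r
## Hypothetical helper (NOT part of nruPredict): assign fold labels so that
## each level of a balancing column is spread evenly across n.folds folds.
assign_balanced_folds <- function(df, balance.col, n.folds = 5) {
  fold <- integer(nrow(df))
  for (lev in as.character(unique(df[[balance.col]]))) {
    idx <- which(df[[balance.col]] == lev)
    ## shuffle the rows of this level, then deal them out to folds in turn
    idx <- idx[sample.int(length(idx))]
    fold[idx] <- rep_len(seq_len(n.folds), length(idx))
  }
  fold
}

set.seed(1)
## toy data: 40 'MDD' and 60 'HC' observations (illustrative only)
toy <- data.frame(group = factor(rep(c('MDD', 'HC'), times = c(40, 60))))
toy$fold <- assign_balanced_folds(toy, balance.col = 'group', n.folds = 5)

## rows: group level, columns: fold, cells: number of observations
fold.table <- table(toy$group, toy$fold)
print(fold.table)
```

With group sizes divisible by the number of folds, every fold receives the same number of observations from each group; otherwise fold counts within a group differ by at most one.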

Value

A list of length five, containing the following elements:

"perfMetrics": model performance metrics for each individual fold, plus the "across" and "within" summaries.
"across": sum or mean of metric across folds
"within": mean of metric across folds
- TP: true positive
- FP: false positive
- TN: true negative
- FN: false negative
- sens: sensitivity
- spec: specificity
- ppv: positive predictive value
- npv: negative predictive value
- acc: accuracy
- auc.ROC: area under the ROC curve
- optThresh: optimal decision threshold determined from training data
"cmat.covar": confusion matrix of covariate model (at "dthresh" decision threshold)
"cmat.full": confusion matrix of full model (at "dthresh" decision threshold)
"df.allfolds": data frame for test-related model predictions
- orig.df.row: row in original data frame for specific observation
- fold: fold assignment
- pred.prob.covar: predicted probability of class membership from covariate model
- pred.prob.full: predicted probability of class membership from full model
- pred.class.covar: predicted class from covariate model
- pred.class.full: predicted class from full model
- actual.class: actual class membership
"parameters": list of relevant specified parameters
- "sample.type": cross-validation sampling procedure
- "class.levels": class levels
- "model.type": machine learning model framework
- "covar": specified covariates
- "voi": specified variables of interest
- "outcome": name of class being predicted
- "formula.covar": formula object for covariate model
- "formula.full": formula object for full model
- "data.frame": data frame specified (CURRENTLY NOT CORRECTLY SPECIFIED)
- "cmat.descrip": key for how to understand confusion matrices ()
- "negative.class": class assigned to probability = 0
- "positive.class": class assigned to probability = 1
- "dthresh": decision threshold
- "z.pred": whether z-scoring of features is specified
- "nresample": number of resamples
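
As a rough illustration of how the confusion-matrix counts and the derived metrics listed above relate to predicted probabilities and "dthresh", the following base R sketch uses made-up values. It assumes that a probability strictly above the threshold is labelled as the positive class; whether the package uses a strict or non-strict comparison is not stated here. auc.ROC and optThresh are omitted.

```r
## Toy predictions; values are illustrative only, not package output.
pred.prob    <- c(0.9, 0.8, 0.6, 0.4, 0.3, 0.7, 0.2, 0.1)
actual.class <- c(1,   1,   1,   1,   0,   0,   0,   0)  # 1 = positive class
dthresh <- 0.5

## Assumption: probabilities strictly above dthresh are labelled positive.
pred.class <- as.integer(pred.prob > dthresh)

## confusion-matrix counts
TP <- sum(pred.class == 1 & actual.class == 1)
FP <- sum(pred.class == 1 & actual.class == 0)
TN <- sum(pred.class == 0 & actual.class == 0)
FN <- sum(pred.class == 0 & actual.class == 1)

## standard definitions of the derived metrics
metrics <- c(sens = TP / (TP + FN),
             spec = TN / (TN + FP),
             ppv  = TP / (TP + FP),
             npv  = TN / (TN + FN),
             acc  = (TP + TN) / (TP + FP + TN + FN))
print(c(TP = TP, FP = FP, TN = TN, FN = FN))
print(metrics)
```

Changing `dthresh` in this sketch shifts observations between the predicted classes, which trades sensitivity against specificity.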

Examples

#### Load package and generate data ####
library(nruPredict)
n <- 100

set.seed(1)
group <- factor(sample(c('MDD','HC'),n,replace=TRUE))
age <- rnorm(n,25,5)
sex <- factor(sample(c('male','female'),n,replace=TRUE))
rand.vals1 <- rnorm(n,0,0.75)
set.seed(2)
rand.vals2 <- rnorm(n,0,0.75)
dd <- data.frame(group = group,
                 age = age,
                 sex = sex,
                 f1 = rand.vals1 + as.numeric(group),
                 f2 = rand.vals2)

#### MODEL EXAMPLE 1 #####
## covariates
covar <- c('age','sex')
## variables of interest
voi <- c('f1','f2')
## class outcome
y <- 'group'

## resamples and permutations
nresample <- 10
nperm <- 10
n.cores <- 1 ## increase (e.g., 10) to run in parallel

## fit classification model
modelObj <- fx_modelResample(df0 = dd, 
                             cv.type = '5-fold',
                             covar = covar, 
                             voi = voi,  
                             outcome = y,
                             model.type = 'rf',
                             nresample = nresample, 
                             dthresh = 0.5,
                             z.pred = FALSE,
                             balance.col = y,
                             n.cores = n.cores)

## determine overall model performance
modelPerfObj <- fx_modelResamplePerf(modelResampleObj = modelObj)
## permutation testing
permObj <- fx_perm(df0 = dd, modelObj = modelObj, nperm = nperm, n.cores = n.cores)
## determine permutation test performance
permPerfObj <- fx_permPerf(permObj = permObj, modelResamplePerf = modelPerfObj)

## Summary of performance measures based on observed data
modelPerfObj$df.summary
## Outcome metrics for each resample
modelPerfObj$df.iter
## Summary of permutation test outcomes
permPerfObj$df.summary
## Outcome metrics for each permutation
permPerfObj$df.iter
## create roc curve plot
fx_rocPlot(modelObj = modelObj, modelPerfObj = modelPerfObj, permPerfObj = permPerfObj, title.text = 'My Title')

fishpm/nruPredict documentation built on July 12, 2022, 3:22 p.m.