fx_modelResample: Apply Machine Learning Framework

fx_modelResample    R Documentation

Description

Apply machine learning framework to specified dataset

Usage

fx_modelResample(
  df0,
  cv.type = NULL,
  covar = NULL,
  voi = NULL,
  outcome = NULL,
  model.type = NULL,
  nresample = 1,
  dthresh = 0.5,
  z.pred = F,
  n.cores = 20,
  balance.col = NULL,
  partitions = NULL
)

Arguments

df0

data frame including all observations (data frame)

cv.type

cross-validation type: 'loocv', 'ltocv', 'n-fold' (with n given as a number, e.g. '5-fold'), or 'numeric' (string)

covar

list of df0 column names for "covariate" (not of specific interest) features (string/list)

voi

list of df0 column names for variables/features of interest (string/list)

outcome

df0 column name for outcome measure to be predicted (string)

model.type

machine learning model ('rf', 'logistic', 'regression', 'rf.regression', 'svm') (string)

nresample

number of resamples (numeric)

dthresh

decision threshold (numeric)

z.pred

standardize predictive features (boolean)

n.cores

number of cores (parallel processes) (numeric/integer)

balance.col

df0 column name used for ensuring balanced columns

partitions

pre-defined train/test partitions
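
As a rough illustration of how covar, voi and outcome relate to the covariate model and the full model named under Value, the two formulas could be assembled with base R's reformulate(). This is a minimal sketch under that assumption, not the package's own code; the column names are taken from the Examples section.

```r
# Hypothetical illustration: not taken from the nruPredict source.
covar   <- c("age", "sex")   # covariates (not of specific interest)
voi     <- c("f1", "f2")     # variables of interest
outcome <- "group"           # outcome to be predicted

# Covariate model: outcome predicted from the covariates only
formula.covar <- reformulate(covar, response = outcome)

# Full model: outcome predicted from covariates plus variables of interest
formula.full <- reformulate(c(covar, voi), response = outcome)

print(formula.covar)
print(formula.full)
```

Comparing the two models on the same test folds indicates how much the variables of interest add beyond the covariates.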

Value

A list of length five, containing the following elements:

  • "perfMetrics": model performance metrics for each individual fold, plus "across" and "within" summaries.
    "across": sum or mean of metric across folds
    "within": mean of metric across folds

    • TP: true positive

    • FP: false positive

    • TN: true negative

    • FN: false negative

    • sens: sensitivity

    • spec: specificity

    • ppv: positive predictive value

    • npv: negative predictive value

    • acc: accuracy

    • auc.ROC: area under the ROC curve

    • optThresh: optimal decision threshold determined from training data

  • "cmat.covar": confusion matrix of covariate model (at "dthresh" decision threshold)

  • "cmat.full": confusion matrix of full model (at "dthresh" decision threshold)

  • "df.allfolds": data frame for test-related model predictions

    • orig.df.row: row in original data frame for specific observation

    • fold: fold assignment

    • pred.prob.covar: predicted probability of class membership from covariate model

    • pred.prob.full: predicted probability of class membership from full model

    • pred.class.covar: predicted class from covariate model

    • pred.class.full: predicted class from full model

    • actual.class: actual class membership

  • "parameters": list of relevant specified parameters

    • "sample.type": cross-validation sampling procedure

    • "class.levels": class levels

    • "model.type": machine learning model framework

    • "covar": specified covariates

    • "voi": specified variables of interest

    • "outcome": name of class being predicted

    • "formula.covar": formula object for covariate model

    • "formula.full": formula object for full model

    • "data.frame": data frame specified (CURRENTLY NOT CORRECTLY SPECIFIED)

    • "cmat.descrip": key for interpreting the confusion matrices

    • "negative.class": class assigned to probability = 0

    • "positive.class": class assigned to probability = 1

    • "dthresh": decision threshold

    • "z.pred": whether z-scoring of features is specified

    • "nresample": number of resamples
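
The threshold-based metrics listed above follow the standard confusion-matrix definitions. The sketch below shows how they can be derived from predicted probabilities at a given dthresh. The data are hypothetical, and the rule "probability greater than dthresh is the positive class" is an assumption; the package may handle ties or class ordering differently.

```r
# Hypothetical predictions and labels: not output from fx_modelResample.
pred.prob      <- c(0.91, 0.35, 0.62, 0.08, 0.77, 0.45, 0.20, 0.58)
actual.class   <- c("MDD", "HC", "MDD", "HC", "HC", "MDD", "HC", "MDD")
dthresh        <- 0.5
negative.class <- "HC"    # class assigned to probability = 0
positive.class <- "MDD"   # class assigned to probability = 1

# Assumed decision rule: probabilities above dthresh map to the positive class
pred.class <- ifelse(pred.prob > dthresh, positive.class, negative.class)

# Confusion-matrix counts
TP <- sum(pred.class == positive.class & actual.class == positive.class)
FP <- sum(pred.class == positive.class & actual.class == negative.class)
TN <- sum(pred.class == negative.class & actual.class == negative.class)
FN <- sum(pred.class == negative.class & actual.class == positive.class)

# Derived metrics
sens <- TP / (TP + FN)               # sensitivity
spec <- TN / (TN + FP)               # specificity
ppv  <- TP / (TP + FP)               # positive predictive value
npv  <- TN / (TN + FN)               # negative predictive value
acc  <- (TP + TN) / length(actual.class)

print(c(TP = TP, FP = FP, TN = TN, FN = FN))
print(c(sens = sens, spec = spec, ppv = ppv, npv = npv, acc = acc))
```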

Examples

#### Generate data ####
n <- 100

set.seed(1)
group <- factor(sample(c('MDD','HC'),n,replace=TRUE))
age <- rnorm(n,25,5)
sex <- factor(sample(c('male','female'),n,replace=TRUE))
rand.vals1 <- rnorm(n,0,0.75)
set.seed(2)
rand.vals2 <- rnorm(n,0,0.75)
dd <- data.frame(group = group,
                 age = age,
                 sex = sex,
                 f1 = rand.vals1 + as.numeric(group),
                 f2 = rand.vals2)

#### MODEL EXAMPLE 1 #####
## covariates
covar <- c('age','sex')
## variables of interest
voi <- c('f1','f2')
## class outcome
y <- 'group'

## resamples and permutations
nresample <- 10
nperm <- 10
n.cores <- 1 ## increase (e.g., 10) to run in parallel

## fit classification model
modelObj <- fx_modelResample(df0 = dd, 
                             cv.type = '5-fold',
                             covar = covar, 
                             voi = voi,  
                             outcome = y,
                             model.type = 'rf',
                             nresample = nresample, 
                             dthresh = 0.5,
                             z.pred = FALSE,
                             balance.col = y,
                             n.cores = n.cores)

## determine overall model performance
modelPerfObj <- fx_modelResamplePerf(modelResampleObj = modelObj)
## permutation testing
permObj <- fx_perm(df0 = dd, modelObj = modelObj, nperm = nperm, n.cores = n.cores)
## determine permutation test performance
permPerfObj <- fx_permPerf(permObj = permObj, modelResamplePerf = modelPerfObj)

## Summary of performance measures based on observed data
modelPerfObj$df.summary
## Outcome metrics for each resample
modelPerfObj$df.iter
## Summary of permutation test outcomes
permPerfObj$df.summary
## Outcome metrics for each permutation
permPerfObj$df.iter
## create roc curve plot
fx_rocPlot(modelObj = modelObj,
           modelPerfObj = modelPerfObj,
           permPerfObj = permPerfObj,
           title.text = 'My Title')
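
For readers unfamiliar with permutation testing, the following sketch shows one common way to turn an observed metric and a set of permuted-label metrics into a p-value. It is a generic illustration in base R with hypothetical values, not a description of how fx_perm or fx_permPerf compute their summaries.

```r
# Hypothetical values: not produced by fx_perm or fx_permPerf.
set.seed(1)
obs.auc  <- 0.80                    # metric from the observed data
null.auc <- runif(1000, 0.3, 0.7)   # metrics from models fit to permuted labels

# One-sided permutation p-value with the usual +1 correction,
# so the estimate can never be exactly zero
p.value <- (sum(null.auc >= obs.auc) + 1) / (length(null.auc) + 1)
print(p.value)
```

With only nperm = 10 permutations, as in the example above, the smallest p-value obtainable under this definition is 1/11, so a larger nperm is usually needed for inference.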

fishpm/nruPredict documentation built on July 12, 2022, 3:22 p.m.