Diagnostic accuracy of classification models.

Description

Estimation of misclassification rate, sensitivity, specificity and AUC based on cross-validation (CV) or various bootstrap techniques.
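
For orientation, the four quantities can be computed by hand from a vector of predicted probabilities. The following is a minimal base R sketch on made-up data, not Daim's internal code; the helper names acc_at and auc_rank are hypothetical, and the rule "call an observation positive when its probability is at least the cutoff" is an assumption.

```r
# Toy data (made up): true classes and predicted probabilities of "pos".
truth <- factor(c("pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"),
                levels = c("neg", "pos"))
prob  <- c(0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7)

# Sensitivity, specificity and misclassification rate at a single cutoff.
# Assumption: an observation is classified "pos" when prob >= cutoff.
acc_at <- function(truth, prob, cutoff = 0.5) {
  pred_pos <- prob >= cutoff
  is_pos   <- truth == "pos"
  c(sens  = sum(pred_pos & is_pos) / sum(is_pos),
    spec  = sum(!pred_pos & !is_pos) / sum(!is_pos),
    error = mean(pred_pos != is_pos))
}

# AUC via the rank-sum (Mann-Whitney) statistic: the proportion of
# (pos, neg) pairs in which the positive observation scores higher.
auc_rank <- function(truth, prob) {
  is_pos <- truth == "pos"
  n_pos  <- sum(is_pos)
  n_neg  <- sum(!is_pos)
  r      <- rank(prob)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

res <- acc_at(truth, prob, cutoff = 0.5)
auc <- auc_rank(truth, prob)
```

Computed on the training data itself, these values are typically too optimistic, which is why they are estimated with cross-validation or the bootstrap.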

Usage

Daim(formula, model = NULL, data = NULL, control = Daim.control(),			
     thres = seq(0, 1, by = 0.01), cutoff = 0.5, labpos = "1", returnSample = FALSE,
     cluster = NULL, seed.cluster = NULL, multicore = FALSE, ...)

Arguments

formula

a formula of the form y ~ x1 + x2 + ..., where y must be a factor and x1, x2, ... are numeric variables or factors.

model

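a function implementing the modelling technique whose error rate is to be estimated. As in the examples below, it takes the arguments formula, train and test, fits the model on train, and returns the predicted probability of the "pos" class for each observation in test.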

data

an optional data frame containing the variables in the model (training data).

control

See Daim.control.

thres

a numeric vector with the cutoff values.

cutoff

the cutoff value for error estimation. This can be a numeric value or a character string:
"cv" - the optimal cut-point corresponding to cv estimation of the sensitivity and the specificity.
"loob" - the optimal cut-point corresponding to loob estimation of the sensitivity and the specificity.
"0.632" - the optimal cut-point corresponding to 0.632 estimation of the sensitivity and the specificity.
"0.632+" - the optimal cut-point corresponding to 0.632+ estimation of the sensitivity and the specificity.

labpos

a character string giving the level of the response variable that defines a "positive" event. Observations with this level are relabelled "pos" and all others "neg".

returnSample

a logical value for saving the data from each sample.

cluster

a cluster object (for example, one created with makeCluster from the parallel package), used if the computation is to run in parallel on a cluster.

seed.cluster

an integer value used as seed for the RNG.

multicore

a logical indicating whether multiple cores (if available) should be used for the computations.

...

additional parameters passed to clusterApplyLB or mclapply.
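
The contract for the model argument described above can be illustrated with a different learner. This is a sketch only: myglm is a hypothetical wrapper using logistic regression, it assumes the response has already been relabelled to "pos"/"neg" (as described under labpos), and it is exercised here on simulated data rather than through Daim.

```r
# Hypothetical wrapper: logistic regression as the modelling technique.
# It follows the (formula, train, test) signature used in the Examples.
myglm <- function(formula, train, test) {
  resp <- all.vars(formula)[1]   # name of the response variable
  # Fix the level order so that glm models P(response == "pos").
  train[[resp]] <- factor(train[[resp]], levels = c("neg", "pos"))
  model <- glm(formula, data = train, family = binomial)
  # One predicted probability of "pos" per row of test.
  predict(model, newdata = test, type = "response")
}

# Stand-alone check on simulated data (not part of the package).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- factor(ifelse(x1 + 0.5 * x2 + rnorm(n) > 0, "pos", "neg"))
d  <- data.frame(y = y, x1 = x1, x2 = x2)
train <- d[1:150, ]
test  <- d[151:200, ]
p <- myglm(y ~ ., train, test)
```

A wrapper of this shape could then be supplied as model = myglm in the same way as mylda and myRF in the Examples.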

Value

An object of class Daim-class.

References

Werner Adler and Berthold Lausen (2009).
Bootstrap Estimated True and False Positive Rates and ROC Curve.
Computational Statistics & Data Analysis, 53, (3), 718–729.

Tom Fawcett (2006).
An introduction to ROC analysis.
Pattern Recognition Letters, 27, (8).

Bradley Efron and Robert Tibshirani (1997).
Improvements on cross-validation: The .632+ bootstrap method.
Journal of the American Statistical Association, 92, (438), 548–560.

See Also

plot.Daim, performDaim, auc.Daim, roc.area.Daim

Examples

#############################
##      Evaluation of      ##
##           LDA           ##
#############################

library(TH.data)
library(MASS)
data(GlaucomaM)
head(GlaucomaM)

mylda <- function(formula, train, test){
  model <- lda(formula, train)
  predict(model, test)$posterior[,"pos"]
}
  
set.seed(1102013)
ACC <- Daim(Class~., model=mylda, data=GlaucomaM, labpos="glaucoma", 
            control=Daim.control(method="boot", number=50))
ACC
summary(ACC)

  
## Not run:   
## just because of checking time on CRAN
  
  
  ####
  #### optimal cut point determination
  ####
  
  
  set.seed(1102013)
  ACC <- Daim(Class~., model=mylda, data=GlaucomaM, labpos="glaucoma", 
              control=Daim.control(method="boot", number=50), cutoff="0.632+")
  ACC
  summary(ACC)
  
  
  
  ####
  #### for parallel execution on multicore CPUs and computer clusters
  ####
  
  library(parallel)
  ###
  ### create a cluster with two worker nodes
  ###

  cl <- makeCluster(2)

  ###
  ### load the package used by mylda (MASS, for lda) on all workers
  ### and execute Daim in parallel
  ###

  clusterEvalQ(cl, library(MASS))
  ACC <- Daim(Class~., model=mylda, data=GlaucomaM, labpos="glaucoma", cluster=cl)
  ACC
  stopCluster(cl)


  ####
  #### for parallel computing on multicore CPUs
  ####

  ACC <- Daim(Class~., model=mylda, data=GlaucomaM, labpos="glaucoma", multicore=TRUE)
  ACC
  
  
  
  
  
  #############################
  ##      Evaluation of      ##
  ##      randomForest       ##
  #############################
  
  
  library(randomForest)

  myRF <- function(formula, train, test){
    model <- randomForest(formula, train)
    predict(model, test, type="prob")[,"pos"]
  }

  ACC2 <- Daim(Class~., model=myRF, data=GlaucomaM, labpos="glaucoma",
               control=Daim.control(number=50))
  ACC2
  summary(ACC2)
  
  
  ####
  #### optimal cut point determination
  ####
  
  
  set.seed(1102013)
  ACC2 <- Daim(Class~., model=myRF, data=GlaucomaM, labpos="glaucoma", 
              control=Daim.control(method="boot", number=50), cutoff="0.632+")
  summary(ACC2)
  
  
  
  ####
  #### for parallel execution on multicore CPUs and computer clusters
  ####
  
  
  library(parallel)
  ###
  ### create a cluster with two worker nodes
  ###

  cl <- makeCluster(2)

  ###
  ### load the package used by myRF on all workers and execute Daim in parallel
  ###

  clusterEvalQ(cl, library(randomForest))
  ACC2 <- Daim(Class~., model=myRF, data=GlaucomaM, labpos="glaucoma", cluster=cl)
  ACC2
  stopCluster(cl)

  ####
  #### for parallel computing on multicore CPUs
  ####

  ACC2 <- Daim(Class~., model=myRF, data=GlaucomaM, labpos="glaucoma", multicore=TRUE)
  ACC2
  
## End(Not run)
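
The "0.632" and "0.632+" estimators named under the cutoff argument combine an apparent (resubstitution) estimate with a leave-one-out bootstrap ("loob") estimate. The sketch below shows one common formulation for an error rate, following Efron and Tibshirani (1997); the function names est632 and est632plus are hypothetical, and how Daim applies the idea to sensitivity and specificity is not shown here.

```r
# err_app : apparent error (model evaluated on its own training data)
# err_loob: leave-one-out bootstrap error
# gamma   : no-information error rate (error if predictors and response
#           were independent); supplied by the caller in this sketch

# 0.632 estimator: fixed weights on the two error estimates.
est632 <- function(err_app, err_loob) {
  0.368 * err_app + 0.632 * err_loob
}

# 0.632+ estimator: the weight on the bootstrap error grows with the
# relative overfitting rate R, which is kept within [0, 1].
est632plus <- function(err_app, err_loob, gamma) {
  err1 <- min(err_loob, gamma)
  R <- if (err1 > err_app && gamma > err_app) {
    (err1 - err_app) / (gamma - err_app)
  } else {
    0
  }
  w <- 0.632 / (1 - 0.368 * R)
  (1 - w) * err_app + w * err1
}

e632  <- est632(0.1, 0.3)
e632p <- est632plus(0.1, 0.3, gamma = 0.5)
```

With no overfitting (R = 0) the two estimators coincide; with maximal overfitting (R = 1) the 0.632+ estimate equals the leave-one-out bootstrap error.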