accest: Classification Wrapper Using Customised Classifiers


View source: R/FIEmspro_accest.r

Description

Wrapper function for calculating classification estimates using pre-defined data partitioning sets (valipars and trainind).

This function works with two types of classifier. First, generic classifiers that follow the R conventions for predictive techniques, such as those available in packages like MASS, e1071 or randomForest, and nlda, are normally handled directly by accest: the function named in clmeth (in the accest call) must be accompanied by an S3 predict method, and that method should return a list with the component 'class' (hard classification) and, if possible, 'prob' or 'posterior' for class probabilities. If the algorithm does not fulfil these requirements, two options are available: 1) define the algorithm explicitly so that it meets the R standards, or 2) define a customised function that returns the necessary information. The second ('quick and dirty') approach is illustrated in the Examples section.

Unless the classifier can only cope with two-class tasks, this function can handle problems of any complexity. Three types of estimate are given for each replication: accuracy, the so-called margin and AUC (see Details). Data input can be a data matrix plus a class vector, a formula, or an object derived from dat.sel1.
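The first route (a fitting function accompanied by an S3 predict method) can be sketched in base R as follows. The classifier name centroidclass and its nearest-centroid rule are hypothetical and chosen only for illustration. The pseudo-probabilities are an assumption of this sketch, not something accest prescribes. Whether accest accepts this exact object has not been checked here.

```r
## Hypothetical nearest-centroid classifier: a fitting function plus an
## S3 predict method returning a list with 'class' and 'posterior'.
centroidclass <- function(x, y, ...) {
  x <- as.matrix(x)
  y <- as.factor(y)
  ## One row of feature means per class
  centers <- do.call(rbind, lapply(levels(y), function(k) {
    colMeans(x[y == k, , drop = FALSE])
  }))
  rownames(centers) <- levels(y)
  structure(list(centers = centers, lev = levels(y)), class = "centroidclass")
}

predict.centroidclass <- function(object, newdata, ...) {
  newdata <- as.matrix(newdata)
  ## Squared Euclidean distance from every sample to every class centroid
  d2 <- sapply(seq_len(nrow(object$centers)), function(k) {
    rowSums(sweep(newdata, 2, object$centers[k, ])^2)
  })
  d2 <- matrix(d2, nrow = nrow(newdata), dimnames = list(NULL, object$lev))
  ## Assumption: turn distances into pseudo-probabilities that sum to 1
  w <- exp(-(d2 - apply(d2, 1, min)))
  posterior <- w / rowSums(w)
  cls <- factor(object$lev[max.col(posterior, ties.method = "first")],
                levels = object$lev)
  list(class = cls, posterior = posterior)
}

## Stand-alone check on the iris data
fit <- centroidclass(iris[, 1:4], iris$Species)
out <- predict(fit, iris[, 1:4])
table(out$class, iris$Species)
```

The list returned by predict carries the hard classification in 'class' and the class probabilities in 'posterior', which are the two components the description asks for.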

Usage

accest(...)

## Default S3 method:
accest(dat, cl, clmeth, pars = NULL, tr.idx = NULL, verb=TRUE, clmpi=NULL, seed=NULL, ...)

## S3 method for class 'formula'
accest(formula, data = NULL, ..., subset, na.action = na.omit)

## S3 method for class 'dlist'
accest(dlist, clmeth, pars = NULL, tr.idx = NULL, ...)

Arguments

formula

A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators.

data

Data frame from which variables specified in formula are preferentially to be taken.

dlist

An object of class dlist, such as one element of the list returned by dat.sel1 (see Examples), used if no formula is given as the principal argument.

dat

A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.

cl

A factor specifying the class for each observation if no formula principal argument is given.

clmeth

Classifier function, given by name. For details, see Description and Examples.

pars

A list of parameters used by the resampling method, such as leave-one-out cross-validation, cross-validation, bootstrap and randomised validation (holdout). See valipars for details.

tr.idx

User-defined index of training samples. Can be generated by trainind.

verb

Should iterations be printed out?

clmpi

Cluster information from the snow package.

seed

Seed.

...

Additional parameters to be passed to clmeth.

subset

Optional vector, specifying a subset of observations to be used.

na.action

Function which indicates what should happen when the data contains NA's, defaults to na.omit.
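To make the role of pars and tr.idx more concrete, here is a minimal base R sketch of bootstrap training indices. The helper boot_idx is hypothetical, and the nested layout (one list per iteration, one index vector per replicate) is an assumption made for illustration; the authoritative structure is whatever trainind returns.

```r
## Hypothetical helper: niter iterations, each with nreps bootstrap replicates
boot_idx <- function(cl, niter = 2, nreps = 10) {
  n <- length(cl)
  lapply(seq_len(niter), function(i) {
    ## Each replicate samples n training indices with replacement
    lapply(seq_len(nreps), function(j) sample(n, n, replace = TRUE))
  })
}

set.seed(71)
idx <- boot_idx(iris$Species)

## First replicate of the first iteration
train <- idx[[1]][[1]]
## Samples never drawn are left out of training and can serve as test data
test <- setdiff(seq_along(iris$Species), train)
```

In practice valipars and trainind should be used so that the indices match what accest expects.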

Details

See xxxx for common details.

Value

An object of class accest, including the components:

clmeth

Classification method used.

acc

Average accuracy.

acc.iter

Accuracy at each iteration.

acc.std

Standard deviation of accuracy.

mar

Average predictive margin.

mar.iter

Predictive margin of each iteration.

auc

Average area under the receiver operating characteristic (ROC) curve (AUC).

auc.iter

AUC of each iteration.

sampling

Sampling scheme used.

niter

Number of iterations.

nreps

Number of replications at each iteration.

acc.boot

Detailed bootstrap accuracy estimates when bootstrap validation method is employed.

argfct

Arguments passed to the classifier.

pred.all

For each iteration, list of the fold/bootstrap id and the true and predicted classes.

cl.task

Discrimination task.

mod

List of information returned by the user-defined classifier function.
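The relationship between the accuracy components can be sketched in base R. This assumes that acc and acc.std are the mean and standard deviation of acc.iter, which the descriptions above suggest but do not state. The per-iteration results are simulated and their layout (elements cl and pred) is hypothetical, not the documented structure of pred.all.

```r
## Hypothetical per-iteration results: true and predicted labels
set.seed(1)
truth <- factor(rep(c("a", "b"), each = 10))
iters <- lapply(1:3, function(i) {
  pred <- truth
  flip <- sample(length(truth), i)   # misclassify exactly i samples
  pred[flip] <- ifelse(truth[flip] == "a", "b", "a")
  list(cl = truth, pred = pred)
})

## Accuracy at each iteration, then its mean and standard deviation
acc.iter <- sapply(iters, function(it) mean(it$pred == it$cl))
acc      <- mean(acc.iter)
acc.std  <- sd(acc.iter)
```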

Author(s)

David Enot and Wanchang Lin

See Also

valipars, trainind

Examples

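## -----------------------------------------------------------------
## load the packages used below
library(FIEmspro)
library(randomForest)

## -----------------------------------------------------------------
## simple customised function
## sameasrf simply reproduces the RF modelling task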
sameasrf <- function(data,...){
  dots <- list(...)

  ## Build RF model on data$tr; predictions are made on data$te
  mod  <- randomForest(data$tr,data$cl,...)
  ## Soft predictions (optional if ROC/margin analyses required)
  prob <- predict(mod,data$te,type="vote")
  ## Hard predictions 
  pred <- predict(mod,data$te)

  # For illustration, mod does not contain anything 
  list(mod=NULL,pred=pred,prob=prob,arg=dots)
}

## -----------------------------------------------------------------
## compare accest with randomForest 
## and sameasrf
data(iris)
dat <- as.matrix(iris[,1:4])
cl  <- as.factor(iris[,5])
pars   <- valipars(sampling = "boot",niter = 2, nreps=10)
tr.idx <- trainind(iris$Species,pars)

set.seed(71)
acc.1 <- accest(dat,cl, clmeth = "sameasrf", 
                   pars = pars,tr.idx = tr.idx,ntree = 200)
summary(acc.1)

set.seed(71)
acc.2 <- accest(dat,cl, clmeth = "randomForest", 
                   pars = pars,tr.idx = tr.idx,ntree = 200)
summary(acc.2)

### compare acc.1 and acc.2 bootstrap error estimates
print(acc.1$acc.boot-acc.2$acc.boot)

#########################################
## Try formula type
set.seed(71)
acc.3 <- accest(Species~., data = iris, clmeth = "randomForest", 
                   pars = pars,tr.idx = tr.idx,ntree = 200)
summary(acc.3)

## Try dlist type from dat.sel1
set.seed(71)
dat2 <- dat.sel1(dat,cl,pars=pars)
acc.4 <- accest(dat2[[1]], clmeth = "randomForest", 
                   pars = pars,tr.idx = tr.idx,ntree = 200)
summary(acc.4)
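A customised function in the style of sameasrf can be checked on its own before it is passed to accest. The sketch below uses a hypothetical 1-nearest-neighbour classifier, simple1nn, written in base R. It assumes, as in sameasrf above, that the function receives a list with elements tr, cl and te and returns mod, pred, prob and arg. The 0/1 votes used for prob are a simplification for illustration.

```r
## Hypothetical 1-nearest-neighbour classifier in the same style as sameasrf
simple1nn <- function(data, ...) {
  dots <- list(...)
  tr <- as.matrix(data$tr)
  te <- as.matrix(data$te)
  cl <- as.factor(data$cl)

  ## Index of the closest training sample for every test sample
  nn <- apply(te, 1, function(z) which.min(colSums((t(tr) - z)^2)))
  ## Hard predictions
  pred <- cl[nn]
  ## Soft predictions: hard 0/1 votes stand in for class probabilities
  prob <- 1 * outer(as.character(pred), levels(cl), "==")
  colnames(prob) <- levels(cl)

  list(mod = NULL, pred = pred, prob = prob, arg = dots)
}

## Call the function directly on a hand-made training/test split
set.seed(71)
idx <- sample(nrow(iris), 100)
res <- simple1nn(list(tr = iris[idx, 1:4],
                      cl = iris$Species[idx],
                      te = iris[-idx, 1:4]))
table(res$pred, iris$Species[-idx])
```

Testing the function in isolation like this makes it easier to separate errors in the classifier from errors in the resampling set-up.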

wilsontom/FIEmspro documentation built on Feb. 19, 2018, 9:03 a.m.