classifier: Wrapper Function for Classifiers

View source: R/mt_accest.R

Description

Wrapper function for classifiers. The classification model is built up on the training data and error estimation is performed on the test data.

Usage

classifier(dat.tr, cl.tr, dat.te=NULL, cl.te=NULL, method,
           pred.func=predict,...)

Arguments

dat.tr

A data frame or matrix of training data, on which the classification model is built.

cl.tr

A factor or vector of class labels for the training data.

dat.te

A data frame or matrix of test data. Error rates are calculated on this data set.

cl.te

A factor or vector of class labels for the test data.

method

Classification method to be used. Any classification method can be employed if it has a predict method that returns either the predicted class labels or a list with a component named class; examples are randomForest, svm and lda. knn is also supported, although it has no predict method. Either a function or a character string naming the function to be called.

pred.func

Predict method (default is predict). Either a function or a character string naming the function to be called.

...

Additional parameters to method.
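
The way method and pred.func accept either a function or the name of a function can be illustrated with a minimal sketch. It is not the implementation used in mt: toy_wrapper, centroid and predict.centroid are hypothetical names introduced here, and the nearest-centroid model only stands in for a real classifier whose predict method returns a list with a class component.

```r
## Hypothetical nearest-centroid classifier (stand-in for lda, svm, ...).
centroid <- function(x, y, ...) {
  x <- as.matrix(x)
  y <- factor(y)
  # one row of column means per class level
  cen <- do.call(rbind, lapply(levels(y), function(k) {
    colMeans(x[y == k, , drop = FALSE])
  }))
  rownames(cen) <- levels(y)
  structure(list(centers = cen, levels = levels(y)), class = "centroid")
}

## predict method returning a list with a component named 'class'.
predict.centroid <- function(object, newdata, ...) {
  newdata <- as.matrix(newdata)
  # squared Euclidean distance from every row to every class centre
  d <- sapply(seq_len(nrow(object$centers)), function(k) {
    rowSums(sweep(newdata, 2, object$centers[k, ])^2)
  })
  d <- matrix(d, nrow = nrow(newdata))
  cls <- object$levels[max.col(-d, ties.method = "first")]
  list(class = factor(cls, levels = object$levels))
}

## Hypothetical wrapper: 'method' and 'pred.func' may be functions or names.
toy_wrapper <- function(dat.tr, cl.tr, dat.te, cl.te, method,
                        pred.func = predict, ...) {
  fit.fun  <- match.fun(method)     # resolves a function or a character string
  pred.fun <- match.fun(pred.func)
  model <- fit.fun(dat.tr, cl.tr, ...)
  pred  <- pred.fun(model, dat.te)
  if (is.list(pred)) pred <- pred$class   # list output: take 'class'
  err <- mean(as.character(pred) != as.character(cl.te))
  list(err = err, acc = 1 - err, cl = cl.te, pred = pred)
}

set.seed(1)
idx   <- sample(nrow(iris), 100)
res.a <- toy_wrapper(iris[idx, 1:4], iris$Species[idx],
                     iris[-idx, 1:4], iris$Species[-idx], method = "centroid")
res.b <- toy_wrapper(iris[idx, 1:4], iris$Species[idx],
                     iris[-idx, 1:4], iris$Species[-idx], method = centroid)
identical(res.a$pred, res.b$pred)   # both calling styles give the same fit
```

Resolving the argument with match.fun is one possible design; it lets the caller pass either form without the wrapper having to branch on the argument type.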

Value

A list including components:

err

Error rate on the test data.

cl

The original class of test data.

pred

The predicted class of test data.

posterior

Posterior probabilities for the classes if method provides posterior output.

acc

Accuracy rate of classification.

margin

The margin of predictions from classifier method if it provides posterior output.

The margin of a data point is defined as the proportion of probability for the correct class minus the maximum proportion of probabilities for the other classes. A positive margin means a correct classification, and a negative margin means a misclassification. This idea comes from the package randomForest. For more details, see margin.

auc

The area under the receiver operating characteristic curve (AUC), returned if the classifier method produces posterior probabilities and the classification is a two-class problem.
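
How err, acc and auc relate to the predictions and posterior probabilities can be sketched in base R. The page does not say which formula mt uses for the AUC; the rank-sum (Mann-Whitney) identity below is an assumption chosen for illustration, toy_auc is a hypothetical helper, and the scores are made-up values.

```r
## Hypothetical helper: AUC via the rank-sum (Mann-Whitney) identity.
toy_auc <- function(score, label, pos) {
  is.pos <- label == pos
  n.pos  <- sum(is.pos)
  n.neg  <- sum(!is.pos)
  r <- rank(score)                  # average ranks are used for ties
  (sum(r[is.pos]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

obs   <- factor(c("a", "a", "a", "b", "b", "b"))
score <- c(0.9, 0.8, 0.4, 0.6, 0.3, 0.1)   # made-up posterior P(class == "a")
pred  <- factor(ifelse(score > 0.5, "a", "b"), levels = levels(obs))

err <- mean(pred != obs)            # proportion of misclassified test samples
acc <- 1 - err                      # accuracy is the complement of the error
auc <- toy_auc(score, obs, pos = "a")
c(err = err, acc = acc, auc = auc)
```

In this toy data two of the six samples are misclassified, and eight of the nine positive/negative pairs are ranked in the right order.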

Note

The definition of margin is based on the posterior probabilities. Classifiers such as randomForest, svm, lda, qda, pcalda and plslda output posterior probabilities; knn does not.
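
The margin definition can be made concrete with a small base R sketch. toy_margin is a hypothetical helper written for this illustration, not a function from mt or randomForest, and the posterior matrix holds made-up values; it assumes the columns are named by class level.

```r
## Hypothetical helper: margin from a matrix of posterior probabilities.
## Rows are samples; columns are named by class level.
toy_margin <- function(posterior, obs) {
  obs <- as.character(obs)
  vapply(seq_len(nrow(posterior)), function(i) {
    p.true  <- posterior[i, obs[i]]                              # correct class
    p.other <- max(posterior[i, colnames(posterior) != obs[i]])  # best rival
    unname(p.true - p.other)
  }, numeric(1))
}

post <- rbind(c(0.7, 0.2, 0.1),
              c(0.3, 0.6, 0.1),
              c(0.2, 0.3, 0.5))
colnames(post) <- c("a", "b", "c")
obs <- factor(c("a", "a", "c"))

m    <- toy_margin(post, obs)
pred <- colnames(post)[max.col(post)]   # class with the largest posterior
data.frame(obs, pred, margin = m)
```

The second sample is assigned to class b although its true class is a, so its margin is negative; the other two samples are classified correctly and have positive margins.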

Author(s)

Wanchang Lin

See Also

accest, maccest

Examples

library(mt)
data(abr1)
dat <- preproc(abr1$pos[,110:500], method="log10")  
cls <- factor(abr1$fact$class)        

## tmp <- dat.sel(dat, cls, choices=c("1","2"))
## dat <- tmp[[1]]$dat
## cls <- tmp[[1]]$cls

idx <- sample(1:nrow(dat), round((2/3)*nrow(dat)), replace = FALSE) 
## construct train and test data
train.dat  <- dat[idx,]
train.cl   <- cls[idx]
test.dat   <- dat[-idx,]       
test.cl    <- cls[-idx] 

## estimate accuracy
res <- classifier(train.dat, train.cl, test.dat, test.cl, 
                  method="randomForest")
res
## get confusion matrix
cl.rate(obs=res$cl, res$pred)   ## same as: cl.rate(obs=test.cl, res$pred)

## Measurements of Forensic Glass Fragments
data(fgl, package = "MASS")    # in MASS package
dat <- subset(fgl, grepl("WinF|WinNF",type))
## dat <- subset(fgl, type %in% c("WinF", "WinNF"))
x   <- subset(dat, select = -type)
y   <- factor(dat$type)

## construct train and test data 
idx   <- sample(1:nrow(x), round((2/3)*nrow(x)), replace = FALSE) 
tr.x  <- x[idx,]
tr.y  <- y[idx]
te.x  <- x[-idx,]        
te.y  <- y[-idx] 

res.1 <- classifier(tr.x, tr.y, te.x, te.y, method="svm")
res.1
cl.rate(obs=res.1$cl, res.1$pred) 

## classification performance for the two-class case.
pos <- "WinF"                              ## select positive level
cl.perf(obs=res.1$cl, pre=res.1$pred, pos=pos)
## ROC and AUC
cl.roc(stat=res.1$posterior[,pos],label=res.1$cl, pos=pos)

mt documentation built on Feb. 2, 2022, 1:07 a.m.