accest: Estimate Classification Accuracy By Resampling Method

Description

Estimate classification accuracy rate by resampling method.

Usage

accest(dat, ...)

## Default S3 method:
accest(dat, cl, method, pred.func = predict, pars = valipars(),
       tr.idx = NULL, ...)

## S3 method for class 'formula'
accest(formula, data = NULL, ..., subset, na.action = na.omit)

aam.cl(x, y, method, pars = valipars(), ...)

aam.mcl(x, y, method, pars = valipars(), ...)

Arguments

formula

A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right-hand side specifies the (non-factor) discriminators.

data

Data frame from which variables specified in formula are preferentially to be taken.

dat,x

A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.

cl,y

A factor specifying the class of each observation if no formula is given as the principal argument.

method

Classification method whose accuracy rate is to be estimated, such as randomForest, svm, knn or lda; either a function or a character string naming the function to be called. For details, see the Note below.

pred.func

Predict method (default is predict). Either a function or a character string naming the function to be called.

pars

A list of parameters used by the resampling method, such as leave-one-out cross-validation, cross-validation, bootstrap and randomised validation (holdout). See valipars for details.

tr.idx

User-defined index of training samples; it can be generated by trainind (see the sketch after this list).

...

Additional parameters to method.

subset

An optional vector specifying a subset of observations to be used.

na.action

A function which indicates what should happen when the data contain NAs; defaults to na.omit.
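
A minimal sketch of supplying a user-defined training index via tr.idx, assuming the iris data and illustrative valipars settings:

pars   <- valipars(sampling = "cv", niter = 2, nreps = 5)
tr.idx <- trainind(iris$Species, pars = pars)   # list of training-sample indices
## tr.idx can then be passed to accest via its tr.idx argument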

Details

The accuracy rates of classification are estimated by techniques such as Random Forest, Support Vector Machine, k-Nearest Neighbour Classification and Linear Discriminant Analysis, based on resampling methods including leave-one-out cross-validation, cross-validation, bootstrap and randomised validation (holdout).
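
As an illustration, valipars can be configured for each of these schemes along the following lines (a sketch only; the niter and nreps values are arbitrary):

pars.loocv <- valipars(sampling = "loocv")                        # leave-one-out CV
pars.cv    <- valipars(sampling = "cv",   niter = 10, nreps = 5)  # 10 runs of 5-fold CV
pars.boot  <- valipars(sampling = "boot", niter = 10, nreps = 25) # 10 runs of 25 bootstraps
pars.rand  <- valipars(sampling = "rand", niter = 10, nreps = 10,
                       div = 2/3)                                 # 2/3 training, 1/3 test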

Value

accest returns an object with the following components:

method

Classification method used.

acc

Overall accuracy rate.

acc.iter

Average accuracy rate for each iteration.

acc.all

Accuracy rate for each iteration and replication.

auc

Overall area under the receiver operating characteristic curve (AUC).

auc.iter

Average AUC for each iteration.

auc.all

AUC for each iteration and replication.

mar

Overall prediction margin.

mar.iter

Average prediction margin for each iteration.

mar.all

Prediction margin for each iteration and replication.

err

Overall error rate.

err.iter

Average error rate for each iteration.

err.all

Error rate for each iteration and replication.

sampling

Sampling scheme used.

niter

Number of iterations.

nreps

Number of replications in each iteration if the resampling method is not loocv.

conf

Overall confusion matrix.

res.all

All results, which can be further processed.

acc.boot

A list of bootstrap accuracy estimates, such as .632 and .632+, if the resampling method is bootstrap.

aam.cl returns a vector with components acc (accuracy), auc (area under ROC curve) and mar (class margin).

aam.mcl returns a matrix with columns acc (accuracy), auc (area under ROC curve) and mar (class margin).

Note

accest can take any classification model whose fitting function has the argument form model(formula, data, subset, na.action, ...) and whose corresponding predict method predict.model(object, newdata, ...) returns either the predicted class labels alone or a list with a component called class, such as lda and pcalda.
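
For instance, lda from the MASS package conforms to this interface, since its predict method returns a list with a component class (a minimal sketch; the fit on iris is purely illustrative):

library(MASS)
fit  <- lda(Species ~ ., data = iris)  # model(formula, data, ...)
pred <- predict(fit, newdata = iris)   # predict.model(object, newdata, ...)
head(pred$class)                       # the 'class' component holds the labels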

If the classifier method provides posterior probabilities, the prediction margin mar will be generated; otherwise it is NULL.

If the classifier method provides posterior probabilities and the classification is a two-class problem, auc will be generated; otherwise it is NULL.

aam.cl is a wrapper function of accest, returning the accuracy rate, AUC and classification margin. aam.mcl accepts multiple classifiers in a single run.

Author(s)

Wanchang Lin

See Also

binest, maccest, valipars, trainind, classifier

Examples

# Iris data
data(iris)
# Use KNN classifier and bootstrap for resampling
acc <- accest(Species ~ ., data = iris, method = "knn",
              pars = valipars(sampling = "boot", niter = 2, nreps = 5))
acc
summary(acc)
acc$acc.boot

# Alternatively, use the traditional interface:
x <- subset(iris, select = -Species)
y <- iris$Species

## -----------------------------------------------------------------------
# Random Forest with 5-fold stratified cv
pars   <- valipars(sampling = "cv", niter = 4, nreps = 5, strat = TRUE)
tr.idx <- trainind(y, pars = pars)
acc1   <- accest(x, y, method = "randomForest", pars = pars, tr.idx = tr.idx)
acc1
summary(acc1)
# plot the accuracy in each iteration
plot(acc1)

## -----------------------------------------------------------------------
# Forensic Glass data in chapter 12 of MASS
data(fgl, package = "MASS")
# Randomised validation (holdout) of SVM for fgl data
acc2 <- accest(type ~ ., data = fgl, method = "svm", cost = 100, gamma = 1,
               pars = valipars(sampling = "rand", niter = 10, nreps = 4, div = 2/3))
acc2
summary(acc2)
# plot the accuracy in each iteration
plot(acc2)

## -----------------------------------------------------------------------
## Examples of aam.cl and aam.mcl
aam.1 <- aam.cl(x, y, method = "svm", pars = pars)
aam.2 <- aam.mcl(x, y, method = c("svm", "randomForest"), pars = pars)

## If only two classes are used, AUC will be calculated
idx <- (y == "setosa")
aam.3 <- aam.cl(x[!idx, ], factor(y[!idx]), method = "svm", pars = pars)
aam.4 <- aam.mcl(x[!idx, ], factor(y[!idx]), method = c("svm", "randomForest"),
                 pars = pars)
