This function calculates resampling-based performance measures over a grid of tuning parameters for one of the implemented classifiers (one-class SVM, biased SVM, maxent).
x
a data frame with the training data. The samples are in the rows and the features in the columns.
y
a vector containing the labels encoding whether a sample is positive or unlabeled.
positive
The positive class in
method
a one-class classification method. Implemented are
metric
A performance metric for positive/unlabeled data used for model selection. Default for
trControl
see
index
a list of training indices for the resampling iterations. This will be passed to the identically named argument of the
summaryFunction
a function to compute performance metrics across resamples. This will be passed to the identically named argument of the
allowParallel
enable or disable parallel processing. Even if
verboseIter
Logical for printing progress; this only works if parallel processing is disabled (defaults to
...
other arguments that can be passed to train. Be careful with trainControl... !
trainOcc calls train and returns an object of class trainOcc, which is a child of train, i.e. methods defined in caret for train can also be used.
Via the trControl argument you can customize how train acts (see trainControl), but note the following (see also the example, where the trainOcc defaults of trControl are given):
Make sure that you define a suitable summaryFunction, i.e. a function which returns metrics for positive/unlabeled data (default: puSummary).
classProbs has to be set to TRUE if the continuous outputs of the one-class classifier are required to calculate the performance metrics, i.e. for metrics that rely on the continuous predictions, such as puAuc.
savePredictions should also be set to TRUE and returnResamp to "all" (as in the example) in order to make all diagnostic methods available for later analysis.
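The three points above can be turned into a quick sanity check before calling trainOcc. The following is a minimal sketch in base R, not part of caret or of the package documented here: check_pu_control is a hypothetical helper, and a plain list stands in for the object returned by trainControl (which stores these settings under the same names). It assumes, following the defaults shown in the example, that returnResamp is expected to be "all".

```r
# Hypothetical helper (not part of caret or of the package documented here):
# returns a character vector describing settings that deviate from the
# recommendations above; an empty vector means nothing was flagged.
check_pu_control <- function(ctrl) {
  problems <- character(0)
  if (!is.function(ctrl$summaryFunction))
    problems <- c(problems, "summaryFunction is not a function")
  if (!isTRUE(ctrl$classProbs))
    problems <- c(problems, "classProbs should be TRUE")
  # Assumption: only a literal TRUE is accepted here; caret itself may also
  # accept other values for savePredictions.
  if (!isTRUE(ctrl$savePredictions))
    problems <- c(problems, "savePredictions should be TRUE")
  if (!identical(ctrl$returnResamp, "all"))
    problems <- c(problems, "returnResamp should be \"all\"")
  problems
}

# A plain list stands in for the result of trainControl(); the summary
# function is a dummy placeholder, not puSummary.
ctrl <- list(summaryFunction = function(data, lev = NULL, model = NULL) c(dummy = 1),
             classProbs      = TRUE,
             savePredictions = FALSE,
             returnResamp    = "final")

problems <- check_pu_control(ctrl)
print(problems)
```

With the settings above, the helper flags savePredictions and returnResamp and leaves the other two alone.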
A trainOcc object, which is a child of the train object.
## Not run:
## assumption: caret and the package providing trainOcc (oneClass) are installed
library(caret)
library(oneClass)
## a synthetic data set
data(bananas)
## this is the default setting of trControl in trainOcc
cntrl <- trainControl(method = "cv",
                      number = 10,
                      summaryFunction = puSummary, #!
                      classProbs = TRUE, #!
                      savePredictions = TRUE, #!
                      returnResamp = "all", #!
                      allowParallel = TRUE)
## but let's use repeated k-fold cross-validation
set.seed(123)
rcv.idx <- createMultiFolds(puFactor(bananas$tr[,1]), k=5, times=5)
cntrl <- trainControl(index = rcv.idx,
                      summaryFunction = puSummary,
                      classProbs = TRUE,
                      savePredictions = TRUE,
                      returnResamp = "all",
                      allowParallel = TRUE)
tocc <- trainOcc(x=bananas$tr[, -1], y=bananas$tr[, 1], trControl=cntrl)
## be aware that the PU-performance metrics do not always select the
## optimal model
## you may want to investigate other performance metrics stored in the
## model selection table.
tocc
## neatly arranged by sorting
sort(tocc, by="puF")
## particularly the true positive rate (tpr) and the probability of
## positive prediction (ppp) are informative. you want to find a model
## with high tpr but low ppp.
plot_PPPvsTPR(tocc)
## based on this plot you may want to select candidate models for a more
## thorough evaluation: use identifyPoints=TRUE (interactive, hence
## commented out here)
## candiModels <- plot_PPPvsTPR(tocc, identifyPoints=TRUE)
## the former assignment returns a list like the one created here:
candiModels <- modelPosition(tocc, modRow=c(80, 86, 44))
## plot the resampling distributions
resamps <- resamples(tocc, modRow=candiModels$row)
bwplot(resamps, scales="free")
## the diagnostic distributions plot can also be helpful
## for this, (a large subset of) the unlabeled data needs to be predicted
tocc.m80 <- update(tocc, modRow=candiModels$row[1]) # set the final model
pred.m80 <- predict(tocc.m80, bananas$x) # predict with the updated model
tocc.m86 <- update(tocc, modRow=candiModels$row[2])
pred.m86 <- predict(tocc.m86, bananas$x)
tocc.m44 <- update(tocc, modRow=candiModels$row[3])
pred.m44 <- predict(tocc.m44, bananas$x)
par(mfrow=c(1,3))
hist(tocc.m80, pred.m80, th=0)
hist(tocc.m86, pred.m86, th=0)
hist(tocc.m44, pred.m44, th=0)
## here we can also see the model in the 2D feature space. this is usually
## not possible because the feature space is high dimensional.
par(mfrow=c(1,1))
featurespace(tocc.m80, th=0)
featurespace(tocc.m86, th=0)
featurespace(tocc.m44, th=0)
## End(Not run)
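The example comments single out the true positive rate (tpr) and the probability of positive prediction (ppp) as the two quantities to weigh against each other. As a rough illustration of what they measure, here is a base-R sketch on made-up scores. pu_rates is a hypothetical helper, the labels "pos" and "un" are assumed names for the positive and unlabeled classes, and ppp is assumed to be taken over all held-out samples; the definitions used by puSummary may differ in detail.

```r
# Hypothetical helper, not part of any package: tpr and ppp for continuous
# classifier outputs at a given threshold.
pu_rates <- function(score, label, th = 0, positive = "pos") {
  stopifnot(length(score) == length(label))
  predicted_pos <- score > th
  c(tpr = mean(predicted_pos[label == positive]),  # share of known positives predicted positive
    ppp = mean(predicted_pos))                     # share of all samples predicted positive (assumption)
}

# Made-up scores for three positive and five unlabeled samples.
score <- c(1.2, 0.4, -0.3, 0.8, -1.1, -0.6, 0.1, -0.9)
label <- c("pos", "pos", "pos", "un", "un", "un", "un", "un")

rates <- pu_rates(score, label, th = 0)
print(rates)
```

A model that pushes tpr up while keeping ppp low recovers most known positives without labeling a large share of the data as positive, which is the trade-off the PPP-vs-TPR plot in the example visualizes.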