
This function obtains hold-out and random sub-sampling estimates of performance metrics for a given predictive task and a method to solve it (i.e. a workflow). The function is general in the sense that the workflow function the user provides as the solution to the task can implement or call whatever modeling technique the user wants.

The function implements hold-out and random sub-sampling (repeated
hold-out) estimation. Different settings concerning this methodology are
available through the argument `estTask` (check the help page of `Holdout`).

Please note that most of the time you will not call this function
directly (though there is nothing wrong with doing so). Instead, you
will use the function `performanceEstimation`, which allows you to
carry out performance estimation for multiple workflows on multiple tasks,
using the estimation method of your choice (e.g. hold-out). Still, when you
simply want the hold-out estimate of one workflow on one task,
you may prefer to use this function directly.

```
hldEstimates(wf, task, estTask, cluster)
```

`wf` |
an object of the class |

`task` |
an object of the class |

`estTask` |
an object of the class |

`cluster` |
an optional parameter that can either be |

The idea of this function is to carry out a hold-out
experiment with the goal of obtaining reliable estimates of the
predictive performance of a certain approach to a predictive
task. This approach (denoted here as a *workflow*) will be evaluated on
the given predictive task using some user-selected metrics,
and this function will
provide hold-out or random sub-sampling estimates of the true value of these
evaluation metrics. Hold-out estimates are obtained by randomly
partitioning the given data set into train and test sets. The training
set is used to obtain a model for the predictive task, which is then
tested by making predictions for the test set. This random split of
the given data can be repeated several times, leading to what is
usually known as random sub-sampling estimates. In the end, the average of
the scores over the several repetitions (if using *pure*
hold-out there is only one) is the hold-out estimate of each selected
metric.
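The procedure described above can be illustrated with a minimal base-R sketch. This is not the code used by `hldEstimates`; the function `holdoutSketch`, its arguments and the fixed seed are hypothetical names chosen for illustration, a plain `lm` model stands in for the workflow, and the mean absolute error is the only metric computed.

```r
## A rough sketch of random sub-sampling (repeated hold-out), assuming a
## numeric target and using lm() in place of a user-supplied workflow.
## holdoutSketch() is a hypothetical helper, not part of any package.
holdoutSketch <- function(form, data, nReps = 5, hldSz = 0.3, seed = 1234) {
  set.seed(seed)                       # make the random splits reproducible
  n <- nrow(data)
  target <- all.vars(form)[1]          # name of the response variable
  scores <- numeric(nReps)
  for (r in seq_len(nReps)) {
    ## randomly choose the rows that form the test (hold-out) set
    testIdx <- sample(n, size = round(hldSz * n))
    train <- data[-testIdx, , drop = FALSE]
    test  <- data[testIdx, , drop = FALSE]
    ## fit on the training set and predict the test set
    model <- lm(form, data = train)
    preds <- predict(model, newdata = test)
    ## mean absolute error on this repetition
    scores[r] <- mean(abs(test[[target]] - preds))
  }
  ## the estimate is the average score over the repetitions
  list(scores = scores, estimate = mean(scores))
}

res <- holdoutSketch(Infant.Mortality ~ ., swiss, nReps = 5, hldSz = 0.3)
res$scores    # one MAE score per repetition
res$estimate  # random sub-sampling estimate of the MAE
```

Setting `nReps = 1` in this sketch corresponds to a pure hold-out, where the estimate is simply the score of the single split.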

Parallel execution of the estimation experiment is only recommended for reasonably large data sets; otherwise, the communication costs between the processes may actually increase the computation time.

The result of the function is an object of class `EstimationResults`.

Luis Torgo ltorgo@dcc.fc.up.pt

Torgo, L. (2014) *An Infra-Structure for Performance
Estimation and Experimental Comparison of Predictive Models in R*. arXiv:1412.0436 [cs.MS]
http://arxiv.org/abs/1412.0436

`Holdout`, `Workflow`, `standardWF`, `PredTask`, `EstimationTask`, `performanceEstimation`, `cvEstimates`, `bootEstimates`, `loocvEstimates`, `mcEstimates`, `EstimationResults`

```
## Not run:
library(performanceEstimation)

## Estimating the mean absolute error and the normalized mean squared
## error of an SVM on the swiss data, using 5 repetitions of a
## 70%-30% hold-out
library(e1071)
data(swiss)

## Now the evaluation
eval.res <- hldEstimates(
  Workflow(wf="standardWF", wfID="svmApproach",
           learner="svm", learner.pars=list(cost=10, gamma=0.1)),
  PredTask(Infant.Mortality ~ ., swiss),
  EstimationTask(metrics=c("mae","nmse"),
                 method=Holdout(nReps=5, hldSz=0.3))
)

## Check a summary of the results
summary(eval.res)

## An example with a user-defined workflow function implementing a
## simple approach using linear regression models but also containing
## some data pre-processing as well as results post-processing.
myLM <- function(form, train, test, k=10, .outModel=FALSE) {
  require(DMwR)
  ## fill in NAs on both the train and test sets
  ntr <- knnImputation(train, k)
  nts <- knnImputation(test, k, distData=train)
  ## obtain a linear regression model and simplify it
  md <- lm(form, ntr)
  md <- step(md)
  ## get the model predictions
  p <- predict(md, nts)
  ## post-process the predictions (this is an example assuming the target
  ## variable is always positive so we change negative predictions into 0)
  p <- ifelse(p < 0, 0, p)
  ## now get the final return object
  res <- list(trues=responseValues(form, nts), preds=p)
  ## optionally return the fitted model as well
  if (.outModel) res <- c(res, list(model=md))
  res
}

## Now for the Holdout estimation
data(algae, package="DMwR")
eval.res2 <- hldEstimates(
  Workflow(wf="myLM", k=5),
  PredTask(a1 ~ ., algae[,1:12], "alga1"),
  EstimationTask("mse", method=Holdout(nReps=5))
)

## Check a summary of the results
summary(eval.res2)
## End(Not run)
```
