This function performs a Monte Carlo experiment with the goal of estimating the performance of a learner on a data set. This is a generic function in the sense that it can be used with any learner, data set and performance metrics. This is achieved by requiring the user to supply a function that takes care of the learning, testing and evaluation of the learner. This function is called for each iteration of the Monte Carlo experiment.
monteCarlo(learner, data.set, mcSet, itsInfo = F, verbose = T)
| learner | This is an object of the class  | 
| data.set | This is an object of the class  | 
| mcSet | This is an object of the class  | 
| itsInfo | Boolean value determining whether the object returned by the function should include as an attribute a list with as many components as there are iterations in the experimental process, with each component containing information that the user-defined function decides to return on top of the standard error statistics. See the Details section for more information. | 
| verbose | A boolean value controlling the level of output of the function execution, defaulting to TRUE. | 
This function estimates a set of evaluation statistics through a Monte Carlo experiment. The user supplies a learning system and a data set, together with the experiment settings. These settings should specify, among others, the size of the training (TR) and testing sets (TS) and the number of repetitions (R) of the train+test cycle. The function randomly selects a set of R numbers in the interval [TR+1,NDS-TS+1], where NDS is the size of the data set. For each of these R numbers the previous TR observations of the data set are used to learn a model and the subsequent TS observations for testing it and obtaining the wanted evaluation statistics. The resulting R estimates of the evaluation statistics are averaged at the end of this process resulting in the Monte Carlo estimates of these metrics.
This function is particularly well suited to estimating performance on time series prediction problems, because the experimental repetitions never change the order of the rows in the original data set. If this order reflects time stamps, as is the case in time series, preserving it ensures that a prediction model is never tested on observations that precede its training data.
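The sampling scheme described above can be sketched in base R. This is only an illustration of the idea, not the code used by monteCarlo(): the function name mc_sketch, its arguments, the use of lm() as the learner, and the choice to sample start points without replacement are all assumptions made for the example.

```r
## Minimal base-R sketch of the Monte Carlo window selection (not DMwR code).
## mc_sketch and its arguments (tr, ts, reps, seed) are hypothetical names.
mc_sketch <- function(form, data, tr, ts, reps, seed = 1234) {
  set.seed(seed)
  nds <- nrow(data)
  ## Candidate test-start rows lie in [TR+1, NDS-TS+1].
  ## Assumption: start points are drawn without replacement and then sorted.
  starts <- sort(sample((tr + 1):(nds - ts + 1), reps))
  target <- all.vars(form)[1]          # name of the response variable
  mae <- sapply(starts, function(s) {
    train <- data[(s - tr):(s - 1), ]  # the TR rows just before the start point
    test  <- data[s:(s + ts - 1), ]    # the TS rows from the start point onwards
    model <- lm(form, train)           # stand-in learner for illustration
    preds <- predict(model, test)
    mean(abs(test[[target]] - preds))  # mean absolute error of this repetition
  })
  ## The Monte Carlo estimate is the average over the repetitions
  list(starts = starts, mae = mae, avg = mean(mae))
}

data(swiss)
out <- mc_sketch(Infant.Mortality ~ ., swiss, tr = 20, ts = 10, reps = 10)
out$starts
out$avg
```

Because each training window ends immediately before its test window begins, the original row order is preserved in every repetition.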
If the itsInfo parameter is set to TRUE, the mcRun object returned by the function will have an attribute named itsInfo containing extra information from the individual repetitions of the Monte Carlo process. This information can be accessed with the function attr(), e.g. attr(returnedObject,'itsInfo'). For this information to be collected, the user-defined function must return the vector of evaluation statistics with an associated attribute named itInfo (note that it is "itInfo" and not "itsInfo" as above), which should be a list containing whatever information the user wants to collect on each repetition. This apparently complex infrastructure allows you to pass whatever information you wish from each iteration of the experimental process. A typical example is the case where you want to check the individual predictions of the model on each test case of each repetition. You could pass this vector of predictions as a component of the list forming the attribute itInfo of the statistics returned by your user-defined function. At the end of the experimental process you will be able to inspect or use these predictions through the attribute itsInfo of the mcRun object returned by the monteCarlo() function. See the Examples section on the help page of the function holdOut() for an illustration of this capability.
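As a rough illustration of the itInfo mechanism, the sketch below shows a user-defined function that attaches its predictions to the returned statistics. It uses only base R; the function name my_eval and the use of lm() as the learner are hypothetical choices made for the example, and the function is called directly here rather than through monteCarlo().

```r
## Hypothetical user-defined function: returns the vector of statistics
## with an "itInfo" attribute holding extra per-repetition information.
my_eval <- function(form, train, test, ...) {
  model <- lm(form, train, ...)            # stand-in learner for illustration
  preds <- predict(model, test)
  truth <- test[[all.vars(form)[1]]]       # true values of the response
  stats <- c(mae = mean(abs(truth - preds)))
  ## Note the attribute name: "itInfo", not "itsInfo"
  attr(stats, "itInfo") <- list(preds = preds)
  stats
}

data(swiss)
res <- my_eval(Infant.Mortality ~ ., swiss[1:20, ], swiss[21:30, ])
attr(res, "itInfo")$preds                  # predictions for the 10 test rows
```

When a function written this way is used in an experiment with itsInfo = TRUE, the per-repetition lists should then be available through attr(returnedObject,'itsInfo'), as described above.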
The result of the function is an object of class mcRun.
Luis Torgo ltorgo@dcc.fc.up.pt
Torgo, L. (2010) Data Mining using R: learning with case studies, CRC Press (ISBN: 9781439810187).
http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR
experimentalComparison,
mcRun,
mcSettings,
slidingWindowTest, growingWindowTest,
crossValidation, holdOut, loocv, bootstrap
## The following is an example of a possible approach to a time series
## problem, although in this case the used data is clearly not a time
## series being selected only for illustration purposes
library(DMwR)
data(swiss)
## The base learner used in the experiment
mc.rpartXse <- function(form, train, test, ...) {
    model <- rpartXse(form, train, ...)
    preds <- predict(model, test)
    regr.eval(resp(form, test), preds,
              stats=c('mae','nmse'), train.y=resp(form, train))
}
## Estimate the MAE and NMSE of the learner rpartXse when asked to
## obtain predictions for a test set with 10 observations given a
## training set with 20 observations. The predictions for the 10
## observations are obtained using a sliding window learn+test approach
## (see the help of function slidingWindowTest() ) with a
## model re-learning step of 5 observations.
## Estimates are obtained by repeating 10 times the train+test process
x <- monteCarlo(learner("slidingWindowTest",
                      pars=list(learner=learner("mc.rpartXse",pars=list(se=1)),
                                relearn.step=5
                               )
                        ),
                      dataset(Infant.Mortality ~ ., swiss),
                      mcSettings(10,20,10,1234)
                 )
summary(x)
Loading required package: lattice
Loading required package: grid
 10  repetitions Monte Carlo Simulation using: 
	 seed =  1234 
	 train size =  20  cases 
	 test size =  10  cases 
Repetition  1 
	 start test =  21 ; test size =  10 
**
Repetition  2 
	 start test =  23 ; test size =  10 
**
Repetition  3 
	 start test =  25 ; test size =  10 
**
Repetition  4 
	 start test =  27 ; test size =  10 
**
Repetition  5 
	 start test =  29 ; test size =  10 
**
Repetition  6 
	 start test =  30 ; test size =  10 
**
Repetition  7 
	 start test =  31 ; test size =  10 
**
Repetition  8 
	 start test =  33 ; test size =  10 
**
Repetition  9 
	 start test =  36 ; test size =  10 
**
Repetition  10 
	 start test =  38 ; test size =  10 
**
== Summary of a Monte Carlo Simulation Experiment ==
 10  repetitions Monte Carlo Simulation using: 
	 seed =  1234 
	 train size =  20  cases 
	 test size =  10  cases 
* Data set ::  swiss
* Learner  ::  "slidingWindowTest"  with parameters 
	 learner  =  <S4 object of class structure("learner", package = "DMwR")> 
	 relearn.step  =  5 
* Summary of Experiment Results:
             mae      nmse      mae      nmse
avg     2.357285 1.2587104 1.776769 1.2186379
std     1.050826 0.5454862 1.001153 0.4632991
min     1.265000 1.0000000 0.698000 1.0000000
max     4.360000 2.3129666 3.827692 2.1924693
invalid 0.000000 0.0000000 0.000000 0.0000000