This function performs a Monte Carlo experiment to estimate the performance of a given approach (a workflow) on a time series prediction task. The function is general in the sense that the workflow function the user provides as the solution to the task can implement or call whatever modeling technique the user wants.
The function implements Monte Carlo estimation and different settings
concerning this methodology are available through the argument
estTask (check the help page of
Please note that most of the time you will not call this function
directly, though there is nothing wrong with doing so. Instead you
will use the function
performanceEstimation, which allows you to
carry out performance estimation for multiple workflows on multiple tasks,
using some estimation method. Still, when you
simply want the Monte Carlo estimates for one workflow on one task,
you may prefer to use this function directly.
an object of the class
an object of the class
an object of the class
A boolean value controlling the level of output of the function
execution, defaulting to
an optional parameter that can either be
This function provides reliable estimates of a set of evaluation statistics through a Monte Carlo experiment. The user supplies a workflow function and a data set of a time series forecasting task, together with the estimation task. This task should include both the metrics to be estimated and the settings of the estimation methodology (Monte Carlo), which include, among others, the size of the training (TR) and testing (TS) sets and the number of repetitions (R) of the train+test cycle. The function randomly selects a set of R numbers in the time interval [TR+1,NDS-TS+1], where NDS is the size of the full data set. For each of these R numbers, the previous TR observations of the data set are used to learn a model and the subsequent TS observations are used to test it and obtain the requested evaluation metrics. The resulting R estimates of the evaluation metrics are averaged at the end of this process, yielding the Monte Carlo estimates of these metrics.
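The selection procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper mc_windows, the synthetic series y and the naive mean forecaster are all hypothetical, and the sketch assumes that the test window starts at the selected point itself.

```r
# Hypothetical helper (not part of the package): build the R train/test
# index windows of a Monte Carlo experiment on a time-ordered data set.
mc_windows <- function(nds, tr, ts, reps) {
  # The interval [TR+1, NDS-TS+1] must be non-empty and hold enough points
  stopifnot(nds - ts + 1 >= tr + 1)
  candidates <- (tr + 1):(nds - ts + 1)
  stopifnot(length(candidates) >= reps)
  # Sample by position to avoid sample()'s special case for length-one vectors
  starts <- sort(candidates[sample.int(length(candidates), reps)])
  lapply(starts, function(s) {
    list(train = (s - tr):(s - 1),   # the TR observations before the split
         test  = s:(s + ts - 1))     # the TS observations from the split on
  })
}

set.seed(42)                         # arbitrary seed, for reproducibility only
y <- cumsum(rnorm(1000))             # synthetic series standing in for real data
windows <- mc_windows(nds = length(y), tr = 300, ts = 200, reps = 4)

# Score a deliberately naive forecaster (the training mean) on each window
mse_per_rep <- sapply(windows, function(w) {
  pred <- mean(y[w$train])
  mean((y[w$test] - pred)^2)
})

# The Monte Carlo estimate is the average over the R repetitions
mc_estimate <- mean(mse_per_rep)
```

Every training index is smaller than every test index within a window, so the model is never evaluated on observations that precede its training data.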
This function is targeted at obtaining estimates of performance for time series prediction problems. The reason is that the experimental repetitions ensure that the order of the rows in the original data set is never swapped, as these rows are assumed to be ordered by time. This is important to ensure that a prediction model is never tested on past observations of the time series.
For each train+test iteration the provided workflow function is called
and should return the predictions of the workflow for the given test
period. To carry out this train+test iteration the user may use the
standard time series workflow that is provided (check the help page of
timeseriesWF), or may provide her/his own workflow that
should return a list as result. See the Examples section below for an
example of these functions. Further examples are given in the package
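As a rough sketch of what a user-defined workflow might look like, the function below fits a linear model on the training window and predicts the test window. The name lmWF is hypothetical, and both the argument names (form, train, test) and the names of the returned list elements (trues, preds) are assumptions that should be checked against the package documentation. The code uses only base R, so it can be tried without the package.

```r
# Hypothetical user-defined workflow. The signature and the names of the
# returned list elements are assumptions, not confirmed by this help page.
lmWF <- function(form, train, test, ...) {
  model <- lm(form, data = train)                          # learn on the training window
  preds <- predict(model, newdata = test)                  # forecast the test window
  trues <- model.response(model.frame(form, data = test))  # observed target values
  list(trues = trues, preds = preds)
}

# Stand-alone check on synthetic, time-ordered data (no package required)
set.seed(1)                                   # arbitrary seed
d <- data.frame(x = rnorm(100))
d$y <- 2 * d$x + rnorm(100, sd = 0.1)
res <- lmWF(y ~ x, train = d[1:70, ], test = d[71:100, ])
```

Returning both the true and the predicted values in one list keeps the workflow independent of whichever metrics the estimation task later computes from them.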
Parallel execution of the estimation experiment is only recommended for reasonably large data sets; otherwise you may actually increase the computation time because of the communication costs between the processes.
The result of the function is an object of class
Luis Torgo firstname.lastname@example.org
Torgo, L. (2014) An Infra-Structure for Performance Estimation and Experimental Comparison of Predictive Models in R. arXiv:1412.0436 [cs.MS] http://arxiv.org/abs/1412.0436
## The following is a small illustrative example using the quotes of the
## SP500 index. This example estimates the performance of a random
## forest on an illustrative task of trying to forecast the future
## variations of the adjusted close prices of the SP500 using a few
## predictors. The random forest is evaluated on 4 repetitions of a
## Monte Carlo experiment where 30% of the data is used for training
## the model that is then used to make predictions for the next 20%,
## using a sliding window approach with a relearn step of 10 periods
## (check the help page of the timeseriesWF() function to understand
## these and other settings)

## Not run: 
library(quantmod)
library(randomForest)

## Download the SP500 quotes and build the modelling data set
getSymbols('^GSPC', from='2008-01-01', to='2012-12-31')
data.model <- specifyModel(
    Next(100*Delt(Ad(GSPC))) ~ Delt(Ad(GSPC), k=1:10) + Delt(Vo(GSPC), k=1:3))
data <- as.data.frame(modelData(data.model))
## Rename only the first column (the target variable)
colnames(data)[1] <- 'PercVarClose'

spExp <- mcEstimates(Workflow("timeseriesWF", wfID="rfTrial",
                              type="slide", relearn.step=10,
                              learner='randomForest'),
                     PredTask(PercVarClose ~ ., data, "sp500"),
                     EstimationTask(metrics=c("mse","theil"),
                                    method=MonteCarlo(nReps=4, szTrain=.3, szTest=.2)))

summary(spExp)

## End(Not run)