Powerful function that trains and tests a particular fit model under several runs and a given validation method. Since there can be a huge number of models, the fitted models are not stored. Yet, several useful statistics (e.g. predictions) are returned.
x 
a symbolic description (formula) of the model to be fit. If 
data 
an optional data frame (columns denote attributes, rows show examples) containing the training data, when using a formula. 
Runs 
number of runs used (e.g. 1, 5, 10, 20, 30) 
method 
a vector with c(vmethod,vpar,seed) or c(vmethod,vpar,window,increment), where vmethod is:
vpar – number used by vmethod (optional, if not defined 2/3 for

model 
See 
task 
See 
search 
See 
mpar 
Only kept for compatibility with previous 
feature 
See

scale 
See 
transform 
See 
debug 
If TRUE, shows some information about each run.
... 
See 
Powerful function that trains and tests a particular fit model under several runs and a given validation method (see [Cortez, 2010] for more details). Several Runs are performed. In each run, the same validation method is adopted (e.g. holdout) and several relevant statistics are stored. Note: this function can require some computational effort, especially if a large dataset and/or a high number of Runs is adopted.
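The idea of repeating the same validation method over several runs can be sketched in base R. This is only an illustration under stated assumptions, not rminer's implementation: the stand-in model is lm, the variable names (runs, ratio, res) are hypothetical, and the synthetic data mirrors the simple regression example from the Examples section.

```r
# Base-R sketch of "several runs of a holdout" (an illustration, not rminer code).
set.seed(12345)                       # fixed seed so the random splits are reproducible
n  <- 200
x1 <- rnorm(n, 100, 20); x2 <- rnorm(n, 100, 20)
d  <- data.frame(x1 = x1, x2 = x2,
                 y = 0.7 * sin(x1 / (25 * pi)) + 0.3 * sin(x2 / (25 * pi)))

runs  <- 5                            # plays the role of the Runs argument
ratio <- 2 / 3                        # share of rows used for training
res <- list(time = numeric(runs), test = vector("list", runs),
            pred = vector("list", runs), error = numeric(runs))

for (r in seq_len(runs)) {
  t0 <- proc.time()[["elapsed"]]
  tr <- sample(n, size = round(ratio * n))     # random training rows for this run
  m  <- lm(y ~ x1 + x2, data = d[tr, ])        # stand-in model, overwritten in the next run
  res$test[[r]] <- d$y[-tr]                    # target values of the test rows
  res$pred[[r]] <- unname(predict(m, newdata = d[-tr, ]))
  res$error[r]  <- mean(abs(res$test[[r]] - res$pred[[r]]))  # MAE of this run
  res$time[r]   <- proc.time()[["elapsed"]] - t0
}
print(res$error)                      # one error value per run
```

Only the per-run statistics are kept; the fitted model is discarded at the end of each iteration, which matches the remark that fitted models are not stored.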
A list with the components:
$time – vector with time elapsed for each run.
$test – vector list, where each element contains the test (target) results for each run.
$pred – vector list, where each element contains the predicted results for each test set and each run.
$error – vector with an error metric for each run (the error depends on the metric parameter of mpar; valid options are explained in mmetric).
$mpar – vector list, where each element contains the fit model mpar parameters (for each run).
$model – the model.
$task – the task.
$method – the external validation method.
$sen – a matrix with the 1D sensitivity analysis input importances. The number of rows is Runs times vpar if kfold, else Runs.
$sresponses – a vector list with a size equal to the number of attributes (useful for graph="VEC"). Each element contains a list with the 1D sensitivity analysis input responses (n – name of the attribute; l – number of levels; x – attribute values; y – 1D sensitivity responses). Important note: sresponses (and "VEC" graphs) are only available if feature="sabs" or "simp" related (see feature).
$runs – the Runs.
$attributes – vector list with all attributes (features) selected in each run (and fold if kfold) if a feature selection algorithm is used.
$feature – the feature.
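To make the layout of $test and $pred concrete, the sketch below builds a small stand-in list by hand and computes one mean absolute error per run. The object and its values are made up for illustration (a real result comes from mining), and in practice the examples on this page obtain such metrics with mmetric.

```r
# Hand-built stand-in for a mining() result with 3 runs (values are invented).
M <- list(
  runs = 3,
  test = list(c(1.0, 2.0, 3.0), c(2.0, 4.0), c(0.5, 1.5, 2.5)),  # targets per run
  pred = list(c(1.5, 2.0, 2.0), c(2.5, 3.0), c(0.5, 1.0, 3.5))   # predictions per run
)

# One MAE per run: pair the r-th test vector with the r-th prediction vector.
mae <- mapply(function(y, p) mean(abs(y - p)), M$test, M$pred)
print(mae)
```

Each list element corresponds to one run, so per-run statistics are obtained by walking the two lists in parallel.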
See also http://hdl.handle.net/1822/36210 and http://www3.dsi.uminho.pt/pcortez/rminer.html
Paulo Cortez http://www3.dsi.uminho.pt/pcortez
To check for more details about rminer and for citation purposes:
P. Cortez.
Data Mining with Neural Networks and Support Vector Machines Using the R/rminer Tool.
In P. Perner (Ed.), Advances in Data Mining - Applications and Theoretical Aspects, 10th Industrial Conference on Data Mining (ICDM 2010), Lecture Notes in Artificial Intelligence 6171, pp. 572-583, Berlin, Germany, July, 2010. Springer. ISBN: 978-3-642-14399-1.
@Springer: http://www.springerlink.com/content/e7u36014r04h0334
http://www3.dsi.uminho.pt/pcortez/2010rminer.pdf
This tutorial shows additional code examples:
P. Cortez.
A tutorial on using the rminer R package for data mining tasks.
Teaching Report, Department of Information Systems, ALGORITMI Research Centre, Engineering School, University of Minho, Guimaraes,
Portugal, July 2015.
http://hdl.handle.net/1822/36210
fit, predict.fit, mgraph, mmetric, savemining, holdout and Importance.
### dontrun is used when the execution of the example requires some computational effort.
### simple regression example
x1=rnorm(200,100,20); x2=rnorm(200,100,20)
y=0.7*sin(x1/(25*pi))+0.3*sin(x2/(25*pi))
# mining with an ensemble of neural networks, each fixed with size=2 hidden nodes
# assumes a default holdout (random split) with 2/3 for training and 1/3 for testing:
M=mining(y~x1+x2,Runs=2,model="mlpe",search=2)
print(M)
print(mmetric(M,metric="MAE"))
### more regression examples:
## Not run:
# simple nonlinear regression task; x3 is a random variable and does not influence y:
data(sin1reg)
# 5 runs of an external holdout with 2/3 for training and 1/3 for testing, fixed seed 12345
# feature selection: sabs method
# model selection: 5 searches for size, internal 2-fold cross-validation, fixed seed 123
# with optimization for minimum MAE metric
M=mining(y~.,data=sin1reg,Runs=5,method=c("holdout",2/3,12345),model="mlpe",
search=list(search=mparheuristic("mlpe",n=5),method=c("kfold",2,123),metric="MAE"),
feature="sabs")
print(mmetric(M,metric="MAE"))
print(M$mpar)
print("median hidden nodes (size) and number of MLPs (nr):")
print(centralpar(M$mpar))
print("attributes used by the model in each run:")
print(M$attributes)
mgraph(M,graph="RSC",Grid=10,main="sin1 MLPE scatter plot")
mgraph(M,graph="REP",Grid=10,main="sin1 MLPE scatter plot",sort=FALSE)
mgraph(M,graph="REC",Grid=10,main="sin1 MLPE REC")
mgraph(M,graph="IMP",Grid=10,main="input importances",xval=0.1,leg=names(sin1reg))
# average influence of x1 on the model:
mgraph(M,graph="VEC",Grid=10,main="x1 VEC curve",xval=1,leg=names(sin1reg)[1])
## End(Not run)
### regression example with holdout rolling windows:
## Not run:
# simple nonlinear regression task; x3 is a random variable and does not influence y:
data(sin1reg)
# rolling with 20 test samples, training window size of 300 and increment of 50 in each run:
# note that Runs argument is automatically set to 14 in this example:
M=mining(y~.,data=sin1reg,method=c("holdoutrol",20,300,50),
model="mlpe",debug=TRUE)
## End(Not run)
### regression example with all rminer models:
## Not run:
# simple nonlinear regression task; x3 is a random variable and does not influence y:
data(sin1reg)
models=c("naive","ctree","rpart","kknn","mlp","mlpe","ksvm","randomForest","mr","mars",
"cubist","pcr","plsr","cppls","rvm")
for(model in models)
{
M=mining(y~.,data=sin1reg,method=c("holdout",2/3,12345),model=model)
cat("model:",model,"MAE:",round(mmetric(M,metric="MAE")$MAE,digits=3),"\n")
}
## End(Not run)
### classification example (task="prob")
## Not run:
data(iris)
# 10 runs of a 3-fold cross-validation with fixed seed 123 for generating the 3-fold runs
M=mining(Species~.,iris,Runs=10,method=c("kfold",3,123),model="rpart")
print(mmetric(M,metric="CONF"))
print(mmetric(M,metric="AUC"))
print(meanint(mmetric(M,metric="AUC")))
mgraph(M,graph="ROC",TC=2,baseline=TRUE,Grid=10,leg="Versicolor",
main="versicolor ROC")
mgraph(M,graph="LIFT",TC=2,baseline=TRUE,Grid=10,leg="Versicolor",
main="Versicolor LIFT")
M2=mining(Species~.,iris,Runs=10,method=c("kfold",3,123),model="ksvm")
L=vector("list",2)
L[[1]]=M;L[[2]]=M2
mgraph(L,graph="ROC",TC=2,baseline=TRUE,Grid=10,leg=c("DT","SVM"),main="ROC")
## End(Not run)
### other classification examples
## Not run:
### 1st example:
data(iris)
# 2 runs of an external 2-fold validation, random seed
# model selection: SVM model with rbfdot kernel, automatic search for sigma,
# internal 3-fold validation, random seed, minimum "AUC" is assumed
# feature selection: none, "s" is used only to store input importance values
M=mining(Species~.,data=iris,Runs=2,method=c("kfold",2,NA),model="ksvm",
search=list(search=mparheuristic("ksvm"),method=c("kfold",3)),feature="s")
print(mmetric(M,metric="AUC",TC=2))
mgraph(M,graph="ROC",TC=2,baseline=TRUE,Grid=10,leg="SVM",main="ROC",intbar=FALSE)
mgraph(M,graph="IMP",TC=2,Grid=10,main="input importances",xval=0.1,
leg=names(iris),axis=1)
mgraph(M,graph="VEC",TC=2,Grid=10,main="Petal.Width VEC curve",
data=iris,xval=4)
### 2nd example, ordered kfold, k-nearest neighbor:
M=mining(Species~.,iris,Runs=1,method=c("kfoldo",3),model="knn")
# confusion matrix:
print(mmetric(M,metric="CONF"))
### 3rd example, use of all rminer models:
models=c("naive","ctree","rpart","kknn","mlp","mlpe","ksvm","randomForest","bagging",
"boosting","lda","multinom","naiveBayes","qda")
# models="naiveBayes" # uncomment to test a single model instead of the full list above
for(model in models)
{
M=mining(Species~.,iris,Runs=1,method=c("kfold",3,123),model=model)
cat("model:",model,"ACC:",round(mmetric(M,metric="ACC")$ACC,digits=1),"\n")
}
## End(Not run)
### for more fitting examples check the help of function fit: help(fit,package="rminer")
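
The holdout rolling windows example above notes that Runs is automatically set to 14. The arithmetic behind that count can be sketched in base R. This is an assumed scheme, not code taken from rminer: it supposes that sin1reg has 1000 rows and that the training window slides forward by the increment until the window plus the test samples no longer fit. All variable names are hypothetical.

```r
# Sketch of how rolling-window runs can be counted (an assumed scheme, not rminer code).
n         <- 1000   # assumed number of rows in sin1reg
window    <- 300    # training window size
test_size <- 20     # test samples per run (vpar)
increment <- 50     # shift between consecutive runs

# First training row of each run; the last run must still leave room for its test rows.
starts <- seq(1, n - window - test_size + 1, by = increment)
runs   <- length(starts)
last   <- tail(starts, 1)

cat("runs:", runs, "\n")
cat("last run trains on rows", last, "to", last + window - 1,
    "and tests on rows", last + window, "to", last + window + test_size - 1, "\n")
```

Under these assumptions the count agrees with the 14 runs stated in the example.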

