Description Usage Arguments Details Value Note Author(s) References See Also Examples
Powerful function that trains and tests a particular fit model under several runs and a given validation method. Since there can be a huge number of models, the fitted models are not stored. Yet, several useful statistics (e.g. predictions) are returned.
1 2 3 4 |
x |
a symbolic description (formula) of the model to be fit. If |
data |
an optional data frame (columns denote attributes, rows show examples) containing the training data, when using a formula. |
Runs |
number of runs used (e.g. 1, 5, 10, 20, 30) |
method |
a vector with c(vmethod,vpar,seed) or c(vmethod,vpar,window,increment), where vmethod is:
vpar – number used by vmethod (optional, if not defined 2/3 for
|
model |
See |
task |
See |
search |
See |
mpar |
Only kept for compatibility with previous |
feature |
See
|
scale |
See |
transform |
See |
debug |
If TRUE shows some information about each run. |
... |
See |
Powerful function that trains and tests a particular fit model under several runs and a given validation method
(see [Cortez, 2010] for more details).
Several Runs
are performed. In each run, the same validation method is adopted (e.g. holdout
) and
several relevant statistics are stored. Note: this function can require some computational effort, specially if
a large dataset and/or a high number of Runs
is adopted.
A list
with the components:
$object – fitted object values of the last run (used by multiple model fitting: "auto" mode). For "holdout", it is equal to a fit
object, while for "kfold" it is a list.
$time – vector with time elapsed for each run.
$test – vector list, where each element contains the test (target) results for each run.
$pred – vector list, where each element contains the predicted results for each test set and each run.
$error – vector with a (validation) measure (often it is a error value) according to search$metric
for each run (valid options are explained in mmetric
).
$mpar – vector list, where each element contains the fit model mpar parameters (for each run).
$model – the model
.
$task – the task
.
$method – the external validation method
.
$sen – a matrix with the 1-D sensitivity analysis input importances. The number of rows is Runs
times vpar, if kfold
, else is Runs
.
$sresponses – a vector list with a size equal to the number of attributes (useful for graph="VEC"
).
Each element contains a list with the 1-D sensitivity analysis input responses
(n
– name of the attribute; l
– number of levels; x
– attribute values; y
– 1-D sensitivity responses.
Important note: sresponses (and "VEC" graphs) are only available if feature="sabs"
or "simp"
related (see feature
).
$runs – the Runs
.
$attributes – vector list with all attributes (features) selected in each run (and fold if kfold
) if a feature selection algorithm is used.
$feature – the feature
.
See also http://hdl.handle.net/1822/36210 and http://www3.dsi.uminho.pt/pcortez/rminer.html
Paulo Cortez http://www3.dsi.uminho.pt/pcortez/
To check for more details about rminer and for citation purposes:
P. Cortez.
Data Mining with Neural Networks and Support Vector Machines Using the R/rminer Tool.
In P. Perner (Ed.), Advances in Data Mining - Applications and Theoretical Aspects 10th Industrial Conference on Data Mining (ICDM 2010), Lecture Notes in Artificial Intelligence 6171, pp. 572-583, Berlin, Germany, July, 2010. Springer. ISBN: 978-3-642-14399-1.
@Springer: https://link.springer.com/chapter/10.1007/978-3-642-14400-4_44
http://www3.dsi.uminho.pt/pcortez/2010-rminer.pdf
This tutorial shows additional code examples:
P. Cortez.
A tutorial on using the rminer R package for data mining tasks.
Teaching Report, Department of Information Systems, ALGORITMI Research Centre, Engineering School, University of Minho, Guimaraes,
Portugal, July 2015.
http://hdl.handle.net/1822/36210
For the grid search and other optimization methods:
P. Cortez.
Modern Optimization with R.
Use R! series, Springer, September 2014, ISBN 978-3-319-08262-2.
https://www.springer.com/gp/book/9783319082622
fit
, predict.fit
, mparheuristic
, mgraph
, mmetric
, savemining
, holdout
and Importance
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 | ### dontrun is used when the execution of the example requires some computational effort.
### simple regression example
set.seed(123); x1=rnorm(200,100,20); x2=rnorm(200,100,20)
y=0.7*sin(x1/(25*pi))+0.3*sin(x2/(25*pi))
# mining with an ensemble of neural networks, each fixed with size=2 hidden nodes
# assumes a default holdout (random split) with 2/3 for training and 1/3 for testing:
M=mining(y~x1+x2,Runs=2,model="mlpe",search=2)
print(M)
print(mmetric(M,metric="MAE"))
### more regression examples:
## Not run:
# simple nonlinear regression task; x3 is a random variable and does not influence y:
data(sin1reg)
# 5 runs of an external holdout with 2/3 for training and 1/3 for testing, fixed seed 12345
# feature selection: sabs method
# model selection: 5 searches for size, internal 2-fold cross validation fixed seed 123
# with optimization for minimum MAE metric
M=mining(y~.,data=sin1reg,Runs=5,method=c("holdout",2/3,12345),model="mlpe",
search=list(search=mparheuristic("mlpe",n=5),method=c("kfold",2,123),metric="MAE"),
feature="sabs")
print(mmetric(M,metric="MAE"))
print(M$mpar)
print("median hidden nodes (size) and number of MLPs (nr):")
print(centralpar(M$mpar))
print("attributes used by the model in each run:")
print(M$attributes)
mgraph(M,graph="RSC",Grid=10,main="sin1 MLPE scatter plot")
mgraph(M,graph="REP",Grid=10,main="sin1 MLPE scatter plot",sort=FALSE)
mgraph(M,graph="REC",Grid=10,main="sin1 MLPE REC")
mgraph(M,graph="IMP",Grid=10,main="input importances",xval=0.1,leg=names(sin1reg))
# average influence of x1 on the model:
mgraph(M,graph="VEC",Grid=10,main="x1 VEC curve",xval=1,leg=names(sin1reg)[1])
## End(Not run)
### regression example with holdout rolling windows:
## Not run:
# simple nonlinear regression task; x3 is a random variable and does not influence y:
data(sin1reg)
# rolling with 20 test samples, training window size of 300 and increment of 50 in each run:
# note that Runs argument is automatically set to 14 in this example:
M=mining(y~.,data=sin1reg,method=c("holdoutrol",20,300,50),
model="mlpe",debug=TRUE)
## End(Not run)
### regression example with all rminer models:
## Not run:
# simple nonlinear regression task; x3 is a random variable and does not influence y:
data(sin1reg)
models=c("naive","ctree","rpart","kknn","mlp","mlpe","ksvm","randomForest","mr","mars",
"cubist","pcr","plsr","cppls","rvm")
for(model in models)
{
M=mining(y~.,data=sin1reg,method=c("holdout",2/3,12345),model=model)
cat("model:",model,"MAE:",round(mmetric(M,metric="MAE")$MAE,digits=3),"\n")
}
## End(Not run)
### classification example (task="prob")
## Not run:
data(iris)
# 10 runs of a 3-fold cross validation with fixed seed 123 for generating the 3-fold runs
M=mining(Species~.,iris,Runs=10,method=c("kfold",3,123),model="rpart")
print(mmetric(M,metric="CONF"))
print(mmetric(M,metric="AUC"))
print(meanint(mmetric(M,metric="AUC")))
mgraph(M,graph="ROC",TC=2,baseline=TRUE,Grid=10,leg="Versicolor",
main="versicolor ROC")
mgraph(M,graph="LIFT",TC=2,baseline=TRUE,Grid=10,leg="Versicolor",
main="Versicolor ROC")
M2=mining(Species~.,iris,Runs=10,method=c("kfold",3,123),model="ksvm")
L=vector("list",2)
L[[1]]=M;L[[2]]=M2
mgraph(L,graph="ROC",TC=2,baseline=TRUE,Grid=10,leg=c("DT","SVM"),main="ROC")
## End(Not run)
### other classification examples
## Not run:
### 1st example:
data(iris)
# 2 runs of an external 2-fold validation, random seed
# model selection: SVM model with rbfdot kernel, automatic search for sigma,
# internal 3-fold validation, random seed, minimum "AUC" is assumed
# feature selection: none, "s" is used only to store input importance values
M=mining(Species~.,data=iris,Runs=2,method=c("kfold",2,NA),model="ksvm",
search=list(search=mparheuristic("ksvm"),method=c("kfold",3)),feature="s")
print(mmetric(M,metric="AUC",TC=2))
mgraph(M,graph="ROC",TC=2,baseline=TRUE,Grid=10,leg="SVM",main="ROC",intbar=FALSE)
mgraph(M,graph="IMP",TC=2,Grid=10,main="input importances",xval=0.1,
leg=names(iris),axis=1)
mgraph(M,graph="VEC",TC=2,Grid=10,main="Petal.Width VEC curve",
data=iris,xval=4)
### 2nd example, ordered kfold, k-nearest neigbor:
M=mining(Species~.,iris,Runs=1,method=c("kfoldo",3),model="knn")
# confusion matrix:
print(mmetric(M,metric="CONF"))
### 3rd example, use of all rminer models:
models=c("naive","ctree","rpart","kknn","mlp","mlpe","ksvm","randomForest","bagging",
"boosting","lda","multinom","naiveBayes","qda")
models="naiveBayes"
for(model in models)
{
M=mining(Species~.,iris,Runs=1,method=c("kfold",3,123),model=model)
cat("model:",model,"ACC:",round(mmetric(M,metric="ACC")$ACC,digits=1),"\n")
}
## End(Not run)
### multiple models: automl or ensembles
## Not run:
data(iris)
d=iris
names(d)[ncol(d)]="y" # change output name
inputs=ncol(d)-1
metric="AUC"
# simple automl (1 search per individual model),
# internal holdout and external holdout:
sm=mparheuristic(model="automl",n=NA,task="prob",inputs=inputs)
mode="auto"
imethod=c("holdout",4/5,123) # internal validation method
emethod=c("holdout",2/3,567) # external validation method
search=list(search=sm,smethod=mode,method=imethod,metric=metric,convex=0)
M=mining(y~.,data=d,model="auto",search=search,method=emethod,fdebug=TRUE)
# 1 single model was selected:
cat("best",emethod[1],"selected model:",M$object@model,"\n")
cat(metric,"=",round(as.numeric(mmetric(M,metric=metric)),2),"\n")
# simple automl (1 search per individual model),
# internal kfold and external kfold:
imethod=c("kfold",3,123) # internal validation method
emethod=c("kfold",5,567) # external validation method
search=list(search=sm,smethod=mode,method=imethod,metric=metric,convex=0)
M=mining(y~.,data=d,model="auto",search=search,method=emethod,fdebug=TRUE)
# kfold models were selected:
kfolds=as.numeric(emethod[2])
models=vector(length=kfolds)
for(i in 1:kfolds) models[i]=M$object$model[[i]]
cat("best",emethod[1],"selected models:",models,"\n")
cat(metric,"=",round(as.numeric(mmetric(M,metric=metric)),2),"\n")
# example with weighted ensemble:
M=mining(y~.,data=d,model="WE",search=search,method=emethod,fdebug=TRUE)
for(i in 1:kfolds) models[i]=M$object$model[[i]]
cat("best",emethod[1],"selected models:",models,"\n")
cat(metric,"=",round(as.numeric(mmetric(M,metric=metric)),2),"\n")
## End(Not run)
### for more fitting examples check the help of function fit: help(fit,package="rminer")
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.