fit: Fit a supervised data mining model (classification or regression)


View source: R/model.R

Description

Fit a supervised data mining (classification or regression) model. Wrapper function that allows fitting distinct data mining methods (16 classification and 18 regression) under the same coherent function structure. It also tunes the hyperparameters of the models (e.g. kknn, mlpe and ksvm) and performs some feature selection methods.

Usage

fit(x, data = NULL, model = "default", task = "default", 
    search = "heuristic", mpar = NULL, feature = "none", 
    scale = "default", transform = "none", 
    created = NULL, fdebug = FALSE, ...)

Arguments

x

a symbolic description (formula) of the model to be fit.
If data=NULL it is assumed that x contains a formula expression with known variables (see first example below).

data

an optional data frame (columns denote attributes, rows show examples) containing the training data, when using a formula.

model

Typically this should be a character object with the model type name (data mining method, as explained in valid character options).

Valid character options are the typical R base learning functions, namely one of:

  • naive – most common class (classification) or mean output value (regression)

  • ctree – conditional inference tree (classification and regression, uses ctree from party package)

  • cv.glmnet – generalized linear model with lasso or elasticnet regularization (classification and regression, uses cv.glmnet from glmnet package; note: cross-validation is used to automatically set the lambda parameter that is needed to compute the predictions)

  • rpart or dt – decision tree (classification and regression, uses rpart from rpart package)

  • kknn or knn – k-nearest neighbor (classification and regression, uses kknn from kknn package)

  • ksvm or svm – support vector machine (classification and regression, uses ksvm from kernlab package)

  • mlp – multilayer perceptron with one hidden layer (classification and regression, uses nnet from nnet package)

  • mlpe – multilayer perceptron ensemble (classification and regression, uses nnet from nnet package)

  • randomForest or randomforest – random forest algorithm (classification and regression, uses randomForest from randomForest package)

  • xgboost – eXtreme Gradient Boosting (Tree) (classification and regression, uses xgboost from xgboost package; note: nrounds parameter is set by default to 2)

  • bagging – bagging (classification, uses bagging from adabag package)

  • boosting – boosting (classification, uses boosting from adabag package)

  • lda – linear discriminant analysis (classification, uses lda from MASS package)

  • multinom or lr – logistic regression (classification, uses multinom from nnet package)

  • naiveBayes or naivebayes – naive bayes (classification, uses naiveBayes from e1071 package)

  • qda – quadratic discriminant analysis (classification, uses qda from MASS package)

  • cubist – M5 rule-based model (regression, uses cubist from Cubist package)

  • lm – standard multiple/linear regression (uses lm)

  • mr – multiple regression (regression, equivalent to lm but uses nnet from nnet package with zero hidden nodes and linear output function)

  • mars – multivariate adaptive regression splines (regression, uses mars from mda package)

  • pcr – principal component regression (regression, uses pcr from pls package)

  • plsr – partial least squares regression (regression, uses plsr from pls package)

  • cppls – canonical powered partial least squares (regression, uses cppls from pls package)

  • rvm – relevance vector machine (regression, uses rvm from kernlab package)

model can also be a list with the fields (see example below):

  • $fit – a fit function that accepts the arguments x, data and ...; the goal is to accept here any R classification or regression model, mainly for its use within the mining or Importance functions, or to use a hyperparameter search (via search).

  • $predict – a predict function that accepts the arguments object and newdata; this function should behave as any rminer prediction, i.e., return: a factor when task=="class"; a matrix with Probabilities x Instances when task=="prob"; and a vector when task=="reg".

  • $name – optional field with the name of the method.

Note: current rminer version emphasizes the use of native fitting functions from their respective packages, since these functions contain several specific hyperparameters that can now be searched or set using the search or ... arguments. For compatibility with previous rminer versions, older model options are kept.
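
For illustration, a minimal sketch of such a list (a complete nnet-based example is given in the Examples section; the lm wrapper, the data frame d and the name "mylm" below are merely illustrative):

  d=data.frame(x1=rnorm(100),y=rnorm(100))
  mylm=list(
    fit=function(x,data=NULL,...) lm(x,data=data,...), # x is the formula
    predict=function(object,newdata) predict(object,newdata), # numeric vector, task="reg"
    name="mylm")
  M=fit(y~.,data=d,model=mylm,task="reg")
  P=predict(M,d)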

task

data mining task. Valid options are:

  • prob (or p) – classification with output probabilities (i.e. the sum of all outputs equals 1).

  • class (or c) – classification with discrete outputs (factor)

  • reg (or r) – regression (numeric output)

  • default tries to guess the best task (prob or reg) given the model and output variable type (if factor then prob else reg)
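
For instance (a small sketch; since iris$Species is a factor, "prob" would also be the default guess):

  data(iris)
  M1=fit(Species~.,iris,model="rpart",task="class") # predict(M1,iris) returns a factor
  M2=fit(Species~.,iris,model="rpart",task="prob")  # predict(M2,iris) returns a matrix of probabilities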

search

used to tune the hyperparameter(s) of the model, such as: kknn – number of neighbors (k); mlp or mlpe – number of hidden nodes (size) or decay; ksvm – Gaussian kernel parameter (sigma); randomForest – mtry parameter. Valid options for a simpler search use:

  • heuristic – simple heuristic, one search parameter (e.g. size=inputs/2 for mlp, or size=10 if classification and inputs/2>10; sigma is set using kpar="automatic" and kernel="rbfdot" of ksvm). Important note: instead of the "heuristic" options, it is advisable to use the explicit mparheuristic function, which is designed for a wider range of models (the "heuristic" options were kept only for compatibility and work only for: kknn; mlp or mlpe; ksvm, with kernel="rbfdot"; and randomForest).

  • heuristic5 – heuristic with a 5 range grid-search (e.g. seq(1,9,2) for kknn, seq(0,8,2) for mlp or mlpe, 2^seq(-15,3,4) for ksvm, 1:5 for randomForest)

  • heuristic10 – heuristic with a 10 range grid-search (e.g. seq(1,10,1) for kknn, seq(0,9,1) for mlp or mlpe, 2^seq(-15,3,2) for ksvm, 1:10 for randomForest)

  • UD, UD1 or UD2 – uniform design 2-Level with 13 (UD or UD2) or 21 (UD1) searches (only works for ksvm and kernel="rbfdot").

  • a-vector – numeric vector with all hyperparameter values that will be searched within an internal grid-search (the number of searches is length(search) when convex=0)

A more complex but advised use of search is to use a list with:

  • $smethod – type of search method. Valid options are (more options will be developed in next versions):

    • none – no search is executed, one single fit is performed.

    • matrix – matrix search (tests only n searches, all search parameters are of size n).

    • grid – normal grid search (tests all combinations of search parameters).

    • 2L – nested 2-level grid search. The first level range is set by $search and then the second level performs a fine tuning, with length($search) searches around the best value found in the first level, within half of the original range (the second level is only performed on numeric searches).

    • UD, UD1 or UD2 – uniform design 2-Level with 13 (UD or UD2) or 21 (UD1) searches (note: only works for model="ksvm" and kernel="rbfdot"). Under this option, $search should contain the first level ranges, such as c(-15,3,-5,15) for classification (gamma min and max, C min and max, after which a 2^ transform is applied) or c(-8,0,-1,6,-8,-1) for regression (last two values are epsilon min and max, after which a 2^ transform is applied).

  • $search – a-list with all hyperparameter values to be searched, or a character with the previously described options (e.g. "heuristic", "heuristic5", "UD"). If a character, then $smethod equal to "none", "grid" or "UD" is automatically assumed.

  • $convex – number that defines how many searches are performed after a local minimum/maximum is found (if >0, the search can be stopped without testing all grid-search values)

  • $method – type of internal estimation method used during the search (see method argument of mining for details)

  • $metric – used to compute a metric value during internal estimation. Can be a single character such as "SAD" or a list with all the arguments used by the mmetric function except y and x, such as
    search$metric=list(metric="AUC",TC=3,D=0.7). See mmetric for more details.

Note: if mpar argument is used, then the mpar values are automatically fed into search. However, a direct use of the search argument is advised instead of mpar, since search is more flexible and powerful.
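
Two compact sketches of search (the particular grid values below are merely illustrative): a plain numeric vector, tuning the kknn number of neighbors, and the advised list form, tuning the mlpe size hyperparameter with an internal 3-fold estimation and the "AUC" metric:

  data(iris)
  # vector form: internal grid search over 10 values of k:
  M1=fit(Species~.,iris,model="kknn",search=seq(1,19,2))
  # list form:
  s=list(smethod="grid",search=list(size=c(2,4,6)),convex=0,
         method=c("kfold",3,123),metric="AUC")
  M2=fit(Species~.,iris,model="mlpe",search=s)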

mpar

Important note: this argument is only kept in this version for compatibility with previous rminer versions. Instead of mpar, you should use the more flexible and powerful search argument.

vector with extra default (fixed) model parameters (used for modeling, search and feature selection) with:

  • c(vmethod,vpar,metric) – generic use of mpar (including most models);

  • c(C,epsilon,vmethod,vpar,metric) – if ksvm and C and epsilon are explicitly set;

  • c(nr,maxit,vmethod,vpar,metric) – if mlp or mlpe and nr and maxit are explicitly set;

C and epsilon are default values for svm (if any of these is NA, then heuristics are used to set the value).
nr is the number of mlp runs or mlpe individual models, while maxit is the maximum number of epochs (if any of these is NA, then heuristics are used to set the value).
For help on vmethod and vpar see mining.
metric is the internal error function (e.g. used by search to select the best model), valid options are explained in mmetric. When mpar=NULL then default values are used. If there are NA values (e.g. mpar=c(NA,NA)) then default values are used.
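
An illustrative sketch of the older vector format for mlpe, following c(nr,maxit,vmethod,vpar,metric) (the particular values are only examples; search remains the advised alternative):

  data(iris)
  # 3 mlpe individual models, up to 100 epochs, internal 2/3 holdout, "AUC" metric:
  M=fit(Species~.,iris,model="mlpe",mpar=c(3,100,"holdout",2/3,"AUC"))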

feature

feature selection and sensitivity analysis control. Valid fit function options are:

  • none – no feature selection;

  • a fmethod character value, such as sabs (see below);

  • a-vector – vector with c(fmethod,deletions,Runs,vmethod,vpar,defaultsearch)

  • a-vector – vector with c(fmethod,deletions,Runs,vmethod,vpar)

fmethod sets the type. Valid options are:

  • sbs – standard backward selection;

  • sabs – sensitivity analysis backward selection (faster);

  • sabsv – equal to sabs but uses variance for sensitivity importance measure;

  • sabsr – equal to sabs but uses range for sensitivity importance measure;

  • sabsg – equal to sabs (uses gradient for sensitivity importance measure);

deletions is the maximum number of feature deletions (if -1, it is not used).
Runs is the number of runs for each feature set evaluation (e.g. 1).
For help on vmethod and vpar see mining.
defaultsearch is one hyperparameter used during the feature selection search; after the best feature set is selected, search is then used (faster). If not defined, then search is used during the feature selection itself (may be slow).
When feature is a vector, then default values are used to fill missing or NA values. Note: feature selection capabilities are expected to be enhanced in future rminer versions.
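
For instance, a sketch of the c(fmethod,deletions,Runs,vmethod,vpar) vector form, with values chosen only for illustration (see also the feature="sabs" regression example in the Examples section):

  data(sa_ssin)
  M=fit(y~.,data=sa_ssin,model="ksvm",feature=c("sabs",-1,1,"holdout",2/3))
  print(M@attributes) # indexes of the selected attributes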

scale

if data needs to be scaled (i.e. for mlp or mlpe). Valid options are:

  • default – uses scaling when needed (i.e. for mlp or mlpe)

  • none – no scaling;

  • inputs – standardizes (0 mean, 1 st. deviation) input attributes;

  • all – standardizes (0 mean, 1 st. deviation) input and output attributes;

If needed, the predict function of rminer performs the inverse scaling.
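
A minimal sketch that makes the scaling choice explicit, standardizing both inputs and output (predictions from predict are then returned in the original scale):

  data(sa_ssin)
  M=fit(y~.,data=sa_ssin,model="mlpe",scale="all")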

transform

if the output data needs to be transformed (e.g. log transform). Valid options are:

  • none – no transform;

  • log – y=(log(y+1)) (the inverse function is applied in the predict function);

  • positive – all predictions are positive (negative values are turned into zero);

  • logpositive – both the log and positive transforms;
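
A small sketch of the log transform, assuming the synthetic skewed data frame d below (the model is fit on log(y+1) and predict applies the inverse):

  d=data.frame(x=1:100,y=exp(seq(0,5,length.out=100)))
  M=fit(y~x,data=d,model="mr",transform="log")
  P=predict(M,d) # predictions returned in the original y scale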

created

time stamp for the model. By default, the system time is used; alternatively, another time can be specified.

fdebug

if TRUE, shows some search details.

...

additional and specific parameters sent to each fit function model (e.g. dt, randomforest, kernlab). A few examples:
– the rpart function is used for decision trees, thus you can have:
control=rpart.control(cp=.05) (see crossvaldata example).
– the ksvm function is used for support vector machines, thus you can change the kernel type: kernel="polydot" (see examples below).
Important note: if you use package functions and get an error, then try to explicitly define the package. For instance, you might need to use fit(several-arguments,control=Cubist::cubistControl()) instead of
fit(several-arguments,control=cubistControl()).
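
For instance, a small sketch passing an rpart control directly through ... (with the explicit package definition advised above):

  data(iris)
  M=fit(Species~.,iris,model="rpart",control=rpart::rpart.control(cp=.05))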

Details

Fits a classification or regression model given a data.frame (see [Cortez, 2010] for more details). The ... optional arguments should be used to fix values used by specific model functions (see examples). Notes:
- if there is an error in the fit, then a warning is issued (see example).
- the new search argument is very flexible and allows a powerful design of supervised learning models.
- the correct use of search is highly dependent on the underlying R learning functions. For example, if you are tuning model="rpart", then carefully read the help of the rpart function.
- the mpar argument is kept only for compatibility and should be avoided; use the more flexible search instead.

Details about some models:

Value

Returns a model object. You can check all model elements with str(M), where M is a model object. The slots include, among others: @object (the fitted base model), @mpar (the model hyperparameters), @attributes (the selected attributes) and @levels (the output factor levels), as used in the Examples below.

Note

See also http://hdl.handle.net/1822/36210 and http://www3.dsi.uminho.pt/pcortez/rminer.html

Author(s)

Paulo Cortez http://www3.dsi.uminho.pt/pcortez

References

P. Cortez. Data Mining with Neural Networks and Support Vector Machines using the R/rminer Tool. In P. Perner (Ed.), Advances in Data Mining – Applications and Theoretical Aspects, 10th Industrial Conference on Data Mining (ICDM 2010), LNAI 6171, pp. 572-583, Berlin, Germany, July 2010. Springer.

See Also

mparheuristic, mining, predict.fit, mgraph, mmetric, savemining, CasesSeries, lforecast, holdout and Importance. Check all rminer functions using: help(package=rminer).

Examples

### dontrun is used when the execution of the example requires some computational effort.

### simple regression (with a formula) example.
x1=rnorm(200,100,20); x2=rnorm(200,100,20)
y=0.7*sin(x1/(25*pi))+0.3*sin(x2/(25*pi))
M=fit(y~x1+x2,model="mlpe")
new1=rnorm(100,100,20); new2=rnorm(100,100,20)
ynew=0.7*sin(new1/(25*pi))+0.3*sin(new2/(25*pi))
P=predict(M,data.frame(x1=new1,x2=new2,y=rep(NA,100)))
print(mmetric(ynew,P,"MAE"))

### simple classification example.
## Not run: 
data(iris)
M=fit(Species~.,iris,model="rpart")
plot(M@object); text(M@object) # show model
P=predict(M,iris)
print(mmetric(iris$Species,P,"CONF"))
print(mmetric(iris$Species,P,"ALL"))
mgraph(iris$Species,P,graph="ROC",TC=2,main="versicolor ROC",
baseline=TRUE,leg="Versicolor",Grid=10)

M2=fit(Species~.,iris,model="ctree")
plot(M2@object) # show model
P2=predict(M2,iris)
print(mmetric(iris$Species,P2,"CONF"))

# ctree with different setup:
# (ctree_control is from the party package)
M3=fit(Species~.,iris,model="ctree",controls = party::ctree_control(testtype="MonteCarlo"))
plot(M3@object) # show model

## End(Not run)

### simple binary classification example with cv.glmnet and xgboost
## Not run: 
data(sa_ssin_2)
H=holdout(sa_ssin_2$y,ratio=2/3)
# cv.glmnet:
M=fit(y~.,sa_ssin_2[H$tr,],model="cv.glmnet",task="class") # pure classes
P=predict(M,sa_ssin_2[H$ts,])
cat("1st prediction, class:",as.character(P[1]),"\n")
cat("Confusion matrix:\n")
print(mmetric(sa_ssin_2[H$ts,]$y,P,"CONF")$conf)

M2=fit(y~.,sa_ssin_2[H$tr,],model="cv.glmnet") # probabilities
P2=predict(M2,sa_ssin_2[H$ts,])
L=M2@levels
cat("1st prediction, prob:",L[1],"=",P2[1,1],",",L[2],"=",P2[1,2],"\n")
cat("Confusion matrix:\n")
print(mmetric(sa_ssin_2[H$ts,]$y,P2,"CONF")$conf)
cat("AUC of ROC curve:\n")
print(mmetric(sa_ssin_2[H$ts,]$y,P2,"AUC"))

M3=fit(y~.,sa_ssin_2[H$tr,],model="cv.glmnet",nfolds=3) # use 3 folds instead of 10
plot(M3@object) # show cv.glmnet object
P3=predict(M3,sa_ssin_2[H$ts,])

# xgboost:
M4=fit(y~.,sa_ssin_2[H$tr,],model="xgboost",verbose=1) # nrounds=2, show rounds:
P4=predict(M4,sa_ssin_2[H$ts,])
print(mmetric(sa_ssin_2[H$ts,]$y,P4,"AUC"))
M5=fit(y~.,sa_ssin_2[H$tr,],model="xgboost",nrounds=3,verbose=1) # nrounds=3, show rounds:
P5=predict(M5,sa_ssin_2[H$ts,])
print(mmetric(sa_ssin_2[H$ts,]$y,P5,"AUC"))

## End(Not run)

### classification example with discrete classes, probabilities and holdout
## Not run: 
data(iris)
H=holdout(iris$Species,ratio=2/3)
M=fit(Species~.,iris[H$tr,],model="ksvm",task="class")
M2=fit(Species~.,iris[H$tr,],model="ksvm",task="prob")
P=predict(M,iris[H$ts,])
P2=predict(M2,iris[H$ts,])
print(mmetric(iris$Species[H$ts],P,"CONF"))
print(mmetric(iris$Species[H$ts],P2,"CONF"))
print(mmetric(iris$Species[H$ts],P,"CONF",TC=1))
print(mmetric(iris$Species[H$ts],P2,"CONF",TC=1))
print(mmetric(iris$Species[H$ts],P2,"AUC"))

### exploration of some rminer classification models:
models=c("lda","naiveBayes","kknn","randomForest","cv.glmnet","xgboost")
for(m in models)
 { cat("model:",m,"\n") 
   M=fit(Species~.,iris[H$tr,],model=m)
   P=predict(M,iris[H$ts,])
   print(mmetric(iris$Species[H$ts],P,"AUC")[[1]])
 }

## End(Not run)

### classification example with hyperparameter selection 
###    note: for regression, similar code can be used
### SVM 
## Not run: 
data(iris)
# large list of SVM configurations:
# SVM with kpar="automatic" sigma rbfdot kernel estimation and default C=1:
#  note: each execution can lead to different M@mpar due to sigest stochastic nature:
M=fit(Species~.,iris,model="ksvm")
print(M@mpar) # model hyperparameters/arguments
# same thing, explicit use of mparheuristic:
M=fit(Species~.,iris,model="ksvm",search=list(search=mparheuristic("ksvm")))
print(M@mpar) # model hyperparameters

# SVM with C=3, sigma=2^-7
M=fit(Species~.,iris,model="ksvm",C=3,kpar=list(sigma=2^-7))
print(M@mpar)
# SVM with different kernels:
M=fit(Species~.,iris,model="ksvm",kernel="polydot",kpar="automatic") 
print(M@mpar)
# fit already has a scale argument, thus the only way to fix scale of "tanhdot"
# is to use the special search argument with the "none" method:
s=list(smethod="none",search=list(scale=2,offset=2))
M=fit(Species~.,iris,model="ksvm",kernel="tanhdot",search=s) 
print(M@mpar)
# heuristic: 10 grid search values for sigma, rbfdot kernel (fdebug is used only for more verbose output):
s=list(search=mparheuristic("ksvm",10)) # advised "heuristic10" usage
M=fit(Species~.,iris,model="ksvm",search=s,fdebug=TRUE)
print(M@mpar)
# same thing, uses older search="heuristic10" that works for fewer rminer models
M=fit(Species~.,iris,model="ksvm",search="heuristic10",fdebug=TRUE)
print(M@mpar)
# identical search using different and more explicit code:
s=list(search=2^seq(-15,3,2))
M=fit(Species~.,iris,model="ksvm",search=s,fdebug=TRUE)
print(M@mpar)

# uniform design "UD" for sigma and C, rbfdot kernel, two levels of grid search, 
# under exponential (2^x) search scale:
M=fit(Species~.,iris,model="ksvm",search="UD",fdebug=TRUE)
print(M@mpar)
M=fit(Species~.,iris,model="ksvm",search="UD1",fdebug=TRUE)
print(M@mpar)
M=fit(Species~.,iris,model="ksvm",search=2^seq(-15,3,2),fdebug=TRUE)
print(M@mpar)
# now the more powerful search argument is used for modeling SVM:
# grid 3 x 3 search:
s=list(smethod="grid",search=list(sigma=2^c(-15,-5,3),C=2^c(-5,0,15)),convex=0,
            metric="AUC",method=c("kfold",3,12345))
print(s)
M=fit(Species~.,iris,model="ksvm",search=s,fdebug=TRUE)
print(M@mpar)
# identical search with different argument smethod="matrix" 
s$smethod="matrix"
s$search=list(sigma=rep(2^c(-15,-5,3),times=3),C=rep(2^c(-5,0,15),each=3))
print(s)
M=fit(Species~.,iris,model="ksvm",search=s,fdebug=TRUE)
print(M@mpar)
# search for best kernel (only works for kpar="automatic"):
s=list(smethod="grid",search=list(kernel=c("rbfdot","laplacedot","polydot","vanilladot")),
       convex=0,metric="AUC",method=c("kfold",3,12345))
print(s)
M=fit(Species~.,iris,model="ksvm",search=s,fdebug=TRUE)
print(M@mpar)
# search for best parameters of "rbfdot" or "laplacedot" (which use same kpar):
s$search=list(kernel=c("rbfdot","laplacedot"),sigma=2^seq(-15,3,5))
print(s)
M=fit(Species~.,iris,model="ksvm",search=s,fdebug=TRUE)
print(M@mpar)

### randomForest
# search for mtry and ntree
s=list(smethod="grid",search=list(mtry=c(1,2,3),ntree=c(100,200,500)),
            convex=0,metric="AUC",method=c("kfold",3,12345))
print(s)
M=fit(Species~.,iris,model="randomForest",search=s,fdebug=TRUE)
print(M@mpar)

### rpart
# simpler way to tune cp in 0.01 to 0.9 (10 searches):
s=list(search=mparheuristic("rpart",n=10,lower=0.01,upper=0.9),method=c("kfold",3,12345))
M=fit(Species~.,iris,model="rpart",search=s,fdebug=TRUE)
print(M@mpar)

# same thing but with more lines of code
# note: this code can be adapted to tune other rpart parameters,
#       while mparheuristic only tunes cp
# a vector list needs to be used for the search$search parameter
lcp=vector("list",10) # 10 grid values for the complexity cp
names(lcp)=rep("cp",10) # same cp name 
scp=seq(0.01,0.9,length.out=10) # 10 values from 0.01 to 0.9
for(i in 1:10) lcp[[i]]=scp[i] # cycle needed due to [[]] notation
s=list(smethod="grid",search=list(control=lcp),
            convex=0,metric="AUC",method=c("kfold",3,12345))
M=fit(Species~.,iris,model="rpart",search=s,fdebug=TRUE)
print(M@mpar)

### ctree 
# simpler way to tune mincriterion in 0.1 to 0.99 (9 searches):
mint=c("kfold",3,123) # internal validation method
s=list(search=mparheuristic("ctree",n=9,lower=0.1,upper=0.99),method=mint)
M=fit(Species~.,iris,model="ctree",search=s,fdebug=TRUE)
print(M@mpar)
# same thing but with more lines of code
# note: this code can be adapted to tune other ctree parameters,
#       while mparheuristic only tunes mincriterion
# a vector list needs to be used for the search$search parameter
lmc=vector("list",9) # 9 grid values for the mincriterion
smc=seq(0.1,0.99,length.out=9)
for(i in 1:9) lmc[[i]]=party::ctree_control(mincriterion=smc[i]) 
s=list(smethod="grid",search=list(controls=lmc),method=mint,convex=0)
M=fit(Species~.,iris,model="ctree",search=s,fdebug=TRUE)
print(M@mpar)

### some MLP fitting examples:
# simplest use:
M=fit(Species~.,iris,model="mlpe")  
print(M@mpar)
# same thing, with explicit use of mparheuristic:
M=fit(Species~.,iris,model="mlpe",search=list(search=mparheuristic("mlpe")))
print(M@mpar) # hidden nodes and number of ensemble mlps
# setting some nnet parameters:
M=fit(Species~.,iris,model="mlpe",size=3,decay=0.1,maxit=100,rang=0.9) 
print(M@mpar) # mlpe hyperparameters
# MLPE, 5 value grid search (fdebug is only used to add some verbose output in the console):
s=list(search=mparheuristic("mlpe",n=5)) # 5 searches for size
print(s) # show search
M=fit(Species~.,iris,model="mlpe",search=s,fdebug=TRUE)
print(M@mpar)
# previous searches used a random holdout (seed=NULL), now a fixed seed (123) is used:
s=list(smethod="grid",search=mparheuristic("mlpe",n=5),convex=0,metric="AUC",
            method=c("holdout",2/3,123))
print(s)
M=fit(Species~.,iris,model="mlpe",search=s,fdebug=TRUE)
print(M@mpar)
# faster and greedy grid search:
s$convex=1;s$search=list(size=0:9)
print(s)
M=fit(Species~.,iris,model="mlpe",search=s,fdebug=TRUE)
print(M@mpar)
# 2-level grid with a total of 5 searches 
#  note of caution: some "2L" ranges may lead to non-integer (e.g. 1.3) values at
#  the 2nd level search, and some R functions crash if non-integer values are used
#  for integer parameters.
s$smethod="2L";s$convex=0;s$search=list(size=c(4,8,12))
print(s)
M=fit(Species~.,iris,model="mlpe",search=s,fdebug=TRUE)
print(M@mpar)

## End(Not run)

### example of an error (warning) generated using fit:
## Not run: 
data(iris)
# size needs to be a positive integer, thus 0.1 leads to an error:
M=fit(Species~.,iris,model="mlp",size=0.1)  
print(M@object)

## End(Not run)

### exploration of some rminer regression models:
## Not run: 
data(sa_ssin)
H=holdout(sa_ssin$y,ratio=2/3,seed=12345)
models=c("lm","mr","ctree","mars","cubist","cv.glmnet","xgboost","rvm")
for(m in models)
 { cat("model:",m,"\n") 
   M=fit(y~.,sa_ssin[H$tr,],model=m)
   P=predict(M,sa_ssin[H$ts,])
   print(mmetric(sa_ssin$y[H$ts],P,"MAE"))
 }

## End(Not run)

### regression example with hyperparameter selection:
## Not run: 
data(sa_ssin)
# some SVM experiments:
# default SVM:
M=fit(y~.,data=sa_ssin,model="svm")
print(M@mpar)
# SVM with (Cherkassky and Ma, 2004) heuristics to set C and epsilon:
M=fit(y~.,data=sa_ssin,model="svm",C=NA,epsilon=NA)
print(M@mpar)
# SVM with Uniform Design set sigma, C and epsilon:
M=fit(y~.,data=sa_ssin,model="ksvm",search="UD",fdebug=TRUE)
print(M@mpar)

# sensitivity analysis feature selection
M=fit(y~.,data=sa_ssin,model="ksvm",search=list(search=mparheuristic("ksvm",n=5)),feature="sabs") 
print(M@mpar)
print(M@attributes) # selected attributes (1, 2 and 3 are the relevant inputs)

# example that shows how transform works:
M=fit(y~.,data=sa_ssin,model="mr") # linear regression
P=predict(M,data.frame(x1=-1000,x2=0,x3=0,x4=0,y=NA)) # P should be negative
print(P)
M=fit(y~.,data=sa_ssin,model="mr",transform="positive")
P=predict(M,data.frame(x1=-1000,x2=0,x3=0,x4=0,y=NA)) # P is not negative
print(P)

## End(Not run)

### pure classification example with a generic R model ###
## Not run: 
### nnet is adopted here but virtually ANY fitting function/package could be used:

# since the default nnet prediction provides probabilities, there is
# a need to create this "wrapping" function that returns discrete classes:
predictclass=function(object,newdata)
{ predict(object,newdata,type="class") }
# list with a fit and predict function:
# nnet::nnet (package::function)
model=list(fit=nnet::nnet,predict=predictclass,name="nnet")
data(iris)
# note that size is not a fit parameter and it is sent directly to nnet:
M=fit(Species~.,iris,model=model,size=3,task="class") 
P=predict(M,iris)
print(P)

## End(Not run) 
