View source: R/tdmClassifyLoop.r
tdmClassifyLoop contains a double loop (over opts$NRUN runs and CV folds) and calls tdmClassify. It is called by all classification R functions main_*.
If tset is NULL, it splits the data in dset into training and validation data according to opts$TST.kind.
It returns an object of class TDMclassifier.
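The double loop described above can be pictured with a small base-R sketch. This is only an illustration under stated assumptions, not the TDMR implementation: `nrun` and `nfold` are hypothetical stand-ins for opts$NRUN and the number of CV folds, and `majority_class` is a hypothetical placeholder for the model that tdmClassify would train.

```r
# Illustrative sketch only: NOT the TDMR implementation.
# nrun and nfold are hypothetical stand-ins for opts$NRUN and the number of CV folds.
data(iris)
set.seed(5)
nrun  <- 2
nfold <- 3

# Hypothetical placeholder model: always predict the most frequent training class.
majority_class <- function(y) names(which.max(table(y)))

err <- matrix(NA_real_, nrow = nrun, ncol = nfold)
for (i in seq_len(nrun)) {
  # Outer loop: a new random fold assignment in every run
  fold <- sample(rep(seq_len(nfold), length.out = nrow(iris)))
  for (k in seq_len(nfold)) {
    # Inner loop: fold k is the validation set, the rest is the training set
    train <- iris[fold != k, ]
    vali  <- iris[fold == k, ]
    pred  <- majority_class(train$Species)
    err[i, k] <- mean(as.character(vali$Species) != pred)
  }
}

# One validation error per run, averaged over the folds of that run
c_vali <- rowMeans(err)
print(c_vali)
```

The outer loop changes only the random split, so the spread of `c_vali` across runs gives a rough picture of how much a result depends on the split.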
tdmClassifyLoop(dset, response.variables, input.variables, opts, tset = NULL)
dset | the data frame containing training and validation data
response.variables | name of the column that carries the target variable, or a vector of names specifying multiple target columns (these columns are not used during prediction, only for evaluation)
input.variables | vector with names of input columns
opts | a list from which we need here the following entries
tset | [NULL] If not NULL, this is the test data set. If NULL, we are in tuning and the validation data set is built from
result, an object of class TDMclassifier. This is a list with results, containing
lastRes | last run, last fold: result from
C_train | classification error on training set
G_train | gain on training set
R_train | relative gain on training set (percentage of max. gain on this set)
*_vali | similar, with the vali set instead of the training set
*_vali2 | similar, with the vali2 set instead of the training set
Err | a data frame with as many rows as opts$NRUN and 9 columns corresponding to the nine variables described above
predictions | last run: data frame with dimensions [nrow(dset), length(response.variables)]. In case of CV, all CV predictions (for each record in dset), in other cases mixed validation / train set predictions.
predictTest | predictions on the test set
predProbList | a list,
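To make the shape of Err concrete, here is a minimal base-R sketch of a data frame with one row per run and nine columns. The column names are an assumption derived from the naming pattern in the list above (C_, G_, R_ combined with train, vali, vali2); the names in an actual TDMclassifier object may differ, and `nrun` is a hypothetical stand-in for opts$NRUN.

```r
# Sketch of the assumed layout of Err: nrun rows, 3 measures x 3 data sets = 9 columns.
nrun     <- 2                              # hypothetical stand-in for opts$NRUN
measures <- c("C", "G", "R")               # error, gain, relative gain
sets     <- c("train", "vali", "vali2")

# Assumed column names, built from the naming pattern; real names may differ.
cols <- as.vector(outer(measures, sets, paste, sep = "_"))

# Empty placeholder values (NA); a real run would fill in the measured numbers.
Err <- as.data.frame(matrix(NA_real_, nrow = nrun, ncol = length(cols),
                            dimnames = list(NULL, cols)))
str(Err)
```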
Each performance measure C_*, G_*, R_* is a vector of length opts$NRUN. To be specific, C_train[i] is the classification error on the training set from the i-th run. This error is mean(res$allEVAL$cerr.trn), i.e. the mean of the classification errors from all response variables, where res is the return value of tdmClassify.
In the case of cross validation, an additional averaging over all folds is done for each performance measure.
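The two averaging steps described above can be traced with a small base-R sketch. The numbers in `cerr_trn` are made up for illustration; in TDMR they would correspond to res$allEVAL$cerr.trn from each tdmClassify call. The nested-list layout (runs, then folds, then one error per response variable) is an assumption chosen for readability, not the package's internal structure.

```r
# Hypothetical classification errors: 2 runs x 3 folds x 2 response variables.
cerr_trn <- list(
  list(c(0.10, 0.20), c(0.00, 0.10), c(0.20, 0.30)),  # run 1, folds 1..3
  list(c(0.05, 0.15), c(0.10, 0.10), c(0.00, 0.20))   # run 2, folds 1..3
)

# Step 1: for every (run, fold), average over the response variables.
# The result is a matrix with one row per fold and one column per run.
per_fold <- sapply(cerr_trn, function(run) sapply(run, mean))

# Step 2 (cross validation only): average over the folds of each run.
C_train <- colMeans(per_fold)
print(C_train)
```

Without cross validation there is only one split per run, so step 2 drops out and C_train[i] is the step-1 mean for run i.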
Wolfgang Konen (wolfgang.konen@th-koeln.de), THK
print.TDMclassifier, tdmClassify, tdmRegress, tdmRegressLoop
#*# --------- demo/demo00-0classif.r ---------
#*# This demo shows a simple data mining process (phase 1 of TDMR) for classification on
#*# dataset iris.
#*# The data mining process in tdmClassifyLoop calls randomForest as the prediction model.
#*# It is called opts$NRUN=2 times with different random train-validation set splits.
#*# Therefore data frame result$Err has two rows.
#*#
opts <- tdmOptsDefaultsSet()                  # set all defaults for data mining process
opts$TST.SEED <- opts$MOD.SEED <- 5           # reproducible results
#opts$VERBOSE <- opts$SRF.verbose <- 0        # no printed output
gdObj <- tdmGraAndLogInitialize(opts)         # init graphics and log file
data(iris)
response.variables <- "Species"               # names, not data (!)
input.variables <- setdiff(names(iris), "Species")
result <- tdmClassifyLoop(iris, response.variables, input.variables, opts)
print(result$Err)