tdmClassifyLoop: Core classification double loop returning a 'TDMclassifier'...
In TDMR: Tuned Data Mining in R


tdmClassifyLoop contains a double loop (an outer loop over opts$NRUN runs and an inner loop over the CV folds) and calls tdmClassify. It is called by all classification R functions main_*.
If tset is NULL, it splits the data in dset into training and validation data according to opts$TST.kind.
It returns an object of class TDMclassifier.

tdmClassifyLoop(dset, response.variables, input.variables, opts, tset = NULL)

`dset`	the data frame containing training and validation data.
`response.variables`	name of the column that carries the target variable, or a vector of names specifying multiple target columns (these columns are not used as inputs for prediction, only for evaluation)
`input.variables`	vector with names of input columns
`opts`	a list from which the following entries are needed here:
    `NRUN`	number of runs (outer loop)
    `TST.SEED`	=NULL: get a new random number seed with `tdmRandomSeed`. =any value: set the random number seed to this value to get reproducible random numbers and thus a reproducible selection of training and test sets (only relevant if TST.kind=="cv" or "rand"; see also MOD.SEED in `tdmClassify`)
    `TST.kind`	how to create cvi, handed over to `tdmModCreateCVindex`. If TST.kind=="col", then cvi is taken from dset[,opts$TST.col].
    `GD.RESTART`	[TRUE] TRUE/FALSE: do/don't restart graphics devices
    `GD.DEVICE`	["non"|"win"|"pdf"|"png"]
`tset`	[NULL] If not NULL, this is the test data set. If NULL, we are in tuning, and the validation data set is built from `dset` according to the procedure prescribed in `opts$TST.*`.
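The split performed when `tset` is NULL can be pictured with a small base-R sketch. `make_cvi()` and its parameters `nfold` and `vali_frac` are hypothetical and are not TDMR functions; the real index is produced by `tdmModCreateCVindex`, whose encoding may differ. The sketch assumes fold labels 1..nfold for "cv" and 0 = training / 1 = validation for "rand", and shows how a fixed seed (the role played by `opts$TST.SEED`) makes the split reproducible.

```r
# Hypothetical sketch (not TDMR code): build a CV index vector for n records.
# Assumed encoding: "cv"   -> fold labels 1..nfold
#                   "rand" -> 0 = training record, 1 = validation record
# (For TST.kind == "col", TDMR instead takes cvi from dset[, opts$TST.col].)
make_cvi <- function(n, kind = c("cv", "rand"), nfold = 5, vali_frac = 0.1,
                     seed = NULL) {
  kind <- match.arg(kind)
  if (!is.null(seed)) set.seed(seed)       # fixed seed -> reproducible split
  if (kind == "cv") {
    sample(rep_len(seq_len(nfold), n))     # balanced, randomly permuted folds
  } else {
    cvi <- rep(0L, n)
    cvi[sample(n, round(vali_frac * n))] <- 1L   # mark validation records
    cvi
  }
}

cvi <- make_cvi(nrow(iris), "cv", nfold = 3, seed = 5)
table(cvi)   # 50 records in each of the 3 folds
```

Calling `make_cvi()` twice with the same seed returns the identical index, which is the reproducibility that a non-NULL seed is meant to provide.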

result, an object of class TDMclassifier, which is a list containing

`lastRes`	last run, last fold: result from `tdmClassify`
`C_train`	classification error on training set
`G_train`	gain on training set
`R_train`	relative gain on training set (percentage of max. gain on this set)
`*_vali`	similar, but on the vali set instead of the training set
`*_vali2`	similar, but on the vali2 set instead of the training set
`Err`	a data frame with opts$NRUN rows and 9 columns, one for each of the nine performance measures described above (C_*, G_*, R_* on the train, vali and vali2 sets)
`predictions`	last run: data frame with dimensions [nrow(dset),length(response.variables)]. In the case of CV, it holds the CV predictions for every record in dset; in other cases, a mix of validation-set and training-set predictions.
`predictTest`	predictions on the test set `tset` (NULL if `tset` is NULL)
`predProbList`	a list; `predProbList[[i]]` holds the prediction probabilities of the i-th run. See the info on `predProb` in `tdmClassify`.
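In the CV case, `predictions` contains one prediction per record of `dset`, each made by the model of the fold in which that record was held out. A minimal sketch of how such out-of-fold predictions can be assembled, assuming k-nearest neighbours (`knn()` from the recommended package `class`) as a stand-in model; `oof_predictions()` and its arguments are hypothetical, not part of TDMR, and only a single response variable is handled.

```r
library(class)  # recommended package shipped with R; provides knn()

# Hypothetical sketch: every record gets exactly one prediction, made by a
# model trained on the other folds (i.e. a model that never saw that record).
oof_predictions <- function(dset, response, inputs, nfold = 5, seed = 5) {
  set.seed(seed)
  cvi <- sample(rep_len(seq_len(nfold), nrow(dset)))   # random fold labels
  pred <- factor(rep(NA, nrow(dset)), levels = levels(dset[[response]]))
  for (k in seq_len(nfold)) {
    vali <- cvi == k                                   # records held out in fold k
    pred[vali] <- knn(dset[!vali, inputs], dset[vali, inputs],
                      cl = dset[[response]][!vali], k = 5)
  }
  pred
}

p <- oof_predictions(iris, "Species", setdiff(names(iris), "Species"))
table(predicted = p, actual = iris$Species)
```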

Each performance measure C_*, G_*, R_* is a vector of length opts$NRUN. Specifically, C_train[i] is the classification error on the training set from the i-th run. This error is mean(res$allEVAL$cerr.trn), i.e. the mean of the classification errors over all response variables, where res is the return value of tdmClassify. In the case of cross-validation, each performance measure is additionally averaged over all folds.
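The double loop and the fold averaging described above can be illustrated in plain R. This is a sketch, not TDMR's implementation: `classify_once()` and `classify_loop()` are hypothetical, a nearest-centroid rule stands in for `tdmClassify` and its randomForest model, only the classification errors C_train and C_vali are computed (no gains), and each run simply reseeds with `seed + i` to get a different split.

```r
# Hypothetical stand-in for tdmClassify: nearest-centroid classifier
# (NOT randomForest). Returns training and validation error rates.
classify_once <- function(train, vali, response, inputs) {
  centroids <- aggregate(train[, inputs], by = list(cls = train[[response]]),
                         FUN = mean)
  m <- as.matrix(centroids[, inputs])
  predict_nc <- function(d) {
    x <- as.matrix(d[, inputs])
    # squared distance to each centroid, up to a per-row constant ||x||^2
    dist2 <- -2 * x %*% t(m) +
      matrix(rowSums(m^2), nrow(x), nrow(m), byrow = TRUE)
    centroids$cls[max.col(-dist2, ties.method = "first")]
  }
  c(C_train = mean(predict_nc(train) != train[[response]]),
    C_vali  = mean(predict_nc(vali)  != vali[[response]]))
}

# Hypothetical double loop: outer loop over runs, inner loop over CV folds.
classify_loop <- function(dset, response, inputs, nrun = 2, nfold = 5,
                          seed = 5) {
  err <- data.frame(C_train = numeric(nrun), C_vali = numeric(nrun))
  for (i in seq_len(nrun)) {                          # outer loop: runs
    set.seed(seed + i)                                # new split in each run
    cvi <- sample(rep_len(seq_len(nfold), nrow(dset)))
    fold_err <- sapply(seq_len(nfold), function(k)    # inner loop: folds
      classify_once(dset[cvi != k, ], dset[cvi == k, ], response, inputs))
    err[i, ] <- rowMeans(fold_err)                    # average over folds
  }
  err
}

err <- classify_loop(iris, "Species", setdiff(names(iris), "Species"))
print(err)   # one row per run
```

Each row of the returned data frame corresponds to one run, which mirrors the role of the rows of `result$Err`.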

Wolfgang Konen (wolfgang.konen@th-koeln.de), THK

print.TDMclassifier, tdmClassify, tdmRegress, tdmRegressLoop

#*# --------- demo/demo00-0classif.r ---------
#*# This demo shows a simple data mining process (phase 1 of TDMR) for classification on
#*# dataset iris.
#*# The data mining process in tdmClassifyLoop calls randomForest as the prediction model.
#*# It is called opts$NRUN=2 times with different random train-validation set splits.
#*# Therefore data frame result$Err has two rows
#*#
library(TDMR)                                   # load the TDMR package
opts=tdmOptsDefaultsSet()                       # set all defaults for data mining process
opts$TST.SEED <- opts$MOD.SEED <- 5             # reproducible results
#opts$VERBOSE <- opts$SRF.verbose <- 0          # no printed output
gdObj <- tdmGraAndLogInitialize(opts)           # init graphics and log file

data(iris)
response.variables="Species"                    # names, not data (!)
input.variables=setdiff(names(iris),"Species")

result = tdmClassifyLoop(iris,response.variables,input.variables,opts)

print(result$Err)