View source: R/tdmOptsDefaults.r

Description
Set up and return a list opts with default settings. The list opts contains all DM-related settings which are needed by main_<TASK>. For better readability, most elements of opts are arranged in groups:
dir.*  | path-related settings
READ.* | data-reading-related settings
TST.*  | resampling-related settings (training, validation and test set, CV)
PRE.*  | preprocessing parameters
SRF.*  | several parameters for tdmModSortedRFimport
MOD.*  | general settings for models and model building
RF.*   | several parameters for model RF (Random Forest)
SVM.*  | several parameters for model SVM (Support Vector Machines)
ADA.*  | several parameters for model ADA (AdaBoost)
CLS.*  | classification-related settings
GD.*   | settings for the graphic devices
Usage

tdmOptsDefaultsSet(opts = NULL, path = ".")
Arguments

opts | (optional) the options already set
path | ["."] where to find everything for the DM task
Details

The path-related settings are relative to opts$path, if it is defined, else relative to the current directory. Finally, the function tdmOptsDefaultsFill(opts) is called to fill in further details, depending on the current settings of opts.
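A minimal usage sketch (assumptions: the TDMR package is loaded; the task directory "./myTask" and the data file "myData.csv" are hypothetical; the overridden options are documented in the Value list below):

    library(TDMR)

    ## build the default option list for a task located in "./myTask" (hypothetical path)
    opts <- tdmOptsDefaultsSet(path = "./myTask")

    ## override a few of the documented defaults before running main_<TASK>
    opts$filename  <- "myData.csv"   # hypothetical task data file (default: "default.txt")
    opts$READ.NROW <- 1000           # read only the first 1000 rows (-1 = read all rows)
    opts$TST.kind  <- "cv"           # cross-validation instead of the default "rand"
    opts$TST.NFOLD <- 5              # number of CV folds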
Value

A list opts, with defaults set for all options relevant for a DM task, containing the following elements:
path | ["."] where to find everything for the DM task
dir.txt | [data] where to find .txt/.csv files
dir.data | [data] where to find other data files, including .Rdata
dir.output | [Output] where to put output files
filename | ["default.txt"] the task data
filetest | [NULL] the test data, only relevant for READ.TstFn!=NULL
data.title | ["Default Data"] title for plots
READ.TXT | [T] =T: read data from .csv and save as .Rdata, =F: read from .Rdata
READ.NROW | [-1] read this number of rows, or -1 for 'read all rows'
READ.TrnFn | function to be passed into
READ.TstFn | [NULL] function to be passed into
READ.INI | [TRUE] read the task data initially, i.e. prior to tuning, using
TST.kind | ["rand"] one of the choices from {"cv","rand","col"}, see
TST.COL | ["TST.COL"] name of the column with the train/test/disregard flag
TST.NFOLD | [3] number of CV folds (only for TST.kind=="cv")
TST.valiFrac | [0.1] set this fraction of the train-validation data aside for validation (only for TST.kind=="rand")
TST.testFrac | [0.1] set this fraction of the data aside for testing prior to tuning (if tdm$umode=="SP_T" and opts$READ.INI==TRUE), or set this fraction aside for testing after tuning (if tdm$umode=="RSUB" or =="CV")
TST.trnFrac | [NULL] train set fraction, if NULL then
TST.SEED | [NULL] a seed for the random test set selection (
PRE.PCA | ["none" (default) |"linear"] PCA preprocessing: [don't | do normal PCA (prcomp)]
PRE.PCA.REPLACE | [T] =T: replace the original numerical columns with the PCA columns, =F: add the PCA columns
PRE.PCA.npc | [0] if >0: add monomials of degree 2 from the first PRE.PCA.npc columns (PCs) (only active if opts$PRE.PCA!="none")
PRE.SFA | ["none" (default) |"2nd"] SFA preprocessing (see package
PRE.SFA.REPLACE | [F] =T: replace the original numerical columns with the SFA columns; =F: add the SFA columns
PRE.SFA.npc | [0] if >0: add monomials of degree 2 from the first PRE.SFA.npc columns (only active if opts$PRE.SFA!="none")
PRE.SFA.PPRANGE | [11] number of inputs after SFA preprocessing; only these inputs enter the SFA expansion
PRE.SFA.ODIM | [5] number of SFA output dimensions (slowest signals) to return
PRE.SFA.doPB | [T] =F|T: don't | do parametric bootstrap for SFA in case of marginal training data
PRE.SFA.fctPB | [sfaPBootstrap] the function to call in case of parametric bootstrap, see
PRE.allNonVali | [F] if =T, use all non-validation data in the training-validation set for PCA or SFA preprocessing; if =F, use only the training set for PCA or SFA preprocessing (only relevant if opts$PRE.PCA!="none" or opts$PRE.SFA!="none")
PRE.Xpgroup | [0.99] bind the fraction 1-PRE.Xpgroup in column OTHER (see
PRE.MaxLevel | [32] bind the N-32+1 least frequent cases in column OTHER (see
SRF.kind | ["xperc" (default) |"ndrop" |"nkeep" |"none"] the method used for feature selection, see
SRF.ndrop | [0] how many variables to drop (only relevant if SRF.kind=="ndrop")
SRF.nkeep | [NULL] how many variables to keep, NULL = "keep all" (only relevant if SRF.kind=="nkeep")
SRF.XPerc | [0.95] if >=0, keep that importance percentage, starting with the most important variables (if SRF.kind=="xperc")
SRF.calc | [T] =T: calculate importance & save it on SRF.file, =F: load it from srfFile (srfFile = Output/<confFile>.SRF.Rdata)
SRF.ntree | [50] number of RF trees
SRF.samp | sampsize for RF in importance estimation; see RF.samp for further info on sampsize
SRF.verbose | [2]
SRF.maxS | [40] how many variables to show in the plot
SRF.minlsi | [1] a lower bound for the length of SRF$input.variables
SRF.method | ["RFimp"]
SRF.scale | [TRUE] option 'scale' for the call to importance() in
MOD.SEED | [NULL] a seed for the random model initialization (if the model is non-deterministic). If NULL, use
MOD.method | ["RF" (default) |"MC.RF" |"SVM" |"NB"]: use [RF | MetaCost-RF | SVM | Naive Bayes] in
RF.ntree | [500]
RF.samp | [1000] sampsize for RF in model training. If RF.samp is a scalar, it specifies the total size of the sample. For classification, it can also be a vector of length n.class (= number of levels in the response variable); then it specifies the size of each stratum, and the sum of the vector is the total sample size. If NULL, RF.samp will be replaced by 3000 later in tdmModAdjustSampsize*.
RF.mtry | [NULL]
RF.nodesize | [1]
RF.OOB | [TRUE] if =T, return the OOB training-set error as tuning measure; if =F, return the validation-set error
RF.p.all | [FALSE]
SVM.kernel | [3] =1: linear, =2: polynomial, =3: RBF, =4: sigmoid
SVM.epsilon | [0.005] needed only for regression
SVM.gamma | [0.005]
SVM.coef0 | [0.0] (needed only for opts$SVM.kernel=="polynomial" or =="sigmoid")
SVM.degree | [3] (needed only for opts$SVM.kernel=="polynomial")
SVM.tolerance | [0.008]
ADA.coeflearn | [1] =1: "Breiman", =2: "Freund", =3: "Zhu" as value for boosting(...,coeflearn,...) (AdaBoost)
ADA.mfinal | [10] number of trees in AdaBoost (= argument mfinal of boosting(...,mfinal,...))
ADA.rpart.minsplit | [20] minimum number of observations in a node for a split to be attempted
CLS.cutoff | [NULL] vote fractions for the classes (vector of length n.class = number of levels in the response variable). The class i with the maximum ratio (% votes)/CLS.cutoff[i] wins. If NULL, each class gets the cutoff 1/n.class (i.e. the majority vote wins). The smaller CLS.cutoff[i], the more likely class i is to win. (See the sketch after this list.)
CLS.CLASSWT | [NULL] class weights for the n.class classes, e.g.
CLS.gainmat | [NULL] (n.class x n.class) gain matrix. If NULL, CLS.gainmat will be set to the unit matrix in
rgain.type | ["rgain" (default) |"meanCA" |"minCA"] in case of
ncopies | [0] if >0, activate
fct.postproc | [NULL] name of a function with signature
GD.DEVICE | ["win"] ="win": all graphics to (several) windows (
GD.RESTART | [T] =T: restart the graphics device (i.e. close all 'old' windows or re-open the multi-page pdf) in each call to
GD.CLOSE | [T] =T: close graphics devices "png", "pdf" at the end of main_*.r (suitable for main_*.r solo) or
NRUN | [2] how many runs with different train & test samples - or - how many CV runs, if
APPLY_TIME | [FALSE]
test2.show | [FALSE]
test2.string | ["default cutoff"]
VERBOSE | [2] =2: print much output, =1: less, =0: none
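To make the classification-related sampling and voting options more concrete, here is a small sketch; the 3-class setup and all numeric values are illustrative assumptions, not defaults:

    ## hypothetical 3-class task (n.class = 3); values are illustrative only
    opts$RF.samp    <- c(300, 300, 400)  # per-class strata sizes; total sample size = 1000
                                         # (a scalar would specify the total size directly)
    opts$CLS.cutoff <- c(0.5, 0.3, 0.2)  # class i wins if (% votes)/CLS.cutoff[i] is maximal;
                                         # smaller cutoff[i] favors class i; NULL = majority vote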
Note

The variables opts$PRE.PCA.numericV and opts$PRE.SFA.numericV (string vectors of the numeric input columns to be used for PCA or SFA) are not set by tdmOptsDefaultsSet or tdmOptsDefaultsFill. Either they are supplied by the user or, if NULL, TDMR will set them to input.variables in tdmClassifyLoop, assuming that all columns are numeric.
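A sketch of how such a vector might be supplied by the user; the column names "x1", "x2", "x3" and the choice of PCA are hypothetical:

    opts$PRE.PCA          <- "linear"             # switch on PCA preprocessing
    opts$PRE.PCA.numericV <- c("x1", "x2", "x3")  # hypothetical numeric input columns
    ## if left at NULL, TDMR sets it to input.variables in tdmClassifyLoop,
    ## assuming all columns are numeric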
Author(s)

Wolfgang Konen, THK, 2013 - 2018
See Also

tdmOptsDefaultsFill, tdmDefaultsFill