calc_randomForest_args: Calculate Random Forest - Arguments



Description

The following parameters can be used in the ... argument of function getap (also when getap is called within function gdmm) to override the values in the analysis procedure file and thus modify the calculation of random forest models; see examples.

gdmm(dataset, ap=getap(...))



Arguments


Logical. If used in getap, whether classification via randomForest should be performed on the given dataset.


Character vector. One or more class variables to define the grouping used for classification.


Logical, whether the errors of the test data should be crossvalidated. If set to TRUE, CV and testing are repeated in alternating datasets. See below.


Numeric of length one. The percentage of the dataset that should be set aside for testing the models; these data are never seen during training and crossvalidation.
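To illustrate what setting aside a percentage of the dataset for testing means, here is a minimal base-R sketch. It is not taken from aquap2; the helper splitTestTrain and its argument percTest are hypothetical, and it assumes a simple random (non-stratified) split by row index.

```r
# Hypothetical helper (not part of aquap2): hold out percTest percent of the
# rows as independent test data; the remaining rows are used for training
# and crossvalidation only.
splitTestTrain <- function(nObs, percTest, seed = NULL) {
  stopifnot(nObs >= 2, percTest > 0, percTest < 100)
  if (!is.null(seed)) set.seed(seed)
  nTest <- round(nObs * percTest / 100)          # number of rows held out
  testIdx <- sort(sample(seq_len(nObs), nTest))  # random, non-stratified
  trainIdx <- setdiff(seq_len(nObs), testIdx)    # everything else
  list(test = testIdx, train = trainIdx)
}

sp <- splitTestTrain(nObs = 200, percTest = 25, seed = 1)
length(sp$test)   # 50
length(sp$train)  # 150
```

The test rows never appear in the training indices, which mirrors the statement that these data are never seen during training and crossvalidation.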


rnf.cvBootCutoff
The minimum number of observations (W) that must be in the smallest subgroup (as defined by the classification grouping variable) *after* the training data have been split into rnf.valid crossvalidation segments (see rnf.valid below). If W is equal to or higher than rnf.cvBootCutoff, the crossvalidation is done by splitting the training data into rnf.valid segments; otherwise it is done via bootstrap resampling. The number of bootstrap iterations is the number of observations of this smallest subgroup in *all* of the training data multiplied by rnf.cvBootFactor. To never perform the CV of the training data via bootstrap, set the parameter cl_gen_neverBootstrapForCV in the settings.r file to TRUE.

An example: with rnf.cvBootCutoff set to 15 and an 8-fold crossvalidation (rnf.valid <- 8), the smallest subgroup must contain at least 15 observations *after* the split into 8 segments, i.e. at least (8 x 15 =) 120 observations in all of the training data. In that case, each of the 8 segments in turn provides 15 observations as test data, which are projected into models built from the remaining (120 - 15 =) 105 observations. If the smallest subgroup had fewer than 120 observations, say only 100, bootstrap resampling with rnf.cvBootFactor * 100 iterations would be performed instead. If a 5-fold crossvalidation were also acceptable in this example, there would be enough data: 100 / 5 = 20, which is not below the rnf.cvBootCutoff value of 15, so the 5-fold crossvalidation would be performed.
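The decision rule described above can be sketched in base R as follows. This is an illustration of the documented rule, not aquap2's internal code: the function chooseCvMode and its argument names are hypothetical stand-ins for rnf.valid, rnf.cvBootCutoff, rnf.cvBootFactor and cl_gen_neverBootstrapForCV, and rounding W down with floor() for uneven splits is an assumption.

```r
# Hypothetical helper (not part of aquap2): choose between a segmented CV
# and a bootstrap CV following the rnf.cvBootCutoff rule.
#   nSmallest      observations of the smallest subgroup in all training data
#   valid          number of CV segments (cf. rnf.valid)
#   cvBootCutoff   minimum W per segment (cf. rnf.cvBootCutoff)
#   cvBootFactor   bootstrap multiplier (cf. rnf.cvBootFactor)
#   neverBootstrap stands in for cl_gen_neverBootstrapForCV in settings.r
chooseCvMode <- function(nSmallest, valid, cvBootCutoff, cvBootFactor,
                         neverBootstrap = FALSE) {
  # W: observations of the smallest subgroup per segment; floor() assumes
  # the smallest segment is what counts when the split is uneven
  W <- floor(nSmallest / valid)
  if (W >= cvBootCutoff || neverBootstrap) {
    list(mode = "segments", folds = valid, W = W)
  } else {
    list(mode = "bootstrap", iterations = cvBootFactor * nSmallest, W = W)
  }
}

# The three cases from the worked example (cvBootFactor = 1 is arbitrary)
chooseCvMode(120, valid = 8, cvBootCutoff = 15, cvBootFactor = 1)$mode  # "segments"
chooseCvMode(100, valid = 8, cvBootCutoff = 15, cvBootFactor = 1)$mode  # "bootstrap"
chooseCvMode(100, valid = 5, cvBootCutoff = 15, cvBootFactor = 1)$mode  # "segments"
```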


rnf.cvBootFactor
The factor by which the number of observations within the smallest subgroup (as defined by the classification grouping variable) is multiplied to obtain the number of iterations of a possible bootstrap crossvalidation of the training data; see rnf.cvBootCutoff.
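A bootstrap crossvalidation with rnf.cvBootFactor * n iterations could look like the following base-R sketch. It assumes one common bootstrap scheme (resample the training rows with replacement, use the rows not drawn, the "out-of-bag" rows, as test data); whether aquap2 uses exactly this scheme is not stated here. The helper bootstrapSplits and the value of cvBootFactor are hypothetical.

```r
# Hypothetical helper (not part of aquap2): build nIter bootstrap splits.
# Each split draws nObs row indices with replacement as training data; the
# rows never drawn (out-of-bag) form the corresponding test data.
bootstrapSplits <- function(nObs, nIter, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  lapply(seq_len(nIter), function(i) {
    inBag <- sample(seq_len(nObs), nObs, replace = TRUE)
    list(train = inBag, test = setdiff(seq_len(nObs), inBag))
  })
}

nSmallest <- 100     # observations of the smallest subgroup in all training data
cvBootFactor <- 2    # hypothetical value for rnf.cvBootFactor
splits <- bootstrapSplits(nObs = 150, nIter = cvBootFactor * nSmallest, seed = 1)
length(splits)       # 200 bootstrap iterations
```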


rnf.valid
The number of segments the training data should be divided into in the case of a "traditional" crossvalidation of the training data; see rnf.cvBootCutoff above.
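The following base-R sketch shows one way a "traditional" crossvalidation splits the training data into rnf.valid segments. It is not aquap2's implementation: the helper makeSegments is hypothetical, and stratifying by the class variable (so that every segment receives a similar share of the smallest subgroup) is an assumption made to match the per-segment counting described under rnf.cvBootCutoff.

```r
# Hypothetical helper (not part of aquap2): split training data into `valid`
# segments, stratified by the class variable so that each segment gets a
# similar share of every class.
makeSegments <- function(classes, valid, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  seg <- integer(length(classes))
  for (cl in unique(classes)) {
    idx <- which(classes == cl)
    # recycle segment labels 1..valid over the class members, then shuffle
    seg[idx] <- sample(rep_len(seq_len(valid), length(idx)))
  }
  # in round k, segment k is the test set and all other segments train
  lapply(seq_len(valid), function(k) {
    list(train = which(seg != k), test = which(seg == k))
  })
}

# 120 observations in the smallest class "a", as in the 8-fold example
classes <- rep(c("a", "b"), times = c(120, 200))
folds <- makeSegments(classes, valid = 8, seed = 1)
sapply(folds, function(f) sum(classes[f$test] == "a"))  # 15 in every segment
```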


Logical, whether variable reduction via PCA should be applied; if TRUE, the subsequent classifications are performed on the PCA scores; see rnf.pcaNComp below.


rnf.pcaNComp
Character or integer vector. Provide the character "max" to use the maximum number of components (i.e. the number of observations minus 1), or an integer vector specifying the components (i.e. their scores) to be used for random forest classification.
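How a "max" or integer-vector specification selects PCA scores can be sketched with base R's prcomp(). This is not aquap2's code; the helper selectPcaScores is hypothetical, and capping "max" at the number of components prcomp() actually returns (min(n, p)) is a guard added by this sketch.

```r
# Hypothetical helper (not part of aquap2): resolve an rnf.pcaNComp-style
# value ("max" or an integer vector) and return the selected PCA scores.
selectPcaScores <- function(X, nComp = "max") {
  pca <- prcomp(X, center = TRUE, scale. = FALSE)
  if (identical(nComp, "max")) {
    # "max" = number of observations minus 1, capped at the number of
    # score columns prcomp() returns
    nComp <- seq_len(min(nrow(X) - 1, ncol(pca$x)))
  }
  pca$x[, nComp, drop = FALSE]   # these scores would feed the classifier
}

set.seed(1)
X <- matrix(rnorm(10 * 50), nrow = 10)   # 10 observations, 50 variables
dim(selectPcaScores(X, "max"))           # 10 9
dim(selectPcaScores(X, c(1, 3)))         # 10 2
```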


Details

For a list of all parameters that can be used in the ... argument of getap and of the plot functions, please see anproc_file.

See Also

gdmm, siWlg for reducing the number of wavelengths in a dataset

Other Calc. arguments: calc_NNET_args, calc_SVM_args, calc_aqg_args, calc_discrimAnalysis_args, calc_pca_args, calc_pls_args, calc_sim_args, split_dataset

Other Classification functions: calc_NNET_args, calc_SVM_args, calc_discrimAnalysis_args, plot_classifX_indepPred

Other RNF documentation: plot_randomForest_args, plot_rnf,aquap_cube-method

bpollner/aquap2 documentation built on Jan. 30, 2019, 9:08 a.m.