anproc_file: Analysis Procedure File
In bpollner/aquap2: Multivariate Data Analysis Tools for R including Aquaphotomics Methods

anproc_file

R Documentation

Analysis Procedure File

Description

The analysis procedure file is used first to split the dataset according to the values provided in the 'split dataset' section, and then, in the 'statistics' section (starting with do.pca), to tell the system which statistics to apply and which models to calculate on those datasets. It also contains specific and general plotting options that are used by the plot function. The arguments that control the split process, the behaviour of the statistics, calculations and specific plotting options, and the general plotting options each start with a certain prefix:

"spl" for all arguments related to the split-process. (For a separate listing please see split_dataset)
"pca" for all arguments related to PCA models (except do.pca). (For a separate listing see calc_pca_args and plot_pca_args)
"sim" for all arguments related to SIMCA models (except do.sim). (For a separate listing see calc_sim_args and plot_sim_args)
"pls" for all arguments related to PLSR models (except do.pls). (For a separate listing see calc_pls_args and plot_pls_args)
"aqg" for all arguments related to Aquagrams (except do.aqg). (For a separate listing see calc_aqg_args and plot_aqg_args)
"da" for all arguments related to Discriminant Analysis classification (except do.da). (For a separate listing see calc_discrimAnalysis_args and plot_discrimAnalysis_args)
"rnf" for all arguments related to RandomForest classification (except do.rnf). (For a separate listing see calc_randomForest_args and plot_randomForest_args)
"svm" for all arguments related to Support Vector Machines classification (except do.svm). (For a separate listing see calc_SVM_args and plot_SVM_args)
"nnet" for all arguments related to Neural Networks classification (except do.nnet). (For a separate listing see calc_NNET_args and plot_NNET_args)
"pg" for the general plotting options that are used in each of the plotting functions. (For a separate listing see plot_pg_args)
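
The prefix convention can be illustrated with a small base-R sketch. The list `ap` and the helper `args_with_prefix` are hypothetical stand-ins for illustration only; they are not part of aquap2 and do not reflect how the package stores its arguments internally.

```r
# Hypothetical stand-in for a few entries of an analysis procedure file.
ap <- list(
  spl.var  = "C_Group",
  spl.wl   = "1300-to-1600",
  do.pca   = TRUE,
  pca.what = "both",
  pca.sc   = c(1, 2),
  pg.where = "pdf"
)

# Return all entries whose name starts with "<prefix>.".
args_with_prefix <- function(args, prefix) {
  args[startsWith(names(args), paste0(prefix, "."))]
}

pca_args <- args_with_prefix(ap, "pca")
print(names(pca_args))
```

Note that `do.pca` is not selected, which matches the "(except do.pca)" remark in the listing above.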

By providing any of the arguments of the analysis procedure file to the function getap (also when getap is used inside the function gdmm), you can override the values in the file with the provided values. See examples at gdmm.
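
The override behaviour described here can be pictured as merging a list of provided values over a list of file values. The following is a minimal base-R sketch of that idea only; `file_values` and `override_args` are hypothetical names, and the actual mechanism inside getap may differ.

```r
# Hypothetical values as they might be read from an analysis procedure file.
file_values <- list(
  spl.var   = "C_Group",
  do.pca    = TRUE,
  pca.elci  = "def",
  pls.ncomp = NULL
)

# Merge user-provided arguments over the file values.
# keep.null = TRUE matters because NULL is a meaningful value for many
# arguments (e.g. spl.var = NULL means "no splitting").
override_args <- function(file_values, ...) {
  provided <- list(...)
  utils::modifyList(file_values, provided, keep.null = TRUE)
}

ap <- override_args(file_values, do.pca = FALSE, pca.elci = 0.95)
str(ap)
```

Values that are not provided (here `spl.var` and `pls.ncomp`) keep what was in the file.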

Arguments

`spl.var`	NULL or character vector. If NULL, no splitting of the dataset will be performed. Provide a character vector with the column names of class variables to split the dataset along these variables.
`spl.wl`	NULL or character vector. If NULL, all wavelengths available in the dataset will be used. Provide a character vector in the format "wlFrom-to-wlTo" (e.g. c("1000-to-2000", "1300-to-1600", ...)) to use all previously defined splits in these wavelengths.
`dpt.pre`	Character vector, which of the available modules of data pre-treatments to apply AFTER a (possible) split by variable `spl.var` and wavelength `spl.wl`, and BEFORE a (possible) splitting of the dataset according to the provided split-variables below (csAvg, noise, exOut). Leave at NULL for no data pre-treatment. Possible values are 'sgol', 'snv', 'msc', 'emsc', 'osc', 'deTr', 'gsd'. Add additional parameters to some of the single strings via the separator '@'. For further information and examples see `dpt_modules`.
`spl.do.csAvg`	Logical. If all the consecutive scans of a single sample should be reduced, i.e. averaged into a single spectrum.
`spl.csAvg.raw`	Logical. If the consecutive scans of a single sample are reduced, whether another dataset containing every single consecutive scan should be kept as well.
`spl.do.noise`	Logical. If artificial noise should be added to the dataset.
`spl.noise.raw`	If the noise test is performed, whether the raw data should be used in addition to the noise data.
`spl.do.exOut`	Logical. If exclusion of outliers should be performed.
`spl.exOut.raw`	Logical. If exclusion of outliers is performed, whether the raw original data should be used as well. If set to TRUE, outliers will be flagged in the dataset in any case.
`spl.exOut.var`	Character vector. The variables that should be used for the grouping defining the scope for outlier detection. The name of the resulting column consists of the class variable prefix (as defined in the settings.r file in `p_ClassVarPref`), the general designator for an outlier-column (as defined in the settings.r file in `p_outlierCol`) followed by an underscore '`_`', and each of the provided variables (without the class variable prefix) separated by a '.' dot. For example, if the provided variables are `C_Group` and `C_Time`, the column containing the outlier-flags might be called `C_outlier_Group.Time`.
`dpt.post`	Character vector, which of the available modules of data pre-treatments to apply AFTER (possibly) splitting the dataset. Leave at NULL for no additional data treatment. Possible values are 'sgol', 'snv', 'msc', 'emsc', 'osc', 'deTr', 'gsd'. Add additional parameters to some of the single strings via the separator '@'. For examples and further information see `dpt_modules`.
`do.pca`	Logical. If used in a plotting function, if PCA score / loading plots should be plotted.
`pca.colorBy`	NULL or character vector. Which class-variables should be used for coloring the PCA score plot. Set to NULL for using all available class variables for coloring.
`pca.elci`	'def' or numeric length one. The confidence interval for the ellipse to be drawn around groups in score plots. Leave at 'def' to read in the default from the settings.r file; provide a numeric length one (e.g. 0.95); or set to NULL for not drawing ellipses at all.
`pca.elcolorBy`	Character vector or NULL. The variables to use for plotting additional confidence interval ellipses. Set to NULL for not drawing additional CI-ellipses. Provide one variable (gets recycled) or a vector with equal length as `pca.colorBy` to have the additional CI-ellipses along these variables.
`pca.what`	Character length one. What element of the PCA analysis to plot. Possible values are 'both', 'scores', 'loadings'.
`pca.sc`	Numeric length 2. Two PCs to be plotted against each other in the score plots.
`pca.sc.pairs`	Numeric vector of length >=2, indicating what PCs to plot in the score pairs plot. Set to NULL for not plotting the pairs plot.
`pca.lo`	Numeric vector of length >=2, indicating what PCs to plot in the loading plot.
`sim.vars`	NULL or character vector. Which variables should be used to group the data. Set to NULL for using all available class-variables, or provide a character vector with the column names of class variables to group the data along those for calculating SIMCA models.
`sim.K`	Numeric length one. The number of components used for calculating the SIMCA models. In mode 'robust' leave at '0' for automatic detection of optimal number of components. [It is a capital 'K' in the argument.]
`do.sim`	Logical. If used in a plotting function, if analysis of SIMCA models should be plotted.
`pls.regOn`	NULL or character vector. Which variables should be used to regress on. Set to NULL for using all numerical variables to regress on, or provide a character vector with the column names of numerical variables to use those for regression in the PLSR.
`pls.ncomp`	NULL or integer length one. The number of components used in PLSR. Set to NULL for automatic detection, or provide an integer to use this number of components in the PLSR.
`pls.valid`	Character. Which crossvalidation to use. Possible values are: "def" to read in the default value from settings.r (parameter `plsr_calc_typeOfCrossvalid`); a numeric length one for this n-fold crossvalidation (the default is to always exclude or include consecutive scans together); a valid name of a class variable for performing a crossvalidation based on the grouping defined by this variable (for a class variable containing e.g. four different levels, a 4-fold crossvalidation with always all members of one group being excluded is performed; this overrules any grouping that would result from the consecutive scans, see below); "LOO" for a leave-one-out crossvalidation. If a vector with the same length as the vector in `pls.regOn` is provided, each element of `pls.valid` is used for crossvalidating the corresponding element in `pls.regOn`. Any of the above mentioned input types can be mixed, so the input could be e.g. `pls.valid <- c("C_FooBar", 10, "C_BarFoo", 10)`. The corresponding `pls.regOn` input for this would then be e.g. `pls.regOn <- c("Y_FooBar", "Y_FooBar", "Y_BarFoo", "Y_BarFoo")`. Please note that via the parameter `plsr_calc_CV_consecsTogether` in the settings file you can select whether, for crossvalidation, the consecutive scans (i.e. the scans with the same sample number) should always be excluded or included together. The default is to always exclude or include the consecutive scans of a single sample together.
`pls.exOut`	Logical. If a plsr-specific box-plot based outlier-detection algorithm should be used on the data of a first plsr model to determine the outliers that will then be excluded in the final plsr model. Possible values are: "def" to read in the default value from settings.r (parameter `plsr_calc_excludePlsrOutliers`); TRUE for excluding plsr specific outliers; FALSE for not performing the plsr specific outlier exclusion. If a vector with the same length as the vector in `pls.regOn` is provided, each element of `pls.exOut` is used to perform the corresponding outlier-detection (or not) for each element in `pls.regOn`.
`do.pls`	Logical. If used in a plotting function, if analysis from PLSR models should be plotted.
`pls.colorBy`	NULL or character. What class-variable should be used for coloring in the RMSEC and RMSECV plots. Set to NULL for no coloring, or provide a character length one with a single column name of a class variable that should be used for coloring.
`pls.what`	What types of plsr analysis to plot. Possible values are 'both', 'errors', 'regression'.
`pls.rdp`	Logical (TRUE or FALSE). If errors in the error plots should be given in RDP or not.
`aqg.vars`	NULL or character vector. Which class variables should be used for grouping the data for the Aquagram. Provide a character vector with the column names of one or more class variables for grouping data and generate an Aquagram for every one of them.
`aqg.nrCorr`	Character or Logical. If the number of observations in each spectral pattern should be corrected (if necessary by random sampling) so that all the spectral patterns are calculated from the same number of observations. If left at the default "def", the default value from the settings will be used. Provide "TRUE" or "FALSE" to switch number correction manually on or off.
`aqg.spectra`	Logical or Character. If left at "FALSE" (the default) no additional spectra are calculated / prepared for plotting. Other possible values are one or more of: "raw" for the raw spectra; "avg" for the averaged spectra of the data represented in the aquagram; "subtr" for subtractions in the averaged spectra (see `aqg.minus` below); "all" for all of the aforementioned.
`aqg.minus`	Character length one, character vector or NULL. Which of the levels present in each of the class-variables provided in `aqg.vars` should be used for subtractions – the average of this 'minus' gets subtracted from all the other averages. `aqg.minus` is used for the subtractions in the raw spectra as well as for the subtractions within the Aquagram, should you choose any of the -diff modes. If a vector with the same length as the vector in `aqg.vars` is provided, each element of `aqg.minus` is used to perform the corresponding subtraction for each element in `aqg.vars`. If a character length one is provided and the input in `aqg.vars` is longer than one, the single value in `aqg.minus` gets recycled and is used in each element in `aqg.vars` for subtractions.
`aqg.mod`	Character. What mode, what kind of Aquagram should be calculated? Possible values are: 'classic', 'classic-diff', 'sfc', 'sfc-diff', 'aucs', 'aucs-diff', 'aucs.tn', 'aucs.tn-diff', 'aucs.tn.dce', 'aucs.tn.dce-diff', 'aucs.dce', 'aucs.dce-diff', and 'def' for reading in the default from settings.r. Please see `calc_aqg_modes` for an explanation of the different modes.
`aqg.TCalib`	Character, numeric or NULL. The default (leave at 'def') can be set in the settings. If 'NULL' the complete temperature range of the calibration data is used for calibration. Provide a numeric length two [c(x1, x2)] for manually determining the calibration range. Provide a character 'symm@x', with 'x' being the plus and minus delta in temperature from the temperature of the experiment for having a calibration range from Texp-x to Texp+x. The 'Factory' default is 'symm@2'. Applies to all modes except the 'classic' and 'sfc' modes. If, in any of the modes showing percentages, the numbers on the Aquagram are below 0 or above 100, then the calibration range has to be extended. To record your own temperature calibration spectra, please see `genTempCalibExp`.
`aqg.Texp`	Numeric length one. The temperature at which the spectra were taken. The default (leave at 'def') can be set in the settings. Please see also `genTempCalibExp`.
`aqg.bootCI`	Logical. If confidence intervals for the selected wavelengths should be calculated within each group (using bootstrap). Leave at 'def' for getting the default from the settings.
`aqg.R`	Character or numeric. Given aqg.bootCI = TRUE, how many bootstrap replicates should be performed? Leave at 'def' for choosing the default from the settings, where the factory-default is "nrow@3" for 3 x nrow(samples). By manually providing a character in the form of 'nrow@x', where x is any number, you can set the factor by which the number of rows gets multiplied; the result of this multiplication is then used as the number of bootstrap replicates. By providing a length one numeric you can directly set the number of bootstrap replicates.
`aqg.smoothN`	Only used in the 'classic' and 'sfc' modes. Numeric length 1. Must be odd. Smoothing points for the Sav. Golay smoothing that is applied before making the calculations. Change to NULL or anything not-numeric to switch off smoothing.
`aqg.selWls`	Only used in the 'classic' and 'sfc' modes. Numerical vector. If provided and in the mode "classic", "classic-diff", "sfc" or "sfc-diff", these numbers will be used to determine the coordinates of the aquagram. Leave at 'def' to use the defaults from the settings file.
`aqg.msc`	Only used in the 'classic' and 'sfc' modes. Logical. If MSC should be performed.
`aqg.reference`	Only used in the 'classic' and 'sfc' modes. An optional numerical vector (loadings, etc..) used for MSC.
`do.aqg`	Logical. If used in a plotting function, if Aquagrams should be plotted.
`aqg.fsa`	'Fix scale for Aquagram'. Logical, numeric or character. If left at the default logical FALSE, every single aquagram will be plotted in its own, independent scale. If a numeric vector length two is provided, all the aquagrams to be plotted (normal AND bootstrapped ones) will be in the provided range, and no independently scaled aquagrams will be plotted. If character, the following values are possible: `"both"`: both independently scaled AND automatically calculated fix-scaled aquagrams will be plotted; `"only"`: only the automatically calculated fix-scaled aquagrams will be plotted (normal AND bootstrap).
`aqg.fss`	'Fix scale for subtraction spectra'. Logical, numeric or character. If left at the default logical FALSE, every single subtraction-spectra plot will be plotted in its own, independent scale. If a numeric vector length two is provided, all the subtraction-spectra to be plotted (if 'plotSpectra' contains 'subtr', and 'minus' contains a valid value) will be in the provided range, and no independently scaled subtraction-spectra will be plotted. If character, the following values are possible: `"both"`: both independently scaled AND automatically calculated fix-scaled spectra will be plotted; `"only"`: only the automatically calculated fix-scaled subtraction spectra will be plotted.
`aqg.ccol`	Custom Color - NULL, Numeric or Character vector. Custom colors for drawing the lines in the aquagram. Length must exactly match the number of groups to be plotted in the aquagram. If not, the default coloring from the dataset is used. This can be used when plotting aquagrams with different numbers of groups: only the group that matches the number of provided custom colors is colored differently. Especially useful when you have more than 8 lines to be plotted: custom-color similar groups in similar colors.
`aqg.clt`	Character or Integer vector. Custom line type for plotting the lines in the Aquagram. If left at the default 'def', the vector provided in the settings.r file is taken (and recycled). If an integer vector is provided, this is used (and recycled) as line-types in the Aquagram.
`aqg.pplot`	Logical or character 'def'. If, should spectra be plotted, an additional plot with picked peaks should be added. If left at the default value 'def', the default from the settings.r file is used.
`aqg.plines`	Logical, numeric or character 'def'. If set to `FALSE`, no additional lines, if set to `TRUE` all the additional lines will be plotted. If an integer vector [2..5] is provided, one or more of the additional lines get plotted. See `adLinesToVector` for details. If left at the default value 'def', the default from the settings.r file (parameter `aqg_AdLines`) is used.
`aqg.discr`	Logical or character 'def'. If set to TRUE, negative (resp. positive) peaks can be only found in peak-heights below (resp. above) zero.
`do.da`	Logical. If used in `getap`, if classification via discriminant analysis (`lda`, `qda`, `fda`, `MclustDA`) should be performed in the given dataset.
`da.type`	Character vector. The type of discriminant analysis (DA) to perform; possible values (one or more) are `'lda', 'qda', 'fda', 'mclustda'`: `lda` for linear DA using `lda`; `qda` for quadratic DA using `qda`; `fda` for flexible DA using `fda`; `mclustda` for DA based on Gaussian finite mixture modeling using `MclustDA`.
`da.classOn`	Character vector. One or more class variables to define the grouping used for classification.
`da.testCV`	Logical, if the errors of the test-data should be crossvalidated. If set to true, CV and testing is repeated in alternating datasets. See below.
`da.percTest`	Numeric length one. The percentage of the dataset that should be set aside for testing the models; these data are never seen during training and crossvalidation.
`da.cvBootCutoff`	The minimum number of observations (W) that should be in the smallest subgroup (as defined by the classification grouping variable) AFTER the split into `da.valid` crossvalidation segments (below). If W is equal to or higher than `da.cvBootCutoff`, the crossvalidation is done by splitting the training data into `da.valid` (see below) segments; otherwise the crossvalidation is done via bootstrap resampling, with the number of bootstrap iterations given by the number of observations in this smallest subgroup (as defined by the classification grouping variable) in all of the training data multiplied by `da.cvBootFactor`. To never perform the CV of the training data via bootstrap, set the parameter `cl_gen_neverBootstrapForCV` in the settings.r file to `TRUE`. An example: With `da.cvBootCutoff` set to `15` and an 8-fold crossvalidation (`da.valid <- 8`), the required minimum number of observations in the smallest subgroup after the split into 8 segments would be 15, and the required minimum in all the training data to perform the desired 8-fold CV would be (8x15=) 120, in which case 8 times 15 observations will form the test data to be projected into models made from (120-15=) 105 observations. If there were fewer than 120 observations, say only 100 observations in the smallest group as defined by the classification grouping variable, bootstrap resampling with `da.cvBootFactor * 100` iterations would be performed. In this example, if we were also satisfied with a 5-fold crossvalidation, we would have enough data: 100 / 5 = 20, and with the `da.cvBootCutoff` value being 15, the 5-fold crossvalidation would be performed.
`da.cvBootFactor`	The factor by which the number of observations within the smallest subgroup defined by the classification grouping variable is multiplied, resulting in the number of iterations of a possible bootstrap crossvalidation of the training data; see `da.cvBootCutoff`.
`da.valid`	The number of segments the training data should be divided into in case of a "traditional" crossvalidation of the training data; see above.
`da.pcaRed`	Logical, if variable reduction via PCA should be applied; if TRUE, the subsequent classifications are performed on the PCA scores, see `da.pcaNComp` below.
`da.pcaNComp`	Character or integer vector. Provide the character "max" to use the maximum number of components (i.e. the number of observations minus 1), or an integer vector specifying the components (and thus their scores) to be used for DA.
`do.rnf`	Logical. If used in `getap`, if classification via `randomForest` should be performed in the given dataset.
`rnf.classOn`	Character vector. One or more class variables to define the grouping used for classification.
`rnf.testCV`	Logical, if the errors of the test-data should be crossvalidated. If set to true, CV and testing is repeated in alternating datasets. See below.
`rnf.percTest`	Numeric length one. The percentage of the dataset that should be set aside for testing the models; these data are never seen during training and crossvalidation.
`rnf.cvBootCutoff`	The minimum number of observations (W) that should be in the smallest subgroup (as defined by the classification grouping variable) AFTER the split into `rnf.valid` crossvalidation segments (below). If W is equal to or higher than `rnf.cvBootCutoff`, the crossvalidation is done by splitting the training data into `rnf.valid` (see below) segments; otherwise the crossvalidation is done via bootstrap resampling, with the number of bootstrap iterations given by the number of observations in this smallest subgroup (as defined by the classification grouping variable) in all of the training data multiplied by `rnf.cvBootFactor`. To never perform the CV of the training data via bootstrap, set the parameter `cl_gen_neverBootstrapForCV` in the settings.r file to `TRUE`. An example: With `rnf.cvBootCutoff` set to `15` and an 8-fold crossvalidation (`rnf.valid <- 8`), the required minimum number of observations in the smallest subgroup after the split into 8 segments would be 15, and the required minimum in all the training data to perform the desired 8-fold CV would be (8x15=) 120, in which case 8 times 15 observations will form the test data to be projected into models made from (120-15=) 105 observations. If there were fewer than 120 observations, say only 100 observations in the smallest group as defined by the classification grouping variable, bootstrap resampling with `rnf.cvBootFactor * 100` iterations would be performed. In this example, if we were also satisfied with a 5-fold crossvalidation, we would have enough data: 100 / 5 = 20, and with the `rnf.cvBootCutoff` value being 15, the 5-fold crossvalidation would be performed.
`rnf.cvBootFactor`	The factor by which the number of observations within the smallest subgroup defined by the classification grouping variable is multiplied, resulting in the number of iterations of a possible bootstrap crossvalidation of the training data; see `rnf.cvBootCutoff`.
`rnf.valid`	The number of segments the training data should be divided into in case of a "traditional" crossvalidation of the training data; see above.
`rnf.pcaRed`	Logical, if variable reduction via PCA should be applied; if TRUE, the subsequent classifications are performed on the PCA scores, see `rnf.pcaNComp` below.
`rnf.pcaNComp`	Character or integer vector. Provide the character "max" to use the maximum number of components (i.e. the number of observations minus 1), or an integer vector specifying the components (and thus their scores) to be used for random forest classification.
`do.svm`	Logical. If used in `getap`, if classification via `svm` should be performed in the given dataset.
`svm.classOn`	Character vector. One or more class variables to define the grouping used for classification.
`svm.testCV`	Logical, if the errors of the test-data should be crossvalidated. If set to true, CV and testing is repeated in alternating datasets. See below.
`svm.percTest`	Numeric length one. The percentage of the dataset that should be set aside for testing the models; these data are never seen during training and crossvalidation.
`svm.cvBootCutoff`	The minimum number of observations (W) that should be in the smallest subgroup (as defined by the classification grouping variable) AFTER the split into `svm.valid` crossvalidation segments (below). If W is equal to or higher than `svm.cvBootCutoff`, the crossvalidation is done by splitting the training data into `svm.valid` (see below) segments; otherwise the crossvalidation is done via bootstrap resampling, with the number of bootstrap iterations given by the number of observations in this smallest subgroup (as defined by the classification grouping variable) in all of the training data multiplied by `svm.cvBootFactor`. To never perform the CV of the training data via bootstrap, set the parameter `cl_gen_neverBootstrapForCV` in the settings.r file to `TRUE`. An example: With `svm.cvBootCutoff` set to `15` and an 8-fold crossvalidation (`svm.valid <- 8`), the required minimum number of observations in the smallest subgroup after the split into 8 segments would be 15, and the required minimum in all the training data to perform the desired 8-fold CV would be (8x15=) 120, in which case 8 times 15 observations will form the test data to be projected into models made from (120-15=) 105 observations. If there were fewer than 120 observations, say only 100 observations in the smallest group as defined by the classification grouping variable, bootstrap resampling with `svm.cvBootFactor * 100` iterations would be performed. In this example, if we were also satisfied with a 5-fold crossvalidation, we would have enough data: 100 / 5 = 20, and with the `svm.cvBootCutoff` value being 15, the 5-fold crossvalidation would be performed.
`svm.cvBootFactor`	The factor by which the number of observations within the smallest subgroup defined by the classification grouping variable is multiplied, resulting in the number of iterations of a possible bootstrap crossvalidation of the training data; see `svm.cvBootCutoff`.
`svm.valid`	The number of segments the training data should be divided into in case of a "traditional" crossvalidation of the training data; see above.
`svm.pcaRed`	Logical, if variable reduction via PCA should be applied; if TRUE, the subsequent classifications are performed on the PCA scores, see `svm.pcaNComp` below.
`svm.pcaNComp`	Character or integer vector. Provide the character "max" to use the maximum number of components (i.e. the number of observations minus 1), or an integer vector specifying the components (and thus their scores) to be used for SVM classification.
`do.nnet`	Logical. If used in `getap`, if classification via artificial neural networks (`nnet`) should be performed in the given dataset.
`nnet.classOn`	Character vector. One or more class variables to define the grouping used for classification.
`nnet.testCV`	Logical, if the errors of the test-data should be crossvalidated. If set to true, CV and testing is repeated in alternating datasets. See below.
`nnet.percTest`	Numeric length one. The percentage of the dataset that should be set aside for testing the models; these data are never seen during training and crossvalidation.
`nnet.cvBootCutoff`	The minimum number of observations (W) that should be in the smallest subgroup (as defined by the classification grouping variable) AFTER the split into `nnet.valid` crossvalidation segments (below). If W is equal to or higher than `nnet.cvBootCutoff`, the crossvalidation is done by splitting the training data into `nnet.valid` (see below) segments; otherwise the crossvalidation is done via bootstrap resampling, with the number of bootstrap iterations given by the number of observations in this smallest subgroup (as defined by the classification grouping variable) in all of the training data multiplied by `nnet.cvBootFactor`. To never perform the CV of the training data via bootstrap, set the parameter `cl_gen_neverBootstrapForCV` in the settings.r file to `TRUE`. An example: With `nnet.cvBootCutoff` set to `15` and an 8-fold crossvalidation (`nnet.valid <- 8`), the required minimum number of observations in the smallest subgroup after the split into 8 segments would be 15, and the required minimum in all the training data to perform the desired 8-fold CV would be (8x15=) 120, in which case 8 times 15 observations will form the test data to be projected into models made from (120-15=) 105 observations. If there were fewer than 120 observations, say only 100 observations in the smallest group as defined by the classification grouping variable, bootstrap resampling with `nnet.cvBootFactor * 100` iterations would be performed. In this example, if we were also satisfied with a 5-fold crossvalidation, we would have enough data: 100 / 5 = 20, and with the `nnet.cvBootCutoff` value being 15, the 5-fold crossvalidation would be performed.
`nnet.cvBootFactor`	The factor by which the number of observations within the smallest subgroup defined by the classification grouping variable is multiplied, resulting in the number of iterations of a possible bootstrap crossvalidation of the training data; see `nnet.cvBootCutoff`.
`nnet.valid`	The number of segments the training data should be divided into in case of a "traditional" crossvalidation of the training data; see above.
`nnet.pcaRed`	Logical, if variable reduction via PCA should be applied; if TRUE, the subsequent classifications are performed on the PCA scores, see `nnet.pcaNComp` below.
`nnet.pcaNComp`	Character or integer vector. Provide the character "max" to use the maximum number of components (i.e. the number of observations minus 1), or an integer vector specifying the components (and thus their scores) to be used for nnet classification.
`reserved`	– No plotting parameter yet defined –
`pg.where`	Character length one. If left at the default 'def', the value from the settings.r file is read in (parameter `gen_plot_pgWhereDefault`). For plotting to PDFs provide "pdf", for plotting to graphics device provide anything but "pdf".
`pg.main`	Character length one. The additional text on the title of each single plot.
`pg.sub`	Character length one. The additional text on the subtitle of each single plot.
`pg.fns`	Character length one. The additional text in the filename of the pdf.
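
To make the "wlFrom-to-wlTo" format of `spl.wl` concrete, the following base-R sketch turns such strings into numeric bounds. The helper `parse_wl_ranges` is hypothetical and only illustrates the string format; it is not the parser used by aquap2.

```r
# Split each "wlFrom-to-wlTo" string into its lower and upper wavelength.
parse_wl_ranges <- function(spl.wl) {
  parts <- strsplit(spl.wl, "-to-", fixed = TRUE)
  bounds <- vapply(parts, function(p) {
    if (length(p) != 2) stop("expected the form 'wlFrom-to-wlTo'")
    as.numeric(p)
  }, numeric(2))
  data.frame(from = bounds[1, ], to = bounds[2, ])
}

wl <- parse_wl_ranges(c("1000-to-2000", "1300-to-1600"))
print(wl)
```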
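
For `dpt.pre` and `dpt.post`, the '@' separator attaches parameters to a module name. The sketch below only separates the two parts and checks the module name against the values listed above. The helper `parse_dpt` is hypothetical, and the parameter string "2-21-0" is an assumed example; see `dpt_modules` for the parameter formats the package really accepts.

```r
# Module names as listed in the description of dpt.pre / dpt.post.
known_modules <- c("sgol", "snv", "msc", "emsc", "osc", "deTr", "gsd")

# Separate "module@params" strings into module name and parameter string.
parse_dpt <- function(dpt) {
  pieces <- strsplit(dpt, "@", fixed = TRUE)
  module <- vapply(pieces, function(p) p[1], character(1))
  params <- vapply(pieces, function(p) {
    if (length(p) > 1) p[2] else NA_character_
  }, character(1))
  unknown <- setdiff(module, known_modules)
  if (length(unknown) > 0) {
    stop("unknown module: ", paste(unknown, collapse = ", "))
  }
  data.frame(module = module, params = params, stringsAsFactors = FALSE)
}

# "2-21-0" is a made-up parameter string used for illustration only.
steps <- parse_dpt(c("sgol@2-21-0", "snv"))
print(steps)
```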
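
The naming rule for the outlier column described under `spl.exOut.var` can be written out as a short sketch. The defaults "C_" and "outlier" are assumptions inferred from the example name `C_outlier_Group.Time`; the real values come from `p_ClassVarPref` and `p_outlierCol` in settings.r, and `outlier_col_name` is a hypothetical helper.

```r
# Build the outlier column name: class prefix + outlier designator + "_" +
# the grouping variables (without their class prefix) joined by ".".
outlier_col_name <- function(vars, class_prefix = "C_", outlier_col = "outlier") {
  bare <- sub(paste0("^", class_prefix), "", vars)
  paste0(class_prefix, outlier_col, "_", paste(bare, collapse = "."))
}

print(outlier_col_name(c("C_Group", "C_Time")))
# prints "C_outlier_Group.Time"
```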
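
The element-wise pairing of `pls.valid` with `pls.regOn` can be laid out as a small table. One point worth noting is that R coerces a mixed vector such as `c("C_FooBar", 10)` to character, so the number arrives as the string "10". The helper `describe_valid` is hypothetical and merely classifies each element along the lines of the description of `pls.valid`; it is not package code.

```r
pls.regOn <- c("Y_FooBar", "Y_FooBar", "Y_BarFoo", "Y_BarFoo")
pls.valid <- c("C_FooBar", 10, "C_BarFoo", 10)  # coerced to character by c()

# Classify a single pls.valid element.
describe_valid <- function(v) {
  if (v == "def") return("default from settings")
  if (v == "LOO") return("leave-one-out")
  n <- suppressWarnings(as.numeric(v))
  if (!is.na(n)) return(paste0(n, "-fold"))
  paste0("grouped by ", v)
}

plan <- data.frame(
  regOn = pls.regOn,
  valid = pls.valid,
  cv    = vapply(pls.valid, describe_valid, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(plan)
```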
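
The three explicit input forms of `aqg.TCalib` (NULL, a numeric length two, or 'symm@x') each lead to a calibration range. The sketch below shows that mapping under stated assumptions: `tcalib_range` is a hypothetical helper, the experiment temperature and the calibration temperatures are made-up values, and the 'def' case (read from the settings) is deliberately not handled.

```r
# Resolve a calibration range from the documented input forms.
tcalib_range <- function(TCalib, Texp, calib_temps) {
  if (is.null(TCalib)) {
    return(range(calib_temps))           # complete range of the calibration data
  }
  if (is.numeric(TCalib) && length(TCalib) == 2) {
    return(TCalib)                       # manual range c(x1, x2)
  }
  if (is.character(TCalib) && length(TCalib) == 1 && grepl("^symm@", TCalib)) {
    delta <- as.numeric(sub("^symm@", "", TCalib))
    return(c(Texp - delta, Texp + delta))  # Texp-x to Texp+x
  }
  stop("unsupported value for TCalib in this sketch")
}

# Made-up temperatures for illustration.
print(tcalib_range("symm@2", Texp = 25, calib_temps = 20:30))
print(tcalib_range(NULL, Texp = 25, calib_temps = 20:30))
```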
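
The 'nrow@x' form of `aqg.R` multiplies the number of rows by x to obtain the number of bootstrap replicates. A minimal sketch of that arithmetic follows; `n_boot` is a hypothetical helper, the row count of 40 is made up, and the 'def' case is not handled.

```r
# Number of bootstrap replicates from either a plain number or "nrow@x".
n_boot <- function(R, n_rows) {
  if (is.numeric(R) && length(R) == 1) {
    return(R)                            # replicates given directly
  }
  if (is.character(R) && length(R) == 1 && grepl("^nrow@", R)) {
    factor <- as.numeric(sub("^nrow@", "", R))
    return(factor * n_rows)              # x times the number of rows
  }
  stop("unsupported value for R in this sketch")
}

print(n_boot("nrow@3", n_rows = 40))
print(n_boot(500, n_rows = 40))
```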
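
The decision between segment-based crossvalidation and bootstrap resampling described under the `*.cvBootCutoff` arguments can be traced with the numbers from the worked example there. This is a sketch of the documented rule only: `choose_cv` is a hypothetical helper, the bootstrap factor of 2 is an arbitrary value chosen for illustration, and whether the package rounds the per-segment count is not stated in the text, so plain division is assumed.

```r
# Decide how the training data would be crossvalidated.
#   n_smallest:   observations in the smallest subgroup of the training data
#   valid:        desired number of crossvalidation segments
#   cvBootCutoff: minimum observations per segment in that subgroup
#   cvBootFactor: multiplier for the number of bootstrap iterations
choose_cv <- function(n_smallest, valid, cvBootCutoff, cvBootFactor) {
  per_segment <- n_smallest / valid      # assumption: no rounding applied
  if (per_segment >= cvBootCutoff) {
    list(method = "segments", segments = valid, per_segment = per_segment)
  } else {
    list(method = "bootstrap", iterations = cvBootFactor * n_smallest)
  }
}

str(choose_cv(120, valid = 8, cvBootCutoff = 15, cvBootFactor = 2))  # 8 x 15
str(choose_cv(100, valid = 8, cvBootCutoff = 15, cvBootFactor = 2))  # too few
str(choose_cv(100, valid = 5, cvBootCutoff = 15, cvBootFactor = 2))  # 100 / 5 = 20
```

With 120 observations the 8-fold split is possible, with 100 observations it falls back to bootstrap, and relaxing to 5 segments makes the segment-based crossvalidation possible again.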

Details

The default name for the analysis procedure file can be set in settings.r. Any other .r file can be loaded by providing a valid .r filename to the appropriate argument, e.g. in the function getap. By providing any of the arguments of the analysis procedure file to the function getap (also when getap is used inside the function gdmm) or to any of the plot functions, you can override the values in the file with the provided values. See examples at gdmm and plot.

Important

As the AUC modes of the Aquagram compare the actual data to your previously recorded temperature calibration data (see genTempCalibExp and tempCalib_procedures), the application of some data-treatment functions (see e.g. do_gapDer) can lead to unexpected and distorted results in the Aquagram.
