anproc_file | R Documentation |
The analysis procedure file is used first to split the dataset
according to the values provided in the 'split dataset' section, and
then, in the 'statistics' section (starting with do.pca), to tell
the system which statistics to apply and which models to calculate on the
resulting datasets. It also contains specific and general plotting options
that are used by the plot function.
Arguments used to control the split-process, the behaviour of statistics /
calculations / specific plotting options and the general plotting options
start with a certain prefix:
"spl" for all arguments related to the split-process.
(For a separate listing please see split_dataset)
"pca" for all arguments related to PCA models (except do.pca).
(For a separate listing see calc_pca_args and plot_pca_args)
"sim" for all arguments related to SIMCA models (except do.sim).
(For a separate listing see calc_sim_args and plot_sim_args)
"pls" for all arguments related to PLSR models (except do.pls).
(For a separate listing see calc_pls_args and plot_pls_args)
"aqg" for all arguments related to Aquagrams (except do.aqg).
(For a separate listing see calc_aqg_args and plot_aqg_args)
"da" for all arguments related to Discriminant Analysis classification
(except do.da). (For a separate listing see
calc_discrimAnalysis_args and plot_discrimAnalysis_args)
"rnf" for all arguments related to RandomForest classification
(except do.rnf). (For a separate listing see
calc_randomForest_args and plot_randomForest_args)
"svm" for all arguments related to Support Vector Machines
classification (except do.svm). (For a separate listing see
calc_SVM_args and plot_SVM_args)
"nnet" for all arguments related to Neural Networks classification
(except do.nnet). (For a separate listing see calc_NNET_args
and plot_NNET_args)
"pg" for the general plotting options that are used in each of the
plotting functions. (For a separate listing see plot_pg_args)
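The prefix convention can be used to sort argument names into their groups. The following is a minimal base-R sketch, assuming a hand-picked selection of argument names from this page; the vector `ap_args` and the grouping step are illustrative only and are not part of the package.

```r
# Hypothetical selection of argument names, all taken from this page.
ap_args <- c("spl.var", "spl.wl", "pca.colorBy", "pca.elci",
             "sim.K", "pls.ncomp", "aqg.mod", "da.type",
             "rnf.classOn", "svm.valid", "nnet.pcaRed", "pg.main")

# The prefix is everything before the first dot.
prefix <- sub("\\..*$", "", ap_args)

# Group the argument names by their prefix.
by_prefix <- split(ap_args, prefix)

# All PCA-related arguments in the selection:
print(by_prefix$pca)
```

Note that the switches do.pca, do.sim, etc. would end up under a separate "do" prefix with this rule, which matches the "except do.xxx" remarks in the list above.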
By providing any of the arguments of the analysis procedure file to the
function getap, also when using it inside the function gdmm, you can
override the values in the file with the provided values. See examples
at gdmm.
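The override behaviour described above can be pictured as replacing single entries in a named list. The following is a sketch in base R under stated assumptions: `ap_from_file`, `override_ap` and the column name "C_Group" are hypothetical and only illustrate the idea; this is not how getap is implemented.

```r
# Values as they might be read from an analysis procedure file
# (hypothetical example values, not the package defaults).
ap_from_file <- list(
  spl.var     = NULL,
  spl.wl      = NULL,
  do.pca      = TRUE,
  pca.colorBy = NULL,
  pca.sc      = c(1, 2)
)

# Hypothetical helper: replace values in 'ap' with those given in '...'.
override_ap <- function(ap, ...) {
  overrides <- list(...)
  unknown <- setdiff(names(overrides), names(ap))
  if (length(unknown) > 0) {
    stop("Unknown argument(s): ", paste(unknown, collapse = ", "))
  }
  for (nm in names(overrides)) {
    # Assigning list(value) keeps the entry even if the value is NULL.
    ap[nm] <- list(overrides[[nm]])
  }
  ap
}

# Override two values; everything else stays as in the file.
ap <- override_ap(ap_from_file, spl.var = "C_Group", pca.sc = c(2, 3))
str(ap)
```

In the package itself the same effect is obtained by passing the arguments directly to getap; the examples at gdmm show the actual usage.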
spl.var |
NULL or character vector. If NULL, no splitting of the dataset will be performed. Provide a character vector with the column names of class variables to split the dataset along these variables. |
spl.wl |
NULL or character vector. If NULL, all wavelengths available in the dataset will be used. Provide a character vector in the format "wlFrom-to-wlTo" (e.g. c("1000-to-2000", "1300-to-1600", ...)) to use all previously defined splits in these wavelengths. |
dpt.pre |
Character vector, which of the available modules of data
pre-treatments to apply AFTER a (possible) split by variable
|
spl.do.csAvg |
Logical. If all the consecutive scans of a single sample should be reduced, i.e. averaged into a single spectrum. |
spl.csAvg.raw |
Logical. If the consecutive scans of a single sample are reduced, whether another dataset containing every single consecutive scan should be kept as well. |
spl.do.noise |
Logical. If artificial noise should be added to the dataset. |
spl.noise.raw |
If the noise test is performed, whether the raw data should be used in addition to the noise data. |
spl.do.exOut |
Logical. If exclusion of outliers should be performed. |
spl.exOut.raw |
Logical. If, should exclusion of outliers be performed, the raw original data should be used as well. If set to TRUE, outliers will be flagged in the dataset in any case. |
spl.exOut.var |
Character vector. The variables that should be used
for the grouping defining the scope for outlier detection. The name of the
resulting column consists of the class variable prefix (as defined in the
settings.r file in |
dpt.post |
Character vector, which of the available modules of data
pre-treatments to apply AFTER (possibly) splitting the dataset. Leave
at NULL for no additional data treatment. Possible values are
'sgol', 'snv', 'msc', 'emsc', 'osc', 'deTr', 'gsd'. Add additional parameters to some of the
single strings via the separator '@'. For examples and further information
see |
do.pca |
Logical. If used in a plotting function, if PCA score / loading plots should be plotted. |
pca.colorBy |
NULL or character vector. Which class-variables should be used for coloring the PCA score plot. Set to NULL for using all available class variables for coloring. |
pca.elci |
'def' or numeric length one. The confidence interval for the ellipse to be drawn around groups in score plots. Leave at 'def' to read in the default from the settings.r file; provide a numeric length one (e.g. 0.95); or set to NULL for not drawing ellipses at all. |
pca.elcolorBy |
Character vector or NULL. The variables to use for
plotting additional confidence interval ellipses. Set to NULL for *not*
drawing additional CI-ellipses. Provide one variable (gets recycled) or a
vector with equal length as |
pca.what |
Character length one. What element of the PCA analysis to plot. Possible values are 'both', 'scores', 'loadings'. |
pca.sc |
Numeric length 2. Two PCs to be plotted against each other in the score plots. |
pca.sc.pairs |
Numeric vector of length >=2, indicating what PCs to plot in the score pairs plot. Set to NULL for *not* plotting the pairs plot. |
pca.lo |
Numeric vector of length >=2, indicating what PCs to plot in the loading plot. |
sim.vars |
NULL or character vector. Which variables should be used to group the data. Set to NULL for using all available class-variables, or provide a character vector with the column names of class variables to group the data along those for calculating SIMCA models. |
sim.K |
Numeric length one. The number of components used for calculating the SIMCA models. In mode 'robust' leave at '0' for automatic detection of optimal number of components. [It is a capital 'K' in the argument.] |
do.sim |
Logical. If used in a plotting function, if analysis of SIMCA models should be plotted. |
pls.regOn |
NULL or character vector. Which variables should be used to regress on. Set to NULL for using all numerical variables to regress on, or provide a character vector with the column names of numerical variables to use those for regression in the PLSR. |
pls.ncomp |
NULL or integer length one. The number of components used in PLSR. Set to NULL for automatic detection, or provide an integer to use this number of components in the PLSR. |
pls.valid |
Character. Which crossvalidation to use. Possible values are:
If a vector with the same length as the vector in |
pls.exOut |
Logical. If a plsr-specific box-plot based outlier-detection algorithm should be used on the data of a first plsr model to determine the outliers that then will be excluded in the final plsr model. Possible values are:
If a vector with the same length as the vector in |
do.pls |
Logical. If used in a plotting function, if analysis from PLSR models should be plotted. |
pls.colorBy |
NULL or character. What class-variable should be used for coloring in the RMSEC and RMSECV plots. Set to NULL for no coloring, or provide a character length one with a single column name of a class variable that should be used for coloring. |
pls.what |
What types of plsr analysis to plot. Possible values are 'both', 'errors', 'regression'. |
pls.rdp |
Logical (TRUE or FALSE). If errors in the error plots should be given in RDP or not. |
aqg.vars |
NULL or character vector. Which class variables should be used for grouping the data for the Aquagram. Provide a character vector with the column names of one or more class variables for grouping data and generate an Aquagram for every one of them. |
aqg.nrCorr |
Character or Logical. If the number of observations in each spectral pattern should be corrected (if necessary by random sampling) so that all spectral patterns are calculated from the same number of observations. If left at the default "def", the default value from the settings will be used. Provide "TRUE" or "FALSE" to switch number correction manually on or off. |
aqg.spectra |
Logical or Character. If left at "FALSE" (the default) no additional spectra are calculated / prepared for plotting. Other possible values are one or more of:
|
aqg.minus |
Character length one, character vector or NULL. Which of the
levels present in each of the class-variables provided in |
aqg.mod |
Character. What mode, what kind of Aquagram should be calculated?
Possible values are: 'classic', 'classic-diff', 'sfc', 'sfc-diff', 'aucs', 'aucs-diff', 'aucs.tn', 'aucs.tn-diff', 'aucs.tn.dce', 'aucs.tn.dce-diff', 'aucs.dce', 'aucs.dce-diff', and 'def' for reading in
the default from settings.r. Please see |
aqg.TCalib |
Character, numeric or NULL. The default (leave at 'def') can be
set in the settings. If 'NULL' the complete temperature range of the
calibration data is used for calibration. Provide a numeric length two
[c(x1, x2)] for manually determining the calibration range. Provide a
character 'symm@x', with 'x' being the plus and minus delta in temperature
from the temperature of the experiment for having a calibration range from
Texp-x to Texp+x. The 'Factory' default is 'symm@2'.
Applies to all modes except the 'classic' and 'sfc' modes.
If, in any of the modes showing percentages, the numbers on the
Aquagram are below 0 or above 100, then the calibration range has to be
extended. To record your own temperature calibration spectra, please see
|
aqg.Texp |
Numeric length one. The temperature at which the
spectra were taken. The default (leave at 'def') can be set in the settings.
Please see also |
aqg.bootCI |
Logical. If confidence intervals for the selected wavelengths should be calculated within each group (using bootstrap). Leave at 'def' for getting the default from the settings. |
aqg.R |
Character or numeric. Given aqg.bootCI = TRUE, how many bootstrap replicates should be performed? Leave at 'def' for choosing the default from the settings, where the factory-default is "nrow@3" for 3 x nrow(samples). By manually providing a character in the form of 'nrow@x' where x is any number, you can set the factor by which the number of rows is multiplied; the result of this multiplication is then used as the number of bootstrap replicates. By providing a length one numeric you can directly set the number of bootstrap replicates. |
aqg.smoothN |
Only used in the 'classic' and 'sfc' modes. Numeric length 1. Must be odd. Smoothing points for the Savitzky-Golay smoothing that is applied before making the calculations. Change to NULL or anything non-numeric to switch off smoothing. |
aqg.selWls |
Only used in the 'classic' and 'sfc' modes. Numerical vector. If provided, and in the modes "classic", "classic-diff", "sfc" and "sfc-diff", these numbers will be used to determine the coordinates of the aquagram. Leave at 'def' to use the defaults from the settings file. |
aqg.msc |
Only used in the 'classic' and 'sfc' modes. Logical. If MSC should be performed. |
aqg.reference |
Only used in the 'classic' and 'sfc' modes. An optional numerical vector (loadings, etc..) used for MSC. |
do.aqg |
Logical. If used in a plotting function, if Aquagrams should be plotted. |
aqg.fsa |
'Fix scale for Aquagram'. Logical, numeric or Character. If left at the default logical FALSE, every single aquagram will be plotted in its own, independent scale. If a numeric vector length two is provided, all the aquagrams to be plotted (normal AND bootstrapped ones) will be in the provided range, no independently scaled aquagrams will be plotted. If character, the following values are possible:
|
aqg.fss |
'Fix scale for subtraction spectra'. Logical, numeric or character. If left at the default logical FALSE, every single subtraction-spectra plot will be plotted in its own, independent scale. If a numeric vector length two is provided, all the subtraction-spectra to be plotted (if 'plotSpectra' contains 'subtr', and 'minus' contains a valid value) will be in the provided range, no independently scaled subtraction-spectra will be plotted. If character, the following values are possible:
|
aqg.ccol |
Custom Color - NULL, Numeric or Character vector. Custom colors for drawing the lines in the aquagram. Length must exactly match the number of groups to be plotted in the aquagram. If not, the default coloring from the dataset is used. This can be used when plotting aquagrams with different numbers of groups: only the aquagram whose number of groups matches the number of provided custom colors is colored differently. Especially useful when you have more than 8 lines to be plotted: custom-color similar groups in similar colors. |
aqg.clt |
Character or Integer vector. Custom line type for plotting the lines in the Aquagram. If left at the default 'def', the vector provided in the settings.r file is taken (and recycled). If an integer vector is provided, this is used (and recycled) as line-types in the Aquagram. |
aqg.pplot |
Logical or character 'def'. If, should spectra be plotted, an additional plot with picked peaks should be added. If left at the default value 'def', the default from the settings.r file is used. |
aqg.plines |
Logical, numeric or character 'def'. If set to |
aqg.discr |
Logical or character 'def'. If set to TRUE, negative (resp. positive) peaks can be only found in peak-heights below (resp. above) zero. |
do.da |
Logical. If used in |
da.type |
Character vector. The type of discriminant analysis (DA) to
perform; possible values (one or more) are:
|
da.classOn |
Character vector. One or more class variables to define the grouping used for classification. |
da.testCV |
Logical, if the errors of the test-data should be crossvalidated. If set to true, CV and testing is repeated in alternating datasets. See below. |
da.percTest |
Numeric length one. The percentage of the dataset that should be set aside for testing the models; these data are never seen during training and crossvalidation. |
da.cvBootCutoff |
The minimum number of observations (W) that should be
in the smallest subgroup (as defined by the classification grouping variable)
*AFTER* the split into |
da.cvBootFactor |
The factor used to multiply the number of observations
within the smallest subgroup defined by the classification grouping variable
with, resulting in the number of iterations of a possible bootstrap
crossvalidation of the training data – see |
da.valid |
The number of segments the training data should be divided into in case of a "traditional" crossvalidation of the training data; see above. |
da.pcaRed |
Logical, if variable reduction via PCA should be applied; if
TRUE, the subsequent classifications are performed on the PCA scores, see
|
da.pcaNComp |
Character or integer vector. Provide the character "max" to use the maximum number of components (i.e. the number of observations minus 1), or an integer vector specifying the components resp. their scores to be used for DA. |
do.rnf |
Logical. If used in |
rnf.classOn |
Character vector. One or more class variables to define the grouping used for classification. |
rnf.testCV |
Logical, if the errors of the test-data should be crossvalidated. If set to true, CV and testing is repeated in alternating datasets. See below. |
rnf.percTest |
Numeric length one. The percentage of the dataset that should be set aside for testing the models; these data are never seen during training and crossvalidation. |
rnf.cvBootCutoff |
The minimum number of observations (W) that should be
in the smallest subgroup (as defined by the classification grouping variable)
*AFTER* the split into |
rnf.cvBootFactor |
The factor used to multiply the number of observations
within the smallest subgroup defined by the classification grouping variable
with, resulting in the number of iterations of a possible bootstrap
crossvalidation of the training data – see |
rnf.valid |
The number of segments the training data should be divided into in case of a "traditional" crossvalidation of the training data; see above. |
rnf.pcaRed |
Logical, if variable reduction via PCA should be applied; if
TRUE, the subsequent classifications are performed on the PCA scores, see
|
rnf.pcaNComp |
Character or integer vector. Provide the character "max" to use the maximum number of components (i.e. the number of observations minus 1), or an integer vector specifying the components resp. their scores to be used for random forest classification. |
do.svm |
Logical. If used in |
svm.classOn |
Character vector. One or more class variables to define the grouping used for classification. |
svm.testCV |
Logical, if the errors of the test-data should be crossvalidated. If set to true, CV and testing is repeated in alternating datasets. See below. |
svm.percTest |
Numeric length one. The percentage of the dataset that should be set aside for testing the models; these data are never seen during training and crossvalidation. |
svm.cvBootCutoff |
The minimum number of observations (W) that should be
in the smallest subgroup (as defined by the classification grouping variable)
*AFTER* the split into |
svm.cvBootFactor |
The factor used to multiply the number of observations
within the smallest subgroup defined by the classification grouping variable
with, resulting in the number of iterations of a possible bootstrap
crossvalidation of the training data – see |
svm.valid |
The number of segments the training data should be divided into in case of a "traditional" crossvalidation of the training data; see above. |
svm.pcaRed |
Logical, if variable reduction via PCA should be applied; if
TRUE, the subsequent classifications are performed on the PCA scores, see
|
svm.pcaNComp |
Character or integer vector. Provide the character "max" to use the maximum number of components (i.e. the number of observations minus 1), or an integer vector specifying the components resp. their scores to be used for SVM classification. |
do.nnet |
Logical. If used in |
nnet.classOn |
Character vector. One or more class variables to define the grouping used for classification. |
nnet.testCV |
Logical, if the errors of the test-data should be crossvalidated. If set to true, CV and testing is repeated in alternating datasets. See below. |
nnet.percTest |
Numeric length one. The percentage of the dataset that should be set aside for testing the models; these data are never seen during training and crossvalidation. |
nnet.cvBootCutoff |
The minimum number of observations (W) that should be
in the smallest subgroup (as defined by the classification grouping variable)
*AFTER* the split into |
nnet.cvBootFactor |
The factor used to multiply the number of observations
within the smallest subgroup defined by the classification grouping variable
with, resulting in the number of iterations of a possible bootstrap
crossvalidation of the training data – see |
nnet.valid |
The number of segments the training data should be divided into in case of a "traditional" crossvalidation of the training data; see above. |
nnet.pcaRed |
Logical, if variable reduction via PCA should be applied; if
TRUE, the subsequent classifications are performed on the PCA scores, see
|
nnet.pcaNComp |
Character or integer vector. Provide the character "max" to use the maximum number of components (i.e. the number of observations minus 1), or an integer vector specifying the components resp. their scores to be used for nnet classification. |
reserved |
– No plotting parameter yet defined – |
pg.where |
Character length one. If left at the default 'def', the value
from the settings.r file is read in (parameter |
pg.main |
Character length one. The additional text on the title of each single plot. |
pg.sub |
Character length one. The additional text on the subtitle of each single plot. |
pg.fns |
Character length one. The additional text in the filename of the pdf. |
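Several of the arguments above take small string formats: "wlFrom-to-wlTo" for spl.wl, 'symm@x' for aqg.TCalib and 'nrow@x' for aqg.R. The following base-R sketch shows one way such strings could be interpreted, following the descriptions above. The helper functions `parse_wl`, `calib_range` and `boot_replicates` are hypothetical and written only for illustration; the package's own parsing may differ, and the 'def' and NULL cases are deliberately left out.

```r
# Hypothetical helper: turn "wlFrom-to-wlTo" strings into a from/to matrix.
parse_wl <- function(x) {
  parts <- strsplit(x, "-to-", fixed = TRUE)
  ok <- vapply(parts, length, integer(1)) == 2L
  if (!all(ok)) stop("Expected the format 'wlFrom-to-wlTo'")
  out <- t(vapply(parts, as.numeric, numeric(2)))
  colnames(out) <- c("from", "to")
  out
}

# Hypothetical helper: resolve an aqg.TCalib-like value to a range.
# Handles a numeric length two or 'symm@x' (Texp - x to Texp + x).
calib_range <- function(TCalib, Texp) {
  if (is.numeric(TCalib) && length(TCalib) == 2) {
    return(TCalib)
  }
  if (is.character(TCalib) && length(TCalib) == 1 &&
      grepl("^symm@", TCalib)) {
    delta <- as.numeric(sub("^symm@", "", TCalib))
    return(c(Texp - delta, Texp + delta))
  }
  stop("Unsupported value for TCalib in this sketch")
}

# Hypothetical helper: resolve an aqg.R-like value to a replicate count.
# Handles a numeric length one or 'nrow@x' (x times the number of rows).
boot_replicates <- function(R, n_obs) {
  if (is.numeric(R) && length(R) == 1) {
    return(R)
  }
  if (is.character(R) && length(R) == 1 && grepl("^nrow@", R)) {
    return(as.numeric(sub("^nrow@", "", R)) * n_obs)
  }
  stop("Unsupported value for R in this sketch")
}

wl    <- parse_wl(c("1000-to-2000", "1300-to-1600"))
calib <- calib_range("symm@2", Texp = 28)    # Texp = 28 is an example value
nboot <- boot_replicates("nrow@3", n_obs = 50)
print(wl); print(calib); print(nboot)
```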
The default name for the analysis procedure file can be set in
settings.r. Any other .r file can be loaded by providing a valid .r filename
to the appropriate argument, e.g. in the function getap.
By providing any of the arguments of the analysis procedure file to the
function getap, also when using it inside the function gdmm, or to any of
the plot functions, you can override the values in the file with the
provided values. See examples at gdmm and plot.
As the AUC modes of the Aquagram compare the actual data to
your previously recorded temperature calibration data (see
genTempCalibExp and tempCalib_procedures), the
application of some data-treatment functions (see e.g. do_gapDer)
can lead to unexpected and distorted results in the Aquagram.
getap, gdmm
Other fileDocs: metadata_file, settings_file