split_dataset: Split Dataset

split_dataset    R Documentation

Split Dataset

Description

The following parameters can be passed via the ... argument of function getap, and therefore also within function gdmm, to override the values in the analysis procedure file and thereby modify the split-dataset process. See the examples.

getap(...)

gdmm(dataset, ap=getap(...))

Arguments

spl.var

NULL or character vector. If NULL, no splitting of the dataset will be performed. Provide a character vector with the column names of class variables to split the dataset along these variables.
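
As a rough illustration of what splitting along class variables means, the base-R sketch below partitions a small made-up data frame by one and then by two class columns. The data frame and its values are invented for illustration, and this is not a description of how gdmm works internally.

```r
# Toy data; column names follow the C_ prefix convention used in the examples.
toy <- data.frame(
  C_Group = c("Cont", "Cont", "Exp", "Exp"),
  C_Temp  = c("T1", "T2", "T1", "T2"),
  absorb  = c(0.11, 0.12, 0.21, 0.22)
)

# One split variable: one subset per level of C_Group.
by_group <- split(toy, toy$C_Group)

# Two split variables: one subset per combination of levels.
by_group_temp <- split(toy, list(toy$C_Group, toy$C_Temp), drop = TRUE)

names(by_group)
names(by_group_temp)
```

With two variables, the number of resulting subsets is the number of level combinations that actually occur in the data.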

spl.wl

NULL or character vector. If NULL, all wavelengths available in the dataset will be used. Provide a character vector in the format "wlFrom-to-wlTo" (e.g. c("1000-to-2000", "1300-to-1600", ...)) to use all previously defined splits in these wavelength ranges.
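
To make the "wlFrom-to-wlTo" format concrete, the following base-R sketch parses such strings into numeric bounds and uses one range to select wavelengths. The helper parse_wl and the wavelength axis are hypothetical and exist only for this illustration; the package may handle the strings differently.

```r
# Parse "wlFrom-to-wlTo" strings into numeric lower/upper bounds.
# parse_wl is a hypothetical helper, not part of the package.
parse_wl <- function(x) {
  parts <- strsplit(x, "-to-", fixed = TRUE)
  t(vapply(parts, function(p) as.numeric(p), numeric(2)))
}

ranges <- parse_wl(c("1000-to-2000", "1300-to-1600"))
colnames(ranges) <- c("from", "to")

# Made-up wavelength axis, used only to show the selection.
wl <- seq(900, 2100, by = 100)
keep <- wl >= ranges[2, "from"] & wl <= ranges[2, "to"]
wl[keep]
```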

dpt.pre

Character vector specifying which of the available data pre-treatment modules to apply AFTER a (possible) split by variable (spl.var) and wavelength (spl.wl), and BEFORE a (possible) splitting of the dataset according to the split-variables below (csAvg, noise, exOut). Leave at NULL for no data pre-treatment. Possible values are 'sgol', 'snv', 'msc', 'emsc', 'osc', 'deTr', 'gsd'. Additional parameters can be appended to some of the strings via the separator '@'. For further information and examples see dpt_modules.
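
The sketch below shows, in base R, how a module string could be separated from its parameters at the '@' separator. The parameter text "2-21-0" is invented for illustration; the actual parameter syntax of each module is described in dpt_modules, and this is not the package's own parsing code.

```r
# Hypothetical module strings; the part after '@' is invented for illustration.
dpt <- c("snv", "sgol@2-21-0")

parts   <- strsplit(dpt, "@", fixed = TRUE)
modules <- vapply(parts, function(p) p[1], character(1))
params  <- vapply(
  parts,
  function(p) if (length(p) > 1) p[2] else NA_character_,
  character(1)
)

# Module name and (possibly missing) parameter string side by side.
data.frame(module = modules, params = params)
```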

spl.do.csAvg

Logical. Whether all the consecutive scans of a single sample should be reduced, i.e. averaged into a single spectrum.

spl.csAvg.raw

Logical. If the consecutive scans of a single sample are reduced, whether another dataset containing every single consecutive scan should be kept as well.
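
The idea behind averaging consecutive scans, and optionally keeping the unaveraged scans, can be sketched in base R as follows. The column names sampleId and conScan and all values are assumptions made for this toy example; the package's own data structures differ.

```r
# Toy spectra: 2 samples, 3 consecutive scans each, 2 wavelengths.
scans <- data.frame(
  sampleId = rep(c("S1", "S2"), each = 3),  # hypothetical column name
  conScan  = rep(1:3, times = 2),           # hypothetical column name
  X1300    = c(1, 2, 3, 10, 20, 30),
  X1400    = c(2, 4, 6, 20, 40, 60)
)

# Average all consecutive scans of each sample into a single spectrum.
averaged <- aggregate(cbind(X1300, X1400) ~ sampleId, data = scans, FUN = mean)

# Keep the unaveraged scans as a second dataset,
# in the spirit of spl.csAvg.raw = TRUE.
result <- list(averaged = averaged, raw = scans)
```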

spl.do.noise

Logical. Whether artificial noise should be added to the dataset.

spl.noise.raw

Logical. If the noise test is performed, whether the raw data should be used as well, in addition to the noise data.
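
As a minimal sketch of a noise test, the base-R code below adds artificial noise to a toy spectra matrix and keeps the raw data alongside it. The zero-mean Gaussian noise model and its standard deviation are assumptions chosen for illustration; the noise actually generated by the package may be defined differently.

```r
set.seed(1)  # reproducible toy example

# Toy spectra matrix: 3 spectra x 4 wavelengths.
spectra <- matrix((1:12) / 10, nrow = 3)

# Assumed noise model: zero-mean Gaussian with an arbitrarily chosen sd.
noise <- matrix(rnorm(length(spectra), mean = 0, sd = 0.01),
                nrow = nrow(spectra))
noisy <- spectra + noise

# Keep raw and noisy data side by side,
# in the spirit of spl.noise.raw = TRUE.
datasets <- list(raw = spectra, noise = noisy)
```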

spl.do.exOut

Logical. Whether exclusion of outliers should be performed.

spl.exOut.raw

Logical. If exclusion of outliers is performed, whether the raw original data should be used as well. If set to TRUE, outliers will be flagged in the dataset in any case.

spl.exOut.var

Character vector. The variables that should be used for the grouping defining the scope for outlier detection. The name of the resulting column consists of the class variable prefix (as defined in the settings.r file in p_ClassVarPref), the general designator for an outlier-column (as defined in the settings.r file in p_outlierCol) followed by an underscore '_', and each of the provided variables (without the class variable prefix) separated by a '.' dot. For example, if the provided variables are C_Group and C_Time, the column containing the outlier-flags might be called C_outlier_Group.Time.
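
The naming rule described above can be reproduced with a few lines of base R. The values "C_" and "outlier" are assumed here to match the example in the text; the real values come from p_ClassVarPref and p_outlierCol in the settings.r file.

```r
# Values assumed for illustration; the real ones come from settings.r.
class_var_prefix <- "C_"       # stands in for p_ClassVarPref
outlier_col      <- "outlier"  # stands in for p_outlierCol

exOut_vars <- c("C_Group", "C_Time")

# Strip the class-variable prefix, then join the variable names with dots.
bare <- sub(paste0("^", class_var_prefix), "", exOut_vars)
col_name <- paste0(class_var_prefix, outlier_col, "_",
                   paste(bare, collapse = "."))
col_name  # "C_outlier_Group.Time"
```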

dpt.post

Character vector specifying which of the available data pre-treatment modules to apply AFTER (possibly) splitting the dataset. Leave at NULL for no additional data treatment. Possible values are 'sgol', 'snv', 'msc', 'emsc', 'osc', 'deTr', 'gsd'. Additional parameters can be appended to some of the strings via the separator '@'. For examples and further information see dpt_modules.

Details

For a list of all parameters that can be used in the ... argument in getap and in the plot functions please see anproc_file.

See Also

dpt_modules

Other Calc. arguments: calc_NNET_args, calc_SVM_args, calc_aqg_args, calc_discrimAnalysis_args, calc_pca_args, calc_pls_args, calc_randomForest_args, calc_sim_args

Examples

## Not run: 
dataset <- gfd() # will load or import data
# split the dataset by "C_Group"
cube <- gdmm(dataset, getap(spl.var="C_Group"))
# split the dataset by "C_Group", then by "C_Temp"
cube <- gdmm(dataset, getap(spl.var=c("C_Group", "C_Temp")))
# override 'spl.wl' in the analysis procedure
cube <- gdmm(dataset, getap(spl.wl="1300-to-1600"))

## End(Not run)

bpollner/aquap2 documentation built on March 29, 2024, 7:33 a.m.