subdivideDataset: Selects a subset of a multivariate (e.g., spectral) dataset.
In griffithdan/plantspec: NIR Calibration and Spectral Data Management in R

Description Usage Arguments Value Author(s) References Examples

This function accepts spectra in a spectra.list or spectra.matrix object and selects a subset of that dataset. Importantly, the function can be set to select either a calibration or a validation subset. These are fundamentally different. When you select a calibration dataset, the intention is to choose a representative subset of all spectral data on which to perform wet lab analysis. However, when selecting a subset of samples (which already have wet lab analysis) in order to validate a model, it is important that both the validation (test) set and the calibration (training) set are representative; otherwise, the calibration model will be fit to well-sampled spectral space but validated on outlying points. The calibration selection uses the Kennard-Stone algorithm, whereas the validation selection uses the Duplex algorithm, which is a modification that the original authors proposed. Finally, this function can perform calibration or validation selection using one of five distinct methods (see the method parameter for details).
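The Kennard-Stone selection named above can be sketched in base R. This is a minimal illustration of the published algorithm, not the plantspec implementation; the function name `kennard_stone`, its arguments, and the toy matrix `X` are assumptions made for the example.

```r
# Minimal Kennard-Stone sketch (illustrative; not the plantspec code).
# X: numeric matrix with one sample per row. k: number of samples to select.
kennard_stone <- function(X, k) {
  D <- as.matrix(dist(X))              # pairwise Euclidean distances
  n <- nrow(D)
  stopifnot(k >= 2, k <= n)
  # Start with the two samples that are farthest apart.
  start <- which(D == max(D), arr.ind = TRUE)[1, ]
  selected <- unname(start)
  while (length(selected) < k) {
    remaining <- setdiff(seq_len(n), selected)
    # For each remaining sample, distance to its nearest selected sample.
    nearest <- apply(D[remaining, selected, drop = FALSE], 1, min)
    # Add the sample whose nearest selected neighbour is farthest away.
    selected <- c(selected, remaining[which.max(nearest)])
  }
  selected
}

set.seed(1)
X <- matrix(rnorm(50 * 4), nrow = 50)      # toy stand-in for spectra
idx <- kennard_stone(X, k = 10)
is_selected <- seq_len(nrow(X)) %in% idx   # logical vector, one entry per sample
```

Because each new sample is the one farthest from everything already chosen, the selection spreads across the occupied spectral space instead of clustering where samples are dense.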

subdivideDataset(spectra, component = NULL, type = "validation", p = 0.2, method = "KS", seed.set = NULL, output = "logical")

`spectra`	An object of class `spectra.list` or `spectra.matrix` containing the spectra to subdivide.
`component`	Methods "SPXY" and "MDKS" incorporate Y-value data in subset selection. If using one of these two methods, a vector of Y data should be provided here.
`type`	One of "calibration" or "validation" depending on the type of subset required.
`p`	The proportion of the dataset to select as the "calibration" or "validation" group.
`method`	The desired method. Selected from: "KS" - Standard Kennard-Stone selection. When `type = "validation"`, this performs Duplex selection. "PCAKS" - Selection is performed on the principal components from a PCA of the spectra. "SPXY" - Selection occurs on both X (spectra) and Y (component) data with equal weighting. "MDKS" - Mahalanobis distance is used instead of Euclidean distance. Selection occurs on both X (spectra) and Y (component) data with equal weighting. "random" - Simple random selection, regardless of multivariate distribution.
`seed.set`	A single numeric value. If method is "random" then you can set the seed so that the same selection is produced each time.
`output`	One of "logical" or "names". If "logical", the function returns a logical vector in which TRUE values mark the selected samples. If "names", the names of the selected spectra are returned.
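The equal weighting of X and Y described for "SPXY" can be illustrated with the joint distance from Galvao et al. (2005), in which the spectral and component distances are each divided by their maximum and then summed. The sketch below is an assumption about how such a distance could be computed in base R, not the package's own code; `spxy_distance` and the toy data are hypothetical.

```r
# Joint X-Y distance in the style of SPXY (illustrative only).
# X: numeric matrix of spectra, one sample per row. y: numeric vector of Y values.
spxy_distance <- function(X, y) {
  stopifnot(nrow(X) == length(y))
  dx <- as.matrix(dist(X))   # Euclidean distances between spectra
  dy <- as.matrix(dist(y))   # absolute differences between Y values
  # Scale each term to [0, 1] so X and Y contribute with equal weight.
  dx / max(dx) + dy / max(dy)
}

set.seed(2)
X <- matrix(rnorm(20 * 5), nrow = 20)   # toy stand-in for spectra
y <- rnorm(20)                          # toy stand-in for the component
D_xy <- spxy_distance(X, y)
```

A Kennard-Stone style selection can then be run on `D_xy` in place of the purely spectral distance matrix.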

A vector. Depending on output, either a logical vector or a vector of names indicating the selected spectra.
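As a rough sketch of how either return form might be used to split a dataset, the example below indexes a plain named matrix. The matrix `spec` and the selection vectors are made-up stand-ins, not output from subdivideDataset, and treating the spectra as an ordinary matrix is an assumption about the data layout.

```r
# Hypothetical data: 6 named samples with 3 wavelengths each.
spec <- matrix(seq_len(18), nrow = 6,
               dimnames = list(paste0("s", 1:6), c("w1", "w2", "w3")))

# Stand-in for a result returned with output = "logical".
sel_logical <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
# Stand-in for the same selection returned with output = "names".
sel_names <- rownames(spec)[sel_logical]

# The logical form makes it easy to take both the subset and its complement.
validation  <- spec[sel_logical, , drop = FALSE]
calibration <- spec[!sel_logical, , drop = FALSE]

# The names form indexes the same rows as the logical vector.
same_rows <- identical(validation, spec[sel_names, , drop = FALSE])
```

The logical form is convenient when the complement is also needed, while the names form is easier to store or to match against other tables.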

Daniel M Griffith

Kennard, R. W. and Stone, L. A. (1969) Computer aided design of experiments. Technometrics, 11, 137-148.

Galvao, R., Araujo, M., Jose, G., Pontes, M., Silva, E. & Saldanha, T. (2005). A method for calibration and validation subset partitioning. Talanta, 67, 736-740.

Saptoro, Agus; Tadé, Moses O.; and Vuthaluru, Hari (2012) "A Modified Kennard-Stone Algorithm for Optimal Division of Data for Developing Artificial Neural Network Models," Chemical Product and Process Modeling: Vol. 7: Iss. 1, Article 13. DOI: 10.1515/1934-2659.1645

Snee, R.D., 1977. Validation of regression models: methods and examples. Technometrics, 19, 415-428.

## Not run: 
data(shootout)
val_set <- subdivideDataset(spectra = shootout_scans, type = "validation", method = "KS")
table(val_set)

## End(Not run)

griffithdan/plantspec documentation built on May 17, 2019, 8:37 a.m.
