kenStone | R Documentation
Select calibration samples from a large multivariate dataset using the Kennard-Stone algorithm
kenStone(X, k, metric = "mahal", pc, group, .center = TRUE, .scale = FALSE, init = NULL)
Arguments:

X: a numeric matrix.

k: number of calibration samples to be selected.

metric: distance metric to be used: 'euclid' (Euclidean distance) or 'mahal' (Mahalanobis distance, default).

pc: optional. If not specified, distances are computed in the Euclidean space. Otherwise, distances are computed in the principal component (PC) score space and pc gives the number of PCs retained; if pc < 1, the number of PCs kept corresponds to the number of components explaining at least (pc * 100) percent of the total variance.

group: an optional factor (or vector coercible to a factor) of length nrow(X) giving the identifier of related observations; when one observation of a group is selected, the other observations of the same group are assigned together with it.

.center: logical value indicating whether the input matrix should be centered before Principal Component Analysis. Default set to TRUE.

.scale: logical value indicating whether the input matrix should be scaled before Principal Component Analysis. Default set to FALSE.

init: (optional) a vector of integers indicating the indices of the observations/rows that must be included at the first iteration of the search process. Default is NULL.
The Kennard-Stone algorithm selects samples with a uniform distribution over the predictor space (Kennard and Stone, 1969). It starts by selecting the pair of points that are farthest apart; these are assigned to the calibration set and removed from the list of points. The procedure then assigns remaining points to the calibration set by computing the distance between each unassigned point i_0 and each selected point i, and finding the unassigned point for which:
$$d_{selected} = \max_{i_0} \left( \min_{i} d_{i,i_0} \right)$$
This essentially selects the point i_0 that is farthest from its closest neighbour i in the calibration set. The algorithm uses the Euclidean distance to select the points. However, the Mahalanobis distance can also be used. This is achieved by performing a PCA on the input data and computing the Euclidean distance on the truncated score matrix, according to the following definition of the Mahalanobis H distance:
$$H_{ij}^2 = \sum_{a=1}^{A} \frac{(\hat{t}_{ia} - \hat{t}_{ja})^2}{\hat{\lambda}_a}$$
where $\hat{t}_{ia}$ is the $a$-th principal component score of point $i$, $\hat{t}_{ja}$ is the corresponding value for point $j$, $\hat{\lambda}_a$ is the eigenvalue of principal component $a$, and $A$ is the number of principal components included in the computation.
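As a minimal illustrative sketch in base R (not the package's implementation; the function and data names are hypothetical), the max-min rule and the score-space trick above can be combined: PCA scores are divided by the square roots of their eigenvalues, after which plain Euclidean distances on the scaled scores equal the Mahalanobis H distance.

```r
# Illustrative sketch of Kennard-Stone selection in the PC score space.
# Not the package implementation; `ks_sketch` is a hypothetical helper.
ks_sketch <- function(X, k, pc = ncol(X)) {
  p <- prcomp(X, center = TRUE, scale. = FALSE)
  # scale each score column by sqrt(eigenvalue) so Euclidean distances
  # on the scaled scores match the Mahalanobis H distance above
  S <- sweep(p$x[, 1:pc, drop = FALSE], 2, p$sdev[1:pc], "/")
  D <- as.matrix(dist(S))                    # pairwise Euclidean distances
  # start with the two points that are farthest apart
  sel <- as.vector(which(D == max(D), arr.ind = TRUE)[1, ])
  while (length(sel) < k) {
    cand <- setdiff(seq_len(nrow(X)), sel)
    # distance of each candidate to its closest selected point ...
    d_min <- apply(D[cand, sel, drop = FALSE], 1, min)
    # ... and add the candidate maximising that distance (max-min rule)
    sel <- c(sel, cand[which.max(d_min)])
  }
  sel
}

set.seed(42)
X <- matrix(rnorm(200 * 3), ncol = 3)   # synthetic data for illustration
sel <- ks_sketch(X, k = 15)             # 15 selected row indices
```

The selected indices spread out over the predictor space because each new point is, by construction, far from everything already chosen.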
Value:

a list with the following components:

model: numeric vector giving the row indices of the input data selected for calibration.

test: numeric vector giving the row indices of the remaining observations.

pc: if the pc argument is specified, a numeric matrix of the scaled pc scores.
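The scaled scores are the quantities entering the H distance defined above. As a quick numerical check in base R (illustrative only, using synthetic data), Euclidean distances computed on eigenvalue-scaled PCA scores reproduce the classical squared Mahalanobis distance:

```r
set.seed(1)
X <- matrix(rnorm(50 * 4), ncol = 4)   # synthetic full-rank data
p <- prcomp(X, center = TRUE)

# squared H distance of every row to row 1, computed from the PC scores
# divided by the eigenvalues (p$sdev^2)
T_ <- p$x
H2 <- colSums((t(T_) - T_[1, ])^2 / p$sdev^2)

# reference: classical squared Mahalanobis distance to row 1
M2 <- mahalanobis(X, center = X[1, ], cov = cov(X))

max(abs(H2 - M2))   # agrees up to numerical precision
```

With all components retained (A equal to the number of columns), the identity is exact; truncating to fewer components gives the approximation used when `pc` is specified.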
Antoine Stevens & Leonardo Ramirez-Lopez with contributions from Thorsten Behrens and Philipp Baumann
Kennard, R.W., and Stone, L.A., 1969. Computer aided design of experiments. Technometrics 11, 137-148.
See also: duplex, shenkWest, naes, honigs
Examples:

data(NIRsoil)
sel <- kenStone(NIRsoil$spc, k = 30, pc = .99)
plot(sel$pc[, 1:2], xlab = "PC1", ylab = "PC2")
# points selected for calibration
points(sel$pc[sel$model, 1:2], pch = 19, col = 2)

# Test on artificial data
X <- expand.grid(1:20, 1:20) + rnorm(2 * 20^2, 0, 0.1)
plot(X, xlab = "VAR1", ylab = "VAR2")
sel <- kenStone(X, k = 25, metric = "euclid")
points(X[sel$model, ], pch = 19, col = 2)