sampks | R Documentation |
The function divides the data X into two sets, "train" and "test", using the Kennard-Stone (KS) algorithm (Kennard & Stone, 1969).
The two sets returned by the KS algorithm are not generated by the same probability distribution: one set has higher dispersion than the other. To be consistent with the literature, the output train of sampks contains the set with the higher dispersion. (The train and test roles can be swapped depending on the objectives and usage.)
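The selection principle can be illustrated with a minimal sketch of the classical Kennard-Stone procedure using Euclidean distances. The function name ks_sketch is hypothetical and is not part of the package; the sketch assumes the usual formulation (start from the two most distant observations, then repeatedly add the observation farthest from the already selected set) and is not necessarily how sampks is implemented internally.

```r
# Hypothetical helper (not part of the package): classical Kennard-Stone
# selection with Euclidean distances, written for illustration only.
ks_sketch <- function(X, k) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(k >= 2, k <= n)
  D <- as.matrix(dist(X))   # pairwise Euclidean distances between rows
  diag(D) <- -Inf           # exclude self-distances when picking the start pair
  # Start with the two most distant observations
  train <- as.integer(arrayInd(which.max(D), dim(D)))
  while (length(train) < k) {
    cand <- setdiff(seq_len(n), train)
    # Distance from each candidate to its nearest selected observation
    dmin <- apply(D[cand, train, drop = FALSE], 1, min)
    # Add the candidate that is farthest from the selected set
    train <- c(train, cand[which.max(dmin)])
  }
  list(train = train, test = setdiff(seq_len(n), train))
}

# Usage on a small one-column example
res <- ks_sketch(matrix(c(0, 1, 2, 10), ncol = 1), k = 3)
res$train
res$test
```

Because each step picks the observation farthest from those already chosen, the selected set spreads towards the edges of the data cloud, which is why it tends to show the higher dispersion.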
sampks(X, k, diss = c("euclidean", "mahalanobis", "correlation"))
X |
A |
k |
An integer defining the number of training observations to select. |
diss |
The type of dissimilarity used for selecting the observations in the algorithm. Possible values are "euclidean" (default; Euclidean distance), "mahalanobis" (Mahalanobis distance), or "correlation". Correlation dissimilarities are calculated by sqrt(.5 * (1 - rho)). |
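As an illustration of the correlation dissimilarity sqrt(.5 * (1 - rho)), the sketch below computes it for all pairs of observations. It assumes that rho is the Pearson correlation between two rows of X, which is not stated explicitly above; the function name cor_diss is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): correlation dissimilarity
# between observations, assuming rho is the Pearson correlation between rows.
cor_diss <- function(X) {
  rho <- cor(t(as.matrix(X)))    # row-by-row correlation matrix
  # Clamp at 0 to guard against tiny negative values from rounding error
  sqrt(0.5 * pmax(1 - rho, 0))
}

X <- rbind(c(1, 2, 3), c(2, 4, 6), c(3, 2, 1))
cor_diss(X)
```

With this definition the dissimilarity ranges from 0 (rho = 1) to 1 (rho = -1).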
A list of vectors (train and test) of the indexes (i.e. row numbers in X) of the selected observations.
Kennard, R.W., Stone, L.A., 1969. Computer aided design of experiments. Technometrics, 11(1), 137-148.
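# Simulated data: 10 observations, 3 variables
set.seed(seed = 1)
n <- 10 ; p <- 3
X <- matrix(rnorm(n * p, mean = 10), ncol = p, byrow = TRUE)
set.seed(seed = NULL)  # re-initialize the random seed
# Select 7 training observations (default Euclidean distance)
sampks(X, k = 7)
# Same selection using the Mahalanobis distance
sampks(X, k = 7, diss = "mahalanobis")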
###################################
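# KS sampling on the PCA scores of the datcass data
data(datcass)
X <- datcass$Xr
fm <- pca_eigenk(X, ncomp = 10)
z <- sampks(fm$T, k = 140, diss = "mahalanobis")
z
# Score plot, with the test observations highlighted in red
plotxy(fm$T, zeroes = TRUE, pch = 16)
points(fm$T[z$test, 1:2], col = "red", pch = 16, cex = 1.3)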