sampdp | R Documentation |
The function divides the data X
into two sets, "train" and "test", using the Duplex algorithm (Snee, 1977).
The training and test sets returned by the algorithm are of equal size. Any remaining observations can be added a posteriori
to the training set.
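The selection logic can be illustrated with a minimal base R sketch. It is not the package's implementation: the function name `duplex_sketch` is hypothetical, only Euclidean distances are handled, and it assumes the usual description of Duplex (the two farthest observations start the training set, the next two farthest start the test set, then each set in turn receives the remaining observation whose nearest already-selected neighbour is farthest away).

```r
# Hypothetical illustration of Duplex sampling (Euclidean distances only).
duplex_sketch <- function(X, k) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(k >= 2, 2 * k <= n)
  D <- as.matrix(dist(X))      # pairwise Euclidean distances between rows
  remain <- seq_len(n)

  # Two mutually farthest observations among the candidates `idx`
  farthest_pair <- function(idx) {
    sub <- D[idx, idx, drop = FALSE]
    pos <- which(sub == max(sub), arr.ind = TRUE)[1, ]
    idx[pos]
  }
  # Candidate whose minimum distance to the members of `set` is largest
  maximin <- function(idx, set) {
    dmin <- apply(D[idx, set, drop = FALSE], 1, min)
    idx[which.max(dmin)]
  }

  train <- farthest_pair(remain)
  remain <- setdiff(remain, train)
  test <- farthest_pair(remain)
  remain <- setdiff(remain, test)

  # Alternate between the two sets until each holds k observations
  while (length(train) < k) {
    s <- maximin(remain, train)
    train <- c(train, s)
    remain <- setdiff(remain, s)
    s <- maximin(remain, test)
    test <- c(test, s)
    remain <- setdiff(remain, s)
  }
  list(train = train, test = test, remain = remain)
}

set.seed(1)
Xs <- matrix(rnorm(10 * 3, mean = 10), ncol = 3)
res <- duplex_sketch(Xs, k = 4)
res
```

Returning the leftover indexes in a separate `remain` element is a choice made for this sketch, so that they can be appended to the training set afterwards if desired.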
sampdp(X, k, diss = c("euclidean", "mahalanobis", "correlation"))
X |
A matrix or data frame in which row observations are selected. |
k |
An integer defining the number of training observations to select. Must be <= |
diss |
The type of dissimilarity used for selecting the observations in the algorithm. Possible values are "euclidean" (default; Euclidean distance), "mahalanobis" (Mahalanobis distance), or "correlation". Correlation dissimilarities are calculated by sqrt(.5 * (1 - rho)). |
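As an illustration of the correlation option, the formula sqrt(.5 * (1 - rho)) can be computed in base R as sketched below. The sketch assumes that rho is the Pearson correlation between rows (observations) of X, which the text above does not state explicitly; `cor_diss` is a hypothetical helper name, not a function of the package.

```r
# Hypothetical helper: correlation dissimilarity between the rows of X,
# assuming rho is the Pearson correlation between observations.
cor_diss <- function(X) {
  X <- as.matrix(X)
  rho <- cor(t(X))                    # n x n correlations between rows
  sqrt(pmax(0, 0.5 * (1 - rho)))      # pmax() guards against tiny negatives
}

Xc <- rbind(c(1, 2, 3),
            c(3, 2, 1),
            c(2, 4, 6))
Dc <- cor_diss(Xc)
Dc
```

With this definition the dissimilarity lies between 0 (perfectly correlated rows, such as rows 1 and 3 above) and 1 (perfectly anti-correlated rows, such as rows 1 and 2).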
A list of vectors of the indexes (i.e. row numbers in X) of the selected observations.
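Adding the remaining observations to the training set a posteriori can be done with a few lines of base R. The sketch below works on a mock list rather than on actual `sampdp` output; the element name `train` is an assumption (only `test` appears in the examples), and `merge_remaining` is a hypothetical helper.

```r
# Hypothetical helper: append the rows selected in neither set to the
# training indexes. `z` is assumed to hold elements `train` and `test`.
merge_remaining <- function(z, n) {
  used <- c(z$train, z$test)
  remain <- setdiff(seq_len(n), used)   # row numbers not yet selected
  z$train <- c(z$train, remain)
  z
}

# Mock selection for a data set with 10 rows (illustrative values only)
z <- list(train = c(1, 5, 9), test = c(2, 6, 10))
z2 <- merge_remaining(z, n = 10)
z2$train
```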
Kennard, R.W., Stone, L.A., 1969. Computer aided design of experiments. Technometrics, 11(1), 137-148.
Snee, R.D., 1977. Validation of regression models: methods and examples. Technometrics, 19, 415-428. https://doi.org/10.1080/00401706.1977.10489581
set.seed(seed = 1)
n <- 10 ; p <- 3
X <- matrix(rnorm(n * p, mean = 10), ncol = p, byrow = TRUE)
set.seed(seed = NULL)
k <- 4
sampdp(X, k = k)
sampdp(X, k = k, diss = "mahalanobis")
###################################
data(datcass)
X <- datcass$Xr
fm <- pca_eigenk(X, ncomp = 10)
z <- sampdp(fm$T, k = 20, diss = "mahalanobis")
z
plotxy(fm$T, zeroes = TRUE, pch = 16)
points(fm$T[z$test, 1:2], col = "red", pch = 16, cex = 1.3)