sampdp: Duplex sampling

sampdp R Documentation

Duplex sampling

Description

The function divides the data X into two sets, "train" and "test", using the Duplex algorithm (Snee, 1977). The two sets are of equal size (k observations each). If needed, the user can afterwards add any remaining observations (those in neither "train" nor "test") to "train".

Usage


sampdp(X, k, diss = c("eucl", "mahal"))

Arguments

X

X-data (n observations, p variables) to be sampled.

k

An integer defining the number of training observations to select. Must be <= n / 2.

diss

The type of dissimilarity used for selecting the observations in the algorithm. Possible values are "eucl" (default; Euclidean distance) or "mahal" (Mahalanobis distance).
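
As an illustration of the two dissimilarities, the sketch below computes both distance matrices in base R. It is not taken from the rchemo source: whitening X with the Cholesky factor of cov(X) is one common way to obtain Mahalanobis distances, and the covariance estimate used inside sampdp may differ.

```r
# Illustrative only: base-R distances between the rows of X
set.seed(1)
X <- matrix(rnorm(10 * 3), ncol = 3)

# Euclidean distances between all pairs of rows
D_eucl <- as.matrix(dist(X))

# Mahalanobis distances: whiten X with the Cholesky factor of cov(X)
# (assumed covariance estimate), then take Euclidean distances
U <- chol(cov(X))            # upper triangular, cov(X) = t(U) %*% U
Z <- X %*% solve(U)
D_mahal <- as.matrix(dist(Z))

# Cross-check against stats::mahalanobis(), which returns squared distances
d2_ref <- mahalanobis(X, center = X[1, ], cov = cov(X))
all.equal(unname(D_mahal[, 1])^2, unname(d2_ref))
```

Unlike the Euclidean distance, the Mahalanobis distance accounts for the scale of the variables and the correlations between them, so the two options can select different observations on the same X.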

Value

Indexes (i.e. row numbers in X) for sets "train" and "test".

References

Kennard, R.W., Stone, L.A., 1969. Computer aided design of experiments. Technometrics 11, 137-148.

Snee, R.D., 1977. Validation of Regression Models: Methods and Examples. Technometrics 19, 415-428. https://doi.org/10.1080/00401706.1977.10489581

Examples


# Simulated X-data: n observations, p variables
n <- 10 ; p <- 3
X <- matrix(rnorm(n * p), ncol = p)

# Select k observations per set (k <= n / 2)
k <- 4
sampdp(X, k = k)
sampdp(X, k = k, diss = "mahal")
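
The selection logic summarized in the Description can be sketched in base R as follows. This is a simplified reading of the Duplex procedure (Snee, 1977) with Euclidean distances, not the rchemo implementation. The function name duplex_sketch and the "remain" element of its output are hypothetical names introduced for this illustration, and the sketch requires k >= 2.

```r
# Hypothetical helper (not part of rchemo): base-R sketch of Duplex selection
duplex_sketch <- function(X, k) {
    X <- as.matrix(X)
    n <- nrow(X)
    stopifnot(k >= 2, 2 * k <= n)
    D <- as.matrix(dist(X))              # Euclidean distances between rows
    remain <- seq_len(n)

    # Two mutually farthest points among the candidates idx
    far_pair <- function(idx) {
        d <- D[idx, idx, drop = FALSE]
        diag(d) <- -Inf                  # never pair a point with itself
        pos <- arrayInd(which.max(d), dim(d))
        idx[as.vector(pos)]
    }
    # Candidate whose nearest already-selected point is farthest away
    next_point <- function(sel, idx) {
        dmin <- apply(D[idx, sel, drop = FALSE], 1, min)
        idx[which.max(dmin)]
    }

    # Initialize each set with a pair of distant points
    train <- far_pair(remain)
    remain <- setdiff(remain, train)
    test <- far_pair(remain)
    remain <- setdiff(remain, test)

    # Alternate between the two sets until each holds k observations
    while (length(train) < k) {
        new <- next_point(train, remain)
        train <- c(train, new)
        remain <- setdiff(remain, new)
        new <- next_point(test, remain)
        test <- c(test, new)
        remain <- setdiff(remain, new)
    }
    list(train = train, test = test, remain = remain)
}

set.seed(1)
X <- matrix(rnorm(10 * 3), ncol = 3)
res <- duplex_sketch(X, k = 4)

# Observations left in neither set can be appended to "train" afterwards
train_all <- c(res$train, res$remain)
```

With n = 10 and k = 4, two observations are left unassigned. The last line shows the a posteriori step mentioned in the Description, where they are added to the training indexes.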


mlesnoff/rchemo documentation built on April 15, 2023, 1:25 p.m.