sampcla: Within-class sampling
In mlesnoff/rchemo: Dimension reduction, Regression and Discrimination for Chemometrics

sampcla

R Documentation

Within-class sampling

Description

The function divides a dataset into two sets, "train" and "test", using stratified sampling on predefined classes: the test observations are selected within each class.

If `y = NULL` (default), the sampling within each class is random. Otherwise, the sampling within each class is systematic, following a regular grid over the values of the quantitative variable `y`.
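The two sampling modes can be illustrated with a minimal base-R sketch. The function name `sampcla_sketch` is hypothetical and this is not the rchemo implementation. In particular, the systematic mode assumes that the "regular grid" is taken as evenly spaced ranks of `y` within each class (`round(seq(1, n_k, length.out = m_k))`), which may differ from what `sampcla` actually does.

```r
## Hypothetical sketch of within-class (stratified) sampling; not the rchemo code
sampcla_sketch <- function(x, y = NULL, m) {
  x <- as.vector(x)
  lev <- sort(unique(x))                  # classes, in sorted order
  if (length(m) == 1) m <- rep(m, length(lev))
  stopifnot(length(m) == length(lev))
  test <- integer(0)
  for (k in seq_along(lev)) {
    idx <- which(x == lev[k])             # positions of class k in x
    mk <- min(m[k], length(idx))          # cannot draw more than the class holds
    if (mk == 0) next
    if (is.null(y)) {
      ## Random sampling within the class (sample positions, so that a
      ## class with a single observation is handled correctly)
      sel <- idx[sample.int(length(idx), mk)]
    } else {
      ## Systematic sampling: regular grid over the ranks of y within the class
      ord <- idx[order(y[idx])]
      sel <- ord[round(seq(1, length(ord), length.out = mk))]
    }
    test <- c(test, sel)
  }
  test <- sort(test)
  list(train = setdiff(seq_along(x), test), test = test)
}
```

With `y` given, the grid always includes the smallest and the largest value of `y` in each class as soon as `m_k >= 2`, which spreads the test observations over the range of `y`.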

Usage


sampcla(x, y = NULL, m)

Arguments

`x`	A vector (length n, the number of observations) defining the class membership of the observations.
`y`	A vector (length n) defining the quantitative variable for the systematic sampling. If `NULL` (default), the sampling is random within each class.
`m`	Either a single integer giving the number of test observations to select in each class, or a vector of integers giving the number to select for each class. In the latter case, `m` must have one element per class present in `x`, in the same order as the sorted class labels.
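Because a vector `m` is matched to the classes by position, one way to avoid ordering mistakes is to key the counts by class label and reorder them to the sorted labels before the call. A minimal sketch; the toy labels and the allocation `alloc` are arbitrary, and the `sampcla` call is left commented out since it requires rchemo.

```r
## Class labels (character) for a small toy dataset
x <- c("B", "3", "a", "B", "a", "3", "3", "B")
## Number of test observations wanted per class, keyed by class label
alloc <- c(B = 1, "3" = 2, a = 1)
## Sorted class labels; for character labels the sort order can depend on the locale
lev <- sort(unique(x))
## Reorder the counts by label so that m[k] refers to lev[k]
m <- unname(alloc[as.character(lev)])
m
## res <- sampcla(x, m = m)   # requires the rchemo package
```

Indexing by name rather than typing the counts in a fixed order also protects against the locale-dependent ordering of mixed-case character labels.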

Value

A list of indexes (i.e. positions in `x`): component `test` contains the selected observations and component `train` the remaining ones.

References

Naes, T., 1987. The design of calibration in near infra-red reflectance analysis by clustering. Journal of Chemometrics 1, 121-134.

Examples


x <- sample(c(1, 3, 4), size = 20, replace = TRUE)
#x <- sample(c("B", "3", "a"), size = 20, replace = TRUE)
#x <- as.factor(sample(c("B", "3", "a"), size = 20, replace = TRUE))
table(x)

res <- sampcla(x, m = 2)
res
s <- res$test
x[s]

res <- sampcla(x, m = c(1, 2, 1))
res
s <- res$test
x[s]

y <- rnorm(length(x))
res <- sampcla(x, y, m = 2)
res
s <- res$test
x[s]

########## Representative stratified sampling from an unsupervised clustering

data(cassav)
X <- cassav$Xtrain
y <- cassav$ytrain
N <- nrow(X)

fm <- pcaeigenk(X, nlv = 10)
z <- stats::kmeans(x = fm$T, centers = 3, nstart = 25, iter.max = 50)
x <- z$cluster
z <- table(x)
z
p <- c(z) / N
p

psamp <- .20
m <- round(psamp * N * p)
m

## Random

res <- sampcla(x, m = m)
s <- res$test
table(x[s])

## Systematic for y

res <- sampcla(x, y, m = m)
s <- res$test
table(x[s])
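In the clustering example, `m <- round(psamp * N * p)` need not add up exactly to `psamp * N`: for three equal clusters and a target of 10, each class gets `round(10 / 3) = 3`, i.e. 9 in total. If an exact total matters, one option, not part of rchemo, is largest-remainder rounding. A minimal sketch with a hypothetical helper `alloc_prop`, which returns the counts in the same (sorted) class order as `table(x)`:

```r
## Hypothetical helper: proportional allocation of `total` test observations
## across classes using largest-remainder rounding, so that sum(m) == total
alloc_prop <- function(x, total) {
  ni <- c(table(x))                  # class sizes, in sorted class order
  q <- total * ni / sum(ni)          # exact (fractional) quotas
  m <- floor(q)
  short <- total - sum(m)            # units still to distribute
  if (short > 0) {
    ## Give one extra unit to the classes with the largest remainders
    extra <- order(q - m, decreasing = TRUE)[seq_len(short)]
    m[extra] <- m[extra] + 1
  }
  m
}
```

In the example above, `m <- alloc_prop(x, round(psamp * N))` could replace the `round()` line before calling `sampcla(x, m = m)`.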

mlesnoff/rchemo documentation built on April 15, 2023, 1:25 p.m.
