computeSampling: Sampling raw data matrix
In RclusTool: Graphical Toolbox for Clustering and Classification of Data Frames

computeSampling

R Documentation

Sampling raw data matrix

Description

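Samples a raw data matrix to reduce the number of observations, and keeps a matching between all observations and the sample so that a clustering result obtained on the sample can be generalized to the full data.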

Usage

computeSampling(
  x,
  label = NULL,
  K = 0,
  toKeep = NULL,
  sampling.size.max = 3000,
  K.max = 20,
  kmeans.variance.min = 0.95
)

Arguments

`x`	matrix of raw data (one observation per row).
`label`	vector of (named) labels.
`K`	number of clusters. If K = 0 (default), this number is computed automatically using the elbow method.
`toKeep`	vector of row.names to keep in the sample (for constrained algorithms).
`sampling.size.max`	maximal number of observations to keep in the sample.
`K.max`	maximal number of clusters (K.max = 20 by default).
`kmeans.variance.min`	elbow method threshold: the search for K stops once the cumulative explained variance exceeds this value.
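
The interplay of `K`, `K.max` and `kmeans.variance.min` can be illustrated with a minimal sketch of an elbow-style search. This is not RclusTool's implementation: the helper `choose_k_elbow` is hypothetical, and it assumes that "explained variance" means the ratio `betweenss / totss` reported by `stats::kmeans`.

```r
# Hypothetical helper, NOT part of RclusTool: elbow-style choice of K.
# Assumption: explained variance for a given K is betweenss / totss from kmeans().
choose_k_elbow <- function(x, K.max = 20, variance.min = 0.95) {
  # kmeans() needs fewer centers than distinct points
  K.max <- max(1, min(K.max, nrow(unique(x)) - 1))
  for (k in seq_len(K.max)) {
    fit <- stats::kmeans(x, centers = k, nstart = 10)
    explained <- fit$betweenss / fit$totss
    # Stop at the first K whose explained variance exceeds the threshold
    if (explained > variance.min) {
      return(k)
    }
  }
  # Threshold never reached: fall back to the largest K tried
  K.max
}

set.seed(1)
dat <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 4, sd = 0.3), ncol = 2))
K <- choose_k_elbow(dat, K.max = 20, variance.min = 0.95)
```

Raising `variance.min` in this sketch tends to yield a larger `K`, while `K.max` bounds the search in either case.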

Details

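computeSampling reduces the number of observations in a raw data matrix to at most sampling.size.max. The returned matching links all observations to the sample, so that a clustering result computed on the sample can be generalized to the full data.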

Value

The function returns a list containing:

`selection.ids`	vector of selected row.names in the sample.
`selection.labs`	vector of selected labels in the sample.
`matching`	character vector specifying the matching for all observations; used to generalize the clustering result from the sample to the full data.
`size.max`	maximal number of observations kept in the sample.
`K`	number of clusters.
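
How a matching can serve to generalize a clustering result is sketched below in base R. This is an illustration under stated assumptions, not the package's code: the sample is drawn at random, each observation is matched to its nearest sampled row, and the names `sample.ids`, `matching` and `labels.all` are local to this sketch rather than fields returned by computeSampling.

```r
set.seed(42)
x <- rbind(matrix(rnorm(200, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(200, mean = 3, sd = 0.3), ncol = 2))
rownames(x) <- paste0("obs", seq_len(nrow(x)))

# 1. Keep a subset of rows (assumption: simple random sampling)
sample.ids <- sample(rownames(x), size = 40)
xs <- x[sample.ids, , drop = FALSE]

# 2. Match every observation to its nearest sampled row
#    (squared Euclidean distances between all rows of x and xs)
d2 <- outer(rowSums(x^2), rowSums(xs^2), "+") - 2 * x %*% t(xs)
matching <- sample.ids[max.col(-d2, ties.method = "first")]
names(matching) <- rownames(x)

# 3. Cluster the sample only, then propagate labels through the matching
fit <- stats::kmeans(xs, centers = 2, nstart = 10)
labels.all <- fit$cluster[matching]
names(labels.all) <- rownames(x)
```

Only the 40 sampled rows are clustered here; the remaining observations inherit the label of the sampled row they are matched to.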

Examples

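library(RclusTool)

# Three Gaussian clusters in 2D, 50 points each
dat <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 4, sd = 0.3), ncol = 2))

# Write the data to a temporary CSV file and import it as an RclusTool sample
tf <- tempfile()
write.table(dat, tf, sep = ",", dec = ".")
x <- importSample(file.features = tf)

# Sample the imported feature matrix; with the default K = 0, K is chosen automatically
res.sampling <- computeSampling(x$features$initial$x)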

RclusTool documentation built on May 29, 2024, 5:23 a.m.