computeSampling: Sampling raw data matrix

View source: R/sampleCompute.R

computeSampling    R Documentation

Sampling raw data matrix

Description

Samples the raw data matrix to reduce the number of observations, and records a matching that allows a clustering result obtained on the sample to be generalized to all observations.

Usage

computeSampling(
  x,
  label = NULL,
  K = 0,
  toKeep = NULL,
  sampling.size.max = 3000,
  K.max = 20,
  kmeans.variance.min = 0.95
)

Arguments

x

matrix of raw data (one observation per row).

label

vector of (named) labels.

K

number of clusters. If K = 0 (default), the number is chosen automatically using the elbow method.

toKeep

vector of row.names to keep in the sample (for constrained algorithms).

sampling.size.max

maximal number of observations to keep in the sample.

K.max

maximal number of clusters (K.max = 20 by default).

kmeans.variance.min

threshold on the cumulative explained variance used by the elbow method: the search for K stops once the explained variance exceeds this value (0.95 by default).
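To make the roles of K, K.max and kmeans.variance.min more concrete, here is a minimal sketch of an explained-variance elbow rule written with base R only. The helper elbow_k is hypothetical and is not part of RclusTool; it assumes that "explained variance" means the ratio betweenss / totss returned by stats::kmeans, which may differ from what computeSampling computes internally.

```r
# Hypothetical helper (not part of RclusTool): choose K with an
# explained-variance elbow rule, using stats::kmeans only.
elbow_k <- function(x, K.max = 20, variance.min = 0.95) {
  x <- as.matrix(x)
  # kmeans cannot use more centers than distinct rows
  K.max <- min(K.max, nrow(unique(x)))
  explained <- numeric(K.max)
  for (k in seq_len(K.max)) {
    fit <- stats::kmeans(x, centers = k, nstart = 10)
    # share of the total sum of squares explained by the partition
    explained[k] <- fit$betweenss / fit$totss
    # stop at the first K whose explained variance exceeds the threshold
    if (explained[k] > variance.min) {
      return(list(K = k, explained = explained[seq_len(k)]))
    }
  }
  # threshold never reached: fall back to the largest K tried
  list(K = K.max, explained = explained)
}

set.seed(1)
dat <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 4, sd = 0.3), ncol = 2))
res <- elbow_k(dat, K.max = 20, variance.min = 0.95)
res$K
res$explained
```

Lowering variance.min makes the search stop at a smaller K, while K.max bounds the cost of the search when the threshold is never reached.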

Details

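computeSampling samples the raw data matrix to reduce the number of observations. The returned matching relates all observations to the sample, so that a clustering result computed on the sample can be generalized to the full data set.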
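The idea of sampling followed by generalization can be illustrated with base R. This is only a sketch under stated assumptions: the random selection, the sample size of 30, and the nearest-neighbour matching rule are choices made here for illustration and are not taken from the RclusTool source.

```r
# Illustrative sketch only: the sampling and matching rules below are
# assumptions, not the RclusTool implementation.
set.seed(1)
dat <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 4, sd = 0.3), ncol = 2))
rownames(dat) <- as.character(seq_len(nrow(dat)))

# 1. Keep a random subset of observations (hypothetical size of 30).
ids <- sample(rownames(dat), 30)
smp <- dat[ids, , drop = FALSE]

# 2. Match every observation to its nearest sampled observation,
#    using squared Euclidean distances (one row per observation).
d2 <- outer(rowSums(dat^2), rowSums(smp^2), "+") - 2 * dat %*% t(smp)
matching <- setNames(ids[max.col(-d2, ties.method = "first")], rownames(dat))

# 3. Cluster the sample only, then generalize through the matching.
km <- stats::kmeans(smp, centers = 3, nstart = 10)
cl.sample <- setNames(km$cluster, ids)
cl.all <- setNames(cl.sample[matching], rownames(dat))
table(cl.all)
```

Only the 30 sampled rows are clustered; every other observation inherits the cluster of the sampled row it is matched to, which is what keeps the clustering step cheap on large data sets.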

Value

The function returns a list containing:

selection.ids

vector of selected row.names in the sample.

selection.labs

vector of selected labels in the sample.

matching

character vector specifying the matching for all observations; it is used to generalize the clustering result obtained on the sample.

size.max

maximal number of observations kept in the sample.

K

number of clusters.

Examples

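# Simulate three 2-D Gaussian groups of 50 observations each
dat <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 4, sd = 0.3), ncol = 2))
# Write the data to a temporary csv file and import it
tf <- tempfile()
write.table(dat, tf, sep = ",", dec = ".")
x <- importSample(file.features = tf)

# Sample the imported feature matrix with the default settings
res.sampling <- computeSampling(x$features$initial$x)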



RclusTool documentation built on May 29, 2024, 5:23 a.m.