computeSemiSupervised: Semi-supervised clustering

View source: R/semisupervised.R

computeSemiSupervised    R Documentation

Semi-supervised clustering

Description

Performs semi-supervised clustering based on pairwise constraints. The number of clusters K is either supplied by the user or estimated automatically.

Usage

computeSemiSupervised(
  data.sample,
  ML,
  CNL,
  K = 0,
  kmax = 20,
  method.name = "Constrained_KM",
  maxIter = 2,
  pca = FALSE,
  pca.nb.dims = 0,
  spec = FALSE,
  use.sampling = FALSE,
  sampling.size.max = 0,
  scaling = FALSE,
  RclusTool.env = initParameters(),
  echo = TRUE
)

Arguments

data.sample

list containing features, profiles and clustering results.

ML

list of ML (must-link) constrained pairs (as row.names of features).

CNL

list of CNL (cannot-link) constrained pairs (as row.names of features).

K

number of clusters. If K=0 (default), this number is estimated automatically using the Elbow method.

kmax

maximum number of clusters.

method.name

character vector specifying the constrained algorithm to use. Must be 'Constrained_KM' (default) or 'Constrained_SC' (Constrained Spectral Clustering).

maxIter

number of iterations for the semi-supervised algorithm.

pca

boolean: if TRUE, Principal Components Analysis is applied to reduce the data space.

pca.nb.dims

number of principal components kept. If pca.nb.dims=0, this number is computed automatically.

spec

boolean: if TRUE, spectral embedding is applied to reduce the data space.

use.sampling

boolean: if FALSE (default), data sampling is not used.

sampling.size.max

numeric: maximal size of the sampling set.

scaling

boolean: if TRUE, scaling is applied.

RclusTool.env

environment in which data and intermediate results are stored.

echo

boolean: if FALSE, no description is printed in the console (the default in Usage is TRUE).
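
As a sketch of what the ML and CNL arguments might look like when built by hand rather than selected interactively, the following base R code derives pairs from a handful of observations whose group is already known. The representation used here (a list of length-2 character vectors of row names) and the helper name makePairs are assumptions made for illustration; the output of visualizeSampleClustering shows the exact structure the function expects.

```r
# Hypothetical helper: build must-link / cannot-link pairs from a few
# observations whose group is already known.
# Assumption: each constraint is a length-2 character vector of row names.
makePairs <- function(ids, groups) {
  stopifnot(length(ids) == length(groups), length(ids) >= 2)
  idx <- utils::combn(length(ids), 2)           # all unordered index pairs
  same <- groups[idx[1, ]] == groups[idx[2, ]]  # TRUE when both share a group
  toPairs <- function(cols) {
    lapply(cols, function(j) as.character(ids[idx[, j]]))
  }
  list(ML = toPairs(which(same)), CNL = toPairs(which(!same)))
}

# Four labelled observations, identified by their row names
known <- makePairs(ids = c("1", "2", "51", "101"),
                   groups = c("a", "a", "b", "c"))
str(known$ML)      # the pairs that must share a cluster
length(known$CNL)  # number of pairs that must be separated
```

Enumerating every pair grows quadratically with the number of labelled observations, so this approach only suits a small set of known points.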

Details

computeSemiSupervised performs semi-supervised clustering based on pairwise constraints. When K = 0, the number of clusters is estimated automatically (up to kmax); otherwise the supplied value of K is used.
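
To illustrate what an automatic choice of K involves, the sketch below computes the within-cluster sum of squares curve on which the Elbow method relies, using stats::kmeans from base R. It is only an illustration under stated assumptions: it ignores the pairwise constraints entirely, and the helper name wssCurve is hypothetical rather than the package's internal procedure.

```r
# Hypothetical helper: total within-cluster sum of squares for k = 1..kmax
# (plain, unconstrained k-means; not the package's own implementation)
wssCurve <- function(features, kmax = 10) {
  vapply(seq_len(kmax), function(k) {
    stats::kmeans(features, centers = k, nstart = 10)$tot.withinss
  }, numeric(1))
}

set.seed(1)
dat <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 4, sd = 0.3), ncol = 2))
wss <- wssCurve(dat, kmax = 8)

# The curve drops sharply while K is below the number of real groups and
# flattens afterwards; the "elbow" is the K where the flattening begins.
plot(seq_along(wss), wss, type = "b",
     xlab = "K", ylab = "Total within-cluster sum of squares")
```

Setting K explicitly skips this kind of search, which may be preferable when the number of groups is known in advance.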

Value

The function returns a list containing:

label

vector of labels.

summary

data.frame containing cluster summaries (min, max, sum, average, sd).

nbItems

number of observations.
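
One way to use the returned label vector is to check how many pairwise constraints the partition actually respects. The sketch below does this in base R on a mock label vector. It assumes that labels can be indexed by row name and that each constraint is a length-2 character vector of row names; countViolations is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: count constraints broken by a labelling.
# Assumptions: `label` is named by row names; each pair is a length-2
# character vector of row names.
countViolations <- function(label, ML, CNL) {
  sameCluster <- function(p) label[[p[1]]] == label[[p[2]]]
  mlSame  <- vapply(ML,  sameCluster, logical(1))
  cnlSame <- vapply(CNL, sameCluster, logical(1))
  # A must-link pair is violated when split; a cannot-link pair when merged
  c(ML = sum(!mlSame), CNL = sum(cnlSame))
}

# Mock result standing in for the `label` element of the returned list
label <- c("1" = 1, "2" = 1, "51" = 2, "101" = 2)
ML  <- list(c("1", "2"))
CNL <- list(c("1", "51"), c("51", "101"))
violations <- countViolations(label, ML, CNL)
print(violations)
```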

See Also

computeCKmeans, computeCSC, KwaySSSC

Examples


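library(RclusTool)

# Simulate three well-separated 2-D groups of 50 points each
dat <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 4, sd = 0.3), ncol = 2))

# Write the features to a temporary file and import them as a sample
tf <- tempfile()
write.table(dat, tf, sep = ",", dec = ".")
x <- importSample(file.features = tf)

# Interactive step: select must-link / cannot-link pairs, then close the window
pairs.abs <- visualizeSampleClustering(x, selection.mode = "pairs",
                                       profile.mode = "whole sample",
                                       wait.close = TRUE)

# K = 0: the number of clusters is estimated automatically
res.ckm <- computeSemiSupervised(x, ML = pairs.abs$ML, CNL = pairs.abs$CNL, K = 0)
plot(dat[, 1], dat[, 2], type = "p", xlab = "x", ylab = "y",
     col = res.ckm$label, main = "Constrained K-means clustering")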





RclusTool documentation built on May 29, 2024, 5:23 a.m.