cluster.com: Functional data clustering via concentration inequalities
In fdcov: Analysis of Covariance Operators


View source: R/cluster_com.R

cluster.com clusters sets of functional data via their covariance operators, using an EM-style algorithm based on concentration inequalities.

cluster.com(dat, labl = NULL, grpCnt = 2, iter = 30, SOFT = FALSE, PRINTLK = TRUE, LOADING = FALSE, IGNORESTOP = FALSE)

`dat`	(n x m) data matrix of n samples, each a vector of length m.
`labl`	An optional vector of n labels used to group curves (see Details).
`grpCnt`	Number of clusters into which to split the data.
`iter`	Number of iterations for EM algorithm.
`SOFT`	Boolean flag for whether or not category probabilities should be returned.
`PRINTLK`	Boolean flag, which if TRUE, prints likelihood values for each iteration.
`LOADING`	Boolean flag, which if TRUE, prints a loading bar.
`IGNORESTOP`	Boolean flag, which if TRUE, ignores the early stopping conditions and makes the EM algorithm run for the full number of iterations given by iter.

This function clusters individual curves or sets of curves by considering the distance between their covariance operator and each estimated category covariance operator. The implemented algorithm reworks the concentration-inequality-based classification method classif.com into an EM-style algorithm. This method iteratively updates the probability that a given observation belongs to each of the k categories. These probabilities are in turn used to update the category means. The process continues until either the total number of iterations is reached or the computed likelihood begins to decrease, signaling the arrival at a local optimum.
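The iteration described above can be illustrated with a toy loop in base R. This is only a sketch under stated assumptions and is not the fdcov implementation: the simulated data, the Frobenius-norm distance, the exponential weighting with bandwidth `h`, and the surrogate `score` used for early stopping are all stand-ins chosen for illustration, where the package uses concentration inequalities.

```r
# Toy EM-style soft clustering on covariance matrices.
# Illustrative only: NOT the algorithm implemented in cluster.com.
set.seed(1)
n <- 40   # number of curves
m <- 10   # points per curve
k <- 2    # number of categories
iter <- 30

# Hypothetical data: two groups of mean-zero curves with different scales
dat <- rbind(matrix(rnorm(n / 2 * m, sd = 1), n / 2, m),
             matrix(rnorm(n / 2 * m, sd = 3), n / 2, m))

# Rank-one empirical covariance for each curve
obs_cov <- lapply(seq_len(n), function(i) tcrossprod(dat[i, ]))

# Fixed bandwidth for the weighting step (an assumption of this sketch)
h <- mean(sapply(obs_cov, norm, type = "F"))

# Random initial membership probabilities; each row sums to one
prob <- matrix(runif(n * k), n, k)
prob <- prob / rowSums(prob)

prev_score <- -Inf
for (it in seq_len(iter)) {
  # Update step: probability-weighted average covariance per category
  cat_cov <- lapply(seq_len(k), function(j) {
    Reduce(`+`, Map(function(S, w) w * S, obs_cov, prob[, j])) / sum(prob[, j])
  })
  # n x k matrix of distances from each observation to each category
  d <- sapply(cat_cov, function(C) {
    sapply(obs_cov, function(S) norm(S - C, type = "F"))
  })
  # Assignment step: weights decay with distance (stand-in for the real rule)
  w <- exp(-d / h)
  score <- sum(log(rowSums(w)))   # surrogate "likelihood" for this sketch
  if (score < prev_score) break   # stop once the score starts to decrease
  prev_score <- score
  prob <- w / rowSums(w)
}

# Hard labels: most probable category for each curve
labels <- max.col(prob, ties.method = "first")
```

The early exit mirrors the default behaviour described above; dropping the `break` would correspond loosely to running all iterations, as IGNORESTOP = TRUE does.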

If the argument labl is NULL, then every curve is clustered separately. If labl contains factors used to group the curves, then each set of curves is clustered as one group. For example, if you have multiple speakers and multiple speech samples from each speaker, you can group the data from each speaker together in order to cluster based on each speaker's covariance operator rather than on each speech sample individually.
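As a small base-R sketch of what such a grouping vector looks like, the snippet below builds a factor with gl() and shows which rows fall into each group. The simulated matrix is hypothetical, and computing one empirical covariance per group with cov() is only an illustration of the idea; whether cluster.com forms its group covariance estimates in exactly this way is an assumption.

```r
# Hypothetical data: 120 curves (rows), each observed at 5 points
set.seed(1)
dat <- matrix(rnorm(120 * 5), nrow = 120, ncol = 5)

# 30 groups of 4 consecutive rows each
lab <- gl(30, 4)

# Row indices belonging to each group
rows_by_group <- split(seq_len(nrow(dat)), lab)

# One empirical covariance matrix per group (illustration only)
group_cov <- lapply(rows_by_group, function(idx) cov(dat[idx, , drop = FALSE]))

length(lab)      # one label per row of dat
nlevels(lab)     # number of groups to be clustered
```

With such a factor, the objects being clustered are the 30 groups rather than the 120 individual rows.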

If the flag SOFT is set to TRUE, then soft clustering occurs. In this case, given k different labels, a probability vector of length k is returned for each observation; its entries give the probability that the observed function belongs to each label.
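If hard labels are wanted after a soft clustering run, the probability vectors can be reduced to their most probable category. The sketch below uses a hand-written probability matrix rather than real output; the n x k layout (one row per observation) is an assumption, since the exact shape of the returned array is not specified here.

```r
# Hypothetical soft-clustering result: 4 observations, k = 3 categories
prob <- rbind(c(0.70, 0.20, 0.10),
              c(0.10, 0.80, 0.10),
              c(0.30, 0.30, 0.40),
              c(0.50, 0.25, 0.25))

# Each row should be a probability vector
stopifnot(all(abs(rowSums(prob) - 1) < 1e-8))

# Most probable category for each observation
hard <- max.col(prob, ties.method = "first")

# Probability attached to the chosen category, a rough measure of certainty
confidence <- prob[cbind(seq_len(nrow(prob)), hard)]
```

Rows with a low `confidence` value, such as the third one here, are the observations whose assignment is least clear-cut.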

cluster.com returns a vector of labels with one entry for each row of data, each corresponding to one of the k categories (or an array of probability vectors if SOFT=TRUE).

Adam B Kashlak kashlak@ualberta.ca

Kashlak, Adam B, John A D Aston, and Richard Nickl (2016). "Inference on covariance operators via concentration inequalities: k-sample tests, classification, and clustering via Rademacher complexities", in review

## Not run: 
 # Load phoneme data
 library(fds)
 # Set up data to be clustered: 20 curves from each of three phonemes
 dat  <- rbind(t(aa$y[, 1:20]), t(iy$y[, 1:20]), t(sh$y[, 1:20]))
 # Cluster data into three groups
 clst <- cluster.com(dat, grpCnt = 3)
 # One row of labels per phoneme
 matrix(clst, 3, 20, byrow = TRUE)

 # Cluster groups of curves: 40 curves from each of three phonemes
 dat  <- rbind(t(aa$y[, 1:40]), t(iy$y[, 1:40]), t(sh$y[, 1:40]))
 # 30 groups of 4 consecutive curves each
 lab  <- gl(30, 4)
 # Cluster the grouped data into three groups
 clst <- cluster.com(dat, labl = lab, grpCnt = 3)
 matrix(clst, 3, 10, byrow = TRUE)

## End(Not run)