cluster.com
Clusters sets of functional data via their
covariance operators, using an EM-style algorithm with
concentration inequalities.
dat | An (n x m) data matrix of n samples, each a vector of length m.
labl | An optional vector of n labels used to group curves (see Details).
grpCnt | Number of clusters into which to split the data.
iter | Number of iterations for the EM algorithm.
SOFT | Boolean flag indicating whether category probabilities should be returned.
PRINTLK | Boolean flag which, if TRUE, prints the likelihood value for each iteration.
LOADING | Boolean flag which, if TRUE, prints a loading bar.
IGNORESTOP | Boolean flag which, if TRUE, ignores the early stopping conditions so that the EM algorithm runs for the full number of requested iterations.
This function clusters individual curves or sets of curves
by considering the distance between their covariance operator
and each estimated category covariance operator. The implemented
algorithm reworks the concentration-inequality-based
classification method classif.com
into an EM-style algorithm.
This method iteratively updates the probability of a given observation
belonging to each of the k categories, where k is the value of grpCnt.
These probabilities are in turn used to update the category means. This
process continues until either the total number of iterations is reached
or the computed likelihood begins to decrease, signaling the arrival of
a local optimum.
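The loop described above can be sketched in a few lines of R. This is a toy illustration only, not the package's implementation: the function name toy_cov_em, the Frobenius distance, the softmax conversion from distances to probabilities, and the surrogate objective used in place of the likelihood are all assumptions made for the example.

```r
# Toy sketch only: this is NOT the fdcov implementation.
# Assumptions: curves are treated as mean-zero, distances are Frobenius norms,
# and probabilities come from a softmax of negative distances.
toy_cov_em <- function(dat, grpCnt, iter = 20) {
  n <- nrow(dat)
  # Random initial soft assignments; each row sums to one
  prob <- matrix(runif(n * grpCnt), n, grpCnt)
  prob <- prob / rowSums(prob)
  prevObj <- -Inf
  for (it in seq_len(iter)) {
    # Probability-weighted (uncentred) covariance estimate for each category
    covs <- lapply(seq_len(grpCnt), function(k) {
      crossprod(dat * sqrt(prob[, k])) / sum(prob[, k])
    })
    # n x grpCnt matrix of distances between each curve's rank-one
    # covariance x x^T and each category covariance
    dst <- sapply(covs, function(S) {
      apply(dat, 1, function(x) norm(tcrossprod(x) - S, type = "F"))
    })
    # Convert distances to membership probabilities
    w <- exp(-(dst - apply(dst, 1, min)) / mean(dst))
    newProb <- w / rowSums(w)
    # Surrogate objective standing in for the likelihood
    obj <- -sum(newProb * dst)
    if (obj < prevObj) break  # early stop once the objective decreases
    prob <- newProb
    prevObj <- obj
  }
  list(labels = apply(prob, 1, which.max), prob = prob)
}

set.seed(42)
# Two hypothetical groups of 15 curves with different scales
dat <- rbind(matrix(rnorm(15 * 6), 15, 6),
             matrix(rnorm(15 * 6, sd = 3), 15, 6))
fit <- toy_cov_em(dat, grpCnt = 2)
```

The early break mirrors the default stopping rule; dropping it would correspond to the behaviour described for IGNORESTOP.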
If the argument labl is NULL, then every curve is clustered separately. If labl contains factors used to group the curves, then each set of curves is classified as one group. For example, if you have multiple speakers and multiple speech samples from each speaker, you can group the data from each speaker together in order to cluster based on each speaker's covariance operator rather than on each speech sample individually.
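As a rough illustration of the grouping idea, the sketch below builds a label factor for a hypothetical speaker layout and computes one empirical covariance matrix per labelled group. The data are simulated, the sizes are made up, and how the package estimates the group covariance operators internally is not stated here, so cov() is an assumption.

```r
# Hypothetical layout: 6 speakers, 5 curves each, each curve of length 8
set.seed(1)
nSpeakers <- 6
nPerSpeaker <- 5
m <- 8
dat <- matrix(rnorm(nSpeakers * nPerSpeaker * m), ncol = m)

# One label per row of dat: 1,1,1,1,1,2,2,2,2,2,...
labl <- gl(nSpeakers, nPerSpeaker)

# One empirical covariance matrix per labelled group (assumed estimator)
grpCov <- lapply(split(seq_len(nrow(dat)), labl), function(idx) {
  cov(dat[idx, , drop = FALSE])
})
```

With labels of this form, clustering would act on the six group-level covariance estimates rather than on the thirty individual curves.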
If the flag SOFT is set to TRUE, then soft clustering occurs. In this case, given k different labels, a probability vector of length k is returned for each observation; its entries give the probability that the observed function belongs to each label.
cluster.com
returns a vector of labels with one entry for
each row of data, each corresponding to one of the k categories
(or an array of probability vectors if SOFT=TRUE).
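Soft output can be reduced to hard labels by taking the most probable category for each observation. The sketch below uses a made-up probability matrix and assumes one row per observation and one column per category; the actual layout of the array returned with SOFT=TRUE should be checked before relying on this.

```r
# Made-up soft assignments: 3 observations (rows), k = 3 categories (columns)
prob <- rbind(c(0.70, 0.20, 0.10),
              c(0.10, 0.30, 0.60),
              c(0.25, 0.50, 0.25))

# Hard label = index of the largest probability in each row
hard <- apply(prob, 1, which.max)
hard  # 1 3 2
```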
Adam B Kashlak kashlak@ualberta.ca
Kashlak, Adam B, John A D Aston, and Richard Nickl (2016). "Inference on covariance operators via concentration inequalities: k-sample tests, classification, and clustering via Rademacher complexities", in review
## Not run:
# Load phoneme data
library(fds);
# Setup data to be clustered
dat = rbind( t(aa$y[,1:20]),t(iy$y[,1:20]),t(sh$y[,1:20]) );
# Cluster data into three groups
clst = cluster.com(dat,grpCnt=3);
matrix(clst,3,20,byrow=TRUE);
# cluster groups of curves
dat = rbind( t(aa$y[,1:40]),t(iy$y[,1:40]),t(sh$y[,1:40]) );
lab = gl(30,4);
# Cluster data into three groups
clst = cluster.com(dat,labl=lab,grpCnt=3);
matrix(clst,3,10,byrow=TRUE);
## End(Not run)