The function CompareSilCluster compares the results of two medoid
clusterings. The null hypothesis is that the two clusterings are identical. A
test statistic is calculated and a p-value is obtained by bootstrapping. See
"Details" for a more elaborate description.
List |
A list of data matrices. The rows are assumed to correspond to the objects. |
type |
Indicates whether the matrices provided in "List" are data matrices, distance matrices or clustering results obtained from the data. If type="dist", the calculation of the distance matrices is skipped; if type="clusters", the single source clustering is skipped. Should be one of "data", "dist" or "clusters". |
distmeasure |
A vector of the distance measures to be used on each data matrix. Should be one of "tanimoto", "euclidean", "jaccard", "hamming". Defaults to c("tanimoto","tanimoto"). |
normalize |
Logical. Indicates whether to normalize the distance matrices or not, defaults to c(FALSE, FALSE) for two data sets. This is recommended if different distance types are used. More details on normalization in |
method |
A method of normalization. Should be one of "Quantile","Fisher-Yates", "standardize","Range" or any of the first letters of these names. Default is c(NULL,NULL) for two data sets. |
nrclusters |
The number of clusters to cut the dendrogram in. This is necessary for the computation of the Jaccard coefficient. Default is NULL. |
names |
The labels to give to the elements in List. Default is NULL. |
nboot |
Number of bootstraps to be run. Default is 100. |
plottype |
Should be one of "pdf","new" or "sweave". If "pdf", a location should be provided in "location" and the figure is saved there. If "new" a new graphic device is opened and if "sweave", the figure is made compatible to appear in a sweave or knitr document, i.e. no new device is opened and the plot appears in the current device or document. Default is "new". |
location |
If plottype is "pdf", a location should be provided in "location" and the figure is saved there. Default is NULL. |
For the data or distance matrices in List, medoid clustering with nrclusters
clusters is set up by the pam function of the cluster package and the
silhouette widths are retrieved. These widths indicate how well an object fits
in its current cluster. Values around one indicate an appropriate cluster,
while values around zero indicate that the object might as well lie in its
neighbouring cluster. The silhouette widths are then regressed on the cluster
membership of the objects. First, the widths are modelled according to the
cluster membership from the data source they were derived from. Next, they are
modelled as a function of the membership determined by the other data source.
The regression is fit with the lm function and the r.squared value is
retrieved. The r.squared value indicates how much of the variance of the
silhouette widths is explained by the membership. Ideally this value is high.
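The steps above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's internal code: the two matrices X and Y are simulated stand-ins for the elements of List, Euclidean distances are used instead of "tanimoto", the helper r2 is a hypothetical name, and treating the cluster membership as a categorical predictor in lm is an assumption.

```r
library(cluster)  # provides pam() and silhouette()

set.seed(1)
# Two simulated data sources describing the same 30 objects (rows)
X <- matrix(rnorm(30 * 5), nrow = 30)
Y <- matrix(rnorm(30 * 4), nrow = 30)
k <- 3  # plays the role of nrclusters

# Distance matrices (Euclidean here, as a stand-in)
dX <- dist(X)
dY <- dist(Y)

# Medoid clustering on each distance matrix
clX <- pam(dX, k = k, diss = TRUE)$clustering
clY <- pam(dY, k = k, diss = TRUE)$clustering

# Silhouette widths, kept in the original object order
silX <- silhouette(clX, dX)[, "sil_width"]
silY <- silhouette(clY, dY)[, "sil_width"]

# Hypothetical helper: r.squared of widths regressed on a membership
# vector, with the membership treated as a factor (an assumption)
r2 <- function(sil, membership) {
  summary(lm(sil ~ factor(membership)))$r.squared
}

RXX <- r2(silX, clX)  # widths of X explained by the clustering of X
RXY <- r2(silX, clY)  # widths of X explained by the clustering of Y
RYY <- r2(silY, clY)
RYX <- r2(silY, clX)
```

If the two sources induce similar clusterings, RXY and RYX would be expected to lie close to RXX and RYY.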
Next, a statistic is determined. Let RXX be the r.squared obtained from
regressing the silhouette widths of data source X on the cluster membership of
data source X, and RXY the r.squared obtained from regressing the silhouette
widths of data source X on the cluster membership determined by data source Y,
and vice versa. The statistic is obtained as:
Stat=abs(∑{RXX}-∑{RXY})
The lower the value of the statistic, the better the clustering is explained by the sources. A p-value is obtained via bootstrapping.
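As a small worked example of the statistic, the sketch below reads the two sums as running over both data sources, i.e. (RXX + RYY) and (RXY + RYX). That reading is an assumption, the r.squared values are made-up numbers for illustration only, and sil_stat is a hypothetical helper name. The bootstrap step is not shown because its resampling scheme is not described here.

```r
# Made-up r.squared values, for illustration only
RXX <- 0.62  # widths of X regressed on the membership from X
RYY <- 0.55  # widths of Y regressed on the membership from Y
RXY <- 0.10  # widths of X regressed on the membership from Y
RYX <- 0.08  # widths of Y regressed on the membership from X

# Hypothetical helper: absolute difference between the summed
# "own source" and "other source" r.squared values
sil_stat <- function(within, between) {
  abs(sum(within) - sum(between))
}

stat <- sil_stat(within = c(RXX, RYY), between = c(RXY, RYX))
stat
```

With these numbers the own-source memberships explain far more of the silhouette variance than the other-source memberships, so the statistic is large; identical r.squared values on both sides would give a statistic of zero.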
A plot is made of the density of the statistic under the null hypothesis. The p-value is also indicated on this plot. Further, a list with two elements is returned:
Observed Statistic |
The observed value of the statistic |
P-Value |
The P-value of the obtained statistic retrieved after bootstrapping |
## Not run:
data(fingerprintMat)
data(targetMat)
List=list(fingerprintMat,targetMat)
Comparison=CompareSilCluster(List=List,type="data",
    distmeasure=c("tanimoto","tanimoto"),normalize=c(FALSE,FALSE),method=c(NULL,NULL),
    nrclusters=7,names=NULL,nboot=100,plottype="new",location=NULL)
Comparison
## End(Not run)