Performs a condition comparison on a given clustering. The comparison
is performed on each cluster separately, between the conditions given by
cond. Several statistics are computed and, when analysed together,
they may give some insight into the heterogeneity of some of the
clusters.
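As an illustration of "each cluster separately", the minimal Python sketch below splits the data by cluster. It assumes that clustering and cond are equal-length label sequences and that dmatrix is a square list of lists of pairwise distances. The function name split_by_cluster is hypothetical; this is not the package's own (R) implementation.

```python
def split_by_cluster(clustering, cond, dmatrix):
    """Hypothetical helper: group data points by their cluster label.

    Returns {cluster: (sub_dmatrix, sub_cond)}, where sub_dmatrix keeps only
    the pairwise distances among that cluster's points and sub_cond keeps
    their condition labels, in the original order.
    """
    per_cluster = {}
    for cl in dict.fromkeys(clustering):  # unique labels, first-seen order
        idx = [i for i, c in enumerate(clustering) if c == cl]
        sub_d = [[dmatrix[i][j] for j in idx] for i in idx]
        sub_cond = [cond[i] for i in idx]
        per_cluster[cl] = (sub_d, sub_cond)
    return per_cluster
```

Each (sub_dmatrix, sub_cond) pair can then be treated as a small labeled data set in which the conditions play the role of groups.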
clustering | A clustering of the data. |
cond | A factor indicating the condition to which each data point is subject. |
dmatrix | A distance matrix describing the data to be analysed. |
n | The number of random silhouettes to compute. Keep in mind that computing many random silhouettes is the bottleneck of this process. |
remove.na | Logical. Whether to remove rows with NA (i.e. clusters for which the silhouette could not be computed). |
For a given cluster, several metrics are computed; see the 'Value' section
for details about each metric. Some metrics make use of Random Silhouettes,
defined as follows: given a labeled data set, assign a random label (from
the set of labels) to each data point without changing the original
proportion of each group. Then compute the silhouette index for the data
using these randomly assigned labels; the resulting average silhouette width
is the Random Silhouette for the data. Since this is a stochastic process,
a Monte Carlo approach is applied: the procedure is repeated n times,
yielding a vector of Random Silhouettes.
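The Random Silhouette procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the package's R code: the function names silhouette_width and random_silhouettes are hypothetical, dmatrix is assumed to be a square list of lists, and a point that is the only member of its group is given a silhouette of 0 (the usual convention in Rousseeuw's definition). Shuffling the labels is one simple way to assign random labels while keeping the group sizes, and hence their ratio, unchanged.

```python
import random
from statistics import mean


def silhouette_width(dmatrix, labels):
    """Average silhouette width of a labeled data set (needs >= 2 groups)."""
    groups = set(labels)
    if len(groups) < 2:
        raise ValueError("the silhouette needs at least two groups")
    n = len(labels)
    widths = []
    for i in range(n):
        own = [dmatrix[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            widths.append(0.0)  # assumed convention: singleton group -> 0
            continue
        a = mean(own)  # mean distance to the rest of its own group
        # b: smallest mean distance to the members of any other group
        b = min(
            mean([dmatrix[i][j] for j in range(n) if labels[j] == g])
            for g in groups
            if g != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(widths)


def random_silhouettes(dmatrix, labels, n, seed=None):
    """Monte Carlo vector of n Random Silhouettes (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(labels)  # copy, so the caller's labels are not modified
    result = []
    for _ in range(n):
        rng.shuffle(shuffled)  # a permutation keeps every group's size
        result.append(silhouette_width(dmatrix, shuffled))
    return result
```

Each call to silhouette_width is roughly quadratic in the number of points, so the cost grows linearly with n, which is consistent with the note that computing many random silhouettes is the bottleneck.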
A data frame with various statistics regarding data heterogeneity inside each cluster.
Each row of the data frame contains several metrics regarding the conditions found in a specific cluster. Currently, at most two conditions are supported. These metrics are described below:
Numeric. The percentage of data points belonging to condition 'x'.
Numeric. The ratio between the percentage of data points belonging to condition 'x' and that of the other condition. For example, with a second condition 'y', 'x_ratio' is computed as x_perc / y_perc.
Numeric. True silhouette: the silhouette of the data in this cluster, computed using the conditions, as defined by the parameter cond, as groups.
Numeric. The Z-score computed based on the silhouette. See the 'Details' section.
Numeric. The p-value for 'true_sil', computed from the number of Random Silhouettes (see 'Details') that are greater than this cluster's 'true_sil'.
Factor. Interquartile Range (IQR) based outlier detection. Given the vector formed by the Random Silhouettes (see 'Details') together with 'true_sil', the method checks whether 'true_sil' is an outlier in that vector. The value is 'Diff' if 'true_sil' is an outlier and 'Same' otherwise.
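The per-cluster statistics above can be sketched in Python from true_sil and a vector of Random Silhouettes. Several parts of this sketch are assumptions rather than the package's documented behavior: the function name condition_stats and the key names z_score, p_value and iqr_outlier are hypothetical; the percentages are taken within the cluster; the Z-score is assumed to be (true_sil - mean) / sd of the Random Silhouettes; the p-value is assumed to be the plain fraction of Random Silhouettes greater than true_sil; and the outlier test assumes Tukey's fences at 1.5 × IQR using Python's 'inclusive' quartiles (the same interpolation as R's default quantile type 7). The package may define any of these differently.

```python
from statistics import mean, quantiles, stdev


def condition_stats(cond, true_sil, random_sils):
    """Hypothetical per-cluster summary for exactly two conditions.

    cond: condition label of each data point in the cluster (a list).
    true_sil: silhouette of the cluster using the conditions as groups.
    random_sils: Random Silhouettes from the Monte Carlo step (at least 2,
    not all equal, otherwise the standard deviation is 0).
    """
    levels = sorted(set(cond))
    if len(levels) != 2:
        raise ValueError("this sketch needs exactly two conditions")
    x, y = levels
    # Assumed: percentage of this cluster's points in each condition.
    perc = {lv: 100 * cond.count(lv) / len(cond) for lv in levels}
    stats = {
        f"{x}_perc": perc[x],
        f"{y}_perc": perc[y],
        f"{x}_ratio": perc[x] / perc[y],
        f"{y}_ratio": perc[y] / perc[x],
        "true_sil": true_sil,
    }
    # Assumed Z-score: distance of true_sil from the random mean, in random SDs.
    stats["z_score"] = (true_sil - mean(random_sils)) / stdev(random_sils)
    # Assumed p-value: fraction of Random Silhouettes greater than true_sil.
    stats["p_value"] = sum(r > true_sil for r in random_sils) / len(random_sils)
    # Assumed Tukey fences on the Random Silhouettes plus true_sil.
    q1, _, q3 = quantiles(list(random_sils) + [true_sil], n=4, method="inclusive")
    iqr = q3 - q1
    is_outlier = true_sil < q1 - 1.5 * iqr or true_sil > q3 + 1.5 * iqr
    stats["iqr_outlier"] = "Diff" if is_outlier else "Same"
    return stats
```

For example, condition_stats(['a', 'a', 'a', 'b'], 0.5, [0.0, 0.1, -0.1, 0.0, 0.2, -0.2]) reports a_perc = 75.0, a_ratio = 3.0, a p-value of 0.0 and 'Diff', since 0.5 lies above the upper fence.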