get_auc_similarity_scores: Evaluate the consensus between sets of clusterings


View source: R/cluster_validation.R

Description

Methods for evaluating the consensus between sets of clusterings, usually in the context of subsetting of the data or different numbers of clusters.

Usage


Arguments

labels

a list. Each element of the list is a matrix that gives the results of a clustering routine in each column (see consensus_matrix). Usually each column would be the result of running the clustering on a subsample or bootstrap resample of the data.

method

the method used to calculate pairwise similarity for the AUC measure; one of "consensus" or "nmi". See Details.

colors

a vector of colors, of the same length as labels.
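To illustrate the shape the labels argument expects (a list of matrices, one clustering per column), the sketch below builds such a list in base R. It uses stats::kmeans on simulated data rather than moanin's splines_kmeans. The helper subsample_kmeans_labels is hypothetical: it clusters a random subsample of rows and then labels every row by its nearest fitted centre, loosely mirroring the subsampling idea in the Examples.

```r
# Hypothetical helper (not part of moanin): fit k-means on a random subsample
# of rows, then assign *every* row of x to its nearest fitted centre.
subsample_kmeans_labels <- function(x, size, k) {
  ind <- sample(seq_len(nrow(x)), size = size)
  fit <- stats::kmeans(x[ind, , drop = FALSE], centers = k)
  # n x k matrix of squared Euclidean distances from each row to each centre
  d <- sapply(seq_len(k), function(j) {
    rowSums(sweep(x, 2, fit$centers[j, ])^2)
  })
  # index of the closest centre for each row
  max.col(-d, ties.method = "first")
}

set.seed(1)
x <- matrix(rnorm(200 * 5), nrow = 200)
# replicate() stacks the label vectors into a 200 x 10 matrix:
# one column per subsample clustering
labels <- list(
  replicate(10, subsample_kmeans_labels(x, size = 50, k = 3)),
  replicate(10, subsample_kmeans_labels(x, size = 50, k = 4))
)
str(labels)
```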

Details

For each element of the list labels, plot_cdf_consensus uses the consensus_matrix function to calculate the consensus between the clusterings in the matrix, i.e. the number of times each pair of rows is assigned to the same cluster across the different clusterings (columns) of the matrix. The resulting set of values (the N(N-1)/2 values in the upper triangle of the N-by-N consensus matrix, where N is the number of rows) is then converted into an empirical cumulative distribution function (CDF) and plotted.
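A minimal base-R sketch of the quantity plot_cdf_consensus summarizes is shown below. The helper co_clustering_counts is hypothetical and is not the package's consensus_matrix. The sketch also assumes that the raw counts are rescaled to proportions by dividing by the number of clusterings; that rescaling is an assumption made for illustration.

```r
# Hypothetical helper (not moanin's consensus_matrix): for every pair of rows,
# count how many clusterings (columns of M) put them in the same cluster.
co_clustering_counts <- function(M) {
  n <- nrow(M)
  counts <- matrix(0L, nrow = n, ncol = n)
  for (j in seq_len(ncol(M))) {
    # outer(..., "==") is TRUE where rows i and k share a label in column j
    counts <- counts + outer(M[, j], M[, j], "==")
  }
  counts
}

set.seed(2)
N <- 20
M <- replicate(5, sample(1:3, N, replace = TRUE))  # 5 random clusterings of N rows
counts <- co_clustering_counts(M)

# Each unordered pair of rows appears once in the upper triangle: N(N-1)/2 values
vals <- counts[upper.tri(counts)] / ncol(M)  # rescaled to proportions (assumption)
cdf <- ecdf(vals)  # empirical CDF; plot(cdf) would draw the step function
length(vals)       # 190
```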

For each set of clusterings given by labels (i.e. for each matrix M which is an element of the list labels) get_auc_similarity_scores calculates a pairwise measure of similarity between the columns of M. These pairwise scores are plotted against their rank, and the final AUC measure is the area under this curve.
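The AUC step can be sketched as follows. The exact construction used by get_auc_similarity_scores is not spelled out here, so this hypothetical helper, auc_by_rank, assumes the scores are sorted in increasing order, the rank axis is rescaled to [0, 1], and the area is computed with the trapezoidal rule. Under these assumptions, perfectly consistent clusterings (all pairwise scores equal to 1) give an AUC of 1, and less consistent sets give smaller values.

```r
# Hypothetical helper: area under the curve of sorted pairwise scores vs. rank.
# Assumptions: increasing sort, rank rescaled to [0, 1], trapezoidal rule.
auc_by_rank <- function(scores) {
  y <- sort(scores)
  n <- length(y)
  if (n < 2) stop("need at least two scores")
  x <- (seq_len(n) - 1) / (n - 1)  # rescaled ranks: 0, ..., 1
  # trapezoidal rule: width * mean height of each segment
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

auc_by_rank(c(0.2, 0.9, 0.5, 0.7))  # approx. 0.583
auc_by_rank(rep(1, 5))              # 1: every pair fully agrees
```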

For method "consensus", the pairwise measure is given by calculating the consensus matrix using consensus_matrix with scale=FALSE. The consensus matrix is divided by the max of M.

For method "nmi", the pairwise value is the normalized mutual information (NMI) between each pair of columns of the matrix of clusterings, computed with the NMI function.
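A self-contained sketch of pairwise NMI between the columns of a clustering matrix is given below. It uses the normalization I(A;B) / sqrt(H(A) H(B)), which is one common variant. The NMI function referenced above may use a different normalization, so values need not match it exactly. The helpers entropy, nmi_sqrt and pairwise_nmi are hypothetical.

```r
# Hypothetical helper: Shannon entropy (natural log) of a labelling
entropy <- function(z) {
  p <- table(z) / length(z)
  -sum(p * log(p))
}

# Hypothetical helper: NMI with sqrt normalization, I(a; b) / sqrt(H(a) * H(b))
nmi_sqrt <- function(a, b) {
  pab <- table(a, b) / length(a)  # joint distribution of the two labellings
  pa <- rowSums(pab)
  pb <- colSums(pab)
  nz <- pab > 0                   # skip empty cells to avoid log(0)
  mi <- sum(pab[nz] * log(pab[nz] / outer(pa, pb)[nz]))
  denom <- sqrt(entropy(a) * entropy(b))
  if (denom == 0) return(NA_real_)  # undefined if either labelling has one cluster
  mi / denom
}

# Hypothetical helper: NMI for every pair of columns of M, in combn() order
pairwise_nmi <- function(M) {
  pairs <- combn(ncol(M), 2)
  apply(pairs, 2, function(p) nmi_sqrt(M[, p[1]], M[, p[2]]))
}

a <- c(1, 1, 2, 2, 3, 3)
b <- c(2, 2, 3, 3, 1, 1)  # same partition as a, with relabelled clusters
z <- c(1, 2, 1, 2, 1, 2)  # independent of a
pairwise_nmi(cbind(a, b, z))  # pairs (a,b), (a,z), (b,z): approx. 1, 0, 0
```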

Value

plot_cdf_consensus invisibly returns a list of the upper-triangle values, of the same length as labels.

get_auc_similarity_scores returns a vector of the same length as the list labels, giving the AUC value for each element of labels.

plot_model_explorer is a plotting function and does not return anything.

See Also

consensus_matrix, NMI, plot_cdf_consensus

Examples

library(moanin)
data(exampleData)
moanin <- create_moanin_model(data = testData, meta = testMeta)
set.seed(42)  # make the random subsamples reproducible
# Small function to run splines_kmeans on a random subsample of 50 genes,
# then label all genes using the fitted clusters
subsampleCluster <- function() {
    ind <- sample(seq_len(nrow(moanin)), size = 50)
    km <- splines_kmeans(moanin[ind, ], n_clusters = 3)
    splines_kmeans_score_and_label(moanin, km,
        proportion_genes_to_label = 1.0)$label
}
# replicate() collects the results into a matrix, one clustering per column
kmClusters1 <- replicate(10, subsampleCluster())
kmClusters2 <- replicate(10, subsampleCluster())
# Note: because of the small number of replicates (10),
# these plots are not representative of what to expect.
out <- plot_cdf_consensus(labels = list(kmClusters1, kmClusters2))
get_auc_similarity_scores(list(kmClusters1, kmClusters2))
plot_model_explorer(list(kmClusters1, kmClusters2))

NelleV/moanin documentation built on July 28, 2021, 7:34 p.m.