cluster.sizestat: generate statistics on sizes of clusters
In girke-lab/ChemmineR: Cheminformatics Toolkit for R

cluster.sizestat

R Documentation

generate statistics on sizes of clusters

Description

'cluster.sizestat' computes simple statistics on the sizes of clusters generated by 'cmp.cluster'. It returns a data frame that maps each cluster size to the number of clusters of that size. It is often used together with 'cluster.visualize'.

Usage

cluster.sizestat(cls, cluster.result=1)

Arguments

`cls`	The clustering result returned by 'cmp.cluster'.
`cluster.result`	If multiple cutoff values were used in the clustering process, this argument selects which cutoff's clustering result is summarized (default: 1).

Details

'cluster.sizestat' depends on the format returned by 'cmp.cluster': it treats the first column as the indices and the second column as the cluster sizes of the effective clustering. Because of this, when 'cmp.cluster' is called with multiple cutoffs, 'cluster.sizestat' considers only the clustering result of the first cutoff. To work on an alternative cutoff, you have to manually reorder or remove columns.
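
The column convention described above can be illustrated with a small base R sketch. It is not the package's implementation: `toy` is an invented stand-in for a 'cmp.cluster' result with two cutoffs (the column names and values are assumptions), and `size_stat_sketch` is a hypothetical helper that reads cluster sizes from the second column only.

```r
# Invented stand-in for a 'cmp.cluster' result with two cutoffs.
# Layout assumed here: ids, then a size column and a cluster-id column per cutoff.
toy <- data.frame(
  ids       = 1:7,
  CLSZ_0.65 = c(3, 3, 3, 2, 2, 2, 2),
  CLID_0.65 = c(1, 1, 1, 2, 2, 3, 3),
  CLSZ_0.5  = c(5, 5, 5, 5, 5, 2, 2),
  CLID_0.5  = c(1, 1, 1, 1, 1, 2, 2)
)

# Hypothetical helper: tabulate cluster sizes from the SECOND column only.
size_stat_sketch <- function(cls) {
  sizes <- cls[[2]]                          # size of the cluster each compound belongs to
  per_size <- table(sizes)                   # number of compounds per distinct cluster size
  size_values <- as.integer(names(per_size))
  data.frame(
    "cluster size" = size_values,
    # a cluster of size s contributes s rows, so divide to count clusters
    count = as.integer(per_size) / size_values,
    check.names = FALSE
  )
}

# First cutoff: keep columns 1-3 so the size column is in second position.
first_cutoff <- size_stat_sketch(toy[, c(1, 2, 3)])

# Second cutoff: move its size column into second position by subsetting.
second_cutoff <- size_stat_sketch(toy[, c(1, 4, 5)])

print(first_cutoff)
print(second_cutoff)
```

Subsetting with `c(1, 4, 5)` is the "manual reordering" step: it moves the second cutoff's size column into the position the function reads.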

Value

Returns a data frame with two columns.

`cluster size`	This column lists the cluster sizes
`count`	This column lists the number of clusters of each size
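
As a rough illustration of working with a result of this shape, the sketch below builds a data frame by hand with the two documented column names and derives a few summary figures from it. The values are invented, not output from 'cluster.sizestat'. Because the `cluster size` name contains a space, it is accessed with `[[ ]]` rather than `$`.

```r
# Hand-made example with the two documented columns; values are invented.
stats <- data.frame(
  "cluster size" = c(1, 2, 5),
  count          = c(40, 12, 3),
  check.names    = FALSE   # keep the space in "cluster size"
)

# Total number of clusters: sum of the counts.
n_clusters <- sum(stats$count)

# Total number of clustered items: each cluster contributes its size.
n_items <- sum(stats[["cluster size"]] * stats$count)

# Fraction of clusters that are singletons (size 1).
singleton_fraction <- stats$count[stats[["cluster size"]] == 1] / n_clusters

print(c(clusters = n_clusters, items = n_items))
print(singleton_fraction)
```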

Author(s)

Y. Eddie Cao

Examples

## Load the ChemmineR package
library(ChemmineR)

## Load sample SD file
# data(sdfsample); sdfset <- sdfsample

## Generate atom pair descriptor database for searching
# apset <- sdf2ap(sdfset) 

## Load the same atom pair sample data set, as provided by the library
data(apset)

## Binning clustering using variable similarity cutoffs.
cluster <- cmp.cluster(db=apset, cutoff = c(0.65, 0.5))

## Statistics on sizes of clusters: select the columns for the
## first cutoff, then the columns for the second cutoff
cluster.sizestat(cluster[,c(1,2,3)])
cluster.sizestat(cluster[,c(1,4,5)])

girke-lab/ChemmineR documentation built on July 28, 2023, 10:36 a.m.