R/1002-clusterCMP.R
In BioMedR: Generating Various Molecular Representations for Chemicals, Proteins, DNAs, RNAs and Their Interactions

#' cluster compounds using a descriptor database
#'
#' 'clusterCMP' uses structural compound descriptors and clusters the compounds 
#' based on their pairwise distances. \code{clusterCMP} uses single linkage to 
#' measure distance between clusters when it merges clusters. It accepts both a 
#' single cutoff and a cutoff vector. By using a cutoff vector, it can generate 
#' results similar to hierarchical clustering after tree cutting.
#' 
#' \code{clusterCMP} will compute distances on the fly if \code{use.distances} 
#' is not set. Furthermore, if \code{save.distances} is not set, the distance 
#' values computed will never be stored and any distance between two compounds is 
#' guaranteed not to be computed twice. Using this method, \code{clusterCMP} can 
#' deal with large databases when a distance matrix in memory is not feasible. The 
#' speed of the clustering function should be slowed when using a transient distance 
#' calculation. 
#' When \code{save.distances} is set, \code{clusterCMP} will be forced to compute 
#' the distance matrix and save it in memory before the clustering. This is useful 
#' when additional clusterings are required in the future without re-computed the 
#' distance matrix. Set \code{save.distances} to TRUE if you only want to force the 
#' clustering to use this 2-step approach; otherwise, set it to the filename under 
#' which you want the distance matrix to be saved. After you save it, when you need 
#' to reuse the distance matrix, you can 'load' it, and supply it to \code{clusterCMP} 
#' via the \code{use.distances} argument.
#' \code{clusterCMP} supports a vector of several cutoffs. When you have multiple 
#' cutoffs, \code{clusterCMP} still guarantees that pairwise distances will never be
#' recomputed, and no copy of distances is kept in memory. It is guaranteed to be as 
#' fast as calling \code{clusterCMP} with a single cutoff that results in the longest 
#' processing time, plus some small overhead linear in processing time.
#' 
#' @param db  The desciptor database
#' 
#' @param cutoff The clustering cutoff. Can be a single value or a vector. The cutoff 
#'               gives the maximum distance between two compounds in order to group 
#'               them in the same cluster.
#' 
#' @param is.similarity Set when the cutoff supplied is a similarity cutoff. This cutoff 
#'                      is the minimum similarity value between two compounds such that 
#'                      they will be grouped in the same cluster.
#'
#' @param save.distances whether to save distance for future clustering. See details below.
#'                
#' @param use.distances Supply pre-computed distance matrix.
#' 
#' @param quiet Whether to suppress the progress information.
#' 
#' @param ... Further arguments to be passed to similarity.
#'                                                                                                                                                                                                            
#' @return 
#' Returns a \code{data.frame}. Besides a variable giving compound ID, each of the other 
#' variables in the data frame will either give the cluster IDs of compounds under some 
#' clustering cutoff, or the size of clusters that the compounds belong to. When N cutoffs 
#' are given, in total 2*N+1 variables will be generated, with N of them giving the cluster 
#' ID of each compound under each of the N cutoffs, and the other N of them giving the 
#' cluster size under each of the N cutoffs. The rows are sorted by cluster sizes.
#' 
#' @keywords cluster compounds clusterCMP
#'
#' @aliases clusterCMP
#' 
#' @author Min-feng Zhu <\email{wind2zhu@@163.com}>
#' 
#' @seealso See \code{\link{clusterStat}} for generate statistics on sizes of clusters. 
#' 
#' @export clusterCMP
#' 
#' @references
#' ...
#' 
#' @examples
#' data(sdfbcl) 
#' apbcl <- convSDFtoAP(sdfbcl)
#' fpbcl <- convAPtoFP(apbcl)
#' clusters <- clusterCMP(db = apbcl, cutoff = c(0.5, 0.85))
#' clusters2 <- clusterCMP(fpbcl, cutoff = c(0.5, 0.7), method = "Tversky") 
#' # clusters <- clusterCMP(apbcl, cutoff = 0.65, save.distances = "distmat.rda")
#' # load("distmat.rda")
#' # clusters <- clusterCMP(apbcl, cutoff = 0.60, use.distances = distmat)
#' 
clusterCMP <- function (db, cutoff, is.similarity = TRUE, save.distances = FALSE,
                        use.distances = NULL, quiet = FALSE, ...) {
  
  cluster_id = ChemmineR::cmp.cluster(db = db, cutoff = cutoff, is.similarity = is.similarity, 
                                        save.distances = save.distances, use.distances = use.distances,
                                      quiet = quiet, ...)
  return(cluster_id)
}

Any scripts or data that you put into this service are public.

BioMedR documentation built on July 5, 2019, 9:03 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

BioMedR
Generating Various Molecular Representations for Chemicals, Proteins, DNAs, RNAs and Their Interactions

R/1002-clusterCMP.R
In BioMedR: Generating Various Molecular Representations for Chemicals, Proteins, DNAs, RNAs and Their Interactions

Try the BioMedR package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

BioMedR Generating Various Molecular Representations for Chemicals, Proteins, DNAs, RNAs and Their Interactions

R/1002-clusterCMP.R In BioMedR: Generating Various Molecular Representations for Chemicals, Proteins, DNAs, RNAs and Their Interactions

Try the BioMedR package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

BioMedR
Generating Various Molecular Representations for Chemicals, Proteins, DNAs, RNAs and Their Interactions

R/1002-clusterCMP.R
In BioMedR: Generating Various Molecular Representations for Chemicals, Proteins, DNAs, RNAs and Their Interactions