M_ABC: Multi-source ABC clustering


Description

The Aggregating Bundles of Clusters method (ABC; Amaratunga et al., 2008) was originally developed for a single gene expression data set. We extend this method to incorporate multiple data sets from any source. Multi-Source ABC (M-ABC) is an iterative algorithm in which, at each iteration, a random sample of objects and features is taken from each data set. A clustering algorithm is run on each subset, and an incidence matrix $C$ is set up by cutting the resulting dendrogram into $k$ clusters. After $r$ iterations, all incidence matrices are summed and divided by the number of times two objects were selected simultaneously. This similarity value is transformed into a dissimilarity measure expressing the proportion of times the objects are not clustered together when both are selected. The resulting matrix is used as input to a final clustering algorithm.
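The resampling scheme described above can be sketched in base R for a single data set. This is only an illustration under stated assumptions, not the IntClust implementation: `abc_sketch()` and its arguments (`n_feat`, `frac_obj`) are hypothetical names, Euclidean distance with average linkage stands in for the package's distance and linkage options, and pairs that are never selected together are simply given dissimilarity 1.

```r
# Minimal single-data-set sketch of the ABC resampling idea (base R only).
# abc_sketch() and its argument names are hypothetical, not IntClust functions.
abc_sketch <- function(X, k, r = 50, n_feat = 2, frac_obj = 0.8) {
  n <- nrow(X)                    # rows are objects, columns are features
  together <- matrix(0, n, n)     # times i and j ended up in the same cluster
  selected <- matrix(0, n, n)     # times i and j were sampled together
  m <- min(n, max(k, floor(frac_obj * n)))
  for (iter in seq_len(r)) {
    obj  <- sample(n, m)                            # random subset of objects
    feat <- sample(ncol(X), min(n_feat, ncol(X)))   # random subset of features
    hc   <- hclust(dist(X[obj, feat, drop = FALSE]), method = "average")
    cl   <- cutree(hc, k = k)                       # cut dendrogram into k clusters
    C    <- outer(cl, cl, "==") * 1                 # incidence matrix of this run
    together[obj, obj] <- together[obj, obj] + C
    selected[obj, obj] <- selected[obj, obj] + 1
  }
  sim <- together / pmax(selected, 1)  # share of co-selections clustered together
  D <- 1 - sim                         # never co-selected pairs get dissimilarity 1
  diag(D) <- 0
  D
}

# Usage on two well-separated simulated groups of 10 objects each
set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 0), 10), matrix(rnorm(40, mean = 10), 10))
D <- abc_sketch(X, k = 2)
final <- cutree(hclust(as.dist(D), method = "average"), k = 2)
```

The final step mirrors the description: the aggregated dissimilarity matrix, rather than the raw data, is what gets clustered at the end.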

Usage

M_ABC(List, transpose = TRUE, distmeasure = c("tanimoto", "tanimoto"),
  weighting = c(FALSE, FALSE), stat = "var", normalize = c(FALSE, FALSE),
  method = c(NULL, NULL), gr = c(), bag = TRUE, numsim = 1000,
  numvar = c(100, 100), linkage = c("flexible", "flexible"),
  alpha = 0.625, NC = NULL, NC2 = NULL, mds = FALSE)

Arguments

List

A list of data matrices. The rows are assumed to correspond to the objects.

transpose

Logical, whether the data should be transposed to match the original ABC format, in which rows are the variables and columns are the samples. Defaults to TRUE.

distmeasure

A vector of the distance measures to be used, one per data matrix. Each should be one of "tanimoto", "euclidean", "jaccard", "hamming". Defaults to c("tanimoto", "tanimoto").

weighting

Logical vector indicating, for each data set, whether the rows should be weighted in the resampling. Default is c(FALSE, FALSE) for two data sets.

stat

The statistic to be used in weighting the rows. Currently the F-statistic, Coefficient of Variation, Double Bump statistic, and Variance are allowed. The corresponding inputs are "F", "cv", "db", and "var". If the rows are to be weighted equally, any other string will do.

normalize

Logical vector indicating, for each data set, whether to normalize the distance matrix. Default is c(FALSE, FALSE) for two data sets. Normalization is recommended if different distance types are used. See Normalization for details.

method

A method of normalization. Should be one of "Quantile", "Fisher-Yates", "standardize", "Range" or any of the first letters of these names. Default is c(NULL, NULL) for two data sets.

gr

A prespecified grouping of the samples to be used in calculating the F-statistic if stat="F".

bag

Logical, indicating whether the columns should be bagged in each iteration. Defaults to TRUE.

numsim

The number of iterations to be used in the ABC Algorithm. Defaults to 1000.

numvar

The number of features to be used at each iteration to calculate the temporary clusters in the ABC Algorithm. Default is c(100, 100) for two data sets.

linkage

Choice of inter-group dissimilarity (character) for each data set. Defaults to c("flexible", "flexible") for two data sets.

alpha

The parameter alpha to be used in the "flexible" linkage of the agnes function. Defaults to 0.625 and is only used if the linkage is set to "flexible".

NC

Expected number of clusters in the data; passed to Ward's method in each iteration.

NC2

Expected number of clusters in the data; passed to Ward's method in the final calculation of the clusters. By default set to NC. If NC2="syl", a silhouette will be used to determine the most likely number of clusters.

mds

Logical, indicating whether the dissimilarities calculated in the ABC Algorithm should be plotted using multidimensional scaling. Defaults to FALSE.
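To make the roles of distmeasure, linkage, alpha and NC more concrete, the following sketch clusters a small binary matrix with a Tanimoto distance and the "flexible" linkage of agnes from the cluster package. It is an illustration under assumptions rather than the code M_ABC runs internally: `tanimoto_dist()` is a hypothetical helper, the toy matrix `B` is made up, and treating two all-zero rows as identical is a choice made here for the sketch.

```r
library(cluster)  # provides agnes()

# Tanimoto distance between the rows of a 0/1 matrix.
# tanimoto_dist() is a hypothetical helper, not an IntClust function.
tanimoto_dist <- function(B) {
  both   <- B %*% t(B)                                  # features set in both rows
  either <- outer(rowSums(B), rowSums(B), "+") - both   # features set in at least one row
  sim    <- ifelse(either > 0, both / either, 1)        # two empty rows count as identical
  as.dist(1 - sim)
}

# Toy fingerprint-like matrix: rows are objects, columns are binary features
B <- rbind(a = c(1, 1, 0, 0, 1, 0),
           b = c(1, 1, 0, 0, 0, 0),
           c = c(0, 0, 1, 1, 0, 1),
           d = c(0, 0, 1, 1, 0, 0))
d <- tanimoto_dist(B)

# Flexible linkage with alpha = 0.625, then cut the tree into 2 clusters
hc <- agnes(d, diss = TRUE, method = "flexible", par.method = 0.625)
cl <- cutree(as.hclust(hc), k = 2)
```

With a single value in `par.method`, agnes uses that value for both Lance-Williams alpha coefficients and sets beta to 1 - 2 * alpha, which is how one number can define the linkage.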

Value

The returned value is a list of two elements:

DistM

The resulting distance matrix

Clust

The resulting clustering

The value has class 'Ensemble'.
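A returned dissimilarity matrix of this kind can be explored further, for example by picking a number of clusters with the average silhouette width or by computing multidimensional scaling coordinates. The sketch below uses a simulated stand-in matrix `D` in place of DistM, with average-linkage `hclust` as an assumed clustering step; it does not reproduce how M_ABC handles NC2 = "syl" or mds = TRUE internally.

```r
library(cluster)  # provides silhouette()

# Stand-in dissimilarity matrix; with IntClust this role is played by DistM.
set.seed(42)
pts <- rbind(matrix(rnorm(20, mean = 0), 10),
             matrix(rnorm(20, mean = 10), 10),
             matrix(rnorm(20, mean = 20), 10))
D <- as.matrix(dist(pts))

# Average silhouette width for each candidate number of clusters
hc <- hclust(as.dist(D), method = "average")
ks <- 2:6
avg_width <- sapply(ks, function(k) {
  sil <- silhouette(cutree(hc, k = k), as.dist(D))
  mean(sil[, "sil_width"])
})
best_k <- ks[which.max(avg_width)]

# Two-dimensional classical MDS coordinates of the objects
coords <- cmdscale(as.dist(D), k = 2)
```

Calling `plot(coords, col = cutree(hc, k = best_k))` afterwards gives a quick visual check of whether the chosen clusters are separated in the MDS plane.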

References

Amaratunga, D., Cabrera, J. and Kovtun, V. (2008). Microarray learning with ABC. Biostatistics, 9(1), 128-136.

Examples

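library(IntClust)

# Example data sets shipped with the package
data(fingerprintMat)
data(targetMat)
L <- list(fingerprintMat, targetMat)

# Multi-source ABC on both data sets, expecting 7 clusters
MCF7_MABC <- M_ABC(List = L, transpose = TRUE,
  distmeasure = c("tanimoto", "tanimoto"), weighting = c(FALSE, FALSE),
  stat = "var", normalize = c(FALSE, FALSE), method = c(NULL, NULL),
  gr = c(), bag = TRUE, numsim = 1000, numvar = c(100, 100),
  linkage = c("flexible", "flexible"), alpha = 0.625, NC = 7, NC2 = NULL,
  mds = FALSE)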

IntClust documentation built on May 2, 2019, 5:51 a.m.