M_ABC: Multi-source ABC clustering
In IntClust: Integration of Multiple Data Sets with Clustering Techniques


The Aggregating Bundles of Clusters method (ABC; Amaratunga et al., 2008) was originally developed for a single gene expression data set. We extend this method to incorporate multiple data sets of any source. Multi-Source ABC (M-ABC) is an iterative algorithm in which, in each iteration, a random sample of objects and a random sample of features are drawn from each data set. A clustering algorithm is run on each subset, and an incidence matrix $C$ is set up by cutting the resulting dendrogram into $k$ clusters, with $C_{ij} = 1$ if objects $i$ and $j$ fall in the same cluster and $0$ otherwise. After $r$ iterations, all incidence matrices are summed and divided, element-wise, by the number of times the two objects were selected simultaneously. This similarity is then transformed into a dissimilarity that expresses the proportion of joint selections in which the two objects were not clustered together. The resulting matrix is used as input to a clustering algorithm.
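
The iteration described above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the IntClust implementation: `mabc_sketch` is a hypothetical name, the rows of every matrix are assumed to be the same objects in the same order, one subsample of objects is assumed to be shared by all data sets within an iteration, features are drawn uniformly, and Euclidean distance with Ward linkage (`hclust(..., method = "ward.D2")`) stands in for the per-data-set `distmeasure` and `linkage` choices.

```r
# Hypothetical sketch of M-ABC (NOT IntClust::M_ABC).
# Assumptions: same objects in the same row order in every matrix, one shared
# object subsample per iteration, uniform feature sampling, and Euclidean
# distance + Ward linkage instead of the package's distmeasure/linkage options.
mabc_sketch <- function(mats, k, numsim = 200, numvar = 5, frac = 0.8,
                        seed = 1) {
  set.seed(seed)
  n <- nrow(mats[[1]])
  together <- matrix(0, n, n)  # running sum of incidence matrices C
  cosel <- matrix(0, n, n)     # how often i and j were drawn together
  for (r in seq_len(numsim)) {
    objs <- sample(n, size = ceiling(frac * n))   # bag the objects
    for (X in mats) {
      feats <- sample(ncol(X), size = min(numvar, ncol(X)))  # sample features
      sub <- X[objs, feats, drop = FALSE]
      cl <- cutree(hclust(dist(sub), method = "ward.D2"), k = k)
      C <- outer(cl, cl, "==") * 1                # 1 if same cluster, else 0
      together[objs, objs] <- together[objs, objs] + C
      cosel[objs, objs] <- cosel[objs, objs] + 1  # one count per data set
    }
  }
  # Share of joint selections in which i and j were NOT clustered together;
  # pairs never drawn together get dissimilarity 1.
  D <- 1 - together / pmax(cosel, 1)
  diag(D) <- 0
  list(DistM = D,
       Clust = cutree(hclust(as.dist(D), method = "ward.D2"), k = k))
}

# Usage on two simulated data sets sharing a two-group structure.
set.seed(42)
grp <- rep(1:2, each = 10)
A <- matrix(rnorm(20 * 30), 20) + 3 * grp   # 20 objects x 30 features
B <- matrix(rnorm(20 * 15), 20) - 3 * grp   # 20 objects x 15 features
res <- mabc_sketch(list(A, B), k = 2)
table(res$Clust, grp)
```

Counting co-selections once per data set keeps every entry of `DistM` in [0, 1]: it is the share of (iteration, data set) draws containing both objects in which they ended up in different clusters.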

M_ABC(List, transpose = TRUE, distmeasure = c("tanimoto", "tanimoto"),
  weighting = c(FALSE, FALSE), stat = "var", normalize = c(FALSE, FALSE),
  method = c(NULL, NULL), gr = c(), bag = TRUE, numsim = 1000,
  numvar = c(100, 100), linkage = c("flexible", "flexible"),
  alpha = 0.625, NC = NULL, NC2 = NULL, mds = FALSE)

`List`	A list of data matrices. The rows are assumed to correspond to the objects.
`transpose`	Logical, whether the data should be transposed to the original ABC format of rows being the variables and columns the samples. Defaults to TRUE.
`distmeasure`	A vector of the distance measures to be used on each data matrix. Should be one of "tanimoto", "euclidean", "jaccard", "hamming". Defaults to c("tanimoto","tanimoto").
`weighting`	A logical vector indicating, for each data set, whether the rows should be weighted in the resampling. Defaults to c(FALSE,FALSE) for two data sets.
`stat`	The statistic used to weight the rows. Currently the F-statistic, coefficient of variation, double bump statistic and variance are allowed; the corresponding inputs are "F", "cv", "db" and "var". If the rows are to be weighted equally, any other string will do.
`normalize`	A logical vector indicating, for each data set, whether to normalize the distance matrix. Defaults to c(FALSE,FALSE). Normalization is recommended if different distance measures are used. See `Normalization` for details.
`method`	The normalization method for each data set. Should be one of "Quantile", "Fisher-Yates", "standardize", "Range", or the first letter of one of these names. Defaults to c(NULL,NULL) for two data sets.
`gr`	A prespecified grouping of the samples to be used in calculating the F-statistic if stat="F".
`bag`	Logical, indicating whether the columns should be bagged in each iteration. Defaults to TRUE.
`numsim`	The number of iterations to be used in the ABC Algorithm. Defaults to 1000.
`numvar`	The number of features used in each iteration to calculate the temporary clusters in the ABC algorithm. Defaults to c(100,100) for two data sets.
`linkage`	Choice of inter group dissimilarity (character) for each data set. Defaults to c("flexible", "flexible") for two data sets.
`alpha`	The parameter alpha used in the "flexible" linkage of the `agnes` function. Defaults to 0.625 and is only used if the linkage is set to "flexible".
`NC`	Expected number of clusters in the data; passed to Ward's method in each iteration.
`NC2`	Expected number of clusters in the data; passed to Ward's method in the final calculation of the clusters. Defaults to NC. If NC2="syl", silhouette widths are used to determine the most likely number of clusters.
`mds`	Logical, indicating whether the dissimilarities calculated in the ABC algorithm should be plotted using multidimensional scaling. Defaults to FALSE.
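
The `weighting` and `stat` arguments can be illustrated with a small sketch. It assumes, rather than documents, that weighting means drawing features with probability proportional to the chosen statistic. `feature_weights` and `sample_features` are hypothetical helpers, only "var" and "cv" are covered (any other string, including "F" and "db", falls back to equal weights here), and features are taken to be columns, that is, objects in rows as in `List`.

```r
# Hypothetical helper (assumption, not IntClust code): normalized per-feature
# weights for stat = "var" or "cv"; any other string gives equal weights.
feature_weights <- function(X, stat = "var") {
  w <- switch(stat,
              var = apply(X, 2, var),
              cv  = apply(X, 2, sd) / abs(colMeans(X)),  # Inf if a mean is 0
              rep(1, ncol(X)))                           # equal weighting
  w / sum(w)
}

# Draw numvar feature indices without replacement, favouring high weights.
# Zero-weight (e.g. constant) features are never drawn; sample() stops with
# an error if fewer than numvar features have positive weight.
sample_features <- function(X, numvar, stat = "var") {
  sample(ncol(X), size = min(numvar, ncol(X)), prob = feature_weights(X, stat))
}

# Usage: 5 high-variance features next to 45 nearly constant ones.
set.seed(7)
X <- cbind(matrix(rnorm(50 * 5, sd = 5), 50),
           matrix(rnorm(50 * 45, sd = 0.1), 50))
w <- feature_weights(X, "var")
sum(w[1:5])   # the high-variance features carry almost all the weight
s <- sample_features(X, numvar = 5)
s
```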

The returned value is a list of two elements:

`DistM`	The resulting distance matrix
`Clust`	The resulting clustering

The value has class 'Ensemble'.

Amaratunga, D., Cabrera, J., and Kovtun, V. (2008). Microarray learning with ABC. Biostatistics, 9(1), 128-136.

library(IntClust)  # provides M_ABC and the example data sets
data(fingerprintMat)
data(targetMat)
L <- list(fingerprintMat, targetMat)

MCF7_MABC <- M_ABC(List = L, transpose = TRUE,
                   distmeasure = c("tanimoto", "tanimoto"),
                   weighting = c(FALSE, FALSE), stat = "var",
                   normalize = c(FALSE, FALSE), method = c(NULL, NULL),
                   gr = c(), bag = TRUE, numsim = 1000, numvar = c(100, 100),
                   linkage = c("flexible", "flexible"), alpha = 0.625,
                   NC = 7, NC2 = NULL, mds = FALSE)