ABC.SingleInMultiple: Single-source ABC clustering


Description

The Aggregating Bundles of Clusters algorithm (ABC; Amaratunga et al., 2008) was originally developed for a single gene expression data set. ABC is an iterative algorithm: in each iteration, a random sample of objects and features is drawn from the data. A clustering algorithm is run on this subset, and an incidence matrix $C$ is set up by cutting the resulting dendrogram into $k$ clusters, with $C_{ij} = 1$ if objects $i$ and $j$ fall in the same cluster and $C_{ij} = 0$ otherwise. After $r$ iterations, all incidence matrices are summed, and each entry is divided by the number of times the two objects were selected together. This similarity is transformed into a dissimilarity expressing the proportion of times the objects are not clustered together when both are selected. The resulting matrix is used as input to a clustering algorithm.
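The resampling scheme above can be sketched in base R. Writing $C^{(t)}$ for the incidence matrix of iteration $t$ and $S_{ij}$ for the number of iterations in which objects $i$ and $j$ were both drawn, the sketch computes $D_{ij} = 1 - \sum_{t=1}^{r} C^{(t)}_{ij} / S_{ij}$. This is a minimal illustration, not the IntClust implementation: the helper name `abc_sketch`, the use of `stats::hclust` with Ward linkage on Euclidean distances, and the handling of pairs that are never drawn together are all assumptions.

```r
# Minimal sketch of the ABC resampling scheme; NOT IntClust's code.
# Assumptions: `x` has variables in rows and objects in columns (the original
# ABC layout), each iteration uses stats::hclust (Ward, Euclidean) cut into
# `k` clusters, and `abc_sketch` is a hypothetical helper name.
abc_sketch <- function(x, k, numsim = 100, numvar = 10, bag = TRUE) {
  n <- ncol(x)
  together <- matrix(0, n, n)  # running sum of the incidence matrices C
  selected <- matrix(0, n, n)  # number of times each pair was drawn together
  for (it in seq_len(numsim)) {
    # Draw objects: a bootstrap sample if bag = TRUE, otherwise all objects
    objs <- unique(sample(n, n, replace = bag))
    if (length(objs) <= k) next  # too few distinct objects to cut into k clusters
    # Draw a random subset of the features (rows)
    feats <- sample(nrow(x), min(numvar, nrow(x)))
    sub <- t(x[feats, objs, drop = FALSE])  # objects in rows for dist()
    cl <- cutree(hclust(dist(sub), method = "ward.D2"), k = k)
    inc <- outer(cl, cl, "==") * 1  # incidence matrix for this iteration
    together[objs, objs] <- together[objs, objs] + inc
    selected[objs, objs] <- selected[objs, objs] + 1
  }
  # Proportion of co-selections in which the pair was clustered together;
  # pairs never drawn together get similarity 0 (an assumption of this sketch)
  sim <- ifelse(selected > 0, together / selected, 0)
  dis <- 1 - sim
  diag(dis) <- 0
  as.dist(dis)
}

# Toy data: 20 variables, 10 objects forming two well-separated groups
set.seed(1)
x <- cbind(matrix(rnorm(100, mean = 0), 20, 5),
           matrix(rnorm(100, mean = 5), 20, 5))
d <- abc_sketch(x, k = 2, numsim = 50, numvar = 10)
final <- cutree(hclust(d, method = "ward.D2"), k = 2)  # final clustering step
```

Because each entry is a proportion of co-selections, the resulting dissimilarities always lie in $[0, 1]$, regardless of how often a given pair happens to be drawn.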

Usage

ABC.SingleInMultiple(data, transpose = TRUE, distmeasure = "euclidean",
  weighting = FALSE, stat = "var", normalize = FALSE, method = NULL,
  gr = c(), bag = TRUE, numsim = 1000, numvar = 100, linkage = "ward",
  alpha = 0.625, NC = NULL, NC2 = NULL, mds = FALSE)

Arguments

data

A data matrix. It is assumed that the rows correspond to the objects.

transpose

Logical, whether the data should be transposed to obtain the original ABC format, in which rows are the variables and columns are the samples. Defaults to TRUE.

distmeasure

The distance measure to be used for the data matrix. Should be one of "tanimoto", "euclidean", "jaccard", "hamming". Defaults to "euclidean".

weighting

Logical value indicating whether the rows should be weighted in the resampling. Defaults to FALSE.

stat

The statistic used to weight the rows. Currently the coefficient of variation and the variance are supported; the corresponding inputs are "cv" and "var". If the rows are to be weighted equally, any other string will do. Defaults to "var".
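One way to turn such a statistic into row weights for the resampling is sketched below. The helper `row_weights` is hypothetical, and both the choice of sampling probabilities proportional to the statistic and the use of the absolute mean in the coefficient of variation are assumptions, not taken from IntClust.

```r
# Hypothetical helper, not IntClust's code: turn a per-row statistic into
# sampling probabilities for the rows (variables). Weights proportional to
# the statistic, and |mean| in the CV, are assumptions of this sketch.
row_weights <- function(x, stat = "var") {
  s <- switch(stat,
    var = apply(x, 1, var),
    cv  = apply(x, 1, function(r) sd(r) / abs(mean(r))),
    rep(1, nrow(x)))  # any other string: equal weights
  s / sum(s)
}

x <- rbind(c(1, 1, 1, 1),    # constant row: variance 0
           c(0, 10, 0, 10),  # high variance
           c(1, 2, 3, 4))    # low variance
w <- row_weights(x, "var")
set.seed(1)
feats <- sample(nrow(x), size = 2, prob = w)  # weighted draw of 2 rows
```

With these weights a constant row is never drawn, while high-variance rows are drawn most often.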

normalize

Logical. Indicates whether to normalize the distance matrices; default is FALSE. This is recommended if different distance types are used. See Normalization for more details.

method

A method of normalization. Should be one of "Quantile", "Fisher-Yates", "standardize", "Range" or any of the first letters of these names. Default is NULL.
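As a rough illustration of two of these options, the sketch below applies the usual range and z-score scalings to a `dist` object. The helper `normalize_dist` and its formulas are assumptions, not the code of IntClust's Normalization function, and the "Quantile" and "Fisher-Yates" methods are not covered. `match.arg` provides the first-letter matching mentioned above.

```r
# Hypothetical sketch of two of the listed normalizations for a "dist"
# object; the formulas are the usual range and z-score scalings and are
# assumptions, not IntClust's Normalization code.
normalize_dist <- function(d, method = c("Range", "standardize")) {
  method <- match.arg(method)  # partial matching, e.g. "R" -> "Range"
  v <- as.vector(d)
  if (method == "Range") {
    (d - min(v)) / (max(v) - min(v))  # rescale to [0, 1]; keeps the dist class
  } else {
    (d - mean(v)) / sd(v)             # zero mean, unit standard deviation
  }
}

d <- dist(c(0, 1, 3))   # pairwise distances 1, 3, 2
normalize_dist(d, "R")  # values 0, 1, 0.5
```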

gr

A prespecified grouping of the samples to be used in calculating the F-statistic if stat="F".

bag

Logical, indicating whether the columns should be bagged in each iteration. Defaults to TRUE.

numsim

The number of iterations to be used in the ABC Algorithm. Default is 1000.

numvar

The number of features to be used at each iteration to calculate the temporary clusters in the ABC algorithm. Defaults to 100.

linkage

The inter-group dissimilarity (linkage method), as a character string. Defaults to "ward".

alpha

The parameter alpha to be used in the "flexible" linkage of the agnes function. Defaults to 0.625 and is only used if the linkage is set to "flexible".
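For reference, `cluster::agnes` with `method = "flexible"` and a single `par.method` value alpha uses the Lance-Williams coefficients alpha1 = alpha2 = alpha, beta = 1 - 2 * alpha and gamma = 0. The default alpha = 0.625 therefore corresponds to beta = -0.25. The sketch below calls `agnes` directly on toy data. How IntClust passes `alpha` to `agnes` internally is an assumption here.

```r
# Sketch of flexible linkage with cluster::agnes on toy data. A single
# par.method value alpha gives alpha1 = alpha2 = alpha, beta = 1 - 2 * alpha,
# gamma = 0, so alpha = 0.625 means beta = -0.25. How IntClust forwards
# `alpha` is assumed, not taken from its source.
library(cluster)

set.seed(1)
x <- rbind(matrix(rnorm(10, mean = 0), 5, 2),   # group 1: 5 points in 2D
           matrix(rnorm(10, mean = 5), 5, 2))   # group 2: 5 points in 2D
fit <- agnes(dist(x), diss = TRUE, method = "flexible", par.method = 0.625)
cl <- cutree(as.hclust(fit), k = 2)  # cut the dendrogram into 2 clusters
```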

NC

Expected number of clusters in the data; passed to Ward's method in each iteration. Default is NULL.

NC2

Expected number of clusters in the data; passed to Ward's method in the final calculation of the clusters. Defaults to NULL, in which case NC2=NC. If NC2="syl", the silhouette is used to determine the most likely number of clusters.
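A common way to pick the number of clusters with silhouettes is to cut the dendrogram at several values of $k$ and keep the one with the largest average silhouette width. The sketch below does this with `cluster::silhouette`. It is one plausible reading of NC2="syl", and the helper name `choose_k_silhouette`, the candidate range `2:kmax` and the Ward linkage are assumptions.

```r
# Sketch: choose k by maximal average silhouette width. One plausible reading
# of NC2 = "syl"; the helper name, the range 2:kmax and Ward linkage are
# assumptions, not IntClust's code.
library(cluster)

choose_k_silhouette <- function(d, kmax = 6) {
  hc <- hclust(d, method = "ward.D2")
  ks <- 2:kmax
  avg <- sapply(ks, function(k) {
    s <- silhouette(cutree(hc, k = k), d)
    mean(s[, "sil_width"])  # average silhouette width for this k
  })
  ks[which.max(avg)]
}

# Toy data: three well-separated groups of 5 points in 2D
set.seed(1)
centers <- c(0, 10, 20)
x <- do.call(rbind, lapply(centers, function(m) matrix(rnorm(10, mean = m), 5, 2)))
k_best <- choose_k_silhouette(dist(x))
```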

mds

Logical, indicating whether the dissimilarities calculated in the ABC algorithm should be plotted using multidimensional scaling (MDS). Defaults to FALSE.
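A minimal way to visualize a dissimilarity matrix with MDS in base R is classical (Torgerson) scaling via `stats::cmdscale`. Whether IntClust uses `cmdscale` for mds=TRUE is an assumption, and `mds_plot` is a hypothetical helper.

```r
# Sketch of an MDS view of a dissimilarity matrix with classical scaling
# (stats::cmdscale). Using cmdscale for mds = TRUE is an assumption;
# `mds_plot` is a hypothetical helper.
mds_plot <- function(d, labels = NULL) {
  coords <- cmdscale(d, k = 2)  # n x 2 matrix of coordinates
  plot(coords, xlab = "Coordinate 1", ylab = "Coordinate 2", pch = 19)
  if (!is.null(labels)) text(coords, labels = labels, pos = 3)
  invisible(coords)
}

set.seed(1)
x <- matrix(rnorm(20), 10, 2)  # 10 objects in 2D
d <- dist(x)
coords <- mds_plot(d, labels = 1:10)
```

For Euclidean distances computed from two-dimensional data, as here, the two MDS coordinates reproduce the original distances exactly. For ABC dissimilarities the plot is only an approximation.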

Value

The returned value is a list of two elements:

DistM

The resulting distance matrix.

Clust

The resulting clustering

The value has class 'Ensemble'.

References

Amaratunga, D., Cabrera, J. and Kovtun, V. (2008). Microarray learning with ABC. Biostatistics, 9(1), 128-136.

Examples

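library(IntClust)
data(fingerprintMat)
data(targetMat)
L <- list(fingerprintMat, targetMat)  # not used by the single-source call below

# ABC on the fingerprint data only: Tanimoto distances,
# variance-weighted feature sampling and flexible linkage
MCF7_ABC <- ABC.SingleInMultiple(data = fingerprintMat, transpose = TRUE,
  distmeasure = "tanimoto", weighting = TRUE, stat = "var", gr = c(),
  bag = TRUE, numsim = 100, numvar = 100, linkage = "flexible",
  alpha = 0.625, NC = 7, NC2 = NULL, mds = FALSE)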

IntClust documentation built on May 2, 2019, 5:51 a.m.