SNF: Similarity network fusion


Description

Similarity Network Fusion (SNF) is a similarity-based multi-source clustering technique. SNF consists of two steps. In the initial step, a similarity network is set up for each data matrix. The network is the visualization of the similarity matrix as a weighted graph, with the objects as vertices and the pairwise similarities as weights on the edges. In the network-fusion step, each network is iteratively updated with information from the other networks, so the networks become more alike with every iteration. The process eventually converges to a single fused network.
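The two steps can be sketched in base R. This is a minimal illustration based on the formulas in Wang et al. (2014), not the IntClust implementation: the helper names affinity_matrix, full_kernel, knn_kernel and snf_sketch are hypothetical, and the input is assumed to be one Euclidean distance matrix per source.

```r
# Minimal SNF sketch in base R, following Wang et al. (2014).
# All function names are hypothetical; this is not IntClust's own code.

# Step 1: affinity matrix W from a distance matrix D (scaled Gaussian kernel).
affinity_matrix <- function(D, NN = 20, mu = 0.5) {
  # mean distance from each object to its NN nearest neighbours (self excluded)
  knn_mean <- apply(D, 1, function(d) mean(sort(d)[2:(NN + 1)]))
  eps <- (outer(knn_mean, knn_mean, "+") + D) / 3
  eps <- pmax(eps, .Machine$double.eps)  # guard against division by zero
  exp(-D^2 / (mu * eps))
}

# Full kernel P: diagonal 1/2, off-diagonal entries of each row sum to 1/2.
full_kernel <- function(W) {
  diag(W) <- 0
  P <- W / (2 * rowSums(W))  # divides row i by 2 * rowSums(W)[i]
  diag(P) <- 0.5
  P
}

# Sparse kernel S: keep each row's NN largest affinities (self included),
# then normalize that row to sum to 1.
knn_kernel <- function(W, NN = 20) {
  S <- matrix(0, nrow(W), ncol(W))
  for (i in seq_len(nrow(W))) {
    nb <- order(W[i, ], decreasing = TRUE)[seq_len(NN)]
    S[i, nb] <- W[i, nb] / sum(W[i, nb])
  }
  S
}

# Step 2: network fusion. Every P is updated from the previous iteration's
# average of the other networks, then all networks are averaged.
snf_sketch <- function(Ws, NN = 20, T = 20) {
  m <- length(Ws)
  P <- lapply(Ws, full_kernel)
  S <- lapply(Ws, knn_kernel, NN = NN)
  for (iter in seq_len(T)) {
    P <- lapply(seq_len(m), function(v) {
      others <- Reduce(`+`, P[-v]) / (m - 1)
      S[[v]] %*% others %*% t(S[[v]])
    })
  }
  Reduce(`+`, P) / m  # fused similarity matrix
}

# Usage on two toy sources describing the same 30 objects.
set.seed(1)
X1 <- matrix(rnorm(30 * 5), nrow = 30)
X2 <- matrix(rnorm(30 * 8), nrow = 30)
Ws <- lapply(list(X1, X2), function(X) affinity_matrix(as.matrix(dist(X)), NN = 10))
FusedM <- snf_sketch(Ws, NN = 10, T = 20)
DistM <- 1 - FusedM
```

Because the new list of networks is built inside lapply before P is reassigned, all networks in one iteration are updated from the same previous state, which is meant to mirror the parallel update described in the paper.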

Usage

SNF(List, type = c("data", "dist", "clusters"), distmeasure = c("tanimoto",
  "tanimoto"), normalize = c(FALSE, FALSE), method = c(NULL, NULL),
  StopRange = FALSE, NN = 20, mu = 0.5, T = 20, clust = "agnes",
  linkage = "ward", alpha = 0.625)

Arguments

List

A list of data matrices of the same type. The rows are assumed to correspond to the objects.

type

Indicates whether the matrices in "List" are data matrices, distance matrices or clustering results obtained from the data. If type="dist", the calculation of the distance matrices is skipped; if type="clusters", the single-source clustering is skipped. Should be one of "data", "dist" or "clusters".

distmeasure

A vector of the distance measures to be used on each data matrix. Each element should be one of "tanimoto", "euclidean", "jaccard" or "hamming". Defaults to c("tanimoto","tanimoto").
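For binary fingerprint data, the Tanimoto distance between two objects a and b is 1 - |a AND b| / |a OR b|. The base R sketch below computes it for every pair of rows; tanimoto_dist is a hypothetical helper, not the function IntClust uses, and it assumes that any non-zero entry means "feature present" and that two all-zero rows are identical.

```r
# Hypothetical helper: Tanimoto distance between the rows of a binary matrix.
# Assumptions: non-zero entries count as "present"; two all-zero rows are
# treated as identical (distance 0).
tanimoto_dist <- function(X) {
  B <- (as.matrix(X) != 0) * 1                      # 0/1 matrix, objects in rows
  n_shared <- B %*% t(B)                            # |a AND b| for each pair
  n_on <- rowSums(B)
  n_union <- outer(n_on, n_on, "+") - n_shared      # |a OR b|
  sim <- n_shared / n_union
  sim[n_union == 0] <- 1
  1 - sim
}

X <- rbind(c(1, 1, 0, 0),
           c(1, 0, 1, 0),
           c(0, 0, 0, 0))
D <- tanimoto_dist(X)
round(D, 3)  # D[1, 2] is 1 - 1/3 (one shared bit out of three set bits)
```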

normalize

Logical. Indicates whether to normalize each distance matrix. Defaults to c(FALSE, FALSE) for two data sets. Normalization is recommended if different distance measures are used. More details are given in Normalization.

method

The normalization method. Should be one of "Quantile", "Fisher-Yates", "standardize", "Range" or any of the first letters of these names. Defaults to c(NULL,NULL) for two data sets.

StopRange

Logical. Indicates whether range normalization should be skipped for distance matrices whose values do not lie between zero and one. If FALSE, range normalization is performed so that the values lie between zero and one (see Normalization). If TRUE, the distance matrices are not changed. Range normalization (StopRange=FALSE) is recommended if different types of data are used, so that the distance matrices are comparable. Default is FALSE.
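The StopRange logic can be sketched as follows, assuming that "range normalization" means linear min-max scaling of the whole distance matrix to [0, 1]. The helpers range_normalize and apply_stop_range are hypothetical and only illustrate the behaviour described above; IntClust's own normalization code may differ.

```r
# Hypothetical helper, assuming range normalization = min-max scaling to [0, 1].
range_normalize <- function(D) {
  r <- range(D)
  if (diff(r) == 0) return(D - r[1])  # constant matrix: map everything to 0
  (D - r[1]) / (r[2] - r[1])
}

# Mirrors the StopRange description: rescale only when values fall outside
# [0, 1] and StopRange is FALSE; otherwise return D unchanged.
apply_stop_range <- function(D, StopRange = FALSE) {
  if (!StopRange && (min(D) < 0 || max(D) > 1)) range_normalize(D) else D
}

D <- as.matrix(dist(c(0, 3, 10)))       # Euclidean distances 3, 10 and 7
N <- apply_stop_range(D)                # rescaled to 0.3, 1.0 and 0.7
U <- apply_stop_range(D, StopRange = TRUE)  # returned unchanged
```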

NN

The number of nearest neighbours used in the procedure. Defaults to 20.
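For reference, in Wang et al. (2014) the neighbourhood N_i of object i (its NN most similar objects, including i itself) defines a sparse, row-normalized kernel S built from the edge weights W. The formula follows the paper; that IntClust uses exactly this form is an assumption.

```latex
S(i,j) =
\begin{cases}
  \dfrac{W(i,j)}{\sum_{k \in N_i} W(i,k)}, & j \in N_i \\[1ex]
  0, & \text{otherwise}
\end{cases}
```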

mu

The hyperparameter mu of the Gaussian kernel used to construct the similarity networks; it scales the local bandwidth epsilon. Values between 0.3 and 0.8 are recommended. Defaults to 0.5.
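In Wang et al. (2014), mu enters the scaled exponential kernel that turns the distances rho(x_i, x_j) into edge weights W(i,j), while epsilon_{i,j} adapts the kernel width to the local density around x_i and x_j (N_i again denotes the nearest neighbours of x_i). The formula is taken from the paper and is assumed, not confirmed, to match IntClust.

```latex
W(i,j) = \exp\!\left(-\frac{\rho^2(x_i, x_j)}{\mu\,\varepsilon_{i,j}}\right),
\qquad
\varepsilon_{i,j} = \frac{\operatorname{mean}\big(\rho(x_i, N_i)\big) + \operatorname{mean}\big(\rho(x_j, N_j)\big) + \rho(x_i, x_j)}{3}
```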

T

The number of iterations of the network-fusion step. Defaults to 20.
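Following Wang et al. (2014), each of the T iterations applies the update below to every network v out of m, starting from the normalized full kernel P_0 of each network and using that network's sparse kernel S; after T iterations the networks are averaged into the fused similarity matrix. This is the paper's formulation; IntClust's internals are assumed to follow it.

```latex
P^{(v)}_0(i,j) =
\begin{cases}
  \dfrac{W^{(v)}(i,j)}{2\sum_{k \neq i} W^{(v)}(i,k)}, & j \neq i \\[1ex]
  \tfrac{1}{2}, & j = i
\end{cases}
\qquad
P^{(v)}_{t+1} = S^{(v)} \left(\frac{1}{m-1}\sum_{k \neq v} P^{(k)}_t\right) \left(S^{(v)}\right)^{\top},
\qquad
P = \frac{1}{m}\sum_{v=1}^{m} P^{(v)}_T
```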

clust

Choice of clustering function (character). Defaults to "agnes".

linkage

Choice of inter-group dissimilarity (character) for the final clustering. Defaults to "ward".

alpha

The parameter alpha used in the "flexible" linkage of the agnes function. Defaults to 0.625 and is only used if linkage is set to "flexible".
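The final clustering step can be illustrated with agnes from the cluster package directly. This sketch uses toy data and a dist object as a stand-in for the fused distance matrix; how IntClust calls agnes internally is not shown in this page and is not reproduced here. For a plain matrix such as DistM, pass diss = TRUE to agnes.

```r
library(cluster)  # provides agnes(); a recommended package shipped with R

set.seed(42)
# Toy data: two well-separated groups of 10 objects each.
X <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 10), ncol = 2))
d <- dist(X)  # stand-in for a distance matrix such as DistM

ward <- agnes(d, method = "ward")
# "flexible" linkage: a single par.method value alpha gives the Lance-Williams
# coefficients alpha_1 = alpha_2 = alpha, beta = 1 - 2 * alpha, gamma = 0,
# so alpha = 0.625 corresponds to beta = -0.25.
flex <- agnes(d, method = "flexible", par.method = 0.625)

groups <- cutree(as.hclust(ward), k = 2)
table(groups)
```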

Details

If r is specified and nrclusters is a fixed number, only a random sampling of the features will be performed for the t iterations (ADECa). If r is NULL and nrclusters is a sequence, the clustering is performed on all features and the dendrogram is divided into clusters for the values of nrclusters (ADECb). If r is specified and nrclusters is a sequence, the combination is performed (ADECc). After every iteration, whether by random sampling, multiple divisions of the dendrogram or both, an incidence matrix is set up. All incidence matrices are summed and represent the distance matrix on which a final clustering is performed.

Value

The returned value is a list with the following three elements.

FusedM

The fused similarity matrix

DistM

The distance matrix computed by subtracting FusedM from one

Clust

The resulting clustering

The value has class 'SNF'.

References

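Wang B, Mezlini AM, Demir F, Fiume M, Tu Z, Brudno M, Haibe-Kains B, Goldenberg A (2014). Similarity network fusion for aggregating data types on a genomic scale. Nature Methods, 11(3), 333-337.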

Examples

library(IntClust)
data(fingerprintMat)
data(targetMat)
L <- list(fingerprintMat, targetMat)
MCF7_SNF <- SNF(List = L, type = "data", distmeasure = c("tanimoto", "tanimoto"),
  normalize = c(FALSE, FALSE), method = c(NULL, NULL), StopRange = FALSE,
  NN = 10, mu = 0.5, T = 20, clust = "agnes", linkage = "ward", alpha = 0.625)


IntClust documentation built on May 2, 2019, 5:51 a.m.