The Distance function calculates the distances between the data objects. The included distance measures are euclidean for continuous data and the tanimoto coefficient or jaccard index for binary data.
Data
A data matrix. The rows are assumed to correspond to the objects.
distmeasure
Choice of metric for the dissimilarity matrix (character). Should be one of "tanimoto", "euclidean", "jaccard", "hamming", "cont tanimoto", "MCA_coord", "gower", "chi.squared" or "cosine".
normalize
Logical. Indicates whether to normalize the distance matrices or not. Default is FALSE. Normalization is recommended if different distance types are used. More details on normalization in
method
A method of normalization. Should be one of "Quantile", "Fisher-Yates", "standardize", "Range" or any of the first letters of these names. Default is NULL.
The euclidean distance is included for continuous matrices, while for binary matrices one can choose between the jaccard index, the tanimoto coefficient and the hamming distance. The hamming distance is obtained by applying the hamming.distance function of the e1071 package, which computes the hamming distance between the rows of the data matrix. The hamming distance counts the number of positions at which two rows differ in their zero and one values. The Jaccard index is calculated according to the formula of the dist.binary function in the ade4 package, and the tanimoto coefficient as described by Li2011. For both, the similarity is first calculated as
s = \frac{n_{11}}{n_{11} + n_{10} + n_{01}}
with n_{11} the number of features the two objects have in common, n_{10} the number of features present only in the first object and n_{01} the number of features present only in the second object. These similarities are converted to distances by
J = \sqrt{1 - s}
for the jaccard index and by
T = 1 - s
for the tanimoto coefficient. The higher the similarity value s, the more features are shared between the two objects and the more alike they are. Since clustering is based on dissimilarity, the similarities are converted to distances. If normalize=TRUE and the distance measure is euclidean, the data matrix is normalized beforehand. Further, a version of the tanimoto coefficient is also available for continuous data.
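The binary measures described above can be sketched in base R. This is a minimal illustration of the formulas, not the package's own code: the helper name binary_dist, the toy matrix and the treatment of two all-zero rows are assumptions made for the example.

```r
# Sketch only: base R illustration of the formulas above.
# `binary_dist` is a hypothetical helper, not part of any package.
binary_dist <- function(x, type = c("tanimoto", "jaccard", "hamming")) {
  type <- match.arg(type)
  x <- as.matrix(x)
  n11 <- x %*% t(x)        # features present in both rows
  n10 <- x %*% t(1 - x)    # features present only in the first row
  n01 <- (1 - x) %*% t(x)  # features present only in the second row
  if (type == "hamming") {
    d <- n10 + n01         # number of positions where the rows differ
  } else {
    s <- n11 / (n11 + n10 + n01)
    s[is.nan(s)] <- 1      # assumption: two all-zero rows count as identical
    d <- if (type == "jaccard") sqrt(1 - s) else 1 - s
  }
  dimnames(d) <- list(rownames(x), rownames(x))
  d
}

# Toy binary data: rows are objects, columns are features
toy <- rbind(a = c(1, 1, 0, 0),
             b = c(1, 0, 1, 0),
             c = c(1, 1, 0, 0))

binary_dist(toy, "tanimoto")
binary_dist(toy, "jaccard")
binary_dist(toy, "hamming")
```

Rows a and b share one feature and each has one feature the other lacks, so s = 1/3, giving T = 2/3 and J = sqrt(2/3). Rows a and c are identical, so every distance between them is 0.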
The returned value is a distance matrix.
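# Load the example binary fingerprint matrix
data(fingerprintMat)
# Tanimoto distances between the rows, without normalization
Dist_F <- Distance(fingerprintMat, distmeasure = "tanimoto", normalize = FALSE, method = NULL)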
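Since the returned value is a distance matrix, it can be passed on to a clustering routine. The sketch below shows one possible follow-up using only base R. The toy matrix stands in for real data, and stats::dist with method = "binary" is used as a stand-in that yields 1 - s for binary rows; neither is taken from the function documented here.

```r
# Sketch only: toy data and base R functions as a stand-in
# for a distance matrix such as the one returned above.
toy <- rbind(a = c(1, 1, 0, 0),
             b = c(1, 1, 1, 0),
             c = c(0, 0, 1, 1),
             d = c(0, 0, 0, 1))

# Square distance matrix with objects in rows and columns
dmat <- as.matrix(dist(toy, method = "binary"))

# hclust expects a "dist" object, so convert the square matrix back
hc <- hclust(as.dist(dmat), method = "average")

# Cut the tree into two clusters
groups <- cutree(hc, k = 2)
groups
```

With this toy input, a and b end up in one cluster and c and d in the other, because the within-pair distances (1/3 and 1/2) are smaller than any between-pair distance.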