distancematrix: functions to compute pairwise distances between vectors


View source: R/distance.R

Description

The function distancematrix is applied to a matrix of data to compute the pairwise distances between all rows of the matrix. In hopach versions >= 2.0.0 these distances are computed in C, rather than R, to improve run time performance. The function distancevector is applied to a matrix and a vector to compute the pairwise distances between each row of the matrix and the vector. Both functions allow different choices of distance metric. The functions dissmatrix and dissvector convert between a distance matrix and a vector holding its upper triangle. The function vectmatrix is used internally.

Usage

distancematrix(X, d, na.rm=TRUE)

distancevector(X, y, d, na.rm=TRUE)

dissmatrix(v)

dissvector(M)

vectmatrix(index, p)

Arguments

X

a numeric matrix. Missing values will be ignored if na.rm=TRUE.

y

a numeric vector, possibly a row of X. Missing values will be ignored if na.rm=TRUE.

na.rm

an indicator of whether or not to remove missing values. If na.rm=TRUE (default), then distances are computed over all pairwise non-missing values. Otherwise, missing values are propagated through the distance computation.

d

character string specifying the metric to be used for calculating dissimilarities between vectors. The currently available options are "cosangle" (cosine angle or uncentered correlation distance), "abscosangle" (absolute cosine angle or absolute uncentered correlation distance), "euclid" (Euclidean distance), "abseuclid" (absolute Euclidean distance), "cor" (correlation distance), and "abscor" (absolute correlation distance). Advanced users can write their own distance functions and add these.

M

a symmetric matrix of pairwise distances.

v

a vector of pairwise distances corresponding to the upper triangle of a distance matrix, stored by rows.

index

index in a distance vector, like that returned by dissvector.

p

number of elements, e.g. the number of rows in a distance matrix.

Details

In hopach versions <2.0.0, these functions returned the square root of the usual distance for d="cosangle", d="abscosangle", d="cor", and d="abscor". Typically, this transformation makes the dissimilarity correspond more closely with the norm. In order to agree with the dist function, the square root is no longer used in versions >=2.0.0.
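The difference between the two conventions can be illustrated in base R. This is a minimal sketch that assumes the "usual" cosine-angle distance is one minus the cosine of the angle between the vectors; the helper names cosangle_sketch and cosangle_sqrt_sketch are hypothetical and are not part of hopach.

```r
# Hypothetical helpers: textbook cosine-angle dissimilarity, not hopach's C code.
cosangle_sketch <- function(x, y) {
  1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

# The same quantity with the square-root transformation described above
# for versions < 2.0.0 (again an illustration, not the package source).
cosangle_sqrt_sketch <- function(x, y) sqrt(cosangle_sketch(x, y))

x <- c(1, 0)
y <- c(1, 1)                 # 45 degrees away from x
cosangle_sketch(x, y)        # 1 - cos(pi / 4), about 0.293
cosangle_sqrt_sketch(x, y)   # about 0.541
```

Because the two conventions differ only by a monotone transformation, the ordering of pairs by dissimilarity is the same under both; only the magnitudes change.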

Value

For distancematrix in versions >= 2.0.0, an hdist object of all pairwise distances between the rows of the data matrix 'X', i.e. the value of hdist[i,j] is the distance between rows 'i' and 'j' of 'X', as defined by 'd'. An hdist object is an instance of an S4 class containing four slots:

Data

representing the lower triangle of the symmetric distance matrix.

Size

the number of objects (i.e. rows of the data matrix).

Labels

labels for the objects, usually the numbers 1 to Size.

Call

the distance used in the call to distancematrix.

An hdist object can be converted to a matrix using as.matrix(hdist). (See hdist for more details.)

For distancevector, a vector of all pairwise distances between rows of 'X' and the vector 'y'. Entry 'j' is the distance between row 'j' of 'X' and the vector 'y'.

For dissmatrix, the corresponding distance matrix. For dissvector, the corresponding distance vector. If 'M' has 'p' rows (and columns), then 'v' has length 'p*(p-1)/2'.
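The relationship between 'M' and 'v' can be sketched in base R without calling hopach. The snippet below is only an illustration of the "upper triangle, stored by rows" layout and the 'p*(p-1)/2' length; it does not reproduce the package's own code, and the variable names are chosen for this example.

```r
p <- 4
# A small symmetric distance matrix built with stats::dist.
M <- as.matrix(dist(matrix(1:8, nrow = p)))

# Upper triangle of M read row by row: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
# lower.tri() walks the lower triangle column by column, so indexing t(M)
# with it visits the upper triangle of M row by row.
v <- t(M)[lower.tri(M)]
length(v) == p * (p - 1) / 2   # TRUE

# Rebuild the full symmetric matrix from the vector.
M2 <- matrix(0, p, p)
M2[lower.tri(M2)] <- v         # fills (2,1), (3,1), (4,1), (3,2), ...
M2 <- M2 + t(M2)               # mirror into the upper triangle; diagonal stays 0
```

For a symmetric matrix the upper triangle by rows and the lower triangle by columns contain the same values in the same order, which is why a single vector can serve both descriptions.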

For vectmatrix, the indices of the row and column of a distance matrix corresponding to entry index in the corresponding distance vector.
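The index mapping can be sketched in base R, assuming the row-wise upper-triangle layout described for 'v'. The function index_to_pair is a hypothetical stand-in written for this illustration; it is not the implementation of vectmatrix.

```r
# Hypothetical helper: map a position in the distance vector to (row, column),
# assuming the upper triangle is stored row by row.
index_to_pair <- function(index, p) {
  # Row i contributes (p - i) entries: (i, i+1), ..., (i, p).
  row_ends <- cumsum((p - 1):1)       # last vector index belonging to each row
  i <- which(index <= row_ends)[1]    # first row whose block contains index
  row_start <- if (i == 1) 0 else row_ends[i - 1]
  j <- i + (index - row_start)
  c(i, j)
}

index_to_pair(5, 10)   # 1 6
```

This agrees with the Examples, where entry 5 of the distance vector for a 10-row matrix corresponds to row 1 and column 6.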

Warning

The correlation and absolute correlation distance functions call the cor function, and will therefore fail if there are missing values in the data and na.rm!=TRUE.
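The effect of missing values on a correlation-based distance can be seen with base R's cor alone. This sketch assumes the correlation distance is one minus the correlation and shows one way to restrict the computation to pairwise non-missing values; how hopach handles this internally is not shown here.

```r
x <- c(1, 2, NA, 4, 5)
y <- c(2, 4, 6, NA, 11)

# With its default use = "everything", cor() propagates missing values.
cor(x, y)                      # NA

# Keep only positions where both vectors are observed, then compute
# the (assumed) correlation distance on those positions.
keep <- !is.na(x) & !is.na(y)
1 - cor(x[keep], y[keep])
```

Checking the data for missing values with anyNA before choosing na.rm avoids an unusable result from the "cor" and "abscor" metrics.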

Author(s)

Katherine S. Pollard <kpollard@gladstone.ucsf.edu> and Mark J. van der Laan <laan@stat.berkeley.edu>, with Greg Wall

References

van der Laan, M.J. and Pollard, K.S. A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap. Journal of Statistical Planning and Inference, 2003, 117, pp. 275-303.

http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/hopach.pdf

See Also

hopach, correlationordering, disscosangle

Examples

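library(hopach)  # per the example output below, this also loads cluster, which provides daisy()

mydata <- matrix(rnorm(50), nrow = 10)
deuclid <- distancematrix(mydata, d = "euclid")
# old method: vdeuclid <- dissvector(deuclid)
vdeuclid <- deuclid@Data
ddaisy <- daisy(mydata)
vdeuclid
ddaisy / sqrt(length(mydata[1, ]))

d1 <- distancematrix(mydata, d = "abscosangle")
d2 <- distancevector(mydata, mydata[1, ], d = "abscosangle")
d1[1, ]
d2  # compare with d1[1, ]; in the example output below, d2 equals sqrt(d1[1, ])

# old method: d3 <- dissvector(d1)
d3 <- d1@Data
pair <- vectmatrix(5, 10)  # row and column for entry 5 of a 10-object distance vector
d1[pair[1], pair[2]]
d3[5]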

Example output

Loading required package: cluster
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

 [1] 3.281778 2.471027 1.870812 3.743960 3.795807 1.871671 3.656548 3.194100
 [9] 2.756439 3.811809 1.674290 1.956087 3.466094 2.440560 3.864163 4.152205
[17] 1.431966 2.989327 4.363995 3.959219 2.425968 2.553072 2.966639 2.689146
[25] 2.552091 3.155350 1.424373 3.894268 3.928766 1.871861 2.920726 3.499385
[33] 4.309775 4.711746 2.586528 3.034624 5.214604 5.566075 3.651737 3.874957
[41] 3.824417 2.121169 1.536850 2.508481 2.881000
Dissimilarities :
           1         2         3         4         5         6         7
2  1.4676557                                                            
3  1.1050767 1.7046930                                                  
4  0.8366524 0.7487651 1.3368677                                        
5  1.6743500 0.8747886 1.9516380 1.1413297                              
6  1.6975365 1.5500842 1.7706164 1.4111153 1.3061882                    
7  0.8370366 1.0914516 1.0849260 0.6369988 1.5649727 1.3571251          
8  1.6352578 1.7281061 1.1417685 1.7415697 1.9273902 2.3320419 1.7329336
9  1.4284449 1.8569227 1.3267215 1.7569975 2.1071568 2.4892246 1.7103314
10 1.2327169 0.6403946 1.2026227 0.8371219 1.1567305 1.6331064 0.9486156
           8         9
2                     
3                     
4                     
5                     
6                     
7                     
8                     
9  0.6873002          
10 1.1218267 1.2884224

Metric :  euclidean 
Number of objects : 10
  1         2         3         4         5         6         7         8
1 0 0.5835462 0.6004769 0.4402926 0.8573037 0.9249425 0.4363926 0.9352266
          9       10
1 0.5804999 0.727046
 [1] 0.0000000 0.7639020 0.7749044 0.6635455 0.9259070 0.9617393 0.6606001
 [8] 0.9670711 0.7619054 0.8526699
          6
1 0.9249425
[1] 0.9249425

hopach documentation built on Nov. 8, 2020, 4:54 p.m.