Given a matrix X, these functions compute the pair-wise distances
between all variables (rows) in X, across all observations
(columns) of X. Each function uses a different distance metric,
i.e. a different definition of what it means for two variables to be
similar. In hopach versions >=2.0.0, these functions return an object
of class hdist rather than a matrix.
disscosangle(X, na.rm = TRUE)
disseuclid(X, na.rm = TRUE)
disscor(X, na.rm = TRUE)
dissabscosangle(X, na.rm = TRUE)
dissabscor(X, na.rm = TRUE)
vdisscosangle(X, y, na.rm = TRUE)
vdisseuclid(X, y, na.rm = TRUE)
vdisscor(X, y, na.rm = TRUE)
vdissabscosangle(X, y, na.rm = TRUE)
vdissabseuclid(X, y, na.rm = TRUE)
vdissabscor(X, y, na.rm = TRUE)
X |
A numeric data matrix. Each column corresponds to an observation, and each row corresponds to a variable. In the gene expression context, observations are arrays and variables are genes. All values must be numeric. Missing values are ignored. |
na.rm |
Indicator of whether to remove missing values (i.e. only compute distance over non-missing observations). |
y |
A numeric data vector of length |
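As a rough illustration of what na.rm = TRUE means for a single pair of variables, the base R sketch below restricts the computation to observations where both values are present. The vectors x and y, and the use of 1 - cor as the distance, are assumptions made for illustration; this is not hopach's internal code, and hopach may rescale results differently when values are missing.

```r
# Two hypothetical variables measured on five observations, with gaps.
x <- c(1, 2, NA, 4, 5)
y <- c(2, 4, 6, NA, 10)

# Keep only observations where both variables are non-missing.
keep <- !is.na(x) & !is.na(y)

# Correlation distance (assumed here to be 1 - correlation) on the kept pairs.
d_cor <- 1 - cor(x[keep], y[keep])

# Base R offers the same restriction through the `use` argument of cor().
d_cor_builtin <- 1 - cor(x, y, use = "pairwise.complete.obs")
```

Without the restriction, cor(x, y) would return NA for this pair.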
Different choices of distance metric are discussed in the references.
Briefly, Euclidean distance (disseuclid) defines two variables
to be close if they are similar in magnitude across observations.
Correlation distance (disscor), in contrast, defines similarity
to mean having the same pattern, but not necessarily the same magnitude.
Cosine-angle distance (disscosangle) is a correlation distance
that also accounts for magnitude. Cosine-angle distance is also known as
uncentered correlation distance. The distance metrics with 'abs' in
their names are absolute versions of each metric; the absolute value is
applied to the data before computing the distance.
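The contrast between the three basic metrics can be seen on a small example. The sketch below uses textbook definitions written in base R (Euclidean norm of the difference, one minus the Pearson correlation, and one minus the uncentered correlation). The helper names euclid, cordist and cosangle are hypothetical, and hopach's own functions may scale their results differently.

```r
# Textbook definitions in base R; illustrative only, not hopach's code.
euclid   <- function(a, b) sqrt(sum((a - b)^2))
cordist  <- function(a, b) 1 - cor(a, b)
cosangle <- function(a, b) 1 - sum(a * b) / sqrt(sum(a^2) * sum(b^2))

x       <- c(1, 2, 3, 4, 5)
scaled  <- 2 * x    # same pattern, larger magnitude
shifted <- x + 10   # same pattern, constant offset

d_scaled <- c(euclid   = euclid(x, scaled),
              cor      = cordist(x, scaled),
              cosangle = cosangle(x, scaled))

d_shifted <- c(euclid   = euclid(x, shifted),
               cor      = cordist(x, shifted),
               cosangle = cosangle(x, shifted))

print(round(d_scaled, 4))
print(round(d_shifted, 4))
```

Under these definitions, correlation distance is essentially zero for both the scaled and the shifted copy, Euclidean distance is large for both, and cosine-angle distance is zero for the scaled copy but positive for the shifted one because it does not center the data.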
In hopach versions <2.0.0, these functions returned the square root of
the usual distance for d="cosangle", d="abscosangle", d="cor", and
d="abscor". Typically, this transformation makes the dissimilarity
correspond more closely with the norm. In order to agree with the dist
function, the square root is no longer used in versions >=2.0.0.
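If results on the older square-root scale are needed for comparison, the transformation can be applied by hand. The sketch below assumes, as the paragraph above states, that the pre-2.0.0 values are simply the square root of the current ones. It uses 1 - cor computed in base R as a stand-in for the correlation distance rather than calling hopach, so the exact values are illustrative only.

```r
set.seed(1)
X <- matrix(rnorm(50), nrow = 5)   # 5 variables (rows), 10 observations

# Stand-in for the current (>= 2.0.0) correlation distance between rows.
d_new <- 1 - cor(t(X))

# Square-root scale, as described for versions < 2.0.0.
# pmax() guards against tiny negative values caused by rounding.
d_old <- sqrt(pmax(d_new, 0))
```

Squaring d_old recovers d_new up to rounding, so the two scales carry the same ordering of pairs.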
For versions >= 2.0.0 distancematrix, a hdist object of all pair-wise
distances between the rows of the data matrix 'X', i.e. the value of
hdist[i,j] is the distance between rows 'i' and 'j' of 'X', as defined
by 'd'. A hdist object is an S4 class containing four slots:
Data |
representing the lower triangle of the symmetric distance matrix. |
Size |
the number of objects (i.e. rows of the data matrix). |
Labels |
labels for the objects, usually the numbers 1 to Size. |
Call |
the distance used in the call to
|
A hdist object can be converted to a matrix using as.matrix(hdist).
(See hdist for more details.)
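The idea of storing only the lower triangle can be illustrated with base R's dist class, which uses a similar layout. This is an analogy under the assumption that hdist behaves comparably; the exact storage order and accessors of hdist should be checked in its own documentation.

```r
set.seed(1)
X <- matrix(rnorm(50), nrow = 5)   # 5 variables (rows), 10 observations

d <- dist(X)            # stats::dist: Euclidean distances between rows
n <- attr(d, "Size")    # number of objects, here the rows of X

# Only the n * (n - 1) / 2 entries below the diagonal are stored.
n_stored <- length(d)

# Expanding to a full matrix restores the symmetric n x n layout.
m <- as.matrix(d)
```

For n objects this stores n(n-1)/2 values instead of n^2, which is the saving a lower-triangle representation provides.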
For the vector versions (e.g. vdisscosangle), a numeric vector of
nrow(X) pair-wise distances between each variable (row) in X and the
vector y.
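The shape of the vector result can be sketched in base R. The example assumes that y holds one value per column of X, and it uses a plain Euclidean norm rather than hopach's vdisseuclid, whose scaling may differ.

```r
set.seed(1)
X <- matrix(rnorm(50), nrow = 5)   # 5 variables (rows), 10 observations
y <- X[2, ]                        # assumption: one value per column of X

# One distance per row of X, each measured against y.
d_vec <- apply(X, 1, function(x) sqrt(sum((x - y)^2)))
```

The result has nrow(X) entries, and the entry for row 2 is zero because y was taken from that row.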
Katherine S. Pollard <kpollard@gladstone.ucsf.edu> and Mark J. van der Laan <laan@stat.berkeley.edu>, with Greg Wall
van der Laan, M.J. and Pollard, K.S. A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap. Journal of Statistical Planning and Inference, 2003, 117, pp. 275-303.
http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/hopach.pdf
http://www.bepress.com/ucbbiostat/paper107/
http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/jsmpaper.pdf
data <- matrix(rnorm(50), nrow = 5)
disscosangle(data)