# 60_distance_matrices: Distance Matrices In probhat: Multivariate Generalized Kernel Smoothing and Related Statistical Methods

## Description

Compute the distance (or dissimilarity) between pairs of density functions.

The main function, pdist, maps a list of density functions to a matrix.
The ph4.rdist function converts that matrix to a two-column sorted data.frame.
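
As a rough illustration of the matrix-to-data.frame conversion, the following base R sketch flattens a symmetric distance matrix into ranked pairs. It is not the package's implementation: the helper name `rank_pairs`, the column names `pair` and `distance`, and the example matrix are all assumptions made for illustration.

```r
# Hypothetical symmetric distance matrix for three groups.
d <- matrix (c (0, 0.2, 0.5,
                0.2, 0, 0.1,
                0.5, 0.1, 0), nrow = 3, byrow = TRUE,
    dimnames = list (c ("a", "b", "c"), c ("a", "b", "c") ) )

# Flatten the upper triangle into (pair, distance) rows, closest first.
# If n is missing, all pairs are returned; otherwise the closest n.
rank_pairs <- function (d, n)
{   idx <- which (upper.tri (d), arr.ind = TRUE)
    out <- data.frame (
        pair = paste (rownames (d)[idx [,1]], colnames (d)[idx [,2]], sep = ":"),
        distance = d [idx])
    out <- out [order (out$distance),]
    rownames (out) <- NULL
    if (missing (n) ) out else head (out, n)
}

rank_pairs (d)
rank_pairs (d, 1)
```

Only the upper triangle is used, because each unordered pair appears twice in a symmetric matrix and the diagonal compares a density with itself.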

Note that these methods are relatively new.
I cannot give any guarantee of optimality.
(This applies to the whole package, but in particular to these functions).

Also, the distance values should not be interpreted, other than for ranking purposes.

## Usage

```r
psv (sf)

pdist (sf, ..., sqrt.mse=TRUE)

ph4.rdist (d, n)

ph4.pcomp2 (fh, gh, sqrt.mse=TRUE, aggregate=TRUE,
    dfh = psv (fh), dgh = psv (gh) )
```

## Arguments

| Argument | Description |
|---|---|
| `fh, gh` | A density function. Refer to pdfuv.cks and pdfmv.cks. |
| `sf` | In psv, a density function. In pdist, a list of density functions. Optionally, they can be produced by ph4.pdfuv.gset.cks and ph4.pdfmv.gset.cks. |
| `aggregate` | If true (the default), return the average of the two one-sided distances. |
| `sqrt.mse` | If true (the default), the square root of the MSE is used; otherwise, the MSE is used. |
| `dfh, dgh` | Numeric vectors of self-evaluated densities, see details. Note that no validation is done on these arguments. |
| `d` | A distance matrix, as returned by pdist. |
| `n` | Integer, the closest n pairs. If missing, all are returned. |
| `...` | Ignored. |

## Details

Here, self-evaluated density values are computed by evaluating a density function at its own data
(in contrast to arbitrary evaluation points).

Cross-evaluated density values are computed by evaluating a density function at another density function's data.

The psv function computes self-evaluated densities.
The ph4.pcomp2 function computes one (or two) distances.
A single distance is the average of the two one-sided distances.

Let dfh-xf be the self-evaluated density of fh, and dgh-xf be the density of gh evaluated at fh's data
(i.e. both density functions are evaluated at the same points, which are the data points of the first density function).

Then a one-sided distance can be computed as the mean squared error (MSE), or its square root,
where the MSE is the sum of the squared differences between dfh-xf and dgh-xf, divided by n, the number of evaluation points.

The second one-sided distance is the same, except that f and g are reversed.
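
The computation described above can be sketched in base R. This is an illustration only, not the package's implementation: it uses a hypothetical Gaussian kernel density estimator (`kde_at`, with the bandwidth from `bw.nrd0`) in place of the package's kernel smoothing models, and the helper names `kde_at`, `one_sided` and `pcomp2_sketch` are assumptions.

```r
# Hypothetical stand-in for a density function: a Gaussian KDE built
# from data x, evaluated at the points in at (not probhat's estimator).
kde_at <- function (x, at, bw = bw.nrd0 (x) )
    sapply (at, function (a) mean (dnorm (a, mean = x, sd = bw) ) )

# One-sided distance: both densities are evaluated at the data of the
# first one, then compared by MSE (or its square root).
one_sided <- function (xf, xg, sqrt.mse = TRUE)
{   dfh.xf <- kde_at (xf, xf) # self-evaluated density of fh
    dgh.xf <- kde_at (xg, xf) # gh cross-evaluated at fh's data
    mse <- mean ( (dfh.xf - dgh.xf)^2)
    if (sqrt.mse) sqrt (mse) else mse
}

# Aggregate: the average of the two one-sided distances.
# If aggregate is false, both one-sided distances are returned.
pcomp2_sketch <- function (xf, xg, sqrt.mse = TRUE, aggregate = TRUE)
{   d <- c (one_sided (xf, xg, sqrt.mse), one_sided (xg, xf, sqrt.mse) )
    if (aggregate) mean (d) else d
}

xf <- iris$Sepal.Length [iris$Species == "setosa"]
xg <- iris$Sepal.Length [iris$Species == "virginica"]
pcomp2_sketch (xf, xg)
pcomp2_sketch (xf, xg, aggregate = FALSE)
```

The two one-sided distances generally differ, because each one averages over a different set of evaluation points. Averaging them gives a symmetric value.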

## Value

psv returns a numeric vector.

pdist returns a numeric square matrix.

By default, ph4.pcomp2 returns a single value.
(If aggregate is false, then it returns a pair of values).

## References

Refer to the vignette for an overview, references and better examples.

## Examples

```r
prep.ph.data ()

gs1a <- ph4.pdfuv.gset.cks (species, cbind (sepal.length) )
gs1b <- ph4.pdfuv.gset.cks (species, cbind (sepal.width) )
gs2 <- ph4.pdfmv.gset.cks (species, cbind (sepal.length, sepal.width) )

d1a <- pdist (gs1a)
d1b <- pdist (gs1b)
d2 <- pdist (gs2)

#print out distance matrix
#(for bivariate models)
d2

#print out distances, ranked/sorted
ph4.rdist (d1a)
ph4.rdist (d1b)
ph4.rdist (d2)
```