# l1_fit_ultrametric: Least Absolute Deviation Fit of Ultrametrics to... In clue: Cluster Ensembles

 l1_fit_ultrametric R Documentation

## Least Absolute Deviation Fit of Ultrametrics to Dissimilarities

### Description

Find the ultrametric with minimal absolute distance (Manhattan dissimilarity) to a given dissimilarity object.

### Usage

```l1_fit_ultrametric(x, method = c("SUMT", "IRIP"), weights = 1,
control = list())
```

### Arguments

 `x` a dissimilarity object inheriting from or coercible to class `"dist"`. `method` a character string indicating the fitting method to be employed. Must be one of `"SUMT"` (default) or `"IRIP"`, or a unique abbreviation thereof. `weights` a numeric vector or matrix with non-negative weights for obtaining a weighted least squares fit. If a matrix, its numbers of rows and columns must be the same as the number of objects in `x`, and the lower diagonal part is used. Otherwise, it is recycled to the number of elements in `x`. `control` a list of control parameters. See Details.

### Details

The problem to be solved is minimizing

L(u) = ∑_{i,j} w_{ij} |x_{ij} - u_{ij}|

over all u satisfying the ultrametric constraints (i.e., for all i, j, k, u_{ij} ≤ \max(u_{ik}, u_{jk})). This problem is known to be NP hard (Krivanek and Moravek, 1986).

We provide two heuristics for solving this problem.

Method `"SUMT"` implements a SUMT (Sequential Unconstrained Minimization Technique, see `sumt`) approach using the sign function for the gradients of the absolute value function.

Available control parameters are `method`, `control`, `eps`, `q`, and `verbose`, which have the same roles as for `sumt`, and the following.

`nruns`

an integer giving the number of runs to be performed. Defaults to 1.

`start`

a single dissimilarity, or a list of dissimilarities to be employed as starting values.

Method `"IRIP"` implements a variant of the Iteratively Reweighted Iterative Projection approach of Smith (2001), which attempts to solve the L_1 problem via a sequence of weighted L_2 problems, determining u(t+1) by minimizing the criterion function

∑_{i,j} w_{ij} (x_{ij} - u_{ij})^2 / \max(|x_{ij} - u_{ij}(t)|, m)

with m a “small” non-zero value to avoid zero divisors. We use the SUMT method of `ls_fit_ultrametric` for solving the weighted least squares problems.

Available control parameters are as follows.

`maxiter`

an integer giving the maximal number of iteration steps to be performed. Defaults to 100.

`eps`

a nonnegative number controlling the iteration, which stops when the maximal change in u is less than `eps`. Defaults to 10^{-6}.

`reltol`

the relative convergence tolerance. Iteration stops when the relative change in the L_1 criterion is less than `reltol`. Defaults to 10^{-6}.

`MIN`

the cutoff m. Defaults to 10^{-3}.

`start`

a dissimilarity object to be used as the starting value for u.

`control`

a list of control parameters to be used by the method of `ls_fit_ultrametric` employed for solving the weighted L_2 problems.

One may need to adjust the default control parameters to achieve convergence.

It should be noted that all methods are heuristics which can not be guaranteed to find the global minimum.

### Value

An object of class `"cl_ultrametric"` containing the fitted ultrametric distances.

### References

M. Krivanek and J. Moravek (1986). NP-hard problems in hierarchical tree clustering. Acta Informatica, 23, 311–323. doi: 10.1007/BF00289116.

T. J. Smith (2001). Constructing ultrametric and additive trees based on the L_1 norm. Journal of Classification, 18, 185–207. https://link.springer.com/article/10.1007/s00357-001-0015-0.

`cl_consensus` for computing least absolute deviation (Manhattan) consensus hierarchies; `ls_fit_ultrametric`.