# Measuring goodness-of-fit for principal objects.

### Description

These functions compute the ‘coverage coefficient’ *R_c* for local principal curves, local principal points (i.e., local modes of a kernel density estimate, obtained through iterated mean shift), and other principal objects.

### Usage


### Arguments

| Argument | Description |
|---|---|
| `x` | An object used to select a method. |
| `...` | Further arguments passed to or from other methods (currently not needed). |
| `data` | A data matrix. |
| `closest.coords` | A matrix of coordinates of the projected data. |
| `type` | For principal curves, leave at its default. For principal points, set `type = "points"`. |

### Details

`Rc` computes the coverage coefficient *R_c*, a quantity which estimates the goodness-of-fit of a fitted principal object. This quantity can be interpreted similarly to the coefficient of determination in regression analysis: values close to 1 indicate a good fit, while values close to 0 indicate a ‘bad’ fit (corresponding to linear PCA).
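To make this interpretation concrete, here is a minimal base-R sketch. It is not the package implementation; it assumes that *R_c* equals one minus the ratio of the mean residual length of the fitted object to the mean residual length of a linear-PCA baseline, taken here to be the projection onto the first principal component line. The helpers `pc1_projection` and `rc_sketch` are hypothetical names introduced for illustration only.

```r
# Hypothetical helper: projections of the rows of `data` onto the first
# principal component line (assumed here to be the "linear PCA" baseline).
pc1_projection <- function(data) {
  pc <- prcomp(data, center = TRUE, scale. = FALSE)
  v <- pc$rotation[, 1, drop = FALSE]        # unit direction of PC1 (d x 1)
  scores <- pc$x[, 1, drop = FALSE]          # PC1 scores (n x 1)
  sweep(scores %*% t(v), 2, pc$center, "+")  # back to data coordinates (n x d)
}

# Hypothetical helper: 1 - (mean residual length of fit) / (mean residual
# length of baseline). An assumed form, not the package's base.Rc.
rc_sketch <- function(data, closest.coords,
                      baseline.coords = pc1_projection(data)) {
  data <- as.matrix(data)
  r_fit  <- sqrt(rowSums((data - closest.coords)^2))   # Euclidean residual lengths
  r_base <- sqrt(rowSums((data - baseline.coords)^2))
  1 - mean(r_fit) / mean(r_base)
}

# Toy example: noisy points scattered around the unit circle.
set.seed(1)
theta <- runif(200, 0, 2 * pi)
rad <- 1 + rnorm(200, sd = 0.05)
X <- cbind(rad * cos(theta), rad * sin(theta))
on_circle <- X / sqrt(rowSums(X^2))   # closest points on the unit circle

rc_sketch(X, on_circle)          # close to 1: the circle fits well
rc_sketch(X, pc1_projection(X))  # 0: no improvement over linear PCA
```

Passing the baseline itself as the "fit" returns exactly 0, which mirrors the reading of values near 0 as no better than linear PCA.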

For objects of class `lpc`, `lpc.spline`, and `ms`, methods for the S3 generic function `Rc` are available. These, in turn, call the base function `base.Rc`, which can also be used directly if the fitted object is of another class. In principle, `base.Rc` can be used to assess the goodness-of-fit of any principal object, provided that the coordinates (`closest.coords`) of the projected data are available. For instance, for HS principal curves fitted via `princurve`, this information is contained in component `$s`, and for a k-means object, say `fitk`, it can be obtained via `fitk$centers[fitk$cluster,]`. Set `type="points"` in the latter case.
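The k-means case described above can be reproduced with base R's `stats::kmeans`. The sketch below only builds the `closest.coords` matrix exactly as in the text (`fitk$centers[fitk$cluster,]`) on synthetic data; the final `base.Rc` call is left commented out because it assumes the package providing `base.Rc` is loaded.

```r
set.seed(42)
# Synthetic data: two well-separated clusters in 2D (50 rows each).
X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))

fitk <- kmeans(X, centers = 2, nstart = 10)

# Each observation is "projected" onto the center of its own cluster.
closest <- fitk$centers[fitk$cluster, ]

dim(closest)  # 100 2, the same shape as X

# With the package loaded, the coefficient would then be obtained via:
# base.Rc(X, closest, type = "points")
```

Indexing the centers matrix by the cluster labels repeats each center once per member, so the result has one row per observation, which is the shape `closest.coords` expects.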

The function `Rc` attempts to compute all missing information, so the less information the object `x` contains, the longer the computation takes. Note also that `Rc` looks up the option `scaled` in the fitted object and accounts for the scaling automatically. Important: if the data were scaled, do NOT unscale the results by hand in order to feed the unscaled version into `base.Rc`; this will give a wrong result.
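One way to see why results on scaled and unscaled coordinates are not interchangeable: a ratio of mean Euclidean residual lengths, which is what *R_c* is assumed here to be, is not invariant under rescaling the columns by different factors. The sketch does not reproduce what `Rc` does internally; it uses synthetic data, a toy "fit", a toy baseline, hypothetical scale factors `s`, and a hypothetical helper `ratio_of_mean_residuals`.

```r
# Hypothetical helper (not part of the package): ratio of the mean Euclidean
# residual length of a fit to that of a baseline.
ratio_of_mean_residuals <- function(data, fit, base) {
  mean(sqrt(rowSums((data - fit)^2))) / mean(sqrt(rowSums((data - base)^2)))
}

set.seed(7)
n <- 100
Z <- cbind(rnorm(n), rnorm(n))   # data in the scaled space used for fitting
fitZ  <- cbind(Z[, 1], 0)        # toy "fit": projections onto the first axis
baseZ <- cbind(0, Z[, 2])        # toy baseline: projections onto the second axis

s <- c(1, 10)                                # hypothetical per-column scale factors
unscale <- function(M) sweep(M, 2, s, "*")   # map back to the original units

r_scaled   <- ratio_of_mean_residuals(Z, fitZ, baseZ)
r_unscaled <- ratio_of_mean_residuals(unscale(Z), unscale(fitZ), unscale(baseZ))
r_unscaled / r_scaled   # 10 (up to rounding): the ratio depends on the scaling
```

Unscaling stretches the residuals of the fit (which lie along the second axis) by a factor of 10 while leaving the baseline residuals unchanged, so the same fit yields a very different coefficient.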

In terms of methodology, these functions compute *R_c* directly through the mean
reduction of absolute residual length, rather than through the
area above the coverage curve.
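The two routes agree because the area above an empirical coverage curve equals the mean residual length. Writing *r_i* for the distance of observation *i* to the fitted object, and assuming the coverage curve *C(τ)* is the proportion of observations with *r_i ≤ τ*, the area *A_c* above it is computed below; the form of *R_c* in the second line, with *A_0* the corresponding area for a baseline fit, is likewise an assumption made for illustration.

```latex
\begin{aligned}
A_c &= \int_0^\infty \bigl(1 - C(\tau)\bigr)\,d\tau
     = \frac{1}{n}\sum_{i=1}^{n} \int_0^\infty \mathbf{1}\{r_i > \tau\}\,d\tau
     = \frac{1}{n}\sum_{i=1}^{n} r_i, \\
R_c &= 1 - \frac{A_c}{A_0}.
\end{aligned}
```

Under these assumptions *R_c* is the relative reduction in mean absolute residual length, so it can be computed from the residual lengths alone without tracing the coverage curve.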

These functions currently do not account for observation weights, i.e., *R_c* is computed through the unweighted mean reduction in absolute residual length (even if weights have been used for the curve fitting).

### Acknowledgements

Contributions (in the form of pieces of code or useful suggestions for improvements) by Jo Dwyer, Mohammad Zayed, and Ben Oakley are gratefully acknowledged.

### Author(s)

J. Einbeck and L. Evers.

### References

Einbeck, Tutz, and Evers (2005). Local principal curves. Statistics and Computing 15, 301-313.

Einbeck (2011). Bandwidth selection for nonparametric unsupervised learning techniques – a unified approach via self-coverage. Journal of Pattern Recognition Research 6, 175-192.

### See Also

`lpc.spline`, `ms`, `coverage`.

### Examples

```r
library(LPCM)
data(calspeedflow)
# Local principal curve, spline-smoothed; project=TRUE provides closest.coords
lpc1 <- lpc.spline(lpc(calspeedflow[,3:4]), project=TRUE)
Rc(lpc1)
# is the same as:
base.Rc(lpc1$lpcobject$data, lpc1$closest.coords)
# Local principal points via mean shift (plotms=0: no plot)
ms1 <- ms(calspeedflow[,3:4], plotms=0)
Rc(ms1)
# is the same as:
base.Rc(ms1$data, ms1$cluster.center[ms1$closest.label,], type="points")
```
