These functions compute the ‘coverage coefficient’ *R_c*
for local principal curves, local principal points
(i.e., kernel density estimates obtained through iterated mean shift), and other principal objects.

| Argument | Description |
|---|---|
| `x` | An object used to select a method. |
| `...` | Further arguments passed to or from other methods (not needed yet). |
| `data` | A data matrix. |
| `closest.coords` | A matrix of coordinates of the projected data. |
| `type` | For principal curves, leave at the default. For principal points, set to `"points"`. |

`Rc` computes the coverage coefficient *R_c*, a quantity which
estimates the goodness-of-fit of a fitted principal object. This
quantity can be interpreted similarly to the coefficient of determination in
regression analysis: values close to 1 indicate a good fit, while values
close to 0 indicate a ‘bad’ fit (corresponding to linear PCA).

For objects of type `lpc`, `lpc.spline`, and `ms`, S3 methods are available
which use the generic function `Rc`. This, in turn, calls the base function
`base.Rc`, which can also be used manually if the fitted object is of
another class. In principle, `base.Rc` can be used for assessing the
goodness-of-fit of any principal object, provided that the coordinates
(`closest.coords`) of the projected data are available. For instance, for
HS principal curves fitted via `princurve`, this information is contained
in component `$s`, and for a k-means object, say `fitk`, this information
can be obtained via `fitk$centers[fitk$cluster,]`. Set `type="points"` in
the latter case.
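The k-means case described above can be sketched as follows. This is a minimal illustration on synthetic data, not an example from the package authors. It assumes that `base.Rc` is provided by the LPCM package, so the call is guarded and only runs if that package is installed. The names `dat` and `closest` are hypothetical.

```r
set.seed(1)

# Synthetic two-cluster data in two dimensions (illustrative only)
dat <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2))

# Fit k-means with two centres (stats::kmeans, base R)
fitk <- kmeans(dat, centers = 2)

# Each observation is "projected" onto the centre of its own cluster,
# giving one row of coordinates per observation
closest <- fitk$centers[fitk$cluster, ]

# Assumption: base.Rc is exported by the LPCM package
if (requireNamespace("LPCM", quietly = TRUE)) {
  print(LPCM::base.Rc(dat, closest, type = "points"))
}
```

The indexing `fitk$centers[fitk$cluster, ]` repeats rows of the centre matrix so that `closest` has the same dimensions as the data, which is the form the `closest.coords` argument requires.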

The function `Rc` attempts to compute all missing information, so the less
informative the given object `x` is, the longer the computation will take.
Note also that `Rc` looks up the option `scaled` in the fitted object and
accounts for the scaling automatically. Important: if the data were scaled,
then do NOT unscale the results by hand in order to feed the unscaled
version into `base.Rc`; this will give a wrong result.

In terms of methodology, these functions compute *R_c* directly through the mean
reduction of absolute residual length, rather than through the
area above the coverage curve.
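To make the idea of a "mean reduction of absolute residual length" concrete, here is a rough sketch in base R. It is not the package's implementation. It assumes that the reduction is measured relative to the first principal component line, in line with the remark above that a value of 0 corresponds to linear PCA, and that residual length means Euclidean distance. The helper `rc_sketch` is hypothetical, and the results of `Rc` or `base.Rc` may differ in detail.

```r
# Hypothetical helper, NOT part of the package: one minus the ratio of the
# mean residual length of the fitted object to that of linear PCA.
rc_sketch <- function(data, closest.coords) {
  data <- as.matrix(data)
  closest.coords <- as.matrix(closest.coords)

  # Euclidean residual lengths with respect to the fitted object
  res_fit <- sqrt(rowSums((data - closest.coords)^2))

  # Assumed baseline: orthogonal projection onto the first principal
  # component line
  pc <- prcomp(data)
  centred <- sweep(data, 2, pc$center)
  v <- pc$rotation[, 1]
  proj <- (centred %*% v) %*% t(v)
  res_pca <- sqrt(rowSums((centred - proj)^2))

  1 - mean(res_fit) / mean(res_pca)
}

set.seed(1)
x <- seq(-1, 1, length.out = 200)
# Noisy parabola (synthetic data)
dat <- cbind(x, x^2 + rnorm(200, sd = 0.05))
# Stand-in for projected points: the noise-free curve at the same x
fit <- cbind(x, x^2)
rc <- rc_sketch(dat, fit)
print(rc)
```

Under these assumptions, a fitted object that passes through every data point gives a value of exactly 1, and a fitted object that does no better than the first principal component line gives a value of about 0.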

These functions currently do not account for observation
weights, i.e. *R_c* is computed through the unweighted mean
reduction in absolute residual length (even if weights have been used for
the curve fitting).

Contributions (in form of pieces of code, or useful suggestions for improvements) by Jo Dwyer, Mohammad Zayed, and Ben Oakley are gratefully acknowledged.

J. Einbeck and L. Evers.

Einbeck, Tutz, and Evers (2005). Local principal curves. Statistics and Computing 15, 301-313.

Einbeck (2011). Bandwidth selection for nonparametric unsupervised learning techniques – a unified approach via self-coverage. Journal of Pattern Recognition Research 6, 175-192.

`lpc.spline`, `ms`, `coverage`.

```
data(calspeedflow)
lpc1 <- lpc.spline(lpc(calspeedflow[,3:4]), project=TRUE)
Rc(lpc1)
# is the same as:
base.Rc(lpc1$lpcobject$data, lpc1$closest.coords)
ms1 <- ms(calspeedflow[,3:4],plotms=0)
Rc(ms1)
# is the same as:
base.Rc(ms1$data, ms1$cluster.center[ms1$closest.label,], type="points")
```
