MAINTENANCE.md

Current state

yardstick has reached 1.0.0 and is stable. For the majority of users, there are 3 types of metrics (numeric, class, and class probability), each of which has an internal class that is defined through new_metric().

yardstick is somewhat unusual in that the actual functions it exports, like accuracy(), have extra classes and attributes attached to them. This allows them to be used in metric_set(), which has to decide whether two metric functions are allowed to be combined in the same metric set. For example, two numeric metric functions can be combined, but you can't combine a numeric metric with a class metric. The only exception is that you can combine a class metric with a class probability metric: in the function you get back from metric_set(), the class metric will use the estimate interface and the class probability metric will use the ... (dots) interface.
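The combination rule described above can be illustrated with a minimal base R sketch. It is not yardstick's implementation: new_metric_sketch(), can_combine(), and the toy metrics are hypothetical names introduced here, and the class labels are assumed stand-ins for the internal classes.

```r
# Hypothetical stand-in for new_metric(): tag a plain function with a class
# so that other code can inspect what kind of metric it is.
new_metric_sketch <- function(fn, class) {
  stopifnot(is.function(fn))
  structure(fn, class = c(class, "metric", "function"))
}

# Toy metrics for illustration only (not yardstick's implementations).
rmse_sketch <- new_metric_sketch(
  function(truth, estimate) sqrt(mean((truth - estimate)^2)),
  "numeric_metric"
)
mae_sketch <- new_metric_sketch(
  function(truth, estimate) mean(abs(truth - estimate)),
  "numeric_metric"
)
accuracy_sketch <- new_metric_sketch(
  function(truth, estimate) mean(truth == estimate),
  "class_metric"
)
brier_sketch <- new_metric_sketch(
  function(truth, prob) mean((as.integer(truth == levels(truth)[1]) - prob)^2),
  "prob_metric"
)

# Hypothetical compatibility check: all metrics must share one class, with the
# single exception that class and class probability metrics may be mixed.
can_combine <- function(...) {
  fns <- list(...)
  kinds <- unique(vapply(fns, function(fn) class(fn)[[1]], character(1)))
  length(kinds) == 1L || setequal(kinds, c("class_metric", "prob_metric"))
}

can_combine(rmse_sketch, mae_sketch)          # two numeric metrics
can_combine(rmse_sketch, accuracy_sketch)     # numeric with class
can_combine(accuracy_sketch, brier_sketch)    # class with class probability
```

The tagged functions stay callable as ordinary functions, so exported metrics can be used directly and still be inspected by a combiner.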

For the current public-facing API, I don't see any major changes needing to be made. I'm fairly happy with how the 3 core metric classes work. I think most of the work for yardstick could be done on improving the internal helpers (see Known issues below), or on extending yardstick with new metric class types (see Future directions). The internal helpers will likely have to be improved before new metric class types can be added, because they are already quite complex, which makes extending yardstick fairly difficult.

Known issues

There is a very similar problem with metric_vec_template(). It currently tries to handle validation and function calling for all of the different types of metrics. This makes it extremely complex, hard to extend, and probably a bit brittle. In particular, validate_truth_estimate_checks() does some fairly complex S3 dispatch to perform its validation (a kind of home-grown double dispatch on truth and estimate), which could probably be rewritten more cleanly if we had separate metric_vec_template() functions for the different kinds of metric types.

There are a few issues where this high cognitive overhead comes into play, making the requested features hard to add.

The complexity of validate_truth_estimate_checks() could be reduced by instead creating a few check_*() helpers that metric writers are required to call themselves. If we provide useful ones, metric writers would call them directly in their metric_vec() function, and we'd avoid the double dispatch altogether, because each writer would be in charge of calling the correct check_*() function for the types of truth and estimate that their metric works with. Something like check_factor_truth_factor_estimate(truth, estimate). That would probably help with #305.
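As a rough sketch of the idea above, a helper like check_factor_truth_factor_estimate() might look as follows. The function name comes from the text; its body, the error messages, and accuracy_vec_sketch() are assumptions made for illustration and do not reflect existing yardstick code.

```r
# Hypothetical body for the helper named above: validate that both inputs are
# factors with identical levels and equal length, and fail with a clear message.
check_factor_truth_factor_estimate <- function(truth, estimate) {
  if (!is.factor(truth)) {
    stop("`truth` should be a factor, not ", class(truth)[[1]], ".", call. = FALSE)
  }
  if (!is.factor(estimate)) {
    stop("`estimate` should be a factor, not ", class(estimate)[[1]], ".", call. = FALSE)
  }
  if (!identical(levels(truth), levels(estimate))) {
    stop("`truth` and `estimate` should have the same levels in the same order.",
         call. = FALSE)
  }
  if (length(truth) != length(estimate)) {
    stop("`truth` and `estimate` should have the same length.", call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical metric: the metric writer chooses the check that matches the
# input types of their metric, so no dispatch on truth and estimate is needed.
accuracy_vec_sketch <- function(truth, estimate) {
  check_factor_truth_factor_estimate(truth, estimate)
  mean(truth == estimate)
}

truth    <- factor(c("a", "b", "a", "b"), levels = c("a", "b"))
estimate <- factor(c("a", "b", "b", "b"), levels = c("a", "b"))
accuracy_vec_sketch(truth, estimate)
```

With this layout, a numeric metric would call a different check_*() helper, and each helper stays a small, flat function that can be tested on its own.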

Future directions



topepo/yardstick documentation built on April 20, 2024, 7:15 p.m.