TPR-DAG-variants: TPR-DAG Ensemble Variants
In gecko515/HEMDAG: Hierarchical Ensemble Methods for Directed Acyclic Graphs

Description Usage Arguments Details Value See Also Examples

Function gathering the true-path-rule-based hierarchical learning ensemble algorithms and its variants. In their more general form the TPR-DAG algorithms adopt a two step learning strategy:

in the first step they compute a per-level bottom-up visit from the leaves to the root to propagate positive predictions across the hierarchy;
in the second step they compute a per-level top-down visit from the root to the leaves in order to assure the hierarchical consistency of the predictions;

1
2
3

TPR.DAG(S, g, root = "00", positive = "children",
  bottomup = "threshold.free", topdown = "HTD", t = 0, w = 0,
  W = NULL, parallel = FALSE, ncores = 1)

`S`	a named flat scores matrix with examples on rows and classes on columns.
`g`	a graph of class `graphNEL`. It represents the hierarchy of the classes.
`root`	name of the class that it is on the top-level of the hierarchy (`def. root="00"`).
`positive`	choice of the positive nodes to be considered in the bottom-up strategy. Can be one of the following values: `children` (`def.`): for each node are considered its positive children; `descendants`: for each node are considered its positive descendants;
`bottomup`	strategy to enhance the flat predictions by propagating the positive predictions from leaves to root. It can be one of the following values: `threshold.free` (`def.`): positive nodes are selected on the basis of the `threshold.free` strategy (`def.`); `threshold`: positive nodes are selected on the basis of the `threshold` strategy; `weighted.threshold.free`: positive nodes are selected on the basis of the `weighted.threshold.free` strategy; `weighted.threshold`: positive nodes are selected on the basis of the `weighted.threshold` strategy; `tau`: positive nodes are selected on the basis of the `tau` strategy. NOTE: `tau` is only a `DESCENS` variants. If you use `tau` strategy you must set the parameter `positive=descendants`;
`topdown`	strategy to make the scores hierarchy-consistent. It can be one of the following values: `HTD` (`def.`): `HTD-DAG` strategy is applied (`HTD-DAG`); `GPAV`: `GPAV` strategy is applied (`GPAV`);
`t`	threshold for the choice of positive nodes (`def. t=0`). Set `t` only for the variants that requiring a threshold for the selection of the positive nodes, otherwise set `t` to zero.
`w`	weight to balance between the contribution of the node i and that of its positive nodes. Set `w` only for the weighted variants, otherwise set `w` to zero.
`W`	vector of weight relative to a single example. If the vector `W` is not specified (by `def. W=NULL`), `W` is a unitary vector of the same length of the columns' number of the flat scores matrix (root node included). Set `W` only if `topdown=GPAV`.
`parallel`	boolean value: `TRUE`: execute the parallel implementation of GPAV (`GPAV.parallel`); `FALSE` (def.): execute the sequential implementation of GPAV (`GPAV.over.examples`); Use `parallel` if and only if `topdown=GPAV`; otherwise set `parallel=FALSE`.
`ncores`	number of cores to use for parallel execution (`def. 8`). Set `ncores=1` if `parallel=FALSE`, otherwise set `ncores` to the desired number of cores. Use `ncores` if and only if `topdown=GPAV`; otherwise set `parallel=1`.

The vanilla TPR-DAG adopts a per-level bottom-up traversal of the DAG to correct the flat predictions \hat{y}_i:

\bar{y}_i := \frac{1}{1 + |φ_i|} (\hat{y}_i + ∑_{j \in φ_i} \bar{y}_j)

where φ_i are the positive children of i. Different strategies to select the positive children φ_i can be applied:

Threshold-Free strategy: the positive nodes are those children that can increment the score of the node i, that is those nodes that achieve a score higher than that of their parents:

φ_i := \{ j \in child(i) | \bar{y}_j > \hat{y}_i \}
Threshold strategy: the positive children are selected on the basis of a threshold that can be selected in two different ways:
1. for each node a constant threshold \bar{t} is a priori selected:
  
  φ_i := \{ j \in child(i) | \bar{y}_j > \bar{t} \}
  
  For instance if the predictions represent probabilities it could be meaningful to a priori select \bar{t}=0.5.
2. the threshold is selected to maximize some performance metric \mathcal{M} estimated on the training data, as for instance the F-score or the AUPRC. In other words the threshold is selected to maximize some measure of accuracy of the predictions \mathcal{M}(j,t) on the training data for the class j with respect to the threshold t. The corresponding set of positives \forall i \in V is:
  
  φ_i := \{ j \in child(i) | \bar{y}_j > t_j^*, t_j^* = \arg \max_{t} \mathcal{M}(j,t) \}
  
  For instance t_j^* can be selected from a set of t \in (0,1) through internal cross-validation techniques.

The weighted TPR-DAG version can be designed by adding a weight w \in [0,1] to balance between the contribution of the node i and that of its positive children φ, through their convex combination:

\bar{y}_i := w \hat{y}_i + \frac{(1 - w)}{|φ_i|} ∑_{j \in φ_i} \bar{y}_j

If w=1 no weight is attributed to the children and the TPR-DAG reduces to the HTD-DAG algorithm, since in this way only the prediction for node i is used in the bottom-up step of the algorithm. If w=0 only the predictors associated to the children nodes vote to predict node i. In the intermediate cases we attribute more importance to the predictor for the node i or to its children depending on the values of w.

The contribution of the descendants of a given node decays exponentially with their distance from the node itself. To enhance the contribution of the most specific nodes to the overall decision of the ensemble we designed a novel variant that we named DESCENS. The novelty of DESCENS consists in strongly considering the contribution of all the descendants of each node instead of only that of its children. Therefore DESCENS predictions are more influenced by the information embedded in the leaves nodes, that are the classes containing the most informative and meaningful information from a biological and medical standpoint. For the choice of the “positive” descendants we use the same strategies adopted for the selection of the “positive” children shown above. Furthermore, we designed a variant specific only for DESCENS, that we named DESCENS-τ. The DESCENS-τ variants balances the contribution between the “positives” children of a node i and that of its “positives” descendants excluding its children by adding a weight τ \in [0,1]:

\bar{y}_i := \frac{τ}{ 1 +|φ_i|} ( \hat{y}_i + ∑_{j \in φ_i} \bar{y}_j ) + \frac{1-τ}{1+|δ_i|} ( \hat{y}_i + ∑_{j\in δ_i} \bar{y}_j )

where φ_i are the “positive” children of i and δ_i=Δ_i \setminus φ_i the descendants of i without its children. If τ=1 we consider only the contribution of the “positive” children of i; if τ=0 only the descendants that are not children contribute to the score, while for intermediate values of τ we can balance the contribution of φ_i and δ_i positive nodes.

Simply by replacing the HTD (HTD-DAG) top-down step with the GPAV approach (GPAV) we can design the TPR-DAG variant ISO-TPR. The most important feature of ISO-TPR is that it maintains the hierarchical constraints by construction and selects the closest solution (in the least square sense) to the bottom-up predictions that obeys the true path rule. Obviously, any aforementioned strategy for the selection of “positive” children or descendants can be applied before executing the GPAV correction.

a named matrix with the scores of the classes corrected according to the chosen algorithm.

GPAV, HTD-DAG

data(graph);
data(scores);
data(labels);
root <- root.node(g);
S.hier <- TPR.DAG(S, g, root, positive="children", bottomup="threshold.free", topdown="HTD", 
t=0, w=0, W=NULL, parallel=FALSE, ncores=1);