Description
Function gathering the true-path-rule-based (TPR) hierarchical learning ensemble algorithms and their variants.
In their most general form, the TPR-DAG algorithms adopt a two-step learning strategy:
in the first step, they perform a per-level bottom-up visit from the leaves to the root to propagate positive predictions across the hierarchy;
in the second step, they perform a per-level top-down visit from the root to the leaves to ensure the hierarchical consistency of the predictions.
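The two-step strategy can be sketched in Python under several assumptions that are not taken from the package. The DAG is a hypothetical dict mapping each class to its children (a stand-in for the graph object `g`). Levels are defined by the longest path from the root, which guarantees that every parent lies on a shallower level than each of its children. The bottom-up rule is passed in as a function. The top-down step clips each score to the minimum of its parents' corrected scores, which is our assumed HTD-style correction. All function names are hypothetical.

```python
from collections import defaultdict


def parents_of(children):
    """Invert a {node: [children]} dict into a {node: [parents]} dict."""
    parents = {}
    for node, kids in children.items():
        parents.setdefault(node, [])
        for kid in kids:
            parents.setdefault(kid, []).append(node)
    return parents


def levels_by_max_depth(children, root):
    """Group nodes by their longest distance from the root."""
    parents = parents_of(children)
    depth = {}

    def visit(node):
        if node not in depth:
            depth[node] = 0 if node == root else 1 + max(visit(p) for p in parents[node])
        return depth[node]

    for node in parents:
        visit(node)
    levels = defaultdict(list)
    for node, d in depth.items():
        levels[d].append(node)
    return [sorted(levels[d]) for d in range(max(levels) + 1)]


def threshold_free_vanilla(y_hat, child_scores):
    """Example bottom-up rule: average the node with the children that exceed it."""
    pos = [s for s in child_scores if s > y_hat]
    return (y_hat + sum(pos)) / (1 + len(pos))


def two_step(children, root, flat, bottom_up_rule):
    parents = parents_of(children)
    levels = levels_by_max_depth(children, root)
    bar = dict(flat)
    # Step 1 (bottom-up): deepest level first, so every child is updated before its parents.
    for level in reversed(levels):
        for node in level:
            bar[node] = bottom_up_rule(flat[node], [bar[k] for k in children.get(node, [])])
    # Step 2 (top-down, assumed HTD-style): clip each node to the minimum of its parents.
    for level in levels[1:]:
        for node in level:
            bar[node] = min(bar[node], min(bar[p] for p in parents[node]))
    return bar


children = {"root": ["A", "B"], "A": ["C"], "B": ["C"], "C": []}
flat = {"root": 0.9, "A": 0.2, "B": 0.7, "C": 0.6}
print(levels_by_max_depth(children, "root"))
print(two_step(children, "root", flat, threshold_free_vanilla))
```

In this toy DAG, class C has two parents. The bottom-up step raises A from 0.2 to 0.4 using its child C. The top-down step then lowers C to 0.4 so that it does not exceed its weaker parent A.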
Arguments
S: a named flat scores matrix with examples on rows and classes on columns.
g: a graph of class
root: name of the class that is at the top level of the hierarchy (
positive: choice of the positive nodes to be considered in the bottom-up strategy. Can be one of the following values:
bottomup: strategy to enhance the flat predictions by propagating the positive predictions from the leaves to the root. It can be one of the following values:
topdown: strategy to make the scores hierarchy-consistent. It can be one of the following values:
t: threshold for the choice of positive nodes (
w: weight balancing the contribution of node i against that of its positive nodes. Set
W: vector of weights relative to a single example. If the vector
parallel: boolean value:
Use
ncores: number of cores to use for parallel execution (
Details
The vanilla TPR-DAG adopts a per-level bottom-up traversal of the DAG to correct the flat predictions \hat{y}_i:
\bar{y}_i := \frac{1}{1 + |φ_i|} (\hat{y}_i + ∑_{j \in φ_i} \bar{y}_j)
where φ_i are the positive children of i. Different strategies to select the positive children φ_i can be applied:
Threshold-free strategy: the positive nodes are those children that can increase the score of node i, that is, the children whose score is higher than the flat score of their parent i:
φ_i := \{ j \in child(i) | \bar{y}_j > \hat{y}_i \}
Threshold strategy: the positive children are selected on the basis of a threshold, which can be chosen in two different ways:
for each node, a constant threshold \bar{t} is selected a priori:
φ_i := \{ j \in child(i) | \bar{y}_j > \bar{t} \}
For instance, if the predictions represent probabilities, it may be meaningful to select \bar{t}=0.5 a priori.
the threshold is selected to maximize some performance metric \mathcal{M} estimated on the training data, for instance the F-score or the AUPRC. In other words, for each class j the threshold t_j^* maximizes the accuracy measure \mathcal{M}(j,t) of the predictions on the training data with respect to the threshold t. The corresponding set of positives, \forall i \in V, is:
φ_i := \{ j \in child(i) | \bar{y}_j > t_j^*, t_j^* = \arg \max_{t} \mathcal{M}(j,t) \}
For instance, t_j^* can be selected from a set of candidate values t \in (0,1) through internal cross-validation.
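The vanilla update and the selection strategies above can be sketched as follows. The function names, the dict of per-class thresholds `t_star`, and the use of the F-score over a fixed grid of candidate thresholds are illustrative assumptions. The AUPRC alternative and the internal cross-validation loop are not shown.

```python
def positives_threshold_free(y_hat_i, child_bar):
    """Threshold-free: children whose corrected score exceeds the flat score of node i."""
    return [s for s in child_bar.values() if s > y_hat_i]


def positives_constant(child_bar, t_bar=0.5):
    """Constant a-priori threshold t_bar shared by every node."""
    return [s for s in child_bar.values() if s > t_bar]


def positives_tuned(child_bar, t_star):
    """Per-class thresholds: t_star is a hypothetical {class j: t_j^*} dict."""
    return [s for j, s in child_bar.items() if s > t_star[j]]


def vanilla_update(y_hat_i, positives):
    """Vanilla TPR-DAG rule: average the flat score with the positive children."""
    return (y_hat_i + sum(positives)) / (1 + len(positives))


def f_score(labels, scores, t):
    """F-score of the predictions (score > t) against boolean training labels."""
    pred = [s > t for s in scores]
    tp = sum(p and l for p, l in zip(pred, labels))
    fp = sum(p and not l for p, l in zip(pred, labels))
    fn = sum(l and not p for p, l in zip(pred, labels))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def tune_threshold(labels, scores, grid):
    """t_j^* = argmax_t M(j, t) over the grid; ties keep the first t in the grid."""
    return max(grid, key=lambda t: f_score(labels, scores, t))


child_bar = {"A": 0.6, "B": 0.8, "C": 0.3}  # corrected scores of the children of node i
y_hat_i = 0.4
print(vanilla_update(y_hat_i, positives_threshold_free(y_hat_i, child_bar)))
print(positives_constant(child_bar, 0.5))
print(positives_tuned(child_bar, {"A": 0.5, "B": 0.9, "C": 0.2}))
print(tune_threshold([True, True, False, False], [0.9, 0.7, 0.4, 0.2], [0.1, 0.3, 0.5, 0.8]))
```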
A weighted TPR-DAG version can be obtained by adding a weight w \in [0,1] that balances the contribution of node i against that of its positive children φ_i through a convex combination:
\bar{y}_i := w \hat{y}_i + \frac{(1 - w)}{|φ_i|} ∑_{j \in φ_i} \bar{y}_j
If w=1, no weight is given to the children and TPR-DAG reduces to the HTD-DAG algorithm, since only the prediction for node i is used in the bottom-up step. If w=0, only the predictors associated with the children vote to predict node i. Intermediate values of w give more importance either to the predictor for node i or to its children, depending on the value of w.
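A minimal sketch of the weighted rule, assuming the positive children have already been selected with one of the strategies above. The text does not say what happens when a node has no positive children. In that case the mean over φ_i is undefined, and this sketch keeps the flat score, which is an assumption. The function name is hypothetical.

```python
def weighted_update(y_hat_i, positives, w):
    """Weighted TPR-DAG bottom-up score for node i.

    positives: corrected scores of the already-selected positive children of i.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if not positives:
        # Assumption: with no positive children the mean over phi_i is undefined,
        # so the flat score is kept unchanged.
        return y_hat_i
    return w * y_hat_i + (1 - w) * sum(positives) / len(positives)


for w in (1.0, 0.5, 0.0):
    print(w, weighted_update(0.4, [0.6, 0.8], w))
```

With w=1 the function returns the flat score 0.4, matching the reduction to HTD-DAG described above. With w=0 it returns the mean of the positive children.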
In the rules above, the contribution of the descendants of a given node decays exponentially with their distance from the node itself, because it is diluted by one averaging step per level. To enhance the contribution of the most specific nodes to the overall decision of the ensemble, we designed a novel variant that we named DESCENS.
The novelty of DESCENS lies in explicitly considering the contribution of all the descendants of each node instead of only that of its children. Therefore DESCENS predictions are more influenced by the information embedded in the leaf nodes, which are the classes carrying the most informative and meaningful information from a biological and medical standpoint.
The “positive” descendants are chosen with the same strategies used above for selecting the “positive” children. Furthermore, we designed a variant specific to DESCENS, named DESCENS-τ.
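The DESCENS formula is not written out in this section. The sketch below assumes that DESCENS uses the same averaging form as the vanilla rule, with the positive descendants Δ_i in place of φ_i. It also assumes threshold-free selection, where a descendant is positive if its corrected score exceeds the flat score of node i. The dict-of-lists DAG and all function names are hypothetical.

```python
def descendants(children, node):
    """All nodes reachable from node, excluding node itself."""
    seen, stack = set(), list(children.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(children.get(n, []))
    return seen


def postorder(children, root):
    """DFS post-order: in a DAG every node appears after all of its descendants."""
    order, seen = [], set()

    def dfs(n):
        seen.add(n)
        for k in children.get(n, []):
            if k not in seen:
                dfs(k)
        order.append(n)

    dfs(root)
    return order


def descens_bottom_up(children, root, flat):
    """Threshold-free DESCENS bottom-up step (assumed averaging form)."""
    bar = dict(flat)
    for node in postorder(children, root):
        # Positive descendants: already-corrected scores above the node's flat score.
        pos = [bar[d] for d in descendants(children, node) if bar[d] > flat[node]]
        bar[node] = (flat[node] + sum(pos)) / (1 + len(pos))
    return bar


chain = {"root": ["A"], "A": ["B"], "B": []}
print(descens_bottom_up(chain, "root", {"root": 0.5, "A": 0.3, "B": 0.9}))
```

On this three-class chain, the root averages its flat score 0.5 with both A (0.6) and the leaf B (0.9), which gives 2/3. The vanilla rule would only see A and give (0.5 + 0.6)/2 = 0.55, so the leaf's contribution is stronger under DESCENS.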
The DESCENS-τ variant balances the contribution of the “positive” children of a node i against that of its “positive” descendants excluding its children, by adding a weight τ \in [0,1]:
\bar{y}_i := \frac{τ}{ 1 +|φ_i|} ( \hat{y}_i + ∑_{j \in φ_i} \bar{y}_j ) + \frac{1-τ}{1+|δ_i|} ( \hat{y}_i + ∑_{j\in δ_i} \bar{y}_j )
where φ_i are the “positive” children of i, Δ_i are its “positive” descendants, and δ_i=Δ_i \setminus φ_i are the “positive” descendants of i excluding its children. If τ=1, only the contribution of the “positive” children of i is considered; if τ=0, only the descendants that are not children contribute to the score, while intermediate values of τ balance the contributions of the positive nodes in φ_i and δ_i.
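The DESCENS-τ combination translates directly into a small function. It takes the corrected scores of the positive children (φ_i) and of the positive non-child descendants (δ_i) as plain lists. How those sets are selected is left to the strategies above, and the function name is hypothetical.

```python
def descens_tau_update(y_hat_i, phi, delta, tau):
    """DESCENS-tau score for node i.

    phi:   corrected scores of the positive children of i
    delta: corrected scores of the positive descendants of i that are not children
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    child_term = tau / (1 + len(phi)) * (y_hat_i + sum(phi))
    desc_term = (1 - tau) / (1 + len(delta)) * (y_hat_i + sum(delta))
    return child_term + desc_term


for tau in (1.0, 0.5, 0.0):
    print(tau, descens_tau_update(0.4, phi=[0.6], delta=[0.8, 1.0], tau=tau))
```

With τ=1 only the child term survives: (0.4 + 0.6)/2 = 0.5. With τ=0 only the deeper descendants are averaged with the node: (0.4 + 0.8 + 1.0)/3 ≈ 0.733.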
By simply replacing the HTD top-down step (HTD-DAG) with the GPAV approach (GPAV), we obtain the TPR-DAG variant ISO-TPR. The most important feature of ISO-TPR is that it maintains the hierarchical constraints by construction: among all the solutions that obey the true path rule, it selects the one closest (in the least-squares sense) to the bottom-up predictions.
Any of the strategies described above for selecting “positive” children or descendants can be applied before the GPAV correction.
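GPAV (Generalized Pool-Adjacent-Violators) extends the pool-adjacent-violators idea from a linear order to general partial orders such as a DAG. As a minimal illustration only, the sketch below handles the special case where the hierarchy is a single root-to-leaf path. In that case, classic PAV returns the exact weighted least-squares fit under the constraint that scores do not increase from root to leaf. This is not the package's GPAV implementation. The optional weights `w` are a hypothetical parameter of this sketch and are not claimed to correspond to any argument above.

```python
def pav_non_increasing(y, w=None):
    """Weighted least-squares fit of y subject to x[0] >= x[1] >= ... >= x[-1].

    y lists the bottom-up scores along one root-to-leaf path (root first).
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted sum, total weight, number of nodes]
    for yi, wi in zip(y, w):
        blocks.append([wi * yi, wi, 1])
        # Pool adjacent blocks while a child block's mean exceeds its parent block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            s, wt, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += wt
            blocks[-1][2] += n
    fitted = []
    for s, wt, n in blocks:
        fitted.extend([s / wt] * n)
    return fitted


print(pav_non_increasing([0.3, 0.5, 0.2]))  # root, intermediate class, leaf
```

Here the intermediate class (0.5) violates the true path rule with respect to the root (0.3). Both are pooled to their mean 0.4, and the leaf (0.2) is left untouched.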
Value
a named matrix with the scores of the classes corrected according to the chosen algorithm.