tpr.dag.cv: TPR-DAG cross-validation experiments


View source: R/tpr.dag.R

Description

Correct the computed scores in a hierarchy according to a TPR-DAG ensemble variant.

Usage

tpr.dag.cv(
  S,
  g,
  ann,
  norm = FALSE,
  norm.type = NULL,
  positive = "children",
  bottomup = "threshold",
  topdown = "gpav",
  W = NULL,
  parallel = FALSE,
  ncores = 1,
  threshold = seq(from = 0.1, to = 0.9, by = 0.1),
  weight = 0,
  kk = 5,
  seed = 23,
  metric = "auprc",
  n.round = NULL
)

Arguments

S

a named flat scores matrix with examples on rows and classes on columns.

g

a graph of class graphNEL. It represents the hierarchy of the classes.

ann

an annotation matrix: rows correspond to examples and columns to classes. ann[i,j]=1 if example i belongs to class j, ann[i,j]=0 otherwise. The ann matrix is needed to tune the hyper-parameter(s) of the chosen parametric TPR-DAG ensemble variant with respect to the metric selected in metric. For the parameter-free ensemble variant set ann=NULL.

norm

a boolean value. Should the flat score matrix be normalized? By default norm=FALSE. If norm=TRUE the matrix S is normalized according to the normalization type selected in norm.type.

norm.type

a character string. It can be one of the following values:

  1. NULL (def.): no normalization is applied (norm=FALSE);

  2. maxnorm: each score is divided by the maximum score of its class (scores.normalization);

  3. qnorm: quantile normalization. The preprocessCore package is used (scores.normalization);
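
As a rough illustration of the maxnorm option, the sketch below divides each column of a toy score matrix by that column's maximum, using base R only. The matrix S.toy and its values are made up for the example, and every class is assumed to have a nonzero maximum; this is not the package's scores.normalization code.

```r
# Toy flat scores matrix: 3 examples (rows) x 2 classes (columns); values are made up.
S.toy <- matrix(c(0.2, 0.4, 0.8,
                  0.1, 0.5, 0.25),
                nrow = 3,
                dimnames = list(c("ex1", "ex2", "ex3"), c("classA", "classB")))

# Maximum score of each class (column); assumed to be nonzero here.
col.max <- apply(S.toy, 2, max)

# Divide every column by its own maximum, so the top score of each class becomes 1.
S.maxnorm <- sweep(S.toy, 2, col.max, "/")
print(S.maxnorm)
```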

positive

choice of the positive nodes to be considered in the bottom-up strategy. Can be one of the following values:

  • children (def.): positive children are considered for each node;

  • descendants: positive descendants are considered for each node;

bottomup

strategy to enhance the flat predictions by propagating the positive predictions from leaves to root. It can be one of the following values:

  • threshold.free: positive nodes are selected on the basis of the threshold.free strategy;

  • threshold (def.): positive nodes are selected on the basis of the threshold strategy;

  • weighted.threshold.free: positive nodes are selected on the basis of the weighted.threshold.free strategy;

  • weighted.threshold: positive nodes are selected on the basis of the weighted.threshold strategy;

  • tau: positive nodes are selected on the basis of the tau strategy. NOTE: tau is only a DESCENS variant. If you select tau strategy you must set positive=descendants;
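
To make the threshold strategy more concrete, here is a minimal sketch for a single node and its direct children. It assumes the update commonly described for TPR ensembles: children whose score exceeds the threshold count as positive, and the node's score becomes the average of its own flat score and the scores of those positive children. The function name bottomup.threshold.sketch and the numbers are hypothetical, and the package's implementation may differ in detail.

```r
# Hypothetical helper (not part of the package): one bottom-up update for one node.
# Assumption: positive children are those with a score strictly above the threshold t.
bottomup.threshold.sketch <- function(node.score, child.scores, t) {
  positive <- child.scores[child.scores > t]
  # Average of the node's own score and the scores of its positive children.
  (node.score + sum(positive)) / (1 + length(positive))
}

# Two of the three children are above t = 0.5, so they contribute to the update.
updated <- bottomup.threshold.sketch(0.3, c(c1 = 0.9, c2 = 0.6, c3 = 0.2), t = 0.5)
print(updated)

# With no positive children the flat score is left unchanged.
unchanged <- bottomup.threshold.sketch(0.3, c(c1 = 0.1, c2 = 0.2), t = 0.5)
print(unchanged)
```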

topdown

strategy to make the scores hierarchy-consistent. It can be one of the following values:

  • htd: HTD-DAG strategy is applied (htd);

  • gpav (def.): GPAV strategy is applied (gpav);
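
Hierarchy consistency means that a class never scores higher than any of its parents. The sketch below shows one simple way to enforce this on a toy DAG, in the spirit of the htd option: nodes are visited top-down and each score is capped at the minimum score of its parents. The parents list, the scores and the function name htd.sketch are assumptions made for the example; this is not the package's htd or gpav code.

```r
# Hypothetical toy DAG given as a list of parents; nodes are listed in top-down order.
parents <- list(root = character(0), A = "root", B = "root", C = c("A", "B"))
scores <- c(root = 1.0, A = 0.4, B = 0.7, C = 0.9)

# Cap each node's score at the minimum of its (already processed) parents' scores.
htd.sketch <- function(scores, parents) {
  for (node in names(parents)) {
    pa <- parents[[node]]
    if (length(pa) > 0) {
      scores[node] <- min(scores[node], min(scores[pa]))
    }
  }
  scores
}

consistent <- htd.sketch(scores, parents)
print(consistent)
```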

W

vector of weights relative to a single example. If W=NULL (def.), W is assumed to be a vector of ones whose length equals the number of columns of the matrix S (root node included). Set W only if topdown=gpav.

parallel

a boolean value:

  • TRUE: execute the parallel implementation of GPAV (gpav.parallel);

  • FALSE (def.): execute the sequential implementation of GPAV (gpav.over.examples);

Use parallel only if topdown=gpav; otherwise set parallel=FALSE.

ncores

number of cores to use for parallel execution. Set ncores=1 if parallel=FALSE, otherwise set ncores to the desired number of cores. Set ncores only if topdown=gpav; otherwise leave ncores=1.

threshold

range of threshold values to be tested in order to find the best threshold (def: from:0.1, to:0.9, by:0.1). A denser range makes it more likely to find the best threshold, at the cost of a longer execution time. For the threshold-free variants, set threshold=0.
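
The trade-off between grid density and running time can be seen by counting the candidates. The sketch below compares the default grid with a denser one; the step 0.05 is only an illustrative choice. Since each candidate is evaluated in turn, the tuning cost grows roughly in proportion to the grid size.

```r
# Default grid from the Usage section.
default.grid <- seq(from = 0.1, to = 0.9, by = 0.1)

# A denser grid (illustrative step): about twice as many candidates to evaluate.
dense.grid <- seq(from = 0.1, to = 0.9, by = 0.05)

cat("default grid size:", length(default.grid), "\n")
cat("denser grid size: ", length(dense.grid), "\n")
```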

weight

range of weight values to be tested in order to find the best weight (the default shown in Usage is weight=0; a range such as from:0.1, to:0.9, by:0.1 can be used for the weighted variants). A denser range makes it more likely to find the best weight, at the cost of a longer execution time. For the weight-free variants, set weight=0.

kk

number of folds of the cross-validation (def: kk=5) used to tune the parameters threshold, weight and tau of the parametric ensemble variants. For the parameter-free variants (i.e. if bottomup=threshold.free), set kk=NULL.

seed

initialization seed for the random generator to create folds (def. 23). If seed=NULL folds are generated without seed initialization. If bottomup=threshold.free, set seed=NULL.
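
The role of seed can be illustrated with a small base R sketch: fixing the seed before shuffling makes the fold assignment reproducible across runs. The helper make.folds.sketch is hypothetical and builds plain random folds; the package may create its folds differently (for instance, taking the annotations into account).

```r
# Hypothetical helper (not part of the package): split n examples into kk random folds.
make.folds.sketch <- function(n, kk, seed = NULL) {
  # Initialize the random generator only when a seed is given.
  if (!is.null(seed)) set.seed(seed)
  shuffled <- sample(n)
  # Assign the shuffled example indices to folds 1..kk in turn.
  split(shuffled, rep(seq_len(kk), length.out = n))
}

# The same seed gives the same folds on every call.
folds.a <- make.folds.sketch(n = 20, kk = 5, seed = 23)
folds.b <- make.folds.sketch(n = 20, kk = 5, seed = 23)
print(identical(folds.a, folds.b))
```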

metric

a character string specifying the performance metric to be maximized when tuning the parametric ensemble variant. It can be one of the following values:

  1. auprc (def.): the parametric ensemble variant is maximized on the basis of AUPRC (auprc);

  2. fmax: the parametric ensemble variant is maximized on the basis of Fmax (multilabel.F.measure);

  3. NULL: the threshold.free variant is parameter-free, so no optimization is needed.

n.round

number of rounding digits (def. 3) to be applied to the hierarchical scores matrix for choosing the best threshold on the basis of the best Fmax. If bottomup=threshold.free or metric=auprc, set n.round=NULL.

Details

The parametric hierarchical ensemble variants are cross-validated: their parameter(s) are tuned so as to maximize the metric selected in metric.
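
The general idea of tuning a parameter by cross-validation can be sketched in base R as follows. It is assumed here that, for each fold, the candidate value maximizing the metric on the remaining folds is selected and then applied to the held-out fold. The functions correct and metric, and all data, are hypothetical stand-ins chosen to keep the example short; they are not the package's ensemble correction, AUPRC or Fmax.

```r
# Hypothetical stand-ins: `correct` mimics a threshold-dependent score correction,
# `metric` scores corrected values against the true labels (higher is better).
correct <- function(parent, child, t) ifelse(child > t, (parent + child) / 2, parent)
metric <- function(pred, truth) -mean((pred - truth)^2)

# Made-up data: 40 examples with a parent score, a child score and a 0/1 label.
set.seed(23)
n <- 40
truth <- rep(c(0, 1), each = n / 2)
parent <- runif(n)
child <- ifelse(truth == 1, runif(n, 0.5, 1), runif(n, 0, 0.5))

grid <- seq(from = 0.1, to = 0.9, by = 0.1)
kk <- 5
fold.id <- sample(rep(seq_len(kk), length.out = n))

corrected <- numeric(n)
best.t <- numeric(kk)
for (k in seq_len(kk)) {
  train <- fold.id != k
  test <- !train
  # Evaluate every candidate threshold on the training folds only.
  perf <- sapply(grid, function(t) {
    metric(correct(parent[train], child[train], t), truth[train])
  })
  best.t[k] <- grid[which.max(perf)]
  # Apply the selected threshold to the held-out fold.
  corrected[test] <- correct(parent[test], child[test], best.t[k])
}
print(best.t)
```

Selecting the parameter on the training folds and applying it to the held-out fold keeps the corrected scores of each example independent of that example's own label.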

Value

A named matrix with the scores of the functional terms corrected according to the chosen TPR-DAG ensemble algorithm.

Examples

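library(HEMDAG)
# Load the example data shipped with the package.
data(graph)
data(scores)
data(labels)
# Threshold-free variant: nothing to tune, so ann, kk, seed and metric are set to NULL.
S.tpr <- tpr.dag.cv(S, g, ann = NULL, norm = FALSE, norm.type = NULL,
  positive = "children", bottomup = "threshold.free", topdown = "gpav",
  W = NULL, parallel = FALSE, ncores = 1, threshold = 0, weight = 0,
  kk = NULL, seed = NULL, metric = NULL, n.round = NULL)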

HEMDAG documentation built on Feb. 12, 2021, 5:13 p.m.