Do.GPAV: GPAV - High Level Function


View source: R/GPAV.R

Description

High level function to correct the computed scores in a hierarchy according to the GPAV algorithm.

Usage

Do.GPAV(norm = TRUE, norm.type = NULL, W = NULL, parallel = FALSE,
  ncores = 1, folds = 5, seed = 23, n.round = 3,
  f.criterion = "F", recall.levels = seq(from = 0.1, to = 1, by = 0.1),
  compute.performance = FALSE, flat.file = flat.file,
  ann.file = ann.file, dag.file = dag.file, flat.dir = flat.dir,
  ann.dir = ann.dir, dag.dir = dag.dir,
  hierScore.dir = hierScore.dir, perf.dir = perf.dir)

Arguments

norm

boolean value:

  • TRUE (def.): the flat scores matrix has already been normalized according to a normalization method;

  • FALSE: the flat scores matrix has not been normalized yet. See the parameter norm.type for which normalization can be applied;

norm.type

can be one of the following three values:

  1. NULL (def.): set norm.type to NULL if and only if the parameter norm is set to TRUE;

  2. MaxNorm: each score is divided by the maximum score of its class;

  3. Qnorm: quantile normalization, performed with the preprocessCore package;
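
To make the MaxNorm option concrete, the following is a minimal base R sketch of dividing each score by the maximum of its class. It assumes classes are on columns and uses a made-up toy matrix; it illustrates the idea and is not necessarily how the package implements the normalization.

```r
# Toy flat scores matrix (made-up values): examples on rows, classes on columns.
S <- matrix(c(0.2, 0.4, 0.8,
              1.0, 2.0, 4.0),
            nrow = 3,
            dimnames = list(c("ex1", "ex2", "ex3"), c("classA", "classB")))

# Maximum score of each class (column).
class.max <- apply(S, 2, max)

# Guard against division by zero for classes whose maximum is 0
# (an assumption of this sketch, not taken from the package).
class.max[class.max == 0] <- 1

# Divide every column by its own maximum.
S.norm <- sweep(S, 2, class.max, "/")
```

After this step the largest score in every class equals 1, so scores of different classes lie on a comparable scale.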

W

vector of weights relative to a single example. If W is not specified (def. W=NULL), a unitary vector is used, whose length equals the number of columns of the flat scores matrix (root node included).

parallel

boolean value:

  • TRUE: execute the parallel implementation of GPAV (GPAV.parallel);

  • FALSE (def.): execute the sequential implementation of GPAV (GPAV.over.examples);

ncores

number of cores to use for parallel execution (def. 1). Set ncores to 1 if parallel is set to FALSE; otherwise set it to the desired number of cores.

folds

number of cross-validation folds over which the performance metrics are averaged (def. 5). If folds=NULL, the performance metrics are computed one-shot; otherwise they are averaged across folds. If compute.performance is set to FALSE, folds is automatically set to NULL.

seed

initialization seed for the random generator used to create the folds (def. 23). If NULL, folds are generated without seed initialization. The parameter seed controls both the parameter kk and the parameter folds. If compute.performance is set to FALSE and bottomup is set to threshold.free, then seed is automatically set to NULL.

n.round

number of rounding digits to be applied to the hierarchical scores matrix (def. 3). It is used for choosing the best threshold on the basis of the best F-measure. If compute.performance is set to FALSE and bottomup is set to threshold.free, then n.round is automatically set to NULL.

f.criterion

character. Type of F-measure to be used to select the best F-measure. Two possibilities:

  1. F (def.): corresponds to the harmonic mean between the average precision and recall;

  2. avF: corresponds to the per-example F-score averaged across all the examples;

If compute.performance is set to FALSE and bottomup is set to threshold.free, then f.criterion is automatically set to NULL.
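
The difference between the two criteria can be seen on a small example. The base R sketch below uses made-up binary target and prediction matrices and a hypothetical helper `harmonic`; it only illustrates the two definitions given above, and the package's own computation may differ in details such as thresholding or the handling of examples without annotations.

```r
# Toy binary matrices (made-up): examples on rows, classes on columns.
target <- matrix(c(1, 1, 0,
                   1, 0, 0), nrow = 2, byrow = TRUE)
pred   <- matrix(c(1, 0, 0,
                   1, 1, 1), nrow = 2, byrow = TRUE)

# Harmonic mean of precision and recall (hypothetical helper);
# returns 0 when both are 0.
harmonic <- function(p, r) ifelse(p + r == 0, 0, 2 * p * r / (p + r))

# Per-example true positives, precision and recall.
tp        <- rowSums(pred == 1 & target == 1)
precision <- tp / rowSums(pred)
recall    <- tp / rowSums(target)

# "F": harmonic mean of the average precision and the average recall.
F.crit <- harmonic(mean(precision), mean(recall))

# "avF": per-example F-score averaged across all the examples.
avF.crit <- mean(harmonic(precision, recall))
```

On this toy input the two criteria give different values (12/17 for F and 7/12 for avF), so the choice of f.criterion can change which threshold is selected as the best one.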

recall.levels

a vector with the desired recall levels (def: from:0.1, to:1, by:0.1) at which to compute the Precision at fixed Recall level (PXR). If compute.performance=FALSE, the parameter recall.levels is automatically set to NULL.

compute.performance

boolean value: should the flat and hierarchical performance measures (AUPRC, AUROC, PXR, multilabel F-score) be returned?

  • FALSE (def.): performance measures are not computed and only the hierarchical scores matrix is returned;

  • TRUE: both the performance measures and the hierarchical scores matrix are returned;

flat.file

name of the file containing the flat scores matrix to be normalized or already normalized (without rda extension).

ann.file

name of the file containing the label matrix of the examples (without rda extension).

dag.file

name of the file containing the graph that represents the hierarchy of the classes (without rda extension).

flat.dir

relative path where the flat scores matrix is stored.

ann.dir

relative path where the annotation matrix is stored.

dag.dir

relative path where the graph is stored.

hierScore.dir

relative path where the hierarchical scores matrix must be stored.

perf.dir

relative path where the performance measures must be stored. If compute.performance=FALSE, the function automatically sets perf.dir to NULL.

Details

The function checks whether the number of classes in the flat scores matrix differs from the number of classes in the annotations matrix. If so, the annotations matrix is shrunk to the terms of the flat scores matrix and the corresponding subgraph is computed as well. N.B.: it is assumed that all the nodes of the subgraph are accessible from the root.
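
The shrinking of the annotations matrix can be sketched in base R as follows. The toy matrices S and L are made up, and the sketch assumes that the class names are stored as column names and that every class of the flat scores matrix also appears in the annotations matrix. The subgraph step is left out, since it depends on how the graph is represented.

```r
# Toy inputs (made-up): the flat scores matrix has 2 classes,
# the annotations matrix has 3.
S <- matrix(c(1.0, 1.0, 0.7, 0.2), nrow = 2,
            dimnames = list(c("ex1", "ex2"), c("root", "A")))
L <- matrix(c(1, 1, 1, 0, 0, 1), nrow = 2,
            dimnames = list(c("ex1", "ex2"), c("root", "A", "B")))

if (ncol(S) != ncol(L)) {
  shared <- colnames(S)
  # Assumption of this sketch: every class of S is annotated in L.
  stopifnot(all(shared %in% colnames(L)))
  # Keep only the classes of S, in the same column order as S.
  L <- L[, shared, drop = FALSE]
}
```

Using `drop = FALSE` keeps L a matrix even when a single class survives the selection.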

Value

Two rda files stored in the respective output directories:

  1. Hierarchical Scores Results: a matrix with examples on rows and classes on columns representing the computed hierarchical scores for each example and for each considered class. It is stored in the hierScore.dir directory.

  2. Performance Measures: flat and hierarchical performance results:

    1. AUPRC results computed through AUPRC.single.over.classes (AUPRC);

    2. AUROC results computed through AUROC.single.over.classes (AUROC);

    3. PXR results computed through precision.at.given.recall.levels.over.classes (PXR);

    4. FMM results computed through compute.Fmeasure.multilabel (FMM);

The performance measures file is stored in the perf.dir directory.
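
Since the results are written to rda files rather than returned as R objects, they have to be loaded back to be inspected. The sketch below shows one generic way to do this in base R. The directory name, the file name example.rda and the object S.hier are hypothetical and are created here only to make the sketch runnable; the names actually produced by the function are not specified on this page.

```r
# Hypothetical output directory and file, created only so the sketch runs.
out.dir <- file.path(tempdir(), "gpav-out")
dir.create(out.dir, showWarnings = FALSE)
S.hier <- matrix(c(0.9, 0.4, 0.7, 0.1), nrow = 2)
save(S.hier, file = file.path(out.dir, "example.rda"))

# List the rda files found in the directory.
rda.files <- list.files(out.dir, pattern = "\\.rda$", full.names = TRUE)

# Load the first file into a fresh environment, so that objects in the
# workspace are not overwritten; load() returns the names of the loaded objects.
env <- new.env()
loaded <- load(rda.files[1], envir = env)

# Retrieve the first loaded object by name.
result <- get(loaded[1], envir = env)
```

Loading into a separate environment is a design choice that makes it easy to discover what an rda file contains before touching the global workspace.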

See Also

GPAV

Examples

data(graph);
data(scores);
data(labels);
tmpdir <- paste0(tempdir(),"/");
save(g, file=paste0(tmpdir,"graph.rda"));
save(L, file=paste0(tmpdir,"labels.rda"));
save(S, file=paste0(tmpdir,"scores.rda"));
dag.dir <- flat.dir <- ann.dir <- tmpdir;
hierScore.dir <- perf.dir <- tmpdir;
recall.levels <- seq(from=0.25, to=1, by=0.25);
dag.file <- "graph";
flat.file <- "scores";
ann.file <- "labels";
Do.GPAV(norm=FALSE, norm.type="MaxNorm", W=NULL, parallel=FALSE, ncores=1, folds=NULL,
  seed=23, n.round=3, f.criterion="F", recall.levels=recall.levels, compute.performance=TRUE,
  flat.file=flat.file, ann.file=ann.file, dag.file=dag.file, flat.dir=flat.dir, ann.dir=ann.dir,
  dag.dir=dag.dir, hierScore.dir=hierScore.dir, perf.dir=perf.dir);

gecko515/HEMDAG documentation built on Oct. 18, 2019, 6:34 a.m.