inst/shiny-server/markdown/dataformat.md

Supported Data Format

Users are encouraged to analyze their own benchmark data using IOHanalyzer. For this purpose, users have to prepare their data sets according to the supported data formats. At the time of writing, three data formats are supported by IOHanalyzer:

When loading data through the programming interface (or through the graphical user interface), it is not necessary to specify which format was used, as IOHanalyzer detects the format automatically. For all three data formats, data files are organized in the same manner within the file system. The structure of the data files is exemplified as follows:

Generally, in the folder (e.g., ./ here) that contains the data set, the following files are mandatory for IOHanalyzer:

Meta-data

When benchmarking, it is common to specify a number of different dimensions, functions and instances, which results in a large number of data files (e.g., *.dat files). Providing meta data makes the data organization more structured. Here, the meta data are implemented in a format that is very similar to the one used in the well-known COCO environment [1]. Meta-data files are indicated by the suffix .info. A small example is provided as follows:

suite = 'PBO', funcId = 10, DIM = 100, algId = '(1+1) fGA'
%
data_f10/IOHprofiler_f10_DIM625.dat, 1:1953125|5.59000e+02,
1:1953125|5.59000e+02, 1:1953125|5.59000e+02, 1:1953125|5.54000e+02,
1:1953125|5.59000e+02, 1:1953125|5.64000e+02, 1:1953125|5.54000e+02,
1:1953125|5.59000e+02, 1:1953125|5.49000e+02, 1:1953125|5.54000e+02,
1:1953125|5.49000e+02
suite = 'PBO', funcId = 10, DIM = 625, algId = '(1+1) fGA'
%
data_f10/IOHprofiler_f10_DIM625.dat, 1:1953125|5.59000e+02,
1:1953125|5.59000e+02, 1:1953125|5.59000e+02, 1:1953125|5.54000e+02,
1:1953125|5.59000e+02, 1:1953125|5.64000e+02, 1:1953125|5.54000e+02,
1:1953125|5.59000e+02, 1:1953125|5.49000e+02, 1:1953125|5.54000e+02,
1:1953125|5.49000e+02
...

Note that, because IOHanalyzer also uses this meta information when loading the data, users who convert their own data sets must pay close attention to the format of the meta data.
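To make the layout of the example above concrete, the following Python sketch reads such meta-data text into records. It is not part of IOHanalyzer: the function name `parse_info` is hypothetical, and the reading of each entry as `instance:evaluations|value` is an assumption borrowed from the COCO convention rather than something stated here.

```python
import re


def parse_info(text):
    """Parse COCO-style meta-data text into a list of records (illustrative only)."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and '%' comment lines
        if re.match(r"\w+\s*=", line):
            # Header line: comma-separated "key = value" pairs.
            header = {}
            for key, value in re.findall(r"(\w+)\s*=\s*('[^']*'|[^,]+)", line):
                header[key] = value.strip().strip("'")  # values are kept as strings
            current = {"header": header, "file": None, "runs": []}
            records.append(current)
            continue
        if current is None:
            continue  # data line without a preceding header: ignored
        for token in line.split(","):
            token = token.strip()
            if not token:
                continue
            if "|" in token:
                # Assumed meaning: instance:evaluations|value
                left, value = token.split("|")
                instance, evals = left.split(":")
                current["runs"].append((int(instance), int(evals), float(value)))
            else:
                # The first entry of a data line is the path of the raw-data file.
                current["file"] = token
    return records
```

Each record holds the header fields, the relative path of the raw-data file, and one tuple per run, so a converter could use a reader like this to check its own .info output before loading it into IOHanalyzer.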

Raw-data

Although different events trigger the recording in those four types of data file, the files share the same format, which is adapted from the CSV format to accommodate multiple runs/instances. The format is illustrated by the example below (with dummy data records):

"function evaluation"  "current f(x)" "best-so-far f(x)"  "current af(x)+b" "best af(x)+b" "parameter name"  ...
1  +2.95000e+02  +2.95000e+02  +2.95000e+02  +2.95000e+02  0.000000  ...
2  +2.96000e+02  +2.96000e+02  +2.96000e+02  +2.96000e+02  0.001600  ...
4  +3.07000e+02  +3.07000e+02  +3.07000e+02  +3.07000e+02  0.219200  ...
9  +3.11000e+02  +3.11000e+02  +3.11000e+02  +3.11000e+02  0.006400  ...
12  +3.12000e+02  +3.12000e+02  +3.12000e+02  +3.12000e+02  0.001600  ...
16  +3.16000e+02  +3.16000e+02  +3.16000e+02  +3.16000e+02  0.006400  ...
20  +3.17000e+02  +3.17000e+02  +3.17000e+02  +3.17000e+02  0.001600  ...
23  +3.28000e+02  +3.28000e+02  +3.28000e+02  +3.28000e+02  0.027200  ...
27  +3.39000e+02  +3.39000e+02  +3.39000e+02  +3.39000e+02  0.059200  ...
"function evaluation"  "current f(x)" "best-so-far f(x)"  "current af(x)+b" "best af(x)+b" "parameter name"  ...
1   +3.20000e+02  +3.20000e+02  +3.20000e+02  +3.20000e+02  1.000000  ...
24  +3.44000e+02  +3.44000e+02  +3.44000e+02  +3.44000e+02  2.000000  ...
60  +3.64000e+02  +3.64000e+02  +3.64000e+02  +3.64000e+02  3.000000  ...
"function evaluation"  "current f(x)" "best-so-far f(x)"  "current af(x)+b" "best af(x)+b" "parameter name"  ...
...  ... ... ... ...  ...  ...

Note that,

  1. [mandatory] Each separation line (a line that starts with "function evaluation") serves as a separator between independent runs of the same algorithm. Therefore, the data block between two separation lines corresponds to a single run on a triplet of (dimension, function, instance).
  2. [mandatory] "function evaluation" is the current number of function evaluations.
  3. [mandatory] "best-so-far f(x)" keeps track of the best function value observed since the beginning of one run.
  4. [optional] "current f(x)" stands for the function value observed when the corresponding number of function evaluations has been consumed.
  5. [optional] The values stored under "current af(x)+b" and "best af(x)+b" are so-called transformed function values, obtained on function instances that are generated by translating the original function in its domain and co-domain.
  6. [optional] In addition, a parameter value (in the column "parameter name") is tracked in this example; recording more parameter values is also supported (see below).
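The role of the separation line can be illustrated with a short Python sketch that splits raw-data text into independent runs. This is a hedged illustration, not the reader used by IOHanalyzer; the function name `split_runs` is hypothetical, and it assumes that columns are separated by whitespace and that column names are enclosed in double quotes, as in the example above.

```python
import re


def split_runs(text):
    """Split raw-data text into runs; each run is a list of row dictionaries."""
    runs = []
    columns = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('"function evaluation"'):
            # A separation line starts a new run and names the columns.
            columns = re.findall(r'"([^"]*)"', line)
            runs.append([])
            continue
        if columns is None:
            continue  # data before the first separation line is ignored
        values = line.split()
        row = {name: float(value) for name, value in zip(columns, values)}
        # The first column counts function evaluations, so keep it as an integer.
        row["function evaluation"] = int(values[0])
        runs[-1].append(row)
    return runs
```

Because every run carries its own header, a file whose runs record different optional columns can still be read block by block.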

Two-Column Format

In its simplest two-column format, the raw data file should resemble the following example:

"function evaluation" "best-so-far f(x)"
1  +2.95000e+02
2  +2.96000e+02
4  +3.07000e+02  
23  +3.28000e+02
27  +3.39000e+02
"function evaluation" "best-so-far f(x)"  
1   +3.20000e+02
...  ...
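For users converting their own results, the two-column layout above could be produced with a sketch such as the following. The function name `format_two_column` is hypothetical and not part of IOHanalyzer; the number formatting (sign plus five decimals in scientific notation) simply mirrors the example and is an assumption, not a documented requirement.

```python
def format_two_column(runs):
    """Render runs as two-column raw-data text.

    `runs` is a list of runs; each run is a list of
    (function_evaluation, best_so_far_value) pairs.
    """
    header = '"function evaluation" "best-so-far f(x)"'
    lines = []
    for run in runs:
        lines.append(header)  # one separation line per independent run
        for evaluations, best_so_far in run:
            lines.append(f"{evaluations}  {best_so_far:+.5e}")
    return "\n".join(lines) + "\n"
```

For instance, `format_two_column([[(1, 295.0), (2, 296.0)], [(1, 320.0)]])` yields two blocks, each introduced by its own separation line.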

This format is regulated as follows:

Reference

[1] Hansen N, Auger A, Finck S, Ros R (2009a). "Real-Parameter Black-Box Optimization Benchmarking 2009: Experimental Setup." Research Report RR-6828, INRIA. URL https://hal.inria.fr/inria-00362649.



IOHprofiler/IOHanalyzer documentation built on Feb. 1, 2024, 11:35 a.m.