knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
collectResults()
is a powerful function and the aim of this vignette is to
explain its options in more detail.
collectResults()
creates the following database tables:
experiments
: Which experiments are stored in the DB?scenarios
: Which scenarios per experiment are available?placeholders
: What placeholders are used per scenario?scenarios_metadata
: What metadata is available per scenario?results
: OpenMalaria output (per scenario).Please note that all data are identified by experiment and possibly by scenario.
The table containing the simulation output (results) can be renamed by using the
resultsName
argument and its layout can be adjusted as needed via
resultsCols
. resultsCols
needs to be a list where the names
entry defines
the column names and the types
entry the SQLite column types.
The creation of an index can be controlled by the indexOn
argument, which
accepts a list of vectors. Each vector is a pair of the SQLite table and column
name the index should be created on.
collectResults()
is performing three steps during the result collection:
These three steps can be modified by the user and allow changing the behavior of
collectResults()
. Users can write their own functions with nearly no
limitations and can pass them to collectResults()
's dedicated arguments.
fileFun
's job is to return a vector of scenario XML file names. By default,
the file
column of the scenario data frame is loaded from the cache. Usually
this would be a function which applies filtering to the scenarios data frame in
order to determine the files.
readFun
will called upon each found output file and has to return a data frame
which is used by aggrFun
or put into the database. readOutputFile()
should
be sufficient for most uses. In case you want to write your own implementation,
make sure that the first argument the function accepts is the file name and it
has R's ellipsis ...
in the arguments. This is necessary so the scenario ID
can be passed automatically and can be used by your function via scenID
.
aggrFun
is probably the function most users want to ajust to their needs and
where we do not provide any default. It needs to accept the data frame generated
from readFun
as a first argument and should provide the final data frame which
will be put into the database. This data frame must not have a column named
experiment_id
and can contain a column named scenario_id
. Furthermore, the
output needs to match the layout defined in resultsCols
.
The arguments for the above three functions have to be (named) list. If you need
to pass unevaluated arguments, consider using quote()
and bquote()
.
collectResults()
will try to remove all quotes before the arguments are passed
to the functions.
Users can control the number of CPU cores used for the calculations by the parameters
ncores
and ncoresDT
. Similar to the scenario generation and simulation step, ncores
determines how many R cluster nodes will be launched to perform the calculations. ncoresDT
sets
the number of threads available for data.table
, which is used by us extensively. By default this
is set to one in order to avoid nested parallelization.
The strategy
argument accepts "serial"
and "batch"
as input values and
defines how collectResults()
is processing the files.
"batch"
means that we will launch an R cluster to read all files in parallel
into one single data table. Then we will apply aggregation function and add the
whole output batch to the database. This process is the fastest by our
experience when using SQLite but can require a lot of memory, depending on how
many scenarios are aggregated.
"serial"
means that we will launch an R cluster and each node will read a
file, apply the aggregation function and send the result to the database. With
SQLite, this process has been slower than the "batch"
stragety but also less
memory intensive. We expect it to be quite performant when a database system
like PostgresSQL is used which allows multiple simultaneous write connections.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.