e2e_merge_sens_mc: Combine two or more sets of raw output data from parallel...


View source: R/e2e_merge_sens_mc.R

Description

The functions e2e_run_sens() and e2e_run_mc() are extremely time consuming, so it makes sense to share the load across multiple processors in parallel and combine the results afterwards. This function merges the raw outputs from multiple separate runs of either function into a single file. This is not as simple as concatenating the files, because the unique identities of the sets of trajectories must be tracked.

Usage

e2e_merge_sens_mc(
  model,
  selection = "",
  ident.list,
  postprocess = TRUE,
  csv.output = FALSE
)

Arguments

model

Model object for the raw data to be combined generated by the e2e_read() function.

selection

Text string from a list identifying source of data to be merged. Select from: "SENS", "MC", referring to sensitivity analysis or Monte Carlo analysis. Remember to include the phrase within "" quotes.

ident.list

A vector of text variables corresponding to the "model.ident" identifiers for each of the files to be merged (the vector must be of length 2 or greater).

postprocess

Logical. If TRUE then process the results through to final data; if FALSE just produce the combined raw results (default=TRUE). The reason for NOT processing would be if there are further run results still to be combined with the set produced by this function.

csv.output

Logical. If TRUE then enable writing of CSV output files (default=FALSE).

Details

The files to be combined must be transferred into the same folder, and this is where the new combined files will be placed. The path to locate the files is set in an e2e_read() function call. If not specified, it is assumed that the files are located in the current temporary folder.

An identifying text string for the new combined files is set by the 'model.ident' argument in an e2e_read() function call.

The list of files to be combined (any number > 1) is defined by a vector of their individual "model.ident" identifiers ("ident.list" argument).

When combining the files, the function creates a seamless sequence of trajectory identifiers through the combined data, beginning from 1 for the first baseline (maximum likelihood) trajectory of the first set.
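The renumbering idea can be illustrated with a minimal base-R sketch. It is not the package's implementation: the data frames, the column name trajectoryid and the helper renumber_batches() are all hypothetical stand-ins for the raw result sets.

```r
# Toy stand-ins for two raw result sets; the column names are hypothetical.
batch1 <- data.frame(trajectoryid = rep(1:3, each = 2), value = 1:6)
batch2 <- data.frame(trajectoryid = rep(1:3, each = 2), value = 7:12)

# Shift each later batch by the highest identifier seen so far, so that
# identifiers in the combined data run on without gaps or repeats.
renumber_batches <- function(batches, id_col = "trajectoryid") {
  offset <- 0
  for (i in seq_along(batches)) {
    batches[[i]][[id_col]] <- batches[[i]][[id_col]] + offset
    offset <- max(batches[[i]][[id_col]])
  }
  do.call(rbind, batches)
}

combined <- renumber_batches(list(batch1, batch2))
unique(combined$trajectoryid)   # identifiers 1 to 6
```

Simply concatenating the two toy sets with rbind() would instead leave identifiers 1 to 3 appearing twice, which is the ambiguity the merge has to avoid.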

If for any reason there is a need to combine separate batches of multiple run results, then post-processing can be delayed with the 'postprocess' argument until the last merge, when all the data have been gathered together. Stand-alone post-processing can be performed using the function e2e_process_sens_mc().

Details relating to merging sensitivity analysis files:

e2e_run_sens() generates two output files per run: OAT_results-*.csv and OAT_parameter_values-*.csv, where * is a model.ident text string set as an argument of the e2e_read() function. This function merges both of these types of files.
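Before merging, it can be useful to check that both files exist for every identifier. The following is a rough sketch in base R only; the helper sens_file_names() and its folder argument are hypothetical, and the file name patterns are taken from the description above. The actual location of the files within the results path may differ, so adjust folder accordingly.

```r
# Build the two sensitivity-analysis file names expected for each identifier.
# 'folder' is an assumption: point it at wherever the raw csv files were written.
sens_file_names <- function(ident.list, folder = tempdir()) {
  stopifnot(is.character(ident.list), length(ident.list) >= 2)
  data.frame(
    ident      = ident.list,
    results    = file.path(folder, paste0("OAT_results-", ident.list, ".csv")),
    parameters = file.path(folder, paste0("OAT_parameter_values-", ident.list, ".csv")),
    stringsAsFactors = FALSE
  )
}

files <- sens_file_names(c("BATCH1", "BATCH2"))

# List any expected files that are not present in the folder.
all_files <- unlist(files[c("results", "parameters")], use.names = FALSE)
not_found <- all_files[!file.exists(all_files)]
basename(not_found)
```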

When combining sensitivity analysis data, the first-named model.ident in the ident.list vector MUST correspond to a run of the e2e_run_sens() function with the argument coldstart=TRUE, and all others to runs with coldstart=FALSE. This forces the first trajectory of the sensitivity test to be performed on the baseline (e.g. maximum-likelihood) parameter set loaded with the initial e2e_read() function call. This is important for the post-processing stage of the analysis, which needs to be performed on the combined results.
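One way to guard against ordering mistakes is to keep a record of the coldstart setting used for each batch and check it before building ident.list. This is only a sketch: the record coldstart_used and the helper check_ident_order() are hypothetical and not part of the package.

```r
# Hypothetical record of the coldstart setting used when each batch was run,
# listed in the order the identifiers will be passed as ident.list.
coldstart_used <- c(BATCH1 = TRUE, BATCH2 = FALSE, BATCH3 = FALSE)

# TRUE only if the first batch was a cold start and none of the others were.
check_ident_order <- function(coldstart_used) {
  isTRUE(unname(coldstart_used[1])) && !any(coldstart_used[-1])
}

ordering_ok <- check_ident_order(coldstart_used)
ident.list  <- names(coldstart_used)   # pass on only if ordering_ok is TRUE
```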

Details relating to merging Monte Carlo analysis files:

e2e_run_mc() generates 11 different output files of accumulated outputs from the iterations (total 3.2 Mb per iteration) covering the range of model outputs. This merging function combines batches of each of these types of files. Check the memory capacity of your machine before starting a long run or before merging runs. The processed output consists of 13 files totalling ~4 Mb regardless of the number of iterations.
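As a rough planning aid, the raw output volume can be estimated from the figure of 3.2 Mb per iteration quoted above. The batch sizes below are those used in the Examples. The count of combined iterations follows the note in the Examples that the first iteration of the second batch is stripped off at merging; applying the same rule to every later batch is an assumption.

```r
raw_mb_per_iteration <- 3.2              # figure quoted in the text above
n_iter <- c(BATCH1 = 5, BATCH2 = 6)      # iterations run in each batch

# Approximate raw output per batch and in total, in Mb.
raw_mb       <- n_iter * raw_mb_per_iteration
total_raw_mb <- sum(raw_mb)

# Assumption: one iteration is dropped from every batch after the first.
combined_iterations <- sum(n_iter) - (length(n_iter) - 1)
combined_iterations   # 10
```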

Value

CSV files of merged data if csv.output=TRUE. If the argument postprocess=TRUE, then also a dataframe of processed output (for sensitivity analysis) or a list object of dataframes (for Monte Carlo analysis).

See Also

e2e_read , e2e_run_sens , e2e_run_mc , e2e_process_sens_mc , e2e_plot_sens_mc

Examples

## Not run: 
# The examples provided here illustrate how to merge results from parallel
# sets of sensitivity or Monte Carlo analysis runs. Even though they are stripped-down
# minimalist examples, they each still take a very long time to run.

# --------------------------------------------------------------------------

# Example of parallelizing the sensitivity analysis process.
# Here the model is run for only 1 year each time and for 3 trajectories
# on each processor in order to illustrate the approach. A meaningful simulation 
# would require many more years per run and more trajectories. Even so this 
# example will be very time consuming.
# In the illustration here csv output is directed to a temporary folder since 
# results.path is unspecified. To explore further, set up your own results folder and   
# define results.path as relative to the current working directory.
# Launch two (or more) runs separately on different processors...
# Launch batch 1 (on processor 1):
    model1 <- e2e_read("North_Sea", "1970-1999", model.ident="BATCH1")
    sens_results <- e2e_run_sens(model1, nyears=1, n_traj=3, coldstart=TRUE, 
                                 postprocess=FALSE, csv.output=TRUE)
# Note that coldstart=TRUE for the first batch only.
# Launch batch 2 (on processor 2):
    model2 <- e2e_read("North_Sea", "1970-1999", model.ident="BATCH2")
    sens_results <- e2e_run_sens(model2, nyears=1, n_traj=3, coldstart=FALSE, 
                                 postprocess=FALSE, csv.output=TRUE)
# Note that these two runs return only raw data since postprocess=FALSE
#
# Then, afterwards, merge the two raw results files with text-tags BATCH1 and BATCH2,
# and post process the combined file:
    model3 <- e2e_read("North_Sea", "1970-1999", model.ident="COMBINED")
    processed_data <- e2e_merge_sens_mc(model3, selection="SENS",
          ident.list=c("BATCH1","BATCH2"), postprocess=TRUE, csv.output=TRUE)
# or...
    combined_data <- e2e_merge_sens_mc(model3, selection="SENS",
          ident.list=c("BATCH1","BATCH2"), postprocess=FALSE, csv.output=TRUE)
    processed_data <- e2e_process_sens_mc(model3, selection="SENS",
                                         use.example=FALSE, csv.output=TRUE)

## End(Not run)

# --------------------------------------------------------------------------


# Example of parallelizing the Monte Carlo process:
# Here the model is run for only 2 years each time and for 5 (or 6) trajectories
# on each processor in order to illustrate the approach. A meaningful simulation 
# would require many more years per run and more trajectories. Even so this 
# example will be time consuming.
# In the illustration here csv output is directed to a temporary folder since 
# results.path is unspecified. To explore further, set up your own results folder and   
# define results.path as relative to the current working directory.
# Launch two (or more) runs separately on different processors...
# Launch batch 1 (on processor 1)
    model1 <- e2e_read("North_Sea", "1970-1999", model.ident="BATCH1")
    results1 <- e2e_run_mc(model1,nyears=2,baseline.mode=TRUE,
                               n_iter=5,csv.output=TRUE,postprocess=FALSE)                
# Launch batch 2 (on processor 2):
    model2 <- e2e_read("North_Sea", "1970-1999", model.ident="BATCH2")
    results2 <- e2e_run_mc(model2,nyears=2,baseline.mode=TRUE,
                           n_iter=6,csv.output=TRUE,postprocess=FALSE) 
# Note that these two runs return only raw data since postprocess=FALSE
# Note 6 iterations in batch 2: the first iteration will be stripped off at merging, so the
# combined data should include only 10 iterations.

# Then, afterwards, merge the two raw results files with text-tags BATCH1 and BATCH2,
# and post process the combined file:
    model3 <- e2e_read("North_Sea", "1970-1999", model.ident="COMBINED")
    processed_data <- e2e_merge_sens_mc(model3, selection="MC",
               ident.list=c("BATCH1","BATCH2"), postprocess=TRUE, csv.output=TRUE)
# or...
    combined_data <- e2e_merge_sens_mc(model3, selection="MC",
           ident.list=c("BATCH1","BATCH2"), postprocess=FALSE, csv.output=TRUE)
    processed_data <- e2e_process_sens_mc(model3, selection="MC",
                                         use.example=FALSE, csv.output=TRUE)

# --------------------------------------------------------------------------

StrathE2E2 documentation built on Jan. 23, 2021, 1:07 a.m.