Compute: Specify the execution parameters and trigger the execution

View source: R/Compute.R

Description

Compute() is the step of the startR workflow that follows the definition of the complete workflow with AddStep(). It specifies the execution parameters and triggers the execution, which can run locally or on a remote machine. For remote execution, the configuration of the machine needs to be specified in the function call, and the EC-Flow server is required to be installed.

The execution can be split into chunks to avoid overloading the RAM. After all the chunks are finished, Compute() gathers and merges them and returns a single data object that includes one or more multidimensional data arrays and additional metadata.

Usage

Compute(
  workflow,
  chunks = "auto",
  workflow_manager = "ecFlow",
  threads_load = 1,
  threads_compute = 1,
  cluster = NULL,
  ecflow_suite_dir = NULL,
  ecflow_server = NULL,
  autosubmit_suite_dir = NULL,
  autosubmit_server = NULL,
  silent = FALSE,
  debug = FALSE,
  wait = TRUE
)

Arguments

workflow

A list of class 'startR_workflow' returned by function AddStep(), or of class 'startR_cube' returned by function Start(). It contains all the objects needed for the execution.

chunks

A named list of the dimensions along which to split the data and the number of chunks to make for each. Only dimensions that are not required as target dimensions in function Step() can be chunked. The default value is 'auto', which lists all the non-target dimensions with one chunk each.

workflow_manager

Can be NULL, 'ecFlow' or 'Autosubmit'. The default is 'ecFlow'.

threads_load

An integer indicating the number of parallel execution processes to use for the data retrieval stage. The default value is 1.

threads_compute

An integer indicating the number of parallel execution processes to use for the computation. The default value is 1.

cluster

A list of components that define the configuration of the machine to run on. The components vary from one machine to another. Check the Practical guide on GitLab for more details and examples. Only needed when the computation is not run locally. The default value is NULL.

ecflow_suite_dir

A character string indicating the path to a folder on the local workstation in which to store temporary files generated for the automatic management of the workflow. Only needed when the execution is run remotely. The default value is NULL.

ecflow_server

A named vector indicating the host and port of the EC-Flow server. The vector form should be c(host = 'hostname', port = port_number). Only needed when the execution is run remotely. The default value is NULL.

autosubmit_suite_dir

A character string indicating the path to a folder in which to store temporary files generated for the automatic management of the workflow manager. This path should be available on the local workstation as well as on the autosubmit machine. The default value is NULL, in which case a temporary folder is created under the current working folder.

autosubmit_server

A character vector indicating the login node of the autosubmit machine. It can be "bscesautosubmit01" or "bscesautosubmit02". The default value is NULL, and the node will be randomly chosen.

silent

A logical value deciding whether to print the computation progress (FALSE) on the R session or not (TRUE). It only works when the execution runs locally or the parameter 'wait' is TRUE. The default value is FALSE.

debug

A logical value deciding whether to return detailed messages on the progress and operations in a Compute() call (TRUE) or not (FALSE). Automatically changed to FALSE if parameter 'silent' is TRUE. The default value is FALSE.

wait

A logical value deciding whether the R session waits for the Compute() call to finish (TRUE) or not (FALSE). If FALSE, it will return an object with all the information of the startR execution, which can be stored on disk. After that, the R session can be closed and the results can be collected later with the Collect() function. The default value is TRUE.
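
The shapes expected by 'chunks' and 'ecflow_server' can be sketched in base R without loading startR. This is only an illustration: the host name and port below are placeholder values, and the total chunk count assumes that every combination of per-dimension pieces is processed as one chunk.

```r
# Two dimensions split into 4 and 2 pieces, as in the 'chunks' argument.
chunks <- list(longitude = 4, sdate = 2)

# Assumption: each combination of pieces forms one chunk, so the total
# is the product of the per-dimension counts.
n_chunks <- prod(unlist(chunks))

# Named vector in the form c(host = 'hostname', port = port_number).
# 'localhost' and 5678 are placeholders, not real server settings.
ecflow_server <- c(host = 'localhost', port = 5678)

# c() coerces mixed types, so the port is stored as a character string
# and has to be converted back if a number is needed.
port_number <- as.integer(ecflow_server[['port']])
```

Under that assumption, adding a chunked dimension or raising a count multiplies the number of chunks, so it is worth checking the product before submitting.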

Value

A list of data arrays for the output returned by the last step in the specified workflow (wait = TRUE), or an object with information about the startR execution (wait = FALSE). The configuration details and profiling information are attached as attributes to the returned list of arrays.
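
The wait = FALSE case can be sketched as a pair of helpers: one submits the workflow and saves the returned execution object, and the other reloads it in a later session and calls Collect(). The helper names 'submit_detached' and 'collect_later' and the 'handle_file' argument are hypothetical, not part of startR, and the sketch assumes a remote ecFlow setup whose 'cluster' and 'ecflow_suite_dir' values are supplied by the caller.

```r
# Hypothetical helper: submit the workflow without blocking the session.
submit_detached <- function(workflow, chunks, cluster, ecflow_suite_dir,
                            handle_file) {
  exec <- startR::Compute(workflow,
                          chunks = chunks,
                          cluster = cluster,
                          ecflow_suite_dir = ecflow_suite_dir,
                          wait = FALSE)
  # Store the execution object so it survives closing the R session.
  saveRDS(exec, file = handle_file)
  invisible(handle_file)
}

# Hypothetical helper: reload the execution object and gather the results.
collect_later <- function(handle_file) {
  exec <- readRDS(handle_file)
  startR::Collect(exec, wait = TRUE)
}
```

A call such as submit_detached(wf, list(longitude = 4), cluster, suite_dir, 'exec.Rds') would be followed, possibly in a new session, by collect_later('exec.Rds'). All argument values here are illustrative.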

Examples

 # Locate the sample data shipped with the package
 data_path <- system.file('extdata', package = 'startR')
 path_obs <- file.path(data_path, 'obs/monthly_mean/$var$/$var$_$sdate$.nc')
 sdates <- c('200011', '200012')
 # Declare the data without loading it (retrieve = FALSE)
 data <- Start(dat = list(list(path = path_obs)),
               var = 'tos',
               sdate = sdates,
               time = 'all',
               latitude = 'all',
               longitude = 'all',
               return_vars = list(latitude = 'dat',
                                  longitude = 'dat',
                                  time = 'sdate'),
               retrieve = FALSE)
 # Weight each latitude by the square root of the cosine of the latitude
 fun <- function(x) {
           lat = attributes(x)$Variables$dat1$latitude
           weight = sqrt(cos(lat * pi / 180))
           corrected = Apply(list(x), target_dims = "latitude",
                             fun = function(x) {x * weight})
         }
 # Wrap the function as a step that operates along 'latitude'
 step <- Step(fun = fun,
              target_dims = 'latitude',
              output_dims = 'latitude',
              use_libraries = c('multiApply'),
              use_attributes = list(data = "Variables"))
 wf <- AddStep(data, step)
 # Run locally, splitting along the non-target dimensions
 res <- Compute(wf, chunks = list(longitude = 4, sdate = 2))


startR documentation built on Sept. 12, 2023, 5:07 p.m.
