create_jobs: create_jobs

View source: R/createJob.R

create_jobs    R Documentation

create_jobs

Description

Creates jobs on a BOINC server.

Usage

create_jobs(
  connection,
  work_func,
  data = NULL,
  n = NULL,
  init_func = NULL,
  global_vars = NULL,
  packages = c(),
  files = c(),
  install_func = NULL
)

Arguments

connection

a connection returned by create_connection.

work_func

data processing function with prototype function(data_element) if 'data' is specified or function() if 'n' is specified. This function runs for each element in data. This function can be recursive.

data

data for processing. Must be an enumerable (indexable) list or vector.

n

the number of jobs. When data is specified, this value must be less than or equal to the length of the data. If n is not specified, the number of jobs equals the length of the data.

init_func

initialization function with prototype function(). This function runs once at the start of a job, before the job is split into separate threads. This function cannot be recursive.

global_vars

a list in the format <variable name>=<value>.

packages

a character vector with the names of the packages to import.

files

a character vector with the names of the files that should be available to the jobs.

install_func

installation function with prototype function(packages), where packages is a vector with the names of packages that could not be installed from repositories. This function cannot be recursive.
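The two work_func prototypes described above can be illustrated with a short sketch. The function names, the scale_factor variable, and the con object are hypothetical and chosen only for illustration; the create_jobs calls are left as comments because they need a live connection.

```r
# Form used with 'data': the function receives one data element per call.
work_with_data <- function(data_element) {
  data_element * scale_factor   # scale_factor is expected to arrive via global_vars
}

# Form used with 'n' only: the function takes no arguments.
work_without_data <- function() {
  mean(runif(1000))
}

# Variables to be made available to the jobs, as <variable name> = <value>.
global_vars <- list(scale_factor = 2)

# Local dry run of the 'data' form, emulating the global variable by hand.
scale_factor <- global_vars$scale_factor
local_result <- vapply(list(1, 2, 3), work_with_data, numeric(1))

# With a connection 'con' from create_connection (not shown), the calls might be:
# jobs <- create_jobs(con, work_with_data, data = list(1, 2, 3),
#                     global_vars = global_vars)
# jobs <- create_jobs(con, work_without_data, n = 4)
```

Running the work function locally on a few elements before submitting is a cheap way to catch errors that would otherwise only surface on the computation nodes.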

Details

This function automatically breaks the data into n parts and creates n jobs. The number of jobs must be greater than zero.
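How the data is divided is not specified here. As a rough illustration only, one way to break a list into n contiguous, nearly equal parts in base R is sketched below; split_into_parts is a hypothetical helper, not a function of this package, and the actual partitioning may differ.

```r
# Hypothetical helper: split 'data' into n contiguous, nearly equal parts.
split_into_parts <- function(data, n) {
  stopifnot(n > 0, n <= length(data))
  # Map each element index to a group number between 1 and n.
  group <- ceiling(seq_along(data) * n / length(data))
  unname(split(data, group))
}

parts <- split_into_parts(as.list(1:10), 3)
part_sizes <- lengths(parts)   # number of elements assigned to each job
```

Each element of parts would then correspond to the portion of data handled by one job.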

The init_func parameter is intended for additional initialization, for example, compiling C++ functions from sources transferred through the files parameter. It runs on all computation nodes but not on the main node.

The job is performed as follows:

  1. The necessary packages are first loaded/installed;

  2. If some packages were not installed, the RBOINC_additional_inst_func function (the renamed install_func) is called;

  3. The RBOINC_work_func and RBOINC_init_func functions (the renamed work_func and init_func) are loaded;

  4. The RBOINC_data object (the renamed part of data assigned to this job) is loaded;

  5. The working folder changes to the one where the files were copied;

  6. According to the number of detected cores, a cluster is created with the name "RBOINC_cluster";

  7. The registerDoParallel(RBOINC_cluster) function is called;

  8. The original name of the RBOINC_work_func function is restored;

  9. global_vars are copied to the global environment;

  10. The RBOINC_init_func() function is called;

  11. The job is divided into sub-tasks and is performed in parallel;

  12. Execution results are collected together and sent to the BOINC server.
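Steps 6 and 9 to 12 can be approximated on a local machine with the base parallel package. This is only a sketch under stated assumptions: it uses parLapply rather than registerDoParallel, a fixed two-worker cluster instead of the detected core count, and hypothetical stand-ins for work_func, init_func, global_vars, and the job's data.

```r
library(parallel)

# Hypothetical stand-ins for what a job would receive.
work_func   <- function(x) x^2 + offset     # relies on a global variable
init_func   <- function() invisible(NULL)   # e.g. compile helper code here
global_vars <- list(offset = 1)
job_data    <- list(1, 2, 3, 4)             # this job's part of the data

# Create a cluster (a real job would size it by the detected cores).
cl <- makeCluster(2)

# Copy global_vars into the global environment of every worker.
clusterCall(cl, function(vars) {
  list2env(vars, envir = globalenv())
  invisible(NULL)
}, global_vars)

# Run the initialization function once on every worker.
clusterCall(cl, init_func)

# Process the elements in parallel and collect the results in order.
results <- parLapply(cl, job_data, work_func)

stopCluster(cl)
```

Because the initialization runs on the workers and not in the calling session, anything init_func sets up is only visible to code executed inside the cluster.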

Restrictions

Don't create or use objects that begin with the prefix RBOINC_.

Don't rely on any packages to be loaded automatically. Specify the necessary packages explicitly through the packages parameter.

Don't pass in global_vars objects that cannot be saved, such as functions compiled from C++ code.

Packages passed in packages will be installed from the repositories specified in your R environment. Additionally, https://cloud.r-project.org is added to the list of repositories. Only CRAN-like repositories are supported.

Packages that require compilation may depend on header files and libraries that are not in the VM. Such packages cannot be installed in the standard way.

init_func is only called on processing nodes created by makeCluster. If nothing is processed on the master node, init_func will not be called there.

install_func is always called. As a parameter, it is passed a vector of strings with the names of the packages whose installation failed. If you need to use functions from packages passed to packages, refer to them with the :: operator (for example, package::function).
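A minimal sketch of functions that follow these restrictions is given below. The names install_missing and work, and the archive name in the comment, are hypothetical; the sketch also assumes that the vector passed to install_func is empty when no installation failed.

```r
# Hypothetical install_func: receives the names of packages that failed to install.
install_missing <- function(packages) {
  for (pkg in packages) {
    message("Not installed from repositories: ", pkg)
    # A real function might install from an archive passed through 'files', e.g.
    # install.packages("mypkg_1.0.tar.gz", repos = NULL, type = "source")
  }
  invisible(packages)
}

# Hypothetical work_func: refers to package functions explicitly with '::'
# and avoids any object names starting with RBOINC_.
work <- function(x) stats::median(x)

# Local dry run.
failed <- install_missing(character(0))   # nothing failed, so nothing is reported
middle <- work(c(1, 3, 2))
```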

Errors and warnings

When errors occur, execution can be stopped with the following messages:

  • for http connections:

    • "You can not create jobs."

    • "BOINC server error: "<server message>"."

  • for unknown connections:

    • "Unknown protocol."

  • for any connection:

    • "The number of tasks must be greater than 0."

    • "The number of tasks must be less than or equal to the length of the data."

    • "Archive making error: <error message>"

    • "You must specify 'data' or 'n'."
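The argument-related messages above can be anticipated before contacting the server. The helper below is a hypothetical pre-flight check that mirrors the documented conditions; it is not the package's own validation code.

```r
# Hypothetical pre-flight check mirroring the documented argument errors.
check_job_args <- function(data = NULL, n = NULL) {
  if (is.null(data) && is.null(n)) {
    stop("You must specify 'data' or 'n'.")
  }
  if (is.null(n)) {
    n <- length(data)   # default: one job per data element
  }
  if (n <= 0) {
    stop("The number of tasks must be greater than 0.")
  }
  if (!is.null(data) && n > length(data)) {
    stop("The number of tasks must be less than or equal to the length of the data.")
  }
  n
}

# Errors raised by stop() can be captured with tryCatch.
msg <- tryCatch(
  check_job_args(data = list(1, 2), n = 5),
  error = function(e) conditionMessage(e)
)
```

The same tryCatch pattern can wrap a create_jobs call to handle connection or server errors without aborting a longer script.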

Value

a list with current states of jobs. This list contains the following fields:

  • batch_id - ID of the batch that includes the jobs;

  • jobs_name - the name of the jobs on the BOINC server;

  • results - computation results (NULL if the computation is still incomplete); the length of this list is equal to the length of the data;

  • jobs_status - human-readable status for each job;

  • status - computation status, may be:

    • "initialization" - jobs have been submitted to the server, but their status was not requested by update_jobs_status;

    • "in_progress" - BOINC is serving the jobs;

    • "done" - computations are complete, the results were downloaded;

    • "warning" - a recoverable error occurred during the job processing;

    • "error" - a serious error occurred during the job processing;

    • "aborted" - processing was canceled using the cancel_jobs function.
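The status field lends itself to a simple polling loop. The sketch below is an illustration under assumptions: wait_for_jobs and fake_refresh are hypothetical, the refresh function stands in for a call such as update_jobs_status(con, jobs), and only "done", "error", and "aborted" are treated as final states.

```r
# Hypothetical helper: poll until the jobs reach a final state.
wait_for_jobs <- function(jobs, refresh, interval = 0, max_polls = 100) {
  for (i in seq_len(max_polls)) {
    if (jobs$status %in% c("done", "error", "aborted")) break
    Sys.sleep(interval)
    jobs <- refresh(jobs)
  }
  jobs
}

# Fake refresh function that reports completion on the third call,
# used here so that the sketch runs without a BOINC server.
fake_refresh <- local({
  calls <- 0
  function(jobs) {
    calls <<- calls + 1
    if (calls >= 3) {
      jobs$status  <- "done"
      jobs$results <- list(1, 4, 9)
    } else {
      jobs$status <- "in_progress"
    }
    jobs
  }
})

jobs <- list(batch_id = 1, status = "initialization", results = NULL)
jobs <- wait_for_jobs(jobs, fake_refresh)

# With a real connection 'con', the refresh function might be:
# function(j) update_jobs_status(con, j)
```

A non-zero interval is advisable against a real server so that status requests are not sent in a tight loop.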

Examples

## Not run: 

# Function for data processing:
fun = function(val)
{
   ...
}

# Data for processing:
data = list(...)

# Connection to the BOINC server:
con = create_connection(...)

# Send jobs to BOINC server:
jobs = create_jobs(con, fun, data)

# Get status for jobs:
jobs = update_jobs_status(con, jobs)

# Release resources:
close_connection(con)

## End(Not run)

RBOINC.cl documentation built on Nov. 15, 2022, 3 a.m.