parallelize: Subject a function to dynamic parallelization

Description Usage Arguments Details Value Important details Author(s) See Also Examples

Description

This function executes all necessary steps to perform a dynamic analysis of parallelism of a given function, create objects that encapsulate code that can be executed in parallel, transfer this to a execution backend which potentially is a remote target, recollect results and resume execution in a transparent way.

Usage

1
2
3
parallelize(.f, ..., Lapply_local = rget("Lapply_local", default = FALSE),
  parallelize_wait = TRUE)
parallelize_call(.call, ..., parallelize_wait = TRUE)

Arguments

.f

Function to be parallelized given as a function object.

...

Arguments passed to .f.

Lapply_local

Force local execution.

parallelize_wait

Force to poll completion of computation if backend returns asynchroneously

.call

Unevaluated call to be parallelized

Details

Function parallelize and parallelize_call both perform a dynamic parallelization of the computation of .f, i.e. parallelism is determined at run-time. Points of potential parallelism have to be indicated by the use of Apply/Sapply/Lapply (collectively Apply functions) instead of apply/sapply/lapply in existing code. The semantics of the new function is exactly the same as that of the original functions such that a simple upper-casing of these function calls makes existing programs amenable to parallelization. Parallelize will execute the function .f, recording the number of elements that are passed to Apply functions. Once a given threshold (degree of parallelization) is reached computation is stopped and remaining executions are done in parallel. A call parallelize_initialize function determines the precise mechanism of parallel execution and can be used to flexibly switch between different resources.

Value

The value returned is the result of the computation of function .f

Important details

Author(s)

Stefan Böhringer, r-packages@s-boehringer.org

See Also

Apply, Sapply, Lapply, parallelize_initialize

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
  # code to be parallelized
  parallel8 = function(e) log(1:e) %*% log(1:e);
  parallel2 = function(e) rep(e, e) %*% 1:e * 1:e;
  parallel1 = function(e) Lapply(rep(e, 15), parallel2);
  parallel0 = function() {
    r = sapply(Lapply(1:50, parallel1),
      function(e)sum(as.vector(unlist(e))));
    r0 = Lapply(1:49, parallel8);
    r
  }

  # create file that can be sourced containing function definitions
  # best practice is to define all needed functions in files that
  # can be sourced. The function tempcodefile allows to create a
  # temporary file with the defintion of given functions
  codeFile = tempcodefile(c(parallel0, parallel1, parallel2, parallel8));

  # definitions of clusters
  Parallelize_config = list(max_depth = 5, parallel_count = 24, offline = FALSE,
  backends = list(
    snow = list(localNodes = 2, splitN = 1, sourceFiles = codeFile),
    local = list(
      path = sprintf('%s/tmp/parallelize', tempdir())
    )
  ));

  # initialize
  parallelize_initialize(Parallelize_config, backend = 'local');

  # perform parallelization
  r0 = parallelize(parallel0);
  print(r0);

  # same
  r1 = parallelize_call(parallel0());
  print(r1);

  # compare with native execution
  parallelize_initialize(backend = 'off');
  r2 = parallelize(parallel0);
  print(r2);

  # put on SNOW cluster
  parallelize_initialize(Parallelize_config, backend = 'snow');
  r3 = parallelize(parallel0);
  print(r3);

  # analyse parallelization
  parallelize_initialize(Parallelize_config, backend = 'local');
  Log.setLevel(5);
  r4 = parallelize(parallel0);
  Log.setLevel(6);
  r5 = parallelize(parallel0);

  print(sprintf('All results are the same is %s', as.character(
    all(sapply(list(r0, r1, r2, r3, r4, r5), function(l)all(l == r0)))
  )));

parallelize.dynamic documentation built on May 2, 2019, 3:45 a.m.