pool: Manage parallel Azure connections


Description

Manage parallel Azure connections

Usage


Arguments

size

For init_pool, the number of background R processes to create. Limit this if you are low on memory.

restart

For init_pool, whether to terminate an already running pool first.

...

Other arguments passed on to functions in the parallel package. See below.

Details

AzureRMR can parallelise communication with Azure by using a pool of R processes running in the background. This often leads to major speedups in scenarios like downloading large numbers of small files, or working with a cluster of virtual machines. This functionality is intended for use by packages that extend AzureRMR (and was originally implemented as part of the AzureStor package), but can also be called directly by the end-user.

A small API is currently provided for managing the pool; the functions that appear in the examples are init_pool, pool_size, pool_export, pool_sapply and delete_pool. They pass their arguments down to the corresponding functions in the parallel package.

The pool is persistent for the session or until terminated by delete_pool. You should initialise the pool by calling init_pool before running any code on it. Calling init_pool restores the original state of the pool nodes by removing any objects that may be in memory, and resetting the working directory to the master working directory.
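The behaviour described above can be illustrated with the parallel package alone. The following is a minimal sketch, not the package's actual implementation: it assumes the pool functions behave roughly like thin wrappers around parallel::makeCluster, parallel::clusterExport and parallel::parSapply, and the mapping to init_pool, pool_export and pool_sapply noted in the comments is an assumption made for illustration. The variable names cl, res and still_there are hypothetical.

```r
library(parallel)

# Start two background R processes (assumed analogue of init_pool(size = 2))
cl <- makeCluster(2)

# Copy a variable to every node (assumed analogue of pool_export("x"))
x <- 42
clusterExport(cl, "x", envir = environment())

# Run a function across the nodes (assumed analogue of pool_sapply)
res <- parSapply(cl, 1:5, function(i) i + x)
print(res)

# Reset node state by removing all objects from each node's global
# environment, similar in spirit to what reinitialising the pool does
invisible(clusterEvalQ(cl, rm(list = ls())))

# After the reset, x is no longer defined on any node
still_there <- clusterEvalQ(cl, exists("x", inherits = FALSE))
print(unlist(still_there))

# Shut down the background processes (assumed analogue of delete_pool)
stopCluster(cl)
```

Keeping the cluster object alive between calls is what makes a pool worthwhile: the cost of starting the background processes is paid once, and later calls only pay for transferring data and running the function.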

See Also

parallel::makeCluster, parallel::clusterCall, parallel::parLapply

Examples

## Not run: 

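# create the pool, or reset it if one is already running
init_pool()

# number of nodes in the pool
pool_size()

# copy x to the nodes, then use it in a parallel computation
x <- 42
pool_export("x")
pool_sapply(1:5, function(i) i + x)

# reinitialising the pool removes any objects held on the nodes
init_pool()
# error: x no longer exists on nodes
try(pool_sapply(1:5, function(i) i + x))

# shut down the pool
delete_pool()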


## End(Not run)

Hong-Revo/AzureSMRbase documentation built on Aug. 1, 2020, 7:32 p.m.