threadpool

The threadpool package implements a simple parallel programming backend based on a thread pool model in which the pool can be resized dynamically while it is running. The package builds a cluster by setting up a queue of input jobs and having each cluster node retrieve jobs from the queue one at a time. The queue is stored on disk, and nodes communicate with each other via the filesystem.
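
To make the queue-on-disk idea concrete, here is a minimal base R sketch of a directory-backed job queue. It is an illustration only and not how threadpool is implemented (the package relies on the thor and queue packages for storage). The names enqueue, claim_job, and run_worker are hypothetical, and the sketch assumes that renaming a file is atomic on the filesystem in use, which generally holds on local POSIX filesystems but may not hold on network filesystems.

```r
## Hypothetical sketch of a directory-backed job queue (not threadpool's API)
queue_dir <- tempfile("queue_")
dir.create(queue_dir)

## Enqueue: write each job's input to its own file in the queue directory
enqueue <- function(dir, inputs) {
        for (i in seq_along(inputs)) {
                saveRDS(inputs[[i]], file.path(dir, sprintf("job_%03d.rds", i)))
        }
        invisible(length(inputs))
}

## Claim: try to rename a pending job file. If another worker renamed it
## first, file.rename() returns FALSE and we move on to the next file.
claim_job <- function(dir, worker_id) {
        pending <- list.files(dir, pattern = "^job_[0-9]+\\.rds$")
        for (p in pending) {
                claimed <- file.path(dir, paste0("claimed_", worker_id, "_", p))
                ok <- suppressWarnings(file.rename(file.path(dir, p), claimed))
                if (ok) return(claimed)
        }
        NULL  # nothing left to claim
}

## Worker loop: claim one job at a time until the queue is empty
run_worker <- function(dir, worker_id, f) {
        results <- list()
        while (!is.null(path <- claim_job(dir, worker_id))) {
                key <- sub("^claimed_[^_]+_", "", basename(path))
                results[[key]] <- f(readRDS(path))
                unlink(path)  # job finished; remove its file
        }
        results
}

enqueue(queue_dir, as.list(airquality))
res <- run_worker(queue_dir, "w1", function(x) mean(x, na.rm = TRUE))
unlink(queue_dir, recursive = TRUE)
```

Because every worker only looks at the shared directory, a second R process could call run_worker() on the same directory at any time and start taking jobs, which is the property that lets a pool of this kind grow or shrink while it is running.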

A few benefits of this model of parallel computing are

This package will be most useful in situations where the jobs being run

Installation

You can install threadpool from GitHub with:

# install.packages("remotes")
remotes::install_github("rdpeng/threadpool")

The threadpool package makes use of the thor package by Rich FitzJohn. You can install thor from GitHub with:

remotes::install_github("richfitz/thor")

In addition, you need to install the queue package from GitHub with:

remotes::install_github("rdpeng/queue")

Example

This is a basic example of how you might use the threadpool package. The basic approach is to initialize the cluster (which also adds tasks to it), join the cluster, and then run it to start executing jobs. Once the cluster has finished running, you can retrieve the results. Finally, when you are done, you can delete the cluster.

library(threadpool)

data(airquality)
x <- as.list(airquality)

f <- function(x) {
        mean(x, na.rm = TRUE)
}

## Initialize the cluster
cl <- cluster_initialize("my_cluster", x, f)

## Run jobs
cl <- cluster_run(cl)
#> Starting cluster node: 53666

## Gather the output
r <- cluster_reduce(cl)
r[1:3]
#> [[1]]
#> [1] 15.80392
#> 
#> [[2]]
#> [1] 42.12931
#> 
#> [[3]]
#> [1] 77.88235

## Clean up
delete_cluster("my_cluster")
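
As a sanity check, the same computation can be run sequentially with base R's lapply() and compared against the cluster output. This is only a sketch: it uses no threadpool functions, and the variable name expected is introduced here for illustration. The three values printed above match the means of Day, Ozone, and Temp, so the gathered results do not appear to follow the input order; comparing sorted values avoids depending on it.

```r
data(airquality)
x <- as.list(airquality)

f <- function(x) {
        mean(x, na.rm = TRUE)
}

## Sequential baseline: one mean per column, named by column
expected <- lapply(x, f)

## With `r` from cluster_reduce() still available, one possible check
## (an assumption about how you might compare, not part of the package):
## all.equal(sort(unlist(r)), sort(unname(unlist(expected))))
```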


rdpeng/threadpool documentation built on Nov. 20, 2019, 2 p.m.