optimalClusterNum: Determine the number of nodes to use in a new cluster

View source: R/makeClusters.R

optimalClusterNumGeneralized    R Documentation

Determine the number of nodes to use in a new cluster

Description

Determine the optimal number of cores to use when setting up a new cluster, based on:

  1. the number of cores available (see note);

  2. the amount of free memory available on the local machine;

  3. the number of cores requested vs. the number available: if more cores are requested than are available, the number of cores used is reduced so that the requested jobs run in approximately even-sized batches. For example, if 16 cores are available but 50 are needed, running 3 batches of 16 plus a final batch of 2 (4 batches total) takes the same wall time as running 4 batches of 13, so only 13 cores need be used at once (see the sketch after this list).
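A minimal sketch of the batch-evening arithmetic above (the variable names are illustrative, not the package's internals):

  jobsRequested  <- 50
  coresAvailable <- 16

  nBatches   <- ceiling(jobsRequested / coresAvailable)  # 4 batches either way
  coresToUse <- ceiling(jobsRequested / nBatches)         # 13 cores per batch

  coresToUse  # 13: same number of batches as with 16 cores, but fewer simultaneous workers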

Usage

optimalClusterNumGeneralized(
  memRequiredMB = 500,
  maxNumClusters = parallel::detectCores(),
  NumCoresAvailable = parallel::detectCores(),
  availMem = pemisc::availableMemory()/1e+06
)

optimalClusterNum(
  memRequiredMB = 500,
  maxNumClusters = parallel::detectCores()
)
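For example, the returned value might be used to size a cluster (a sketch; the memory requirement and worker count are illustrative):

  library(parallel)

  ## suppose each job needs roughly 2000 MB and up to 32 workers are wanted
  n <- pemisc::optimalClusterNum(memRequiredMB = 2000, maxNumClusters = 32)

  cl <- makeCluster(n)
  ## ... run parallel jobs, e.g., parLapply(cl, ...) ...
  stopCluster(cl)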

Arguments

memRequiredMB

The amount of memory needed, in MB.

maxNumClusters

The number of nodes needed (i.e., requested).

NumCoresAvailable

The number of cores available on the local machine (see note).

availMem

The amount of free memory (RAM) available to use.

Value

An integer specifying the number of cores to use.

Note

R hardcodes the maximum number of socket connections it can use (currently set to 128 in R 4.1). Three of these are reserved for the main R process, so practically speaking, a user can create at most 125 connections, e.g., when creating a cluster. See https://github.com/HenrikBengtsson/Wishlist-for-R/issues/28.

We limit this a bit further here just in case the user already has open connections.
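As an illustration of that limit (a sketch only; accounting for already-open connections via showConnections() is an assumption, not necessarily how the package does it):

  ## R reserves 3 of its 128 connections (stdin, stdout, stderr), leaving 125
  maxConnections <- 125L
  alreadyOpen    <- nrow(showConnections())  # connections currently open in this session
  maxWorkers     <- max(1L, maxConnections - alreadyOpen)

  requested <- 200L
  min(requested, maxWorkers)                 # cap the requested cluster size accordingly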

