daemons: Daemons (Set Persistent Processes)

Description

Set 'daemons' or persistent background processes to receive mirai requests. Specify 'n' to create daemons on the local machine. Specify 'url' for receiving connections from remote daemons (for distributed computing across the network). Specify 'remote' to optionally launch remote daemons via a remote configuration. By default, dispatcher ensures optimal scheduling.

Usage

daemons(
  n,
  url = NULL,
  remote = NULL,
  dispatcher = TRUE,
  ...,
  resilience = TRUE,
  seed = NULL,
  tls = NULL,
  pass = NULL,
  .compute = "default"
)

Arguments

n

integer number of daemons to set.

url

[default NULL] if specified, the character URL or vector of URLs on the host for remote daemons to dial into, including a port accepting incoming connections (and optionally for websockets, a path), e.g. 'tcp://hostname:5555' or 'ws://10.75.32.70:5555/path'. Specify a URL starting 'tls+tcp://' or 'wss://' to use secure TLS connections. Auxiliary function host_url may be used to construct a valid host URL.

remote

[default NULL] required only for launching remote daemons, a configuration generated by remote_config or ssh_config.

dispatcher

[default TRUE] logical value whether to use dispatcher. Dispatcher is a local background process that connects to daemons on behalf of the host and ensures FIFO scheduling (see Dispatcher section below).

...

(optional) additional arguments passed through to dispatcher if using dispatcher and/or daemon if launching daemons. These include 'token' at dispatcher and 'autoexit', 'cleanup', 'output', 'maxtasks', 'idletime', 'walltime' and 'timerstart' at daemon.

resilience

[default TRUE] (applicable when not using dispatcher) logical value whether to retry failed tasks on other daemons. If FALSE, an appropriate 'errorValue' will be returned in such cases.

seed

[default NULL] (optional) supply a random seed (single value, interpreted as an integer). This is used to initialise the L'Ecuyer-CMRG RNG streams sent to each daemon. Note that reproducible results can be expected only for 'dispatcher = FALSE', as the unpredictable timing of task completions would otherwise influence the tasks sent to each daemon. Even for 'dispatcher = FALSE', reproducibility is not guaranteed if the order in which tasks are sent is not deterministic.

tls

[default NULL] (optional for secure TLS connections) if not supplied, zero-configuration single-use keys and certificates are automatically generated. If supplied, either the character path to a file containing the PEM-encoded TLS certificate and associated private key (may contain additional certificates leading to a validation chain, with the TLS certificate first), or a length 2 character vector comprising [i] the TLS certificate (optionally certificate chain) and [ii] the associated private key.

pass

[default NULL] (required only if the private key supplied to 'tls' is encrypted with a password) For security, should be provided through a function that returns this value, rather than directly.

.compute

[default 'default'] character compute profile to use for creating the daemons (each compute profile has its own set of daemons for connecting to different resources).

Details

Use daemons(0) to reset daemon connections:

  • A reset is required before revising settings for the same compute profile, otherwise changes are not registered.

  • All connected daemons and/or dispatchers exit automatically.

  • mirai reverts to the default behaviour of creating a new background process for each request.

  • Any unresolved 'mirai' will return an 'errorValue' 7 (Object closed) after a reset.
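The reset behaviour listed above can be sketched as follows. This is a minimal illustration rather than a definitive test: it assumes mirai is installed, uses one local daemon, and relies on mirai(), call_mirai() and is_error_value() as exported by the package. The sleep duration is arbitrary.

```r
if (interactive()) {
  library(mirai)

  daemons(1)                  # one local daemon (dispatcher is used by default)
  m <- mirai(Sys.sleep(10))   # a task that is still unresolved at reset time
  daemons(0)                  # reset: connected daemon and dispatcher exit

  # Per the list above, the unresolved mirai should now hold an 'errorValue'
  res <- call_mirai(m)$data
  print(is_error_value(res))
  print(res)

  # With no daemons set, each request again runs in a new background process
  m2 <- mirai(Sys.getpid())
  print(call_mirai(m2)$data)
}
```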

If the host session ends, all connected dispatcher and daemon processes automatically exit as soon as their connections are dropped (unless the daemons were started with autoexit = FALSE). If a daemon is processing a task, it will exit as soon as the task is complete.

To reset persistent daemons started with autoexit = FALSE, use daemons(NULL) instead, which also sends exit signals to all connected daemons prior to resetting.

For historical reasons, daemons() with no arguments returns the value of status.

Value

Depending on the arguments supplied:

  • using dispatcher: integer number of daemons set.

  • or else launching local daemons: integer number of daemons launched.

  • otherwise: the character host URL.

Local Daemons

Daemons provide a potentially more efficient solution for asynchronous operations as new processes no longer need to be created on an ad hoc basis.

Supply the argument 'n' to set the number of daemons. New background daemon processes are automatically created on the local machine connecting back to the host process, either directly or via dispatcher.

Dispatcher

By default dispatcher = TRUE. This launches a background process running dispatcher. Dispatcher connects to daemons on behalf of the host and queues tasks until a daemon is able to begin immediate execution of that task, ensuring FIFO scheduling. Dispatcher uses synchronisation primitives from nanonext, waiting rather than polling for tasks, which is efficient both in terms of consuming no resources while waiting, and also being fully synchronised with events (having no latency).

By specifying dispatcher = FALSE, daemons connect to the host directly rather than through dispatcher. The host sends tasks to connected daemons immediately in an evenly-distributed fashion. However, optimal scheduling is not guaranteed as the duration of tasks cannot be known a priori, such that tasks can be queued at a daemon behind a long-running task while other daemons remain idle. Nevertheless, this provides a resource-light approach suited to working with similar-length tasks, or where concurrent tasks typically do not exceed available daemons.
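The scheduling trade-off described above can be illustrated with a toy model in base R. This is only a sketch of the two policies (FIFO assignment to the first free worker versus immediate round-robin distribution); it is not how mirai or dispatcher is implemented, and the function names and task durations are hypothetical.

```r
# Toy model only: not mirai's internal scheduling code.
durations <- c(10, 1, 1, 1)   # task lengths in arbitrary time units
n_daemons <- 2L

# Round-robin: task i is sent immediately to daemon ((i - 1) %% n) + 1,
# so short tasks can queue behind a long one
round_robin_makespan <- function(durations, n) {
  assigned <- (seq_along(durations) - 1L) %% n + 1L
  max(tapply(durations, assigned, sum))
}

# FIFO queue: each task is held until a daemon is free, then given to
# whichever daemon becomes available first
fifo_makespan <- function(durations, n) {
  free_at <- numeric(n)
  for (d in durations) {
    i <- which.min(free_at)
    free_at[i] <- free_at[i] + d
  }
  max(free_at)
}

round_robin_makespan(durations, n_daemons)  # 11
fifo_makespan(durations, n_daemons)         # 10
```

In this example one long task delays a short task under round-robin while the other daemon sits idle. With similar-length tasks the two policies give the same total time, which is the case where dispatcher = FALSE is suggested as a lighter option.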

Distributed Computing

Specifying 'url' allows tasks to be distributed across the network. This should be a character string such as 'tcp://10.75.32.70:5555', to which daemon processes should connect. Switching the URL scheme to 'tls+tcp://' or 'wss://' automatically upgrades the connection to use TLS. The auxiliary function host_url may be used to automatically construct a valid host URL based on the computer's hostname.

Specify 'remote' with a call to remote_config or ssh_config to launch daemons on remote machines. Otherwise, launch_remote may be used to generate the shell commands to deploy daemons manually on remote resources.

IPv6 addresses are also supported and must be enclosed in square brackets [ ] to avoid confusion with the final colon separating the port. For example, port 5555 on the IPv6 loopback address ::1 would be specified as 'tcp://[::1]:5555'.

Specifying the wildcard value zero for the port number e.g. 'tcp://[::1]:0' or 'ws://[::1]:0' will automatically assign a free ephemeral port. Use status to inspect the actual assigned port at any time.
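The URL formats above, including IPv6 bracketing and the wildcard port, can be assembled with a small base R helper. The function build_url below is hypothetical and not part of mirai; it only formats strings and does not check that the address is reachable.

```r
# Hypothetical helper (not part of mirai): assemble a host URL string
build_url <- function(host, port, scheme = "tcp") {
  # Bracket IPv6 literals so their colons are not read as the port separator
  if (grepl(":", host, fixed = TRUE)) host <- paste0("[", host, "]")
  sprintf("%s://%s:%d", scheme, host, as.integer(port))
}

build_url("::1", 5555)              # "tcp://[::1]:5555"
build_url("::1", 0, scheme = "ws")  # "ws://[::1]:0" (wildcard port)
build_url("10.75.32.70", 5555)      # "tcp://10.75.32.70:5555"
```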

With Dispatcher

When using dispatcher, it is recommended to use a websocket URL rather than TCP, as this requires only one port to connect to all daemons: a websocket URL supports a path after the port number, which can be made unique for each daemon.

Specifying a single host URL such as 'ws://10.75.32.70:5555' with n = 6 will automatically append a sequence to the path, listening to the URLs 'ws://10.75.32.70:5555/1' through 'ws://10.75.32.70:5555/6'.
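For reference, the set of websocket URLs described above can be reproduced in base R. This only mirrors the naming pattern stated in the text and is not taken from mirai's source; the variable names are illustrative.

```r
host <- "ws://10.75.32.70:5555"
n <- 6L

# One path per daemon, all sharing the same port
ws_urls <- paste0(host, "/", seq_len(n))
ws_urls[c(1, n)]
# "ws://10.75.32.70:5555/1" "ws://10.75.32.70:5555/6"
```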

Alternatively, specify a vector of URLs to listen to arbitrary port numbers / paths. In this case it is optional to supply 'n' as this can be inferred by the length of vector supplied.

Individual daemons then dial in to each of these host URLs, and at most one daemon should be dialled into each URL at any given time.

Dispatcher automatically adjusts to the number of daemons actually connected. Hence it is possible to dynamically scale up or down the number of daemons as required, subject to the maximum number initially specified.

Alternatively, supplying a single TCP URL will listen at a block of URLs with ports starting from the supplied port number and incrementing by one for 'n' specified e.g. the host URL 'tcp://10.75.32.70:5555' with n = 6 listens to the contiguous block of ports 5555 through 5560.
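The port arithmetic for the TCP case can be checked the same way. Again, this is a base R sketch of the pattern described above, with illustrative variable names, rather than code from mirai itself.

```r
base_port <- 5555L
n <- 6L

# n contiguous ports starting at the supplied port
ports <- base_port + seq_len(n) - 1L
tcp_urls <- sprintf("tcp://10.75.32.70:%d", ports)

range(ports)   # 5555 5560
tcp_urls[n]    # "tcp://10.75.32.70:5560"
```

Each daemon needs its own port here, which is why a websocket URL (one port, many paths) is recommended when using dispatcher.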

Without Dispatcher

A TCP URL may be used in this case as the host listens at only one address, utilising a single port.

The network topology is such that daemons (started with daemon) or indeed dispatchers (started with dispatcher) dial into the same host URL.

'n' is not required in this case, and disregarded if supplied, as network resources may be added or removed at any time. The host automatically distributes tasks to all connected daemons and dispatchers in a round-robin fashion.

Compute Profiles

By default, the 'default' compute profile is used. Providing a character value for '.compute' creates a new compute profile with the name specified. Each compute profile retains its own daemons settings, and may be operated independently of each other. Some usage examples follow:

  • local / remote: daemons may be set with a host URL and specifying '.compute' as 'remote', which creates a new compute profile. Subsequent mirai calls may then be sent for local computation by not specifying its '.compute' argument, or for remote computation to connected daemons by specifying its '.compute' argument as 'remote'.

  • cpu / gpu: some tasks may require access to different types of daemon, such as those with GPUs. In this case, daemons() may be called twice to set up host URLs for CPU-only daemons and for those with GPUs, specifying the '.compute' argument as 'cpu' and 'gpu' respectively. By supplying the '.compute' argument to subsequent mirai calls, tasks may be sent to either 'cpu' or 'gpu' daemons as appropriate.

Note: further actions such as resetting daemons via daemons(0) should be carried out with the desired '.compute' argument specified.
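A minimal sketch of working with two compute profiles follows. As an assumption made to keep the example runnable on one machine, local daemons stand in for the CPU and GPU resources; in practice each profile would typically be set with its own host URL.

```r
if (interactive()) {
  library(mirai)

  # Two independent profiles (local daemons used here as stand-ins)
  daemons(1, .compute = "cpu")
  daemons(1, .compute = "gpu")

  # Route each task by naming the profile in the mirai() call
  m_cpu <- mirai(Sys.getpid(), .compute = "cpu")
  m_gpu <- mirai(Sys.getpid(), .compute = "gpu")
  print(c(cpu = call_mirai(m_cpu)$data, gpu = call_mirai(m_gpu)$data))

  # Inspect and reset each profile separately
  print(status(.compute = "cpu"))
  daemons(0, .compute = "cpu")
  daemons(0, .compute = "gpu")
}
```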

Everywhere

everywhere evaluates an expression on all connected daemons and persists the resultant state. This is designed for setting up the evaluation environment, with particular packages loaded, or common resources made available, etc.
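As a short sketch of this, the following attaches a package on the daemon and then checks from a later task that it is still attached. It assumes mirai is installed and uses a single local daemon without dispatcher; the choice of the 'parallel' package is arbitrary.

```r
if (interactive()) {
  library(mirai)

  daemons(1, dispatcher = FALSE)

  # Evaluate once on every connected daemon; the attached package persists
  everywhere(library(parallel))

  # A later task should see the package on the daemon's search path
  m <- mirai("package:parallel" %in% search())
  print(call_mirai(m)$data)

  daemons(0)
}
```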

Timeouts

Specifying the .timeout argument in mirai will ensure that the 'mirai' always resolves.

However, the task may not have completed and may still be ongoing in the daemon process. In such situations, dispatcher ensures that queued tasks are not assigned to the busy process; however, overall performance may still be degraded while such processes remain in use. If a process hangs and cannot be restarted manually, saisei with force = TRUE may be used to cancel the task and regenerate a particular URL for a new daemon to connect to.
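The effect of .timeout can be sketched as below. This assumes mirai is installed and that .timeout is given in milliseconds; the durations are arbitrary. The mirai resolves to an 'errorValue' once the timeout elapses, even though the daemon may still be running the task.

```r
if (interactive()) {
  library(mirai)

  daemons(1)

  # The task sleeps for 5 s but the mirai is given only 500 ms to resolve
  m <- mirai({Sys.sleep(5); "done"}, .timeout = 500)
  res <- call_mirai(m)$data

  print(is_error_value(res))  # whether the result is an 'errorValue'
  print(res)

  daemons(0)
}
```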

Examples

if (interactive()) {
# Only run examples in interactive R sessions

# Create 2 local daemons (using dispatcher)
daemons(2)
status()
# Reset to zero
daemons(0)

# Create 2 local daemons (not using dispatcher)
daemons(2, dispatcher = FALSE)
status()
# Reset to zero
daemons(0)

# 2 remote daemons via dispatcher using WebSockets
daemons(2, url = host_url(ws = TRUE))
status()
# Reset to zero
daemons(0)

# Set host URL for remote daemons to dial into
daemons(url = host_url(), dispatcher = FALSE)
status()
# Reset to zero
daemons(0)

# Launch 2 daemons on remotes 'nodeone' and 'nodetwo' using SSH
# connecting back directly to the host URL over a TLS connection:
#
# daemons(url = host_url(tls = TRUE),
#         remote = ssh_config(c('ssh://nodeone', 'ssh://nodetwo')),
#         dispatcher = FALSE)

# Launch 4 daemons on the remote machine 10.75.32.90 using SSH tunnelling
# over port 5555 ('url' hostname must be 'localhost' or '127.0.0.1'):
#
# daemons(n = 4,
#         url = 'ws://localhost:5555',
#         remote = ssh_config('ssh://10.75.32.90', tunnel = TRUE))

}


mirai documentation built on Nov. 16, 2023, 5:08 p.m.