Introduction to Sub-Processes in R

library(subprocess)
library(knitr)

knitr::opts_chunk$set(collapse = TRUE, comment = "#>")

Introduction

Since R is not really a systems-programming language[^systemslanguage] some facilities present in other such languages (e.g. C/C++, Python) haven't been yet brought to R. One of such features is process management which is understood here as the capability to create, interact with and control the lifetime of child processes.

The R package subprocess aims at filling this gap by providing the few basic operations to carry out the aforementioned tasks. The spawn_subprocess() function starts a new child process and returns a handle which can be later used in process_read() and process_write() to send and receive data or in process_wait() or process_terminate() to stop the such a process.

The R subprocess package has been designed after the exemplary Python package which goes by the same. Its documentation can be found here and numerous examples of how it can be used can be found on the Web.

The R subprocess package has been verified to run on Linux, Windows and MacOS.

[^systemslanguage]: "By systems programming I mean writing code that directly uses hardware resources, has serious resource constraints, or closely interacts with code that does." Bjarne Stroustrup, "The C++ Programming Language"

Design and Implementation

The main concept in the package is the handle which holds process identifiers and an external pointer object which in turn is a handle to a low-level data structure holding various system-level parameters of the running sub-process.

A child process, once started, runs until it exits on its own or until its killed. Its current state as well as its exit status can be obtained by dedicated API.

Communication with the child process can be carried out over the three standard streams: the standard input, standard output and standard error output. These streams are intercepted on the child process' side and redirected into three anonymous pipes whose other ends are held by the parent process and can be accessed through the process handle.

In Linux these are regular pipes created with the pipe() system call and opened in the non-blocking mode. All communication takes place on request and follows the usual OS rules (e.g. the sub-process will sleep if its output buffer gets filled).

In Windows these pipes are created with the CreatePipe() call and opened in the blocking mode. Windows does not support non-blocking (overlapped in Windows-speak) mode for anonymous pipes. For that reason each stream has an accompanying reader thread. Reader threads are separated from R interpreter, do not exchange memory with the R interpreter and will not break the single-thread assumption under which R operates.

Introduction to Examples

Before we move on to examples, let's define a few helper functions that abstract out the underlying operating system. We will use them throughout this vignette.

is_windows <- function () (tolower(.Platform$OS.type) == "windows")

R_binary <- function () {
  R_exe <- ifelse (is_windows(), "R.exe", "R")
  return(file.path(R.home("bin"), R_exe))
}

Just for the record, vignette has been built in r ifelse(is_windows(), "Windows", "Linux").

ifelse(is_windows(), "Windows", "Linux")

Now we can load the package and move on to the next section.

library(subprocess)

Example: controlling chlid R process

In this example we spawn a new R process, send a few commands to its standard input and read the responses from its standard output. First, let's spawn the child process (and give it a moment to complete the start-up sequence[^syssleep]):

handle <- spawn_process(R_binary(), c('--no-save'))
Sys.sleep(1)

[^syssleep]: Depending on the system load, R can take a few seconds to start and be ready for input. This is true also for other processes. Thus, you will see Sys.sleep() following spawn_process() in almost every example in this vignette.

Let's see the description of the child process:

print(handle)

And now let's see what we can find it the child's output:

process_read(handle, PIPE_STDOUT, timeout = 1000)
process_read(handle, PIPE_STDERR)

The first number in the output is the value returned by process_write which is the number of characters written to standard input of the child process. The final character(0) is the output read from the standard error stream.

Next, we create a new variable in child's session. Please notice the new-line character at the end of the command. It triggers the child process to process its input.

process_write(handle, 'n <- 10\n')
process_read(handle, PIPE_STDOUT, timeout = 1000)
process_read(handle, PIPE_STDERR)

Now it's time to use this variable in a function call:

process_write(handle, 'rnorm(n)\n')
process_read(handle, PIPE_STDOUT, timeout = 1000)
process_read(handle, PIPE_STDERR)

Finally, we exit the child process:

process_write(handle, 'q(save = "no")\n')
process_read(handle, PIPE_STDOUT, timeout = 1000)
process_read(handle, PIPE_STDERR)

The last thing is making sure that the child process is no longer alive:

process_state(handle)
process_return_code(handle)

Of course there is little value in running a child R process since there are multiple other tools that let you do that, like parallel, Rserve and opencpu to name just a few. However, it's quite easy to imagine how running a remote shell in this manner enables new ways of interacting with the environment. Consider running a local shell:

shell_binary <- function () {
  ifelse (tolower(.Platform$OS.type) == "windows",
          "C:/Windows/System32/cmd.exe", "/bin/sh")
}

handle <- spawn_process(shell_binary())
print(handle)

Now we can interact with the shell sub-process. We send a request to list the current directory, then give it a moment to process the command and produce the output (and maybe finish its start-up, too). Finally, we check its output streams.

process_write(handle, "ls\n")
Sys.sleep(1)
process_read(handle, PIPE_STDOUT)
process_read(handle, PIPE_STDERR)

Advanced techniques

Terminating a child process

If the child process needs to be terminated one can choose to:

Assume the child R process is hung and there is no way to stop it gracefully. process_wait(handle, 1000) waits for 1 second (1000 milliseconds) for the child process to exit. It then returns NA and process_terminate() gives R a chance to exit graceully. Finally, process_kill() forces it to exit.

sub_command <- "library(subprocess);subprocess:::signal(15,'ignore');Sys.sleep(1000)"
handle <- spawn_process(R_binary(), c('--slave', '-e', sub_command))
Sys.sleep(1)

# process is hung
process_wait(handle, 1000)
process_state(handle)

# ask nicely to exit; will be ignored in Linux but not in Windows
process_terminate(handle)
process_wait(handle, 1000)
process_state(handle)

# forced exit; in Windows the same as the previous call to process_terminate()
process_kill(handle)
process_wait(handle, 1000)
process_state(handle)

We see that the child process remains running until it receives the SIGKILL signal[^signal]. The final return code (exit status) is the number of the signal that caused the child process to exit[^status].

[^termination]: In Windows, process_terminate() is an alias for process_kill(). They both lead to immediate termination of the child process.

[^signal]: The .Call("C_signal") in our example is a call to a hidden C function that subprocess provides mainly for the purposes of this example.

[^status]: See the waitpid() manual page, e.g. here.

Sending a signal to the child process

The last topic we want to cover here is sending an arbitrary[^windowssignals] signal to the child process. Signals can be listed by looking at the signals variable present in the package. It is constructed automatically when the package is loaded and its value on Linux is different than on Windows. In the example below we see the first three elements of the Linux list of signals.

length(signals)
signals[1:3]

All possible signal identifiers are supported directly from the subprocess package. Signals not supported on the current platform are set to NA and the rest have their OS-specific numbers as their values.

ls(pattern = 'SIG', envir = asNamespace('subprocess'))

Now we can create a new child process and send an arbitrary using its handle.

handle <- spawn_process(R_binary, '--slave')

process_send_signal(handle, SIGUSR1)

[^windowssignals]: The list of signals supported in Windows is much shorter than the list of signals supported in Linux and contains the following three signals: SIGTERM, CTRL_C_EVENT and CTRL_BREAK_EVENT.



Try the subprocess package in your browser

Any scripts or data that you put into this service are public.

subprocess documentation built on May 2, 2019, 4:04 p.m.