```r
library(crew)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
`crew` facilitates computationally demanding workloads that consume a lot of memory and CPU resources, and workloads that exhaust these resources may stall or crash. Oftentimes a crash is silent and leaves no clues for troubleshooting. But if you proactively monitor `crew` processes with log files, you can keep track of resource usage and generate prospective data that is useful for diagnosing problems. This vignette describes best practices for logging.
`crew` runs in 3 different kinds of R processes:

1. The local R session, where you create a controller with `crew_controller_local()` or similar.
2. The `mirai` dispatcher, which sends tasks to workers. The dispatcher starts on `controller$start()` and terminates on `controller$terminate()` or when the local R session closes. The dispatcher always runs on the same machine as (1).
3. The workers, which run the tasks. Depending on the launcher plugin, workers may be local processes or remote jobs (e.g. on a cluster or in the cloud).

This vignette first covers logging for workers (3), then logging for the local R session and dispatcher (1 and 2).
Each launcher plugin has its own way to turn on logging for workers. In the case of the local plugin (`crew_controller_local()`), `crew_options_local()` has arguments to configure how messages are written to log files, and each worker instance gets its own log file in a directory you specify. Other plugins, in packages such as `crew.cluster` and `crew.aws.batch`, have different arguments to configure platform-specific logging mechanisms (e.g. SLURM logs or AWS CloudWatch logs). Example for the local plugin:
```r
library(crew)
log_directory <- tempfile()
controller <- crew_controller_local(
  workers = 2,
  options_local = crew_options_local(log_directory = log_directory)
)
controller$start()
controller$push(print("task 1"))
controller$push(print("task 2"))
controller$wait()
controller$terminate()
log_files <- list.files(log_directory, full.names = TRUE)
writeLines(readLines(log_files[1]))
#> R version 4.4.0 (2024-04-24)
#> ... # more session info...
#> [1] "task 1"
writeLines(readLines(log_files[2]))
#> R version 4.4.0 (2024-04-24)
#> ... # more session info...
#> [1] "task 2"
```
To add resource metrics to existing log files, regardless of your platform, use the `options_metrics` argument and set `path = "/dev/stdout"`.^[Windows does not have `/dev/stdout`, but `path` can also be a directory on disk. In that case, `crew` writes the metrics log files there instead of to the existing log files specified in `options_local`.]
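For example, if you are on Windows or simply prefer the metrics in their own files, a minimal sketch of that alternative follows. The directory name below is only an illustration, not part of the examples in this vignette:

```r
# Sketch only: send resource metrics to a separate directory instead of
# appending them to the worker log files. The directory name is illustrative.
library(crew)
metrics_directory <- file.path(tempdir(), "metrics")
controller <- crew_controller_local(
  workers = 2,
  options_local = crew_options_local(log_directory = tempfile()),
  options_metrics = crew_options_metrics(
    path = metrics_directory,
    seconds_interval = 1
  )
)
```

The next example uses `/dev/stdout` so the metrics appear directly in the worker log files.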
```r
library(crew)
log_directory <- tempfile()
controller <- crew_controller_local(
  name = "a",
  workers = 2,
  options_local = crew_options_local(log_directory = log_directory),
  options_metrics = crew_options_metrics(
    path = "/dev/stdout",
    seconds_interval = 1
  )
)
controller$start()
controller$push(
  name = "task1",
  command = {
    Sys.sleep(2.5)
    print("task 1")
    Sys.sleep(2.5)
  }
)
controller$push({
  Sys.sleep(5)
  print("task 2")
})
controller$wait()
controller$terminate()
log_files <- list.files(log_directory, full.names = TRUE)
writeLines(readLines(log_files[1]))
#> R version 4.4.0 (2024-04-24)
#> ... # more session info...
#> __AUTOMETRIC__|0.0.5.9001|61269|crew_worker_a_1_61f738a4efc23071f87a1697|0|1729624627.805|2.500|0.250|113688576|420894687232|task1|__AUTOMETRIC__
#> __AUTOMETRIC__|0.0.5.9001|61269|crew_worker_a_1_61f738a4efc23071f87a1697|0|1729624628.809|0.000|0.000|113721344|420903075840|task1|__AUTOMETRIC__
#> [1] "task 1"
#> __AUTOMETRIC__|0.0.5.9001|61269|crew_worker_a_1_61f738a4efc23071f87a1697|0|1729624629.815|0.000|0.000|113754112|420911464448|task1|__AUTOMETRIC__
#> __AUTOMETRIC__|0.0.5.9001|61269|crew_worker_a_1_61f738a4efc23071f87a1697|0|1729624630.818|0.000|0.000|113754112|420911464448|task1|__AUTOMETRIC__
#> __AUTOMETRIC__|0.0.5.9001|61269|crew_worker_a_1_61f738a4efc23071f87a1697|0|1729624631.818|0.000|0.000|113786880|420911464448|__DEFAULT__|__AUTOMETRIC__
writeLines(readLines(log_files[2]))
#> R version 4.4.0 (2024-04-24)
#> ... # more session info...
#> __AUTOMETRIC__|0.0.5.9001|61278|crew_worker_a_2_4d762c8805d675b1e1626a0e|0|1729624627.820|1.700|0.170|114835456|420760469504|unnamed_task_ef5ee3df7ae|__AUTOMETRIC__
#> __AUTOMETRIC__|0.0.5.9001|61278|crew_worker_a_2_4d762c8805d675b1e1626a0e|0|1729624628.821|0.000|0.000|114900992|420769906688|unnamed_task_ef5ee3df7ae|__AUTOMETRIC__
#> __AUTOMETRIC__|0.0.5.9001|61278|crew_worker_a_2_4d762c8805d675b1e1626a0e|0|1729624629.826|0.000|0.000|114900992|420769906688|unnamed_task_ef5ee3df7ae|__AUTOMETRIC__
#> __AUTOMETRIC__|0.0.5.9001|61278|crew_worker_a_2_4d762c8805d675b1e1626a0e|0|1729624630.828|0.000|0.000|114900992|420769906688|unnamed_task_ef5ee3df7ae|__AUTOMETRIC__
#> [1] "task 2"
```
The lines with `__AUTOMETRIC__` record resource usage metrics over the life cycle of each worker. Use `autometric::log_read()` to read these metrics and `autometric::log_plot()` to visualize them. Below, the `time` column is elapsed time in seconds since logging began, and `resident` is memory usage in megabytes. The `phase` column shows which task was running at the time the log was recorded. A phase of `__DEFAULT__` means the worker was idle (not running a task). More details are at `autometric::log_read()`.
```r
data <- autometric::log_read(log_directory)
data$name <- substr(data$name, 0, 20)
data
#>      version                    phase   pid                 name status  time core  cpu resident  virtual
#> 1 0.0.5.9001                    task1 61269 crew_worker_a_1_61f7      0 0.000  2.5 0.25 113.6886 420894.7
#> 2 0.0.5.9001                    task1 61269 crew_worker_a_1_61f7      0 1.004  0.0 0.00 113.7213 420903.1
#> 3 0.0.5.9001                    task1 61269 crew_worker_a_1_61f7      0 2.010  0.0 0.00 113.7541 420911.5
#> 4 0.0.5.9001                    task1 61269 crew_worker_a_1_61f7      0 3.013  0.0 0.00 113.7541 420911.5
#> 5 0.0.5.9001              __DEFAULT__ 61269 crew_worker_a_1_61f7      0 4.013  0.0 0.00 113.7869 420911.5
#> 6 0.0.5.9001 unnamed_task_ef5ee3df7ae 61278 crew_worker_a_2_4d76      0 0.000  1.7 0.17 114.8355 420760.5
#> 7 0.0.5.9001 unnamed_task_ef5ee3df7ae 61278 crew_worker_a_2_4d76      0 1.001  0.0 0.00 114.9010 420769.9
#> 8 0.0.5.9001 unnamed_task_ef5ee3df7ae 61278 crew_worker_a_2_4d76      0 2.006  0.0 0.00 114.9010 420769.9
#> 9 0.0.5.9001 unnamed_task_ef5ee3df7ae 61278 crew_worker_a_2_4d76      0 3.008  0.0 0.00 114.9010 420769.9
```
```r
autometric::log_plot(data, pid = 61269, metric = "resident")
```
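The data frame from `autometric::log_read()` is ordinary R data, so you can summarize it however you like. As a minimal sketch (plain base R, not a `crew` or `autometric` feature), the following reports the peak resident memory observed for each worker process in the table above:

```r
# Sketch: maximum resident memory (in megabytes) recorded per worker process.
aggregate(resident ~ pid + name, data = data, FUN = max)
```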
You can use the `autometric` package directly to monitor local `crew` processes (including the `mirai` dispatcher). To begin, you need the process IDs of the local processes. As of `crew` version 0.9.5.9012, every controller has a `pids()` method to get the process IDs of all the local processes.
```r
library(crew)
controller <- crew_controller_local()
controller$start()
controller$pids()
#>                     local dispatcher-e07cd32eeac18cb9e1dd59fb
#>                   2104319                             2104731
```
You can pass this named vector of process IDs directly to `autometric::log_start()`, which starts a background thread that logs resource usage metrics to the text file of your choice.^[On Mac OS, you may not get usable metrics for the dispatcher because of security restrictions on low-level system calls.] Remember to call `autometric::log_stop()` when you are done (or restart R) to stop logging.
```r
path <- tempfile()
autometric::log_start(
  path = path,
  seconds = 1,
  pids = controller$pids()
)
controller$push(Sys.sleep(5))
controller$wait()
autometric::log_stop()
controller$terminate()
```
Use `autometric::log_read()` to read these metrics and `autometric::log_plot()` to visualize them. Below, the `time` column is elapsed time in seconds since logging began, and `resident` is memory usage in megabytes. More details are at `autometric::log_read()`. Note: the `phase` column does not show task information because only the workers know which tasks are running.
```r
data <- autometric::log_read(path)
data$name <- substr(data$name, 0, 10)
data
#>    version       phase     pid       name status  time  core   cpu resident   virtual
#> 1    0.0.1 __DEFAULT__ 2104319      local      0 0.000 0.000 0.000 195.1007  9708.073
#> 2    0.0.1 __DEFAULT__ 2104731 dispatcher      0 0.000 0.000 0.000  31.7399  9514.222
#> 3    0.0.1 __DEFAULT__ 2104319      local      0 1.000 4.998 0.078 195.1007  9842.328
#> 4    0.0.1 __DEFAULT__ 2104731 dispatcher      0 1.000 0.000 0.000  31.7399  9849.901
#> 5    0.0.1 __DEFAULT__ 2104319      local      0 2.000 0.000 0.000 195.1007  9909.436
#> 6    0.0.1 __DEFAULT__ 2104731 dispatcher      0 2.000 0.000 0.000  31.7399  9984.119
#> 7    0.0.1 __DEFAULT__ 2104319      local      0 3.001 0.000 0.000 195.1007  9909.436
#> 8    0.0.1 __DEFAULT__ 2104731 dispatcher      0 3.001 0.000 0.000  31.7399 10118.337
#> 9    0.0.1 __DEFAULT__ 2104319      local      0 4.001 0.000 0.000 195.1007  9976.545
#> 10   0.0.1 __DEFAULT__ 2104731 dispatcher      0 4.001 0.000 0.000  31.7399 10185.445
```
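The same visualization applies to these local processes. A minimal sketch that plots the resident memory of the `mirai` dispatcher, using the dispatcher process ID shown in the output above:

```r
# Sketch: visualize dispatcher memory usage over time.
# 2104731 is the dispatcher PID from the example output above.
autometric::log_plot(data, pid = 2104731, metric = "resident")
```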
targets

As explained at https://books.ropensci.org/targets/crew.html, `crew` helps `targets` orchestrate pipelines. For a `targets` pipeline, to log all local processes as in the previous section, you will need to pass an already started controller to `tar_option_set()`. To do this safely:
1. Use `targets::tar_active()` in `_targets.R` to check if `tar_make()` is running, and only start the controller and the logging thread in that case.
2. Run `tar_make()` with its default `callr_function` (that is, do not set `callr_function = NULL`). This ensures the logging thread quits when the isolated `callr` R process of `tar_make()` quits.

Example `_targets.R` file:
```r
library(autometric)
library(crew)
library(targets)
library(tarchetypes)

controller <- crew_controller_local(
  workers = 2,
  options_metrics = crew_options_metrics(
    path = "worker_log_directory/",
    seconds_interval = 1
  )
)

if (tar_active()) {
  controller$start()
  log_start(
    path = "log.txt",
    seconds = 1,
    pids = controller$pids()
  )
}

tar_option_set(controller = controller)

list(
  tar_target(name = sleep1, command = Sys.sleep(5)),
  tar_target(name = sleep2, command = Sys.sleep(5))
)
```
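After `tar_make()` finishes, you can read back both sets of logs from your interactive session. A minimal sketch, assuming you run the pipeline from the project root and keep the paths used in the example `_targets.R` above:

```r
# Sketch: inspect the logs written while the pipeline ran.
# "log.txt" records the local R session and mirai dispatcher;
# "worker_log_directory/" records the workers, including task-level phases.
local_data <- autometric::log_read("log.txt")
worker_data <- autometric::log_read("worker_log_directory/")
head(local_data)
head(worker_data)
```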