observer: Creates an observer for an rrqueue
In traitecoevo/rrqueue: Scalable Job Queues for R with 'Redis'

Description Usage Arguments Details Methods

Creates an observer for an rrqueue. This is the "base class" for a couple of different objects in rrqueue; notably the queue object. So any method listed here also works within queue objects.

1 2	observer(queue_name = NULL, redis_host = "127.0.0.1", redis_port = 6379, config = NULL)

`queue_name`	Name of the queue, if not given then it will check with the given Redis server to see if there is just a single queue known. In that case we connect to that queue. Otherwise we error and list possible queues.
`redis_host`	Redis hostname
`redis_port`	Redis port number
`config`	Configuration file of key/value pairs in yaml format. See the package README for an example. If given, additional arguments to this function override values in the file which in turn override defaults of this function.

Most of the methods of the observer object are extremely simple and involve fetching information from the database about the state of tasks, environments and workers.

The method and argument names try to give hints about the sort of things they expect; a method asking for task_id expects a single task identifier, while those asking for task_ids expect a vector of task identifiers (and if they have a default NULL then will default to returning information for all task identifiers). Similarly, a method starting task_ applies to one task while a method starting tasks_ applies to multiple.

tasks_list

Return a vector of known task ids.

Usage: tasks_list()

Value: A character vector

tasks_status

Returns a named character vector indicating the task status.

Usage: tasks_status(task_ids = NULL, follow_redirect = FALSE)

Arguments:

task_ids: Optional vector of task identifiers. If omitted all tasks known to rrqueue will be used.
follow_redirect: should we follow redirects to get the status of any requeued tasks?

Value: A named character vector; the names will be the task ids, and the values are the status of each task. Possible status values are

PENDING: queued, but not run by a worker
RUNNING: being run on a worker, but not complete
COMPLETE: task completed successfully
ERROR: task completed with an error
ORPHAN: task orphaned due to loss of worker
REDIRECT: orphaned task has been redirected
MISSING: task not known (deleted, or never existed)

tasks_overview

High-level overview of the tasks in the queue; the number of tasks in each status.

Usage: tasks_overview()

tasks_times

returns a summary of times for a set of tasks

Usage: tasks_times(task_ids = NULL, unit_elapsed = "secs")

Arguments:

task_ids: Optional vector of task identifiers. If omitted all tasks known to rrqueue will be used.
unit_elapsed: Unit to use in computing elapsed times. The default is to use "secs". This is passed through to difftime so the units there are available and are "auto", "secs", "mins", "hours", "days", "weeks".

Value: A data.frame, one row per task, with columns

task_id: The task id
submitted: Time the task was submitted
started: Time the task was started, or NA if waiting
finished: Time the task was completed, or NA if waiting or running
waiting: Elapsed time spent waiting
running: Elapsed time spent running, or NA if waiting
idle: Elapsed time since finished, or NA if waiting or running

tasks_envir

returns the mapping of tasks to environmen

Usage: tasks_envir(task_ids = NULL)

Arguments:

task_ids: Optional vector of task identifiers. If omitted all tasks known to rrqueue will be used.

Value: A named character vector; names are the task ids and the value is the environment id associated with that task.

task_get

returns a task object associated with a given task identifier. This can be used to interrogate an individual task. See the help for task objects for more about these objects.

Usage: task_get(task_id)

Arguments:

task_id: A single task identifier

task_result

Get the result for a single task

Usage: task_result(task_id, follow_redirect = FALSE)

Arguments:

task_id: A single task identifier
follow_redirect: should we follow redirects to get the status of any requeued task?

tasks_groups_list

Returns list of groups known to rrqueue. Groups are assigned during task creation, or through the tasks_set_group method of link{queue}.

Usage: tasks_groups_list()

tasks_in_groups

Returns a list of tasks belonging to any of the groups listed.

Usage: tasks_in_groups(groups)

Arguments:

groups: A character vector of one or more groups (use tasks_groups_list to get a list of valid groups).

tasks_lookup_group

Look up the group for a set of tasks

Usage: tasks_lookup_group(task_ids = NULL)

Arguments:

task_ids: Optional vector of task identifiers. If omitted all tasks known to rrqueue will be used.

Value: A named character vector; names refer to task ids and the value is the group (or NA if no group is set for that task id).

task_bundle_get

Return a "bundle" of tasks that can be operated on together; see task_bundle

Usage: task_bundle_get(groups = NULL, task_ids = NULL)

Arguments:

groups: A vector of groups to include in the bundle
task_ids: A vector of task ids in the bundle. Unlike all other uses of task_ids here, only one of groups or task_ids can be provided, so if task_ids=NULL then task_ids is ignored and groups is used.

envirs_list

Return a vector of all known environment ids in this queue.

Usage: envirs_list()

envirs_contents

Return a vector of the environment contents

Usage: envirs_contents(envir_ids = NULL)

Arguments:

envir_ids: Vector of environment ids. If omitted then all environments in this queue are used.

Value: A list, each element of which is a list of elements

packages: a vector of packages loaded
sources: a vector of files explicitly sourced
source_files: a vector of files sourced including their hashes. This includes and files detected to be sourced by another file

envir_workers

Determine which workers are known to be able to process tasks in a particular environment.

Usage: envir_workers(envir_id, worker_ids = NULL)

Arguments:

envir_id: A single environment id
worker_ids: Optional vector of worker identifiers. If omitted all workers known to rrqueue will be used (currently running workers only).

Value: A named logical vector; TRUE if a worker can use an environment, named by the worker identifers.

workers_len

Number of workers that have made themselves known to rrqueue. There are situations where this is an overestimate and that may get fixed at some point.

Usage: workers_len()

workers_list

Returns a vector of all known worker identifiers (may include workers that have crashed).

Usage: workers_list()

workers_list_exited

Returns a vector of workers that are known to have exited. Workers leave behind most of the interesting bits of logs, times, etc, so these identifiers are useful for asking what they worked on.

Usage: workers_list_exited()

workers_status

Returns a named character vector indicating the task status.

Usage: workers_status(worker_ids = NULL)

Arguments:

worker_ids: Optional vector of worker identifiers. If omitted all workers known to rrqueue will be used (currently running workers only).

Value: A named character vector; the names will be the task ids, and the values are the status of each task. Possible status values are

IDLE: worker is idle
BUSY: worker is running a task
LOST: worker has been identified as lost by the workers_identify_lost of queue.
EXITED: worker has exited
PAUSED: worker is paused

workers_task_id

Returns the tasks that workers are currently processing (or NA for workers that are not known to be working on a task)

Usage: workers_task_id(worker_ids = NULL)

Arguments:

worker_ids: Optional vector of worker identifiers. If omitted all workers known to rrqueue will be used (currently running workers only).

Value: A named character vector. Names are the worker ids and value is the task id, or NA if no task is being worked on.

workers_times

returns a summary of times for a set of workers. This only returns useful information if the workers are running a heartbeat process, which requires the RedisHeartbeat package.

Usage: workers_times(worker_ids = NULL, unit_elapsed = "secs")

Arguments:

worker_ids: Optional vector of worker identifiers. If omitted all workers known to rrqueue will be used (currently running workers only).
unit_elapsed: Unit to use in computing elapsed times. The default is to use "secs". This is passed through to difftime so the units there are available and are "auto", "secs", "mins", "hours", "days", "weeks".

Value: A data.frame, one row per worker, with columns

worker_id: Worker identifier
expire_max: Maximum length of time before worker can be declared missing, in seconds
expire: Time until the worker will expire, in seconds
last_seen: Time since the worker was last seen
last_action: Time since the last worker action

workers_log_tail

Return the last few entries in the worker logs.

Usage: workers_log_tail(worker_ids = NULL, n = 1)

Arguments:

worker_ids: Optional vector of worker identifiers. If omitted all workers known to rrqueue will be used (currently running workers only).
n: Number of log entries to return. Use 0 or Inf to return all entries.

Value:

A data.frame with columns

worker_id: the worker identifier
time: time of the event
command: the command (e.g., MESSAGE, ALIVE)
message: The message associated with the command

workers_info

Returns a set of key/value information about workers. Includes things like hostnames, process ids, environments that can be run, etc. Note that this information is from the last time that the worker process registered an INFO command. This is registered at startup and after recieving a INFO message from a queue object. So the information may be out of date.

Usage: workers_info(worker_ids = NULL)

Arguments: