The hardware server (currently sgdata.motus.org) that processes raw files from receivers and generates runs of tag detections hosts several software servers; these are R applications built on the motusServer package.
An SQLite database tracks the (re)processing of uploaded, synced, or archived data. The main table is `jobs`, with this schema:
```sql
CREATE TABLE jobs (
  id INTEGER UNIQUE PRIMARY KEY NOT NULL, -- unique job id
  pid INTEGER REFERENCES jobs (id),   -- id of parent job; null if this is a "top-level" job
  stump INTEGER REFERENCES jobs (id), -- id of top-level job, i.e. the ultimate ancestor of this job; equal to `id` if `pid` is null
  ctime FLOAT(53),                    -- timestamp of job creation
  mtime FLOAT(53),                    -- timestamp of latest change to job information
  type TEXT,                          -- short string giving the type of job; if type is 'abcdEfg', the job is handled by a function called 'handleAbcdEfg'
  done INTEGER,                       -- status code. 0: not completed (maybe not started); 1: completed successfully; < 0: error
  queue TEXT,                         -- usually a small integer; the queue in which the job resides.
                                      -- This indicates which running processServer instance is processing or has processed this job.
  path TEXT,                          -- filesystem path to the job folder, which holds archives or data files used by this job; null if none
  oldpath TEXT,                       -- filesystem path to the previous location of the job folder; permits recovery in case of a crash
                                      -- between the attempt to move the job folder and recording the move in the DB
  data JSON,                          -- parameters, logs, and product pointers for this job, as a JSON-encoded object; names of fields generated by the job end in `_`
  motusUserID INTEGER,                -- motus ID of the user who launched this job; non-null only in top-level jobs
  motusProjectID INTEGER              -- motus ID of the project (selected by the user at upload time) which will own the outputs of this job; non-null only in top-level jobs
)
```
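The `pid`/`stump` links can be illustrated with a small sketch using Python's built-in `sqlite3` and an abridged version of the schema above (the job type names here are hypothetical examples, not necessarily the package's actual type strings):

```python
import sqlite3

# Abridged jobs table: just the tree and queue columns from the schema above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id INTEGER UNIQUE PRIMARY KEY NOT NULL,
        pid INTEGER REFERENCES jobs (id),    -- parent job; NULL for top-level jobs
        stump INTEGER REFERENCES jobs (id),  -- ultimate ancestor; equals id for top-level jobs
        type TEXT,
        done INTEGER,
        queue TEXT
    )
""")

# A top-level job: pid is NULL and stump equals its own id.
conn.execute("INSERT INTO jobs VALUES (1, NULL, 1, 'uploadFile', 0, '0')")
# Its subjobs point at the parent via pid and at the top-level job via stump.
conn.execute("INSERT INTO jobs VALUES (2, 1, 1, 'unpackArchive', 0, '0')")
conn.execute("INSERT INTO jobs VALUES (3, 1, 1, 'sanityCheck', 0, '0')")

# Everything belonging to a top-level job is one lookup on stump.
subtree = conn.execute(
    "SELECT id, type FROM jobs WHERE stump = 1 ORDER BY id").fetchall()
print(subtree)  # [(1, 'uploadFile'), (2, 'unpackArchive'), (3, 'sanityCheck')]
```

Because `stump` is denormalized onto every row, collecting an entire job tree never requires a recursive query.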
R accesses this database via an S3 class called `Copse`, a simple database-backed object interface. Jobs are represented as `Twig`s in the `Copse`, with a tree structure (subjobs within jobs) and arbitrary data fields for parameters and output products. Writing to the R objects makes immediate changes to the fields in the `jobs` table, and reading from the R objects fetches the most recent values from the database.
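`Copse` and `Twig` are specific to the motusServer package, but the write-through pattern they implement can be sketched in Python (a hypothetical `Twig` stand-in over `sqlite3`; not the package's actual code):

```python
import sqlite3

class Twig:
    """Sketch of a DB-backed record: attribute reads and writes go straight to the table."""
    _cols = {"type", "done", "queue"}

    def __init__(self, conn, job_id):
        object.__setattr__(self, "_conn", conn)
        object.__setattr__(self, "_id", job_id)

    def __getattr__(self, name):
        # Reads always fetch the current value from the database.
        if name in Twig._cols:
            row = self._conn.execute(
                f"SELECT {name} FROM jobs WHERE id = ?", (self._id,)).fetchone()
            return row[0]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Writes update the row immediately.
        if name in Twig._cols:
            self._conn.execute(
                f"UPDATE jobs SET {name} = ? WHERE id = ?", (value, self._id))
        else:
            object.__setattr__(self, name, value)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, type TEXT, done INTEGER, queue TEXT)")
conn.execute("INSERT INTO jobs VALUES (1, 'newFile', 0, '0')")

job = Twig(conn, 1)
job.done = 1       # immediately updates the row
print(job.done)    # reads back from the table: 1
```

The point of the pattern is that the database, not the in-memory object, is the source of truth, so a crashed server can reconstruct job state from the table alone.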
A top-level job is created by one of these events:

- a user uploads an archive of raw receiver files to be processed
- the data server polls an attached receiver for new data (typically hourly)
- an admin requests a re-run of some portion of the archived raw files
Each top-level job creates subjobs that perform chunks of the processing. These chunks were chosen in a somewhat arbitrary fashion, but with these goals:

- if a chunk fails, it should leave the DB and filesystem in a state where the chunk can be retried, in case a bug is fixed
- if processing is interrupted during a chunk (e.g. power outage, system crash, fatal bug), retrying it should work
- chunks should be conceptually independent, to the extent possible
- chunks that require locking objects (such as receiver databases) should be as small as possible
Top-level jobs are created in either the regular queue `/sgm/queue/0` or the priority queue `/sgm/priority`. By default, uploads go into the former and sync jobs into the latter. Jobs (re)submitted by admin users can be forced into either queue.
From these two top-level queues, one of the processServers claims the job. We have typically been running four normal processServers that claim jobs from queue 0, and two 'high-priority' processServers that claim jobs from the priority queue. There is nothing different about the high-priority servers except the queue from which they are fed; they are intended to allow low-latency processing of data from attached receivers, which arrive frequently and in small quantities. Upload jobs, which might involve very large amounts of data and so take a long time to process, run on the normal processServers so as not to disrupt the low-latency processing.
Once a top-level job has entered a queue, any subjobs it generates are automatically added to the same queue.
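Continuing the `sqlite3` sketch from above, queue inheritance can be expressed as copying `queue` and `stump` from the parent row when a subjob is created (an illustrative helper, not the package's actual function):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs (
    id INTEGER PRIMARY KEY, pid INTEGER, stump INTEGER,
    type TEXT, done INTEGER, queue TEXT)""")

def new_subjob(conn, parent_id, job_type):
    """Create a subjob: it inherits its parent's queue and top-level ancestor."""
    stump, queue = conn.execute(
        "SELECT stump, queue FROM jobs WHERE id = ?", (parent_id,)).fetchone()
    cur = conn.execute(
        "INSERT INTO jobs (pid, stump, type, done, queue) VALUES (?, ?, ?, 0, ?)",
        (parent_id, stump, job_type, queue))
    return cur.lastrowid

# A top-level job already claimed into processing queue '2'...
conn.execute("INSERT INTO jobs VALUES (1, NULL, 1, 'uploadFile', 0, '2')")
# ...spawns a subjob that automatically lands in the same queue.
sub = new_subjob(conn, 1, "unpackArchive")
print(conn.execute("SELECT queue FROM jobs WHERE id = ?", (sub,)).fetchone()[0])  # '2'
```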
Top-level jobs are represented in the filesystem by a folder whose name is the jobID left-padded with zeros, e.g. `00000001`. Currently, the numbers are padded to 8 digits, allowing for 100 M jobs; that could be changed if needed.
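Mapping a job ID to its folder name is plain zero-padding (a trivial sketch; the padding width is the only parameter that would change):

```python
def job_folder(job_id, width=8):
    """Folder name for a top-level job: the ID left-padded with zeros."""
    return f"{job_id:0{width}d}"

print(job_folder(1))      # '00000001'
print(job_folder(12345))  # '00012345'
```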
Jobs begin life in one of the input queues:

- `/sgm/queue/0` (normal priority jobs)
- `/sgm/priority` (high priority jobs)

So, e.g., a new upload might begin with the folder `/sgm/queue/0/00012345` containing the uploaded file.
When a processServer is available, it looks at its input queue and claims the first job it finds there, waiting if there are none. (Really, it does blocking reads from a pipe connected to an instance of `inotifywait`, which watches a folder for file creation or move events.)
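The watch loop amounts to blocking line-reads from a child process's stdout. The sketch below simulates `inotifywait -m -e create,moved_to <dir>` output with `printf`, since the real tool needs a live filesystem to watch; the event lines and job names are made up for illustration:

```python
import subprocess

# Stand-in for:
#   subprocess.Popen(["inotifywait", "-m", "-e", "create,moved_to", "/sgm/queue/0"], ...)
# Here printf emits two fake event lines so the loop can run anywhere.
proc = subprocess.Popen(
    ["printf", "/sgm/queue/0/ CREATE 00012345\n/sgm/queue/0/ MOVED_TO 00012346\n"],
    stdout=subprocess.PIPE, text=True)

claimed = []
for line in proc.stdout:          # blocks until an event line arrives
    watch_dir, event, name = line.split()
    claimed.append(name)          # a real server would claim/move the job folder here
proc.wait()
print(claimed)  # ['00012345', '00012346']
```

Because the read blocks, the server consumes no CPU while its input queue is empty.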
The job is then moved to the processServer's processing queue:

- `/sgm/queue/1`, `/sgm/queue/2`, ... `/sgm/queue/4` (normal priority processServers)
- `/sgm/queue/101`, `/sgm/queue/102` (the two high-priority processServers)

When a processServer is started, it checks its processing queue for any unfinished jobs (`done` == 0) and runs those before looking at its input queue. This allows resumption of jobs interrupted by a server outage.
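The startup check is just a query for unfinished rows in the server's own processing queue (another `sqlite3` sketch; the job types shown are hypothetical, and the queue numbers follow the text above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, type TEXT, done INTEGER, queue TEXT)")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", [
    (1, "uploadFile", 1, "1"),    # finished: skipped at startup
    (2, "uploadFile", 0, "1"),    # interrupted: resumed first
    (3, "syncReceiver", 0, "101"),# unfinished, but belongs to another processServer
])

def unfinished(conn, my_queue):
    """Jobs this processServer must resume before reading its input queue."""
    return [r[0] for r in conn.execute(
        "SELECT id FROM jobs WHERE queue = ? AND done = 0 ORDER BY id", (my_queue,))]

print(unfinished(conn, "1"))  # [2]
```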
When a job (and all of its subjobs) has completed, its folder is moved to `/sgm/done`. This is currently a flat folder, but needs to be re-organized hierarchically to properly support huge numbers of jobs.
Any job (including a subjob) that ends in an error has its stack dump recorded in `/sgm/errors` as an `.rds` file, e.g. `/sgm/errors/00001270.rds`.
This file can be examined within R by doing:

```R
> library(motusServer)
> hackError(1270, topLevel=FALSE)
Error in pushToMotus(src): invalid motus device ID for receiver with DB at /mnt/usb/new_sgm_recv/SG-0613BB000593.motus
Traceback (also in the variable bt):
bt[[3]]: h(j)
bt[[2]]: pushToMotus(src)
bt[[1]]: stop("invalid motus device ID for receiver with DB at ", attr(src$con, "dbn

## the bt list holds environments with variables at each level of the stack dump
> ls(bt[[2]])
[1] "batches"    "con"        "deviceID"   "motusTX"    "newBatches"
[6] "sql"        "src"

> bt[[2]]$newBatches
# A tibble: 1 x 10
  batchID motusDeviceID monoBN         tsStart      tsEnd numHits
    <int>         <int>  <int>           <dbl>      <dbl>   <int>
1       1            NA      8 1370809071.1776 1372964178       0
# ... with 4 more variables: ts <dbl>, motusUserID <int>, motusProjectID <int>,
#   motusJobID <int>
```

Not all variables in the stack dump environments will be valid; e.g., database and file connections will not be.