README.md

qrmarkdown R package

qrmarkdown workflow

About

qrmarkdown is a simple job queue control system for R users.

You can schedule daily and weekly tasks.

Why

Airflow, darg, and many other great workflow apps exist, but deploying them to a runtime environment you don't control becomes a big headache.

Usability

You can control everything from inside RStudio. No additional third-party app is required.

Benefits

Tested

Unix, AWS, Cloudera, Mac OSX

Installation

Install the R package and allocate a directory for the job queue.

# install_github() is provided by the remotes (or devtools) package
remotes::install_github("okux/qrmarkdown")

Export the QWD environment variable in your OS. On Unix-like systems:

export QWD="$HOME/Desktop/timebox"
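Before starting the daemons, it can help to confirm that R actually sees the variable. The sketch below uses a hypothetical helper, resolve_qwd(), which is not part of qrmarkdown; it only assumes that the queue directory comes from QWD when that variable is set. It also expands a leading "~", since a single-quoted '~/...' value is not expanded by the shell.

```r
# Hypothetical helper (not a qrmarkdown function): resolve the queue directory
# from the QWD environment variable, falling back to a default path.
resolve_qwd <- function(default = "~/Desktop/timebox") {
  qwd <- Sys.getenv("QWD", unset = "")
  if (!nzchar(qwd)) qwd <- default
  # Expand a leading "~" in case the shell left it unexpanded
  path.expand(qwd)
}

Sys.setenv(QWD = "/tmp/timebox")
resolve_qwd()
```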

Queue a job

If you don't want to use the QWD environment variable, set the queue directory from the R console instead:

# Set up the queue directory before starting the dispatcher
library(qrmarkdown)
q.wd('~/Desktop/timebox')

q.dispatcher(n=3)
jid1 <- q.push(script='wkfl.1.Rmd', name='mytest')

Note: q.dispatcher(n=3) starts 3 background processes that execute Rmd jobs.

What's in my workflow queue?

q.show()

q.show() reads job status, which qrmarkdown stores in the file system in .rda binary format:

- inbox: job tickets waiting to run
- outbox: all completed tickets
- schedule: recurring workflow tickets
- n.running: number of jobs currently running
- n.queue: total number of current jobs
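To make the file-based layout concrete, here is a minimal sketch that builds a throwaway queue directory and counts tickets per box. The inbox/outbox names come from the list above; storing one .rda file per ticket, and the ticket fields, are assumptions for illustration, not qrmarkdown's actual on-disk format.

```r
# Throwaway queue directory with inbox/ and outbox/ folders.
# One .rda file per ticket is an ASSUMPTION made for this sketch.
qwd <- file.path(tempdir(), "timebox-demo")
unlink(qwd, recursive = TRUE)
for (box in c("inbox", "outbox")) {
  dir.create(file.path(qwd, box), recursive = TRUE)
}

# Write a minimal (hypothetical) ticket object into a box as <jid>.rda
write_ticket <- function(box, jid) {
  ticket <- list(jid = jid, params = list())
  save(ticket, file = file.path(qwd, box, paste0(jid, ".rda")))
}

write_ticket("inbox", "job-1")
write_ticket("inbox", "job-2")
write_ticket("outbox", "job-0")

# Count the .rda tickets in a box
count_tickets <- function(box) {
  length(list.files(file.path(qwd, box), pattern = "\\.rda$"))
}

n_waiting <- count_tickets("inbox")   # tickets still to run
n_done    <- count_tickets("outbox")  # completed tickets
```

Because the status lives in plain files, it survives R restarts and can be inspected without any extra service.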

Schedule your daily tasks

qrmarkdown automatically runs an Rmd script on a specific weekday and hour.

Scheduling a task

Typical workflows: weekly reports, daily test runs, model validation, anomaly monitoring.

SCRIPT <- 'fullpath/regression.test.rmd'
OUTPUT <- 'fullpath/regression.test.html'

jid <- q.schedule(wday='Monday',hour=5,
           script=SCRIPT, output=OUTPUT) 

q.monitor()           

q.monitor() starts an hourly monitoring daemon in the background that launches all scheduled jobs.

Note: use full paths for the script and output locations; otherwise the background daemons look for the files under QWD.
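An hourly monitor only needs to ask, once per hour, whether a (weekday, hour) schedule is due. The sketch below illustrates that check and computes the next firing time; is_due() and next_run() are hypothetical helpers, not qrmarkdown's internal code, and the weekday names are assumed to match the wday='Monday' style used above.

```r
# Hypothetical sketch, NOT qrmarkdown's internal code: decide whether a weekly
# (wday, hour) schedule is due, and find the next hour at which it fires.
wday_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")

is_due <- function(now, wday, hour) {
  lt <- as.POSIXlt(now)
  # POSIXlt$wday counts from 0 = Sunday, hence the + 1 when indexing
  wday_names[lt$wday + 1] == wday && lt$hour == hour
}

next_run <- function(now, wday, hour) {
  # Start from the top of the current hour and step forward hour by hour
  start <- as.POSIXct(trunc(now, units = "hours"))
  candidates <- start + 3600 * 0:(7 * 24)
  due <- vapply(seq_along(candidates),
                function(i) is_due(candidates[i], wday, hour),
                logical(1))
  candidates[which(due)[1]]
}

now <- as.POSIXct("2021-12-22 04:17:00", tz = "UTC")  # a Wednesday
next_run(now, wday = "Monday", hour = 5)
```

Using UTC in the example avoids daylight-saving gaps when stepping forward in fixed 3600-second increments.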

Viewing results

qrmarkdown creates a unique job id (jid) for each job, for example 'f2fed362-6e6b-4aa8-90d8-02765534d8e6'.

q.ls() with view.output=TRUE opens a specific Rmd job's output in a browser:

q.ls('f2fed362-6e6b-4aa8-90d8-02765534d8e6', view.output=TRUE)
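The example jid has the shape of a UUID (8-4-4-4-12 hexadecimal digits). Assuming all jids follow that shape, a small check can tell a jid apart from a status filter such as 'failed' before calling q.ls(). is_jid() is a hypothetical helper, not part of qrmarkdown.

```r
# Hypothetical helper: does x look like a UUID-style jid?
is_jid <- function(x) {
  grepl("^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", x)
}

is_jid("f2fed362-6e6b-4aa8-90d8-02765534d8e6")  # TRUE
is_jid("failed")                                # FALSE
```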

Workflow redundancy & rerunning failed jobs

Rerun

Use q.ls() and the jid to re-queue your job.

library(dplyr)    # provides %>% and select()
library(ggplot2)

qlist <- q.ls()
plot.data <- qlist %>% select(name, secs, status)

ggplot(data = plot.data, aes(x = name, y = secs, color = status)) +
  geom_boxplot() +
  ggtitle('job execution time') +
  theme_minimal() +
  xlab('job name')

failed <- q.ls('failed', detail = TRUE) # detailed report of failed jobs

q.run( failed$jid[1] )
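If you prefer a quick text summary over a plot, base R can do the same grouping without dplyr or ggplot2. This sketch assumes q.ls() returns a data frame with jid, name, secs and status columns (as the select() call above suggests); the toy rows and status values below are made up for illustration only.

```r
# Toy stand-in for q.ls() output; values are invented for illustration.
qlist <- data.frame(
  jid    = c("a1", "a2", "a3", "a4", "a5"),
  name   = c("mytest", "mytest", "report", "report", "report"),
  secs   = c(12, 15, 40, 38, 95),
  status = c("completed", "failed", "completed", "failed", "failed"),
  stringsAsFactors = FALSE
)

# Number of failed runs per job name
failed_by_name <- table(qlist$name[qlist$status == "failed"])

# Median execution time per job name and status
median_secs <- aggregate(secs ~ name + status, data = qlist, FUN = median)

# jid of the first failed job, e.g. to pass to q.run()
first_failed <- qlist$jid[qlist$status == "failed"][1]
```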

failed jobs by name

Debug an Rmd script with browser() in RStudio

Open the failed script in RStudio, then insert a browser() call or set a breakpoint.

q.run() executes your script directly in the console, which lets you debug it interactively.

# open the failed script in the RStudio editor
rstudioapi::navigateToFile(failed[1,]$script)

Add the following lines at the start of your script to restore the exact parameters used for this job execution:

load(failed[1,]$jid)      # restore the job ticket (creates `ticket`)
params <- ticket$params   # overwrite knitr params with the ticket's params

Note: give the full path to the jid ticket file.
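The restore step relies on load() recreating objects under the names they were saved with, which is why the lines above expect an object called ticket. A self-contained round trip, assuming a ticket is simply a list with a params element (the file name and parameter values are hypothetical):

```r
# Illustrative round trip: save a hypothetical ticket, then restore its params.
ticket_file <- file.path(tempdir(), "demo-jid.rda")
ticket <- list(params = list(country = "US"))
save(ticket, file = ticket_file)
rm(ticket)

load(ticket_file)          # recreates the object `ticket`
params <- ticket$params    # same parameter list the job used
params$country
```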

Parameterize jobs

You can use a single Rmd file to generate multiple reports by passing different parameters.

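countries <- c('US', 'JP')   # example values

# Queue one job per country from the same Rmd; the Rmd should declare
# `country` under `params:` in its YAML header
queue.job <- function(country)
{
  ret <- q.push(script = "dynamic.analysis.Rmd",
                params = list(country = country),
                output = sprintf("~/Desktop/%s.html", country))
  return(ret)
}

joblist <- lapply(countries, queue.job)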

q.schedule(joblist, wday="Monday",hour=4) # kick off every Monday 
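Before queuing, you can dry-run the parameter expansion in plain R to preview what each q.push() call would receive. The point worth checking is that every parameter set writes to a distinct output file; otherwise later jobs would overwrite earlier reports. The country codes are example values only.

```r
countries <- c("US", "JP", "KR")   # example values

# Build the arguments each job would get, without queuing anything
job_specs <- lapply(countries, function(country) {
  list(script = "dynamic.analysis.Rmd",
       params = list(country = country),
       output = sprintf("~/Desktop/%s.html", country))
})

outputs <- vapply(job_specs, `[[`, character(1), "output")

# Guard against two jobs writing the same report file
stopifnot(!anyDuplicated(outputs))
outputs
```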

Running daemons in a separate terminal

Terminal setup

Open the RStudio terminal or another terminal application. This lets you run the qrmarkdown daemons in a separate screen.

Rscript -e "qrmarkdown::q.dispatcher(n=8, dir='your.queue.log.dir')"
Rscript -e "qrmarkdown::q.monitor(wdir='your.queue.log.dir')"

Shutting down background daemons

qrmarkdown shuts down safely by waiting for all running jobs to finish. There is no mechanism to force-kill a job.

q.shutdown()        # wait for running jobs to finish, then stop the daemons

q.dispatcher(n=10)  # start the dispatcher again, here with 10 processes

Cleaning past job files

q.rm('outbox/*') # delete all completed job logs

q.rm('*')        # delete all job files in the queue directory

Advanced workflow design

See the next chapter for more advanced usage.



okux/qrmarkdown documentation built on Dec. 22, 2021, 4:17 a.m.