In langfob/tzar: Package for emulating running tzar tool from inside R

BTL - 2017 01 22

This vignette is out of date now that I have changed the package to provide templates for model.R and tzar_main.R. I've modified the README to reflect that, but not this vignette, since that will take a fair bit more work. In the meantime, I've extracted the old Simple Example from the README and pasted it into the start of this vignette just to save the text to use as the basis of a more accurate new simple example. The rest of the old vignette follows the example and I have no idea how much (if any) change needs to be done on that.

Simple example

Here is a simple example program to illustrate how to use the tzar emulator and how it works. We will consider a trivial program to print the value of a variable x. Not all of the example will make perfect sense until you have read the vignette, but it will give you a concrete idea of how you use tzar emulation and that should help make the details in the vignette easier to understand.

Suppose that we want to build a program that will just print the value of an input parameter x. Call the program print_x() and suppose that it is in a file called print_x.R.

print_x <- function (x)
    {
    cat ("\nValue of x = '", x, "'\n\n", sep='')
    }

Print a value from a parameters list.

print_x <- function (parameters)
    {
    cat ("\nValue of x = '", parameters$x, "'\n\n", sep='')
    }

parameters <- list (x="a value from a list of parameters")
print_x (parameters)

Print a value from a parameters list specified in a project.yaml file

Suppose that the project.yaml file contains:

project_name: simple_program
runner_class: RRunner

base_params:
    x: 100

To run under tzar emulation

We have to do 3 things:

First, make sure that the tzar emulator package is installed (as described above) and loaded with a call such as "library(tzar)".
Second, we would have to edit the model.R file to set the argument variables correctly. Assuming that:
- the tzar jar file was in the directory "~/tzar"
- we want the tzar emulation scratch file in our home directory (though you can put it anywhere you want)
  - tzar emulation requires the use of a tiny scratch file that you don't need to know anything about, other than specifying where to put it.
  - The file is deleted at the end of the run, so its location is not terribly important.
- the R code is in the current working directory

```{R eval=FALSE} library (tzar) source ("print_x.R")

main_function = print_x projectPath = "." tzarEmulation_scratchFileName = "~/tzar_em_scratch.yaml"

model_with_possible_tzar_emulation (parameters, main_function, projectPath, tzarEmulation_scratchFileName )

- Third, we would make a **function call** something like this one (e.g., at the R prompt or in a file being sourced) to run the program under tzar emulation:

```{R eval=FALSE}
run_tzar (
    emulatingTzar               = TRUE, 
    main_function               = print_x,    #  note, no quotes on name
    projectPath                 = ".",
    tzarJarPath                 = "~/tzar/tzar.jar", 
    emulation_scratch_file_path = "~/tzar_em_scratch.yaml"
    )

To run under normal tzar without emulation

We would leave model.R as above and then just change the final argument to run_tzar(), i.e., change emulatingTzar to FALSE. Note that:
- You don't have to run tzar yourself; the emulator runs it for you, and
- This only works for local running of tzar for a single run since it's intended only to be used for development. Once in production and spawning lots of runs, you would go back to normal command-line calls to tzar. - Even this could be changed though, if we were to add tzar-control arguments to the run_mainline...() call and pass them on to its call to execute tzar.

```{R eval=FALSE} run_tzar ( emulatingTzar = FALSE, main_function = print_x, # note, no quotes on name projectPath = ".", tzarJarPath = "~/tzar/tzar.jar", emulation_scratch_file_path = "~/tzar_em_scratch.yaml" )

---------------------

### End of old Simple Example

---------------------

## Overview

The java program tzar helps with distributing and managing many runs of sets of experiments and making them easier to reproduce. tzar itself is documented at and downloadable from https://tzar-framework.atlassian.net/wiki/display/TD/Tzar+documentation.

One problem with tzar is that it runs your R code in a separate process that makes it difficult, if not impossible, to easily debug your R code. The tzar package is intended to make it easy to emulate running tzar while keeping the running code under your control (e.g., inside RStudio). This allows more interaction with your code, e.g., allowing the use of the browser() function to pause and examine variables. Similarly, when your program crashes in emulation mode you can get at lots of information that is lost or truncated when the program is run in a separate tzar process.

Examples of how to use tzar emulation are given in the README file. This vignette explains some of the background in how the emulation works. This background is intended to make it easier to understand why each of the steps in using the emulator are necessary.

## How tzar emulation works

Basically, tzar emulation is just a way to trick tzar into giving you half of its benefits. The various things that you have to do as a user to use the package are largely a consequence of what is required to trick tzar. This section gives some background that helps understand what's necessary to trick tzar.

#### Conventions used by tzar that matter in emulation

- **project.yaml**: tzar requires a project.yaml file to exist in the directory where the source code is stored.
- tzar builds a *parameters* list by decoding the yaml expressions in the project.yaml file.
- This *parameters* list is then written out to a file "parameters.R" which can be read back in as a list object by any R program.
- One useful thing about tzar is that it builds various directory locations and variables for you by replacing wildcard values, but the names that are built aren't known to you until after tzar has started up.
- However, the values of these constructed names and variables are added to the *parameters* list before it is written out to "parameters.R".
- After tzar builds the directory path to its output, it saves it in the *parameters* list as parameters$fullOutputDirWithSlash.
- This is the location where the *parameters* list itself will be written as a parameters.R file.
- **model.R**: tzar requires you to have a model.R file (in the same directory) that invokes the code you want to run.
- The code in **model.R** is run *after* tzar has loaded the project.yaml data into the *parameters* list.
- This means that model.R can treat the variable *parameters* as a pre-existing variable inside the same scope as the code in model.R.
- The code in model.R has no restrictions on what it can or has to do. It could be completely empty for all that tzar cares.
- Consequently, we can run a model.R that does almost nothing at all but it will still have read and written the *parameters* list.

#### How the conventions suggest a method for tzar emulation

Tzar creates the parameters list and saves it to disk *regardless of what is done in model.R*. So, in emulation mode we can take advantage of this by having model.R do nothing at all, except save the path to the directory where the parameters list was written.

We can easily do this by just having model.R write that path into a scratch file whenever it's doing emulation. The emulation code can then read the location from the scratch file and find the "parameters.R" file to load the parameters list and pass the reconstituted *parameters* to our mainline application code. This parameters list will contain all of the paths and variable names that tzar has built based on the project.yaml file so that we don't have to build them ourselves.

#### Allowing both normal tzar and tzar emulation

The one other thing that we would like is for our code to still work with as little change as possible when running normally under tzar rather than emulating tzar. This is easy to accomplish by just adding a boolean flag indicating whether we are running tzar normally or just emulating it. Each time you want to switch from emulation to normal tzar and vice versa, you just have to change the value of this flag.

The only hitch is that the flag has to be known to both model.R and the emulation code. At the moment, the simplest way to make sure that the value is the same in both bits of code is to read it from a file that both bits of code know about. This is a little bit clumsy, but it works.

## Implementation summary

Implementation of tzar emulation requires two pieces of code:

* One piece that runs inside model.R whenever tzar is invoked, and
* Another piece that wraps around our mainline application code to:
+ run tzar, and
+ get the *parameters* list back from it if running in emulation mode, and
+ run the mainline application.

In pseudocode, this looks like the following:

### In project.yaml:

Set emulatingTzar flag to TRUE if emulating, FALSE if running normal tzar.

### In wrapper R code around main application:

run tzar

if (parameters$emulating tzar)
read tzar output directory path from scratch file load parameters list from tzar output directory run mainline application

###  In model.R:

Assume called by tzar and tzar has already loaded parameters list from project.yaml

if (parameters$emulatingTzar)
do nothing except write tzar output directory path to scratch file else
run mainline application

---
---


## Dumping ground for misc notes that will probably end up removed from this vignette

##  NOTES


###  Some possibly useful tzar-related commands I had lying around in various editor buffers

java -jar tzar.jar scheduleruns http://rdv-framework.googlecode.com/svn/trunk/projects/rdvPackages/biodivprobgen/R --clustername biodivprobgen --revision head --runset biodivprobgen_1kruns_1sce_8001ranSeed_10to60ranPUs

java -jar tzar.jar execlocalruns /Users/bill/D/rdv-framework/projects/rdvPackages/biodivprobgen/R/ --runset biodivprobgen_1kruns_1sce_8001ranSeed_10to60ranPUs

java -jar tzar.jar execlocalruns https://tzar-framework.googlecode.com/svn/tags/latest/example-projects/example-R

https://tzar-framework.googlecode.com/svn/trunk/

sudo scp -i /Users/bill/.ssh/rdv.pem /Users/bill/D/rdv-framework/tzar-0.5.5-runset-wildcard.mar.07.2015.jar ubuntu@146.118.96.116:/home/ubuntu

[ubuntu@unixnpeap06 tzar_output]$ grep -i bdpg ?????_scen_rand_10_to_60_PUs_0_5_optFrac_random_r_and_p/prob_diff_results.csv > ../aggregated1kResults.csv -bash: ../aggregated1kResults.csv: Read-only file system ```

langfob/tzar documentation built on May 20, 2019, 7:56 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com