knitr::opts_chunk$set(collapse = T, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
library(dplyr)
library(readGallery)
library(readr)
library(ggplot2)

Introduction

This readGallery package allows you to read art data files with the gallery library (see http://art.fnal.gov).

System requirements

The system requirements are as follows:

$ env | egrep '(_INC|_LIB=|_DIR=|^MRB=|^PYTHON)' > ~/.Renviron

Workflow

The basic workflow you need to follow is:

  1. Load the readGallery library.
  2. Locate your data files and put the paths in a vector. If you are using XRootD, then to get the appropriate URI(s) you should look-up the file paths with ifdh ls and then run them through readGallery::xrootify.
  3. Declare use of an art data product with readGallery::useDataProduct. The argument is a string with the C++ class name including namespace if necessary. Be sure to wrap in std::vector< > if necessary.
  4. Create the art input tags you need with readGallery::artInputTag. You can determine the input tags of objects in the data file with the art binary product_sizes_dumper.
  5. Write a python class to serve as the reader. See below.
  6. Run readGallery::getGalleryData to process the files.
  7. Run readGallery::galleryReader_df on the reader object(s) to extract the collected data as an R data frame.

That all looks like a lot, but it's pretty easy. Let's try it!

Locate data files

I happen to have some data files in Fermilab dCache (note that you'll need the proper environment and proxy established).

system('ifdh ls /pnfs/GM2/scratch/users/lyon/arr_20170307/*/*.root | grep .root | grep _10k', intern=T) %>% 
  xrootify() -> myFiles
myFiles

Let's do one file for testing

myOneFile <- myFiles[1]
myOneFile

Declare use of a data product

We are going to read the GhostDetectorArtRecord objects. We need to declare use of this object.

useDataProduct('std::vector<gm2truth::GhostDetectorArtRecord>')

We would do the same for other objects too if needed.

Create input tags

There are two instances of GhostDetectorArtRecord objects in the file. We'll load both (eventually). Here are their input tags.

gh_cyl_tag <- artInputTag('artg4:GhostCylinderDetector')
gh_nwd_tag <- artInputTag('artg4:GhostNearWorldDetector')

Write the reader class

You write a reader class in Python. The reader class should satisfy the following.

You can use a base class, galleryReader.GalleryReaderBase to do some of the boilerplate things for you.

You can either write the python class in a file and import it with readGallery::createReaderClass_from_file or write it in an R string and import it with readGallery::createReaderClass_from_string. Both of these functions return the main python environment. You then need to extract your class from the environment with the $<className> operation. For example,

createReaderClass_from_file('myReaderClass.py')$MyReader   # class MyReader  is in the python file

There is one example Reader class in this package for GhostDetectorArtRecord. It uses the base class. Here is the base class for reference.

class GalleryReaderBase:
  """A base class for simple readers"""

  def __init__(self, inputTag):
    self.vals = []
    self.inputTag = inputTag
    self.getValidHandle = None   # Shouild be set in the prepare method
    self.names = None # Needs to be set in derived class; self.names = [...]

  def colnames(self):
    return self.names

  def values(self):
    return self.vals

  def prepare(self, ROOT, ev):
    self.vals = []  # Protect against re-run
    # Your code sets self.getValidHandle

  def fill(self, ROOT, ev):
    # Your code fills self.vals
    return True

We can make a skeleton class and fill in the details. Most of the time the skeleton will work fine, but you will want to make changes. Perhaps remove some columns. Add constraints. Other things. Remember that the philosophy is to load only the data you need.

readerClassSkel('gm2truth::GhostDetectorArtRecord', writeFile = 'ghostDetectorRecord.py')

We can display the GhostDetectorArtRecordReader class with the readr::read_file(...) %>% cat pipeline.

readr::read_file( 'ghostDetectorRecord.py') %>% cat

Let's load it into python. Note the use of $<class name>,

createReaderClass_from_file('ghostDetectorRecord.py')$GhostDetectorArtRecordReader -> GHReader

Make the reader objects

ghCReader <- GHReader(gh_cyl_tag)

Process the files

We are now ready to process the files! Let's just do one at first.

getGalleryData(myOneFile, ghCReader)
ghcdf <- galleryReader_df(ghCReader) %>% tbl_df
ghcdf

Let's try two readers (we'll use the same Reader class, but initialize it with a different input tag).

ghNReader <- GHReader(gh_nwd_tag)

Let's read in the data again

getGalleryData(myOneFile, c(ghCReader, ghNReader))
ghndf <- galleryReader_df(ghNReader) %>% tbl_df
ghndf

Let's try all the data! This will take longer.

getGalleryData(myFiles, c(ghCReader, ghNReader) )

And look at the data

ghcdf <- galleryReader_df(ghCReader)
ghndf <- galleryReader_df(ghNReader)

How many rows did we get per file?

ghcdf %>% group_by(fileEntry) %>% tally()
ghndf %>% group_by(fileEntry) %>% tally()

Timing information

The function readGallery:getGalleryData can return timing information in the form of a python object. For example,

times <- getGalleryData(myOneFile, ghCReader)
times$allTime

Look at time it took to process events.

et <- times$eventTimes
mean(et)
qplot(seq_along(et), et) + xlab('eventEntry') + ylab('Processing time (s)')

The slow ones seem to be in the beginning

ets <- et[100:length(et)]
qplot(seq_along(ets), ets) + xlab('eventEntry') + ylab('Processing time (s)')


lyon-fnal/readGallery documentation built on May 21, 2019, 9:16 a.m.