knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
library(dplyr)
library(readGallery)
library(readr)
library(ggplot2)
The readGallery package allows you to read art data files with the gallery library (see http://art.fnal.gov).
The system requirements are as follows:
$PATH and $LD_LIBRARY_PATH are picked up, but not the other necessary environment variables. You need to write an ~/.Renviron file to set those variables within R. You can write this file from the bash command line (do this before starting R or RStudio) with:
$ env | egrep '(_INC|_LIB=|_DIR=|^MRB=|^PYTHON)' > ~/.Renviron
The reticulate R package. See https://github.com/rstudio/reticulate for information and installation instructions.

The basic workflow you need to follow is:
1. Load the readGallery library.
2. If you access files via XRootD, look up the file paths with ifdh ls and run them through readGallery::xrootify to get the appropriate URI(s).
3. Declare each data product you will read with readGallery::useDataProduct. The argument is a string with the C++ class name, including the namespace if necessary. Be sure to wrap it in std::vector< > if necessary.
4. Create input tags with readGallery::artInputTag. You can determine the input tags of objects in the data file with the art binary product_sizes_dumper.
5. Call readGallery::getGalleryData to process the files.
6. Call readGallery::galleryReader_df on the reader object(s) to extract the collected data as an R data frame.

That all looks like a lot, but it's pretty easy. Let's try it!
I happen to have some data files in Fermilab dCache (note that you'll need the proper environment and proxy established).
system('ifdh ls /pnfs/GM2/scratch/users/lyon/arr_20170307/*/*.root | grep .root | grep _10k', intern = TRUE) %>% xrootify() -> myFiles
myFiles
Let's use one file for testing.
myOneFile <- myFiles[1]
myOneFile
We are going to read the GhostDetectorArtRecord objects. We need to declare use of this object.
useDataProduct('std::vector<gm2truth::GhostDetectorArtRecord>')
We would do the same for other objects too if needed.
There are two instances of GhostDetectorArtRecord objects in the file. We'll load both (eventually). Here are their input tags.
gh_cyl_tag <- artInputTag('artg4:GhostCylinderDetector')
gh_nwd_tag <- artInputTag('artg4:GhostNearWorldDetector')
You write a reader class in Python. The reader class should satisfy the following:
It should have prepare and fill methods. The prepare method gets the reader ready before file processing, perhaps by getting the get_valid_handle function from the gallery.Event. The fill method does everything necessary to fill the value rows with data from the gallery.Event.
It should have values (the accumulated data from the files) and colnames (names of the columns) methods to return those data. If you do not provide those methods, then you will need to call readGallery::galleryReader_df with the values and colnames arguments and provide methods to extract that data.
You can use a base class, galleryReader.GalleryReaderBase, to do some of the boilerplate for you.
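The method requirements above can be checked mechanically. The following is a minimal sketch, not part of readGallery: the helper `missing_methods` and the class `BareReader` are hypothetical names introduced only for illustration. It reports which of the required (prepare, fill) and optional (values, colnames) methods a reader object lacks.

```python
REQUIRED = ("prepare", "fill")      # every reader needs these
OPTIONAL = ("values", "colnames")   # needed unless you pass extractors yourself


def missing_methods(reader, names):
    """Return the names in `names` that are not callable attributes of `reader`."""
    return [n for n in names if not callable(getattr(reader, n, None))]


class BareReader:
    """Hypothetical reader that implements only the two required methods."""

    def __init__(self):
        self.rows = []

    def prepare(self, ROOT, ev):
        self.rows = []  # start from a clean slate before processing

    def fill(self, ROOT, ev):
        return True


bare = BareReader()
missing_required = missing_methods(bare, REQUIRED)
missing_optional = missing_methods(bare, OPTIONAL)
print("missing required:", missing_required)
print("missing optional:", missing_optional)
```

A reader like `BareReader` that lacks values and colnames is the case where you would have to supply the values and colnames arguments to readGallery::galleryReader_df yourself.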
You can either write the Python class in a file and import it with readGallery::createReaderClass_from_file or write it in an R string and import it with readGallery::createReaderClass_from_string. Both of these functions return the main Python environment. You then need to extract your class from the environment with the $<className> operation. For example,
createReaderClass_from_file('myReaderClass.py')$MyReader # class MyReader is in the python file
There is one example Reader class in this package for GhostDetectorArtRecord. It uses the base class. Here is the base class for reference.
class GalleryReaderBase:
    """A base class for simple readers"""

    def __init__(self, inputTag):
        self.vals = []
        self.inputTag = inputTag
        self.getValidHandle = None  # Should be set in the prepare method
        self.names = None  # Needs to be set in derived class; self.names = [...]

    def colnames(self):
        return self.names

    def values(self):
        return self.vals

    def prepare(self, ROOT, ev):
        self.vals = []  # Protect against re-run
        # Your code sets self.getValidHandle

    def fill(self, ROOT, ev):
        # Your code fills self.vals
        return True
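To make the division of labor between prepare and fill concrete, here is a minimal sketch of a derived reader driven by a stand-in event object. Everything apart from GalleryReaderBase (repeated so the block runs on its own) is an assumption made for illustration: `FakeEvent`, its `at_end`, `next`, and `products` methods, `HitReader`, and `run` are hypothetical and are not the gallery or readGallery API. A real reader would obtain its handle function from the gallery.Event in prepare.

```python
class GalleryReaderBase:
    """Base class from the listing above, repeated so this sketch is self-contained."""

    def __init__(self, inputTag):
        self.vals = []
        self.inputTag = inputTag
        self.getValidHandle = None
        self.names = None

    def colnames(self):
        return self.names

    def values(self):
        return self.vals

    def prepare(self, ROOT, ev):
        self.vals = []  # Protect against re-run

    def fill(self, ROOT, ev):
        return True


class FakeEvent:
    """Hypothetical stand-in for gallery.Event (NOT the real gallery API)."""

    def __init__(self, entries):
        self.entries = entries  # one dict per event: input tag -> list of (x, y)
        self.index = 0

    def at_end(self):
        return self.index >= len(self.entries)

    def next(self):
        self.index += 1

    def products(self, tag):
        return self.entries[self.index].get(tag, [])


class HitReader(GalleryReaderBase):
    """Hypothetical reader: one row per (x, y) pair found under its input tag."""

    def __init__(self, inputTag):
        GalleryReaderBase.__init__(self, inputTag)
        self.names = ["eventEntry", "x", "y"]

    def prepare(self, ROOT, ev):
        GalleryReaderBase.prepare(self, ROOT, ev)  # clears self.vals
        self.getValidHandle = ev.products  # stand-in for the real handle getter

    def fill(self, ROOT, ev):
        for x, y in self.getValidHandle(self.inputTag):
            self.vals.append([ev.index, x, y])
        return True


def run(ev, reader):
    """Hypothetical driver loop: prepare once, then fill once per event."""
    reader.prepare(None, ev)  # ROOT is unused by the stand-in, so pass None
    while not ev.at_end():
        reader.fill(None, ev)
        ev.next()


ev = FakeEvent([
    {"tagA": [(1.0, 2.0)]},
    {"tagA": [(3.0, 4.0), (5.0, 6.0)], "tagB": [(9.0, 9.0)]},
])
reader = HitReader("tagA")
run(ev, reader)
print(reader.colnames())
print(reader.values())
```

Because prepare resets self.vals, running the same reader a second time replaces the accumulated rows instead of appending to them.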
We can make a skeleton class and fill in the details. Most of the time the skeleton will work as generated, but you may want to make changes, such as removing columns or adding constraints. Remember that the philosophy is to load only the data you need.
readerClassSkel('gm2truth::GhostDetectorArtRecord', writeFile = 'ghostDetectorRecord.py')
We can display the GhostDetectorArtRecordReader class with the readr::read_file(...) %>% cat pipeline.
readr::read_file('ghostDetectorRecord.py') %>% cat
Let's load it into Python. Note the use of $<class name>:
createReaderClass_from_file('ghostDetectorRecord.py')$GhostDetectorArtRecordReader -> GHReader
Make the reader object.
ghCReader <- GHReader(gh_cyl_tag)
We are now ready to process the files! Let's just do one at first.
getGalleryData(myOneFile, ghCReader)
ghcdf <- galleryReader_df(ghCReader) %>% tbl_df
ghcdf
Let's try two readers (we'll use the same Reader class, but initialize it with a different input tag).
ghNReader <- GHReader(gh_nwd_tag)
Let's read in the data again.
getGalleryData(myOneFile, c(ghCReader, ghNReader))
ghndf <- galleryReader_df(ghNReader) %>% tbl_df
ghndf
Let's try all the data! This will take longer.
getGalleryData(myFiles, c(ghCReader, ghNReader))
And look at the data:
ghcdf <- galleryReader_df(ghCReader)
ghndf <- galleryReader_df(ghNReader)
How many rows did we get per file?
ghcdf %>% group_by(fileEntry) %>% tally()
ghndf %>% group_by(fileEntry) %>% tally()
The function readGallery::getGalleryData can return timing information in the form of a Python object. For example,
times <- getGalleryData(myOneFile, ghCReader)
times$allTime
Look at the time it took to process events.
et <- times$eventTimes
mean(et)
qplot(seq_along(et), et) + xlab('eventEntry') + ylab('Processing time (s)')
The slow events seem to be at the beginning, so drop the first 99 and plot the rest.
ets <- et[100:length(et)]
qplot(seq_along(ets), ets) + xlab('eventEntry') + ylab('Processing time (s)')
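To put a number on the start-up effect, one can compare the mean event time with and without the early events. The sketch below uses Python, since the timing information originates as a Python object. The times are synthetic values invented purely for illustration, not measurements, and the cut at 99 events simply mirrors the R slice et[100:length(et)].

```python
from statistics import mean

# Synthetic per-event processing times in seconds (made up, NOT measured):
# 99 slow start-up events followed by 901 fast ones.
event_times = [0.050] * 99 + [0.002] * 901

# R's et[100:length(et)] is 1-indexed and drops the first 99 entries,
# which corresponds to the 0-indexed Python slice event_times[99:].
n_skip = 99
steady = event_times[n_skip:]

overall_mean = mean(event_times)
steady_mean = mean(steady)

print("events kept:", len(steady))
print("steady-state mean is smaller:", steady_mean < overall_mean)
```

With real data, replace `event_times` with the recorded event times and adjust `n_skip` after inspecting the plot.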