The Pico Analysis Framework (PAF)
This is an attempt to summarize some of the features of PAF which
are applicable to the MINOS software framework persistency design.
Overview
PAF was developed for use in the BaBar experiment to provide
a versatile interface to the data. It is a ROOT-based system
and makes extensive use of ROOT I/O and data structures.
Originally envisioned to be experiment independent,
the current implementation has some BaBar'isms built into the code, and
so while we can learn from and adapt aspects of the code for our purposes,
it can't be adopted wholesale for our use.
PAF was designed as a solution
for those who wish to perform analysis on an arbitrary computing platform
(e.g. a laptop PC running Linux) and who may not have access to the
licenses required to run the main BaBar framework, which depends on
Objectivity (a licensed product) for its object database store. PAF is
advertised to be:
- Fast and cheap (available free of license fees)
- Platform independent
It is not an officially supported framework of the BaBar experiment.
Principles of Design
There are 4 basic components of the PAF system. These are:
Data Streams
There are currently 3 data streams implemented in PAF, a subset of
the data streams supported in the main BaBar framework. These are:
- Tag - Event Identification data at "nano-DST" level. Contains a list of
prominent physics attributes supplied
by the event reconstruction. These attributes can be used to select
events of interest rapidly.
- Aod - Analysis object data at "micro-DST" level.
- Mct - Monte Carlo Truth info.
Alternative data streams implemented in the main BaBar framework, but not
in PAF, are [5]:
- Sim - Contains the GHITS generated by the Monte Carlo (separated
from the Mct because of the difference in frequency of
access).
- Raw - The Raw data.
- Rec - The full results of production reconstruction.
- Esd - Event Summary Data - Compressed and distilled results from
the Rec stream ("Mini-DST" level, between Rec and Aod).
- Usr - Contains information added during a user analysis.
Each data stream is supported by its own ROOT TTree data structure, i.e.
there is a 1:1 relationship between TTree's and data streams. The
structure of the TTree's and their contents are discussed in more
detail in the section Organization of PAF Data Streams below.
Managers
The PAF managers manage the object data streams and other aspects of the
analysis program. Some of these managers are:
- Event manager (TPicoEventManager). This manager manages
the 3 default data streams and provides access to the stream data.
The Event manager maintains a pointer to a PAFPafReader object
which actually does most of the work of managing the data streams.
The PAFPafReader class maintains a list of the different data streams
(each implemented through a class PAFStream) that it needs to manage.
The PAFPafReader class inherits from an abstract base class, and is
implemented specifically to handle
the 3 default PAF streams (Tag,Aod,Mct). The
structure of the 3 stream TTree's is "hardwired" into the PAFPafReader
class and it uses this to set up the TTree branches in each of the
PAFStream objects using PAFStream configuration methods.
Input data is provided to the Event manager in two ways:
- by setting a single input file. The user actually supplies a filename base,
and the filename for
each stream is obtained by appending a suffix appropriate for
the stream, e.g. "Tag.root", to the filename base. There are
also parameters that can be set to specify alternative locations
for the Aod streams (e.g. on a remote site).
- by setting an input Collection (or an
input file containing a Collection).
The event manager provides methods to load data into memory from
each of the streams according to sequential and random access
requests (Next() and GetEvent(UInt_t)). It also provides several
Get methods to retrieve the different aspects of the currently
loaded event. It also supports several other methods, some of which are:
- Methods by which the user can disable/enable streams.
- Methods to set and evaluate user-supplied Tag expressions.
These methods are used to select events according to specified
Tag attribute cuts (much like cutting on attributes of a PAW n-tuple).
- A set of Fill methods to fill user-supplied PAFList's with
Candidates of the appropriate type (e.g. FillNeutral will generate
neutral candidates from the current Aod stream entry and fill them
into the list).
- An AddEventToCollection method by which users can add the current
event to a user-supplied Event collection (PAFEventCollection).
Access to remote files is handled through the use of the ROOT class
TNetFile and the ROOT-provided rootd server. Files which are to be accessed
remotely are prefixed by "root:" and have a format similar to that
of standard URL's as described in the TNetFile class documentation.
PAF makes use of the static TFile::Open method to open both types of files
(local and remote) transparently.
- Directory manager (TPicoDirectoryManager). This manager is
one of 4 types of concrete Collections
supported by PAF.
If the Directory manager is used, it is supplied to the
Event manager as an input Collection.
- Persistency manager (TPicoPersistenceManager). This manager
provides services for persisting histograms, n-tuples and user-defined
objects to a user-specified ROOT file. THashList's are maintained
for each of the 3 object types (histograms, n-tuples, TObject's), and
users may Add, Get or Remove objects from these lists. A Store() method
is provided to allow users to persist these objects to file at the
end of a job.
- Object manager (TPicoObjectManager). This manager provides tools to
store and retrieve transient objects such as those that are passed between
different job modules. This is very similar to the current
version of MOM in Minos. The object manager stores pointers to the
transient objects in a THashList, facilitating rapid look-up of objects.
Like MOM, it owns its objects and deletes them and the list in its
destructor.
- List manager (TPicoListManager). This manager keeps a list of
transient candidate lists (PAFList's) for use in transferring PAFList's
between job modules. (PAFList's maintain a TClonesArray of PAFCandidate
objects.) This is just a specific case of the TPicoObjectManager, and
its implementation is very similar.
- Parameter manager (TPicoParameterManager). This manager handles
the input parameters to the job framework and modules. Input parameters
can be specified 3 ways:
- Arguments passed at the command line
- Through a "tcl" style file (command followed by one or more arguments)
- Directly through invoking SetParm methods.
These accept key/value pairs, a description (optional), and a target
job module for the parameters.
Bool, string, double, and integer values are accepted. The parameter
manager maintains 4 THashList's, one for each of the value types
supported.
The lists keep track of all parameters that have been defined and
set through the parameter manager. The system is extensible in that
users can define their own parameter key/value/(description) objects.
The parameter manager has a method to dump its parameters to a tcl-style
file that can be reused as input in another session.
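The typed key/value bookkeeping and the reusable dump file described for the Parameter manager can be sketched as follows. This is a minimal illustration using only the C++ standard library, not PAF code: the class name ParameterStore, its method names, and the "type key value" line layout are assumptions made for the example, and std::map stands in for the 4 THashList's. Descriptions and target modules are left out for brevity.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical typed parameter store: one container per supported value type.
class ParameterStore {
public:
    void SetBool(const std::string& key, bool v)     { CheckKey(key); bools_[key] = v; }
    void SetInt(const std::string& key, int v)       { CheckKey(key); ints_[key] = v; }
    void SetDouble(const std::string& key, double v) { CheckKey(key); doubles_[key] = v; }
    void SetString(const std::string& key, const std::string& v) {
        CheckKey(key);
        strings_[key] = v;
    }

    // Getters throw std::out_of_range for keys that were never set.
    bool GetBool(const std::string& key) const          { return bools_.at(key); }
    int GetInt(const std::string& key) const            { return ints_.at(key); }
    double GetDouble(const std::string& key) const      { return doubles_.at(key); }
    std::string GetString(const std::string& key) const { return strings_.at(key); }

    // Write every parameter as one "<type> <key> <value>" command per line.
    std::string Dump() const {
        std::ostringstream out;
        for (const auto& p : bools_)   out << "bool " << p.first << ' ' << (p.second ? 1 : 0) << '\n';
        for (const auto& p : ints_)    out << "int " << p.first << ' ' << p.second << '\n';
        for (const auto& p : doubles_) out << "double " << p.first << ' ' << p.second << '\n';
        for (const auto& p : strings_) out << "string " << p.first << ' ' << p.second << '\n';
        return out.str();
    }

    // Read back text produced by Dump(); lines with an unknown type are skipped.
    // String values are limited to a single whitespace-free token.
    void Load(const std::string& text) {
        std::istringstream in(text);
        std::string line;
        while (std::getline(in, line)) {
            std::istringstream fields(line);
            std::string type, key;
            if (!(fields >> type >> key)) continue;
            if (type == "bool")        { int v = 0; fields >> v; SetBool(key, v != 0); }
            else if (type == "int")    { int v = 0; fields >> v; SetInt(key, v); }
            else if (type == "double") { double v = 0; fields >> v; SetDouble(key, v); }
            else if (type == "string") { std::string v; fields >> v; SetString(key, v); }
        }
    }

private:
    // Keys must be non-empty single tokens because the dump is whitespace-delimited.
    static void CheckKey(const std::string& key) {
        assert(!key.empty() && key.find_first_of(" \t\n") == std::string::npos);
    }
    std::map<std::string, bool> bools_;
    std::map<std::string, int> ints_;
    std::map<std::string, double> doubles_;
    std::map<std::string, std::string> strings_;
};
```

Loading the text returned by Dump() into a second ParameterStore reproduces the same parameters, which is the property that lets a dumped file serve as input to another session.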
The managers in turn are managed by a singleton Application manager
class TPico which maintains a set of pointers to each of the possible
manager types as well as a pointer to a PAFAnalysis object. The PAFAnalysis
class maintains a list of job modules (PAFModules) and controls the execution
of the job, performing a function similar to our Job Control package.
All manager and analysis pointers are retrieved by first using a static
TPico::Instance() method to retrieve the TPico singleton, and then invoking
the appropriate Get method.
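The access pattern just described (fetch the singleton, then ask it for a manager) and the owning look-up table of the Object manager can be sketched together. This is an illustration under stated assumptions, not the PAF implementation: the names App, ObjectManager, Object, Put, Get and GetObjectManager are hypothetical, std::unordered_map stands in for THashList, and the singleton is returned by reference here rather than through whatever return type TPico::Instance() actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical base type for transient objects (ROOT code would use TObject).
struct Object {
    virtual ~Object() = default;
};

// Sketch of an owning object manager: name -> object, hashed for fast look-up.
// Stored objects are destroyed together with the manager.
class ObjectManager {
public:
    // Takes ownership; an existing entry with the same name is replaced.
    void Put(const std::string& name, std::unique_ptr<Object> obj) {
        assert(obj != nullptr);
        objects_[name] = std::move(obj);
    }
    // Returns a non-owning pointer, or nullptr if the name is unknown.
    Object* Get(const std::string& name) const {
        auto it = objects_.find(name);
        return it == objects_.end() ? nullptr : it->second.get();
    }
    std::size_t Size() const { return objects_.size(); }

private:
    std::unordered_map<std::string, std::unique_ptr<Object>> objects_;
};

// Sketch of a singleton application manager that holds the other managers.
class App {
public:
    static App& Instance() {
        static App instance;  // constructed on first use
        return instance;
    }
    ObjectManager& GetObjectManager() { return objectManager_; }

    App(const App&) = delete;
    App& operator=(const App&) = delete;

private:
    App() = default;
    ObjectManager objectManager_;
};
```

With this layout, one job module can call App::Instance().GetObjectManager().Put("hits", ...) and a later module can retrieve the same object by name without either module holding a reference to the other.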
Selectors
This section is incomplete.
Selectors are used to filter the data. There are 3 basic types:
- Event Selectors - These selectors are used to select events according to criteria stored in the Tag and/or Aod data streams.
- Particle Selectors - These selectors are used to select subsets of
particles.
- Vertex Selectors - These selectors are used to filter tracks that might have the same vertex.
Collections
This section is incomplete.
Collections can be used to organize events into sets. There are 4
concrete implementations in PAF, all of which inherit from an abstract
collection class and provide methods for incrementing over the members
of the collection. The 4 collection classes are:
- TPicoDirectoryManager
- PAFEventCollection
- PAFIndexCollection
- TRunCollection
Each collection contains some method of associating a member event to the
file containing its data. The abstract base class that all of the collections
inherit from ensures that all collections support a common set of interfaces
for accessing the events in the collection.
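The idea of a common abstract interface over different kinds of collections can be sketched as below. The code is illustrative only: EventRef, EventCollection, ListCollection, WholeFileCollection and CountEvents are hypothetical names and do not reproduce the interfaces of the 4 PAF collection classes. Each member is represented as a (file, entry) pair, which is one possible way of associating an event with the file that holds its data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One collection member: the file holding the event and its entry in that file.
struct EventRef {
    std::string file;
    unsigned int entry;
};

// Hypothetical abstract interface shared by all collections.
class EventCollection {
public:
    virtual ~EventCollection() = default;
    virtual void Rewind() = 0;
    // Advance to the next member; returns false once the collection is exhausted.
    virtual bool Next(EventRef& out) = 0;
};

// Concrete collection backed by an explicit list of (file, entry) pairs.
class ListCollection : public EventCollection {
public:
    void Add(const std::string& file, unsigned int entry) {
        assert(!file.empty());
        refs_.push_back(EventRef{file, entry});
    }
    void Rewind() override { pos_ = 0; }
    bool Next(EventRef& out) override {
        if (pos_ >= refs_.size()) return false;
        out = refs_[pos_++];
        return true;
    }

private:
    std::vector<EventRef> refs_;
    std::size_t pos_ = 0;
};

// Concrete collection covering every entry of a single file.
class WholeFileCollection : public EventCollection {
public:
    WholeFileCollection(const std::string& file, unsigned int nEntries)
        : file_(file), nEntries_(nEntries) {}
    void Rewind() override { next_ = 0; }
    bool Next(EventRef& out) override {
        if (next_ >= nEntries_) return false;
        out = EventRef{file_, next_++};
        return true;
    }

private:
    std::string file_;
    unsigned int nEntries_;
    unsigned int next_ = 0;
};

// Client code depends only on the abstract interface.
inline std::size_t CountEvents(EventCollection& c) {
    std::size_t n = 0;
    EventRef ref{"", 0};
    for (c.Rewind(); c.Next(ref);) ++n;
    return n;
}
```

CountEvents works unchanged for either concrete class, which is the benefit the common base class is meant to provide.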
Organization of PAF Data Streams
PAF supports access to two formats for the organization of data in ROOT files:
the PAF and the Kanga model.
This section discusses the PAF organizational scheme.
PAF streams (Tag, Aod, Mct) are each supported by their own ROOT TTree
data structure, i.e. there is a 1:1 relationship between TTree's and
data streams. Each TTree is stored in a separate file with a suffix
indicating its stream contents, e.g. "xxxTag.root", "xxxAod.root", etc.
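The naming convention for stream files and trees can be written down as a small helper. This is a sketch, not PAF code: the enum Stream and the functions StreamName, StreamFileName and StreamTreeName are hypothetical, and the "Mct.root" suffix is assumed by analogy with the "Tag.root" and "Aod.root" examples in the text.

```cpp
#include <cassert>
#include <string>

// The 3 streams implemented in PAF.
enum class Stream { Tag, Aod, Mct };

// Short stream name as it appears in file suffixes and tree names.
inline std::string StreamName(Stream s) {
    switch (s) {
        case Stream::Tag: return "Tag";
        case Stream::Aod: return "Aod";
        case Stream::Mct: return "Mct";
    }
    assert(false && "unknown stream");
    return "";
}

// File name for a stream: filename base plus stream suffix, e.g. "xxx" -> "xxxTag.root".
inline std::string StreamFileName(const std::string& base, Stream s) {
    return base + StreamName(s) + ".root";
}

// Tree name for a stream, e.g. Tag -> "TagTree".
inline std::string StreamTreeName(Stream s) {
    return StreamName(s) + "Tree";
}
```

For a filename base "xxx", StreamFileName yields "xxxTag.root" and "xxxAod.root" for the Tag and Aod streams, matching the examples above.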
The three TTree's are organized as follows:
TagTree
The TagTree stores the data members of the PAFEventBase class.
The Tree is organized as follows:
- A single tree called "TagTree"
- One branch called "TagList" is the mother branch for the PAFEventBase object.
- The TagList Branch is split to hold the individual PAFEventBase
data members with subbranch names and data members as follows:
- "_runno" // run number (UInt_t)
- "_eventno" // event number (Int_t)
- "_majorID" // number of seconds since 01/01/1901 (UInt_t)
- "_minorID" // number of microseconds in the second (UInt_t)
- "_nChargedTracks" // number of prongs (UShort_t)
- "_nNeutralTracks" // number of neutrals (UShort_t)
- "_tag" // the event TAG object (PAFTagBase)
The "_tag" branch stores objects of type PAFTagBase, and is further split into
subbranches to store the individual data members of the PAFTagBase class.
PAFTagBase data members are numerous and represent prominent physics
attributes determined during reconstruction that may be useful during the
analysis stage. Examples of Tag data members are
_eTotal and _pTotalMass, which reside
on branches named "_tag._eTotal" and "_tag._pTotalMass" respectively.
There are currently 83 such Tag data members.
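The TagTree layout can be pictured as a plain struct whose members correspond one-to-one to the subbranches listed above. The sketch below is not the PAFEventBase class: EventBaseSketch, TagSketch, TagSubbranch and TimeInSeconds are hypothetical names, fixed-width standard integers stand in for the ROOT types UInt_t, Int_t and UShort_t, and the float type of the two Tag members is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for PAFTagBase: only the two members named in the text are shown.
struct TagSketch {
    float _eTotal = 0;      // member type is an assumption
    float _pTotalMass = 0;  // member type is an assumption
};

// Stand-in for PAFEventBase: one member per TagList subbranch.
struct EventBaseSketch {
    std::uint32_t _runno = 0;           // run number (UInt_t)
    std::int32_t  _eventno = 0;         // event number (Int_t)
    std::uint32_t _majorID = 0;         // seconds part of the event time (UInt_t)
    std::uint32_t _minorID = 0;         // microseconds within that second (UInt_t)
    std::uint16_t _nChargedTracks = 0;  // number of prongs (UShort_t)
    std::uint16_t _nNeutralTracks = 0;  // number of neutrals (UShort_t)
    TagSketch _tag;                     // the event Tag object
};

// Name of the subbranch holding a member of the "_tag" object,
// e.g. "_eTotal" -> "_tag._eTotal".
inline std::string TagSubbranch(const std::string& member) {
    assert(!member.empty());
    return "_tag." + member;
}

// Combine the two ID words into a single time in seconds.
inline double TimeInSeconds(const EventBaseSketch& e) {
    return static_cast<double>(e._majorID) + static_cast<double>(e._minorID) / 1e6;
}
```

Because the branch is split, each of these members can be read on its own, which is what makes event selection on a handful of Tag attributes fast.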
Selection of events by "Tag" variables is done through an application
of the TTreeFormula class. It allows the user to express a cut
on events as an expression, e.g.
"Tag_i_nTracks > 3 && Tag_f_eTotal > 5.0"
to select events with more than 3 tracks and total energy greater than
5 GeV. The Tag cut expression can be supplied dynamically (e.g. on the
command line) so that no recompilation of code is necessary.
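To illustrate why a cut supplied as a string avoids recompilation, the sketch below evaluates a very small subset of such expressions at run time. It is a stand-in written for this note and does not use or reproduce TTreeFormula: the function PassesCut and the type TagValues are hypothetical, tokens must be separated by whitespace, and only chains of "name op number" terms joined by && are understood.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Tag attributes of one event, keyed by name (a stand-in for the Tag branches).
using TagValues = std::map<std::string, double>;

// Evaluate a cut of the form "name op number && name op number ...",
// where op is one of > < >= <= == !=. Throws std::invalid_argument on
// unknown names, unknown operators, or malformed input.
inline bool PassesCut(const std::string& cut, const TagValues& tags) {
    std::istringstream in(cut);
    std::string name, op, joiner;
    double value = 0;
    while (in >> name >> op >> value) {
        auto it = tags.find(name);
        if (it == tags.end()) throw std::invalid_argument("unknown tag: " + name);
        const double x = it->second;
        bool ok = false;
        if (op == ">")       ok = x > value;
        else if (op == "<")  ok = x < value;
        else if (op == ">=") ok = x >= value;
        else if (op == "<=") ok = x <= value;
        else if (op == "==") ok = x == value;
        else if (op == "!=") ok = x != value;
        else throw std::invalid_argument("unknown operator: " + op);
        if (!ok) return false;              // one failed term fails the && chain
        if (!(in >> joiner)) return true;   // no further terms: the event passes
        if (joiner != "&&") throw std::invalid_argument("expected &&, got: " + joiner);
    }
    throw std::invalid_argument("malformed cut: " + cut);
}
```

The cut string can come from the command line or a parameter file, so changing the selection only requires changing the text passed to PassesCut.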
AodTree
The AodTree stores Analysis Object Data (Aod) objects.
The Aod objects can be of type PAFChargedAodBase
or PAFNeutralAodBase, which are the
classes for the storage of charged and neutral candidates respectively. The
TTree is organized as follows:
- A single TTree "AodTree" consisting of:
- Two TClonesArray branches: "AodListC" holds PAFChargedAodBase objects, and
"AodListN" holds PAFNeutralAodBase objects.
- Both TClonesArray branches are configured in "split" mode to create
subbranches to hold each of the PAFChargedAodBase and PAFNeutralAodBase
data members.
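The effect of "split" mode on a list of candidates can be pictured as turning an array of objects into one column per data member. The sketch below shows that idea with standard containers only. It is not ROOT code, and the members of ChargedSketch are hypothetical placeholders rather than the actual PAFChargedAodBase data members.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical charged candidate; the real class has different members.
struct ChargedSketch {
    int charge;
    float px, py, pz;
};

// One "subbranch" per data member: each column holds that member for every
// candidate of the event, in candidate order.
struct ChargedColumns {
    std::vector<int> charge;
    std::vector<float> px, py, pz;
    std::size_t Size() const { return charge.size(); }
};

// Split a list of candidates into per-member columns.
inline ChargedColumns Split(const std::vector<ChargedSketch>& list) {
    ChargedColumns cols;
    for (const ChargedSketch& c : list) {
        cols.charge.push_back(c.charge);
        cols.px.push_back(c.px);
        cols.py.push_back(c.py);
        cols.pz.push_back(c.pz);
    }
    return cols;
}

// Rebuild candidate i from the columns.
inline ChargedSketch Join(const ChargedColumns& cols, std::size_t i) {
    assert(i < cols.Size());
    return ChargedSketch{cols.charge[i], cols.px[i], cols.py[i], cols.pz[i]};
}
```

A reader interested in a single quantity, say px, only needs to touch that one column, while Join shows that the full candidate can still be reassembled when required.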
MctTree
The MctTree stores Monte Carlo Truth objects of type PAFMcBase. The
Tree is organized as follows:
- A single TTree "MctTree" consisting of:
- One TClonesArray branch: "MctList" holding PAFMcBase objects.
References:
The first 2 references are from Marcel Kunze (an author of the
PAF package) to whom I promised acknowledgement for any use of this code
in the MINOS framework.
Reference 3 is from George and Reference 4 from the ROOT Example Applications site.
[1] PAF Class Documentation.
[2] A recent PAF tutorial given at SLAC.
[3] Pico Analysis Framework Documentation.
[4] More Pico Analysis Framework Documentation.
[5] S. Patton's BaBar Database Documentation.
If you have any comments about this page please send them to
Sue Kasahara
Last Updated: August 24, 2000.