MINOS Persistency
The purpose of this site is to document work on the development of a
persistency package for handling the storage and retrieval of MINOS data.
There is also a Package Rationale for this system (still being developed),
and a Persistency Class development site.
A new web site discusses issues related to
the management of the input of records from multiple data streams.
Introduction
Persistence means making objects outlive the application that created
them, so that a later application can reuse them (for example, when an
object is written to a file and later read back).
The problem of persisting objects to file and retrieving objects
back into memory is not trivial.
- Objects may have complex structure (hierarchy of inheritance, and data
members that are objects or pointers to objects).
- C++ has weak built-in support for object persistency.
The solution to this problem is to supplement C++ with a framework that
provides I/O facilities (in our case, ROOT).
ROOT provides the following
persistency tools:
- Automatically generated Streamer methods for user-defined classes.
The Streamer method is used to convert object data to a stream of
bytes (data buffer) to be stored on file (and vice versa).
- File management (TFile) to organize storage of object
data buffers in a file with Keys to facilitate random and
sequential access of objects according to a user provided Key Name.
- Data structures (TTree) to coordinate sets of objects. A TTree provides
synchronized I/O between the objects in a set, and also partial I/O
of subsets of the objects in any given TTree.
- Remote access to files (TNetFile, along with a ROOT-supplied
server daemon) to support a "Distributed Database" environment,
in which data may be distributed over multiple files (some remote).
- Support for schema evolution. ROOT currently implements
class versioning (a class can be defined with a
version number, which is written with the object). However, there is
no mechanism in place for automatically generating a Streamer method
that handles multiple versions of a class; the user must
provide a customized Streamer for this purpose. Two relatively new
features of ROOT are:
- Methods to set and check the number of bytes written for a given object
against the number expected.
- Tools for writing and reading "self-describing" objects, whose
format is written out with the object data. (The status of
this development, and its use in supporting automated schema evolution in ROOT,
needs to be understood.)
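To make the Streamer and class-versioning ideas in the list above concrete, here is a minimal sketch of a hand-written streamer that writes a version number ahead of the object data and branches on it when reading. It uses only the C++ standard library and is not the ROOT API; the names Buffer, Hit, StreamOut, StreamIn, WriteHitV1 and the field layout are hypothetical, chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical byte buffer standing in for a framework's I/O buffer.
class Buffer {
public:
    // Append the raw bytes of a trivially copyable value.
    template <typename T>
    void Write(const T& value) {
        const auto* p = reinterpret_cast<const unsigned char*>(&value);
        bytes_.insert(bytes_.end(), p, p + sizeof(T));
    }
    // Read the next value of type T and advance the read position.
    template <typename T>
    T Read() {
        if (pos_ + sizeof(T) > bytes_.size())
            throw std::runtime_error("Buffer: read past end");
        T value;
        std::memcpy(&value, bytes_.data() + pos_, sizeof(T));
        pos_ += sizeof(T);
        return value;
    }
    std::size_t Size() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;
    std::size_t pos_ = 0;
};

// Hypothetical event-data class. Version 1 stored only plane and charge;
// version 2 added time.
struct Hit {
    static constexpr std::uint16_t kVersion = 2;
    static constexpr std::size_t kExpectedBytes =
        sizeof(std::uint16_t) + sizeof(std::int32_t) + sizeof(float) + sizeof(double);

    std::int32_t plane = 0;
    float charge = 0.0f;
    double time = 0.0;  // added in version 2

    // Write the current version number, then the data members.
    void StreamOut(Buffer& buf) const {
        const std::size_t start = buf.Size();
        buf.Write(std::uint16_t{kVersion});
        buf.Write(plane);
        buf.Write(charge);
        buf.Write(time);
        // Check the number of bytes written against the number expected.
        assert(buf.Size() - start == kExpectedBytes);
    }

    // Read the version number first, then decode according to that version.
    void StreamIn(Buffer& buf) {
        const auto version = buf.Read<std::uint16_t>();
        if (version < 1 || version > kVersion)
            throw std::runtime_error("Hit: unknown class version");
        plane = buf.Read<std::int32_t>();
        charge = buf.Read<float>();
        time = (version >= 2) ? buf.Read<double>() : 0.0;  // default for old data
    }
};

// Writes a Hit in the old version-1 layout, to mimic data from an old file.
inline void WriteHitV1(Buffer& buf, std::int32_t plane, float charge) {
    buf.Write(std::uint16_t{1});
    buf.Write(plane);
    buf.Write(charge);
}
```

In this sketch an object written with WriteHitV1 can still be read by the current StreamIn, which fills the newer member with a default value. This is the kind of logic a customized streamer has to carry by hand when it is not generated automatically.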
The tools provided by ROOT will be used in the design of a MINOS persistency package. (Slides from a recent talk I gave at our local software meeting on object persistency in a ROOT environment can be found here.)
Objectives of the Persistency Package
The purpose of this package is to provide a set of tools for
managing the I/O of event data between user processes and persistent
store. It must
also provide a scheme for organizing the data in files, and for organizing
the distribution of files, so as to optimize the efficiency and flexibility
of access to the event data.
Within these two subcategories, management and organization, the
objectives can be itemized as follows:
- Management Objectives:
- Provide tools to manage the I/O of data streams. A data
stream in abstract terms is a collection of related objects. Its
implementation will likely be managed through the use of the ROOT TTree
data structure, probably in a 1:1 relationship (1 TTree for each
data stream). Tools to manage the I/O of data streams
should support:
- Random read and sequential read/write to stream entries. This
functionality
is supported by the ROOT TTree data structure.
- Partial I/O of entries in any one stream. This functionality is
also supported by the ROOT TTree data structure.
- Synchronized I/O of event entries in multiple streams.
This synchronization is facilitated by a stream manager which maintains
a list of all streams, and iterates over the list when a new
entry is requested. (This package is responsible for the synchronization
of event entry data streams only. Synchronizing to detector conditions
records is handled separately.)
Since multiple streams may be spread out over more than one file, we may want
to "mark" the stream entries belonging to the same analysis chain
with a unique id. This unique id could later be used as
a consistency check by the I/O tools to check the validity
of combining the user-specified input data streams.
(This idea comes from the BaBar Kanga(Roo) framework.)
- A method by which users can enable/disable streams. This is
facilitated by a bool variable in each stream which users can toggle.
- Provide tools by which users can filter event data. This
will take advantage of the partial I/O mechanism to provide
efficient "skimming" of events according to selected variables.
An interface will be provided to facilitate command line
filtering of events. ROOT provides a TTreeFormula class, which
allows a user-defined expression to be used to select
and cut on attributes of the ROOT TTree (much like cutting on
attributes of a PAW n-tuple). This idea comes from the BaBar
PAF framework, which can also serve as a model of how to implement it.
- Provide support for a set of default data streams. An example
of such a set of default data streams,
based on the BaBar model, is discussed under
Organization of Data Streams.
- Provide methods by which users can add their own objects to an existing
data stream (under certain restrictions), or create their own
data stream. In particular, if a data stream is maintained
for "tag" attributes (used for rapid selection of events), users
should be able to add their own tags to the data stream (with the
added tags stored in a user-defined output file).
- Provide support for schema evolution.
The persistency package
should provide support for multiple versions of classes. This
requires investigating the current capabilities of ROOT to provide
automated schema evolution support, and the status of its self-describing
objects as a tool in this process. BaBar code also offers an
example of how to supplement ROOT to fully support schema evolution.
- Provide tools that give users flexibility in the way they select
input data to process.
Users should be able to select input data by specifying:
- a filename, or a list of filenames.
- a run/event, or a range of runs/events. A database would be used
to resolve a run to a filename (a production version would be
provided, and users could also maintain their own database).
- a time period.
- a collection. Users can select a set of events and write them
out to a "collection", which can be read back in by a later job.
- Organizational Objectives:
- Defining the organizational breakdown of the production data streams.
An example of one such organizational scheme, based on the BaBar
framework, is discussed under Organization of Data Streams.
- Defining the organization of data in the ROOT-provided data structures
(TTrees). An obvious way of structuring the relationship of TTrees
to data streams is 1:1, that is, one TTree per stream. (This is the
model used by PAF.) The branches of each TTree should be
organized to optimize their use in a particular data stream application.
In general, frequently accessed items should be stored on branches
separate from less frequently accessed items, since the TTree data
structure supports partial I/O of selected branches.
- Defining the organization and distribution of files. The data may
be distributed over local and remote disks and tapes. The data should
be distributed in such a way as to facilitate rapid local access to
the most frequently used pieces of data.
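The stream manager described under the management objectives (a list of streams, a per-stream enable flag, synchronized reading, and a unique analysis-chain id used as a consistency check) can be sketched as follows. This is an illustrative sketch in plain C++ rather than a design decision: in the real package a stream would likely wrap a ROOT TTree, and the names Entry, Stream, StreamManager, the stream names used in the usage note, and the assumption that entry i of every stream belongs to the same event are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: one entry of one data stream, labeled by (run, event).
struct Entry {
    int run;
    int event;
    std::string payload;
};

// Hypothetical data stream: a named, ordered collection of entries.
class Stream {
public:
    Stream(std::string name, unsigned chainId, std::vector<Entry> entries)
        : name_(std::move(name)), chainId_(chainId), entries_(std::move(entries)) {}

    const std::string& Name() const { return name_; }
    unsigned ChainId() const { return chainId_; }
    bool Enabled() const { return enabled_; }
    void SetEnabled(bool on) { enabled_ = on; }  // the user-toggled bool
    std::size_t Size() const { return entries_.size(); }
    const Entry& At(std::size_t i) const { return entries_.at(i); }

private:
    std::string name_;
    unsigned chainId_;  // id of the analysis chain that produced this stream
    bool enabled_ = true;
    std::vector<Entry> entries_;
};

// Hypothetical stream manager: keeps a list of streams and, for each requested
// entry, iterates over the list and collects that entry from every enabled stream.
class StreamManager {
public:
    // Consistency check: refuse a stream whose analysis-chain id differs
    // from that of the streams already registered.
    bool AddStream(const Stream& s) {
        if (!streams_.empty() && streams_.front().ChainId() != s.ChainId())
            return false;
        streams_.push_back(s);
        return true;
    }

    // Look up a stream by name; returns nullptr if it is not registered.
    Stream* Find(const std::string& name) {
        for (auto& s : streams_)
            if (s.Name() == name) return &s;
        return nullptr;
    }

    // Return entry i of every enabled stream, keyed by stream name.
    std::map<std::string, Entry> GetEntry(std::size_t i) const {
        std::map<std::string, Entry> out;
        for (const auto& s : streams_) {
            if (!s.Enabled() || i >= s.Size()) continue;
            const Entry& e = s.At(i);
            if (!out.empty()) {
                // Synchronization check: all streams must agree on (run, event).
                const Entry& first = out.begin()->second;
                assert(first.run == e.run && first.event == e.event);
            }
            out.emplace(s.Name(), e);
        }
        return out;
    }

private:
    std::vector<Stream> streams_;
};
```

As a usage example, two streams named "Cand" and "Tag" that carry the same chain id can both be added, and GetEntry(i) then returns one entry from each. Calling Find("Cand")->SetEnabled(false) removes that stream from subsequent reads without unregistering it, and a stream carrying a different chain id is rejected by AddStream.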
Some preliminary ideas on the organizational breakdown of production data
streams can be found here. This model is based on the BaBar framework data stream organization.
Models of Persistency in ROOT framework
Other HEP experiments are already using ROOT as the framework for their persistent data storage, and can be used as models for the design of a MINOS persistency package. Three such packages are:
Presentations
MINOS Persistency package talks have been given at the following meetings:
References
Some books on Object-oriented Database use and design:
[1] C++ Object Databases / David Jordan; Addison Wesley, 1998.
[2] Object data management : object-oriented and extended relational database systems / R.G.G. Cattell; Addison Wesley, 1991.
ROOT references:
[3] ROOT Home Page.
[4] Nick West's MINOS OO Companion.
[5] Fermilab's ROOT Persistency Chapter.
Sue Kasahara
Last Updated: March 20, 2001