Managing the input of MINOS data records
The purpose of this site is to summarize some thoughts on how the
input of the records from the raw and candidate data streams should
be managed.
Some definitions:
- A "stream" is defined as a collection of related records
stored in a single ROOT TTree.
- A "record" is a single entry in a ROOT TTree, containing a header and a
collection of data blocks.
There are a number of issues related to the management of the
input of these data records. One of these (the only one discussed
on this site so far) is how to handle the synchronization of the
records from the different streams. Later I hope to expand this site
with discussions of some other related issues, such as how the interface
to JobControl is to be handled.
Synchronization of Input Data Records
I've posted two figures below illustrating the data model proposed for the
calibration and far detectors (event splitting will introduce additional
complications at the near detector, which I ignore for now). These figures
are based on the organizational model proposed by Robert, slightly adapted
according to my own understanding after the discussions we had at the Ely
meeting.
There are (at least) two different proposed Daq running modes and two different
data model configurations resulting from this:
- A "typical" calibration detector (far detector) run. In this example, the Daq generates the raw data file C100_1.daq
containing 3 data streams. The production offline reconstruction job produces
the file C100_1.cnd containing the results (CandRecords) of the
reconstruction. A second, alternative reconstruction job (run by the
user) generates the output file C100_1.cnd2.
- A dedicated light injection run. This
illustrates a specialized mode of running in which event data from light
injection events is stored for later retrieval along with the light injection
summary entries. Only the Daq-generated raw data file is shown.
The main purpose of the diagrams is to illustrate how the records
from the different streams are synchronized and loaded into Mom according to
each record's VldTimeStamp. The scheme
shown in the diagrams is one possible mode of synchronization (I call it
"SyncByValidity"). It is very similar to Robert's implementation of
sequencing records by VldTimeStamp in the IoRawDataFile, differing only
slightly in that:
- Multiple records with coincident VldTimeStamp are loaded in the same
Mom entry.
- The idea has been generalized to handle records from any stream type
(raw and reconstruction).
Note that the DaqMonitor stream contains a heterogeneous mix of record types,
and that some of these records stored in the DaqMonitor stream may have
identical validity time stamps. These coincident DaqMonitor records of
different types will appear in the same Mom entry, along with any other
coincident records from other input streams.
The initial version of the input stream management classes will support
only this "SyncByValidity" mode of synchronization.
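The "SyncByValidity" idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the input stream management classes themselves: it is written in Python for brevity, the `Record` class and `sync_by_validity` function are hypothetical names, the time stamps are reduced to plain numbers, and each stream is assumed to be already ordered by time stamp. The sample records and their time stamps are made up.

```python
import heapq
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a record read from one input stream."""
    timestamp: int   # stands in for the record's VldTimeStamp
    stream: str      # name of the stream the record came from
    kind: str        # record type, e.g. "BeginRun" or "DaqSnarl"


def sync_by_validity(streams: Iterable[Iterable[Record]]) -> Iterator[List[Record]]:
    """Yield one list of coincident records per distinct time stamp.

    Assumes every input stream is already sorted by time stamp.
    """
    # Merge the sorted streams into a single time-ordered sequence.
    merged = heapq.merge(*streams, key=lambda rec: rec.timestamp)
    # Records sharing a time stamp end up in the same entry.
    for _, group in groupby(merged, key=lambda rec: rec.timestamp):
        yield list(group)


# Made-up sample data: three streams, some records coincident in time.
daq_monitor = [
    Record(1, "DaqMonitor", "BeginRun"),
    Record(1, "DaqMonitor", "OtherMonitor"),   # hypothetical record type
    Record(5, "DaqMonitor", "OtherMonitor"),
]
daq_snarl = [Record(2, "DaqSnarl", "DaqSnarl"), Record(3, "DaqSnarl", "DaqSnarl")]
cand = [Record(2, "Cand", "CandRecord"), Record(3, "Cand", "CandRecord")]

for entry in sync_by_validity([daq_monitor, daq_snarl, cand]):
    print(entry[0].timestamp, [rec.kind for rec in entry])
```

In this sketch the two DaqMonitor records at time 1 land in one entry, and each DaqSnarl shares an entry with the CandRecord carrying the same time stamp. Because the merge works on any number of sorted inputs, raw and reconstruction streams are treated alike.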
Some discussion points:
- The different data records illustrated in the model can be subdivided
into 2 different data types: constants records (e.g. the BeginRun
record on the DaqMonitor stream or any Dcs record) and data records
(e.g. DaqSnarls, LISummaries, and CandRecords). (The data records can
be further divided according to the trigger source: beam trigger,
calibration trigger, etc.)
- Currently there is no data type identification tag to distinguish
between constants records and data records in the record header base
class. Such a tag would allow the input stream management classes to
easily skip over entries containing only constants records if the user
requested this.
- Currently only DaqSnarl headers contain the Run, SubRun, and
Snarl # of the record. If the DaqSnarl stream is missing, the
input stream management classes have no way to advance to a
user-specified Run/SubRun/Snarl entry. Since I think there will be
instances when a user will want to analyze candidate streams
without raw data being present, I'd argue that we need an
intermediate header class, shared by both the Candidate
and Raw data records, which specifies the identifying run, subrun,
and snarl #.
- Since the LISummaries are associated with a trigger
(a calibration trigger), I think these should also be marked with
a run/subrun/snarl #. This means having an entry for
calibration triggers in the DaqSnarl tree which consists of
just a header in the case of the typical run, and a header +
data blocks in the case of the dedicated calibration run.
- The handling of the multiple data streams is done through the
application interface provided by the IoInputModule and related classes,
which interface to the Persistency classes to manage the ROOT I/O tasks.
Some rough definitions of interfaces which allow users to move up or down
a set of data streams (using the names in IoDataFile) are:
- Next: allows the user to advance to the next entry. (The user could opt to have the input stream management classes skip over entries containing only constants records.)
- Previous: allows the user to move back to the previous entry. (Same option of skipping over constants records.)
- EventAt: allows the user to jump to
a run/subrun/snarl of interest.
- The file naming scheme shown in the figures follows the proposal of
Robert for handling the naming of raw data files and has the form:
_.
where detector type is "C", "F" or "N". There was some discussion at the
Ely meeting arguing that the extension for these filenames should be ".root".
I feel this is unnecessary since:
- ROOT doesn't need the file extension .root to identify ROOT files (the
file headers contain a ROOT identification).
- All files generated by the online data generators and offline reconstruction are ROOT-formatted (I think). Users will quickly get used to this.
- The Mom interface will have to be adapted to allow users to distinguish
between records of the same class name but with different header types. This is
important because all raw data records are of the same RawRecord class, but
have different header types. There was an e-mail discussion on this
topic some time ago which I can repost if it is of interest.
Also, all records will be stamped with the name of the input stream from
which they originated and can be retrieved from Mom with this input stream
qualifier. This issue has also been discussed in previous e-mails.
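To make the navigation and header points above more concrete, here is a rough sketch of how Next, Previous and EventAt might behave if every record carried a shared header with run/subrun/snarl and a constants/data tag. It is written in Python for brevity and every name in it (`RecordHeader`, `is_constants`, `InputCursor`, `skip_constants`) is hypothetical; it is not the IoInputModule interface, and the linear search in `event_at` is only the simplest thing that works.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RecordHeader:
    """Hypothetical header shared by raw and candidate records."""
    run: int
    subrun: int
    snarl: Optional[int] = None   # None for records not tied to a snarl
    is_constants: bool = False    # hypothetical constants/data tag


Entry = List[RecordHeader]   # headers of the records in one synchronized entry


class InputCursor:
    """Hypothetical cursor over a sequence of synchronized entries."""

    def __init__(self, entries: List[Entry]) -> None:
        self._entries = entries
        self._pos = -1   # positioned before the first entry

    def _step(self, direction: int, skip_constants: bool) -> Optional[Entry]:
        pos = self._pos + direction
        while 0 <= pos < len(self._entries):
            entry = self._entries[pos]
            constants_only = all(h.is_constants for h in entry)
            if not (skip_constants and constants_only):
                self._pos = pos
                return entry
            pos += direction
        return None   # ran off either end; the position is left unchanged

    def next(self, skip_constants: bool = False) -> Optional[Entry]:
        return self._step(+1, skip_constants)

    def previous(self, skip_constants: bool = False) -> Optional[Entry]:
        return self._step(-1, skip_constants)

    def event_at(self, run: int, subrun: int, snarl: int) -> Optional[Entry]:
        # Works for any stream, since every header carries the identifiers.
        for pos, entry in enumerate(self._entries):
            if any((h.run, h.subrun, h.snarl) == (run, subrun, snarl) for h in entry):
                self._pos = pos
                return entry
        return None   # no such entry; the position is left unchanged


# Made-up entries for run 100, subrun 1.
entries = [
    [RecordHeader(100, 1, is_constants=True)],          # e.g. a BeginRun record
    [RecordHeader(100, 1, 1), RecordHeader(100, 1, 1)], # DaqSnarl + CandRecord
    [RecordHeader(100, 1, is_constants=True)],          # e.g. a Dcs record
    [RecordHeader(100, 1, 2)],                          # CandRecord only
]
cursor = InputCursor(entries)
print(cursor.next(skip_constants=True))   # skips the constants-only entry
print(cursor.event_at(100, 1, 2))         # found without a DaqSnarl record
```

The last entry in the sample contains no raw record at all, yet `event_at` can still find it, which is the case the shared header is meant to cover.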
Sue Kasahara
Last Updated: July 1, 2001