Organization of MINOS Data Streams
This is an attempt to document some preliminary ideas on the organization
of MINOS data streams.
Introduction
The organizational design of the MINOS data streams should take into
account two logical views of the data[1]:
- The reconstruction view, which describes how the data streams are
generated.
- The analysis view, which describes how the data streams are used. The
user of the data streams will likely require:
  - Rapid local disk access to the most significant physics attributes
  of the event.
  - A hierarchy of association to the more substantial (in size) pieces
  of the event. This hierarchy could, for example, consist of small,
  prominent event physics attributes stored on local disk, more
  sizeable "micro-DST" data containing summary data about individual
  tracks and showers stored on a remote disk at a central site, and the
  full results of production reconstruction stored on tape.
Proposed designs for the organization of the near and far detector
data streams are given below. These designs are based on the data stream
organization of the BaBar framework [1]. Event sizes in each case have
been extracted from Reference [2] and supplemented by guesswork.
Far Detector
A listing of possible far detector data streams is given below along
with the estimated size (for neutrino events only) of each stream component:
- Mct (? kB/event) Monte Carlo truth info.
- Sim (? kB/event) Monte Carlo hit data.
- Raw (2.4 kB/event, 0.05 GB/year) Raw event data.
- Rec (12 kB/event, 0.25 GB/year) Full reconstruction data.
- Esd (1.2-2.4 kB/event, 25-50 MB/year) Event summary data distilled from reconstruction data (assuming a factor of 5-10 compression).
- Tag (100 bytes/event, 2.2 MB/year) Summary of prominent event physics attributes.
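The yearly volumes in this list are the per-event sizes multiplied by a yearly event count. As a rough consistency check, the sketch below back-computes the event count implied by the Raw stream figures and applies it to the other streams. The decimal unit convention (1 kilobyte = 1000 bytes) and all function and variable names are assumptions made for illustration; they are not taken from Reference [2].

```python
# Assumption: decimal units (1 kilobyte = 1e3 bytes, 1 megabyte = 1e6, 1 gigabyte = 1e9).
KB, MB, GB = 1e3, 1e6, 1e9

# Per-event sizes in bytes, copied from the far detector list.
bytes_per_event = {
    "Raw": 2.4 * KB,
    "Rec": 12 * KB,
    "Esd (low)": 1.2 * KB,
    "Esd (high)": 2.4 * KB,
    "Tag": 100,
}


def implied_events_per_year(volume_bytes, size_bytes):
    """Event count needed to fill volume_bytes with events of size_bytes."""
    return volume_bytes / size_bytes


def yearly_volume(size_bytes, n_events):
    """Total bytes written per year for one stream."""
    return size_bytes * n_events


# Use the Raw stream (2.4 kilobytes/event, 0.05 gigabytes/year) as the reference.
n_events = implied_events_per_year(0.05 * GB, bytes_per_event["Raw"])

for name, size in bytes_per_event.items():
    print(f"{name}: {yearly_volume(size, n_events) / MB:.1f} megabytes/year")
```

Under these assumptions the implied rate is roughly 21,000 neutrino events per year, and the computed volumes come out close to the listed ones. Small differences are expected because the quoted inputs are rounded.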
The reconstruction procedural view for far detector data is:
Monte Carlo (Mct, Sim) or DAQ -> Raw -> Rec -> Esd -> Tag.
The user's analysis view is to retrace the hierarchy beginning with the Tag
level. Each stream has its own ROOT TTree. There is a one-to-one
correspondence between the entries of the stream TTrees.
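Because entry i refers to the same event in every stream, a selection made on the small Tag stream yields entry numbers that can be used directly to fetch the matching Esd or Rec records. The sketch below illustrates the idea with plain Python lists standing in for the per-stream trees. It does not use the ROOT API, and the field names, values, and helper functions are hypothetical.

```python
# Stand-ins for the per-stream trees: entry i of every list is the same event.
# All field names and values below are hypothetical.
tag_tree = [
    {"n_tracks": 1, "total_energy_gev": 3.2},
    {"n_tracks": 0, "total_energy_gev": 0.7},
    {"n_tracks": 2, "total_energy_gev": 11.5},
]
esd_tree = [f"esd-record-{i}" for i in range(len(tag_tree))]
rec_tree = [f"rec-record-{i}" for i in range(len(tag_tree))]


def select_entries(tags, predicate):
    """Return the entry numbers of the tags that pass the predicate."""
    return [i for i, tag in enumerate(tags) if predicate(tag)]


def fetch(tree, entries):
    """Read only the requested entries from a (possibly much larger) stream."""
    return [tree[i] for i in entries]


# Step 1: cut on the Tag stream, which is small enough to keep on local disk.
selected = select_entries(tag_tree, lambda tag: tag["n_tracks"] >= 1)

# Step 2: descend the hierarchy only for the events that passed the cut.
esd_selected = fetch(esd_tree, selected)
rec_selected = fetch(rec_tree, selected)
```

The design point is that the larger streams are touched only for the selected entries, so most of an analysis pass can run against the Tag stream alone.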
Given that disk space is relatively cheap (a low-end PC comes equipped
with at least a 5 GB hard drive), all or most of the far detector data can
be stored on disks maintained by the local university groups. Users can
access their local university event store via the rootd server provided
with ROOT.
Near Detector
For near detector data, there is an additional complication introduced
by the grouping of individual neutrino events into a spill data "frame".
A listing of possible near detector data streams is given below along with
the estimated size of each stream component. The estimates are for a target
radius of less than 100 cm, inclusive of conventional physics neutrino
events, and assume the high energy beam:
- Mct (? kB/frame) Monte Carlo truth info (grouped by spill).
- Sim (? kB/frame) Monte Carlo hit data (grouped by spill).
- Raw (160 kB/frame (200 events/spill), 36 GB/year) Raw spill data frame (grouped by spill).
- Rec (4 kB/event, 182 GB/year) Full reconstruction data.
- Esd (0.4-0.8 kB/event, 18-36 GB/year) Event summary data distilled from reconstruction data (assuming a factor of 5-10 compression).
- Tag (100 bytes/event, 5 GB/year) Summary of prominent event physics attributes.
The reconstruction procedural view at the near detector is:
Monte Carlo (Mct, Sim) or DAQ -> Raw -> (via event splitter + reconstruction) Rec -> Esd -> Tag.
The analysis view is to retrace the hierarchy beginning with the Tag
level. Each stream has its own ROOT TTree to store the stream components.
The Rec, Esd, and Tag stream components are stored with one entry per
event. The Raw, Mct, and Sim stream components are stored with one entry
per frame. A mapping must be maintained between event and frame number.
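One possible form for the event-to-frame mapping is a table of cumulative offsets, which takes one integer per frame and supports lookups in both directions. The sketch below assumes that events are stored in frame order and that the event splitter reports how many events it produced for each frame. Both assumptions, and all names used, are illustrative rather than part of the proposed design.

```python
from bisect import bisect_right
from itertools import accumulate


def build_frame_offsets(events_per_frame):
    """offsets[f] is the entry number of the first event in frame f.

    The final element is the total number of events.
    """
    return [0] + list(accumulate(events_per_frame))


def frame_of_event(offsets, event_entry):
    """Map an event entry (Rec/Esd/Tag) to its frame entry (Raw/Mct/Sim)."""
    if not 0 <= event_entry < offsets[-1]:
        raise IndexError(f"event entry {event_entry} out of range")
    # Pick the last frame whose first event is at or before event_entry;
    # bisect_right skips over frames that produced no events.
    return bisect_right(offsets, event_entry) - 1


def events_in_frame(offsets, frame_entry):
    """Map a frame entry to the range of event entries it contains."""
    return range(offsets[frame_entry], offsets[frame_entry + 1])


# Hypothetical example: three spills yielding 3, 0, and 2 events.
offsets = build_frame_offsets([3, 0, 2])
frame_for_event_3 = frame_of_event(offsets, 3)
events_for_frame_2 = list(events_in_frame(offsets, 2))
```

An explicit per-event frame number stored in the Tag stream would serve the same purpose at the cost of one extra field per event; the offset table is shown here only because it is compact.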
One scenario for the physical organization of the data is to have the
Tag data stream files for a period of interest stored on a user's local
disk, with most if not all of the remaining data streams available remotely
from a central site disk (served by the rootd server provided with ROOT).
References:
[1] S. Patton's BaBar Database Documentation.
[2] Draft version of the MINOS Computing Division MOU.
If you have any comments about this page, please send them to
Sue Kasahara.
Last Updated: Sept 21, 2000.