Geant4 / TSB June 2000 Proposal for handling data analysis in Geant4.

Proposal to the collaboration

Up to now the Geant4 collaboration has avoided taking histogramming facilities, and data analysis in general, into account. This is surprising, given that the physics Geant4 addresses is statistical by nature.

This lack of decision is probably related to the absence of a Geant4-like collaboration for handling data analysis, and to the fact that no HEP consensus exists around any project in this area.

Recently, developers of various analysis systems have converged on some "abstract interfaces" for 1D and 2D histograms; we can mention (in alphabetical order):

  • G.Barrand (LAL, OpenScientist project)
  • P.Binko (LHCb, Gaudi project)
  • T.Johnson (SLAC, JAS project)
  • A.Pfeiffer (CERN-IT, Lizard project)
These two abstract interfaces are the first agreed ones of the AIDA project (Abstract Interfaces for Data Analysis).

This agreement makes it possible to reconsider, at least, the question of histogramming in Geant4. In particular, through these abstract interfaces, histograms could be manipulated (filled, queried) by a Geant4 user, programmer or tester independently of any underlying histogram package and/or analysis system.
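As an illustration of what such package-independent manipulation might look like, here is a minimal C++ sketch. The names IHistogram1D and fill echo the draft code given later in this proposal; entries, binEntries, SimpleHistogram1D and fillEnergies are hypothetical names chosen for the example and are not the agreed AIDA signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical abstract interface: user code sees only this class.
class IHistogram1D {
public:
  virtual ~IHistogram1D() {}
  virtual void fill(double x) = 0;                    // add one entry at x
  virtual int entries() const = 0;                    // number of in-range entries
  virtual int binEntries(std::size_t bin) const = 0;  // entries in a given bin
};

// Toy in-memory implementation standing in for a real histogram package.
class SimpleHistogram1D : public IHistogram1D {
public:
  SimpleHistogram1D(std::size_t bins, double lo, double hi)
    : fLo(lo), fHi(hi), fCounts(bins, 0), fEntries(0) {
    assert(bins > 0 && hi > lo);
  }
  void fill(double x) override {
    if (x < fLo || x >= fHi) return;  // out-of-range values are dropped in this sketch
    std::size_t bin =
      static_cast<std::size_t>((x - fLo) / (fHi - fLo) * fCounts.size());
    if (bin >= fCounts.size()) bin = fCounts.size() - 1;  // guard against rounding
    ++fCounts[bin];
    ++fEntries;
  }
  int entries() const override { return fEntries; }
  int binEntries(std::size_t bin) const override { return fCounts.at(bin); }
private:
  double fLo, fHi;
  std::vector<int> fCounts;
  int fEntries;
};

// User code written against the interface only: it never names the package.
void fillEnergies(IHistogram1D& histo, const std::vector<double>& energies) {
  for (double e : energies) histo.fill(e);
}
```

Replacing SimpleHistogram1D by a class backed by a real analysis system would leave fillEnergies unchanged, which is the point of going through an abstract interface.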

Note that this proposal addresses only the question of histogramming and not that of tupling. Tuples would be considered later, once HEP discussions and agreements about their interfaces have emerged (if this ever happens...).


Proposal for a technical implementation

As for visualization (and interactivity in general), there are two questions that must be addressed.

Openness of the kernel

The first point is to guarantee that the Geant4 kernel is sufficiently open so that any analysis (or visualization) system can use it.

For analysis, this means that the kernel must be sufficiently open so that someone can easily histogram most of the relevant quantities (through action classes?).

It must be mentioned that, for visualization, having at least one "interactive environment" (OPACS) driving the Geant4 kernel with its own strategy revealed some locking points (now corrected); others will probably be discovered when doing the same with data analysis tools.

Note that this question of openness must be taken into account anyway, even if the Geant4 collaboration does not want to support any data analysis tool.

A concrete proposal is to let developers of data analysis systems who are interested in Geant4 plug their tools into the Geant4 kernel, and to have the related code needed to access the kernel deposited and distributed under the "environment" category.

An immediate proposal is to extend the environment category with:


environment/jas
environment/OpenScientist
...
in parallel to:

environment/OPACS
which is already there for testing the openness of the kernel toward visualization tools. This assumes that the relevant people from the jas, OpenScientist,... projects have agreed to join the collaboration.

It is clearly understood that only the "adapter" code is deposited here, not the tools themselves (it is not up to Geant4 to distribute OPACS, jas, OpenScientist, AVS, IRIS/Explorer,...).

Introduction of an analysis category

The second point of this technical proposal is the introduction of a new category "analysis" (parallel to the "visualization" one).

The first step of this part is to adopt the AIDA interfaces for 1D and 2D histograms. The idea is that in their applications (or in Geant4 examples, tests or even the kernel), people manipulate histograms through an agreed API (to fill and query them).

But these interfaces alone are not sufficient to provide a histogramming environment. Classes that link them to the various analysis systems are also needed.

We propose to create a new category, "analysis", to hold these "adapter" classes toward analysis systems. It must be clear that this category is not there to provide a Geant4 implementation of a histogram package (Geant4 is a simulation system, not a data analysis system).
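To make the notion of an "adapter" class concrete, here is a minimal C++ sketch. Everything in it is an assumption made for illustration: ExternalHisto stands in for a class belonging to some external analysis system (its Accumulate, Count and SumOfWeights methods are invented), and ExternalHistogram1D is a hypothetical adapter exposing it through an IHistogram1D interface reduced to a single fill method.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Interface seen by Geant4 user code (hypothetical, reduced to one method).
class IHistogram1D {
public:
  virtual ~IHistogram1D() {}
  virtual void fill(double x) = 0;
};

// Stand-in for a class of some external analysis system, with its own API.
// The class and all of its methods are invented for this example.
class ExternalHisto {
public:
  explicit ExternalHisto(const std::string& title) : fTitle(title) {}
  void Accumulate(float value, float weight) {
    fValues.push_back(value);
    fSumOfWeights += weight;
  }
  std::size_t Count() const { return fValues.size(); }
  float SumOfWeights() const { return fSumOfWeights; }
  const std::string& Title() const { return fTitle; }
private:
  std::string fTitle;
  std::vector<float> fValues;
  float fSumOfWeights = 0.f;
};

// The adapter: implements the agreed interface by forwarding to the external class.
class ExternalHistogram1D : public IHistogram1D {
public:
  explicit ExternalHistogram1D(const std::string& title) : fImpl(title) {}
  void fill(double x) override {
    fImpl.Accumulate(static_cast<float>(x), 1.0f);  // unit weight per entry
  }
  const ExternalHisto& impl() const { return fImpl; }
private:
  ExternalHisto fImpl;
};
```

Only the adapter knows about the external class; the histogram implementation itself stays in the external system, in line with the category not being a histogram package.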

The goals of this category are very similar to those of the visualization category: giving generic access to various analysis systems (graphics systems for visualization) while ensuring the openness (and neutrality) of the kernel, examples,... toward such systems.

An organization similar to that of the visualization category is proposed. In parallel to:


source/visualization/management/[include,src]/G4VisManager,...
source/visualization//...
source/visualization/OpenGL/...
source/visualization/OpenInventor/...
source/visualization/OPACS/...
...
we have:

source/analysis/management/[include,src]/G4AnalysisManager,...
source/analysis//...
source/analysis/CLHEP(interface to HBOOK)/...
source/analysis/jas/...
source/analysis/OpenScientist/...
source/analysis/OPACS (yes, we can also do analysis with them)/...
In the same way as the G4VisManager, a G4AnalysisManager will drive the access to the various analysis systems, as generically as possible. Note that this task is far from trivial, given the huge architectural differences between the existing analysis systems.
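One possible shape for such a manager is sketched below in C++, under stated assumptions. SketchAnalysisManager, RegisterSystem and SelectSystem are hypothetical names; only CreateHistogramFactory and create1D echo the draft code of this proposal, and here they are ordinary member functions rather than the static call used in the draft. ToyFactory and ToyHistogram1D stand in for a real plugged-in system.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interfaces, reduced to what the sketch needs.
class IHistogram1D {
public:
  virtual ~IHistogram1D() {}
  virtual void fill(double x) = 0;
};
class IHistogramFactory {
public:
  virtual ~IHistogramFactory() {}
  virtual IHistogram1D* create1D(const std::string& name, int bins,
                                 double lo, double hi) = 0;
};

// Toy system standing in for jas, OpenScientist, ...
class ToyHistogram1D : public IHistogram1D {
public:
  void fill(double) override { ++fEntries; }
  int entries() const { return fEntries; }
private:
  int fEntries = 0;
};
class ToyFactory : public IHistogramFactory {
public:
  IHistogram1D* create1D(const std::string&, int, double, double) override {
    fOwned.push_back(std::unique_ptr<ToyHistogram1D>(new ToyHistogram1D));
    return fOwned.back().get();  // the factory keeps ownership
  }
private:
  std::vector<std::unique_ptr<ToyHistogram1D>> fOwned;
};

// Hypothetical manager: systems register under a name, one is selected.
class SketchAnalysisManager {
public:
  // Returns false if a system with this name is already registered.
  bool RegisterSystem(const std::string& name,
                      std::unique_ptr<IHistogramFactory> factory) {
    return fSystems.emplace(name, std::move(factory)).second;
  }
  // Returns false if no system with this name has been registered.
  bool SelectSystem(const std::string& name) {
    auto it = fSystems.find(name);
    if (it == fSystems.end()) return false;
    fCurrent = it->second.get();
    return true;
  }
  // Null when no analysis system is plugged in, so callers must test it.
  IHistogramFactory* CreateHistogramFactory() const { return fCurrent; }
private:
  std::map<std::string, std::unique_ptr<IHistogramFactory>> fSystems;
  IHistogramFactory* fCurrent = nullptr;
};
```

Returning a null factory when nothing is selected lets user code compile and run unchanged whether or not an analysis system is present.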

As for the supported analysis systems, the policy could be the same as for visualization and other categories relying on "external" packages: a system will be considered if people who have joined the collaboration commit themselves to the necessary development and maintenance of its code (in agreement with the category coordinator).

At the Naruto vis-ui-mini-workshop, a prototype of source/analysis and of a G4AnalysisManager was produced that can book, fill and "commit" (store on disk or send over the network) histograms with two systems: jas by Tony Johnson (not (yet) a member of Geant4) and OpenScientist by Guy Barrand (already a member).

To give an idea, real code might look like this (a very rough draft):


...
#ifdef G4ANALYSIS_USE
#include "G4AnalysisManager.hh"
IHistogramFactory* histoFactory = 0;
IHistogram1D* histoEAbs = 0;
#endif
...
void MyRunAction::BeginOfRunAction(const G4Run* aRun) {
...
#ifdef G4ANALYSIS_USE
// Get a histogram factory:
histoFactory = G4AnalysisManager::CreateHistogramFactory();
//
if(histoFactory) {
// Create ("book") a histogram:
histoEAbs = histoFactory->create1D("EAbs",100,0.,1.);
}
#endif
...
}
(not clear yet how to pass the histogram pointers from MyRunAction to MyEventAction in a clean way...)

void MyEventAction::EndOfEventAction(const G4Event* evt) {
...
for (int i=0; i<nHits; i++) { // nHits: number of hits in CHC (assumed name)
#ifdef G4ANALYSIS_USE
histoEAbs->fill((*CHC)[i]->GetEdepAbs()/GeV);
#endif
}
...
}

void MyRunAction::EndOfRunAction(const G4Run*) {
...
#ifdef G4ANALYSIS_USE
// Store the histograms on disk or send them over the network:
G4AnalysisManager::End(); // or Flush(), or Commit()
#endif
...
}
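On the open question of passing the histogram from the run action to the event action, one option among several is to give the event action a pointer to the run action and let it ask for the histogram at each event. The C++ sketch below only illustrates that idea with stand-in classes: RunActionSketch, EventActionSketch, GetHistoEAbs and ToyHistogram are hypothetical and do not derive from the Geant4 action classes.

```cpp
#include <memory>
#include <vector>

// Stand-in for a histogram obtained from an analysis system.
struct ToyHistogram {
  std::vector<double> values;
  void fill(double x) { values.push_back(x); }
};

// Sketch of a run action that owns the histogram booked at the start of a run.
class RunActionSketch {
public:
  void BeginOfRunAction() { fHistoEAbs.reset(new ToyHistogram); }  // "book"
  // Accessor used by the event action; null if nothing has been booked.
  ToyHistogram* GetHistoEAbs() const { return fHistoEAbs.get(); }
private:
  std::unique_ptr<ToyHistogram> fHistoEAbs;
};

// Sketch of an event action that reaches the histogram through the run action.
class EventActionSketch {
public:
  explicit EventActionSketch(const RunActionSketch* runAction)
    : fRunAction(runAction) {}
  void EndOfEventAction(double edepAbs) {
    // Ask at each event: the histogram may not be booked (no analysis plugged).
    if (ToyHistogram* h = fRunAction->GetHistoEAbs()) h->fill(edepAbs);
  }
private:
  const RunActionSketch* fRunAction;  // not owned
};
```

This avoids global histogram pointers, at the price of coupling the event action to the concrete run action class.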
As for visualization, maximum protection when building/using the G4AnalysisManager together with the various "plug-in" systems could be reached through the use of some CPP macros:

G4ANALYSIS_USE
G4ANALYSIS_[BUILD,USE]_
G4ANALYSIS_BUILD_OPEN_SCIENTIST
G4ANALYSIS_USE_OPEN_SCIENTIST
G4ANALYSIS_BUILD_JAS
G4ANALYSIS_USE_JAS
....
The default behaviour is that NO analysis is plugged in. These macros would be taken into account in config/architecture.gmk and in some analysis.gmk, in the same way as the G4[VIS,UI]_[BUILD, USE] ones for visualization and interfaces.
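The effect of such a guard can be sketched in a few lines of C++. Only the macro name G4ANALYSIS_USE comes from this proposal; ToyHistogram, processDeposits and recordedEntries are hypothetical names for the example. Compiled without -DG4ANALYSIS_USE, the code contains no reference to any analysis system at all.

```cpp
#include <cstddef>
#include <vector>

// Build with -DG4ANALYSIS_USE to enable histogramming;
// by default no analysis code is compiled in.
#ifdef G4ANALYSIS_USE
// Hypothetical stand-in for a histogram obtained from an analysis system.
struct ToyHistogram {
  std::size_t entries = 0;
  void fill(double) { ++entries; }
};
static ToyHistogram gHistoEAbs;
#endif

// Sums the energy deposits; histograms them only when analysis is compiled in.
double processDeposits(const std::vector<double>& deposits) {
  double total = 0.;
  for (double e : deposits) {
    total += e;
#ifdef G4ANALYSIS_USE
    gHistoEAbs.fill(e);
#endif
  }
  return total;
}

// Number of histogram entries recorded so far (always 0 without G4ANALYSIS_USE).
std::size_t recordedEntries() {
#ifdef G4ANALYSIS_USE
  return gHistoEAbs.entries;
#else
  return 0;
#endif
}
```

The simulation result (the summed deposit) is identical in both builds; only the side effect of filling a histogram depends on the macro.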

Conclusions

This proposal thus asks two questions and proposes some answers.

Does the Geant4 collaboration want to take histogramming, and data analysis in general, into account? The existence of an agreement on histogram interfaces makes it possible to reconsider the current position of doing nothing about data analysis.

How should it be done? Two aspects have to be considered:

  • First, ensure the openness of the kernel. To do so, we propose to extend the environment category so that "adapter" code for the various existing systems can be developed, tested and distributed, in order to reveal possible locks in the kernel.
  • Second, create a new category offering a common interface to various analysis systems, in the same way that the interfaces and visualization categories operate toward GUI and visualization systems.

URLs

(In alphabetical order of project names):

AIDA

Gaudi

jas

Lizard

OpenScientist