Title: Data Analysis Tools, G. Wormser, LAL Orsay
1 Data Analysis Tools, G. Wormser, LAL Orsay
- The topics
- End-user data (statistical) Analysis Tools
- Event Displays
- (Data Quality Control)
- The inputs
- Feedback from LHC/HEP experiments
- The various analysis packages
- HEPVis99
- Personal experience from BABAR
- The key issues
- Conclusions
2 Historical perspective: PAW
- Very large productivity boost in the physics community with the introduction of a universal analysis tool, PAW: very easy to use, available everywhere
- Ntuples, MINUIT, presentation package
- Fortran interpreter
- Macros/scripts (KUIP, .kumac)
- No integration within the experiments' frameworks
- No overhead!
- But not possible to benefit from the infrastructure (no access to code, constants, data not in ntuples, event display)
3 The new environment
- OO data structures (ROOT, Objectivity, etc.)
- Analysis codes and tools in an OO language
- We want PAW_OO!
- Very large datasets
- We want better integration within the framework
- Very powerful CPUs
- Better interactivity
4 User Basic Requirements
- Histograms and tuples
- Knowledge of the experiment data structure
- Interpreted OO language
- Fitting package
- Script/macros
- Presentation package
5 Example: detailed requirements from ATLAS
- AnT design should be modular and reusable, and allow module addition and deletion without major changes to the program.
- AnT should save and restart analysis procedures in the same state as at exit time.
- AnT should provide a standard mechanism to store information and operations executed in each analysis procedure (i.e. information about a dataset, selection cuts, calibration data used, if attributes were re-calculated in an analysis job) to allow their recalculation with identical results.
- AnT should provide a standard mechanism to store information on any errors encountered in any data manipulation (i.e. fitting, mathematical manipulations, display). The information should be stored in an object generated by the data operations.
- AnT should provide a standard mechanism to append information on the data related to an analysis (for example, criteria used to select data and conditions used to collect data) to the analysis results.
- AnT should provide a standard mechanism to store and view results of the preliminary, the intermediate, and the final stage of analysis.
- AnT should allow viewing of results in interactive form, with the possibility to save them, if needed, in a standard format for possible inclusion in informal and formal publications.
- AnT should display one or more events simultaneously.
- AnT should make it possible to plot, graph and represent graphically in other ways results from simple and multiple data sets.
- AnT should be easy enough that its basic functionality can be learned in a short time (a few hours).
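The requirements above on storing the dataset, cuts and calibration used, and on restarting an analysis with identical results, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name `AnalysisRecord`, its fields, and the JSON format are hypothetical and are not taken from any ATLAS design.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AnalysisRecord:
    """Hypothetical provenance record; names and fields are illustrative only."""
    dataset: str
    calibration_tag: str
    cuts: dict = field(default_factory=dict)    # cut name -> lower threshold
    errors: list = field(default_factory=list)  # messages from failed operations

    def select(self, events):
        """Keep the events that pass every recorded cut."""
        return [e for e in events
                if all(e[name] > thr for name, thr in self.cuts.items())]

    def to_json(self):
        """Serialise the record so that a later session can restore it."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Rebuild the record from its serialised form."""
        return cls(**json.loads(text))


events = [{"pt": 1.2, "nhits": 8}, {"pt": 4.5, "nhits": 12}, {"pt": 6.0, "nhits": 5}]
record = AnalysisRecord("testbeam-run-42", "calib-v1", {"pt": 2.0, "nhits": 6})
first_pass = record.select(events)

# Simulate "exit and restart": only the serialised record survives.
restored = AnalysisRecord.from_json(record.to_json())
second_pass = restored.select(events)
```

Because the selection is driven entirely by the stored record, the restarted session reproduces the first result.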
6 Technical Requirements
- Lifetime of the experiments > lifetime of the packages
- Coexistence of several packages in one experiment
- Collaborative development of the packages
- Modularity
- Interoperability
- Evolvability
- Portability
- Maintenance
- Documentation
- User support
- User extension
7 The various products
- ROOT (Statistical, Event Display)
- JAS (Statistical, Event Display)
- LHC++ (Statistical)
- OpenScientist (Statistical, Event Display)
- WIRED (Event Display)
- HippoDraw (Statistical)
- Colt (Statistical)
- No purely commercial products!
8 What is ROOT
- Ambitious replacement for PAW by its main author, R. Brun, and his group, written in C++
- Covers all aspects of data analysis
- Data storage (ROOT I/O)
- Statistical analysis
- C++ interpreter CINT
- Event Display
- Initially built as an all-in-one package, evolving towards more modularity
- Open source approach
- Large and growing user base
9 ROOT user base
- ALICE
- LHCb test beam (Outer tracking)
- CDF, D0
- BABAR (see later)
- JLC
- STAR and many other nuclear physics projects
10 ROOT class structure
11 Some ROOT examples from various experiments
12 An Online ROOT application from ALICE
13 Fermilab Review Committee evaluation of ROOT ('98)
- 1) ROOT is a complete, full-featured package that meets the functional requirements
- 2) There are some trivial unacceptable features (use of CMZ, lack of build scripts) which should not be a stumbling block, but will require a formal collaboration with the ROOT team
- 3) There is a large, world-wide user base, but so far limited use for serious HEP analysis
- 4) ROOT can cope with the CDF and D0 data models
- 5) ROOT has an effective internal data format well matched to HEP needs
- 6) The present version of CINT is a potential serious drawback (buggy, undocumented, limited C++ features, hard to support, poorly engineered). This will require a decision to enhance/upgrade/replace, which would require significant work.
- 7) The user interface is not very friendly
- 8) The interconnectedness of the various modules is substantial. External modules must conform to (ROOT-specific, non-standard) ROOT protocols to be functional.
- 9) The package is not highly engineered (i.e., it has grown organically rather than been designed). The current implementation reflects this evolution; for example, it has not kept up with the C++ language standard (has its own container classes, etc.). Even beyond CINT, the product has many bugs.
- 10) It will require some relatively straightforward customization to support casual users
- 11) There is an active and responsive support team with good archives and an active mailing list
14 Fermilab Review Committee recommendations
- RECOMMENDATIONS FOR RUN II
- We recommend that ROOT be adopted as the standard
physics analysis package for Run II, contingent
on a collaborative agreement with the ROOT team.
It should be recognized that this recommendation
depends critically on timing and on sharing
development with outside collaborators, and the
steering committee should assess the validity of
these assumptions in evaluating the
recommendation. In particular, if the requirement
for an immediate choice is being driven by
on-line needs (which may not require the full
functionality of an off-line analysis package
immediately), it needs to be determined if the
components of NIRVANA that already exist are
adequate for the immediate needs.
- LONG-TERM RECOMMENDATIONS
- It is highly likely that by the end of RUN II (or
by the time of the LHC) commercial
components will be heavily used for analysis
tasks. Commercial offerings should continue to be
investigated and made available (perhaps on
limited platforms). The Computing Division should
also initiate formal collaboration with the LHC
project so as to have some influence on the
choices made and direction taken. These two
initiatives, while lower priority than the
immediate ROOT support and development needs,
should position us to take full advantage of
expected evolution of these products.
15 What is JAS
- Analysis framework based on Java
- Developed at SLAC by T. Johnson
- See the presentation by M. Ronan after this talk
- Aims at complete functionality similar to that of ROOT
- Smaller user community (NLC, BABAR online)
16 Java Libraries and APIs
- Standard Libraries and APIs
- 2D and 3D graphics, GUI (Swing), imaging, printing
- Database connectivity (JDBC), ODMG
- Collections, IO (Serialization), Data Compression
- Networking, Sockets, SSL, CORBA, RMI
- JavaBeans (components), Help
- Multimedia, Sound, Speech
- Security, Code Signing, Cryptography
- Math, Arbitrary Precision Math
- Shared Data (Collaborative Applications)
- Huge Community-Ware software archive
- IBM alone has hundreds of Java resources on its alphaWorks site
17 Remote Data Analysis
[Diagram labels: GUI; TCP/IP Network; Data Analysis Engine; Padded Cell; Experiment Extensions (Event Display); Data (Zebra, Jazelle, PAW, ROOT, Objectivity); Experiment Interface; C++ Code]
18 Plot Display Package
- 1-D/2-D histogram and scatter plot display
- Multiple axes, direct user interaction, overlays, fitting
19 JAS Availability
- 1.0 (Beta) currently available
- Windows (NT, 95, 98), Unix (Solaris, Linux)
- Installed on Solaris at SLAC (/usr/local/bin/jas)
- Limitations
- Detailed documentation still under development
- May still be some changes to user API
- Download from http://www-sldnt.slac.stanford.edu/jas
- 2.0 pre-release by July 1
- More plot types
- More flexible control of histograms
- Ability to easily compare multiple datasets
- More n-tuple handling tools (c.f. HippoDraw)
- Greatly improved printing
20 More Info
- Java Analysis Studio
- http://www-sldnt.slac.stanford.edu/jas
- Please give us feedback
- jas-feedback_at_sld-mail.slac.stanford.edu
- Mailing List
- http://www.slac.stanford.edu/cgi-bin/lwgate/JAS-L/
- Also general mailing list for Java in HEP
- http://www.slac.stanford.edu/cgi-bin/lwgate/HEP-JAVA/
21 Some comments on JAS from D. Ferrero Merlino
- Pros
- Portability, remote execution, GUI
- Cons
- Interoperability with C++
- Performance
- Scripting
- LCB recommendations
- Look for IRIS Explorer alternatives
- Investigate Java solutions
- A technical student joined the DAT section in July
- Try to integrate HTL and Tags in JAS
- Evaluate C++ interoperability
22 The Colt Distribution: Open Source Libraries for High Performance Scientific and Technical Computing in Java
Wolfgang Hoschek, CERN IT/PDP
25 Colt
- Efficient high-level data structures and algorithms for
- Off-line data analysis
- Histogramming
- Monte Carlo simulation
- NTuple-like manipulations
- Approach
- Summon some of the best concepts, designs and implementations thought up over time by the community
- Port or improve them
- Introduce new approaches where the need arises
- Results so far
- In overlapping areas, competitive with or superior to toolkits such as STL, Root, HTL, CLHEP, TNT, GSL, C-RAND / WIN-RAND (all C/C++) as well as IBM Array, JDK 1.2 Collections framework, JGL (all Java)
- In terms of performance (!), functionality and (re)usability
26 Colt Conclusions
- Technology Tracking
- Java may soon be a major player in performance-sensitive scientific and technical computing
- Look at the LHC time-scale and be prepared for that
- Colt distribution
- Users need libraries to get their job done
- Java lacks the foundation toolkits that are broadly available and conveniently accessible in C/C++ and Fortran
- Build an infrastructure for scalable scientific and technical computing in Java
- Don't reinvent the wheel; share resources in Open Source efforts
- Document, package and distribute a loosely coupled set of libraries under one single uniform umbrella
- Visit http://nicewww.cern.ch/hoschek/colt/index.htm
- and get your hands dirty...
27 What is LHC++
- The OO replacement of CERNLIB
- Collaborative approach between the CERN/IT division and the LHC experiments
- Initial trend favored commercial products (Objectivity, IRIS Explorer)
- IRIS Explorer has been rejected by the collaborations
- (No documents available!)
- Present focus: short-term effort to provide a new solution
28 View of Interactivity in LHC++
- Explorer-based analysis tool was not accepted by users
- Request to create a new tool
- PAW-like functionality (at least)
- PAW-like interface (command-line)
- Early prototype required
- With restricted functionality
29 Requirements for the analysis tool
- Based on Abstract Interfaces to packages
- Histogramming
- Fitting
- Plotting
- Analysis
- UserInterface
- Flexible implementation
- Possible to replace packages with minimal impact on other parts
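A minimal sketch of what "based on abstract interfaces" can mean in practice: client code depends only on an abstract `Fitter`, so one fitting package can be swapped for another without touching the analysis. The interface, its single method `fit_constant`, and both implementations are assumptions made for this illustration; they are not the interfaces of the tool described on the slide.

```python
from abc import ABC, abstractmethod


class Fitter(ABC):
    """Abstract fitting interface (hypothetical signature)."""

    @abstractmethod
    def fit_constant(self, values):
        """Return the best-fit constant for a list of numbers."""


class MeanFitter(Fitter):
    """Least-squares estimate of a constant: the arithmetic mean."""

    def fit_constant(self, values):
        return sum(values) / len(values)


class MedianFitter(Fitter):
    """Least-absolute-deviation estimate of a constant: the median."""

    def fit_constant(self, values):
        ordered = sorted(values)
        mid = len(ordered) // 2
        if len(ordered) % 2:
            return ordered[mid]
        return 0.5 * (ordered[mid - 1] + ordered[mid])


def analyse(values, fitter: Fitter):
    """Client code sees only the abstract interface, never a concrete package."""
    return fitter.fit_constant(values)


data = [1.0, 2.0, 3.0, 10.0]
by_mean = analyse(data, MeanFitter())
by_median = analyse(data, MedianFitter())
```

Replacing the fitting package changes one constructor call; `analyse` itself is untouched.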
30 Components
- Services
- HistogramFactory
- HistogramManager
- Fitter
- Plotter
- Analyzer (dynamically loaded C++)
- Uses HistogramManager to register created histograms
- Access to all experiment data/tags/...
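The division of labour between these services can be sketched as follows. The class names come from the slide; every method name, the fixed-width binning, and the use of Python instead of dynamically loaded C++ are assumptions made for illustration.

```python
class Histogram1D:
    """Fixed-width 1D histogram; values outside [low, high) are ignored."""

    def __init__(self, name, nbins, low, high):
        self.name, self.low, self.high = name, low, high
        self.bins = [0] * nbins

    def fill(self, x):
        if self.low <= x < self.high:
            width = (self.high - self.low) / len(self.bins)
            # min() guards against floating-point rounding at the upper edge
            index = min(int((x - self.low) / width), len(self.bins) - 1)
            self.bins[index] += 1


class HistogramFactory:
    """Creates histograms; user code never names the concrete class."""

    def create_1d(self, name, nbins, low, high):
        return Histogram1D(name, nbins, low, high)


class HistogramManager:
    """Registry that lets other services find histograms by name."""

    def __init__(self):
        self._by_name = {}

    def register(self, hist):
        self._by_name[hist.name] = hist

    def find(self, name):
        return self._by_name[name]


def analyzer(factory, manager, values):
    """User analysis code: create a histogram, register it, fill it."""
    hist = factory.create_1d("mass", 4, 0.0, 4.0)
    manager.register(hist)
    for value in values:
        hist.fill(value)


manager = HistogramManager()
analyzer(HistogramFactory(), manager, [0.5, 1.5, 1.7, 3.9, 4.0, -1.0])
mass = manager.find("mass")
```

A plotter or fitter would then look up `"mass"` through the manager without knowing which analyzer produced it.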
31 Components (II)
- Basic Classes
- Histograms (1D, 2D for a start)
- Points (1D, 2D)
- Coordinates (with (asymmetric) errors)
- Value (with (asymmetric) errors)
- VectorOfPoints
- Added value over vector<Point>
- Scaling, shifting, ...
- Interface from histograms to fitting/plotting
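As a rough illustration of the "added value" over a plain vector of points, the sketch below adds scaling, shifting and construction from histogram bins to a list of points with separate lower and upper errors. The field names, the method names, and the choice of sqrt(N) as the error on each bin are assumptions, not the actual class design.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Point:
    """A value y at coordinate x, with possibly asymmetric errors on y."""
    x: float
    y: float
    y_err_low: float = 0.0
    y_err_high: float = 0.0


class VectorOfPoints(list):
    """A list of Point objects plus a few convenience operations."""

    @classmethod
    def from_bins(cls, low, width, contents):
        """Bin centre as x, bin content as y, sqrt(N) as (symmetric) error."""
        return cls(Point(low + (i + 0.5) * width, n, sqrt(n), sqrt(n))
                   for i, n in enumerate(contents))

    def scaled(self, factor):
        """Scale values and errors by a positive factor."""
        return VectorOfPoints(
            Point(p.x, p.y * factor, p.y_err_low * factor, p.y_err_high * factor)
            for p in self)

    def shifted(self, dx):
        """Shift all coordinates by dx."""
        return VectorOfPoints(
            Point(p.x + dx, p.y, p.y_err_low, p.y_err_high) for p in self)


points = VectorOfPoints.from_bins(0.0, 1.0, [4, 9])
half = points.scaled(0.5).shifted(10.0)
```

The same vector can be handed to either a fitter or a plotter, which is the interface role the slide assigns to it.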
32 User interface
- Re-use scripting language(s)
- Use SWIG for the interface to Python (Perl, Tcl, Java (alpha), ...)
- Class model allows for old-fashioned and new-style analysis models
- hist.plot()
- vector.fromHistogram(hist); plotter.plot(vector)
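The two calling styles named above can coexist if the old-fashioned `hist.plot()` simply delegates to the new-style objects. The sketch below assumes this delegation; the text-mode `Plotter`, the `Hist` constructor and the `Vector` class are stand-ins invented for the example, and only the three call shapes come from the slide.

```python
class Plotter:
    """Stand-in plotter: renders (x, y) pairs as lines of text."""

    def plot(self, points):
        return [f"{x:5.1f} | " + "*" * int(y) for x, y in points]


class Vector(list):
    """Stand-in for the vector of points handed to a plotter or fitter."""

    def fromHistogram(self, hist):
        # Replace the contents with (bin centre, bin content) pairs.
        self[:] = [(hist.low + (i + 0.5) * hist.width, n)
                   for i, n in enumerate(hist.bins)]
        return self


class Hist:
    """Stand-in histogram with fixed-width bins."""

    def __init__(self, low, width, bins):
        self.low, self.width, self.bins = low, width, list(bins)

    def plot(self):
        # Old-fashioned style: the histogram "draws itself" by delegating.
        return Plotter().plot(Vector().fromHistogram(self))


hist = Hist(0.0, 1.0, [2, 0, 3])

old_style = hist.plot()

vector = Vector().fromHistogram(hist)
plotter = Plotter()
new_style = plotter.plot(vector)
```

Both routes produce the same output, so users can migrate from one style to the other gradually.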
33 Status
- Initial design (for prototype) done
- Implementing first prototype
- Histograms
- Plotter
- Fitter
- VectorOfPoints
- HistogramManager
- Analyzer
- Work in progress...
- More news soon...
34 What is OpenScientist
- A toolkit developed by G. Barrand (LAL Orsay)
- Very strong focus on interoperability of various packages and collaborative development
- Integrated into the HEPVis collaboration
- Limited user base
35 The key to openness
[Diagram labels: Histo; THisto : TObject (Rio, .root file); d_Histo : ooObj (Obj, .DB file); SbPlottedHistogram (SoPlotter)]
- Use the adapter pattern
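A minimal sketch of the adapter pattern in this setting: two histogram classes with incompatible storage-specific layouts are wrapped so that a plotter sees one interface. The classes below are invented stand-ins and do not reproduce the real ROOT, Objectivity or OpenScientist classes; only the pattern itself is taken from the slide.

```python
class FileHisto:
    """Stand-in for a histogram read from a file-based store."""

    def __init__(self, contents):
        self.cells = list(contents)


class DbHisto:
    """Stand-in for a persistent histogram kept in an object database."""

    def __init__(self, contents):
        self.persistent_bins = tuple(contents)


class PlottedHistogram:
    """The single interface the plotter understands."""

    def bin_contents(self):
        raise NotImplementedError


class FileHistoAdapter(PlottedHistogram):
    """Adapts FileHisto to the plotter's interface."""

    def __init__(self, histo):
        self._histo = histo

    def bin_contents(self):
        return list(self._histo.cells)


class DbHistoAdapter(PlottedHistogram):
    """Adapts DbHisto to the plotter's interface."""

    def __init__(self, histo):
        self._histo = histo

    def bin_contents(self):
        return list(self._histo.persistent_bins)


def plot_total(plotted: PlottedHistogram):
    """Plotter-side code: depends only on PlottedHistogram."""
    return sum(plotted.bin_contents())


from_file = plot_total(FileHistoAdapter(FileHisto([1, 2, 3])))
from_db = plot_total(DbHistoAdapter(DbHisto([1, 2, 3])))
```

Neither histogram class had to change or inherit from the other; each new storage back end costs one small adapter.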
36 The NxM issue
- A nice idea: automatic production of adapters.
- Example: SWIG
[Diagram labels: Histo; SWIG; tcl_Histo (Tcl prompt: histo); python_Histo (Python prompt: histo); jni_Histo (Java: histo)]
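The counting behind the NxM issue can be made concrete. With N classes and M scripting languages, hand-written adapters number N x M, whereas a generator needs roughly one description per class plus one back end per language, i.e. N + M inputs. The sketch below only enumerates adapter names in the style of the slide (`tcl_Histo`, `python_Histo`, `jni_Histo`); the extra classes `Tuple` and `Fitter` are hypothetical, and nothing here calls or imitates SWIG itself.

```python
from itertools import product

# Adapter name prefixes as they appear on the slide.
LANGUAGE_PREFIX = {"Tcl": "tcl", "Python": "python", "Java": "jni"}


def adapter_names(classes, languages):
    """Enumerate one adapter per (class, language) pair."""
    return [f"{LANGUAGE_PREFIX[lang]}_{cls}"
            for cls, lang in product(classes, languages)]


classes = ["Histo", "Tuple", "Fitter"]  # "Tuple" and "Fitter" are hypothetical
names = adapter_names(classes, LANGUAGE_PREFIX)

hand_written = len(names)                               # N x M adapters
generator_inputs = len(classes) + len(LANGUAGE_PREFIX)  # N + M inputs
```

With three classes and three languages the gap is already 9 against 6, and it widens quickly as either number grows.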
37 Large Array Set
- Huge tuples break the UAF model for storage!
- Introduce the notion of Large Array.
[Diagram labels: VLargeArray; Array; StorageArray (Storage, .s file); Storage2Array (Storage2, .s2 file); TBranch (Rio, .root file); ooArray (Obj, .DB file)]
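One way to read the Large Array idea is as an abstract array that algorithms traverse in chunks, with the storage technology hidden behind it. The sketch below assumes that reading; the name `VLargeArray` comes from the slide, while the methods, the flat binary file layout and the two implementations are invented for the example.

```python
import os
import struct
import tempfile
from abc import ABC, abstractmethod


class VLargeArray(ABC):
    """Abstract array of floats that may be too large to hold in memory."""

    @abstractmethod
    def __len__(self):
        """Number of elements."""

    @abstractmethod
    def read(self, start, count):
        """Return `count` elements starting at index `start` as a list."""

    def chunks(self, size):
        """Iterate over the array in pieces of at most `size` elements."""
        for start in range(0, len(self), size):
            yield self.read(start, min(size, len(self) - start))


class MemoryArray(VLargeArray):
    """In-memory implementation, useful for small data and tests."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def read(self, start, count):
        return self._values[start:start + count]


class FileArray(VLargeArray):
    """File-backed implementation: little-endian doubles in a flat file."""
    ITEM = struct.Struct("<d")

    def __init__(self, path):
        self._path = path

    def __len__(self):
        return os.path.getsize(self._path) // self.ITEM.size

    def read(self, start, count):
        with open(self._path, "rb") as handle:
            handle.seek(start * self.ITEM.size)
            raw = handle.read(count * self.ITEM.size)
        return [value for (value,) in self.ITEM.iter_unpack(raw)]


def column_sum(array, chunk_size=3):
    """Storage-independent algorithm: only a chunk is in memory at a time."""
    return sum(sum(chunk) for chunk in array.chunks(chunk_size))


values = [float(i) for i in range(10)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "column.bin")
    with open(path, "wb") as handle:
        for v in values:
            handle.write(FileArray.ITEM.pack(v))
    from_file = column_sum(FileArray(path))
from_memory = column_sum(MemoryArray(values))
```

The algorithm `column_sum` never learns where the data lives, which is what allows one storage back end to be replaced by another.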
38 OpenScientist Status
- Rio, Riot: the file IO system of ROOT put in a stand-alone package (free software).
- Objectivity: a commercial object database.
- Mesa: a free implementation of OpenGL.
- SoFree: a free implementation of Open Inventor.
- SGI or TGS Inventor: commercial implementations of Inventor.
- HEPVis: a free collaborative set of classes over Open Inventor.
- Tcl: a scripting language.
- KUIP: the CERN/PAW command language put in a stand-alone package.
- Lab: the top "hub" package that ties subpackages together to present a coherent environment to work with.
- HCL: a home-made histogram package.
- Midnight: the rewriting of Minuit in C++ by R. Brun, put in a stand-alone package.
- It runs on NT and UNIX. It now works with Geant4 (display and plotting).
39 An OpenScientist session
40 The various approaches from the experiments
- ALICE: ROOT (AliROOT)
- CMS/ATLAS/LHCb: prospect/evaluate
- BABAR: no official tool, i.e. PAW (JAS online, ROOT)
- CDF/D0: ROOT for Run II
41 Some words about AliROOT
- The ROOT framework will provide ALICE with
- Data Storage
- On-line monitoring
- Statistical analysis
- Event Display
42 ALICE Framework
44 CMS/ATLAS/LHCb approach
- Define their data model and framework independently (e.g. GAUDI/LHCb, CARF/CMS)
- Objectivity for persistency
- Close collaboration with the LHC++ effort
- Evaluate as many products as reasonable using test beam stands
- (Produce documents!)
- Invest in Event Displays (ATLAS, CMS)
45 LHCb strategy
46 LHCb strategy (2)
- Common problems: HEP-Analysis
- Foundation Libraries (e.g. NAG, CLHEP)
- Toolkits (e.g. HTL)
- LHCb-specific Analysis Tools, some of which will make use of HEP-wide toolkits
- Mathematical libraries
- Histogramming
- Fitting and Minimization
- Visualization
- Data Access
- Components exist in different stages, but what about their interfaces?
- LHC++ is planning to create interfaces on existing packages
47 ATLAS Web Page
48 CMS Software Task Breakdown
49 Tracker Test Beam Online Monitoring
50 The trends at HEPVis99
- Collaborative environment
- Try to define common interfaces
- The open source approach
- How to get out of "one man, one tool"?
- Distributed environment
- IDL/CORBA/Java
- No ROOT participation
51 The near future
- LHC++: basic histograms in a few weeks
- HEPVis collaboration
- How to ease convergence?
- Academic Software Organization
- http://www.lal.in2p3.fr/HEPVis99/ASO/ASO.html
- This organisation was founded in 1999 by an international group of computing scientists, engineers and physicists to help the development of software tools for academic scientific research in an international and collaborative way.
- A first target of this group is to work out a web-based working organisation model aiming, as a first step, at the production of interactive data analysis tools for high energy and nuclear physics experiments.
- We hope that this model will be sufficiently general and efficient to apply to other domains
52 The event displays
- Goals
- Code debugging
- Event debugging
- Quality control
- Huge underlying technology potential
- User interface toolkit
- Much closer integration in framework
53 Event Displays: Approaches and Contributions
- ALICE: ROOT
- ATLAS: WIRED (see J. Hrivnac's talk)
- CMS: Qt, Iguana, HEPVis
- BABAR: WIRED with CORBA
54 Event Display requirements
- A/ Code debugging
- Needed very early in the development
- Integration with simulated objects
- Compatibility with GEANT4!
- B/ Online display
- Batch mode
- Access to RAW objects
- Speed
- C/ Offline analysis
- Integration with reco framework (interactivity)
- Flexibility
- Public relations
55 ALICE Geant3 geometry display with ROOT
56 CMS Interactive Graphical User Analysis (IGUANA)
- Interactive Detector and Event Visualisation (CMSCAN)
- Physics Analysis Tools
- (Graphical) User Interfaces
- Tasks include
- Assessment of HEP-wide and commercial tools
- Development of missing and CMS-specific components (e.g. Detector and Event Visualisation systems)
- Design and implementation of (Graphical) User Interfaces for CMS software systems (ORCA, OSCAR, test beam, PRS,...)
- Working closely with and contributing to HEP-wide projects (e.g. LHC++, HEPVis, GEANT4, etc.)
- Deployment, distribution, and support in the CMS environment
57 A General Idea of a User Application
58 CMS Detector and Event Visualization in IGUANA
- Generic software developed (in collaboration with HEPVis: CDF, D0, L3,...)
- Interactive Graphical User Interface and graphics manager
- Deployed with ORCA (detector elements and reconstructed objects)
- Extend to test beams and OSCAR (GEANT4 for CMS) by end of 1999
59 Event Displays: the new trends
- HEPVis library
- OpenInventor, SoFree
- WIRED
- CORBA
60 WIRED Client-Server/File Architecture
[Diagram labels: WIRED Application; WIRED Server; Geometry and Events; WIRED Code; WIRED Gateway; WWW Browser; WIRED Applet; WWW Server]
61 GUI (inside Netscape Browser)
62 WIRED connected to services via bus
[Diagram labels: External Bus; Event Viewer; Event Data Server; Geometry Data Server; State Manager; Reconstruction Server]
- External Bus
- to access Data
- to access other Services
- to enable Collaboration
63 The BABAR experience
- Main characteristics
- Data and constants stored in Objectivity
- Very large statistics for a start-up (do not plan against you!) (best achieved: 45 pb^-1/day, 1.05 x 10^33 cm^-2 s^-1)
- 1 fb^-1 in 4 months (as many B-Bbar pairs as LEP in 6 years)
- Mostly uncalibrated detector at run start
- Too early to draw definitive conclusions from observed performance
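The luminosity figures on this slide can be cross-checked with a few lines of arithmetic. The sketch reads "1.05 1033" as an instantaneous luminosity of 1.05 x 10^33 cm^-2 s^-1, takes "4 months" as 120 days, and assumes a B-Bbar production cross section of about 1.05 nb at the Upsilon(4S); the last two numbers are assumptions, not values given in the talk.

```python
CM2_PER_PB = 1e-36       # 1 picobarn = 1e-36 cm^2
SECONDS_PER_DAY = 86400

peak_luminosity = 1.05e33  # cm^-2 s^-1, as read from the slide

# Integrated luminosity of a full day spent at the peak value, in pb^-1.
peak_day_inverse_pb = peak_luminosity * SECONDS_PER_DAY * CM2_PER_PB

# The best recorded day (45 pb^-1) as a fraction of that ideal day.
best_day_fraction = 45.0 / peak_day_inverse_pb

# 1 fb^-1 = 1000 pb^-1 collected in about 120 days (assumed).
average_inverse_pb_per_day = 1000.0 / 120.0

# Assumed B-Bbar cross section: 1.05 nb = 1050 pb.
bb_pairs = 1000.0 * 1050.0
```

Under these assumptions a full day at peak luminosity would give about 91 pb^-1, so the best day corresponds to roughly half of that, and 1 fb^-1 corresponds to about one million B-Bbar pairs.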
64 The BABAR Tools
- Prompt reconstruction immediately following data taking
- REC, AOD, TAG data stored in database
- (AOD = microDST)
- AOD also available as PAW ntuples as a temporary initial measure
- Event display incorporated in the framework
65 Some Confirmations
- Possible to do some zero-order physics at AOD level (also meaning: not possible to do first-order physics at this level!)
- Calibrations need REC
- Detector performance / detector understanding needs AOD + partial REC
- Event display essential
66 BABAR initial constraints
- No export
- just working now, 1 M AOD evts at Lyon and RAL
- CPU limitations and slow turnaround
- Restricted access to data
- Rolling calibration scheme not yet implemented
67 The initial problems
- Slow access to data => insufficient calibration up to now and not yet optimal detector performance
- Providing very easy standalone access to REC data would speed up the process
- 15-30% of total statistics available to the average user: not too good, not too bad!
- Review Committee in August 1999
MC width 5-7 MeV/c
68 BABAR software committee recommendations
- Provide users with another fast access to data, based on ROOT/IO files
- Batch access to ROOT/IO files as a first step
- BABAR code interactive in ROOT (/) at the end of the year
- Send data to regional centers to reduce the burden at SLAC
- Put in place a risk management plan to assess Objectivity progress towards design performance
- Resource management at SLAC
- Duplicate the OPR farm to allow development in parallel with production
69 Conclusions
- A lot of technology exists
- Statistical tools
- PAW very successful. Need collaboration towards PAW_OO. Still some more work on requirements
- ROOT/LHC++/OpenScientist/JAS: present front runners
- Java interface with C++ (CORBA)
- Event displays: much more connected to the experiments. The trend is towards distributed computing (JAS/WIRED) and/or more integration (ROOT)
- Do not forget human factors!
70 The key issues
- No main underlying technical issues
- Integration
- Data model/Statistical tool
- Statistical tool/Event Display
- Interoperability
- 1 experiment and several outside packages
- Build your own package
- Collaborative effort
- Time scale
71 CMS Software Milestones
Sept 1999
- The OO Proof of Concept phase has been completed
- The Functional Prototype phase is well underway
- CMS must provide functional software by end 1999 / beginning 2000