Data Analysis Tools, G. Wormser, LAL Orsay

1
Data Analysis Tools G. Wormser, LAL Orsay
  • The topics
  • End-user data (statistical) Analysis Tools
  • Event Displays
  • (Data Quality Control)
  • The inputs
  • Feedback from LHC/HEP experiments
  • The various analysis packages
  • HEPVis99
  • Personal experience from BABAR
  • The key issues
  • Conclusions

2
Historical perspective PAW
  • Very large "productivity boost" in the
    physics community with the introduction of a
    universal analysis tool: PAW
  • Very easy to use, available everywhere
  • Ntuples, MINUIT, presentation package
  • Fortran interpreter
  • Macros/scripts (KUIP, .kumac)
  • No integration within the experiments' frameworks
  • No overhead!
  • But not possible to benefit from the infrastructure
    (no access to code, constants, data not in
    ntuples, event display)

3
The new environment
  • OO Data structures (ROOT,Objectivity,etc)
  • Analysis codes and tools in OO language
  • We want "PAW_OO"!
  • Very large datasets
  • Want better integration within the framework
  • Very powerful CPUs
  • Better interactivity

4
User Basic Requirements
  • Histograms and "tuples"
  • Knowledge of the experiment data structure
  • Interpreted OO language
  • Fitting package
  • Script/macros
  • Presentation package
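
As an illustration of the first requirement (histograms and tuples), here is a minimal sketch in plain Python. It is not taken from any of the packages discussed in this talk; the column names, the sample values and the `fill_histogram` helper are all invented for the example.

```python
from collections import namedtuple

# One ntuple row per event; the column names are invented for this sketch.
Event = namedtuple("Event", ["mass", "ntracks"])

ntuple = [Event(3.09, 4), Event(3.11, 2), Event(2.50, 6), Event(3.10, 5)]


def fill_histogram(values, nbins, low, high):
    """Count values into nbins equal-width bins on [low, high); others are dropped."""
    counts = [0] * nbins
    width = (high - low) / nbins
    for v in values:
        if low <= v < high:
            # min() guards against floating-point rounding at the upper edge
            counts[min(int((v - low) / width), nbins - 1)] += 1
    return counts


# Apply a cut on one column, then histogram another.
selected = [e.mass for e in ntuple if e.ntracks >= 4]
hist = fill_histogram(selected, nbins=4, low=2.0, high=4.0)
print(hist)
```

The cut-then-histogram step is the kind of operation a PAW-like tool would offer interactively on an ntuple.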

5
Example Detailed requirements from ATLAS
  • AnT design should be modular and reusable, and
    allow modules to be added and removed without major
    changes to the program.
  • AnT should save and restart analysis procedures
    in the same state as at the exit time.
  • AnT should provide a standard mechanism to store
    information and operations executed in each
    analysis procedure (i.e. information about a
    dataset, selection cuts, calibration data used -
    if attributes were re-calculated in an analysis
    job) to allow their recalculations with identical
    results.
  • AnT should provide a standard mechanism to store
    information on any errors encountered in any data
    manipulation (i.e. fitting, mathematical
    manipulations, display). The information should
    be stored in an object generated by the data
    operations.
  • AnT should provide a standard mechanism to append
    information on the data related to an analysis
    (for example - criteria used to select data and
    conditions used to collect data) to the analysis
    results.
  • AnT should provide a standard mechanism to store
    and view results of the preliminary, the
    intermediate, and the final stage of analysis.
  • AnT should allow viewing of results in the
    interactive form and a possibility to save them,
    if needed, in a standard format for possible
    inclusion in informal and formal publications.
  • AnT should display one or more events
    simultaneously.
  • AnT should make it possible to plot, graph and
    represent graphically in other ways results from
    simple and multiple data sets.
  • AnT should be easy enough that its basic
    functionality can be learned in a short time (a few hours).
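
The requirements on saving state and recording selection cuts, calibration data and errors could be met in many ways. The following is only a sketch of one possibility, assuming a JSON text form; the `AnalysisRecord` class, its field names and the sample values are hypothetical and do not come from the ATLAS document.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AnalysisRecord:
    """Hypothetical provenance record; field names are illustrative only."""
    dataset: str
    selection_cuts: list = field(default_factory=list)
    calibration_tag: str = ""
    errors: list = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys gives a stable text form, so equal records serialize identically
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "AnalysisRecord":
        return cls(**json.loads(text))


record = AnalysisRecord(
    dataset="testbeam-run-42",                  # made-up dataset name
    selection_cuts=["pt > 1.0", "abs(eta) < 2.5"],
    calibration_tag="calib-v1",                 # made-up tag
)
record.errors.append("fit did not converge")    # errors travel with the record

# Saving and restoring yields a record in the same state as at exit time.
restored = AnalysisRecord.from_json(record.to_json())
print(restored == record)
```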

6
Technical Requirements
  • Lifetime of the experiments > lifetime of the
    packages
  • Coexistence of several packages in one experiment
  • Collaborative development of the packages
  • Modularity
  • Interoperability
  • Evolutionarity
  • Portability
  • Maintenance
  • Documentation
  • Users support
  • User extension

7
The various products
  • ROOT (Statistical + Event Display)
  • JAS (Statistical + Event Display)
  • LHC++ (Statistical)
  • OpenScientist (Statistical + Event Display)
  • WIRED (Event Display)
  • HippoDraw (Statistical)
  • Colt (Statistical)
  • No purely commercial products!

8
What is ROOT
  • Ambitious replacement for PAW by its main author,
    R. Brun, and his group, written in C++
  • Covers all aspects of data analysis
  • Data storage (ROOT I/O)
  • Statistical analysis
  • C++ interpreter CINT
  • Event Display
  • Initially built as an all-in-one package, evolving
    towards more modularity
  • "Open source" approach
  • Large and growing user base

9
ROOT user base
  • ALICE
  • LHCb test beam (Outer tracking)
  • CDF,D0
  • BABAR (see later)
  • JLC
  • STAR and many other nuclear physics projects

10
ROOT class structure
11
Some ROOT examples from various experiments
12
An Online ROOT application from ALICE
13
Fermilab review committee evaluation of ROOT ('98)
  • 1) ROOT is a complete, full-featured package that
    meets the functional requirements
  • 2) There are some trivial unacceptable features
    (use of CMZ, lack of build scripts) which should
    not be a stumbling block, but will require a
    formal collaboration with the ROOT team
  • 3) There is a large, world-wide user base, but so
    far limited use for serious HEP analysis
  • 4) ROOT can cope with the CDF and D0 data models
  • 5) ROOT has an effective internal data format
    well matched to HEP needs
  • 6) The present version of CINT is a potential
    serious drawback (buggy, undocumented, limited
    C++ features, hard to support, poorly
    engineered). This will require a decision to
    enhance/upgrade/replace, which would require
    significant work.
  • 7) The user interface is not very friendly
  • 8) The interconnectedness of the various modules
    is substantial. External modules must conform to
    (ROOT-specific, non-standard) ROOT protocols to be
    functional.
  • 9) The package is not highly engineered (i.e., it
    has grown organically rather than been designed).
    The current implementation reflects this
    evolution; for example, it has not kept up with
    the C++ language standard (has its own container
    classes, etc.). Even beyond CINT, the product has
    many bugs.
  • 10) It will require some relatively
    straightforward customization to support casual
    users
  • 11) There is an active and responsive support
    team with good archives and an active mailing
    list

14
Fermilab review committee recommendations
  • RECOMMENDATIONS FOR RUN II
  • We recommend that ROOT be adopted as the standard
    physics analysis package for Run II, contingent
    on a collaborative agreement with the ROOT team.
    It should be recognized that this recommendation
    depends critically on timing and on sharing
    development with outside collaborators, and the
    steering committee should assess the validity of
    these assumptions in evaluating the
    recommendation. In particular, if the requirement
    for an immediate choice is being driven by
    on-line needs (which may not require the full
    functionality of an off-line analysis package
    immediately), it needs to be determined if the
    components of NIRVANA that already exist are
    adequate for the immediate needs.
  • LONG-TERM RECOMMENDATIONS
  • It is highly likely that by the end of Run II (or
    by the time of the LHC) commercial
    components will be heavily used for analysis
    tasks. Commercial offerings should continue to be
    investigated and made available (perhaps on
    limited platforms). The Computing Division should
    also initiate formal collaboration with the LHC
    project so as to have some influence on the
    choices made and direction taken. These two
    initiatives, while lower priority than the
    immediate ROOT support and development needs,
    should position us to take full advantage of
    expected evolution of these products.

15
What is JAS
  • Analysis framework based on JAVA
  • Developed at SLAC by T. Johnson
  • See the presentation by M. Ronan after this talk
  • Aims at similar complete functionality as ROOT
  • Smaller user community (NLC, BABAR online)

16
Java Libraries and APIs
  • Standard Libraries and APIs
  • 2D and 3D graphics, GUI (Swing), imaging,
    printing
  • Database connectivity (JDBC), ODMG
  • Collections, IO (Serialization), Data Compression
  • Networking, Sockets, SSL, Corba, RMI
  • Java Beans (components), Help
  • Multimedia, Sound, Speech
  • Security, Code Signing, Cryptography
  • Math, Arbitrary Precision Math
  • Shared Data (Collaborative Applications)
  • Huge Community-Ware software archive
  • IBM alone has hundreds of Java resources on its
    Alphaworks site

17
Remote Data Analysis
[Diagram; labels as extracted: GUI, TCP/IP Network,
Data Analysis Engine, Padded Cell, Experiment
Extensions (Event Display), Experiment Interface,
C++ Code]
  • Data
  • Zebra
  • Jazelle
  • Paw
  • Root
  • Objectivity

18
Plot Display Package
  • 1-d/2-d Histogram/ScatterPlot Display
  • multiple axes, direct user interaction, overlays,
    fitting

19
JAS Availability
  • 1.0 (Beta) currently available
  • Windows (NT, 95, 98), Unix (Solaris, Linux)
  • Installed on Solaris at SLAC (/usr/local/bin/jas)
  • Limitations
  • Detailed documentation still under development
  • May still be some changes to user API
  • Download from http://www-sldnt.slac.stanford.edu/jas
  • 2.0 Pre-release by July 1
  • More plot types
  • More flexible control of histograms
  • Ability to easily compare multiple datasets
  • More n-tuple handling tools (c.f. HippoDraw)
  • Greatly improved printing

20
More Info
  • Java Analysis Studio
  • http://www-sldnt.slac.stanford.edu/jas
  • Please give us feedback
  • jas-feedback@sld-mail.slac.stanford.edu
  • Mailing List
  • http://www.slac.stanford.edu/cgi-bin/lwgate/JAS-L/
  • Also general mailing list for Java in HEP
  • http://www.slac.stanford.edu/cgi-bin/lwgate/HEP-JAVA/

21
Some comments on JAS from D. Ferrero Merlino
  • Pro
  • portability, remote execution, GUI
  • Cons
  • Interoperability with C++
  • Performance
  • Scripting
  • LCB recommendations
  • look for IRIS Explorer alternatives
  • Investigate JAVA solutions
  • A technical student joined the DAT section in
    July
  • try to integrate HTL and Tags in JAS
  • Evaluate C++ interoperability

22
The Colt Distribution - Open Source Libraries
for High Performance Scientific and Technical
Computing in Java
Wolfgang Hoschek CERN IT/PDP
23
The Colt Distribution - Open Source Libraries
for High Performance Scientific and Technical
Computing in Java
Wolfgang Hoschek CERN IT/PDP
24
The Colt Distribution - Open Source Libraries
for High Performance Scientific and Technical
Computing in Java
Wolfgang Hoschek CERN IT/PDP
25
Colt
  • Efficient high-level data structures and
    algorithms for
  • Off-line Data Analysis
  • Histogramming
  • Monte Carlo Simulation
  • NTuple like manipulations
  • Approach
  • summon some of the best concepts, designs and
    implementations thought up over time by the
    community
  • port or improve them
  • introduce new approaches where need arises
  • Results so far
  • In overlapping areas, competitive with or superior to
    toolkits such as STL, Root, HTL, CLHEP, TNT, GSL,
    C-RAND / WIN-RAND (all C/C++), as well as IBM
    Array, JDK 1.2 Collections framework, JGL (all
    Java),
  • in terms of performance (!), functionality and
    (re)usability

26
Colt Conclusions
  • Technology Tracking
  • Java may soon be a major player in performance
    sensitive scientific and technical computing
  • look at LHC time-scale and be prepared for that
  • Colt distribution
  • Users need libraries to get their job done
  • Java lacks foundation toolkits broadly available
    and conveniently accessible in C/C++ and Fortran
  • Build an infrastructure for scalable scientific
    and technical computing in Java
  • Don't reinvent the wheel: share resources in
    Open Source efforts
  • Document, package and distribute a loosely coupled
    set of libraries under one single uniform
    umbrella
  • Visit http://nicewww.cern.ch/hoschek/colt/index.htm
  • and get your hands dirty...

27
What is LHC++
  • The OO replacement of CERNLIB
  • Collaborative approach between CERN/IT division
    and the LHC experiments
  • Initial trend favored commercial products
    (Objectivity, Iris Explorer)
  • Iris Explorer has been rejected by the
    collaborations
  • (No documents available!)
  • Present focus: short-term effort to provide a
    new solution

28
View of Interactivity in LHC++
  • Explorer based analysis tool was not accepted by
    users
  • Request to create new tool
  • PAW-like functionality (at least)
  • PAW-like interface (command-line)
  • early prototype required
  • with restricted functionality

29
Requirements for analysis tool
  • Based on Abstract Interfaces to packages
  • Histogramming
  • Fitting
  • Plotting
  • Analysis
  • UserInterface
  • Implementation flexible
  • possible to replace packages with minimal impact
    on other parts

30
Components
  • Services
  • HistogramFactory
  • HistogramManager
  • Fitter
  • Plotter
  • Analyzer (dyn. loaded C++)
  • uses HistogramManager to register created histos
  • access to all exp. Data/tags/...
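
To make the idea of abstract interfaces and replaceable components concrete, here is a small sketch in Python. It borrows the component names from the slides (HistogramManager, Plotter, Analyzer) but everything else, including `Histogram1D`, `TextPlotter` and all method names, is an assumption made for illustration and not the LHC++ design.

```python
from abc import ABC, abstractmethod


class Histogram1D:
    """Minimal fixed-binning histogram used only for this illustration."""
    def __init__(self, name, nbins, low, high):
        self.name, self.nbins, self.low, self.high = name, nbins, low, high
        self.counts = [0] * nbins

    def fill(self, x):
        if self.low <= x < self.high:
            width = (self.high - self.low) / self.nbins
            self.counts[min(int((x - self.low) / width), self.nbins - 1)] += 1


class Plotter(ABC):
    """Abstract interface: client code depends on this, not on a concrete package."""
    @abstractmethod
    def plot(self, hist): ...


class TextPlotter(Plotter):
    """One replaceable implementation: renders each bin as a row of '*'."""
    def plot(self, hist):
        return "\n".join("*" * c for c in hist.counts)


class HistogramManager:
    """Registry where an analyzer records the histograms it creates."""
    def __init__(self):
        self._histos = {}

    def register(self, hist):
        self._histos[hist.name] = hist

    def get(self, name):
        return self._histos[name]


def analyzer(manager, data):
    """Stand-in for the dynamically loaded analyzer: books, fills, registers."""
    h = Histogram1D("x", nbins=3, low=0.0, high=3.0)
    for x in data:
        h.fill(x)
    manager.register(h)


manager = HistogramManager()
analyzer(manager, [0.5, 1.5, 1.7, 2.9])
plotter: Plotter = TextPlotter()   # swapping the implementation touches only this line
output = plotter.plot(manager.get("x"))
print(output)
```

Because the analyzer and the caller only see `Plotter` and `HistogramManager`, a different plotting package could be substituted with minimal impact on the rest, which is the point of the requirement.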

31
Components (II)
  • Basic Classes
  • Histograms (1D, 2D for start)
  • Points (1D, 2D)
  • coordinates (with (asymmetric) errors)
  • value (with (asymmetric) errors)
  • VectorOfPoints
  • added value to vector<Point>
  • scaling, shifting, ...
  • Interface (IF) from histograms to fitting/plotting

32
User interface
  • re-use scripting language(s)
  • use SWIG for the interface to Python (Perl, Tcl,
    Java (alpha), ...)
  • class model allows for old-fashioned and
    new-style analysis models
  • hist.plot()
  • vector.fromHistogram(hist); plotter.plot(vector)
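
The Point and VectorOfPoints classes described on the previous slide might look roughly like the sketch below. This is a guess at the shape of such classes, not the prototype's code: the field names, the signature of `fromHistogram` (which here takes bin counts and a range rather than a histogram object) and the use of sqrt(n) errors are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    """A 1D point: coordinate and value, each with (possibly asymmetric) errors."""
    x: float
    value: float
    x_err: tuple = (0.0, 0.0)       # (minus, plus)
    value_err: tuple = (0.0, 0.0)   # (minus, plus)


class VectorOfPoints(list):
    """A list of Point with a few extras (scaling, shifting)."""

    def fromHistogram(self, counts, low, high):
        # Bin centre as coordinate, half bin width as x error,
        # sqrt(n) as value error (a Poisson assumption made for this sketch).
        width = (high - low) / len(counts)
        self.clear()
        for i, n in enumerate(counts):
            err = math.sqrt(n)
            self.append(Point(low + (i + 0.5) * width, float(n),
                              (width / 2, width / 2), (err, err)))
        return self

    def scale(self, factor):
        # Intended for positive factors; errors scale with the values.
        for p in self:
            p.value *= factor
            p.value_err = (p.value_err[0] * factor, p.value_err[1] * factor)
        return self

    def shift(self, dx):
        for p in self:
            p.x += dx
        return self


vector = VectorOfPoints().fromHistogram([4, 9], low=0.0, high=2.0)
vector.scale(0.5).shift(1.0)
for p in vector:
    print(p)
```

In the "new-style" model the vector of points, not the histogram, is what gets handed to the plotter or fitter, so the same plotting code can serve data that never came from a histogram.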

33
Status
  • initial design (for prototype) done
  • implementing first prototype
  • Histograms
  • Plotter
  • Fitter
  • VectorOfPoints
  • HistogramManager
  • Analyzer
  • work in progress ...
  • more news soon ...

34
What is OpenScientist
  • A toolkit developed by G. Barrand (LAL Orsay)
  • Very strong focus on interoperability of various
    packages and collaborative development
  • Integrated into the HEPVis collab.
  • Limited user base

35
The key to openness
[Diagram; labels as extracted: Histo; THisto : TObject,
Rio, .root file; d_Histo : ooObj, Obj, .DB file;
SbPlottedHistogram, SoPlotter]
Use the adapter pattern
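
A minimal sketch of the adapter pattern in this setting, written in Python for brevity. The two histogram classes are stand-ins with invented methods, not the real ROOT or Objectivity classes, and `PlottedHistogram` only plays the role that SbPlottedHistogram has in the diagram.

```python
class RootLikeHisto:
    """Stand-in for a file-resident histogram with 1-based bin access."""
    def __init__(self, contents):
        self._c = list(contents)

    def GetNbins(self):
        return len(self._c)

    def GetBinContent(self, i):
        return self._c[i - 1]


class DbLikeHisto:
    """Stand-in for a database-resident histogram exposing a plain attribute."""
    def __init__(self, contents):
        self.bins = list(contents)


class PlottedHistogram:
    """Target interface: the only thing the plotter understands."""
    def heights(self):
        raise NotImplementedError


class RootAdapter(PlottedHistogram):
    def __init__(self, h):
        self._h = h

    def heights(self):
        return [self._h.GetBinContent(i) for i in range(1, self._h.GetNbins() + 1)]


class DbAdapter(PlottedHistogram):
    def __init__(self, h):
        self._h = h

    def heights(self):
        return list(self._h.bins)


def plot(adapted):
    """The plotter sees only PlottedHistogram, never the storage class."""
    return " ".join(str(v) for v in adapted.heights())


a = plot(RootAdapter(RootLikeHisto([1, 3, 2])))
b = plot(DbAdapter(DbLikeHisto([1, 3, 2])))
print(a, "|", b)
```

Supporting a new storage back end then means writing one more adapter, while the plotter stays untouched.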
36
The NxM issue
  • A nice idea: automatic production of adapters.
  • Example: SWIG

[Diagram; labels as extracted: Histo, SWIG, "?";
tcl_Histo (Tcl, prompt "Tcl> histo"); python_Histo
(Python, prompt "> histo"); jni_Histo (JAVA, "histo")]
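
The idea of producing adapters automatically can be imitated in a few lines of Python using introspection. This is only an analogy: SWIG itself works from interface files and emits compiled wrapper code, and the `make_adapter` helper below is invented for this sketch.

```python
class Histo:
    """Toy class to be exposed to several scripting languages."""
    def __init__(self):
        self.entries = 0

    def fill(self):
        self.entries += 1

    def count(self):
        return self.entries


def make_adapter(language, cls):
    """Build a '<language>_<Class>' wrapper that forwards every public method."""
    def forward(name):
        def method(self, *args):
            return getattr(self._target, name)(*args)
        return method

    def init(self):
        self._target = cls()

    members = {"__init__": init}
    for name, attr in vars(cls).items():
        if callable(attr) and not name.startswith("_"):
            members[name] = forward(name)
    return type(f"{language}_{cls.__name__}", (object,), members)


# One generator, M target languages: no adapter is written by hand.
adapters = {lang: make_adapter(lang, Histo) for lang in ("tcl", "python", "jni")}
h = adapters["tcl"]()
h.fill()
h.fill()
print(sorted(a.__name__ for a in adapters.values()), h.count())
```

With N classes and M languages, a generator of this kind replaces N x M hand-written adapters with one tool run N x M times.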
37
Large Array Set
  • Huge tuples break the UAF model for storage!
  • Introduce the notion of Large Array.

[Diagram; labels as extracted: Array, VLargeArray;
Storage, StorageArray, .s file; Storage2,
Storage2Array, .s2 file; TBranch, Rio, .root file;
ooArray, Obj, .DB file]
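
One way to read the Large Array notion is as an array-like object that fetches data from its storage in pieces instead of holding everything in memory. The sketch below follows that reading; it is an assumption about the intent, and the `load_chunk` callback and `fake_storage` function are hypothetical stand-ins for a real back end (a file, a ROOT branch, a database).

```python
class VLargeArray:
    """Array-like view that loads fixed-size chunks on demand.

    Only one chunk is held in memory at a time.
    """
    def __init__(self, length, chunk_size, load_chunk):
        self._length = length
        self._chunk_size = chunk_size
        self._load_chunk = load_chunk
        self._cached_index = None
        self._cached = None
        self.loads = 0  # number of back-end reads, kept for illustration

    def __len__(self):
        return self._length

    def __getitem__(self, i):
        if not 0 <= i < self._length:
            raise IndexError(i)
        index, offset = divmod(i, self._chunk_size)
        if index != self._cached_index:
            self._cached = self._load_chunk(index)
            self._cached_index = index
            self.loads += 1
        return self._cached[offset]


def fake_storage(index, chunk_size=1000, length=2500):
    """Pretend back end: chunk `index` holds the squares of its element numbers."""
    start = index * chunk_size
    return [k * k for k in range(start, min(start + chunk_size, length))]


big = VLargeArray(length=2500, chunk_size=1000, load_chunk=fake_storage)
total = sum(big[i] for i in range(len(big)))
print(total, big.loads)
```

Client code indexes `big` like an ordinary array, while the storage-specific part is confined to the callback, mirroring the split between the array interface and the per-storage classes in the diagram.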
38
OpenScientist Status
  • Rio, Riot: the file I/O system of ROOT put in a
    standalone package (free software).
  • Objectivity: a commercial object database.
  • Mesa: a free implementation of OpenGL.
  • SoFree: a free implementation of Open Inventor.
  • SGI or TGS Inventor: commercial implementations
    of Inventor.
  • HEPVis: a free collaborative set of classes over
    Open Inventor.
  • Tcl: a scripting language.
  • KUIP: the CERN/PAW command language put in a
    standalone package.
  • Lab: the top "hub" package that ties
    subpackages together to present a coherent
    environment to work with
  • HCL: a home-made histogram package
  • Midnight: the rewriting of Minuit in C++ by
    R. Brun, put in a standalone package.
  • It runs on NT and UNIX. It now works together with
    Geant4 (display and plotting).

39
An OpenScientist session
40
The various approaches from the experiments
  • ALICE: ROOT (AliROOT)
  • CMS/ATLAS/LHCb: prospect/evaluate
  • BABAR: no official tool, i.e. PAW (JAS online,
    ROOT)
  • CDF/D0: ROOT for Run II

41
Some words about AliROOT
  • The ROOT framework will provide to ALICE
  • Data Storage
  • On-line monitoring
  • Statistical analysis
  • Event Display

42
Alice Framework
43
(No Transcript)
44
CMS/ATLAS/LHCb approach
  • Define their data model and framework
    independently (e.g. GAUDI/LHCb, CARF/CMS)
  • Objectivity for persistency
  • Close collaboration with the LHC++ effort
  • Evaluate as many products as reasonable using test
    beam stands
  • (Produce documents!)
  • Invest on Event Displays (ATLAS, CMS)

45
LHCb strategy

46
LHCb strategy (2)
  • Common problems: HEP-Analysis
  • Foundation libraries (e.g. NAG, CLHEP)
  • Toolkits (e.g. HTL)
  • LHCb-specific analysis tools, some will make use
    of HEP-wide toolkits
  • Mathematical libraries
  • Histogramming
  • Fitting and Minimization
  • Visualization
  • Data Access
  • Components exist in different stages, but what
    about their interfaces?
  • LHC++ is planning to create interfaces on
    existing packages

47
Atlas Web Page
48
CMS Software Task Breakdown
49
Tracker TestBeam Online Monitoring
50
The trends at HEPVis99
  • Collaborative environment
  • Try to define common interfaces
  • The Open source approach
  • How to get out of "one man, one tool"?
  • Distributed environment
  • IDL/CORBA/JAVA
  • No ROOT participation

51
The near future
  • LHC++: basic histos in a few weeks
  • HEPVis collaboration
  • How to ease convergence?
  • Academic Software Organization
  • http://www.lal.in2p3.fr/HEPVis99/ASO/ASO.html
  • This organisation was founded in 1999 by an
    international group of computing scientists,
    engineers and physicists to help the
    development of software tools for academic
    scientific research in an international and
    collaborative way.
  • A first target of this group is to extract a
    web-based working organisation model aiming, in a
    first step, at the production of interactive data
    analysis tools for high energy and nuclear physics
    experiments.
  • We hope that this model will be sufficiently
    general and efficient to apply to other domains
52
The event displays
  • Goals
  • Code debugging
  • Event debugging
  • Quality control
  • Huge underlying technology potential
  • User interface toolkit
  • Much closer integration in the "framework"

53
Event Displays: Approaches and Contributions
  • ALICE: ROOT
  • ATLAS: WIRED (see J. Hrivnac talk)
  • CMS: Qt, Iguana, HEPVis
  • BABAR: WIRED with CORBA

54
Event Display requirements
  • A/Code debugging
  • Needed very early in the development
  • Integration with simulated objects
  • Compatibility with GEANT4!
  • B/Online display
  • batch mode
  • Access to RAW objects
  • Speed
  • C/offline analysis
  • Integration with reco framework (interactivity)
  • Flexibility
  • Public relations

55
ALICE Geant3 geometry display with ROOT
56
CMS Interactive Graphical User Analysis (IGUANA)
  • Interactive Detector and Event Visualisation
    (CMSCAN)
  • Physics Analysis Tools
  • (Graphical) User Interfaces
  • Tasks include
  • Assessment of HEP-wide and commercial tools
  • Development of missing and CMS-specific
    components (e.g. Detector and Event Visualisation
    systems)
  • Design and implementation of (Graphical) User
    Interfaces for CMS software systems (ORCA, OSCAR,
    test beam, PRS, ...)
  • Working closely with and contributing to HEP-wide
    projects (e.g. LHC++, HEPVis, GEANT4, etc.)
  • Deployment, distribution, and support in the CMS
    environment

57
A General Idea of a User Application
58
CMS Detector and Event Visualization in IGUANA
  • Generic software developed (collaborate with
    HEPVis CDF, D0, L3,...)
  • Interactive Graphical User Interface and graphics
    manager
  • Deployed with ORCA (detector elements and
    reconstructed objects)
  • Extend to test-beams and OSCAR (GEANT4 for CMS)
    by end of 1999

59
Event Displays: The new trends
  • HEPVis library
  • OpenInventor, SoFree
  • WIRED
  • CORBA

60
WIRED Client-Server/File Architecture
[Diagram; labels as extracted: WIRED Application,
WIRED Server, Geometry and Events, WIRED Code,
WIRED Gateway, WWW Browser, WIRED Applet, WWW Server]
61
GUI (inside Netscape Browser)
62
WIRED connected to services via bus
[Diagram; labels as extracted: External Bus, Event
Viewer, Event Data Server, Geometry Data Server,
State Manager, Reconstruction Server]
External Bus
  • to access Data
  • to access other Services
  • to enable Collaboration

63
The BABAR experience
  • Main characteristics
  • Data and constants stored in Objectivity
  • Very large statistics for a start-up (do not plan
    against you!) (best achieved: 45 pb^-1/day,
    1.05 x 10^33)
  • 1 fb^-1 in 4 months (as many B-B pairs as LEP in 6
    years)
  • Mostly uncalibrated detector at run start
  • Too early to draw definitive conclusions from
    observed performances
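
The luminosity figures above can be related by simple arithmetic. The sketch below assumes that the "1.05 1033" figure is a peak instantaneous luminosity in cm^-2 s^-1 (the transcript gives no units) and that "4 months" means 120 days; both are assumptions made for this example.

```python
SECONDS_PER_DAY = 86_400
CM2_PER_PB = 1e-36           # 1 picobarn = 1e-36 cm^2

peak_luminosity = 1.05e33    # cm^-2 s^-1 (assumed units; value from the slide)
best_day = 45.0              # pb^-1 per day (from the slide)

# Integrated luminosity if the peak were held for a full 24 hours, in pb^-1.
ideal_day = peak_luminosity * SECONDS_PER_DAY * CM2_PER_PB

# Fraction of that ideal actually delivered on the best day.
best_day_fraction = best_day / ideal_day

# Average needed to reach 1 fb^-1 (1000 pb^-1) in 4 months, taken as 120 days.
average_day = 1000.0 / 120

print(f"ideal: {ideal_day:.1f} pb^-1/day, best day: {best_day_fraction:.0%} of ideal")
print(f"average for 1 fb^-1 in 120 days: {average_day:.1f} pb^-1/day")
```

Under these assumptions the best day corresponds to roughly half of what the peak luminosity would deliver if sustained around the clock, and the four-month average is several times lower than the best day.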

64
The BABAR Tools
  • Prompt reconstruction immediately following data
    taking
  • REC, AOD, TAG data stored in database
  • (AOD = microDST)
  • AOD also available as PAW ntuples as a temporary
    initial measure
  • Event display incorporated in the framework

65
Some Confirmations
  • Possible to do some zero-order physics at AOD
    level (also meaning: not possible to do
    first-order physics at this level!)
  • Calibrations need REC
  • Detector performance / detector understanding
    needs AOD + partial REC
  • Event display essential

66
BABAR initial constraints
  • No export
  • just working now, 1 M AOD evts at Lyon and RAL
  • CPU limitations and slow turnaround
  • Restricted access to data
  • Rolling calibration scheme not yet implemented

67
The initial problems
  • Slow access to data => insufficient calibration up
    to now and not yet optimal detector performance
  • Providing very easy standalone access to REC data
    would speed up the process
  • 15-30% of total statistics available to the average
    user: not too good, not too bad!
  • Review Committee in August 1999

MC width 5-7 MeV/c
68
BABAR software committee recommendations
  • Provide users with another, fast way to access
    data, based on ROOT I/O files
  • batch access to ROOT/IO files in a first step
  • BABAR code interactive in ROOT (/) at the end of
    the year
  • Send data to regional centers to reduce the
    burden at SLAC
  • Put in place a Risk management plan to assess
    Objectivity progress towards design performances
  • Resources management at SLAC
  • Duplicate the Opr farm to allow development in
    parallel with production

69
Conclusions
  • A lot of technology exists
  • Statistical tools
  • PAW very successful. Need collaboration towards
    "PAW_OO". Still some more work on requirements
  • ROOT/LHC++/OpenScientist/JAS: present
    front-runners
  • JAVA interface with C++ (CORBA)
  • Event displays much more connected to the
    experiments. The trend is towards distributed
    computing (JAS/WIRED) and/or more integration (ROOT)
  • Do not forget human factors!

70
The key issues
  • No main underlying technical issues
  • Integration
  • Data model/Statistical tool
  • Statistical tool/Event Display
  • Interoperability
  • 1 experiment and several outside packages
  • "Build your own" package
  • Collaborative effort
  • Time scale

71
CMS Software Milestones
Sept 1999
  • The OO Proof of Concept phase has been completed
  • The Functional Prototype phase is well
    underway
  • CMS must provide functional software by end 1999
    / beginning 2000