A Comparison of Data Analysis Packages

CHEP2000, 9-Feb 2000

1
A Comparison of Data Analysis Packages
  • Irwin Gaines, Jeff Kallenbach
  • Fermilab

2
Outline
  • Introduction: a little history
  • Build vs. Buy: general considerations
  • User Requirements
  • Basic Features
  • Advanced features
  • Conclusions

3
Introduction
  • Previous-generation HEP experiments used a
    ubiquitous homemade product: PAW
  • Why? Commercial systems did not offer either
    functionality or, more important, performance
  • Use of a universal product allows:
  • data sharing (ntuple files)
  • procedure and environment sharing (kumac files)

4
Build vs. Buy
  • Old days (70s-80s): in-house development effort
    is free, any software purchase is expensive
  • More recently (90s): attractive licensing terms;
    development costs should be amortized over as
    large a user base as possible. Support?
  • Now: consider full product lifetime costs,
    including development, licensing, support. Does
    the product need to be customized or enhanced to
    meet HEP needs?

build
buy
5
Project Scope
  • Selecting events based on programmed selection
    criteria
  • Preparing various statistical distributions of
    various mathematical functions of data in the
    selected events
  • Linking in high level language programs to
    process event data prior to plotting
  • Modifying selection criteria and plotted
    functions interactively
  • Fitting the distributions
  • Comparing and performing calculations on
    different distributions
  • Preserving selection criteria and functions for
    later use or to pass to others
  • Saving samples of events in a variety of
    specialized formats for later analysis
  • Accessing these specially formatted event samples
    to make plots, fits, statistical outputs, etc.

6
User Requirements
  • Web reference: http://www.fnal.gov/projects/runii/pasrec/
  • Data Access
  • Data Analysis
  • Data Presentation
  • Usability
  • Support and Maintenance

7
User Requirements: Data Access
  • Access rates (online)
  • Access rates (offline)
  • Serial vs. random access
  • Granularity of access
  • Foreign I/O Formats
  • Specialized optimized output formats

8
User Requirements: Data Analysis
  • Scripting language
  • User control
  • Data selection
  • Input/Output
  • Numerical and mathematical functionality
  • Offline compatibility
  • Prototyping

9
User Requirements: Data Presentation
  • Interactive visualization
  • Presentation quality graphical output
  • Formal publication graphical output

10
User Requirements: Usability
  • Batch vs. interactive
  • Sharing data structures
  • Shared access by several clients
  • Parallel processing (using distinct data streams)
  • Debugging and profiling
  • Modularity (user code)
  • Modularity (system code)
  • Access to source code
  • Robustness
  • Web based documentation
  • Use of standards
  • Portability
  • Scalability
  • Performance
  • User Friendliness

11
User Requirements: Support
  • Maturity
  • customer base
  • product lifetime
  • product survivability
  • product support
  • licensing

12
User Requirements: Maintenance
  • who provides maintenance
  • what does it cost
  • maintenance infrastructure
  • maturity and completeness
  • modularity
  • portability
  • standards
  • reliability and security
  • application specific issues

13
Main Contenders
  • Homemade package: ROOT
  • Commercial package: IDL (other commercial
    packages offer similar features; IDL appeared to
    be the most aggressive in licensing terms)

14
Basic Features
  • plotting
  • fitting
  • event selection
  • command languages
  • event I/O

15
Gee Whiz plots
16
Plots, Fits, Event selection
  • ROOT: from browser, from tree viewer, from
    command line
  • All plots are active, can be manipulated, saved
    for later use, printed in a variety of formats
  • IDL: command line examples on following slides
  • plots can be either static or active, displayed
    or printed

17
Displaying a Histogram
  • Open a ROOT file
  • Browse the file
  • Display a histogram
  • The Canvas

18
Fitting, Coloring, and Zooming
  • Adding a gaussian fit
  • Coloring the histogram
  • Zooming

19
The Tree Viewer
  • Tree Viewer buttons
  • Variables
  • Slider
  • XYZ
  • Draw, Scan, Break
  • Ilist, Olist
  • Gopt
  • Weight

20
Scripting language
  • ROOT
  • CINT: C++ interpreter (almost full C++ syntax)
  • commands are methods of ROOT classes
  • Full access to compiled code (in any language)
  • IDL
  • natural control language (see examples)
  • commands are part of scripting syntax
  • full access to compiled code (in any language)

21
IDL command language
  • concatenate several files of ntuples
  • read in a variable
  • event selection (cut on several variables)
  • plot histogram

chain=["d3_51.nhis","d3_68.nhis","d3_99.nhis","d3_19.nhis","d3_04.nhis"]
mass=htGetVar(chain,"Rmass")
cut4=where(lsig gt 5 and iso1 lt .05 and clsec gt .05 and iso2 lt .03)
plot,histogram(mass(cut4),binsize=mybin)
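
For readers unfamiliar with IDL, the select-then-histogram pattern in the commands above can be sketched in plain Python. This is only an illustration under stated assumptions: the event values, the bin width and the helper names `select` and `histogram` are hypothetical and are not part of IDL, ht2IDL or the original slides. Only the variable names and cut thresholds are taken from the slide.

```python
def select(events, cut):
    """Return the indices of events passing the cut (the role of IDL's where())."""
    return [i for i, ev in enumerate(events) if cut(ev)]

def histogram(values, binsize, lo, nbins):
    """Count values into nbins fixed-width bins starting at lo."""
    counts = [0] * nbins
    for v in values:
        k = int((v - lo) // binsize)
        if 0 <= k < nbins:          # values outside the range are dropped
            counts[k] += 1
    return counts

# Hypothetical events: field names follow the slide, values are made up.
events = [
    {"Rmass": 1.86, "lsig": 7.0, "iso1": 0.01, "clsec": 0.20, "iso2": 0.01},
    {"Rmass": 1.87, "lsig": 3.0, "iso1": 0.01, "clsec": 0.20, "iso2": 0.01},  # fails lsig
    {"Rmass": 1.93, "lsig": 9.0, "iso1": 0.02, "clsec": 0.10, "iso2": 0.02},
    {"Rmass": 1.75, "lsig": 6.0, "iso1": 0.10, "clsec": 0.10, "iso2": 0.02},  # fails iso1
]

def cut4(ev):
    """Same thresholds as the cut on the slide."""
    return (ev["lsig"] > 5 and ev["iso1"] < 0.05
            and ev["clsec"] > 0.05 and ev["iso2"] < 0.03)

passed = select(events, cut4)                     # indices of selected events
mass = [events[i]["Rmass"] for i in passed]       # variable for selected events
counts = histogram(mass, binsize=0.05, lo=1.7, nbins=6)
print(passed, counts)
```

The index list returned by `select` plays the same role as `cut4` in the IDL commands: it is computed once and can then be reused to subscript any variable.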
22
IDL Command Language
  • Fit plot and draw fit
  • plot -> liveplot for interactive plots

dist=histogram(mass(cut4),binsize=mybin)
x=findgen(134)*mybin+1.7
dfit=gaussfit(x,dist,a)
plot,x,dist
oplot,x,dfit,color=20
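
The idea of overlaying a Gaussian on a histogram can be sketched without IDL. Note the swapped-in technique: the code below estimates the Gaussian parameters from the weighted moments of the histogram, which is not the least-squares fit that IDL's `gaussfit` performs, and it is only reasonable for a background-free peak. The bin centres and the noise-free synthetic histogram are assumptions made for illustration.

```python
import math

def gaussian(xi, amplitude, mean, sigma):
    """Value of a Gaussian with the given peak height, centre and width."""
    return amplitude * math.exp(-0.5 * ((xi - mean) / sigma) ** 2)

def gaussian_moments(x, counts):
    """Estimate (amplitude, mean, sigma) from the weighted moments of a histogram.

    Assumes equally spaced bin centres x and a peak with no background.
    """
    total = sum(counts)
    mean = sum(xi * c for xi, c in zip(x, counts)) / total
    var = sum(c * (xi - mean) ** 2 for xi, c in zip(x, counts)) / total
    sigma = math.sqrt(var)
    binwidth = x[1] - x[0]
    # Peak height chosen so the curve has the same area as the histogram.
    amplitude = total * binwidth / (sigma * math.sqrt(2.0 * math.pi))
    return amplitude, mean, sigma

# Hypothetical bin centres and a noise-free synthetic histogram.
x = [1.7 + 0.01 * i for i in range(41)]
dist = [gaussian(xi, 100.0, 1.9, 0.03) for xi in x]

a, m, s = gaussian_moments(x, dist)
dfit = [gaussian(xi, a, m, s) for xi in x]   # curve to overlay on the histogram
print(round(a, 2), round(m, 4), round(s, 4))
```

With real data containing background or noise, a proper least-squares or likelihood fit would be needed; the moment estimate is usually only good enough as a starting value.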
23
(No Transcript)
24
(No Transcript)
25
Reading ntuples with IDL
ht2IDL - An Interface between HEP Data Files and IDL

As part of our investigation of the Interactive Data Language (IDL)
for use in our environment, we have assembled a prototype of what we
call ht2IDL (for "hepTuple to IDL"). This is a small package of C code
and IDL procedure files which enables the user to access HEP data
stores, such as HBOOK files, from the IDL session. It uses the
HepTuple package from PAT.

How the package works

Like most modern tools, IDL provides the capability to interface with
external functions written by the user. This is accomplished by
writing some code, using a C-based interface, then compiling it and
linking it into a shared-object file. Then, by creating some simple
helper files for IDL, and starting IDL from the directory where all of
the new interface code lies, the user has access to all of the new
functionality provided by the written code and the IDL "External
Interface".

In our prototype, this was all accomplished on an SGI/IRIX system. To
achieve maximum compatibility with the RunII environment, it was
decided to use KCC. In principle there is no reason it should not work
with CC or g++.

Then, referring to the IDL External Developers' Guide, we wrote some
code which uses the HepTuple library to read HBOOK files, load the
data into data structures compatible with IDL, and then return them to
the IDL session. The prototype provides an interface to the HBOOK
files (using HepTuple), makefiles and some documentation on how to use
them, and sample IDL scripts (called "procedure" files) to invoke the
ht2IDL functions and display and manipulate the results.

http://patwww.fnal.gov/pas/idl/ht2idl.html
26
Support Features
  • Commercial products have excellent documentation,
    generally good support, but
  • you pay for it
  • hard to customize, usually don't get source
  • homemade products moving to free software support
    model (support by community)
  • can modify source to enhance or customize
  • relatively easy to use others' code
  • both require a local support organization

27
ROOT How Tos
28
Advanced Features
  • Optimized I/O and very large data samples
  • Using native user objects
  • Customized GUIs
  • Accessing over web

29
Optimized I/O
  • Two separate issues:
  • data in memory vs. data on disk (efficient disk
    access necessary for large data files)
  • can't improve on disk speed unless objects that
    are read together are next to each other on disk
    (column-wise n-tuple and generalizations)
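
The column-wise layout mentioned above can be illustrated with a minimal sketch. It is not the HBOOK or ROOT file format: the flat file layout, the variable names in `COLUMNS` and both helper functions are assumptions chosen to show why values that are read together should sit next to each other on disk.

```python
import os
import struct
import tempfile

COLUMNS = ["mass", "lsig", "iso1"]   # hypothetical variable names

def write_columnwise(path, events):
    """Store all values of one variable contiguously, one variable after another."""
    n = len(events)
    with open(path, "wb") as f:
        for name in COLUMNS:
            f.write(struct.pack(f"<{n}d", *(ev[name] for ev in events)))

def read_column(path, name, n):
    """Read one variable with a single seek and one contiguous read."""
    offset = COLUMNS.index(name) * n * 8      # 8 bytes per float64 value
    with open(path, "rb") as f:
        f.seek(offset)
        return list(struct.unpack(f"<{n}d", f.read(n * 8)))

# Made-up events for illustration.
events = [
    {"mass": 1.86, "lsig": 7.0, "iso1": 0.01},
    {"mass": 1.93, "lsig": 9.0, "iso1": 0.02},
    {"mass": 1.75, "lsig": 6.0, "iso1": 0.10},
]

path = os.path.join(tempfile.mkdtemp(), "events.cw")
write_columnwise(path, events)
lsig = read_column(path, "lsig", len(events))   # touches only the lsig bytes
print(lsig)
```

Plotting one variable then touches only that variable's bytes, whereas a row-wise layout would force a read of every full event record.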

30
ROOT I/O
  • Many years of struggle/experience to use disk
    based data
  • optimized data formats for efficient access:
    CWNT -> split trees
  • Formats designed with HEP type data access in mind

31
IDL I/O
  • Basically memory based
  • Associated I/O allows mapping an IDL array or
    structure variable onto a file
  • I/O occurs automatically when the associated
    variable is subscripted, accessing only the
    desired object
  • data set size limited by file size rather than
    memory size
  • direct access to each element in the file
    including convenient event selection by indexing
  • files can have multiple associated structures
    (full events, tracks, hits, etc)
  • performance still limited by record structure
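
The associated-I/O idea, where subscripting a variable triggers a read of just that record, can be sketched as follows. This is an analogy, not IDL's implementation: the class `AssociatedFile`, the record layout (one 32-bit integer plus one float64) and the sample data are all hypothetical.

```python
import os
import struct
import tempfile

class AssociatedFile:
    """Index a file of fixed-size records like an array; I/O happens on subscript."""

    def __init__(self, path, fmt):
        self._record = struct.Struct(fmt)
        self._file = open(path, "rb")
        self._count = os.path.getsize(path) // self._record.size

    def __len__(self):
        return self._count

    def __getitem__(self, i):
        if not 0 <= i < self._count:
            raise IndexError(i)
        self._file.seek(i * self._record.size)   # direct access to record i
        return self._record.unpack(self._file.read(self._record.size))

    def close(self):
        self._file.close()

# Hypothetical record layout: event number (int32) and mass (float64).
FMT = "<id"
records = [(0, 1.86), (1, 1.93), (2, 1.75)]

path = os.path.join(tempfile.mkdtemp(), "events.dat")
with open(path, "wb") as f:
    for rec in records:
        f.write(struct.pack(FMT, *rec))

assoc = AssociatedFile(path, FMT)
third = assoc[2]                                     # reads only the third record
selected = [assoc[i] for i in (0, 2)]                # event selection by indexing
n_records = len(assoc)
assoc.close()
print(third, selected, n_records)
```

As the last bullet above notes, each access still costs one read per record, so performance depends on how well the record structure matches the access pattern.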

32
Access to user objects
  • ROOT script language is C++; user classes can be
    used by the interpreter if their header files are
    run through rootcint to create a dictionary
  • IDL supports structures: collections of scalars,
    arrays and other structures. Needs an external
    structure definition file to allow use in
    commands; no automatic way to create these from
    class headers

33
IDL GUI Builder
Available in IDL 5.3, the IDL GUIBuilder enables
you to build intuitive GUIs with drag-and-drop
ease. A convenient control palette with icons
such as radio buttons, checkboxes, and horizontal
and vertical sliders lets you quickly construct
interfaces that users understand. Widget
properties are easily editable. Pre-made bitmaps
give you graphical cues for customizing buttons
relevant to their function. Also, widgets are
arranged in row and column geometry for on-screen
consistency. At the code level, built-in comments
help you understand what each widget and event
will accomplish.
34
What Is ION?
  • An easy method for users to leverage the graphics
    and analysis power of IDL in web based applets
    and applications
  • Allows users to share IDL applications with
    non-IDL users
  • Easy set-up, use and management

35
ION Overview
36
ION Applications
  • Web publishing is obvious, but what else?
  • Applications based on ION
  • Workgroups can develop and easily deploy data
    processing and visualization apps with ION
  • Thin clients download fast and can be updated
    easily
  • Applications can exist in any Java enabled
    machine and still access the power of IDL

37
Conclusions
  • Both satisfy user requirements
  • Commercial products offer all basic functionality
    and many attractive advanced features
  • Homemade products still better optimized for
    specific HEP use
  • Support models evolving (open source model)
  • Can we mix and match to get best of both worlds?