Interactive Data Analysis on the Grid with JAS and Globus
David Alexander, Brian Miller, John Exby
Tech-X Corporation (www.techxhome.com), Boulder, Colorado
Tony Johnson, Massimiliano Turri, Booker Bense
Stanford Linear Accelerator Center, Menlo Park, California
Supported by U.S. Department of Energy Small Business Innovation Research Grant DE-FG03-02ER83556 and Stanford Linear Accelerator Center
Project Overview
- Started with Java Analysis Studio (JAS)
- JAS already has a distributed analysis system based on RMI
- Set up test grids on Linux clusters
- Used Globus Toolkit 2.0
- Each node had GRAM and GridFTP servers and a Java Runtime Environment
- Wrote a JAS grid plug-in
- Used Java CoG Kit 0.9
- Demonstrated at SC2002
- Ran on both a remote cluster and an on-site cluster
Java Analysis Studio (JAS), jas.freehep.org
- Open source application
- Built for interactive data analysis, but flexible and modular
- Publication-quality plotting facilities
- User writes Java code to analyze data
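To make "user writes Java code" concrete, here is a minimal sketch of the kind of per-event routine a user might write. It deliberately does not use the real JAS or AIDA histogram API: `Histogram1D`, `EnergyAnalysis`, and the energy cut are hypothetical names made up for illustration, and the events are a plain array rather than a JAS data source.

```java
import java.util.Arrays;

final class EnergyAnalysis {
    /** Minimal fixed-width 1D histogram; a hypothetical stand-in, not the JAS/AIDA class. */
    static final class Histogram1D {
        final double min, max;
        final long[] bins;
        long underflow, overflow;

        Histogram1D(int nBins, double min, double max) {
            if (nBins <= 0 || !(max > min)) throw new IllegalArgumentException("bad binning");
            this.min = min;
            this.max = max;
            this.bins = new long[nBins];
        }

        void fill(double x) {
            if (x < min) { underflow++; return; }
            if (x >= max) { overflow++; return; }
            int i = (int) ((x - min) / (max - min) * bins.length);
            bins[Math.min(i, bins.length - 1)]++; // guard against rounding at the top edge
        }
    }

    static final double ENERGY_CUT = 1.0; // hypothetical selection cut

    /** User analysis code: histogram the energy of every event that passes the cut. */
    static Histogram1D analyze(double[] energies) {
        Histogram1D h = new Histogram1D(10, 0.0, 10.0);
        for (double e : energies) {
            if (e > ENERGY_CUT) h.fill(e);
        }
        return h;
    }

    public static void main(String[] args) {
        Histogram1D h = analyze(new double[] {0.5, 1.5, 2.5, 2.7, 9.9, 12.0});
        // prints "[0, 1, 2, 0, 0, 0, 0, 0, 0, 1] overflow=1"
        System.out.println(Arrays.toString(h.bins) + " overflow=" + h.overflow);
    }
}
```

In a distributed run, each server would execute the same routine on its share of the events and the per-node histograms would then be combined for display.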
Java Analysis Studio (JAS), jas.freehep.org
- Abstracted data source interface
- Modules are written to work with a variety of file formats (PAW, HIPPO, AIDA, ROOT, ODBC, flat files, SIO, HEP)
- Distributed system available
- Versatile; well used in high energy physics
- Pure Java (portable; Java Web Start installation and upgrade)
- Flexible topology (stand-alone, client/server, cluster)
- Integration with BaBar, Geant4, Wired
TechXHome.com
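A sketch of what an abstracted data source interface buys: the analysis routine depends only on a format-neutral interface, and each file format gets its own module behind it. The interface `EventSource`, the `FlatFileSource` module, and its header-plus-columns file layout are hypothetical; the real JAS interfaces are not reproduced here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class DataSources {
    /** Hypothetical format-neutral view of a data set: one row of named columns at a time. */
    interface EventSource extends AutoCloseable {
        List<String> columns();
        /** Advances to the next row; returns false at end of data. */
        boolean next() throws IOException;
        double get(String column);
        @Override void close() throws IOException;
    }

    /** Flat-file module: first line holds column names, later lines whitespace-separated numbers. */
    static final class FlatFileSource implements EventSource {
        private final BufferedReader in;
        private final List<String> names = new ArrayList<>();
        private double[] row;

        FlatFileSource(Reader r) throws IOException {
            in = new BufferedReader(r);
            String header = in.readLine();
            if (header == null) throw new IOException("empty file");
            for (String n : header.trim().split("\\s+")) names.add(n);
        }

        public List<String> columns() { return names; }

        public boolean next() throws IOException {
            String line;
            do {                                  // skip blank lines
                line = in.readLine();
                if (line == null) return false;
            } while (line.isBlank());
            String[] f = line.trim().split("\\s+");
            if (f.length != names.size()) throw new IOException("bad row: " + line);
            row = new double[f.length];
            for (int i = 0; i < f.length; i++) row[i] = Double.parseDouble(f[i]);
            return true;
        }

        public double get(String column) {
            int i = names.indexOf(column);
            if (i < 0 || row == null) throw new IllegalArgumentException(column);
            return row[i];
        }

        public void close() throws IOException { in.close(); }
    }

    /** Analysis code sees only the interface, so any format module can feed it. */
    static double sum(EventSource src, String column) throws IOException {
        double total = 0;
        while (src.next()) total += src.get(column);
        return total;
    }

    public static void main(String[] args) throws IOException {
        try (EventSource src = new FlatFileSource(new StringReader("e px\n1.5 0.2\n2.5 -0.1\n"))) {
            System.out.println(sum(src, "e")); // prints 4.0
        }
    }
}
```

Supporting another format would mean adding one more `EventSource` implementation; `sum` and any other analysis code stay unchanged.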
Design Ideas and Added Features
- Goal: clustered deployment, launch, and federation
- Uses a special JAS Job
- Minimal prerequisites
- Bare grid: Globus, Java, nothing else
- Heterogeneous cluster
- Client, data, and codebase may be off-grid (or not)
- Clients don't need to be superusers
- Optional background deployment
- Single sign-on
About Resource Discovery
- Resource discovery
- Software needs the location of data files
- Software needs the location of Java-enabled hosts
- Pluggable LDIF source (MDS, or the URL of a text file)
- Community Authorization Service
- Fine-grained access control
- In a way, this is also resource discovery
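Resource discovery from an LDIF source can be sketched as parsing LDIF entries and selecting the hosts that advertise a Java runtime. This is an assumption-laden sketch: the attribute names `hostName` and `javaHome` are hypothetical (not the MDS schema), fetching the text from MDS or a URL is left out, and base64 (`::`) values are not decoded.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LdifDiscovery {
    /** Parses LDIF text into entries; each entry maps an attribute name to its values. */
    static List<Map<String, List<String>>> parse(Reader r) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader in = new BufferedReader(r);
        for (String line; (line = in.readLine()) != null; ) {
            if (line.startsWith(" ") && !lines.isEmpty()) {
                // LDIF folding: a leading space continues the previous line
                int last = lines.size() - 1;
                lines.set(last, lines.get(last) + line.substring(1));
            } else {
                lines.add(line);
            }
        }
        List<Map<String, List<String>>> entries = new ArrayList<>();
        Map<String, List<String>> current = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isBlank()) {                 // a blank line ends an entry
                if (!current.isEmpty()) entries.add(current);
                current = new LinkedHashMap<>();
            } else if (!line.startsWith("#")) {   // skip comment lines
                int colon = line.indexOf(':');
                if (colon <= 0) continue;         // this sketch ignores malformed lines
                String key = line.substring(0, colon).trim();
                String value = line.substring(colon + 1).trim();
                current.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
            }
        }
        if (!current.isEmpty()) entries.add(current);
        return entries;
    }

    /** Returns hosts whose entry advertises a Java runtime (hypothetical attribute names). */
    static List<String> javaHosts(List<Map<String, List<String>>> entries) {
        List<String> hosts = new ArrayList<>();
        for (Map<String, List<String>> e : entries) {
            if (e.containsKey("javaHome") && e.containsKey("hostName")) {
                hosts.add(e.get("hostName").get(0));
            }
        }
        return hosts;
    }

    public static void main(String[] args) throws IOException {
        String ldif = "dn: host=node1, o=grid\nhostName: node1.example.org\njavaHome: /usr/java\n\n"
                + "dn: host=node2, o=grid\nhostName: node2.example.org\n\n"
                + "# folded value below\ndn: host=node3, o=grid\nhostName: node3.exam\n ple.org\n"
                + "javaHome: /opt/java\n";
        // prints [node1.example.org, node3.example.org]
        System.out.println(javaHosts(parse(new StringReader(ldif))));
    }
}
```

Because the parser takes any `Reader`, the same code could consume an MDS query result or a text file fetched from a URL, which is one way to keep the source pluggable.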
Move Code to Data with GridFTP
- Location transparency
- User sees data sets
- Could also let the user choose
- Automatic deployment of JAS
- Multi-threaded task set
- Verification of code version; GridFTP the codebase to a node if it is new
- GridFTP or link data into the user sandbox
- Deploy control and catalog servers only on the cluster head node
- Worker nodes wait for the catalog server to start
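The "verify the code version, transfer the codebase if new" step, run as a multi-threaded task set, might look like the sketch below. The slides do not say how versions are compared; a SHA-256 digest of the codebase is an assumption here, and `remoteVersions` and `transfer` are hypothetical hooks standing in for querying a node and for a GridFTP transfer.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

final class CodebaseDeployer {
    static String sha256Hex(byte[] data) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /**
     * Checks all nodes in parallel and ships the codebase only where the installed
     * version differs. remoteVersions (must be thread-safe) and transfer are hypothetical
     * stand-ins for a node query and a GridFTP put. Returns nodes that got a new copy.
     */
    static List<String> deploy(List<String> nodes, byte[] codebase,
                               Map<String, String> remoteVersions,
                               BiConsumer<String, byte[]> transfer,
                               int threads) throws InterruptedException, ExecutionException {
        String localVersion = sha256Hex(codebase);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String node : nodes) {
                results.add(pool.submit(() -> {
                    if (localVersion.equals(remoteVersions.get(node))) return false; // up to date
                    transfer.accept(node, codebase);
                    remoteVersions.put(node, localVersion);
                    return true;
                }));
            }
            List<String> updated = new ArrayList<>();
            for (int i = 0; i < nodes.size(); i++) {
                if (results.get(i).get()) updated.add(nodes.get(i)); // waits for each task
            }
            return updated;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] jar = "pretend jar bytes".getBytes(StandardCharsets.UTF_8);
        Map<String, String> installed = new ConcurrentHashMap<>();
        installed.put("node1", sha256Hex(jar));  // already current
        installed.put("node2", "stale-version"); // older build
        List<String> shipped = deploy(List.of("node1", "node2", "node3"), jar, installed,
                (node, bytes) -> System.out.println("transfer " + bytes.length + " bytes to " + node), 4);
        System.out.println(shipped); // prints [node2, node3]
    }
}
```

Running the checks concurrently means one slow node does not hold up deployment to the others, while the final loop still reports results in node order.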
Launch Application with globusrun
- Automatic launch of Java servers
- Java Data Servers are run on specified JRE-enabled nodes
- A special Grid Job is then started (after exiting the wizard)
- Code loaded into the client or written in the editor is compiled and automatically distributed to the Java Data Servers
- Results (stdout, stderr, histograms) are sent back
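`globusrun` submits jobs described in the Globus Resource Specification Language (RSL). A minimal sketch of building a GT2-style RSL string that starts one Java Data Server on a node follows. The JRE path, jar, main class `DataServer`, port argument, and output file names are hypothetical, and the exact attributes the JAS plug-in used are not known from the slides.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RslBuilder {
    /** Quotes an RSL literal: wrap in double quotes and double any embedded quote. */
    static String quote(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    /** Builds a GT2-style RSL request that starts one Java server process in workDir. */
    static String javaServerRsl(String javaExe, String classpath, String mainClass,
                                List<String> serverArgs, String workDir, Map<String, String> env) {
        StringBuilder rsl = new StringBuilder("&");
        rsl.append("(executable=").append(quote(javaExe)).append(')');
        rsl.append("(arguments=").append(quote("-cp")).append(' ').append(quote(classpath))
           .append(' ').append(quote(mainClass));
        for (String a : serverArgs) rsl.append(' ').append(quote(a));
        rsl.append(')');
        rsl.append("(directory=").append(quote(workDir)).append(')');
        if (!env.isEmpty()) {
            // environment is a list of (NAME value) pairs
            rsl.append("(environment=");
            env.forEach((k, v) -> rsl.append('(').append(k).append(' ').append(quote(v)).append(')'));
            rsl.append(')');
        }
        rsl.append("(stdout=").append(quote(workDir + "/server.out")).append(')');
        rsl.append("(stderr=").append(quote(workDir + "/server.err")).append(')');
        return rsl.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<>(); // keeps a stable attribute order
        env.put("JAVA_HOME", "/usr/java");               // hypothetical JRE location
        System.out.println(javaServerRsl("/usr/java/bin/java", "/home/user/jas/jas.jar",
                "DataServer", List.of("9000"), "/home/user/jas", env));
    }
}
```

The resulting string is the kind of job description that `globusrun` sends to a node's GRAM gatekeeper; capturing stdout and stderr in files is one way the servers' output could later be returned to the client.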
A Few More Impressive Features
- User can stop the analysis, change the code, and restart
- Distributed debugging can catch individual node failures
- Histogram re-bin slider is surprisingly responsive
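One plausible reason a re-bin slider can feel instant, assuming the client keeps the finest-binned counts it received from the servers: a coarser view is just sums of adjacent bins, so moving the slider costs one pass over the bins and no pass over the data. This is an illustration with a hypothetical `Rebin` helper, not the JAS implementation.

```java
import java.util.Arrays;

final class Rebin {
    /**
     * Merges each group of `factor` adjacent bins into one. In this sketch a trailing
     * partial group becomes a narrower final bin, so the total count is preserved.
     */
    static long[] rebin(long[] fine, int factor) {
        if (factor < 1) throw new IllegalArgumentException("factor must be >= 1");
        long[] coarse = new long[(fine.length + factor - 1) / factor];
        for (int i = 0; i < fine.length; i++) coarse[i / factor] += fine[i];
        return coarse;
    }

    public static void main(String[] args) {
        long[] fine = {1, 2, 3, 4, 5, 6, 7};
        System.out.println(Arrays.toString(rebin(fine, 2))); // prints [3, 7, 11, 7]
        System.out.println(Arrays.toString(rebin(fine, 3))); // prints [6, 15, 7]
    }
}
```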
Headaches and Issues
- Version compatibility between Globus and the Java CoG Kit
- CoG properties configuration
- Client and server clocks disagree
- MS-Windows text line breaks
- Abandoned jobs
- Firewalls
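For the MS-Windows line-break issue, a common remedy is to normalize text to Unix line endings before shipping it to Linux nodes; for example, a shell script saved with CRLF endings has a first line ending in `\r`, so the interpreter path no longer matches. The slides do not say which files caused trouble; `LineEndings` is a hypothetical helper and assumes UTF-8 (or plain ASCII) text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class LineEndings {
    /** Converts Windows (CRLF) and lone-CR line breaks to Unix LF. */
    static String toUnix(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Rewrites a UTF-8 text file in place with Unix line breaks; returns true if it changed. */
    static boolean normalizeFile(Path file) throws IOException {
        String original = Files.readString(file);
        String fixed = toUnix(original);
        if (fixed.equals(original)) return false;
        Files.writeString(file, fixed);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path script = Files.createTempFile("start-server", ".sh");
        Files.writeString(script, "#!/bin/sh\r\necho started\r\n");
        System.out.println(normalizeFile(script));                   // prints true
        System.out.println(Files.readString(script).contains("\r")); // prints false
        Files.delete(script);
    }
}
```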
Future Ideas
- Upgrade to Globus Toolkit 3
- Pre-install code on the cluster head node or a portal machine and deploy from there
- Use more grid services (Condor, Replica)
- Implement interfaces or service descriptions from the PPDG CS-11 group
Further Information on JAS
- For the latest on JAS, see the 3 pm Category 9 paper, "JAS3: A general purpose data analysis framework for HENP and beyond"
- CONTACTS
- David Alexander, alexanda_at_txcorp.com
- Brian Miller, bmiller_at_txcorp.com
- Tony Johnson, tony_johnson_at_SLAC.stanford.edu
- Massimiliano Turri, turri_at_SLAC.stanford.edu
- Java Analysis Studio, http://jas.freehep.org