Ganga Status Update, Will Reece

Will Reece - Imperial College London

1
Ganga Status Update
Will Reece
2
Outline
  • User Statistics
  • User Experiences
  • New Features in 4.3.0
  • Upcoming Features
  • Reference Manual
  • Testing Tools
  • Summary

3
User Statistics
25 Users
  • 557 Unique Users Since Jan 1, 110 per Week
  • 113 LHCb Users, 25 Unique per Week

http://gangamon.cern.ch:8888/
4
User Experiences
  • Feedback from Active LHCb Users
  • Helps prioritize features
  • Tells us what Needs Improvement
  • and what is already good!
  • Mailing Lists Good Source
  • Will Look at Some Case Studies

5
Robert Lambert
  • Used Gauss to Generate 70m Events
  • Studying final state asymmetries → custom decay
  • Needed 10^-3 precision across 10 pT bins
  • Compared Custom Decay with DC06
  • Used Ganga and DIRAC → 4000 Jobs
  • 2 Years of CPU Time!
  • Very Happy with DIRAC Success rate
  • Ganga Front-end Really Easy!
  • Likes SplitByFiles (but Replica Issues)
  • Wants Merge of Subjobs

6
Eduardo Rodrigues
  • Toy MC Used for γ Sensitivity Studies
  • Bs→Dsπ, Bs→DsK channels
  • Needed large data set → Used Ganga and LCG
  • Uses ROOT and RooFit → Root App
  • Ran 3000 toy experiments
  • Each experiment takes 2-3 hours → 1 year CPU!
  • Had some problems with LCG → Planning to use DIRAC
  • Using PyROOT for e.g. Simplified Studies
  • Root App and LCG Backend with standard python
    modules
  • Has had good experience both with LSF and Grid

7
Mitesh Patel
  • Uses Ganga to Study Small Backgrounds
  • B → (D0/D0bar)(→ Kπ, KK, ππ) K (LHCB-2006-066)
  • Looking at suppressed (10^-7) decays to measure γ
  • Bd → Kμμ as New Physics Probe (LHCB-2007-038)
  • Uses full sample b→μ, b→μ and b→c→μ to ntuple
  • Likes Splitters but Would Like More Warnings
  • Has Submitted 1000s of Jobs
  • Benefited from Developer Support
  • More Examples Would be Nice

8
New in 4.3.0
  • GNU GPL License
  • Sun Grid Engine Support
  • Core Updates
  • Oracle backend for remote repository
  • Subjob access to job repository optimized
  • DIRAC Support for Root Application
  • PyROOT
  • Run python jobs using the ROOT libraries
  • Gaudi Updates: ROOT Map files
  • Many Bugfixes → Improved Stability!
  • Testing framework

http://ganga.web.cern.ch/ganga/release/4.3.0/
9
Ganga Goes GPL
  • 4.3.0 is First GPL Release
  • Aim is to protect project
  • Applies to Future Releases
  • Ganga Used Commercially
  • Clear license needed

http://www.gnu.org/licenses/gpl.html
10
SGE Backend Now Supported
  • Sun Grid Engine Support Added
  • Common batch system
  • Can Use Following Applications
  • Executable
  • Root
  • Any Gaudi

11
DIRAC Submission for ROOT
  • Submit Jobs Using ROOT to DIRAC
  • Uses new functionality in DIRAC v2r13
  • DIRAC Recommended for Remote ROOT Jobs
  • Improved reliability
  • Superior job debugging info
  • Excellent job monitoring

DIRAC is LHCb Standard for Distributed Analysis
12
PyROOT Support
  • ROOT Provides Python Bindings
  • Python is quick and easy to write → Productive!
  • Ganga Now Supports Use in Root App
  • Need Correct Python Version for ROOT
  • Determined Automatically
  • LHCb Configuration uses LCG versions
  • /afs/cern.ch/sw/lcg/external/
  • Can be controlled in .gangarc file

14
PyROOT Support
  • Root Documentation Updated
  • help(Root) in Ganga

15
Gaudi Updates: ROOT Map
  • ROOT Map used to Auto-load Libraries
  • Found via CMT
  • Now Preparing for 4.3.x
  • Expect new LHCb Functionality in 4.3.2

16
Upcoming Features
Features planned for 4.3.x or 4.4
  • Framework for Job Merging
  • Merge text and ROOT files
  • Job Slices
  • LFC Aware Splitter for Gaudi
  • Caching for Datasets
  • Summary Printing of Objects
  • Improved Credential Management

https://twiki.cern.ch/twiki/bin/view/ArdaGrid/GangaIndexGangaFourFour
17
Merging of Jobs and Subjobs
Ganga 4.3.x
  • Jobs may have Many Subjobs
  • Hand Merge?
  • Time Consuming and Error Prone → Automate
  • Merge Subjobs
  • Combines subjob output
  • Can Run on Master Job Completion
  • or from Command Line
  • Merging Text and ROOT Files Supported
  • What else is needed?
  • Can Merge Lists of Jobs

18
Automatic Merge
  • Attach Merge Object to Job
  • Merge run on completion

19
Command Line Merge
  • Create List of Jobs to Merge
  • Will recursively merge subjobs
  • Run Merge on Command Line
  • Support Job Slices in Ganga 4.4
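
The recursive step can be illustrated in plain Python. This is a minimal sketch and not Ganga's implementation: the Job class, its outputs and subjobs attributes, and the collect_outputs helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    """Hypothetical stand-in for a job: its own output files plus any subjobs."""
    id: int
    outputs: List[str] = field(default_factory=list)
    subjobs: List["Job"] = field(default_factory=list)


def collect_outputs(jobs: List[Job]) -> List[str]:
    """Walk a list of jobs depth-first and gather every output file name."""
    found: List[str] = []
    for job in jobs:
        found.extend(job.outputs)
        # Recurse so that subjobs (and their own subjobs) are included.
        found.extend(collect_outputs(job.subjobs))
    return found


# A master job with two subjobs, plus one job without subjobs.
master = Job(1, subjobs=[Job(10, ["sj0/stdout"]), Job(11, ["sj1/stdout"])])
single = Job(2, ["job2/stdout"])
files_to_merge = collect_outputs([master, single])
print(files_to_merge)
```

The flat list that comes back is what a merge step would then combine, so the caller never has to walk the subjobs by hand.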

20
Types of Merge
  • TextMerger: Concatenate Text
  • Unordered, but adds headers
  • RootMerger: Combines ROOT Files
  • Uses hadd → Adds histograms and trees
  • MultipleMerger: Chain Merge Objects
  • SmartMerger: Merge by Extension
  • Associations in .gangarc file
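
Two of these ideas, concatenating text with headers and choosing a merger by file extension, can be sketched with the standard library alone. This is an illustration under stated assumptions rather than Ganga's code: merge_text, smart_merge and the MERGERS table are hypothetical names, and the table stands in for the associations that would live in the .gangarc file.

```python
import os
import tempfile


def merge_text(paths, out_path):
    """Concatenate text files, writing a header line before each one."""
    with open(out_path, "w") as out:
        for path in paths:
            out.write("# ---- %s ----\n" % os.path.basename(path))
            with open(path) as src:
                out.write(src.read())


# Hypothetical extension-to-merger table (assumption, not Ganga's defaults).
MERGERS = {".txt": merge_text, ".log": merge_text}


def smart_merge(paths, out_path):
    """Pick a merge function from the extension of the output file."""
    ext = os.path.splitext(out_path)[1]
    try:
        merger = MERGERS[ext]
    except KeyError:
        raise ValueError("no merger associated with extension %r" % ext)
    merger(paths, out_path)


# Build two small input files in a scratch directory and merge them.
workdir = tempfile.mkdtemp()
inputs = []
for i, text in enumerate(["alpha\n", "beta\n"]):
    path = os.path.join(workdir, "part%d.txt" % i)
    with open(path, "w") as f:
        f.write(text)
    inputs.append(path)

merged_path = os.path.join(workdir, "merged.txt")
smart_merge(inputs, merged_path)
with open(merged_path) as f:
    merged = f.read()
print(merged)
```

An extension with no entry in the table raises an error instead of guessing, which is one reasonable choice when merging the wrong way would silently corrupt output.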

21
Job Slices
Ganga 4.4
  • Change Semantics of jobs Object
  • Support slices → jobs[-1], jobs[0:5]
  • Index by Job ID → use __call__ e.g. jobs(45)
  • Allow Job Operations on Slices
  • copy, fail, kill, peek, remove, resubmit, submit
  • Job Subjobs also a Job Slice
  • Can Create Job Slice with select
  • select(time='yesterday')
  • select(status='failed')
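
The difference between positional slicing, lookup by job ID through __call__, and filtering with select can be shown with a toy container. This is a sketch, not the Ganga jobs object: JobRegistry, its ids helper and the dictionary-based job records are assumptions made for the example.

```python
class JobRegistry:
    """Toy registry: positional access via [], lookup by job ID via ()."""

    def __init__(self, jobs):
        self._jobs = list(jobs)  # each job is a dict with 'id' and 'status'

    def __getitem__(self, index):
        # A slice yields another registry, so operations can chain.
        if isinstance(index, slice):
            return JobRegistry(self._jobs[index])
        return self._jobs[index]

    def __call__(self, job_id):
        # Calling the registry looks a job up by its ID, not its position.
        for job in self._jobs:
            if job["id"] == job_id:
                return job
        raise KeyError("no job with id %d" % job_id)

    def __len__(self):
        return len(self._jobs)

    def select(self, **criteria):
        """Return a new registry holding the jobs that match every keyword."""
        matches = [j for j in self._jobs
                   if all(j.get(k) == v for k, v in criteria.items())]
        return JobRegistry(matches)

    def ids(self):
        return [j["id"] for j in self._jobs]


jobs = JobRegistry([
    {"id": 40, "status": "completed"},
    {"id": 45, "status": "failed"},
    {"id": 46, "status": "failed"},
])

print(jobs[-1]["id"])                     # last job by position
print(jobs[0:2].ids())                    # first two jobs by position
print(jobs(45)["status"])                 # job whose ID is 45
print(jobs.select(status="failed").ids())
```

Keeping position and ID on separate operators avoids the ambiguity that arises once jobs are removed and IDs no longer match positions.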

https://twiki.cern.ch/twiki/bin/view/ArdaGrid/GangaJobIndexingSlices
22
LFC Aware Splitter for Gaudi
  • Gaudi Provides SplitByFiles
  • Splits job into subjobs with subset of data files
  • Data Files not Available in all Sites
  • Some subjobs are unrunnable
  • DIRAC v2r14 Allows Query of LFC
  • Sort files by location → optimal splitting
  • New DiracSplitter
  • Splits files by file locations. Must use LFNs
  • Protects against mistyped file names → Error
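
The idea of splitting by file location can be sketched as follows. This is not the DiracSplitter code: split_by_location is a hypothetical function, the replica catalogue is a plain dictionary with made-up LFNs, and the real LFC query is left out.

```python
from collections import defaultdict


def split_by_location(lfns, replicas, files_per_job):
    """Group LFNs by the set of sites holding them, then chunk each group."""
    unknown = [lfn for lfn in lfns if lfn not in replicas]
    if unknown:
        # A mistyped file name has no catalogue entry, so fail early.
        raise ValueError("no replicas found for: %s" % ", ".join(unknown))

    groups = defaultdict(list)
    for lfn in lfns:
        groups[frozenset(replicas[lfn])].append(lfn)

    subjobs = []
    for sites, files in groups.items():
        for start in range(0, len(files), files_per_job):
            subjobs.append((sorted(sites), files[start:start + files_per_job]))
    return subjobs


# Hypothetical catalogue contents: LFN -> sites that hold a replica.
replicas = {
    "/lhcb/a.dst": {"CERN", "CNAF"},
    "/lhcb/b.dst": {"CERN", "CNAF"},
    "/lhcb/c.dst": {"RAL"},
}
subjobs = split_by_location(list(replicas), replicas, files_per_job=2)
for sites, files in subjobs:
    print(sites, files)
```

Every subjob produced this way has at least one site that holds all of its files, which is the property a plain split by file count cannot guarantee.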

Ganga 4.4
23
Performance of LFC Replica Query
  • Last SW Week
  • DIRAC v2r13 LFC Query Slow
  • 0.5s per file → 5min for 600 files
  • DIRAC v2r14 Bulk Query
  • Much Improved Performance
  • A factor of 10 faster
  • 30s for 600 files
  • Thanks to DIRAC Team!

DIRAC v2r13 Single Query
DIRAC v2r14 Multiple Query
24
Performance of LFC Replica Query
  • Further Speed Up Needed?
  • Multithreaded query worse
  • Limited by LFC
  • Queue system used?
  • Use Replica Caching
  • Cache stored per file
  • Cache date stored
  • Users Query with Dataset
  • updateReplicaCache()
  • DiracSplitter Still Slow
  • Will print time estimate at start
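
A per-file cache with a stored cache date might look like the sketch below. It is an illustration only: ReplicaCache, its lookup method and fake_bulk_query are hypothetical names, and the bulk catalogue query is replaced by a stand-in that records what it was asked for.

```python
import time


class ReplicaCache:
    """Per-file cache of replica locations, each stamped with its lookup time."""

    def __init__(self, bulk_query, max_age_seconds):
        self._bulk_query = bulk_query    # callable: list of LFNs -> {lfn: sites}
        self._max_age = max_age_seconds
        self._entries = {}               # lfn -> (sites, time cached)

    def lookup(self, lfns, now=None):
        now = time.time() if now is None else now
        stale = [lfn for lfn in lfns
                 if lfn not in self._entries
                 or now - self._entries[lfn][1] > self._max_age]
        if stale:
            # One bulk call covers everything that is missing or out of date.
            for lfn, sites in self._bulk_query(stale).items():
                self._entries[lfn] = (sites, now)
        return {lfn: self._entries[lfn][0] for lfn in lfns}


calls = []


def fake_bulk_query(lfns):
    """Stand-in for a catalogue query; records which files were requested."""
    calls.append(list(lfns))
    return {lfn: ["CERN"] for lfn in lfns}


cache = ReplicaCache(fake_bulk_query, max_age_seconds=3600)
cache.lookup(["/lhcb/a.dst", "/lhcb/b.dst"], now=0)
cache.lookup(["/lhcb/a.dst", "/lhcb/c.dst"], now=10)   # only c.dst is queried
cache.lookup(["/lhcb/a.dst"], now=4000)                # a.dst entry has expired
print(calls)
```

Storing the date per file rather than per dataset means that adding a few files to a dataset only costs a query for the new ones.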

1397 Unique Files Queried
Error bars show σ of 5 measurements
25
Printing Summary of Objects
  • Printing Verbose
  • E.g. Job object with many subjobs
  • Summary as Default
  • Lists show length
  • Objects define own summary
  • Get Full Print
  • full_print(j)
  • Same on object attributes
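
One way to make the summary the default while keeping the verbose form available is sketched below. The Job and Subjob classes and this full_print are toy versions written for the example; they only borrow the name from the slide and are not Ganga's implementation.

```python
class Subjob:
    def __init__(self, id, status):
        self.id = id
        self.status = status

    def __repr__(self):
        return "Subjob(id=%d, status=%r)" % (self.id, self.status)


class Job:
    def __init__(self, id, subjobs):
        self.id = id
        self.subjobs = subjobs

    def summary(self):
        """Short form: the subjob list is reduced to its length."""
        return "Job(id=%d, subjobs=<%d items>)" % (self.id, len(self.subjobs))

    def __repr__(self):
        # Verbose form: every subjob is written out.
        return "Job(id=%d, subjobs=%r)" % (self.id, self.subjobs)

    def __str__(self):
        # print(job) gives the summary by default.
        return self.summary()


def full_print(obj):
    """Print the verbose form, bypassing the summary."""
    print(repr(obj))


j = Job(7, [Subjob(i, "completed") for i in range(3)])
print(j)        # summary
full_print(j)   # full listing
```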

Ganga 4.4
27
Improved Credential Management
Ganga 4.4
  • Ganga Manages Credentials That Expire
  • AFS Token, Grid Proxy
  • Expiring Tokens Affect Ganga Session
  • Ganga May Not Clean-Up Services on Exit
  • Introducing InternalService Objects
  • Ensures correct clean-up
  • Services not used when expired
  • Alert Users Before Credentials Expire
  • Ganga Shuts Down Gracefully
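
The expiry logic can be sketched in a few lines. This is an assumption-laden illustration, not the InternalService design: the Credential class, the usable_services helper, the ten-minute warning window and the service names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Credential:
    name: str
    expires_at: float  # seconds since the epoch

    def state(self, now, warn_before=600.0):
        """Classify the credential as 'valid', 'expiring' or 'expired'."""
        remaining = self.expires_at - now
        if remaining <= 0:
            return "expired"
        if remaining <= warn_before:
            return "expiring"
        return "valid"


def usable_services(services, credentials, now):
    """Return the services whose required credential has not expired."""
    by_name = {c.name: c for c in credentials}
    return [svc for svc, needs in services.items()
            if by_name[needs].state(now) != "expired"]


afs = Credential("afs_token", expires_at=1000.0)
proxy = Credential("grid_proxy", expires_at=5000.0)

# Hypothetical mapping from a service to the credential it depends on.
services = {"local_repository": "afs_token", "grid_monitoring": "grid_proxy"}

print(afs.state(now=900.0))     # inside the warning window
print(proxy.state(now=900.0))
print(usable_services(services, [afs, proxy], now=1200.0))
```

The intermediate "expiring" state is what allows a warning to be shown while there is still time to renew the credential.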

28
Upcoming Feature: Remote Workspaces
  • Roaming Ganga Profile
  • Store Workspace Remotely
  • Access input and output files anywhere
  • Work across multiple machines
  • Local Cache Created on Demand
  • Currently at Prototyping Stage
  • Exciting new functionality!
  • Release Schedule is Uncertain

Ganga 4.x
29
The Ganga Reference Manual
  • Aim is to Show Ganga Help Online
  • Same information as help in Ganga
  • Documentation Generated from Source
  • Have Prototype Online
  • Missing documentation to be filled in → on-going!
  • Manual will be Generated with Release
  • Feedback on Documentation Appreciated
  • Let us know if anything is not clear

http://ganga.web.cern.ch/ganga/user/GPI/
31
Testing Tools
  • Use Test Framework
  • Based on unittest
  • Reports with Release
  • Helps Find Bugs!
  • Now Collect Coverage
  • Use Figleaf Library
  • Should improve testing
  • Identifies untested code
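
A unittest-based test has the shape sketched below. The function under test, split_by_files, is a toy written for this example and is not Ganga's SplitByFiles; the Ganga test framework itself and the coverage collection step are not shown.

```python
import unittest


def split_by_files(files, files_per_job):
    """Toy function under test: chunk a file list into subjob-sized pieces."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    return [files[i:i + files_per_job]
            for i in range(0, len(files), files_per_job)]


class SplitByFilesTest(unittest.TestCase):
    def test_even_split(self):
        self.assertEqual(split_by_files(["a", "b", "c", "d"], 2),
                         [["a", "b"], ["c", "d"]])

    def test_remainder_goes_to_last_subjob(self):
        self.assertEqual(split_by_files(["a", "b", "c"], 2),
                         [["a", "b"], ["c"]])

    def test_rejects_zero_files_per_job(self):
        with self.assertRaises(ValueError):
            split_by_files(["a"], 0)


def run_tests():
    """Load and run the test case, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SplitByFilesTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
print(result.testsRun, result.wasSuccessful())
```

A coverage tool run over such a suite would flag any branch, such as the error path here, that no test exercises.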

33
The LHCb Distributed Analysis Mailing List
  • Replaces Current List for LHCb Users
  • project-ganga_at_cern.ch
  • lhcb-distributed-analysis_at_cern.ch
  • Can sign up at http://simba2.cern.ch
  • Encourages User Community
  • Less support burden for developers!

https://mmm.cern.ch/public/archive-list/l/lhcb-distributed-analysis/
34
Summary
  • User Statistics: 557 Unique Users in '07
  • Ganga is de facto Grid front end tool for LHCb
  • Ganga has New Features in 4.3.0
  • Dirac Handler for Root, PyROOT Support, etc.
  • Interesting Features Upcoming
  • Merge framework, DiracSplitter
  • Reference Manual Coming Soon

http://ganga.web.cern.ch/ganga/