1
Quattor status, plans
  • CNAF visit
  • 17 October 2005
  • German Cancio, Marco Emilio Poleggi
  • CERN IT/FIO

http://cern.ch/elfms
2
Outline
  • CERN/IT reorganisation
  • Quattor status and next steps
  • CERN current issues
  • Integration with LEMON
  • LEAF (if time permits)

3
CERN/IT reorganisation
  • From November 1st, CERN/IT will be reorganised
  • GM (Grid Middleware) and GD (Grid Deployment)
    groups merge
  • Merged GD group led by Ian Bird
  • Frederic Hemmer new Deputy Department Head
  • ADC (Architecture and Data Challenges) group
    split up
  • Bernd Panzer-Steindel LCG Architect
  • Linux Support moves to FIO Group
  • Physics Databases
  • New group aimed towards Grid Physics Services
  • Led by Juergen Knobloch
  • Internal FIO (Fabric Infrastructure and
    Operation) Group reorganisation
  • CASTOR development and deployment split up
  • New Fabric Developments section, joining
    CASTOR, ELFms and Remedy development staff ->
    alleviate pressure on CASTOR

4
Quattor status and next steps
  • Deployment status

5
Feedback from sites
  • Sites using Quattor: 15
  • Out of 20 testing/trying
  • 12 LCG sites vs. 3 non-LCG sites
  • All LCG sites interested in gLite
  • native configuration templates/components vs.
    Yaim
  • 8 (native) vs. 3 (Yaim)
  • Usage for configuring local services
  • 8 sites
  • Number of nodes
  • Ranging from 4 to 600 (excl. CERN: 2450)
  • Total: 960, plans to grow to 2600
  • Many sites building up capacities
  • Most frequent issues
  • Steep learning curve
  • Documentation (installation, and missing
    HOW-TOs)
  • Structure and complexity of templates
  • Installation subsystem (AII) limitations, in
    particular partitioning
  • Outdated Quattor Core release

GDB meeting, 6/05
6
Quattor status and next steps
  • Deployment status
  • Some sites added (Taipei, India)
  • Quattor Core Release status
  • Quattor 1.1 production version
  • Multiple bug fixes
  • AII improved (automated partitioning)
  • Enhanced installation guide (>70 pages)
  • Most sites migrated without problems
  • Difficulties when upgrading from 1.0 to 1.1 and
    LCG-2.4 to 2.6 (NIKHEF)
  • Quattor 1.2 at end of year (see later - TBD)
  • Quattor LCG integration status
  • LCG-2.4, and LCG-2.6
  • Native templates LCG-2.6 available (several
    minor releases by QWG)
  • YAIM Support for 2.6 added in NCM Component

7
Quattor status and next steps (II)
  • Development status
  • Working towards Release 1.2: CDB enhancements and
    bugfixes
  • ACL support for templates
  • Control who can access what templates with an
    ACL-based mechanism
  • Set Read, Write, Admin permissions on templates
  • Users and groups of users
  • Support for Namespaces
  • Current template structure is flat -> all
    templates in one directory
  • PAN has support for namespaces and load paths
    -> directories in CDB
  • Allows a clearer structuring and also to have
    multiple versions of the same template in CDB,
    e.g. for test and production setups or for
    managing multiple sites
  • Restructure standard template set
  • Take advantage of namespaces
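The namespace and load path idea above can be illustrated with a minimal sketch. It is not the PAN compiler's actual lookup code: the function name, the ".tpl" file extension, and the "test"/"production" directory names are assumptions made for the example. It only shows how an ordered load path lets two versions of the same namespaced template coexist.

```python
from pathlib import PurePosixPath


def resolve_template(name, load_path, available):
    """Return the first path on the load path that holds the template.

    name      -- namespaced template name, e.g. "site/cluster/batch"
    load_path -- ordered list of directory prefixes to search
    available -- set of template paths in the store (stand-in for CDB)
    """
    for prefix in load_path:
        candidate = str(PurePosixPath(prefix) / (name + ".tpl"))
        if candidate in available:
            return candidate
    return None


# Hypothetical store holding a test and a production copy of one template.
store = {
    "production/site/cluster/batch.tpl",
    "test/site/cluster/batch.tpl",
    "production/site/os/sl3.tpl",
}

# The test copy wins when "test" comes first on the load path.
print(resolve_template("site/cluster/batch", ["test", "production"], store))
# Templates without a test copy fall back to production.
print(resolve_template("site/os/sl3", ["test", "production"], store))
```

Reversing the load path order would select the production copy instead, which is how one template set could serve both setups.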

8
Quattor status and next steps (III)
  • SOAP version of SWRep
  • Using same authentication as CDB
  • Bulk RPM handling -> more efficient RPM mirroring
  • Bugfixes/RFEs (see
    http://savannah.cern.ch/projects/elfms), e.g.
  • CDB enhancements (commit time optimisations)
  • CDB CLI enhancements (lock info, template search,
    ...)
  • Misc
  • CCM to accept non-local profiles
  • Integrate new RPMT-PY support
  • Rewrite of RPMT in Python, taking advantage of
    RPM Python bindings (like Yum)
  • More fault-tolerant behaviour in case of SPMA
    errors due to server overloading
  • Port to SL4/RHES4
  • No problems foreseen, but requires testing
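The slide does not say how SPMA will become more tolerant of overloaded servers. One common approach is to retry with exponentially growing delays, sketched below. The exception class, function names and delay values are assumptions for illustration and are not part of SPMA.

```python
import time


class ServerBusy(Exception):
    """Raised by the (hypothetical) fetch function when the server is overloaded."""


def fetch_with_retry(fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying on ServerBusy with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except ServerBusy:
            if attempt == retries:
                raise  # give up after the last allowed retry
            sleep(base_delay * 2 ** attempt)


# Stand-in fetch that fails twice and then succeeds.
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ServerBusy("repository overloaded")
    return "package-list"


# Record the delays instead of really sleeping, to keep the example fast.
delays = []
result = fetch_with_retry(flaky_fetch, sleep=delays.append)
print(result, delays)
```

Spreading retries out in time gives an overloaded server room to recover instead of being hit again immediately by every client.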

9
Quattor status and next steps (IV)
  • Future developments
  • PAN Compiler
  • Allow specifying which schema entries can be
    defined (set/modified) in which templates
  • Currently, you can modify all schema entries in
    all templates, which is powerful but may lead to
    confusion and mistakes
  • Easier GUI integration by adding a new data
    type template
  • Template type where only key-value entries can be
    defined: easy parsing/writing by GUIs
  • CDB backend
  • Current SQL-based back-end is slow and
    asynchronous
  • Investigate how to speed up back-end
  • Investigate alternative back-ends, like native
    XML DB based ones
  • Including possibility of joining them into PAN
  • Two-phase distributed commit for distributed
    checks (e.g. duplicated serial numbers)
  • CVS storage
  • CVS is limited in functionality
  • Investigate if Subversion could/should be used
    instead
  • Cal provides a Subversion-based interface, but
    with limited functionality compared to CDB
  • Autobuild facility (for developers)
  • Current release process is manual, heavy, and
    error-prone
  • Automated generation of code and documentation
    (components), like EDG did
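The two-phase commit mentioned above for cross-site checks such as duplicated serial numbers can be sketched as follows. This is a toy, in-memory illustration of the general technique, not a CDB design: the Site class, its methods and the serial numbers are all hypothetical.

```python
class Site:
    """Hypothetical participant holding the serial numbers known at one site."""

    def __init__(self, name, serials):
        self.name = name
        self.serials = set(serials)
        self.reserved = set()

    def prepare(self, new_serials):
        # Phase 1: vote "no" if any proposed serial is already known or reserved.
        new_serials = set(new_serials)
        if new_serials & (self.serials | self.reserved):
            return False
        self.reserved |= new_serials
        return True

    def finish(self, new_serials, store):
        # Phase 2: release the reservation and, on the target site, store it.
        new_serials = set(new_serials)
        self.reserved -= new_serials
        if store:
            self.serials |= new_serials


def add_serials(target, sites, new_serials):
    """Add serials to `target` only if every site votes yes in phase 1."""
    prepared = []
    for site in sites:
        if site.prepare(new_serials):
            prepared.append(site)
        else:
            # Abort: undo the reservations made so far.
            for done in prepared:
                done.finish(new_serials, store=False)
            return False
    for site in sites:
        site.finish(new_serials, store=(site is target))
    return True


site_a = Site("site-a", {"SN001", "SN002"})
site_b = Site("site-b", {"SN100"})

print(add_serials(site_b, [site_b, site_a], {"SN101"}))  # no clash anywhere
print(add_serials(site_b, [site_b, site_a], {"SN002"}))  # clashes with site-a
```

A real deployment would also need durable logging and timeout handling so that a crashed coordinator does not leave reservations behind. The sketch leaves those out.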

10
CERN current issues
  • CERN relies on CDB as unique place for storing
    information of CC nodes
  • Configuration
  • Inventory
  • State
  • The current CDB solution with SQL backend doesn't
    answer all requirements
  • E.g. how to store information on racks
  • A pure SQL solution for replacing CDB has been
    investigated but implementation is too complex
  • In particular, CDB hierarchies and schema
    flexibility
  • Plan is now to continue on hybrid approach
  • Requires work on CDB backend improvements
    mentioned earlier
  • GUIs built on top of CDB/CDBSQL, LEMON and LEAF
  • E.g. CERN's CCTracker for high-level workflows
  • Scalability
  • Ensure CDB (and CDBSQL) scalability to 8000
    nodes
  • Requires splitting up PANC compilation process
    in parallel if possible
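A minimal sketch of the parallel compilation idea, assuming each node profile can be compiled independently (the slide itself says "if possible"). The compile_profile function is a stand-in and does not call PANC. The node names and worker count are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_profile(node):
    """Stand-in for compiling one node's templates into an XML profile."""
    return f"{node}.xml"


def compile_all(nodes, workers=4):
    """Compile the profiles of all nodes using a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order even though work runs concurrently.
        return list(pool.map(compile_profile, nodes))


profiles = compile_all([f"node{i:04d}" for i in range(8)])
print(profiles[:2])
```

Threads only help here if the real work happens outside the Python interpreter, for example in an external compiler process. For CPU-bound work inside Python, a process pool would be the closer fit.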

11
CERN current issues (II)
  • Security
  • Deployment of secure XML transport (HTTPS) and
    ACLs
  • Service based views (user/mgmt perspective)
  • Synoptic view of which services are running and
    how
  • Needs to be built on top of Quattor and Lemon
  • Would require a separate service definition DB
  • Support for BOTH LCG YAIM and native LCG
    templates
  • CERN does NOT develop nor use the native LCG
    templates!
  • No knowledge, nor testing infrastructure
    available at CERN
  • (Not really a CERN issue - but a Quattor project
    one!)

12
Quattor-LEMON integration
  • Quattor and Lemon are tightly integrated at CERN
  • Configuration of Lemon Agent and Server
  • CDB holds definitions of all sensors, metric
    classes, and metric instances
  • An NCM component (ncm-fmonagent) generates the
    Agent config file
  • Another NCM component updates the Oracle Server
    configuration
  • Configuration of Lemon Web Pages
  • Information on what clusters exist, and what
    nodes belong to which cluster, is extracted from
    CDBSQL
  • Visualization of Quattor configuration
  • Indexed CDB templates available, linked to node
    and cluster status pages
  • XML profiles display
  • Alarm generation
  • E.g. generate an alarm if the configured kernel
    version differs from the actual one
  • Visualization of CC equipment
  • Geometry of CC (racks, robots, etc)
  • Location of each node in the CC (what rack)
  • Examples (CERN server)
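The kernel alarm mentioned above compares a configured value with a measured one. A minimal sketch of that comparison follows. It is not Lemon sensor code, and the function name and kernel version strings are invented for the example.

```python
import platform


def kernel_alarm(configured, actual=None):
    """Return an alarm message if the running kernel differs from the configured one."""
    if actual is None:
        actual = platform.release()  # release string of the running system
    if actual != configured:
        return f"kernel mismatch: configured {configured}, running {actual}"
    return None


# Hypothetical version strings; a real check would read the configured
# value from the node profile.
print(kernel_alarm("2.4.21-37.EL", actual="2.4.21-32.EL"))
print(kernel_alarm("2.4.21-37.EL", actual="2.4.21-37.EL"))
```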

13
  • http://cern.ch/leaf

14
LEAF - LHC Era Automated Fabric
  • LEAF is a collection of workflows for high level
    node hardware and state management, on top of
    Quattor and LEMON
  • HMS (Hardware Management System)
  • Track systems through all physical steps in
    lifecycle, e.g. installation, moves, vendor calls,
    retirement
  • Automatically requests installs, retirements, etc.
    from technicians
  • GUI to locate equipment physically
  • HMS implementation is CERN specific (based on
    Remedy workflows), but concepts and design should
    be generic
  • SMS (State Management System)
  • Automated handling (and tracking of) high-level
    configuration steps
  • Reconfigure and reboot all cluster nodes for new
    kernel and/or physical move
  • Drain and reconfig nodes for diagnosis / repair
    operations
  • Issues all necessary (re)configuration commands
    via Quattor
  • Extensible framework: plug-ins for site-specific
    operations possible
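The plug-in idea for site-specific SMS operations can be sketched as a small registry keyed by target state. This is a generic illustration, not the SMS API: the decorator, the function names and the node name are assumptions. The state name and the two actions are borrowed from the rack-move use case later in the talk.

```python
# Registry mapping a target state to the plug-ins to run on that transition.
_handlers = {}


def on_state(state):
    """Decorator registering a plug-in for transitions into `state`."""
    def register(func):
        _handlers.setdefault(state, []).append(func)
        return func
    return register


def set_state(node, state):
    """Record the new state and run every plug-in registered for it."""
    log = [f"{node}: state -> {state}"]
    for handler in _handlers.get(state, []):
        log.append(handler(node))
    return log


# Site-specific plug-ins (hypothetical).
@on_state("standby")
def drain_batch_queues(node):
    return f"{node}: close queues and drain jobs"


@on_state("standby")
def disable_alarms(node):
    return f"{node}: disable alarms"


for line in set_state("node0001", "standby"):
    print(line)
```

A site would add its own behaviour by registering further functions, without touching the core state-change code.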

15
LEAF Deployment
  • HMS in full production for all nodes in CC
  • HMS heavily used during CC node migration (1500
    nodes)
  • SMS in production for all Quattor-managed nodes
  • Current work
  • More automation, and handling of other HW types
    for HMS
  • More service specific SMS clients (e.g. tape and
    disk servers)
  • Developing asset management GUI (CCTracker)
  • Multiple select, drag and drop nodes to
    automatically initiate HMS moves and SMS operations
  • Interface to LEMON GUI

16
Managing the Fabric with CCTracker
  • Visualize, locate and manage CC objects using
    high-level workflows
  • Visualize
  • physical location of equipment

17
Managing the Fabric with CCTracker
  • Visualize, locate and manage CC objects using
    high-level workflows
  • Visualize
  • physical location of equipment
  • properties

18
Managing the Fabric with CCTracker
  • Visualize, locate and manage CC objects using
    high-level workflows
  • Visualize
  • physical location of equipment
  • properties
  • Initiate and track workflows on hardware and
    services
  • e.g. add/move/remove/retire operations, update
    properties, kernel and OS upgrades

19
Use Case: Move rack of machines
  • Diagram participants: HMS, Technicians, ServiceMgr,
    SMS, NW DB, Quattor CDB
  • Diagram step labels, in order
  • 1. New location
  • 2. Set to standby
  • 3. Update
  • 4. Refresh
  • 5. Take out of production
  • Close queues and drain jobs
  • Disable alarms
  • 6. Request move
  • 7a. Update
  • 7b. Update
  • 9. Install work order
  • 10. Set to production
  • 11. Update
  • 12. Refresh
  • 13. Put into production

20
Many Thanks to INFN-CNAF for the provided
support!!
Questions, Discussion