Javier Jaen Martinez - PowerPoint PPT Presentation

About This Presentation
Title:

Javier Jaen Martinez

Description:

How are Farms evolving in non HEP environments? ... Dynamite. NQS. PBS. NQE. Condor. DNQS. DQS. Codine. Utopia. LSF. LHC - 28 September 1999 ...

1
Farm Computing Issues and Examples
  • Javier Jaen Martinez
  • CERN IT/PDP

2
Table of Contents
  • Motivation & Goals
  • Types of Farms
  • Core Issues
  • Examples
  • JMX: A Management Technology
  • Summary

3
Study Goals
  • How are Farms evolving in non HEP environments?
  • Have Generic PC Farms and Filter Farms shared
    requirements for system/application monitoring,
    control and management?
  • Will we benefit from future developments in other
    domains?
  • Which are the emerging technologies for farm
    computing?

4
Introduction
  • According to Pfister, there are three ways to
    improve performance
  • In terms of computing technologies
  • Work harder: use faster hardware
  • Work smarter: use more efficient algorithms
    and techniques
  • Get help: depending on how processors,
    memory and interconnect are laid out: MPP, SMP,
    Distributed Systems and Farms

[Figure labels: Work harder, Work smarter, Get help]
5
Motivation
  • IT/PDP is already using commodity farms
  • All 4 LHC experiments will use Event Filter Farms
  • Commodity Farms are also becoming very popular
    for non HEP applications

6
Motivation
Thousands of tasks and thousands of nodes to be controlled,
monitored and managed (a system and application
management challenge).
7
Types of Farms
  • In our domain
  • Event Filter Farms
  • To filter data acquired in previous levels of a
    DAQ
  • Reduce aggregated throughput by rejecting
    uninteresting events or by compressing them

8
Types of Farms
  • Batch Data Processing
  • Job reads data from tape, processes the information
    and writes data back
  • Each job runs on a separate node
  • Job management performed by a batch scheduler
  • Nodes with good CPU performance and large disks
  • Good connectivity to mass storage
  • Inter-node communication not critical
    (independent jobs)
  • Interactive Data Analysis
  • Analysis and data mining
  • Traverse large databases as fast as possible
  • Programs may run in parallel
  • Nodes with great CPU performance and large disks
  • High performance inter-process communication

9
Types of farms
  • Monte Carlo Simulation
  • Used to simulate detectors
  • Simulation jobs run independently on each node
  • Similar to a batch data processing system (maybe
    with less disk requirements)
  • Others
  • Workgroup Services
  • Central Data Recording Farms
  • Disk Server Farms
  • ...

10
Types of farms
  • In non HEP environments
  • High Performance Farms (Parallel)
  • a collection of interconnected stand-alone
    computers cooperatively working together as a
    single, integrated computing resource
  • Farm seen as a computer architecture for parallel
    computation
  • High Availability Farms
  • Mission Critical Applications
  • Hot Standby
  • Failover and Failback

11
Key Issues in Farm Computing
  • Size Scalability (physical & application)
  • Enhanced Availability (failure management)
  • Single System Image (look-and-feel of one
    system)
  • Fast Communication (networks & protocols)
  • Load Balancing (CPU, Net, Memory, Disk)
  • Security and Encryption (farm of farms)
  • Distributed Environment (Social issues)
  • Manageability (admin. and control)
  • Programmability (offered API)
  • Applicability (farm-aware and non-aware app.)

12
Core Issues (Maturity)
[Figure: core issues placed on a maturity scale running from "Mature Development" to "Future Challenge": Monitoring, Load Balancing, Failure Management, SSI, Fast Communication, Manageability]
13
Monitoring: Why?
  • Performance Tuning
  • Environment changes dynamically due to the
    variable load on the system and the network.
  • Improving or maintaining the quality of the
    services according to those changes
  • Reactive control monitoring acts on farm
    parameters to obtain the desired performance
  • Fault Recovery
  • to know the source of any failure in order to
    improve robustness and reliability.
  • An automatic fault recovery service is needed in
    farms with hundreds of nodes (migration, ...)
  • Security
  • to detect and report security violation events

14
Monitoring: Why?
  • Performance Evaluation
  • to evaluate applications/system performance at
    run-time.
  • Evaluation is performed off-line with data
    monitored on-line
  • Testing
  • to check correctness of new applications running
    in a farm by
  • detecting erroneous or incorrect operations
  • obtaining activity reports of certain functions
    of the farm
  • obtaining a complete history of the farm in a
    given period of time

15
Monitoring Types
  • Generation: instrumentation, collection, trace generation
    (pull/push, distributed/centralized, time/event, collection format)
  • Processing: trace merging, database updating, correlation, filtering
    (online/offline, on demand/automatic, storage format)
  • Dissemination: users, managers, control systems
    (dissemination format, access type, access control, on demand/automatic)
  • Presentation
    (presentation format)
How many monitoring tools are available?
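As a rough illustration of the processing stage named on this slide (trace merging and filtering), the following minimal sketch merges time-ordered traces from two nodes into one global trace and then filters it. The event layout `(timestamp, node, kind)` and the function names are assumptions made for this example; they do not come from any particular monitoring tool.

```python
import heapq

def merge_traces(*traces):
    """Merge per-node traces (each already sorted by timestamp) into one trace."""
    # heapq.merge performs a lazy k-way merge; the key selects the timestamp.
    return list(heapq.merge(*traces, key=lambda event: event[0]))

def filter_events(trace, kinds):
    """Keep only the events whose kind is in the given set."""
    return [event for event in trace if event[2] in kinds]

# Hypothetical traces generated on two farm nodes: (timestamp, node, kind)
node_a = [(1.0, "a", "start"), (4.0, "a", "error")]
node_b = [(2.0, "b", "start"), (3.0, "b", "load")]

merged = merge_traces(node_a, node_b)
alerts = filter_events(merged, {"error"})
print(merged)
print(alerts)
```

A real farm would also have to deal with clock skew between nodes before merging; this sketch assumes synchronized timestamps.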
16
Monitoring Tools
Maple.
SAS.
Cheops
NextPoint
NetLogger
Ganymede
MTR
ResponseNetworks
MeasureNet
Network health
http://www.slac.stanford.edu/cottrell/tcom/nmtf.html
No integrated tools for monitoring services, applications,
devices and the network
17
Monitoring Strategies?
  • Define common strategies
  • What to be monitored?
  • Collection strategies
  • Processing alternatives
  • Displaying techniques
  • Obtain Modular implementations
  • Good example: ATLAS Back End Software
  • IT Division has started a monitoring project
  • Integrated monitoring
  • Service Oriented

18
Fast Communication
[Figure: "Killer Platform" and "Killer Switch", with time scales ns, ms, µs]
  • Fast processors and fast networks
  • The time is spent in crossing between them

19
Fast Communication
  • Remove the kernel from critical path
  • Offer to user applications a fully protected,
    virtual, direct (zero copy send messages),
    user-level access to the network interface
  • This idea has been specified in VIA (Virtual
    Interface Architecture)

Application
High Level Comm. Lib (MPI, ShM Put/Get, PVM)
Send/Recv/RDMA
Buff Manag./Synchro
VI Kernel Agent
VI Network Adapter
20
Fast Communication
  • VIA's predecessors
  • Active Messages (Berkeley NOW project, Fast
    Sockets)
  • Fast Messages (UCSD MPI, Shmem Put/Get, Global
    Arrays)
  • Applications using sockets, MPI, ShMem, ... can
    benefit from these fast communication layers
  • Several farms (HPVM (FM), NERSC PC cluster
    (M-VIA), ...) already benefit from this technology

21
Fast Communication (Fast Mess)
[Figure: Fast Messages bandwidth (MB/s) and latency (µs) versus message size (4 bytes to 64K bytes), with the FM packet size marked; annotated values: 77.1 MB/s and 11.1 µs]
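To see how the two values annotated on this chart (77.1 MB/s and 11.1 µs) interact, a simple linear cost model can help: transfer time = startup latency + size / peak bandwidth. This sketch assumes that 11.1 µs is the startup latency, that 77.1 MB/s is the peak bandwidth, and that MB means 10^6 bytes; the slide does not state any of these. The function names are made up for the example.

```python
LATENCY_S = 11.1e-6      # startup latency in seconds (assumed meaning of 11.1 µs)
PEAK_BW = 77.1e6         # peak bandwidth in bytes/s (assumes MB = 10^6 bytes)

def transfer_time(nbytes):
    """Linear model: fixed startup cost plus time on the wire."""
    return LATENCY_S + nbytes / PEAK_BW

def effective_bandwidth(nbytes):
    """Bandwidth actually delivered for a message of nbytes."""
    return nbytes / transfer_time(nbytes)

# Message size at which half of the peak bandwidth is delivered.
half_power_size = LATENCY_S * PEAK_BW

for size in (64, 1024, 65536):
    print(size, round(effective_bandwidth(size) / 1e6, 1), "MB/s")
print("half-power message size:", round(half_power_size), "bytes")
```

Under these assumptions the half-power point is roughly 856 bytes: shorter messages are dominated by latency, longer ones by bandwidth, which is why lowering per-message overhead matters so much for small messages.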
22
Fast Communication
[Figure: comparison of HPVM, Pwr. Chal., SP-2, T3E, Origin 2K and Beowulf]
23
Single System Image
  • A single system image is the illusion, created by
    software or hardware, that presents a collection
    of resources as one, more powerful resource.
  • Strong SSI results in farms appearing like a
    single machine to the user, to applications, and
    to the network.
  • The SSI level is a good measure of the coupling
    degree of the nodes in a farm
  • Every farm has a certain degree of SSI (A farm
    with no SSI at all is not a farm).

24
Benefits of Single System Image
  • Usage of system resources transparently
  • Transparent process migration and load balancing
    across nodes.
  • Improved reliability and higher availability
  • Improved system response time and performance
  • Simplified system management
  • Reduction in the risk of operator errors
  • User need not be aware of the underlying system
    architecture to use these machines effectively
  • (C) from Jain

25
SSI Services
  • Single Entry Point
  • Single File Hierarchy: xFS, AFS, ...
  • Single Control Point: management from a single GUI
  • Single memory space
  • Single Job Management: Glunix, Codine, LSF
  • Single User Interface: like a workstation/PC
    windowing environment
  • Single I/O Space (SIO)
  • any node can access any peripheral or disk
    devices without the knowledge of physical
    location.

26
SSI Services
  • Single Process Space (SPS)
  • Any process on any node can create processes with
    cluster-wide process IDs, and they communicate
    through signals, pipes, etc., as if they were on a
    single node.
  • Every SSI has a boundary
  • Single system support can exist at different
    levels
  • OS Level: MOSIX
  • Middleware: Codine, PVM
  • Application Level: Monitoring App, Back-End SW

27
Scheduling Software
  • Goal: enable the scheduling of system activities
    and execution of applications while offering high
    availability services transparently
  • Usually works completely outside the kernel and
    on top of the machine's existing operating system
  • Advantages
  • Load Balancing
  • Use spare CPU cycles
  • Provide Fault tolerance
  • In practice, increased and reliable throughput of
    user applications

28
SS Generalities
  • The workings of a typical SS
  • Create a job description file: job name,
    resources, desired platform, ...
  • The job description file is sent by the client
    software to a master scheduler
  • The master scheduler has an overall view: the
    queues that have been configured plus the
    computational load of the nodes in the farm
  • The master ensures that the resources being used
    are load balanced and that jobs complete
    successfully
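The workflow above can be sketched in a few lines. This is a toy model under stated assumptions, not the behaviour of any real scheduling package: the `Job` and `Node` fields, the single FIFO queue, and the "most free CPUs" placement rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    """Hypothetical job description: name, resources, desired platform."""
    name: str
    cpus: int
    platform: str

@dataclass
class Node:
    name: str
    platform: str
    free_cpus: int
    running: list = field(default_factory=list)

class MasterScheduler:
    """Toy master with an overall view of the queue and the node loads."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.queue = []

    def submit(self, job):
        # Stands in for the client software sending a job description file.
        self.queue.append(job)

    def dispatch(self):
        """Place each queued job on the matching node with the most free CPUs."""
        placed, waiting = {}, []
        for job in self.queue:
            candidates = [n for n in self.nodes
                          if n.platform == job.platform and n.free_cpus >= job.cpus]
            if not candidates:
                waiting.append(job)      # stays queued until resources free up
                continue
            node = max(candidates, key=lambda n: n.free_cpus)
            node.free_cpus -= job.cpus
            node.running.append(job.name)
            placed[job.name] = node.name
        self.queue = waiting
        return placed

nodes = [Node("n1", "linux", 2), Node("n2", "linux", 4), Node("n3", "nt", 4)]
master = MasterScheduler(nodes)
for job in (Job("reco", 2, "linux"), Job("sim", 2, "linux"), Job("filter", 8, "linux")):
    master.submit(job)
placement = master.dispatch()
print(placement)
print([j.name for j in master.queue])
```

Real packages add multiple queues, priorities and completion tracking; the sketch only shows the load-aware placement step.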

29
SS Main features
  • Application Support
  • are batch, interactive and parallel jobs
    supported?
  • multiple configurable queues?
  • Job Scheduling and allocation
  • Allocation policy: taking into account system
    load, CPU type, computational load, memory, disk
    space, ...
  • Checkpointing: save state at regular intervals
    during job execution. A job can be restarted from
    the last checkpoint
  • Migration: move a job to another node in the farm
    to achieve dynamic load balancing or perform a
    sequence of activities on different specialized
    nodes
  • Monitoring/ Suspension/Resumption

30
SS Main features
  • Dynamics of resources
  • Resources, queues, and nodes reconfigured
    dynamically
  • Existence of Single points of failure
  • Fault tolerance: re-run a job if the system
    crashes and check for needed resources

31
SS Packages
  • Research: CCS, Condor, Dynamic Network Queueing
    System (DNQS), Distributed Queueing System (DQS),
    Generic NQS, Portable Batch System (PBS), Prospero
    Resource Manager, MOSIX, Far, Dynamite
  • Commercial: Codine (Genias), LoadBalancer
    (Tivoli), LSF (Platform), Network Queueing
    Environment (NQE, SGI), TaskBroker (HP)
[Figure labels: Condor, DNQS, Utopia, NQS, DQS, NQE, PBS, Codine, LSF]
32
SS: Some examples
  • CODINE & LSF
  • To be used in large heterogeneous networked
    environments
  • Dynamic and static load balancing
  • Batch, interactive, parallel jobs
  • Checkpointing & migration
  • Offer an API for new distributed applications
  • No single point of failure
  • Job accounting data and analysis tools
  • Modification of resource reservation for started
    jobs and specification of releasable shared
    resources (LSF)
  • MPI (LSF) vs MPI, PVM, Express, Linda (Codine)
  • Reporting tools (LSF)
  • C API (LSF), ?? (Codine)
  • No checkpointing of forked jobs or signaled jobs

33
Failure Management
  • Traditionally associated with scheduling software
    and oriented to long-running (CPU intensive)
    processes
  • If a CPU intensive process crashes -> wasted CPU
  • Solution
  • Save the state of the process periodically
  • In case of failure, the process is restarted from
    the last checkpoint
  • Strategies
  • store checkpoints in files using a distributed
    file system (slows down computation, NFS is poor,
    AFS caching of Checkpoints may flush other useful
    data)
  • checkpoint servers (dedicated node with disk
    storage and management functions for
    checkpointing)
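The save-and-restart idea above can be illustrated with a minimal sketch of user-level checkpointing to a local file. The function name, the pickle format and the simulated crash are assumptions made for this example; real packages checkpoint the whole process image rather than a small dictionary.

```python
import os
import pickle
import tempfile

def run_with_checkpoints(total_steps, ckpt_path, every=100, fail_at=None):
    """Sum 0..total_steps-1, saving state periodically and resuming if possible."""
    state = {"step": 0, "acc": 0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)           # restart from the last checkpoint
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated crash")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            tmp_path = ckpt_path + ".tmp"
            with open(tmp_path, "wb") as f:
                pickle.dump(state, f)
            os.replace(tmp_path, ckpt_path)  # atomic swap: never a half-written file
    return state["acc"]

ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
try:
    run_with_checkpoints(1000, ckpt, fail_at=550)   # crashes after the step-500 checkpoint
except RuntimeError:
    pass
result = run_with_checkpoints(1000, ckpt)           # resumes at step 500, not step 0
print(result)
```

Only the 50 steps between the last checkpoint and the crash are recomputed. The checkpoint interval trades wasted CPU after a failure against the I/O cost of saving state, which is the slowdown the strategies above refer to.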

34
Failure Management
  • Levels
  • Transparent checkpointing: a checkpointing library
    is linked against an executable binary. The
    library checkpoints the process transparently
    (Condor, libckpt, Hector)
  • User-directed checkpointing: directives included
    in the application's code perform specific
    checkpoints of particular memory segments
  • Future challenges
  • Decoupling Failure management and scheduling
  • Define strategies for System failure recovery (at
    kernel level?)
  • Define strategies for task failure recovery

35
Example: MOSIX Farms
  • MOSIX: Multicomputer OS for UNIX
  • An OS module (layer) that provides the
    applications with the illusion of working on a
    single system
  • Remote operations are performed like local
    operations
  • Strong SSI at kernel level

36
Example: MOSIX Farms
Preemptive process migration that can
migrate -> any process, anywhere, anytime
  • Supervised by distributed algorithms that
    respond on-line to global resource availability
    - transparently
  • Load-balancing - migrate process from over-loaded
    to under-loaded nodes
  • Memory ushering - migrate processes from a node
    that has exhausted its memory, to prevent
    paging/swapping
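The load-balancing goal (moving processes from over-loaded to under-loaded nodes) can be illustrated with a small greedy sketch. It uses a central view of all node loads purely for simplicity, so it is not the distributed algorithm MOSIX itself uses; the function name and the threshold parameter are hypothetical.

```python
def balance(loads, threshold=1):
    """Move one process at a time from the most to the least loaded node."""
    loads = dict(loads)                  # work on a copy
    migrations = []
    while True:
        src = max(loads, key=loads.get)  # most loaded node
        dst = min(loads, key=loads.get)  # least loaded node
        if loads[src] - loads[dst] <= threshold:
            break                        # balanced enough; migrating more gains nothing
        loads[src] -= 1
        loads[dst] += 1
        migrations.append((src, dst))
    return loads, migrations

# Hypothetical number of runnable processes per node
final, moves = balance({"a": 8, "b": 1, "c": 3})
print(final)
print(moves)
```

Memory ushering follows the same pattern with free memory instead of process count as the trigger.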

37
Example: MOSIX Farms
  • A scalable cluster configuration
  • 50 Pentium-II 300 MHz
  • 38 Pentium-Pro 200 MHz (some are SMPs)
  • 16 Pentium-II 400 MHz (some are SMPs)
  • Over 12 GB cluster-wide RAM
  • Connected by a Myrinet 2.56 Gb/s LAN
  • Runs Red Hat 6.0, based on kernel 2.2.7
  • Download MOSIX
  • http://www.mosix.cs.huji.ac.il/

38
Example: HPVM Farms
  • Goal: obtain supercomputing performance from a
    pile of PCs
  • Scalability: 256 processors demonstrated
  • Networking: over Myrinet interconnect
  • OS: Linux and NT (going NT)

[Figure: HPVM software components: CORBA, Winsock 2, HPF, Global Arrays, SHMEM, MPI and Illinois Fast Messages (FM); legend: available now / under development]
39
Example: HPVM Farms
  • SSI at middleware level
  • MPI and LSF
  • Fast Communication: Fast Messages
  • Monitoring: none yet
  • Manageability (still poor)
  • HPVM front-end (Java applet + LSF features)
  • Symera (under development at NCSA)
  • DCOM based management tool (only for NT)
  • Add/remove node from cluster
  • Logical cluster definition
  • Distributed process control & monitoring
  • Others: NERSC PC Cluster and Beowulf

40
Example: Disk Server Farms
  • To transfer data sets between disk and
    applications.
  • IT/PDP
  • RFIO package (optimize large sequential data
    transfers)
  • Each disk server system runs one master RFIO
    daemon in the background, and new requests lead
    to the spawning of further RFIO daemons.
  • Memory space is used for caching
  • SSI: weak
  • Load balancing of RFIO daemons in different nodes
    of the farm
  • Single memory space & I/O space could be useful
    in a disk server farm with heterogeneous machines
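The "one master daemon, one further daemon per request" pattern described above can be sketched with the Python standard library. This is not RFIO and does not use its protocol: the one-line "send me N bytes" request, the class names and the use of threads in place of spawned daemon processes are all assumptions made to keep the example short and portable.

```python
import socket
import socketserver
import threading

class TransferHandler(socketserver.StreamRequestHandler):
    """Serves one request: reads a byte count, streams that many bytes back."""

    def handle(self):
        size = int(self.rfile.readline().strip())
        chunk = b"x" * 65536             # large blocks suit sequential transfers
        sent = 0
        while sent < size:
            n = min(len(chunk), size - sent)
            self.wfile.write(chunk[:n])
            sent += n

class MasterDaemon(socketserver.ThreadingTCPServer):
    """Listens in the background and starts a new handler for every request."""
    allow_reuse_address = True
    daemon_threads = True

def fetch(port, size):
    """Hypothetical client: request `size` bytes and count what arrives."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(f"{size}\n".encode())
        received = 0
        while True:
            data = sock.recv(65536)
            if not data:
                break
            received += len(data)
    return received

server = MasterDaemon(("127.0.0.1", 0), TransferHandler)   # port 0: pick a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()
got = fetch(port, 200_000)
server.shutdown()
server.server_close()
print(got)
```

On a Unix system, `socketserver.ForkingTCPServer` would be closer to the process-per-request behaviour the slide describes.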

41
Example: Disk Server Farms
  • Monitoring
  • RFIO daemons status, load of farm nodes, memory
    usage, caching hit rates, ...
  • Fast Messaging: RFIO techniques using TCP sockets
  • Manageability: storage, daemons, caching
    management
  • Linux based disk server performance is now
    comparable to UNIX disk servers (benchmarking
    study by Bernd Panzer, IT/PDP)
  • DPSS (Distributed Parallel Storage Server)
  • Collection of disk servers which operate in
    parallel over a wide area network to provide
    logical block level access to large data sets
  • SSI
  • Applications are not aware of declustered data.
  • Load balancing if data is replicated
  • Monitoring: Java Agents for Monitoring and
    Management
  • Fast Messaging: dynamic TCP buffer size adjustment

42
JMX: A Management Technology
  • JMX: Java Management Extensions (Basics)
  • Defines a management architecture, APIs, and
    management services, all under a single
    specification
  • Resources can be made manageable without regard
    to how their manager is implemented (SNMP,
    CORBA, Java Manager)
  • Based on Dynamic Agents
  • Platform and Protocol independent
  • JDMK 3.2

Management Application
Manager Level (JMX Manager)
Agent Level (JMX Agent)
Instrumentation Level (JMX Resource)
Managed Resource
43
JMX Components
44
JMX Applications
  • Implement distributed SNMP monitoring
    infrastructures
  • Heterogeneous farms (NT + Linux) management
  • Environments where management intelligence or
    requirements change over time
  • Environments where management clients may be
    implemented using different technologies.

45
Summary
  • Farm scale and intended use will grow in the
    coming years
  • We presented a set of factors to compare
    different farm computing approaches
  • Developments from non HEP domains can be used in
    HEP farms
  • Fast Networking
  • Monitoring
  • System Management
  • However, application and task management is very
    dependent on particular domains

46
Summary
  • EFF community should
  • Share common experiences (specific subfields in
    future meetings)
  • Define common monitoring requirements and
    mechanisms, SSI requirements, management
    procedures (filtering, reconstruction,
    compression, ...)
  • Follow developments in the management of High
    Performance computing farms (same challenge of
    managing thousands of processes/threads)
  • Obtain, if possible, modular implementations of
    these requirements that constitute an EFF
    management approach