1
What is High Throughput Distributed Computing?
  • CERN Computing Summer School 2001
  • Santander
  • Les Robertson
  • CERN - IT Division
  • les.robertson@cern.ch

2
Outline
  • High Performance Computing (HPC) and High
    Throughput Computing (HTC)
  • Parallel processing
  • why it is so difficult with HPC applications
  • and so easy with HTC
  • Some models of distributed computing
  • HEP applications
  • Offline computing for LHC
  • Extending HTC to the Grid

3
Speeding Up the Calculation?
  • Use the fastest processor available
  • -- but this gives only a small factor over
    modest (PC) processors
  • Use many processors, performing bits of the
    problem in parallel
  • -- and since quite fast processors are
    inexpensive, we can think of using very many
    processors in parallel

4
High Performance or High Throughput?
  • The key questions are granularity and degree of
    parallelism
  • Have you got one big problem or a bunch of little
    ones? To what extent can the problem be
    decomposed into sort-of-independent parts
    (grains) that can all be processed in parallel?
  • Granularity
  • fine-grained parallelism - the independent bits
    are small, need to exchange information, and
    synchronise often
  • coarse-grained - the problem can be decomposed
    into large chunks that can be processed
    independently
  • Practical limits on the degree of parallelism
  • how many grains can be processed in parallel?
  • degree of parallelism v. grain size
  • grain size limited by the efficiency of the
    system at synchronising grains

5
High Performance v. High Throughput?
  • fine-grained problems need a high performance
    system
  • that enables rapid synchronisation between the
    bits that can be processed in parallel
  • and runs the bits that are difficult to
    parallelise as fast as possible
  • coarse-grained problems can use a high throughput
    system
  • that maximises the number of parts processed per
    minute
  • High Throughput Systems use a large number of
    inexpensive processors, inexpensively
    interconnected
  • while High Performance Systems use a smaller
    number of more expensive processors expensively
    interconnected

6
High Performance v. High Throughput?
  • There is nothing fundamental here - it is just
    a question of financial trade-offs, like
  • how much more expensive is a fast computer than
    a bunch of slower ones?
  • how much is it worth to get the answer more
    quickly?
  • how much investment is necessary to improve the
    degree of parallelisation of the algorithm?
  • But the target is moving -
  • Since the cost chasm first opened between fast
    and slower computers 12-15 years ago, an enormous
    effort has gone into finding parallelism in big
    problems
  • Inexorably decreasing computer costs and
    de-regulation of the wide area network
    infrastructure have opened the door to ever
    larger computing facilities - clusters →
    fabrics → (inter)national grids - demanding
    ever-greater degrees of parallelism

7
High Performance Computing
8
A quick look at HPC problems
  • Classical high-performance applications
  • numerical simulations of complex systems such as
  • weather
  • climate
  • combustion
  • mechanical devices and structures
  • crash simulation
  • electronic circuits
  • manufacturing processes
  • chemical reactions
  • image processing applications like
  • medical scans
  • military sensors
  • earth observation, satellite reconnaissance
  • seismic prospecting

9
Approaches to parallelism
  • Domain decomposition
  • Functional decomposition

graphics from Designing and Building Parallel
Programs (Online), by Ian Foster -
http://www-unix.mcs.anl.gov/dbpp/
10
Of course it's not that simple
graphic from Designing and Building Parallel
Programs (Online), by Ian Foster -
http://www-unix.mcs.anl.gov/dbpp/
11
The design process
  • Data or functional decomposition
  • → building an abstract task model
  • Building a model for communication between tasks
  • → interaction patterns
  • Agglomeration to fit the abstract model to the
    constraints of the target hardware
  • interconnection topology
  • speed, latency, overhead of communications
  • Mapping the tasks to the processors
  • load balancing
  • task scheduling

graphic from Designing and Building Parallel
Programs (Online), by Ian Foster -
http://www-unix.mcs.anl.gov/dbpp/
12
Large scale parallelism - the need for standards
  • The supercomputer market is in trouble - a
    diminishing number of suppliers, questionable
    future
  • Increasingly risky to design for specific tightly
    coupled architectures like - SGI (Cray, Origin),
    NEC, Hitachi
  • Require a standard for communication between
    partitions/tasks that works also on loosely
    coupled systems (massively parallel processors -
    MPP: IBM SP, Compaq)
  • Paradigm is message passing rather than shared
    memory; tasks rather than threads
  • Parallel Virtual Machine - PVM
  • MPI - Message Passing Interface

13
MPI - Message Passing Interface
  • industrial standard - http://www.mpi-forum.org
  • source code portability
  • widely available efficient implementations
  • SPMD (Single Program Multiple Data) model
  • Point-to-point communication (send/receive/wait,
    blocking/non-blocking) - see the sketch below
  • Collective operations (broadcast, scatter/gather,
    reduce)
  • Process groups, topologies
  • comprehensive and rich functionality
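
As a concrete illustration of the SPMD, point-to-point model (a minimal sketch, not taken from the slides), rank 0 sends an array to rank 1 with blocking MPI_Send/MPI_Recv; the same binary runs on every processor and its behaviour is selected by the rank:

/* Minimal MPI point-to-point sketch (illustrative only):
 * rank 0 sends four doubles to rank 1 with blocking calls. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double buf[4] = {1.0, 2.0, 3.0, 4.0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* blocking send: returns once buf may safely be reused */
        MPI_Send(buf, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;
        /* blocking receive: waits for the matching send */
        MPI_Recv(buf, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %g %g %g %g\n",
               buf[0], buf[1], buf[2], buf[3]);
    }

    MPI_Finalize();
    return 0;
}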

14
MPI Collective operations
Defining high-level data functions allows
highly efficient implementations, e.g. minimising
data copies (a sketch follows below)
  • IBM Redbook - http://www.redbooks.ibm.com/redbooks/SG245380.html
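
A hedged sketch of a collective operation (illustrative, not from the slides): every rank contributes a partial result and MPI_Reduce combines them on rank 0, leaving the library free to use an efficient reduction tree and to minimise data copies:

/* Collective-operation sketch: each rank supplies a partial sum,
 * MPI_Reduce combines them on rank 0 (illustrative example). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double partial = (double)rank;   /* stands in for local work */
    double total = 0.0;

    /* the implementation chooses the reduction pattern */
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %g\n", size, total);

    MPI_Finalize();
    return 0;
}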

15
The limits of parallelism - Amdahl's Law
  • If we have N processors:

        Speedup = (s + p) / (s + p/N)

  • taking s as the fraction of the time spent in the
    sequential part of the program (s + p = 1):

        Speedup = 1 / (s + (1 - s)/N)  ≤  1/s

    (a worked example follows the reference below)

s = time spent on a serial processor on the serial
parts of the code; p = time spent on a serial
processor on the parts that could be executed in
parallel
Amdahl, G.M., Validity of single-processor
approach to achieving large scale computing
capability Proc. AFIPS Conf., Reston, VA, 1967,
pp. 483-485
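
A quick worked example (numbers chosen purely for illustration): with a 5% sequential fraction (s = 0.05) and N = 100 processors,

    Speedup = 1 / (0.05 + 0.95/100) = 1 / 0.0595 ≈ 17

and no matter how many processors are added the speedup can never exceed 1/s = 20.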
16
Amdahl's Law - maximum speedup
17
Load Balancing - real life is (much) worse
(figure: time line t showing the sequential pieces
s_1 ... s_i ... s_j ... s_N interleaved with the
parallel steps)

  • Often have to use barrier synchronisation between
    each step, and different cells require different
    amounts of computation
  • Real time spent on the sequential part:
        s = Σ_i s_i
  • Real time spent on the parallelisable part, run on
    a sequential processor:
        p = Σ_k Σ_j p_kj
  • Real time when parallelised:
        T = s + Σ_k max_j(p_kj)  ≥  s + p/N
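
For example (an illustrative assumption, not a measurement): if in every step the slowest of the N grains takes twice the mean grain time, then

    Σ_k max_j(p_kj) ≈ 2p/N,   so   T ≈ s + 2p/N

and barrier synchronisation halves the effective parallelism even though the total work p is unchanged.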
18
Gustafson's Interpretation
  • The problem size scales with the number of
    processors
  • With a lot more processors (computing capacity)
    available you can and will do much more work in
    less time
  • The complexity of the application rises to fill
    the capacity available
  • But the sequential part remains approximately
    constant (a scaled-speedup formula follows this
    list)
  • Gustafson, J.L., Re-evaluating Amdahl's Law, CACM
    31(5), 1988, pp. 532-533
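
A compact way to state this is Gustafson's scaled speedup (standard form; the example numbers are illustrative): if s' is the sequential fraction measured on the N-processor system,

    Scaled speedup(N) = s' + (1 - s')·N = N - s'(N - 1)

With s' = 0.001 (0.1% sequential) and N = 1000 processors this gives about 999, consistent with the 1,000 X figure quoted on the next slide.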

19
Amdahl's Law - maximum speedup with Gustafson's
appetite
potential 1,000 X speedup with 0.1% sequential
code
20
The importance of the network
(figure: time line t divided into sequential,
parallelisable, and communications-overhead portions)

  • Communication overhead adds to the inherent
    sequential part of the program to limit the
    Amdahl speedup
  • Latency: the round-trip time (RTT) to communicate
    between two processors
  • communications overhead:
        c = latency + data_transfer_time
  • Speedup = (s + p) / (s + c + p/N)
  • For fine-grained parallel programs the problem is
    latency, not bandwidth (a numeric illustration
    follows)
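
An illustrative calculation (assumed numbers, not from the slides): with no sequential code (s = 0), p = 1 second of parallelisable work, N = 100 processors and a total communication overhead c = 0.05 seconds,

    Speedup = (0 + 1) / (0 + 0.05 + 1/100) = 1 / 0.06 ≈ 17

instead of the ideal 100: for small messages the overhead is dominated by latency, and it caps the speedup long before the processor count does.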
21
Latency
  • Comparison: an efficient MPI implementation on a
    Linux cluster (source: Real World Computing
    Partnership, Tsukuba Research Center) - a simple
    ping-pong measurement sketch follows
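
Latency is usually quoted from a ping-pong test; a minimal sketch (illustrative - the message size and repetition count are arbitrary choices):

/* Ping-pong RTT sketch: ranks 0 and 1 bounce a one-byte message
 * back and forth; the mean round trip approximates the latency. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int reps = 1000;
    char byte = 0;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double rtt = (MPI_Wtime() - t0) / reps;

    if (rank == 0)
        printf("mean round-trip time: %g seconds\n", rtt);

    MPI_Finalize();
    return 0;
}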

22
High Throughput Computing
23
High Throughput Computing - HTC
  • Roughly speaking
  • HPC deals with one large problem
  • HTC is appropriate when the problem can be
    decomposed into many (very many) smaller problems
    that are essentially independent
  • Build a profile of all MasterCard customers who
    purchased an airline ticket and rented a car in
    August
  • Analyse the purchase patterns of Walmart
    customers in the LA area last month
  • Generate 10^6 CMS events
  • Web surfing, Web searching
  • Database queries
  • HPC - problems that are hard to parallelise:
    single processor performance is important
  • HTC - problems that are easy to parallelise:
    can be adapted to very large numbers of
    processors

24
HTC - HPC
25
Distributed Computing
26
Distributed Computing
  • Local distributed systems
  • Clusters
  • Parallel computers (IBM SP)
  • Geographically distributed systems
  • Computational Grids
  • HPC as we have seen
  • Needs low latency AND good communication
    bandwidth
  • HTC distributed systems
  • The bandwidth is important, the latency is less
    significant
  • If latency is poor, more processes can be run in
    parallel to cover the waiting time

27
Shared Data
  • If the granularity is coarse enough, the
    different parts of the problem can be
    synchronised simply by sharing data
  • Example: event reconstruction
  • all of the events to be reconstructed are stored
    in a large data store
  • processes (jobs) read successive raw events,
    generating processed event records, until there
    are no raw events left
  • the result is the concatenation of the processed
    events (and folding together some histogram data)
  • synchronisation overhead can be minimised by
    partitioning the input and output data

28
Data Sharing - Files
  • Global file namespace
  • maps universal name to network node, local name
  • Remote data access
  • Caching strategies
  • Local or intermediate caching
  • Replication
  • Migration
  • Access control, authentication issues
  • Locking issues
  • NFS
  • AFS
  • Web folders
  • Highly scalable for read-only data

29
Data Sharing - Databases, Objects
  • File sharing is probably the simplest paradigm
    for building distributed systems
  • Database and object sharing look the same
  • But
  • Files are universal, fundamental system concepts -
    standard interfaces, functionality
  • Databases are not yet fundamental, built-in
    concepts - and there are only a few standards
  • Objects even less so - still at the application
    level, so it is harder to implement efficient and
    universal caching, remote access, etc.

30
Client-server
  • Examples
  • Web browsing
  • Online banking
  • Order entry
  • ..
  • The functionality is divided between the two
    parts, for example:
  • exploit locality of data (e.g. perform searches,
    transformations on node where data resides)
  • exploit different hardware capabilities (e.g.
    central supercomputer, graphics workstation)
  • security concerns - restrict sensitive data to
    defined geographical locations (e.g. account
    queries)
  • reliability concerns (e.g. perform database
    updates on highly reliable servers)
  • Usually the server implements pre-defined,
    standardised functions

31
3-Tier client-server
32
Peer-to-Peer - P2P
  • Peer-to-Peer → decentralisation of function and
    control
  • Taking advantage of the computational resources
    at the edge of the network
  • The functions are shared between the distributed
    parts without central control
  • Programs to cooperate without being designed as a
    single application
  • So P2P is just a democratic form of parallel
    programming -
  • SETI
  • The parallel HPC problems we have looked at,
    using MPI
  • All the buzz around P2P is because new interfaces
    promise to bring this to the commercial world -
    allowing different communities and businesses to
    collaborate through the internet
  • XML
  • SOAP
  • .NET
  • JXTA

33
Simple Object Access Protocol - SOAP
  • SOAP: a simple, lightweight mechanism for
    exchanging objects between peers in a distributed
    environment, using XML carried over HTTP
  • SOAP consists of three parts:
  • The SOAP envelope - what is in a message, who
    should deal with it, and whether it is optional
    or mandatory
  • The SOAP encoding rules - serialisation
    definition for exchanging instances of
    application-defined datatypes.
  • The SOAP Remote Procedure Call representation

34
Microsoft's .NET
  • .NET is a framework, or environment for building,
    deploying and running Web services and other
    internet applications
  • Common Language Runtime - C#, C++, Visual Basic
    and JScript
  • Framework classes
  • Aiming at a standard - but Windows only

35
JXTA
  • Interoperability
  • locating JXTA peers
  • communication
  • Platform, language and network independence
  • Implementable on anything
  • phone, VCR, PDA, PC
  • A set of protocols
  • Security model
  • Peer discovery
  • Peer groups
  • XML encoding

http://www.jxta.org/project/www/docs/TechOverview.pdf
36
End of Part 1. Tomorrow: HEP applications, Offline
computing for LHC, Extending HTC to the Grid
37
HEP Applications
38
Data Handling and Computation for Physics Analysis
(data-flow diagram with the following elements:
detector; event filter (selection, reconstruction);
raw data; event summary data; processed data; event
reprocessing; event simulation; batch physics
analysis; analysis objects (extracted by physics
topic); interactive physics analysis)
39
HEP Computing Characteristics
  • Large numbers of independent events - trivial
    parallelism, job granularity
  • Modest floating point requirement - SPECint
    performance
  • Large data sets - smallish records, mostly
    read-only
  • Modest I/O rates - few MB/sec per fast processor
  • Simulation
  • cpu-intensive
  • mostly static input data
  • very low output data rate
  • Reconstruction
  • very modest I/O
  • easy to partition input data
  • easy to collect output data

40
Analysis
  • ESD analysis
  • modest I/O rates
  • read only ESD
  • BUT
  • Very large input database
  • Chaotic workload
  • unpredictable, no limit to the requirements
  • AOD analysis
  • potentially very high I/O rates
  • but modest database

41
HEP Computing Characteristics
  • Large numbers of independent events - trivial
    parallelism, job granularity
  • Large data sets - smallish records, mostly
    read-only
  • Modest I/O rates - few MB/sec per fast processor
  • Modest floating point requirement - SPECint
    performance
  • Chaotic workload
  • research environment → unpredictable, no limit to
    the requirements
  • Very large aggregate requirements - computation,
    data
  • Scaling up is not just big, it is also complex
  • and once you exceed the capabilities of a single
    geographical installation?

42
Task Farming
43
Task Farming
  • Decompose the data into large independent chunks
  • Assign one task (or job) to each chunk
  • Put all the tasks in a queue for a scheduler,
    which manages a large farm of processors, each of
    which has access to all of the data
  • The scheduler runs one or more jobs on each
    processor
  • When a job finishes the next job in the queue is
    started
  • Until all the jobs have been run
  • Collect the output files (a toy sketch of the
    pattern follows this list)
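
A toy sketch of the task-farm pattern (an illustration only - the chunk and worker counts are assumptions, and a real farm would use a batch scheduler such as the systems described in the following slides). A parent process keeps up to NWORKERS child processes busy, one per chunk, starting the next chunk as soon as a worker finishes:

/* Toy task farm: keep up to NWORKERS child processes busy,
 * one per chunk, until all chunks are done (illustrative sketch). */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define NCHUNKS  20   /* independent pieces of the problem (assumed) */
#define NWORKERS  4   /* processors available (assumed) */

static void process_chunk(int chunk)   /* placeholder for the real job */
{
    printf("worker %d processing chunk %d\n", (int)getpid(), chunk);
    sleep(1);                          /* stands in for the computation */
}

int main(void)
{
    int running = 0;

    for (int chunk = 0; chunk < NCHUNKS; chunk++) {
        if (running == NWORKERS) {     /* farm full: wait for a free slot */
            wait(NULL);
            running--;
        }
        pid_t pid = fork();
        if (pid < 0) {                 /* fork failed */
            perror("fork");
            exit(1);
        }
        if (pid == 0) {                /* child: process one chunk, exit */
            process_chunk(chunk);
            _exit(0);
        }
        running++;
    }
    while (running-- > 0)              /* collect the remaining workers */
        wait(NULL);

    return 0;
}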

44
Task Farming
  • Task farming is good for
  • a very large problem
  • Which has
  • selectable granularity
  • largely independent tasks
  • loosely shared data

HEP -- Simulation -- Reconstruction -- and much
of the Analysis
45
The SHIFT Software Model (1990)
  • standard APIs - disk I/O, mass storage, job
    scheduler
  • can be implemented over an IP network
  • mass storage model - tape data cached on disk
    (stager)
  • physical implementation transparent to the
    application/user
  • scalable, heterogeneous
  • flexible evolution - scalable capacity, multiple
    platforms, seamless integration of new
    technologies

From the application's viewpoint this is simply
file sharing - all data available to all
processes
46
Current Implementation of SHIFT
(diagram: racks of dual-cpu Linux PCs; Linux PC
controllers with IDE disks; Linux PC controllers for
tape - robots: STK Powderhorn; drives: STK 9840, STK
9940, IBM 3590; network: Ethernet 100BaseT, Gigabit)
47
Fermilab Reconstruction Farms
  • 1991 - farms of RISC workstations introduced for
    reconstruction
  • replaced special purpose processors (emulators,
    ACP)
  • Ethernet network
  • Integrated with tape systems
  • cps - job scheduler, event manager

48
Condor - a hunter of unused cycles
  • The hunter of idle workstations (1986)
  • ClassAd Matchmaking
  • users advertise their requirements
  • systems advertise their capabilities and
    constraints
  • Directed Acyclic Graph Manager - DAGMan
  • define dependencies between jobs
  • Checkpoint - reschedule - restart
  • if the owner of the workstation returns
  • or if there is some failure
  • Share data through files
  • global shared files
  • Condor file system calls
  • Flocking
  • interconnecting pools of Condor workstations

http://www.cs.wisc.edu/condor/
49
Layout of the Condor Pool
ClassAd Communication Pathway
50
How Flocking Works
  • Add a line to your condor_config
  • FLOCK_HOSTS = Pool-Foo, Pool-Bar

(diagram: Submit Machine running a Schedd; the local
Central Manager (CONDOR_HOST) with its Collector and
Negotiator; and the Pool-Foo and Pool-Bar Central
Managers)
51
Home Condor Pool
Friendly Condor Pool
52
Finer grained HTC
53
The food chain in reverse -- The PC has
consumed the market for larger computers
destroying the species -- There is no
choice but to harness the PCs
54
Berkeley - Networks of Workstations (1994)
  • Single system view
  • Shared resources
  • Virtual machine
  • Single address space
  • Global Layer Unix - GLUnix
  • Serverless Network File Service - xFS
  • Research project

A Case for Networks of Workstations: NOW, IEEE
Micro, Feb. 1995, Thomas E. Anderson, David E.
Culler, David A. Patterson -
http://now.cs.berkeley.edu
55
Beowulf
  • NASA Goddard (Thomas Sterling, Donald Becker) -
    1994
  • 16 Intel PCs, Ethernet, Linux
  • Caltech/JPL, Los Alamos
  • Parallel applications from the Supercomputing
    community
  • Oak Ridge 1996 - The Stone SouperComputer
  • problem: generate an eco-region map of the US, 1
    km grid
  • 64-way PC cluster proposal rejected
  • re-cycle rejected desktop systems
  • The experience, the emphasis on do-it-yourself,
    the packaging of some of the tools, and probably
    the name stimulated widespread adoption of
    clusters in the supercomputing world

56
Parallel ROOT Facility - Proof
  • ROOT - an object-oriented analysis tool
  • Queries are performed in parallel on an arbitrary
    number of processors
  • Load balancing
  • Slaves receive work from the Master process in
    packets
  • Packet size is adapted to the current load, number
    of slaves, etc. (see the sketch below)
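
A hedged sketch of the adaptive packet-size idea (the constants and the formula are assumptions for illustration, not PROOF's actual algorithm): the master sizes each packet as a small fraction of the work still outstanding, so fast slaves simply come back for more while slow ones are never handed too much:

/* Illustrative packet sizing for a master/worker query engine.
 * NOT the real PROOF algorithm - just the adaptive idea. */
#include <stdio.h>

static long next_packet(long events_left, int n_slaves)
{
    const long min_packet = 100;                 /* assumed lower bound */
    const long max_packet = 100000;              /* assumed upper bound */
    long size = events_left / (10L * n_slaves);  /* ~10 packets per slave */

    if (size < min_packet) size = min_packet;
    if (size > max_packet) size = max_packet;
    if (size > events_left) size = events_left;
    return size;
}

int main(void)
{
    long left = 1000000;        /* events in the query (assumed) */
    int slaves = 8;

    while (left > 0) {          /* master loop handing out work */
        long packet = next_packet(left, slaves);
        left -= packet;
        printf("assign packet of %ld events, %ld left\n", packet, left);
    }
    return 0;
}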

57
  • LHC Computing

58
CERN's Users in the World
Europe: 267 institutes, 4603 users; elsewhere:
208 institutes, 1632 users
59
The Large Hadron Collider Project - 4 detectors:
ATLAS, CMS, LHCb, ...
Storage - raw recording rate 0.1 - 1 GBytes/sec,
accumulating at 5-8 PetaBytes/year, 10 PetaBytes of
disk
Processing - 200,000 of today's fastest PCs
60
  • Worldwide distributed computing system
  • Small fraction of the analysis at CERN
  • ESD analysis using 12-20 large regional centres
  • how to use the resources efficiently
  • establishing and maintaining a uniform physics
    environment
  • Data exchange with tens of smaller regional
    centres, universities, labs

61
(chart: planned capacity evolution at CERN - CPU,
disk, and mass storage)
62
Are Grids a solution?
  • The Grid - Ian Foster, Carl Kesselman - The
    Globus Project
  • Dependable, consistent, pervasive access to
    high-end resources
  • Dependable
  • provides performance and functionality guarantees
  • Consistent
  • uniform interfaces to a wide variety of resources
  • Pervasive
  • ability to plug in from anywhere

63
The Grid
The GRID: ubiquitous access to computation, in
the sense that the WEB provides ubiquitous
access to information
64
Globus Architecture - www.globus.org
  • Applications
  • uniform application program interface to grid
    resources
  • High-level Services and Tools - GlobusView,
    Testbed Status, DUROC, globusrun, MPI, Nimrod/G,
    MPI-IO, CC++
  • Core Services - grid infrastructure primitives
    ("middleware") - GRAM, Nexus, Metacomputing
    Directory Service, Globus Security Interface,
    Heartbeat Monitor, Gloperf, GASS
  • Mapped to local implementations, architectures,
    policies
65
  • The nodes of the Grid
  • are managed by different people
  • so have different access and usage policies
  • and may have different architectures
  • The geographical distribution
  • means that there cannot be a central status
  • status information and resource availability are
    published (remember Condor Classified Ads)
  • Grid schedulers can only have an approximate view
    of resources
  • The Grid Middleware tries to present this as a
    coherent virtual computing centre

66
Core Services
  • Security
  • Information Service
  • Resource Management - Grid scheduler, standard
    resource allocation
  • Remote Data Access - global namespace, caching,
    replication
  • Performance and Status Monitoring
  • Fault detection
  • Error Recovery Management

67
The Promise of Grid Technology
  • What does the Grid do for you?
  • you submit your work
  • and the Grid
  • Finds convenient places for it to be run
  • Optimises use of the widely dispersed resources
  • Organises efficient access to your data
  • Caching, migration, replication
  • Deals with authentication to the different sites
    that you will be using
  • Interfaces to local site resource allocation
    mechanisms, policies
  • Runs your jobs
  • Monitors progress
  • Recovers from problems
  • .. and .. Tells you when your work is complete

68
LHC Computing Model (2001 - evolving)
The opportunity of Grid technology
The LHC Computing Centre