1
High-Performance Grid Computing and Research
Networking
Introduction to High Performance Computing
Instructor: S. Masoud Sadjadi
http://www.cs.fiu.edu/sadjadi/Teaching/
sadjadi at cs dot fiu dot edu
2
Acknowledgements
  • The content of many of the slides in these lecture
    notes has been adapted from online resources
    prepared previously by the people listed below.
    Many thanks!
  • Henri Casanova
  • Principles of High Performance Computing
  • http://navet.ics.hawaii.edu/casanova
  • henric@hawaii.edu
  • Ligang He
  • http://www.dcs.warwick.ac.uk/liganghe
  • Email: liganghe@dcs.warwick.ac.uk
  • Kai Wang
  • Department of Computer Science
  • University of South Dakota
  • http://www.usd.edu/Kai.Wang
  • Kyril Faenov
  • Director of High Performance Computing
  • Windows Server Group
  • Andrew Tanenbaum

3
Agenda
  • HPC Introduction
  • HPC Applications
  • HPC Goals
  • Concurrency
  • History

4
High Performance Computing
  • Difficult to define - it's a moving target.
  • In the 1980s:
  • a supercomputer performed 100 Mega FLOPS
  • FLOPS = FLoating point Operations Per Second
  • Today:
  • a 2 GHz desktop/laptop performs a few Giga FLOPS
  • a supercomputer performs tens of Tera FLOPS
    (Top500)
  • High Performance Computing: loosely, on the order
    of 1000 times more powerful than the latest
    desktops

5
Units of Measure in HPC
  • High Performance Computing (HPC) units are:
  • Flop: floating point operation
  • Flop/s: floating point operations per second
  • Bytes: size of data (a double-precision floating
    point number is 8 bytes)
  • Typical sizes are millions, billions, trillions
    (a small example follows below)
  • Mega: Mflop/s = 10^6 flop/sec, Mbyte = 10^6 bytes
  • (also 2^20 = 1,048,576)
  • Giga: Gflop/s = 10^9 flop/sec, Gbyte = 10^9 bytes
  • (also 2^30 = 1,073,741,824)
  • Tera: Tflop/s = 10^12 flop/sec, Tbyte = 10^12 bytes
  • (also 2^40 = 1,099,511,627,776)
  • Peta: Pflop/s = 10^15 flop/sec, Pbyte = 10^15 bytes
  • (also 2^50 = 1,125,899,906,842,624)
  • Exa: Eflop/s = 10^18 flop/sec, Ebyte = 10^18 bytes
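
A minimal Python sketch (not from the slides) of working with these units; the machine speed and job size below are made-up values:

```python
# Minimal sketch: decimal HPC prefixes and a made-up machine/job size.
PREFIX = {"Mega": 1e6, "Giga": 1e9, "Tera": 1e12, "Peta": 1e15, "Exa": 1e18}

machine_rate = 10 * PREFIX["Tera"]   # assumed: a 10 Tflop/s supercomputer
job_size     = 5  * PREFIX["Peta"]   # assumed: a job needing 5 Pflop in total

print(f"run time: {job_size / machine_rate:.0f} seconds")    # 500 seconds

# Decimal vs binary storage prefixes: 1 GB = 10**9 bytes, 1 GiB = 2**30 bytes.
print(f"1 GiB / 1 GB = {2**30 / 1e9:.3f}")                   # about 1.074
```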

6
Metric Units
  • The principal metric prefixes.

7
High Performance Computing
  • HPC
  • The term high performance computing (HPC) refers
    to the use of (parallel) supercomputers and
    computer clusters, that is, computing systems
    comprised of multiple (usually mass-produced)
    processors linked together in a single system
    with commercially available interconnects.
  • Wikipedia
  • This is in contrast to mainframe computers,
    which are generally monolithic in nature.
  • Wikipedia

8
High Performance Computing
  • HPC
  • The more current and evolving definition of HPC
    refers to High Productivity Computing, and
    reflects the purpose and use model of the myriad
    of existing and evolving architectures, and the
    supporting ecosystem of software, middleware,
    storage, networking and tools behind the next
    generation of applications.
  • Wikipedia
  • Parallel Computing
  • Computing on parallel computers
  • Supercomputing
  • Computing on Top500-class machines

9
High Performance Computing
  • The definition that we use in this course:
  • How do we make computers compute bigger
    problems faster?
  • Three main issues
  • Hardware: How do we build faster computers?
  • Software: How do we write faster programs?
  • Hardware and Software: How do they interact?
  • Many perspectives
  • architecture
  • systems
  • programming
  • modeling and analysis
  • simulation
  • algorithms and complexity

10
High Performance Computing
  • HPC Related Technologies
  • HPC is an all-encompassing term for related
    technologies that continually push computing
    boundaries.
  • Computer architecture
  • CPU, memory, VLSI
  • Compilers
  • Identify inefficient implementations
  • Make use of the characteristics of the computer
    architecture
  • Choose suitable compiler for a certain
    architecture
  • Algorithms (for parallel and distributed systems)
  • How to program on parallel and distributed
    systems
  • Middleware
  • From Grid computing technology
  • Application -> middleware -> operating system
  • Resource discovery and sharing

11
High Performance Computing
  • The key technique for making computers compute
    bigger problems faster is to use multiple
    computers at once
  • Later in this lecture, we will learn why!
  • This is called parallelism
  • "It takes 1000 hours for this program to run on
    one computer!"
  • "Well, if I use 100 computers, maybe it will take
    only 10 hours?!"
  • "This computer can only handle a dataset that's
    2GB!"
  • "So maybe if I use 100 computers I can deal with a
    200GB dataset?!"
  • We will spend enough time to learn and experience
    different flavors of parallel computing
  • shared-memory parallelism
  • distributed-memory parallelism
  • hybrid parallelism

12
Agenda
  • HPC Introduction
  • HPC Applications
  • HPC Goals
  • Concurrency
  • History

13
Words of Wisdom
  • "Four or five computers should be enough for the
    entire world until the year 2000."
  • T.J. Watson, Chairman of IBM, 1945.
  • "640KB of memory ought to be enough for
    anybody."
  • Bill Gates, Chairman of Microsoft, 1981.
  • You may laugh at their vision today, but...
  • Lesson learned: Don't be too visionary and try to
    make things work! :)
  • We now know this was not quite true!
  • Games
  • Digital video/images
  • Databases
  • Operating systems
  • But the first people to really need more
    computing oomph were scientists
  • And they go way back

14
Evolution of Science
  • Traditional scientific and engineering approach:
  • Do theory or paper design
  • Perform experiments or build a system
  • Limitations
  • Too difficult -- build large wind tunnels
  • Too expensive -- build a throw-away airplane
  • Too slow -- wait for climate or galactic
    evolution
  • Too dangerous -- weapons, drug design, climate
    experiments
  • Solution
  • Use high performance computer systems to simulate
    the phenomenon

15
Scientific Computing
  • Use of computers to solve/compute scientific
    models
  • For instance, many natural phenomena can be well
    approximated by differential equations
  • Classic Example: Heat Transfer
  • Consider a 1-D material between 2 heat sources

[Figure: a 1-D bar along coordinate x, with one end held at temperature T_H and the other at T_L]
16
Scientific Computing
  • Use of computers to solve/compute scientific
    models
  • For instance, many natural phenomena can be well
    approximated by partial differential equations
    (PDEs)
  • Problem: compute f(x,t)

[Figure: the same 1-D bar, with its ends held at T_H and T_L]
f(x,t) = temperature at location x at time t, 0 < x < X
17
Heat Transfer
  • The laws of physics say that
    ∂f/∂t = α ∂²f/∂x²  (the 1-D heat equation)
  • where α depends on the material
  • where f(0,t) = H, f(X,t) = L, and f(x,0) are all
    fixed
  • Called the boundary conditions
  • Question: How do we solve this PDE?
  • It does not have an analytical solution
  • Therefore it must be solved numerically (i.e.,
    via approximation)

18
Heat Transfer
  • One well-known method to solve the heat equation
    is called finite differences
  • Approach
  • Discretize the domain: decide that the values of
    f(x,t) will only be known for some finite (but
    large) number of values of x and t
  • The discretized domain is called a mesh
  • All x values are separated by Δx
  • All t values are separated by Δt
  • Then, one replaces partial derivatives by
    algebraic differences
  • In the limit, when Δx and Δt go to zero, we get
    close to the real solution

19
Heat Transfer
  • There are many different approximations of the
    partial derivatives, based on Taylor series
    expansions, etc.
  • For instance, denoting f(x,t) as (discrete) f_{i,m},
    we can write the Forward Time, Centered Space
    (FTCS) heat transfer equation as
    f_{i,m+1} = f_{i,m} + (α Δt / Δx²) (f_{i+1,m} − 2 f_{i,m} + f_{i−1,m})
  • The various discretizations of the heat transfer
    equation have advantages and drawbacks in terms
    of
  • complexity
  • numerical stability
  • (if you're into it, there are countless papers
    and textbooks)
  • We have transformed a difficult PDE into a simple
    algebraic recurrence!
  • Easy to compute in an iterative fashion
  • Given all the values at time m, one can compute
    all the values at time m+1 (see the sketch below)
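
A minimal Python sketch of this FTCS iteration; the grid size, α, and the boundary temperatures below are made-up values, not from the slides:

```python
import numpy as np

# Minimal FTCS sketch for the 1-D heat equation df/dt = alpha * d^2f/dx^2.
# Grid size, alpha, and the boundary temperatures H and L are made-up values.
alpha, H, L = 1.0, 100.0, 0.0
nx, nt = 50, 2000
dx = 1.0 / (nx - 1)
dt = 0.4 * dx**2 / alpha            # small enough for numerical stability (<= 0.5*dx^2/alpha)

f = np.zeros(nx)                    # f(x, 0) = 0 in the interior
f[0], f[-1] = H, L                  # boundary conditions: f(0,t) = H, f(X,t) = L

for m in range(nt):                 # given all values at time m, compute time m+1
    f[1:-1] += alpha * dt / dx**2 * (f[2:] - 2 * f[1:-1] + f[:-2])

print(f.round(1))                   # relaxes toward the linear profile from H to L
```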

20
Heat Transfer
  • Summary
  • But they all use some matrix or volume of numbers
    (in the 2-D and 3-D cases) and iteratively do
    additions, multiplications and divisions, for
    many iterations
  • Therefore, we can replace difficult calculus by
    simple computations on multi-dimensional arrays
    of numbers
  • Challenges
  • These matrices may be really big, for better
    resolution and larger domains -> Large Data
  • The number of additions and multiplications can
    be overwhelming -> Heavy Computation
  • Hence
  • the early and constant need of scientists for
    bigger memories and faster CPUs

21
HPC Applications
  • Science
  • Global climate modeling
  • Astrophysical modeling
  • Biology: genomics, protein folding, drug design
  • Computational Chemistry
  • Computational Material Sciences and Nanosciences
  • Engineering
  • Crash simulation
  • Semiconductor design
  • Earthquake and structural modeling
  • Computational fluid dynamics (airplane design)
  • Combustion (engine design)
  • Business
  • Financial and economic modeling
  • Transaction processing, web services and search
    engines
  • Defense
  • Nuclear weapons -- test by simulation
  • Cryptography

22
Example: Computational Fluid Dynamics (CFD)
Replacing NASA's wind tunnels with computers
23
Example Global Climate
  • The problem is to compute
  • f(latitude, longitude, elevation, time) ->
  • temperature, pressure,
    humidity, wind velocity
  • Approach
  • Discretize the domain, e.g., a measurement point
    every 10 km
  • Devise an algorithm to predict the weather at time
    t+1 given time t
  • Uses
  • Predict El Niño
  • Set air emissions standards

24
Global Climate Requirements
  • One piece is modeling the fluid flow in the
    atmosphere
  • Solve the Navier-Stokes equations
  • Roughly 100 flops per grid point with a 1-minute
    timestep
  • Computational requirements (a back-of-the-envelope
    check follows below)
  • To match real time, need 5×10^11 flops in 60
    seconds = 8 Gflop/s
  • Weather prediction (7 days in 24 hours) -> 56
    Gflop/s
  • Climate prediction (50 years in 30 days) -> 4.8
    Tflop/s
  • To use in policy negotiations (50 years in 12
    hours) -> 288 Tflop/s
  • Let's make it even worse!
  • To double the grid resolution, computation is > 8x
  • State-of-the-art models require integration of
    atmosphere, ocean, sea-ice, and land models, plus
    possibly carbon cycle, geochemistry and more
  • Current models are coarser than this!
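
A quick Python sketch (not from the slides) reproducing these estimates. It uses the slide's figure of 5×10^11 flops per minute of simulated time; the results come out slightly above the slide's numbers only because the slide rounds the real-time rate down to 8 Gflop/s:

```python
# Back-of-the-envelope check of the slide's climate-modeling requirements.
FLOPS_PER_SIM_MINUTE = 5e11   # from the slide: ~100 flops/grid point, 1-minute timestep

def required_rate(simulated_seconds, wall_clock_seconds):
    """Sustained flop/s needed to simulate the given span within the given wall-clock time."""
    return FLOPS_PER_SIM_MINUTE * (simulated_seconds / 60) / wall_clock_seconds

day, year = 86400, 365 * 86400
print(f"real time:           {required_rate(60, 60) / 1e9:6.1f} Gflop/s")              # ~8.3
print(f"7 days in 24 hours:  {required_rate(7 * day, day) / 1e9:6.1f} Gflop/s")         # ~58
print(f"50 years in 30 days: {required_rate(50 * year, 30 * day) / 1e12:5.1f} Tflop/s") # ~5.1
print(f"50 years in 12 hrs:  {required_rate(50 * year, 12 * 3600) / 1e12:5.1f} Tflop/s")# ~304
```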

25
High-Resolution Climate Modeling on NERSC-3, P.
Duffy et al., LLNL
26
1000-year climate
  • Demonstration of the Community Climate System
    Model (CCSM2)
  • A 1000-year simulation shows long-term, stable
    representation of the earth's climate.
  • 760,000 processor hours used
  • Temperature change shown

Warren Washington and Jerry Meehl, National Center
for Atmospheric Research; Bert Semtner, Naval
Postgraduate School; John Weatherly, U.S. Army Cold
Regions Research and Engineering Laboratory; et al.
http://www.nersc.gov/aboutnersc/pubs/bigsplash.pdf
27
Agenda
  • HPC Introduction
  • HPC Applications
  • HPC Goals
  • Concurrency
  • History

28
Goals of HPC
  • Minimize turn-around time
  • to complete specific application problems (strong
    scaling)
  • Maximize the problem size
  • that can be solved in a given amount of time
    (weak scaling)
  • Identify the compromise between
  • performance and cost.
  • Note: Most supercomputers are obsolete
  • in terms of performance before the end of their
    physical life.

29
Maximizing Performance
  • How is performance maximized?
  • (1) Reduce the time per instruction (cycle time),
    i.e., increase the clock rate.
  • (2) Increase the number of instructions executed
    per cycle, e.g., via pipelining.
  • (3) Allow multiple processors to work on different
    parts of the same program at the same time:
    parallel execution.
  • When performance is gained from (1) and (2):
  • There is a limit to how fast processors can
    operate.
  • Speed of light and electricity.
  • Heat dissipation.
  • Power consumption.
  • Instruction processing cannot be divided into
    infinitely many pipeline stages.
  • When performance improvements come from (3):
  • Overhead of communication.

30
A 10 TFlop/s CPU?
  • Question: Could we build a single CPU that
    delivers 10,000 billion floating point operations
    per second (10 TFlop/s), and operates over 10,000
    billion bytes (10 TByte)?
  • Representative of what many scientists need
    today.
  • The clock rate would have to be 10,000 GHz
  • Assume that data travels at the speed of light
  • Assume that the computer is an ideal sphere

31
A 10 TFlop/s CPU?
  • Assume that the machine issues one instruction
    per cycle
  • therefore the clock rate must be 10,000 GHz =
    10^13 Hz
  • Data must travel some distance from the memory to
    the CPU
  • Assume that each instruction needs at least one
    8-byte word of memory
  • Assume that data travels at the speed of light,
    c = 3×10^8 m/s
  • Then the distance between the memory and the CPU
    must be r < c / 10^13 ≈ 3×10^-6 m
  • Then we must fit 10^13 bytes of memory in
    4/3 π r³ ≈ 3.7×10^-17 m³
  • Therefore, each word of memory must occupy about
    3.7×10^-30 m³
  • This is 3.7 cubic Angstroms (Å³)
  • or the volume of a very small molecule that
    consists of only a few atoms
  • Current memory densities are about 10 GB/cm³,
  • or about a factor of 10^20 from what would be
    needed!
  • Conclusion: it's not going to happen until some
    sci-fi breakthrough happens

32
Agenda
  • HPC Introduction
  • HPC Applications
  • HPC Goals
  • Concurrency
  • History

33
Concurrency
  • Since we cannot conceivably build a single CPU to
    solve relevant scientific problems, we resort to
    concurrency
  • execution of multiple tasks at the same time
  • Concurrency is everywhere in computers
  • Load a word from memory while adding two
    registers
  • Adding two pairs of registers at the same time
  • Receiving data from the network while writing to
    disk
  • Dual-proc systems
  • Clusters of workstations
  • SETI@home
  • Some concurrency is true
  • meaning that things really happen at the same
    time
  • Some concurrency is just the illusion
  • of simultaneous execution, with rapid switching
    among activities

34
Concurrent, parallel, distributed?
  • Concurrency is typically the more general term
  • A program is said to be concurrent if it contains
    more than one execution context
  • e.g., more than one thread/process
  • Typically the word parallel implies some notion
    of high performance / scientific application
    running on a single hardware platform
  • The word distributed typically refers to
    applications that run on multiple computers that
    may not be in the same room
  • These terms are conflated and misused all the
    time; in different research communities they mean
    different things.
  • We'll see that the distinctions are disappearing
    anyway

35
Two Types of HPC
  • Parallel Computing
  • Breaking the problem to be computed into parts
    that can be run simultaneously on different
    processors
  • Distributed Computing
  • Parts of the work to be computed are computed in
    different places
  • Note: does not necessarily imply simultaneous
    processing
  • An example: the client/server (C/S) model
  • Solves loosely-coupled problems
  • (not much communication)

36
Parallel Computing
  • Architectures of Parallel Computing
  • SMP (Symmetric Multi-Processing)
  • Multiple CPUs, single memory, shared I/O
  • All resources in an SMP machine are equally
    available to each CPU
  • Does not scale well to a large number of
    processors (typically fewer than 8)
  • NUMA (Non-Uniform Memory Access)
  • Multiple CPUs
  • Each CPU has fast access to its local area of the
    memory, but slower access to other areas
  • Scales well to a large number of processors
  • Complicated memory access pattern
  • MPP (Massively Parallel Processing)
  • Cluster

37
Reasons for Concurrency
  • Concurrency arises for at least 4 reasons
  • To increase performance or memory capacity
  • To allow users and computers to collaborate
  • To capture the logical structure of a problem
  • To cope with independent physical devices

38
Reason 1
  • To increase performance

39
Reason 1 (cont.)
  • To increase memory capacity
  • Example
  • A 3-D weather simulation over Kaneohe Bay (1-meter
    resolution)
  • Say we consider a volume 2km x 2km x 1km over the
    bay
  • Each zone is characterized by, say, temperature,
    wind direction, wind velocity, air pressure, and
    air moisture, for a total of (1+3+1+1+1) x 8 = 56
    bytes
  • Therefore we need about 208 GB of memory to hold
    the data (the arithmetic is sketched below)
  • Option 1: Buy a machine with > 208 GB of RAM
  • A 96GB server from Sun: about 1 million dollars!
  • They have a 288GB configuration (contact them for
    a price)
  • There is a 3TB shared-memory SGI machine at NCSA
  • Option 2: Couple individual machines together
  • Buy 52 4-GB PowerEdge servers from Dell for $2.5K
    each
  • Slap some network on them and you've got enough
    memory
  • total cost about $200K
  • But it's not as simple as that!
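
A quick Python check of the memory estimate above; the per-zone field breakdown follows the slide:

```python
# Memory estimate for the Kaneohe Bay example: 2 km x 2 km x 1 km at 1-m resolution.
nx, ny, nz = 2000, 2000, 1000
zones = nx * ny * nz                        # 4 billion zones
bytes_per_zone = (1 + 3 + 1 + 1 + 1) * 8    # temperature, wind direction (3 components),
                                            # wind velocity, pressure, moisture: 7 doubles
total_bytes = zones * bytes_per_zone

print(f"total memory: {total_bytes / 2**30:.1f} GiB")            # ~208.6 GiB
print(f"4-GB machines needed: {total_bytes / (4 * 2**30):.0f}")  # ~52
```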

40
Reason 1 (cont.)
The Laser Interferometer Gravitational-Wave
Observatory (LIGO): tiny distortions of space and
time caused when very large masses, such as stars,
move suddenly. 1 TB/day (1024 GB/day), year-long
experiments.
The Compact Muon Solenoid at CERN, designed to
study proton-proton collisions with high-quality
measurements (12,000 tons): 10 GB/sec!!! Many
PB/year (1024 TB/year)
41
Reason 2
  • To allow users and computers to collaborate
  • Example
  • Assume that we want to allow users to do on-line
    purchases
  • We need Web browsers, Web servers, Database
    servers
  • All these are processes
  • They all communicate with multiple processes
    simultaneously; they are all multithreaded,
    running on multiple machines; some of them are
    multi-processor servers
  • It's just a big concurrent system, and it is
    critical that it be fast and correct!

42
Reason 3
  • To capture the logical structure of a problem
  • Example
  • Let's assume that we want to write a program that
    simulates the interactions between a robot and
    living entities
  • We can implement the robot as its own thread
  • The code is just the code of the robot
  • We can implement each entity as its own thread
  • The code is the simulation of the entity's
    behavior
  • Now we let them loose at the beginning
  • They may meet, interact, etc.
  • All of this happens without a central notion of
    control, although it may all be running on a
    single CPU
  • Concurrency just fits the problem (a small sketch
    follows below)
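
A minimal Python sketch of this idea (the entity names and behavior are made up): each simulated entity is a thread whose code is simply its own behavior loop.

```python
import threading, time, random

class Entity(threading.Thread):
    """Each entity runs as its own thread: its code is just its behavior loop."""
    def __init__(self, name, steps=3):
        super().__init__()
        self.name, self.steps = name, steps

    def run(self):
        for _ in range(self.steps):
            time.sleep(random.uniform(0.01, 0.05))  # the entity lives at its own pace
            print(f"{self.name} acts")

entities = [Entity("robot"), Entity("cat"), Entity("person")]
for e in entities:
    e.start()
for e in entities:
    e.join()   # the interleaving emerges without any central notion of control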

43
Reason 4
  • To cope with independent physical devices
  • Example
  • Let's assume that we want to write a program that
    receives data from the network, processes it, and
    writes the output to disk
  • We can read from the network and write to disk at
    the same time, almost for free
  • We can compute on the data while we receive from
    the network, almost for free
  • We can compute on the data while we write the
    previously computed data to disk, almost for
    free
  • We are better off writing this program as three
    concurrent threads (even if on a single CPU)
  • Each thread uses one independent device of the
    computer (a small sketch follows below)
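
A minimal Python sketch of the three-thread structure; the "network" and "disk" are simulated here, and real I/O calls would take their place:

```python
import threading, queue

raw, results = queue.Queue(), queue.Queue()

def receive():                        # stands in for the network-reading thread
    for i in range(5):
        raw.put(i)
    raw.put(None)                     # end-of-stream marker

def process():                        # computes while the other threads do I/O
    while (item := raw.get()) is not None:
        results.put(item * item)      # stand-in for the real computation
    results.put(None)

def write():                          # stands in for the disk-writing thread
    while (item := results.get()) is not None:
        print("wrote", item)

threads = [threading.Thread(target=f) for f in (receive, process, write)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```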

44
Agenda
  • HPC Introduction
  • HPC Applications
  • HPC Goals
  • Concurrency
  • History

45
A brief history of concurrency
  • The first machines were used in single-user mode
  • The user would declare: "I am going to use the
    machine from 2 PM till 4 PM"
  • Then the user would go into the special machine
    room and sit there for 2 hours
  • The user punches in the cards, which were prepared
    in advance
  • The user tries to run the program
  • The user tries to debug the program
  • etc., etc.
  • Extreme lack of productivity
  • During the user's thinking time, the
    multi-million-dollar machine practically does
    nothing!

46
A brief history of concurrency
  • Batch Processing!
  • Instead of reserving the machine for a period of
    time to do all the activities (including
    debugging), the user just submits requests to a
    queue
  • The queue serves requests in order (possibly with
    priorities)
  • When a program fails and stops, another program
    is scheduled to use the machine immediately
  • Great! But what about the CPU idle time during
    I/O?

47
A brief history of concurrency
48
A brief history of concurrency
  • Multi-programming!
  • Multiple programs reside in memory at once
  • Required interrupts and memory protection
  • Interrupts are used to switch programs between
    devices and CPUs
  • Concurrency issues in the O/S
  • race conditions, deadlocks, critical sections
  • semaphores, monitors, etc.
  • beginnings of the theory of concurrent systems (1960s)
  • Increase in memory size
  • Development of virtual memory

49
A brief history of concurrency
  • Multiprogramming system
  • three jobs in memory

50
A brief history of concurrency
  • Time-sharing!
  • For fast, interactive response, one needs fast
    context switching
  • Makes it possible to have the illusion that one
    is alone on a (perhaps slower) machine
  • Already common by 1970
  • Led to concurrency in user applications!
  • The user's application is logically two
    concurrent tasks
  • The user can now implement it as two concurrent
    tasks!

51
A brief history of concurrency
  • Technology advances!
  • Multiple CPUs on a motherboard
  • faster buses, shared-memory, cache coherency
  • Networked computers
  • distributed memory
  • Clusters, ..., Internet
  • Concurrency across CPUs
  • Also: concurrency within the CPU, at the hardware
    level
  • Beyond CPU and I/O devices
  • Multiple units (e.g., ALUs)
  • Vector processors
  • Pipelining

52
History, Another Perspective
  • 1960s: Scalar processors
  • Process one data item at a time
  • 1970s: Vector processors
  • Can process an array of data items in one go
  • Architecture
  • Overhead
  • Late 1980s: Massively Parallel Processing (MPP)
  • Up to thousands of processors, each with its own
    memory and OS
  • Break down a problem into parts
  • Late 1990s: Clusters
  • Not a new term itself, but renewed interest
  • Connecting stand-alone computers with a
    high-speed network
  • Late 1990s: Grids
  • Tackle collaboration
  • Draw an analogy with the power grid

53
Issues with concurrency
  • Concurrency appears at all levels of current
    systems
  • hardware
  • O/S
  • Application
  • Many fields of computer science study concurrency
    issues
  • Three main issues
  • Performance
  • Correctness
  • Programmability

54
Many connected areas
  • Computer architecture
  • Networking
  • Operating Systems
  • Scientific Computing
  • Theory of Distributed Systems
  • Theory of Algorithms and Complexity
  • Scheduling
  • Internetworking
  • Programming Languages
  • Distributed Systems
  • High Performance Computing