Principles of High Performance Computing ICS 632


1
Principles of High Performance Computing (ICS
632)
  • Henri Casanova
  • http://navet.ics.hawaii.edu/casanova
  • henric@hawaii.edu

2
What is this course about?
  • High Performance Computing: How do we make
    computers compute bigger problems faster?
  • This field is both old and new, very diverse,
    possibly complicated, interesting
  • Three main issues
  • Hardware: How do we build faster computers?
  • Software: How do we write faster programs?
  • Hardware and Software: How do they interact?
  • Many perspectives
  • architecture
  • systems
  • programming
  • modeling and analysis
  • simulation
  • algorithms and complexity

3
What is this class about?
  • The key technique for making computers compute
    bigger problems faster is to use multiple
    computers at once
  • This is called parallelism
  • It takes 1000 hours for this program to run; if I
    use 100 computers, maybe it will take only 10
    hours
  • This computer can only handle a dataset that's
    2GB, so maybe if I use 100 computers I can deal
    with a 200GB dataset
  • We will spend most of the class talking about and
    experiencing different flavors of parallel
    computing
  • shared-memory parallelism (a little bit - see
    ICS432)
  • distributed-memory parallelism
  • hybrid parallelism

4
What are we going to do?
  • This class is both hands-on . . .
  • We will write parallel programs and execute them
    on parallel architectures
  • . . . and about some algorithms.
  • We will have
  • programming assignments
  • reading assignments with discussions
  • a project
  • a paper presentation
  • a final exam

5
Programming Assignments
  • To be done in 1 or 2 weeks
  • To make sure that you know how to use the
    techniques/tools that we'll describe in class
  • To make sure that you'll be able to do your
    project
  • Some will be competitive

6
Pencil-Paper Assignments
  • We'll have a few assignments that will not
    involve any programming but only reasoning about
    algorithms

7
Reading Assignments
  • You will be assigned 1-2 papers to read during
    the semester
  • We'll discuss these papers in class and we'll
    answer questions about them
  • There will be a few questions about the papers on
    the final exam

8
Projects
  • During several weeks (to be determined depending
    on topics, but perhaps as much as half of the
    semester for some students), you will do a
    project
  • You can pick the topic yourself (but I must
    approve it)
  • based on your own research
  • based on work for another class
  • based on something that caught your interest
    during this class
  • I'll have a list of canned project topics
  • some from my own research, perhaps
  • some that I've been curious about
  • some that are classic but that are still
    interesting
  • The intent is not for the project to be huge, but
    to go further than just a simple programming
    assignment

9
Projects
  • You will turn in a (brief) project report
  • What you did and how you did it
  • What performance you achieved running what kind
    of experiments
  • More details on this as we go through the
    programming assignments as warm-ups.
  • You will do a (brief) class presentation of your
    project
  • Just so that all students know what others have
    done
  • What's interesting is to highlight what was
    difficult, what was fun, what was a pain, what
    you would do differently, what tool you wish you
    had had, etc.
  • Again, more detail on this later in the semester

10
Paper Presentation
  • You will pick one research paper from a list that I
    will provide
  • classic papers or recent exciting papers
  • You will give a (1/2 hour) class presentation
    that will be the beginning of a class discussion
  • What the paper is trying to do
  • How it tries to do it
  • What the conclusion is
  • What you didn't understand
  • What you thought was good/bad
  • What makes a non-boring presentation?
  • Raising issues
  • Being critical
  • Being the devil's advocate

11
Paper Presentation
  • I know that you're not researchers in this field
    (yet), so you may not understand everything
  • You can come ask me
  • We'll discuss them in class
  • I'll have read the papers :)
  • Why presentations?
  • Because as graduate students you should gain
    experience doing it
  • Because hearing what somebody may not understand
    about a paper is a great way to learn
  • Because it's a good way for students to hear
    about more things than they could possibly
    learn/read during the semester on their own

12
Class Website
  • URL: http://navet.ics.hawaii.edu/casanova/courses/ics632_fall08
  • Contains
  • Syllabus
  • Lecture notes
  • With my slides posted regularly
  • Typically one week before the lecture
  • List of assignments, projects, papers
  • To appear when needed

13
The Textbook
  • The textbook is more on the theoretical aspect of
    the course
  • This is the first printing, so let me know all
    typos and mistakes!

14
Grading
  • Grading will be based on
  • Project (20%)
  • Programming assignments (40%)
  • Paper presentation (15%)
  • Final exam (25%)

15
QUESTIONS?
16
Words of Wisdom
  • "640KB of memory ought to be enough for
    anybody."
  • Bill Gates, Chairman of Microsoft, 1981.
  • We now know this was not _quite_ true
  • Games
  • Digital video/images
  • Databases
  • Operating systems
  • But the first people to really need more
    computing oomph were scientists
  • And they go way back

17
Evolution of Science
  • Traditional scientific and engineering approach
  • Do theory or paper design
  • Perform experiments or build system
  • Limitations
  • Too difficult -- build large wind tunnels
  • Too expensive -- build a throw-away airplane
  • Too slow -- wait for climate or galactic
    evolution
  • Too dangerous -- weapons, drug design, climate
    experiments
  • Solution
  • Use high performance computer systems to simulate
    the phenomenon

18
Scientific Computing
  • Use of computers to solve/compute scientific
    models
  • For instance, many natural phenomena can be well
    approximated by differential equations
  • Classic Example: Heat Transfer
  • Consider a 1-D material between 2 heat sources

[Figure: 1-D material along the x axis, between heat sources at T = H and T = L]
19
Scientific Computing
  • Use of computers to solve/compute scientific
    models
  • For instance, many natural phenomena can be well
    approximated by partial differential equations
    (PDEs)
  • Classic Example: Heat Transfer
  • Consider a 1-D material between 2 heat sources
  • Problem: compute f(x,t)

[Figure: 1-D material between heat sources at T = H and T = L;
f(x,t) = temperature at location x at time t, for 0 < x < X]
20
Heat Transfer
  • The laws of physics say that
      $$\frac{\partial f}{\partial t} = \alpha \frac{\partial^2 f}{\partial x^2}$$
  • where alpha depends on the material
  • where f(0,t) = H, f(X,t) = L and f(x,0) are all
    fixed
  • Called the boundary conditions
  • Question: How do we solve this PDE?
  • In general, it does not have a closed-form
    analytical solution
  • Therefore it must be solved numerically (i.e.,
    via approximation)

21
Heat Transfer
  • One well-known method to solve the heat equation
    is called finite differences
  • Approach
  • Discretize the domain: decide that the values of
    f(x,t) will only be known for some finite (but
    large) number of values of x and t
  • The discretized domain is called a mesh
  • All x values are separated by Δx
  • All t values are separated by Δt
  • Then, one replaces partial derivatives by
    algebraic differences
  • In the limit, when Δx and Δt go to zero, we get
    close to the real solution

22
Heat Transfer
  • There are many different approximations of the
    partial derivatives, based on Taylor series
    developments, etc.
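  • For instance, denoting f(x,t) as (discrete) f_{i,m},
    one can write the Forward Time, Centered Space
    (FTCS) heat transfer equation as
      $$\frac{f_{i,m+1} - f_{i,m}}{\Delta t} = \alpha \frac{f_{i+1,m} - 2 f_{i,m} + f_{i-1,m}}{\Delta x^2}$$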
  • The various discretizations of the heat transfer
    equation have advantages and drawbacks in terms
    of
  • complexity
  • numerical stability
  • (if you're into it, there are countless papers
    and textbooks)
  • We have transformed a difficult PDE into some type
    of algebraic induction!
  • Easy to compute in an iterative fashion
  • Given all the values at time m, one can compute
    all the values at time m+1
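
The iteration described on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's code: the function names, the zero initial interior temperature, and the parameter values are assumptions made for the example. It uses the explicit FTCS update f[i] at time m+1 = f[i] + r (f[i+1] - 2 f[i] + f[i-1]) with r = alpha Δt / Δx², and refuses to run when r > 0.5, the standard stability limit of this scheme.

```python
def ftcs_step(f, r):
    """Advance the mesh by one time step; r = alpha * dt / dx**2."""
    new = f[:]  # copying keeps the two fixed boundary values
    for i in range(1, len(f) - 1):
        new[i] = f[i] + r * (f[i + 1] - 2.0 * f[i] + f[i - 1])
    return new


def solve_heat(n_points, n_steps, high, low, alpha, dx, dt):
    """Iterate FTCS from an (assumed) all-zero interior at time 0."""
    r = alpha * dt / dx**2
    if r > 0.5:
        raise ValueError("FTCS is unstable when alpha*dt/dx^2 > 0.5")
    f = [0.0] * n_points
    f[0], f[-1] = high, low  # boundary conditions f(0,t) = H, f(X,t) = L
    for _ in range(n_steps):
        f = ftcs_step(f, r)
    return f


if __name__ == "__main__":
    # Hypothetical parameters: 11 mesh points on a rod of length 1.
    profile = solve_heat(n_points=11, n_steps=2000, high=100.0, low=0.0,
                         alpha=1.0, dx=0.1, dt=0.004)
    print([round(v, 1) for v in profile])
```

With enough steps the profile approaches a straight line between H and L, which is the steady state of the 1-D heat equation with fixed end temperatures.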

23
Heat Transfer
  • But they all use some matrix or volume of numbers
    (in the 2-D and 3-D cases) and iteratively do
    additions, multiplications and divisions, for
    many iterations
  • We have replaced difficult calculus by simple
    computations on multi-dimensional arrays of
    numbers
  • Unfortunately there are two new challenges
  • These matrices may be really big, for better
    resolution and larger domains (e.g., space shuttle
    hull)
  • The number of additions and multiplications can
    be staggering
  • Hence the early and always constant need of
    scientists to get bigger memories and faster CPUs

24
Challenging HPC computations
  • Science
  • Global climate modeling
  • Astrophysical modeling
  • Biology: genomics, protein folding, drug design
  • Computational Chemistry
  • Computational Material Sciences and Nanosciences
  • Engineering
  • Crash simulation
  • Semiconductor design
  • Earthquake and structural modeling
  • Computational fluid dynamics (airplane design)
  • Combustion (engine design)
  • Business
  • Financial and economic modeling
  • Transaction processing, web services and search
    engines
  • Defense
  • Nuclear weapons -- test by simulation
  • Cryptography

25
Units of Measure in HPC
  • High Performance Computing (HPC) units are
  • Flops: floating point operations
  • Flop/s: floating point operations per second
  • Bytes: size of data (a double precision floating
    point number is 8 bytes)
  • Typical sizes are millions, billions, trillions
  • Mega: Mflop/s = 10^6 flop/sec, Mbyte = 10^6 byte
  • (also 2^20 = 1048576)
  • Giga: Gflop/s = 10^9 flop/sec, Gbyte = 10^9 byte
  • (also 2^30 = 1073741824)
  • Tera: Tflop/s = 10^12 flop/sec, Tbyte = 10^12 byte
  • (also 2^40 = 1099511627776)
  • Peta: Pflop/s = 10^15 flop/sec, Pbyte = 10^15 byte
  • (also 2^50 = 1125899906842624)
  • Exa: Eflop/s = 10^18 flop/sec, Ebyte = 10^18 byte
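
As a quick illustration of these units, the sketch below tabulates the decimal and binary prefixes and does two small conversions. The helper names are made up for this example.

```python
# Decimal (SI) and binary interpretations of the same prefixes.
DECIMAL = {"Mega": 10**6, "Giga": 10**9, "Tera": 10**12,
           "Peta": 10**15, "Exa": 10**18}
BINARY = {"Mega": 2**20, "Giga": 2**30, "Tera": 2**40,
          "Peta": 2**50, "Exa": 2**60}

BYTES_PER_DOUBLE = 8  # one double precision floating point number


def doubles_that_fit(num_bytes):
    """How many double precision numbers fit in num_bytes of memory."""
    return num_bytes // BYTES_PER_DOUBLE


def seconds_to_run(flops, flop_per_sec):
    """Ideal running time of a computation at a sustained rate."""
    return flops / flop_per_sec


if __name__ == "__main__":
    for name in DECIMAL:
        print(name, DECIMAL[name], BINARY[name])
    print(doubles_that_fit(DECIMAL["Giga"]))                # doubles in 1 Gbyte
    print(seconds_to_run(DECIMAL["Tera"], DECIMAL["Giga"]))  # 1 Tflop at 1 Gflop/s
```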

26
Example: CFD
Replacing NASA's Wind Tunnels with Computers
27
Example: Global Climate
  • Problem is to compute
  • f(latitude, longitude, elevation, time) →
  • temperature, pressure,
    humidity, wind velocity
  • Approach
  • Discretize the domain, e.g., a measurement point
    every 10 km
  • Devise an algorithm to predict weather at time
    t+1 given t
  • Uses
  • Predict El Niño
  • Set air emissions standards

28
Global Climate Requirements
  • One piece is modeling the fluid flow in the
    atmosphere
  • Solve Navier-Stokes problem
  • Roughly 100 Flops per grid point with 1 minute
    timestep
  • Computational requirements
  • To match real-time, need 5 x 10^11 flops in 60
    seconds = 8 Gflop/s
  • Weather prediction (7 days in 24 hours) → 56
    Gflop/s
  • Climate prediction (50 years in 30 days) → 4.8
    Tflop/s
  • To use in policy negotiations (50 years in 12
    hours) → 288 Tflop/s
  • To double the grid resolution, computation is > 8x
  • State of the art models require integration of
    atmosphere, ocean, sea-ice, land models, plus
    possibly carbon cycle, geochemistry and more
  • Current models are coarser than this!
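
The requirements above follow from one ratio: flops per simulated minute, times the number of simulated minutes, divided by the wall-clock time allowed. The sketch below recomputes them under the slide's assumption of 5 x 10^11 flops per 1-minute timestep. The function name and the 365-day year are assumptions of this example. The slide rounds the real-time rate down to 8 Gflop/s, so the values computed here come out slightly higher than the rounded figures on the slide.

```python
FLOPS_PER_STEP = 5e11   # flops per simulated minute (from the slide)
STEP_SECONDS = 60.0     # 1 minute timestep

DAY = 86400.0
YEAR = 365 * DAY        # assumption: 365-day year


def required_rate(simulated_seconds, wall_seconds):
    """Sustained flop/s needed to simulate a period within a time budget."""
    steps = simulated_seconds / STEP_SECONDS
    return steps * FLOPS_PER_STEP / wall_seconds


if __name__ == "__main__":
    scenarios = {
        "real-time": (60.0, 60.0),
        "weather (7 days in 24 hours)": (7 * DAY, DAY),
        "climate (50 years in 30 days)": (50 * YEAR, 30 * DAY),
        "policy (50 years in 12 hours)": (50 * YEAR, 0.5 * DAY),
    }
    for name, (simulated, wall) in scenarios.items():
        print(f"{name}: {required_rate(simulated, wall) / 1e9:.1f} Gflop/s")
```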

29
High Resolution Climate Modeling on NERSC-3 P.
Duffy, et al., LLNL
30
1000-year climate
  • Demonstration of the Community Climate Model
    (CCSM2)
  • A 1000-year simulation shows long-term, stable
    representation of the Earth's climate.
  • 760,000 processor hours used
  • Temperature change shown

Warren Washington and Jerry Meehl, National
Center for Atmospheric Research; Bert Semtner,
Naval Postgraduate School; John Weatherly, U.S.
Army Cold Regions Research and Engineering
Laboratory, et al.
http://www.nersc.gov/aboutnersc/pubs/bigsplash.pdf
31
Sci. Comp. and this class
  • The traditional application of HPC has been
    scientific computing
  • It has had a lot of influence on the field
  • So we'll use scientific computing as examples and
    as a basis for some programming assignments
  • Simple stuff, no need to be scared
  • But we will see that the class content is
    applicable to all types of domains

32
A 10 TFlop/s CPU?
  • Question: Could we build a single CPU that
    delivers 10,000 billion floating point operations
    per second (10 TFlop/s), and operates over 10,000
    billion bytes (10 TByte)?
  • Representative of what many scientists need today.
  • Clock rate has to be 10,000 GHz
  • Assume that data travels at the speed of light
  • Assume that the computer is an ideal sphere

33
A 10 TFlop/s CPU?
  • The machine issues one instruction per cycle, and
    therefore the clock rate must be 10,000 GHz =
    10^13 Hz
  • Data must travel some distance from the memory to
    the CPU
  • Each instruction will need at least 8 bytes
    of memory
  • Assume data travels at the speed of light, c = 3e8
    m/s
  • Then the distance between the memory and the CPU
    must be r < c / 10^13 = 3e-5 m
  • Then we must have 10^13 bytes of memory in
    (4/3)πr^3 ≈ 1.1e-13 m^3
  • Therefore, each byte of memory must occupy about
    1.1e-26 m^3
  • This is about 11,000 Angstrom^3
  • Or the volume of a cluster of only a few hundred
    atoms
  • Current memory densities are 10 GB/cm^3, or about a
    factor 10^10 from what would be needed!
  • Conclusion: It's not going to happen until some
    scifi breakthrough happens
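
The back-of-the-envelope estimate on this slide can be recomputed directly from its stated assumptions (one instruction per cycle, data moving at c = 3e8 m/s, memory packed in an ideal sphere, 10 GB/cm^3 as today's density). The sketch below is only that arithmetic; the function and key names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 3e8  # m/s, the rounded value used on the slide


def single_cpu_estimate(rate_hz=1e13, memory_bytes=1e13,
                        current_density_bytes_per_cm3=1e10):
    """Size and density of the memory of the hypothetical 10 TFlop/s CPU."""
    radius_m = SPEED_OF_LIGHT / rate_hz             # reach of light in one cycle
    volume_m3 = 4.0 / 3.0 * math.pi * radius_m**3   # sphere holding all memory
    per_byte_m3 = volume_m3 / memory_bytes
    needed_density = memory_bytes / (volume_m3 * 1e6)  # bytes per cm^3
    return {
        "radius_m": radius_m,
        "volume_m3": volume_m3,
        "per_byte_angstrom3": per_byte_m3 / 1e-30,  # 1 Angstrom^3 = 1e-30 m^3
        "density_gap": needed_density / current_density_bytes_per_cm3,
    }


if __name__ == "__main__":
    for key, value in single_cpu_estimate().items():
        print(f"{key}: {value:.3g}")
```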

34
Concurrency
  • Since we cannot conceivably build a single CPU to
    solve relevant scientific problems, one resorts
    to concurrency: execution of multiple tasks at
    the same time
  • Concurrency is everywhere in computers
  • Load a word from memory while adding two
    registers
  • Adding two pairs of registers at the same time
  • Receiving data from the network while writing to
    disk
  • Dual-proc systems
  • Clusters of workstations
  • SETI@home
  • Some concurrency is true, meaning that things
    really happen at the same time
  • Some concurrency is just the illusion of
    simultaneous execution, with rapid switching
    among activities

35
Concurrent, parallel, distributed?
  • Concurrency is typically the more general term
  • A program is said to be concurrent if it contains
    more than one execution context
  • e.g., more than one thread/process
  • Typically the word parallel implies some notion
    of high performance / scientific application
    running on a single hardware platform
  • The word distributed typically refers to
    applications that run on multiple computers that
    may not be in the same room
  • These terms are conflated and misused all the
    time; in different research communities they mean
    different things.
  • We'll see that distinctions are disappearing
    anyway

36
Reasons for Concurrency
  • Concurrency arises for at least 4 reasons
  • To increase performance or memory capacity
  • To allow users and computers to collaborate
  • To capture the logical structure of a problem
  • To cope with independent physical devices

37
Reason 1
  • To increase performance
  • Example
  • Say I want to compute the Mandelbrot Set
  • For each complex number c
  • Define the series
  • Z_0 = 0
  • Z_{n+1} = Z_n^2 + c
  • If the series stays bounded, put a black dot at
    point c
  • I could use 4 processors, each computing one
    quadrant of the image
  • I should go 4 times as fast!
  • Not clear in fact
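
One way to see why the 4x speedup is "not clear" is to count how much work each quadrant actually contains. The sketch below is an illustration under assumptions not in the slides: the image covers [-2, 1] x [-1.5, 1.5], each quadrant is sampled on a 50 x 50 grid, and a point is treated as inside the set if it has not escaped after 100 iterations. The four quadrants are submitted as four tasks to a thread pool purely to show the decomposition; in CPython, threads do not speed up this kind of computation, so real speedup would need separate processes.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ITER = 100  # assumed cutoff: points still bounded are drawn as black


def escape_iterations(c):
    """Number of iterations performed for point c (at most MAX_ITER)."""
    z = 0j
    for n in range(MAX_ITER):
        z = z * z + c
        if abs(z) > 2.0:  # once |z| > 2 the series is known to diverge
            return n + 1
    return MAX_ITER


def quadrant_work(x0, x1, y0, y1, n=50):
    """Total iterations spent on an n x n grid over one quadrant."""
    total = 0
    for i in range(n):
        for j in range(n):
            c = complex(x0 + (x1 - x0) * (i + 0.5) / n,
                        y0 + (y1 - y0) * (j + 0.5) / n)
            total += escape_iterations(c)
    return total


# Quadrants of the assumed image: (x0, x1, y0, y1).
QUADRANTS = [(-2.0, -0.5, 0.0, 1.5), (-0.5, 1.0, 0.0, 1.5),
             (-2.0, -0.5, -1.5, 0.0), (-0.5, 1.0, -1.5, 0.0)]


def work_per_quadrant():
    """Run the four quadrants as four concurrent tasks."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(quadrant_work, *q) for q in QUADRANTS]
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(work_per_quadrant())
```

The quadrants that contain more of the set require more iterations, so with this static split some processors finish early and wait for the others.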

38
Reason 1 (cont.)
  • To increase memory capacity
  • Example
  • A 3D weather simulation over Kaneohe Bay (1-meter
    resolution)
  • Say we consider a volume 2km x 2km x 1km over the
    bay
  • Each zone is characterized by, say, temperature,
    wind direction, wind velocity, air pressure, air
    moisture, for a total of (1+3+1+1+1) x 8 = 56 bytes
  • Therefore I need about 208GB of memory to hold
    the data
  • Option 1: Buy a machine with > 208 GB RAM
  • 96GB server from Sun: about 1 million dollars!
  • They have a 288GB configuration (contact them for
    price)
  • There is a 3TB shared-memory SGI machine at NCSA
  • Option 2: Couple individual machines together
  • Buy 52 4GB PowerEdge servers from Dell for $2.5K
    each
  • Slap some network on them and you've got enough
    memory
  • total cost $200K
  • But it's not as simple as that!
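
The memory estimate on this slide is simple arithmetic, sketched below with hypothetical function names. It assumes one zone per cubic meter and seven double precision values per zone, as on the slide, and counts a "4GB" machine as 4 x 2^30 bytes. The total is 224 x 10^9 bytes, about 208.6 GB in binary units, so rounding up gives 53 machines where the slide's rounded figure gives 52.

```python
import math

BYTES_PER_DOUBLE = 8


def simulation_memory_bytes(dims_m=(2000, 2000, 1000), resolution_m=1,
                            doubles_per_zone=7):
    """Bytes needed to hold every zone of the simulated volume."""
    zones = 1
    for d in dims_m:
        zones *= d // resolution_m
    return zones * doubles_per_zone * BYTES_PER_DOUBLE


def machines_needed(total_bytes, ram_per_machine_bytes=4 * 2**30):
    """Smallest number of machines whose combined RAM holds the data."""
    return math.ceil(total_bytes / ram_per_machine_bytes)


if __name__ == "__main__":
    total = simulation_memory_bytes()
    print(f"{total / 2**30:.1f} GB")
    print(machines_needed(total), "machines with 4GB each")
```

This ignores the memory taken by the operating system and by the program itself on each machine, which is one reason it is "not as simple as that".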

39
Large-Scale Data Analysis
Laser Interferometer Gravitational-Wave Observatory
(LIGO): tiny distortions of space and time caused
when very large masses, such as stars, move
suddenly. 1 TB/day (1024 GB/day), year-long
experiments.
The Compact Muon Solenoid (CMS) at CERN: designed to
study proton-proton collisions with high quality
measurements (12,000 tons). 10 GB/sec! Many
PB/year (1 PB = 1024 TB).
40
Reason 2
  • To allow users and computers to collaborate
  • Example
  • I want to allow users to do on-line purchases
  • I have Web browsers, Web servers, Database
    servers
  • All these are processes
  • They all communicate with multiple processes
    simultaneously, they are all multithreaded,
    running on multiple machines, some of them are
    multi-proc servers
  • It's just a big concurrent system and it is
    critical that it be fast and correct!

41
Reason 3
  • To capture the logical structure of a problem
  • Example
  • I want to write a program that simulates the
    interactions between a robot and living entities
  • I implement the robot as its own thread
  • The code is just the code of the robot
  • I implement each entity as its own thread
  • The code is the simulation of the entity's
    behavior
  • I let them loose at the beginning
  • They may meet, interact, etc.
  • All of this happens without a central notion of
    control, although I may be running on a single
    CPU
  • Concurrency just fits the problem

42
Reason 4
  • To cope with independent physical devices
  • Example
  • I want to write a program that receives data from
    the network, processes it, and writes output to
    disk
  • I can read from the network and write to disk at
    the same time almost for free
  • I can compute on the data while I receive from
    the network almost for free
  • I can compute on the data while I write the
    previously computed data to disk almost for
    free
  • I am better off writing this program as three
    concurrent threads (even if on a single CPU)
  • Each thread uses one independent device of the
    computer

43
A brief history of concurrency
  • First machines were used in single-user mode
  • I declare I am going to use the machine from 2PM
    till 4PM
  • I go in the special machine room and sit there
    for 2 hours
  • I try the punch cards that I have prepared in
    advance
  • I find bugs
  • I debug
  • etc.
  • Extreme lack of productivity
  • During my thinking time, this multi-million dollar
    machine does nothing

44
A brief history of concurrency (2)
  • Batch Processing!
  • Instead of reserving the machine for a period of
    time to do all my activities (including
    debugging), I submit requests to a queue
  • The queue serves requests in order (possibly with
    priorities)
  • When my program fails and stops, somebody else
    gets the machine immediately
  • Great, but the CPU is idle during I/O!

45
A brief history of concurrency (3)
  • Multi-programming!
  • Multiple programs reside in memory at once
  • Required interrupts and memory protection
  • Interrupts are used to switch programs between
    devices and CPUs
  • Concurrency issues in the O/S
  • race conditions, deadlocks, critical sections
  • semaphores, monitors, etc.
  • beginning of theory of concurrent systems (1960)
  • Increase in memory size
  • Development of virtual memory

46
A brief history of concurrency (4)
  • Time-sharing!
  • For fast, interactive response, one needs fast
    context switching
  • Makes it possible to have the illusion that one
    is alone on a (perhaps slower) machine
  • Already common by 1970
  • Led to concurrency in user applications!
  • My application is logically two concurrent
    tasks
  • I can now implement it as two concurrent tasks!

47
A brief history of concurrency (5)
  • Technology advances!
  • Multiple CPUs on a motherboard
  • faster buses, shared-memory, cache coherency
  • Networked computers
  • distributed memory
  • Clusters, ..., Internet
  • Concurrency across CPUs
  • Also: concurrency within the CPU at the hardware
    level
  • Beyond CPU and I/O devices
  • Multiple units (e.g., ALUs)
  • Vector processors
  • Pipelining

48
Issues with concurrency
  • Concurrency appears at all levels of current
    systems
  • hardware
  • O/S
  • Application
  • Many fields of computer science study concurrency
    issues
  • Three main issues
  • Performance
  • Correctness
  • Programmability

49
Many connected areas
  • Computer architecture
  • Networking
  • Operating Systems
  • Scientific Computing
  • Theory of Distributed Systems
  • Theory of Algorithms and Complexity
  • Scheduling
  • Internetworking
  • Programming Languages
  • Distributed Systems
  • Parallel Programming
