Title: greg astfalk
2 high-end computing technology: where is it heading?
greg astfalk
woon yung chung, woon-yung_chung_at_hp.com
3 prologue
- this is not a talk about hewlett-packard's product offering(s)
- the context is hpc (high performance computing)
- somewhat biased to scientific computing
- also applies to commercial computing
4 backdrop
- end-users of hpc systems have needs and wants from hpc systems
- the computer industry delivers the hpc systems
- there exists a gap between the two wrt
- programming
- processors
- architectures
- interconnects/storage
- in this talk we (weakly) quantify the gaps in these 4 areas
5 end-users' programming wants
- end-users of hpc machines would ideally like to think and code sequentially
- have a compiler and run-time system that produces portable and (nearly) optimal parallel code
- regardless of processor count
- regardless of architecture type
- yes, i am being a bit facetious but the idea remains true
6 parallelism methodologies
- there exist 5 methodologies to achieve parallelism (a small mpi/openmp sketch follows this list)
- automatic parallelization via compilers
- explicit threading
- pthreads
- message-passing
- mpi
- pragma/directive
- openmp
- explicitly parallel languages
- upc, et al.
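As a minimal sketch (not from the talk), the dot product below mixes two of these methodologies: an openmp pragma parallelizes the loop within each process, and an mpi reduction combines the per-rank partial sums. The sizes and values are arbitrary; compile with something like mpicc -fopenmp.

/* illustrative sketch: a dot product parallelized two ways at once --
 * openmp (pragma/directive) within a node, mpi (message-passing) across nodes */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const long n = 1000000;                  /* local chunk size per rank (assumed) */
    double *x = malloc(n * sizeof *x);
    double *y = malloc(n * sizeof *y);
    for (long i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* openmp: threads share the loop within this rank's address space */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (long i = 0; i < n; i++)
        local += x[i] * y[i];

    /* mpi: explicit communication combines the per-rank partial sums */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("dot product over %ld elements on %d ranks = %g\n",
               (long)n * nprocs, nprocs, global);

    free(x); free(y);
    MPI_Finalize();
    return 0;
}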
7 parallel programming
- parallel programming is a cerebral effort
- if lots of neurons plus mpi constitutes prime-time, then parallel programming has arrived
- no major technologies on the horizon to change this status quo
8 discontinuities
- the ease of parallel programming has not progressed at the same rate at which parallel systems have become available
- performance gains require compiler optimization or pbo (profile-based optimization)
- most parallelism requires hand-coding
- in the real world many users don't use any compiler optimizations
9 parallel efficiency
- be mindful that the bounds on parallel efficiency are, in general, far apart
- 50% efficiency on 32 processors is good
- 10% efficiency on O(100) processors is excellent
- >2% efficiency on O(1000) processors is heroic
- a little communication can knee over the efficiency vs. processor count curve (see the toy model below)
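A toy model (my numbers, not the talk's) makes the knee concrete: take perfectly parallel work plus a small tree-reduction communication term, t_p = t1/p + c*log2(p), and look at the efficiency e(p) = t1 / (p * t_p) as the processor count grows.

/* toy efficiency model with assumed numbers; even a small communication
 * term knees the curve over as p grows.  link with -lm. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double t1 = 100.0;  /* total work, arbitrary time units (assumed) */
    const double c  = 0.05;   /* cost per reduction step (assumed)          */

    printf("%6s %12s %12s\n", "procs", "time", "efficiency");
    for (int p = 1; p <= 4096; p *= 2) {
        double tp = t1 / p + c * log2((double)p);   /* compute + communication */
        double e  = t1 / (p * tp);                  /* parallel efficiency     */
        printf("%6d %12.4f %12.3f\n", p, tp, e);
    }
    return 0;
}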
10 apps with sufficient parallelism
- few existing applications can utilize O(1000), or even O(100), processors with any reasonable degree of efficiency
- to date this has generally required heroic effort
- new algorithms (i.e., data and control decompositions) or nearly complete rewrites are necessary
- such large-scale parallelism will have arrived when msc/nastran and oracle exist on such systems and utilize the processors
11 latency-tolerant algorithms
- latency tolerance will be an increasingly important theme for the future
- hardware will not solve this problem
- more on this point later
- developing algorithms that have significant latency tolerance will be necessary (see the overlap sketch below)
- this means thinking outside the box about the algorithms
- simple modifications to existing algorithms generally won't suffice
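One common latency-tolerance idiom, sketched here with placeholder buffers and a 1-d halo exchange, is to post nonblocking mpi sends and receives, compute on the interior while the messages are in flight, and only then finish the points that depend on the incoming data.

/* sketch of hiding interconnect latency behind interior computation;
 * domain size and data are placeholders. */
#include <mpi.h>
#include <stdio.h>

#define N 1024                       /* local interior size (assumed) */

static double u[N + 2];              /* 1-d domain with one ghost cell per side */
static double unew[N + 2];

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int left  = (rank == 0)          ? MPI_PROC_NULL : rank - 1;
    int right = (rank == nprocs - 1) ? MPI_PROC_NULL : rank + 1;

    for (int i = 0; i < N + 2; i++) u[i] = rank;   /* arbitrary initial data */

    MPI_Request req[4];

    /* start the halo exchange, but do not wait for it yet */
    MPI_Irecv(&u[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(&u[N + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(&u[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(&u[N],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

    /* interior points depend only on local data, so compute them while
     * the messages are in flight -- this hides the interconnect latency */
    for (int i = 2; i <= N - 1; i++)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);

    /* only now are the ghost cells needed: wait, then do the two boundary points */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    unew[1] = 0.5 * (u[0] + u[2]);
    unew[N] = 0.5 * (u[N - 1] + u[N + 1]);

    if (rank == 0) printf("step done on %d ranks\n", nprocs);
    MPI_Finalize();
    return 0;
}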
12 operating systems
- development environments will move to nt
- heavy lifting will remain with unix
- four unixes to survive (alphabetically)
- hp-ux
- linux
- aix 5l
- solaris
- linux will be important at the lower end but will not significantly encroach on the high-end
13 end-users' proc/arch wants
- all things being equal, high-end users would likely want a classic cray vector supercomputer
- no caches
- multiple pipes to memory
- single-word access
- hardware support for gather/scatter (illustrated below)
- etc.
- it is true, however, that for some applications contemporary risc processors perform better
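For readers who have not met the term, the tiny loop below shows what gather (indexed loads) and scatter (indexed stores) mean; array names and sizes are placeholders. classic vector machines supported these access patterns directly in hardware, whereas cache-based processors handle the resulting irregular strides less gracefully.

/* gather and scatter in miniature (placeholder data) */
#include <stdio.h>

#define N 8

int main(void)
{
    double x[N]   = {10, 11, 12, 13, 14, 15, 16, 17};
    double y[N]   = {0};
    int    idx[N] = {7, 3, 5, 1, 6, 0, 2, 4};   /* arbitrary permutation */

    /* gather:  y[i] = x[idx[i]]  -- indexed loads  */
    for (int i = 0; i < N; i++)
        y[i] = x[idx[i]];

    /* scatter: x[idx[i]] = 2*y[i]  -- indexed stores */
    for (int i = 0; i < N; i++)
        x[idx[i]] = 2.0 * y[i];

    for (int i = 0; i < N; i++)
        printf("x[%d] = %g\n", i, x[i]);
    return 0;
}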
14 processors
- the processor of choice is now, and will be for some time to come, the risc processor
- risc processors have caches
- caches are good
- caches are bad
- if your code fits in cache, you aren't supercomputing!
15 risc processor performance
- a rule of thumb is that a risc processor, any risc processor, gets on average, on a sustained basis,
- 10% of its peak performance
- the 3σ on this is large
- achieved performance varies with
- architecture
- application
- algorithm
- coding (see the blocking sketch below)
- dataset size
- anything else you can think of
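One large source of that variation is whether the working set fits the cache hierarchy. The blocked (tiled) matrix multiply below is the standard illustration; the matrix size and block factor are assumptions to be tuned, not figures from the talk.

/* cache-blocking sketch: same arithmetic as the naive triple loop, but
 * restructured so each bs-by-bs tile is reused from cache rather than
 * refetched from memory. */
#include <stdio.h>

#define N  512     /* matrix dimension (assumed) */
#define BS 64      /* block size; tune to the cache (assumed) */

static double a[N][N], b[N][N], c[N][N];

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = 1.0;
            b[i][j] = 2.0;
            c[i][j] = 0.0;
        }

    /* blocked triple loop: the tiles of a, b and c stay resident in cache */
    for (int ii = 0; ii < N; ii += BS)
        for (int kk = 0; kk < N; kk += BS)
            for (int jj = 0; jj < N; jj += BS)
                for (int i = ii; i < ii + BS; i++)
                    for (int k = kk; k < kk + BS; k++)
                        for (int j = jj; j < jj + BS; j++)
                            c[i][j] += a[i][k] * b[k][j];

    printf("c[0][0] = %g (expect %g)\n", c[0][0], 2.0 * N);
    return 0;
}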
16 semiconductor processes
- semiconductor processes change every 2-3 years
- assuming that technology scaling applies to subsequent generations, then per generation (compounded in the sketch below)
- frequency increase of 40%
- transistor density increase of 100%
- energy per transition decrease of 60%
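Compounding those per-generation figures gives a feel for where a few process turns lead; the 5-generation horizon below is only an assumption for illustration.

/* compound the per-generation scaling figures quoted above */
#include <stdio.h>

int main(void)
{
    double freq = 1.0, density = 1.0, energy = 1.0;

    printf("gen   frequency    density   energy/transition\n");
    for (int g = 0; g <= 5; g++) {
        printf("%3d   %8.2fx   %7.1fx   %12.3fx\n", g, freq, density, energy);
        freq    *= 1.40;   /* +40% clock per generation   */
        density *= 2.00;   /* +100% transistors per area  */
        energy  *= 0.40;   /* -60% energy per transition  */
    }
    return 0;
}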
17 semiconductor processes
18 what to do with gates
- it is not a simple question of what the best use of the gates is
- larger caches
- multiple cores
- specialized functional units
- etc.
- the impact of soft errors with decreasing design rule size will be an important topic
- what happens if an alpha particle flips a bit in a register?
19 processor futures
- you can expect, for the short term, moore's-law-like gains in processors' peak performance
- doubling of performance every 18-24 months
- does not necessarily apply to application performance
- moore's law will not last forever
- 4-5 more turns (maybe?)
20 processor evolution
[chart: processor performance vs. time. cisc at roughly 0.3 instructions/cycle, risc at under 1 instruction/cycle, superscalar risc at 2 instructions/cycle, and ia-64/epic as the next generation; process shrinks run 1 micron -> .5 -> .35 -> .25 -> .18 -> .13 micron, with a 20-30% increase per year due to advances in the underlying semiconductor technology]
21 customer spending ($m)
[chart: customer spending, $0 to $40,000m; source: idc, february 2000]
- technology disruptions
- risc crossed over cisc in 1996
- itanium will cross over risc in 2004
22 present high-end architectures
- today's high-end architecture is either
- smp
- ccnuma
- cluster of smp nodes
- cluster of ccnuma nodes
- japanese vector system
- all of these architectures work
- efficiency varies with application type
23 architectural issues
- of the choices available, the smp is preferred, however
- smp processor count is limited
- cost of scalability is prohibitive
- ccnuma addresses these limitations but induces its own
- disparate latencies
- better, but still limited, scalability
- ras limitations
- clusters too have pros and cons
- huge latencies
- low cost
- etc.
24 physics
- limitations imposed by physics have led us to architectures that have a deep memory hierarchy
- the algorithmist and programmer must deal with, and exploit, the hierarchy to achieve good performance
- this is part of the cerebral effort of parallel programming we mentioned earlier
25 memory hierarchy
- typical latencies for today's technology (a small measurement sketch follows)
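One rough way to see the hierarchy for yourself is the pointer-chasing loop below: every load depends on the previous one, so the measured time per access tracks the latency of whichever level the working set fits in. The working-set sizes and iteration count are assumptions.

/* pointer-chase sketch of memory-hierarchy latency (assumed sizes) */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static long chase(size_t n, long iters)
{
    size_t *next  = malloc(n * sizeof *next);
    size_t *order = malloc(n * sizeof *order);

    /* link the elements into one big cycle in shuffled order, so every
     * load depends on the previous one and prefetching cannot help */
    for (size_t i = 0; i < n; i++) order[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i < n - 1; i++) next[order[i]] = order[i + 1];
    next[order[n - 1]] = order[0];

    size_t p = order[0];
    clock_t t0 = clock();
    for (long k = 0; k < iters; k++)
        p = next[p];                  /* serialized, dependent loads */
    clock_t t1 = clock();

    printf("%10zu elements (%8zu kb): %6.1f ns/access\n",
           n, n * sizeof *next / 1024,
           1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC / (double)iters);

    free(order);
    free(next);
    return (long)p;                   /* return the result so the loop is not optimized away */
}

int main(void)
{
    const long iters = 10 * 1000 * 1000;
    long sink = 0;
    for (size_t n = (size_t)1 << 10; n <= (size_t)1 << 23; n <<= 2)
        sink += chase(n, iters);
    return (int)(sink & 1);
}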
26 balanced system ratios
- an ideal high-end system should be balanced wrt its performance metrics
- for each peak flop/second
- 0.5-1 byte of physical memory
- 10-100 bytes of disk capacity
- 4-16 bytes/sec of cache bandwidth
- 1-3 bytes/sec of memory bandwidth
- 0.1-1 bit/sec of interconnect bandwidth
- 0.02-0.2 bytes/sec of disk bandwidth
27 balanced system
- applying the balanced system ratios to an unnamed contemporary 16-processor smp (a worked sketch with an assumed peak follows)
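As a worked sketch, the snippet below applies the ratios from the previous slide to a hypothetical 16-processor smp with an assumed peak of 1.5 gflop/s per processor; the peak figure is my assumption, not a number from the talk.

/* apply the balanced-system ratios to a hypothetical 16-processor smp */
#include <stdio.h>

int main(void)
{
    const double procs      = 16;
    const double peak_flops = procs * 1.5e9;   /* assumed aggregate peak, flop/s */

    printf("aggregate peak:          %7.1f gflop/s\n", peak_flops / 1e9);
    printf("physical memory:         %7.1f - %7.1f gb\n",
           0.5 * peak_flops / 1e9, 1.0 * peak_flops / 1e9);
    printf("disk capacity:           %7.2f - %7.2f tb\n",
           10.0 * peak_flops / 1e12, 100.0 * peak_flops / 1e12);
    printf("cache bandwidth:         %7.1f - %7.1f gb/s\n",
           4.0 * peak_flops / 1e9, 16.0 * peak_flops / 1e9);
    printf("memory bandwidth:        %7.1f - %7.1f gb/s\n",
           1.0 * peak_flops / 1e9, 3.0 * peak_flops / 1e9);
    printf("interconnect bandwidth:  %7.1f - %7.1f gbit/s\n",
           0.1 * peak_flops / 1e9, 1.0 * peak_flops / 1e9);
    printf("disk bandwidth:          %7.2f - %7.2f gb/s\n",
           0.02 * peak_flops / 1e9, 0.2 * peak_flops / 1e9);
    return 0;
}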
28 storage
- data volumes are growing at an extremely rapid pace
- disk capacity sold doubled from 1997 to 1998
- storage is an increasingly large percentage of the total server sale
- disk technology is advancing too slowly
- per generation of 1-1.5 years
- access time decreases 10%
- spindle bandwidth increases 30%
- capacity increases 50%
29 networks
- only the standards will be widely deployed
- gigabit ethernet
- gigabyte ethernet
- fibre channel (2x and 10x later)
- sio
- atm
- dwdm backbones
- the last-mile problem remains with us
- inter-system interconnect for clustering will not keep pace with the demands (for latency and bandwidth)
30 vendors' constraints
- rule 1: be profitable to return value to the shareholders
- you don't control the market size
- you can only spend 10% of your revenue on r&d
- don't fab your own silicon (hopefully)
- you must be more than just a technical computing company
- to not do this is to fail to meet rule 1 (see above)
31 market sizes
- according to the industry analysts the technical market is, depending on where you draw the cut-line, $4-5 billion annually
- the bulk of the market is small-ish systems (data from forest baskett at sgi)
32 a perspective
- commercial computing is not an enemy
- without the commercial market's revenue our ability to build hpc-like systems would be limited
- the commercial market benefits from the technology innovation in the hpc market
- is performance left on the table in designing a system to serve both the commercial and technical markets? yes
33 why?
- lack of a cold war
- performance of hpc systems has been marginalized
- in the mid-70s how many applications ran faster on a vax 11/780 than the cray-1? none
- how many applications today run faster on a pentium than the cray t90? some
- current demand for hpc systems is elastic
34 future prognostication
- computing in the future will be all about data and moving data
- the growth in data volumes is incredible
- richer media types (i.e., video) mean more data
- distributed collaborations imply moving data
- e-whatever requires large, rapid data movement
- more flops → more data
35 data movement
- the scope of data movement encompasses
- register to functional unit
- cache to register
- cache to cache
- memory to cache
- disk to memory
- tape to disk
- system to system
- pda to client to server
- continent to continent
- all of these are going to be important
36 epilogue
- for hpc in the future
- it is going to be risc processors
- smp and ccnuma architectures
- smp processor count relatively constant
- technology trends are reasonably predictable
- mpi, pthreads and openmp for parallelism
- latency management will be crucial
- it will be all about data
37 epilogue (cont'd)
- for the computer industry in the future
- trending toward e-everything
- e-commerce
- apps-on-tap
- brokered services
- remote data
- virtual data centers
- visualization
- nt for development
- vectors are dying
- for hpc vendors in the future
- there will be fewer of them
38 conclusion
- hpc users will need to yield more to what the industry can provide, rather than vice versa
- vendors' rule 1 is a cruel master