Title: Parallel computer architecture overview
- Definition of a parallel computer: a collection of processing elements that cooperate to solve large problems fast.
- Some broad issues that distinguish parallel computers:
  - Resources
    - How large is the collection?
    - How powerful are the elements?
    - How much memory?
  - Data access, communication and synchronization
    - How do the elements cooperate and communicate?
    - How are data transmitted between processors?
    - What are the abstractions and primitives for cooperation?
  - Performance and scalability
    - How does it all translate into performance?
    - How does it scale?
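The performance and scalability questions above are commonly quantified with two standard measures, speedup and parallel efficiency. The following is a minimal sketch; the timings and the function names `speedup` and `efficiency` are illustrative assumptions, not measurements from these slides.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup = time on one processing element / time on p elements."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Efficiency = speedup / p; a value of 1.0 would mean perfect scaling."""
    return speedup(t_serial, t_parallel) / p


# Hypothetical run times (seconds) for one problem on 1, 4 and 16 elements.
timings = {1: 100.0, 4: 30.0, 16: 12.5}

for p, t in timings.items():
    s = speedup(timings[1], t)
    e = efficiency(timings[1], t, p)
    print(f"p={p:2d}  speedup={s:.2f}  efficiency={e:.2f}")
```

With these assumed numbers the speedup keeps growing while the efficiency falls, which is the usual shape of the "how does it scale?" question.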
Trends in parallel computer architecture development
- History: diverse and innovative organizational structures, often tied to novel programming models.
  - The architecture is often built around one or two good ideas in software or hardware.
- Rapidly matured under strong technological constraints.
  - The microprocessor is ubiquitous.
  - Laptops and supercomputers are fundamentally similar!
  - Technological trends cause diverse approaches to converge.
- Technological trends make parallel computing inevitable.
  - Mainstream computing
- Need to understand fundamental principles and design tradeoffs, not just taxonomies.
Technology trend
- Figure from Patterson's parallel architectures book (1999)
- The performance of microprocessors is catching up with that of supercomputers.
- In terms of performance improvement, nothing beats microprocessors.
- To maintain the improvement, more and more supercomputer features are built into microprocessors.
- Use commodity microprocessors to build everything (if you can't beat them, join them).
- Mainframes and minicomputers have pretty much disappeared in today's world, replaced by server farms (clusters of servers).
- Virtualization on clusters.
- Many supercomputers are clusters of servers/workstations (see www.top500.org).
Parallel architectures
- Shared memory architectures
- Distributed memory architectures
- Hybrid
Shared memory architectures
- All processors access all memory as a global address space.
- Changes made by one processor are visible to other processors.
- Two types, based on differences in memory access speed:
  - Uniform memory access (UMA)
  - Non-uniform memory access (NUMA)
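The global address space idea can be sketched with threads inside one process, where a single list stands in for shared memory. This is only an illustration: the names `shared` and `worker` are assumptions, and in standard CPython builds the global interpreter lock means these threads model the programming view rather than true parallel execution.

```python
import threading

# One list stands in for memory in a global address space:
# every thread can read and write any element directly.
shared = [0] * 8


def worker(tid: int, n_threads: int) -> None:
    # Each thread writes the slots it owns; nothing is copied or sent.
    for i in range(tid, len(shared), n_threads):
        shared[i] = i * i


threads = [threading.Thread(target=worker, args=(t, 4)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Writes made by the workers are visible to the main thread.
print(shared)
```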
UMA shared memory architecture (mostly bus-based MPs)
- A microprocessor on a chip makes it natural to connect many processors to shared memory.
  - This design dominates the server and enterprise market and is moving down to the desktop.
- Faster processors began to saturate the bus, then bus technology advanced.
  - Today there is a range of sizes for bus-based systems, from desktops to large servers (symmetric multiprocessor (SMP) machines).
Bus bandwidth in Intel systems
NUMA shared memory architecture
- The processors are identical, but a processor's access time differs for different parts of the memory.
- Often made by physically linking SMP machines (Origin 2000, up to 512 processors).
- The next-generation SMP interconnects (Intel Common System Interface (CSI) and AMD HyperTransport) have this flavor, but the processors are close to each other.
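A rough way to see why data placement matters on a NUMA machine is to compute the mean access time as a weighted average of local and remote latencies. The sketch below assumes made-up latencies (`LOCAL_NS`, `REMOTE_NS`) and a hypothetical helper `average_access_ns`; real values depend on the machine.

```python
def average_access_ns(local_fraction: float, local_ns: float, remote_ns: float) -> float:
    """Mean memory access time when a given fraction of accesses hit local memory."""
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


LOCAL_NS = 100.0   # assumed latency to memory attached to this processor
REMOTE_NS = 300.0  # assumed latency to memory attached to another processor

for frac in (1.0, 0.9, 0.5):
    print(f"local fraction {frac:.1f}: {average_access_ns(frac, LOCAL_NS, REMOTE_NS):.0f} ns")
```

Under these assumptions, dropping from all-local to half-local accesses doubles the average access time.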
Cache coherence issue in shared memory architecture
- Cache coherence
  - There are multiple copies of data (the memory copy and cache copies).
  - How to maintain a consistent system view?
  - Need some mechanism to make the memory system appear coherent.
  - Cache coherence protocols.
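The coherence problem and one possible fix can be shown with a toy model. The sketch below assumes a write-through, write-invalidate scheme on a single shared bus; the classes `Bus` and `Cache` are hypothetical and far simpler than real protocols, which track more state per cache line.

```python
class Bus:
    """Shared bus connecting main memory and all caches."""

    def __init__(self):
        self.memory = {}   # main memory: address -> value
        self.caches = []   # every cache attached to the bus

    def invalidate(self, addr, source):
        # Tell every other cache to drop its copy of addr.
        for cache in self.caches:
            if cache is not source:
                cache.lines.pop(addr, None)


class Cache:
    """Private cache of one processor."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}    # cached copies: address -> value
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                  # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.invalidate(addr, source=self)      # remove stale copies elsewhere
        self.lines[addr] = value
        self.bus.memory[addr] = value               # write through to memory


bus = Bus()
c0, c1 = Cache(bus), Cache(bus)
bus.memory[0x10] = 1

print(c0.read(0x10), c1.read(0x10))  # both caches now hold a copy
c0.write(0x10, 2)                    # c1's copy is invalidated
print(c1.read(0x10))                 # c1 misses and refetches the new value
```

Without the `invalidate` call, the second cache would keep returning its stale copy, which is exactly the inconsistency a coherence protocol must prevent.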
Shared memory architecture advantages and disadvantages
- Advantages
  - Globally shared memory provides a user-friendly programming perspective.
- Disadvantages
  - Lack of scalability
    - No hope for UMA
    - What about NUMA?
    - A lot of small traffic through the interconnect
    - Adding processors changes the traffic requirement of the interconnect.
  - Writing correct shared memory parallel programs is not straightforward.
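One reason correctness is hard is that concurrent updates to shared data must be synchronized explicitly. A minimal sketch, assuming a shared counter updated by four threads (the names `counter`, `lock` and `add_many` are illustrative): the lock makes each read-modify-write step atomic, so no update is lost.

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(n: int) -> None:
    global counter
    for _ in range(n):
        # "counter += 1" is a read, an add and a write; without the lock,
        # two threads could interleave these steps and lose an update.
        with lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```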
Distributed memory architectures
- Processors have their own local memory. Memory addresses in one processor do not map to another processor.
  - No concept of a global address space.
  - No concept of cache coherency.
- To access data in another processor, use explicit communication.
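Explicit communication can be sketched as a pair of nodes that each keep private memory and exchange data only through send and receive operations. This is a single-process simulation under stated assumptions: the `Node` class and its `send`/`recv` methods are hypothetical, and a queue stands in for the network. Real distributed memory programs would use a message-passing library such as MPI.

```python
import queue
import threading


class Node:
    """A processor with private memory and an inbox for incoming messages."""

    def __init__(self, rank: int):
        self.rank = rank
        self.memory = {}            # local memory: other nodes never touch it
        self.inbox = queue.Queue()  # stands in for the network link

    def send(self, dest: "Node", payload) -> None:
        dest.inbox.put((self.rank, payload))

    def recv(self):
        return self.inbox.get()     # blocks until a message arrives


def run_node0(me: Node, peer: Node) -> None:
    me.memory["x"] = 42
    me.send(peer, me.memory["x"])   # the only way node 1 can learn x


def run_node1(me: Node) -> None:
    sender, value = me.recv()
    me.memory[f"x_from_{sender}"] = value


n0, n1 = Node(0), Node(1)
t0 = threading.Thread(target=run_node0, args=(n0, n1))
t1 = threading.Thread(target=run_node1, args=(n1,))
t1.start()
t0.start()
t0.join()
t1.join()

print(n1.memory)
```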
Distributed memory architectures
- The networks can be very different for distributed memory architectures.
- Massively parallel processors (MPP) usually use a specially designed network (and node).
  - IBM Blue Gene, IBM SP series
- Clusters usually use commodity system/local area networks: InfiniBand, Quadrics, Myrinet, 10 Gbps Ethernet.
  - Lemieux at PSC uses Quadrics
  - Ranger (No. 2 top supercomputer) at TACC uses InfiniBand
  - UC-TG at Argonne uses Myrinet
  - The raw speed of the network matches that of the specially designed network.
  - May not provide some customized support such as a reduction network.
- Grid computers use the Internet as the network.
Distributed memory architectures
- MPP, clusters and grid computers target different types of applications.
  - MPP and clusters support tightly coupled applications (a large amount of interaction among processes).
    - Communication on the order of every microsecond.
  - Grid computers can only support coarse-grain parallel applications or embarrassingly parallel applications.
    - Communication on the order of every second.
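The link between communication frequency and network type can be made concrete with a simple model: each compute interval is followed by one communication step, with no overlap between the two. The latencies below are assumptions chosen only for illustration, and `useful_fraction` is a hypothetical helper.

```python
def useful_fraction(compute_s: float, comm_latency_s: float) -> float:
    """Fraction of time spent computing when every compute interval
    is followed by one communication step of the given latency."""
    return compute_s / (compute_s + comm_latency_s)


MPP_LATENCY_S = 1e-6   # assumption: about a microsecond on a tightly coupled network
WAN_LATENCY_S = 50e-3  # assumption: about 50 ms across the Internet

cases = {
    "fine-grain on MPP/cluster": (1e-6, MPP_LATENCY_S),
    "fine-grain on grid": (1e-6, WAN_LATENCY_S),
    "coarse-grain on grid": (1.0, WAN_LATENCY_S),
}

for name, (compute_s, latency_s) in cases.items():
    print(f"{name}: {useful_fraction(compute_s, latency_s):.4f}")
```

Under these assumptions, an application that communicates every microsecond spends almost all of its time waiting on a grid, while one that communicates every second remains mostly productive.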
Advantages and disadvantages
- Advantages
  - Memory is scalable with the number of processors. Increase the number of processors and the size of memory increases proportionately.
  - Each processor can rapidly access its own memory without interference and without the overhead incurred in trying to maintain cache coherency.
  - Cost effectiveness: can use commodity, off-the-shelf processors and networking.
- Disadvantages
  - The programmer is responsible for the details associated with data communication.
  - It may be difficult to map existing data structures, based on global memory, to this memory organization.
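The mapping problem in the last bullet shows up even for a plain array: a global index has to be translated into an owning processor and a local index. The sketch below assumes a block distribution, which is one common choice among several; `block_owner` and `local_range` are hypothetical helper names.

```python
def block_size(n: int, p: int) -> int:
    """Elements per processor: ceil(n / p)."""
    return -(-n // p)


def block_owner(i: int, n: int, p: int) -> tuple:
    """Map global index i of an n-element array to (owner rank, local index)."""
    b = block_size(n, p)
    return i // b, i % b


def local_range(rank: int, n: int, p: int) -> range:
    """Global indices stored on the given rank (the last rank may hold fewer)."""
    b = block_size(n, p)
    start = min(rank * b, n)
    return range(start, min(start + b, n))


n, p = 10, 4
for rank in range(p):
    print(rank, list(local_range(rank, n, p)))

print(block_owner(7, n, p))  # (2, 1): element 7 is local element 1 on rank 2
```

Every access to an element that is not in the local range would need explicit communication with its owner, which is why code written for global memory often has to be restructured.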
Convergence of the distributed and shared memory architectures
- The contemporary distributed and shared memory architectures are converging.
  - Nodal architectures have always been similar.
  - Both require a high-bandwidth, low-latency interconnect.
  - The hardware for these two types of machines is becoming very similar.
Hybrid distributed memory systems
- SMP-CMP clusters are the current price/performance sweet spot.
- The architecture will dominate for the foreseeable future.
- Two-level hierarchy
- How to best use this type of architecture is still under heavy investigation.
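The two-level hierarchy can be sketched as a sum computed in two stages: data is first split across nodes (the distributed memory level), and each node then splits its piece across cores that share it (the shared memory level). This is a single-process simulation with hypothetical names (`chunks`, `node_sum`, `cluster_sum`); nodes are visited one after another here, whereas a real cluster would run them concurrently and combine partial sums with messages.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(seq, k):
    """Split seq into k nearly equal contiguous pieces."""
    size, extra = divmod(len(seq), k)
    out, start = [], 0
    for j in range(k):
        end = start + size + (1 if j < extra else 0)
        out.append(seq[start:end])
        start = end
    return out


def node_sum(local_data, cores):
    """Shared memory level: threads in one node sum slices of the node's data."""
    with ThreadPoolExecutor(max_workers=cores) as pool:
        return sum(pool.map(sum, chunks(local_data, cores)))


def cluster_sum(data, nodes, cores):
    """Distributed memory level: each node owns one piece of the data;
    adding the partial sums stands in for messages between nodes."""
    return sum(node_sum(piece, cores) for piece in chunks(data, nodes))


data = list(range(1, 101))
print(cluster_sum(data, nodes=4, cores=2))  # 5050
```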