
1
Models of Parallel Processing
by Shietung Peng
2
SIMD Parallel Computers
  • In a SIMD computer, each processor can execute or
    ignore the instruction being broadcast based on
    its local state or data-dependent conditions.
    However, this leads to some inefficiency in
    executing conditional computations: while the
    processors that satisfy a condition execute one
    branch, the others sit idle, and vice versa. A
    possible cure is to use the asynchronous version
    of SIMD, known as SPMD (single program, multiple
    data).
  • A SIMD computer can be designed based on
    commodity (off-the-shelf) components or with
    custom chips.
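The inefficiency of conditional execution can be made concrete with a small simulation. The Python sketch below is illustrative only: the function name `simd_if_then_else`, the particular condition `x > 0`, and the two-broadcast masking scheme are assumptions, not a description of any specific machine. Both branches are broadcast to every processor, and each processor acts only on the branch its local flag selects.

```python
def simd_if_then_else(xs):
    """Simulate a SIMD machine executing, on every processing element (PE):
           if x > 0 then y = 2 * x else y = -x
    Returns (results, busy_pe_steps, total_pe_steps)."""
    n = len(xs)
    ys = [0] * n
    mask = [x > 0 for x in xs]      # each PE evaluates the condition locally
    busy = 0

    # Broadcast 1 (THEN branch): only PEs whose flag is set execute it.
    for pe in range(n):
        if mask[pe]:
            ys[pe] = 2 * xs[pe]
            busy += 1

    # Broadcast 2 (ELSE branch): only PEs whose flag is clear execute it.
    for pe in range(n):
        if not mask[pe]:
            ys[pe] = -xs[pe]
            busy += 1

    # Both broadcasts occupy all n PEs, so 2 * n PE-steps are spent
    # even though each PE does useful work in only one of them.
    return ys, busy, 2 * n


ys, busy, total = simd_if_then_else([3, -1, 4, -1, 5, -9, 2, -6])
print(ys)                                  # [6, 1, 8, 1, 10, 9, 4, 6]
print(f"useful PE-steps: {busy}/{total}")  # useful PE-steps: 8/16
```

With half the processors taking each branch, only 8 of the 16 processor-steps do useful work; under SPMD each processor would simply follow its own branch.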

3
MIMD Parallel Computers
  • MIMD computers are most effective for medium- to
    coarse-grain parallel applications, where the
    computation is divided into relatively large
    tasks whose executions are assigned to the
    various processors.
  • Within the MIMD class, there are three important
    design issues:
      • Massively or moderately parallel processors.
      • Tightly or loosely coupled MIMD.
      • Explicit message passing or virtual shared
        memory.

4
Global Versus Distributed Memory
  • A global-memory multiprocessor is characterized
    by the type and number p of processors, the
    capacity and number m of memory modules, and the
    network architecture.
  • Example networks include crossbar, single or
    multiple busses, and multistage interconnection
    networks (MIN).

5
Global Versus Distributed Memory
  • A distributed-memory multicomputer is a
    collection of p processors, each with its own
    private memory, that communicate through an
    interconnection network.
  • Distributed-memory MIMD machines can be
    interconnected by a variety of direct networks.
    Examples of direct networks will be introduced
    later.

6
The PRAM Shared-memory Model
  • The type of interconnection network used affects
    the way in which efficient algorithms are
    developed. To free programmers from such tedious
    considerations, an abstract model of
    global-memory computers, known as the PRAM
    (parallel random-access machine), is defined.
  • The abstract PRAM model can be SIMD or MIMD.
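As an illustration of the kind of algorithm the PRAM abstraction makes easy to state, here is a Python sketch that simulates a synchronous shared-memory reduction, summing n values in about log2 n parallel steps. The function name `pram_sum` and the pairwise-tree schedule are illustrative choices; the simulation runs the processors one after another but snapshots memory so that every step behaves as if all reads happened simultaneously.

```python
def pram_sum(values):
    """Simulate a synchronous PRAM summing the values held in shared memory.

    In the step with stride d, processor i (for each i that is a multiple
    of 2d) reads cells i and i + d and writes their sum into cell i. No two
    processors touch the same cell in a step, so this schedule is valid
    even on an EREW (exclusive-read, exclusive-write) PRAM.
    Assumes at least one value. Returns (sum, number_of_parallel_steps).
    """
    mem = list(values)            # shared memory, one value per cell
    n = len(mem)
    steps, stride = 0, 1
    while stride < n:
        snapshot = list(mem)      # all reads in a step happen "at once"
        for i in range(0, n, 2 * stride):   # the active processors
            if i + stride < n:
                mem[i] = snapshot[i] + snapshot[i + stride]
        stride *= 2
        steps += 1
    return mem[0], steps


print(pram_sum(range(1, 11)))   # (55, 4): ceil(log2 10) = 4 steps
```

The algorithm is written purely in terms of shared-memory cells and synchronous steps; nothing in it depends on how processors are wired to memory, which is exactly the detail the PRAM hides.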

7
The PRAM Shared-memory Model
  • The PRAM model is highly theoretical. If one were
    to build a physical PRAM, the processor-to-memory
    connectivity would have to be realized by an
    interconnection network.
  • The figure below shows a PRAM with some hardware
    details.

8
The Graph (Distributed-memory) Model
  • The network used by a distributed-memory computer
    is usually represented as a graph.
  • The important parameters of an interconnection
    network include diameter, bisection bandwidth,
    and node degree.
  • Network diameter: the longest of the shortest
    paths between pairs of nodes; it should be
    relatively small if network latency is to be
    minimized.
  • Bisection bandwidth: the smallest number of links
    (or, more generally, their total capacity) that
    must be cut in order to divide the network into
    two subnetworks of half the size.
  • Node degree: the number of communication ports
    required of each node, which should be a constant
    if the architecture is to be scalable to larger
    sizes.
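The three parameters can be computed by brute force for small networks. In the Python sketch below, the ring and hypercube builders and all function names are illustrative assumptions; bisection is measured as a count of links (unit-capacity links assumed), and the exhaustive search over balanced partitions is only practical for a handful of nodes.

```python
from collections import deque
from itertools import combinations

def ring(p):
    """p-node ring: node i is linked to (i - 1) mod p and (i + 1) mod p."""
    return {i: {(i - 1) % p, (i + 1) % p} for i in range(p)}

def hypercube(q):
    """q-dimensional hypercube: labels differing in exactly one bit are linked."""
    return {i: {i ^ (1 << b) for b in range(q)} for i in range(2 ** q)}

def diameter(g):
    """Longest of the shortest paths, via breadth-first search from every node."""
    best = 0
    for src in g:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in g[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

def bisection_width(g):
    """Fewest links crossing any split into two equal halves (exhaustive search)."""
    nodes = sorted(g)
    best = None
    for half in combinations(nodes, len(nodes) // 2):
        side = set(half)
        cut = sum(1 for u in side for v in g[u] if v not in side)
        best = cut if best is None else min(best, cut)
    return best

def max_degree(g):
    return max(len(nbrs) for nbrs in g.values())


for name, g in [("8-node ring", ring(8)), ("3-cube", hypercube(3))]:
    print(name, diameter(g), bisection_width(g), max_degree(g))
# 8-node ring 4 2 2
# 3-cube 3 4 3
```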

9
The Sea of Interconnection Networks
10
Topological Parameters
11
Associative Memory (AM) Model
  • A bit-serial associative memory is capable of
    searching, in one memory access cycle, a single
    bit slice of all active memory words for 0 or 1
    and provides the number of responding words in
    the form of an unsigned integer. For example, the
    instruction search(0,i) will yield the number of
    active memory words that store value 0 in bit
    position i. It also has instructions for
    activating or deactivating memory words based on
    the results of the latest search.
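A toy software model helps pin down what one associative search does. The class `BitSerialAM` and its method names (`search`, `keep_responders`, `drop_responders`, `activate_all`) are hypothetical; they only mirror the search and activate/deactivate instructions described above, and a real AM would examine all words in a single memory cycle rather than in a Python loop.

```python
class BitSerialAM:
    """Toy bit-serial associative memory holding unsigned integers.

    search(b, i) looks at bit slice i of every *active* word and returns
    how many of them hold bit value b; the matching words are remembered
    as the responders of the latest search.
    """

    def __init__(self, words):
        self.words = list(words)
        self.active = [True] * len(self.words)
        self.responders = [False] * len(self.words)

    def search(self, b, i):
        self.responders = [act and ((w >> i) & 1) == b
                           for w, act in zip(self.words, self.active)]
        return sum(self.responders)

    def keep_responders(self):
        """Deactivate the active words that did not respond to the latest search."""
        self.active = [a and r for a, r in zip(self.active, self.responders)]

    def drop_responders(self):
        """Deactivate the words that responded to the latest search."""
        self.active = [a and not r for a, r in zip(self.active, self.responders)]

    def activate_all(self):
        self.active = [True] * len(self.words)


am = BitSerialAM([5, 3, 6, 0])   # 101, 011, 110, 000
print(am.search(0, 0))   # 2  (6 and 0 have a 0 in bit position 0)
am.keep_responders()     # only 6 and 0 remain active
print(am.search(1, 2))   # 1  (of those, only 6 has a 1 in bit position 2)
```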

12
Scalable Parallel Computer Architectures
  • Key characteristics of scalable parallel
    computers

13
Pitfalls of Scaling up
14
The Cluster Computer Architecture
  • Cluster computer architecture

15
Hierarchical-bus architectures
  • A variety of hierarchical-bus architectures are
    available for reducing bus traffic by taking
    advantage of the locality of communication within
    small clusters of processors.
  • An example of a hierarchical interconnection
    network

16
Abstract Models for Distributed-memory MIMD
  • The development of efficient algorithms suffers
    from the proliferation of available
    interconnection networks, for algorithm design
    must be done virtually from scratch for each new
    architecture.
  • It would be nice if we could abstract away the
    effects of the interconnection topology in order
    to free the algorithm designer from a lot of
    machine-specific details.
  • The idea is to replace the topological
    information with a small number of parameters
    that capture the effect of interconnection
    topology highly accurately.

17
The LogP Model
  • In the LogP model, the communication architecture
    of a parallel computer is captured in four
    parameters:
  • L (latency): an upper bound on the time needed to
    send a small message from an arbitrary source
    node to an arbitrary destination node.
  • o (overhead): the length of time during which a
    processor is dedicated to the transmission or
    reception of a message and cannot do other work.
  • g (gap): the minimum time that must elapse
    between consecutive message transmissions or
    receptions by a single processor.
  • p (processor multiplicity): the number of
    processors.
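A common use of the LogP parameters is to estimate communication time. The Python sketch below assumes that a small message costs o at the sender, at most L in the network, and o at the receiver, and that a processor sending a stream of messages can start a new one only every max(g, o) time units; the function names and the parameter values in the example are made up for illustration, and p does not enter these point-to-point estimates.

```python
def logp_one_message(L, o):
    """Send overhead + network latency bound + receive overhead."""
    return o + L + o

def logp_message_stream(k, L, o, g):
    """Time until the last of k small messages sent back to back from one
    processor to another has been received. Assumes the sender can start
    a new message every max(g, o) time units: no sooner than the gap g
    allows, and not while it is still busy with the previous overhead o."""
    if k <= 0:
        return 0
    return (k - 1) * max(g, o) + logp_one_message(L, o)


# Made-up parameter values, in arbitrary time units.
print(logp_one_message(L=6, o=2))               # 10
print(logp_message_stream(k=5, L=6, o=2, g=4))  # 26 = 4 * 4 + 10
```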

18
Exercise 3
  • Associative processing:
      • Devise an AM algorithm to find the largest number
        among the m unsigned integers in the memory.
      • Devise an AM algorithm to find the kth largest
        number among the m unsigned integers in the
        memory.
      • Extend the above algorithms to deal with signed
        integers in 2's-complement format (the sign bit
        carries a negative weight, so that 1010
        represents -8 + 2 = -6).
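The weighting rule in the last item can be checked directly. This Python helper (a hypothetical name, `twos_complement_value`) evaluates a bit string by giving the leftmost bit weight -2^(n-1) and every other bit its usual positive weight; it only illustrates the number format and is not an AM algorithm.

```python
def twos_complement_value(bits):
    """Value of a 2's-complement bit string such as "1010": the leading
    (sign) bit has weight -2**(n-1), and each remaining bit has its usual
    positive weight 2**k, where k counts positions from the right."""
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)
    for pos, ch in enumerate(bits[1:]):
        value += int(ch) * 2 ** (n - 2 - pos)
    return value


print(twos_complement_value("1010"))   # -6  (-8 + 2)
print(twos_complement_value("0110"))   # 6
print(twos_complement_value("1111"))   # -1
```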

19
Exercise 3
  • Topological parameters: add entries for the
    following topologies to the table shown in the
    lecture notes.
      • An X-tree: a complete binary tree with nodes on
        the same level connected as a linear array.
      • A hierarchical bus architecture with a maximum
        branching factor b.
      • A degree-4 chordal ring with skip distance s,
        i.e., a p-node ring in which processor i is also
        connected to processors (i + s) mod p and
        (i - s) mod p.
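Before deriving closed-form entries, it can help to check small instances numerically. The Python sketch below builds the X-tree (using heap numbering, an assumption of this sketch) and the degree-4 chordal ring as adjacency sets and measures node degree and diameter by breadth-first search; the function names are illustrative, and results for one small size do not replace the general formulas the exercise asks for.

```python
from collections import deque

def chordal_ring(p, s):
    """Degree-4 chordal ring: node i is linked to i +/- 1 and i +/- s (mod p).
    Assumes 1 < s < p/2 so that the four neighbours are distinct."""
    return {i: {(i + 1) % p, (i - 1) % p, (i + s) % p, (i - s) % p}
            for i in range(p)}

def x_tree(h):
    """X-tree of height h with heap numbering: root 1, children of i are
    2i and 2i + 1, and consecutive nodes on the same level are linked."""
    n = 2 ** (h + 1) - 1
    g = {i: set() for i in range(1, n + 1)}

    def link(u, v):
        g[u].add(v)
        g[v].add(u)

    for i in range(1, n + 1):
        for child in (2 * i, 2 * i + 1):
            if child <= n:
                link(i, child)
        # i + 1 starts a new level exactly when it is a power of two
        if i + 1 <= n and ((i + 1) & i) != 0:
            link(i, i + 1)
    return g

def diameter(g):
    """Longest shortest-path length, via BFS from every node."""
    best = 0
    for src in g:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in g[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

def max_degree(g):
    return max(len(nbrs) for nbrs in g.values())


cr = chordal_ring(12, 3)
xt = x_tree(2)
print("chordal ring p=12, s=3:", max_degree(cr), diameter(cr))   # 4 3
print("X-tree h=2:", max_degree(xt), diameter(xt))               # 4 3
```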

20
Exercise 3
  • Consider the hierarchical multilevel bus
    architecture with four processors in each of the
    low-level clusters. Consider the shear-sort
    algorithm and assume that each transfer over a
    shared bus to another processor or to a switch
    node takes unit time.
  • How long does this system take to emulate
    shear-sort on a 4-by-6 mesh if each processor
    holds a single data item and each cluster emulates
    a column of the mesh?
  • How long does this system take to emulate
    shear-sort on a 6-by-4 mesh?
  • Devise an algorithm for performing a parallel
    prefix computation on this architecture.
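For reference when counting steps, shear-sort on an r-by-c mesh alternates snakelike row sorts with column sorts for ceil(log2 r) phases and finishes with one more row sort. The Python sketch below simulates only the data-movement pattern, sequentially (the function names are illustrative, and it does not model buses, clusters, or timing); it can be used to check the final snakelike order for both the 4-by-6 and 6-by-4 cases.

```python
def shear_sort(grid):
    """Return a copy of the r-by-c mesh `grid` (a list of rows) sorted into
    snakelike row-major order by shear-sort."""
    rows = [list(row) for row in grid]
    r, c = len(rows), len(rows[0])

    def row_phase():
        # Even-indexed rows ascending, odd-indexed rows descending (snake order).
        for i in range(r):
            rows[i].sort(reverse=(i % 2 == 1))

    def column_phase():
        # Every column sorted ascending from top to bottom.
        for j in range(c):
            col = sorted(rows[i][j] for i in range(r))
            for i in range(r):
                rows[i][j] = col[i]

    for _ in range((r - 1).bit_length()):   # ceil(log2 r) phases (0 if r == 1)
        row_phase()
        column_phase()
    row_phase()                             # final row sort
    return rows

def snake_order(rows):
    """Read a mesh in snakelike row-major order."""
    out = []
    for i, row in enumerate(rows):
        out.extend(row if i % 2 == 0 else reversed(row))
    return out


# 4-by-6 mesh holding a permutation of 0..23, one item per processor.
values = [(7 * k + 3) % 24 for k in range(24)]
mesh = [values[i * 6:(i + 1) * 6] for i in range(4)]
result = shear_sort(mesh)
for row in result:
    print(row)
print(snake_order(result) == sorted(values))   # True
```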