Architecture of a Massively Parallel Processor



1
Architecture of a Massively Parallel Processor
  • Kenneth E. Batcher
  • 1980
  • presented by

  • Yao Wu
  • April 25, 2003

2
Kenneth E. Batcher
3
OUTLINE
  • Background
  • Data-level parallelism → SIMD design
  • Architecture of MPP
  • ARU, ACU, PDMU and staging memory
  • Performance and conclusion

4
Design Goal
  • Application domain?
  • Image processing
  • --- data level parallelism
  • The expected workload?
  • Between 10^9 and 10^10 operations per second
  • --- Very fast (massive parallelism)
  • Cost?
  • --- Special-purpose machine

5
Data Level Parallelism
  • Each task performs the same series of
    calculations, but applies them to different data.
  • B(I) = A(I) * 4
  • → LOAD A(I)
  • MULT 4
  • STORE B(I)
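
A minimal Python sketch of the slide's example, read here as B(I) = A(I) * 4. Every element goes through the same LOAD, MULT, STORE sequence and only the data differs. The function name `scale_all` and the sample values are illustrative, and the sequential loop stands in for processors that would execute in lockstep.

```python
def scale_all(a, factor=4):
    """Apply the same LOAD / MULT / STORE sequence to every element of a."""
    b = [0] * len(a)
    for i in range(len(a)):   # on a SIMD machine, each i would be its own processor
        acc = a[i]            # LOAD A(I)
        acc = acc * factor    # MULT 4
        b[i] = acc            # STORE B(I)
    return b


a = [1, 2, 3]
print(scale_all(a))  # [4, 8, 12]
```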

6
Data Parallelism Execution
  • (Figure: time axis versus processors P1, P2, P3)

7
SIMD Architecture
8
Advantages vs. Disadvantages of SIMD
  • Advantages
  • Simplicity of concept and programming
  • SIMD architectures are deterministic
  • Scalability of size and performance
  • No explicit synchronization is required
  • Disadvantages
  • Lack of applicability to a wide variety of
    problems
  • Places enormous demand on processor-memory
    interconnection bandwidth

9
Massively Parallel Processor
  • Built by Goodyear Aerospace Corp.; delivered to NASA in 1983
  • Target performance: 10^9 to 10^10 operations per
    second, to process an average of 10^13 bits per day
  • On October 29, 1996, NASA officially handed over
    the world's first Massively Parallel Processor to
    the Smithsonian Collection in a ceremony held in
    Maryland.
  • Retired in March 1991 after 8 years of service to
    the NASA scientific community
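
To put the target figures in proportion, the short calculation below converts the daily data volume into a per-second rate and divides the target operation rates by it. It assumes the input arrives evenly over a 24-hour day, which the slide does not state; the names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400 seconds

bits_per_day = 10**13                     # average data volume quoted on the slide
bits_per_second = bits_per_day / SECONDS_PER_DAY   # assumes an even arrival rate


def ops_per_bit(ops_per_second):
    """Operations available per input bit at a given processing rate."""
    return ops_per_second / bits_per_second


print(round(ops_per_bit(10**9), 2))    # 8.64
print(round(ops_per_bit(10**10), 2))   # 86.4
```

Under that assumption the target range leaves a budget of roughly 9 to 86 operations per input bit.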

10
Block Diagram of MPP
11
Array Unit (ARU)
  • 2D processing problem → data handled as 2D planes rather than as
    a number of words or bytes
  • Logically, 16,384 Processing elements (PEs)
    organized in 128 x 128 square
  • Redundant rectangle of 128 x 4 PEs for fault
    recovery
  • Each PE is bit-serial to handle operands of any
    length
  • PEs are connected in a 2D mesh where each PE
    communicates with its four neighbors up, down,
    left, and right
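
The four-neighbor mesh can be pictured as a whole-plane shift in which every PE passes its bit to one neighbor at the same time. The sketch below is illustrative only: `shift_plane` is a hypothetical name, and the edge rule (bits falling off one side, zeros entering on the other) is an assumption, since the slide does not describe what happens at the array edges.

```python
def shift_plane(plane, direction):
    """Move every bit one PE in the given direction ('up', 'down', 'left', 'right').

    Assumed edge rule for this sketch: bits shifted past an edge are dropped
    and zeros enter on the opposite side.
    """
    rows, cols = len(plane), len(plane[0])
    dr, dc = {"up": (-1, 0), "down": (1, 0),
              "left": (0, -1), "right": (0, 1)}[direction]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                out[nr][nc] = plane[r][c]   # neighbor receives this PE's bit
    return out


plane = [[1, 0, 0],
         [0, 1, 0],
         [0, 0, 1]]
print(shift_plane(plane, "right"))  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```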

12
ARU figure
13
Processing Element (PE)
14
  • (Figure labels: A-plane, P-plane, C-plane, B-plane)

15
ARU (S-plane)
  • Handles data input and output for the ARU
  • On input
  • On output
  • Handles input and output simultaneously

16
ARU (Memory Plane)
  • The capacity is 16,777,216 data bits (over 2MB)
  • A memory plane of 16,384 bits can be randomly
    accessed and transferred in one machine cycle
  • Bit-serial Processing
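
The figures on this slide can be cross-checked with a little arithmetic: one plane holds one bit per PE, so dividing the total capacity by the plane size gives the number of memory planes, which is also the number of memory bits per PE. The variable names are illustrative.

```python
PES = 128 * 128                 # one bit per PE in each memory plane
TOTAL_BITS = 16_777_216         # capacity quoted on the slide

planes = TOTAL_BITS // PES      # memory planes, i.e. memory bits per PE
total_bytes = TOTAL_BITS // 8

print(PES)          # 16384
print(planes)       # 1024
print(total_bytes)  # 2097152
```

The 2,097,152 bytes match the slide's "over 2MB" when a megabyte is counted as 10^6 bytes.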

17
ARU (Processing Plane)
  • There are 35 processing planes in the ARU
  • 30 processing planes are in a planar shift
    register.
  • P-Plane (logic and routing operations)
  • G-Plane (mask operation)
  • A-Plane
  • B-Plane
  • C-Plane
  • Sum-or
  • full-add operation
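
A rough sketch of how a full-add step combines with bit-serial processing: operands are stored as bit planes, and one full-add per plane, with a carry plane held between steps, adds operands of any length. The slide does not say which of the A-, B-, and C-planes plays which role, so the code uses generic operand planes and a carry plane; all function names are hypothetical, and each plane is flattened to a list with one bit per PE.

```python
def full_add_plane(a, b, c):
    """One full-add step in every PE at once: returns (sum_plane, carry_plane)."""
    s = [x ^ y ^ z for x, y, z in zip(a, b, c)]
    carry = [(x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c)]
    return s, carry


def bit_serial_add(x_planes, y_planes):
    """Add two operands stored as bit planes, least significant plane first."""
    carry = [0] * len(x_planes[0])
    result = []
    for xp, yp in zip(x_planes, y_planes):   # one bit position per step
        s, carry = full_add_plane(xp, yp, carry)
        result.append(s)
    result.append(carry)                     # final carry-out plane
    return result


def to_planes(values, width):
    """Split unsigned integers (one per PE) into `width` bit planes, LSB first."""
    return [[(v >> k) & 1 for v in values] for k in range(width)]


def from_planes(planes):
    """Reassemble one unsigned integer per PE from LSB-first bit planes."""
    return [sum(planes[k][i] << k for k in range(len(planes)))
            for i in range(len(planes[0]))]


x = [3, 5, 7, 0]
y = [1, 6, 7, 0]
print(from_planes(bit_serial_add(to_planes(x, 3), to_planes(y, 3))))  # [4, 11, 14, 0]
```

The number of steps grows with the operand width, not with the number of PEs, which is why a bit-serial PE can handle operands of any length.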

18
An example of G-plane (mask)
  • Clear all negative items to 0.
  • sign plane G-plane result
  • masked-clear
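
The masked-clear example can be sketched as follows: the sign plane is copied into the mask, and the clear then takes effect only in PEs whose mask bit is set. The convention that a mask bit of 1 means "this PE participates" is an assumption of the sketch, as are the function names and sample data.

```python
def sign_plane(values):
    """One bit per PE: 1 where the value is negative, else 0."""
    return [1 if v < 0 else 0 for v in values]


def masked_clear(values, g):
    """Clear the value only in PEs whose mask bit is 1 (assumed convention)."""
    return [0 if m else v for v, m in zip(values, g)]


data = [3, -2, 0, -7, 5]
g = sign_plane(data)            # load the mask from the sign plane
print(g)                        # [0, 1, 0, 1, 0]
print(masked_clear(data, g))    # [3, 0, 0, 0, 5]
```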


19
Array Control Unit (ACU)
  • Controls operations in the ARU
  • Performs scalar arithmetic
  • Three independent control units
  • Processing Element Control Unit (PECU)
  • Controls operations in the processing planes of
    the ARU
  • I/O Control Unit (IOCU)
  • Controls S-plane operations in the ARU
  • Main Control Unit (MCU)
  • Executes the main application program of MPP
  • Performs scalar processing

20
ACU figure
21
Program and Data Management Unit (PDMU)
  • Controls the overall flow of program and data in
    the system
  • PDMU is a minicomputer (DEC PDP-11) with a custom
    interface to the ACU and ARU

22
Staging Memory
  • Transfers data between PDMU and ARU
  • Reorders array of data
  • Pixel format to bit-serial format
  • Reordering via a common 2^19-bit multidimensional-access
    memory (MDA)
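
The pixel-to-bit-serial reordering can be illustrated in software: an image stored as one integer per pixel is turned into a stack of bit planes, one plane per bit position. This is only a sketch of the data layout change, not of how the staging memory does it in hardware; the function names and the 3-bit sample image are illustrative.

```python
def pixels_to_bit_planes(pixels, width):
    """Turn a 2D list of unsigned pixel values into `width` bit planes, LSB first."""
    return [[[(p >> k) & 1 for p in row] for row in pixels]
            for k in range(width)]


def bit_planes_to_pixels(planes):
    """Inverse reordering: rebuild pixel values from LSB-first bit planes."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[sum(planes[k][r][c] << k for k in range(len(planes)))
             for c in range(cols)]
            for r in range(rows)]


image = [[5, 2],
         [7, 0]]
planes = pixels_to_bit_planes(image, 3)
print(planes[0])                      # [[1, 0], [1, 0]]
print(planes[1])                      # [[0, 1], [1, 0]]
print(bit_planes_to_pixels(planes))   # [[5, 2], [7, 0]]
```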

23
Speed of typical operations
24
Conclusion
  • The MPP is an ultra-high-speed SIMD processor
    designed to process 2D image data
  • It is fully programmable
  • Lack of applicability to a wide variety of
    problems
  • We never found any other customers for the MPP
    even though it was one of the fastest machines
    available at that time.

25
References
  • J. L. Potter, ed., The Massively Parallel Processor,
    The MIT Press, 1985
  • Gurindar Sohi, ed., 25 Years of the International Symposia on
    Computer Architecture: Selected Papers, 1998
  • Computer Architecture: A Quantitative Approach

26
  • Thank you!
  • Questions?