The Future of Parallel Computing - PowerPoint PPT Presentation

About This Presentation
Slides: 101
Provided by: eePdxEdu
Learn more at: http://web.cecs.pdx.edu

Transcript and Presenter's Notes

1
The Future of Parallel Computing
SA ISA PIPS RM OH
Special Purpose Mesh Architectures
Heiko Schröder, 1998
2
Contents
  • Why meshes ???
  • Application-specific parallel mesh architectures
      - Systolic Arrays
      - Instruction Systolic Arrays
      - PIPS
      - Reconfigurable mesh
      - Optical Highway
3
Physical limits
  • 10^12 OPS -- 0.3 mm/OP
  • 1000 PEs with 10^9 OPS -- 30 cm/OP
  • massive parallelism
  • distributed memory

c = 300 000 km/sec
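The distances on this slide follow from dividing the speed of light by the operation rate. A minimal sketch of that arithmetic, assuming rates of 10^12 and 10^9 operations per second (the values consistent with the 0.3 mm and 30 cm figures; the function name is illustrative):

```python
C_KM_PER_S = 300_000  # speed of light, rounded as on the slide


def distance_per_op_mm(ops_per_second):
    """Distance in mm that a signal at light speed can cover during one operation."""
    mm_per_second = C_KM_PER_S * 1_000_000  # 1 km = 1,000,000 mm
    return mm_per_second / ops_per_second


print(distance_per_op_mm(1e12))  # one processor at 10^12 OPS (assumed rate)
print(distance_per_op_mm(1e9))   # each of 1000 PEs at 10^9 OPS (assumed rate)
```

The first call gives about 0.3 mm and the second about 300 mm (30 cm), which matches the slide's case for massive parallelism with distributed memory.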
4
Processor power
5
  • Scaling
  • Factor 2
  • 1/2 width
  • 1/2 height
  • 1/2 switching time

0.5 µm
8 x performance!
0.25 µm
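The factor of 8 can be reproduced with a back-of-the-envelope calculation. This sketch assumes idealized scaling, in which performance is proportional to transistors per unit area times switching speed; the function name is hypothetical:

```python
def performance_gain(shrink_factor):
    """Idealized gain when the feature size shrinks by `shrink_factor`."""
    # Smaller width and smaller height: more transistors on the same area.
    density_gain = shrink_factor * shrink_factor
    # Shorter switching time: more operations per second per transistor.
    speed_gain = shrink_factor
    return density_gain * speed_gain


print(performance_gain(2))  # scaling step from 0.5 µm to 0.25 µm
```

With a shrink factor of 2, density rises by 4 and speed by 2, giving the 8 x on the slide.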
6
CMOS transistors
10m
Size of minimal transistor
1m
0,1m
ca. 0,03m
0,01m
1960
1970
1980
1990
2000
2010
2020
2030
7
Mesh/Torus
8
Hypercube
9
VLSI
  • Very
  • Large
  • Scale
  • Integration
  • simple cells
  • few types
  • regular architecture
  • short connections
  • mesh -- torus

10
Pin limitations
11
Bisection width
12
Programming
  • SA --- Systolic Array
  • SIMD --- Single Instruction Multiple Data
  • ISA --- Instruction Systolic Array
  • MIMD --- Multiple Instruction Multiple Data

13
parallel merge
  • initial situation
  • 1.) sort columns (odd-even-transposition sort)
  • 2.) sort rows (odd-even-transposition sort)
  • sorted !!!!

(Figure: input sequences x1 ... x18 and y1 ... y18)
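Both steps use odd-even transposition sort. A minimal sequential sketch of that sort on a single row or column follows; on the mesh, all compare-exchanges of one round would run in parallel (the function name is illustrative, not from the slides):

```python
def odd_even_transposition_sort(values):
    """Sort a sequence with n rounds of neighbour compare-exchange."""
    a = list(values)
    n = len(a)
    for step in range(n):
        # Even rounds compare pairs (0,1), (2,3), ...; odd rounds (1,2), (3,4), ...
        start = step % 2
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


print(odd_even_transposition_sort([5, 1, 4, 2, 3]))
```

Each round only touches neighbouring cells, which is why the method maps well onto a mesh with short connections.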
14
0-1 principle
  • The 0-1 principle states that if all sequences of
    0s and 1s are sorted properly, then this is a
    correct sorter.
  • The sorter must be based on moving data.
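The principle gives a practical way to check a small sorter: try all 2^n inputs of 0s and 1s instead of all permutations. A sketch, assuming the sorter is given as a fixed list of compare-exchange pairs (the helper names are hypothetical):

```python
from itertools import product


def apply_network(network, values):
    """Run a fixed sequence of compare-exchange operations on a copy of the input."""
    a = list(values)
    for i, j in network:
        if a[i] > a[j]:
            a[i], a[j] = a[j], a[i]
    return a


def sorts_all_zero_one_inputs(network, n):
    """Check the network on every 0-1 sequence of length n."""
    return all(
        apply_network(network, bits) == sorted(bits)
        for bits in product((0, 1), repeat=n)
    )


def odd_even_transposition_network(n):
    """Comparator pairs of n rounds of odd-even transposition sort."""
    return [(i, i + 1) for step in range(n) for i in range(step % 2, n - 1, 2)]


print(sorts_all_zero_one_inputs(odd_even_transposition_network(4), 4))  # full network
print(sorts_all_zero_one_inputs([(0, 1), (2, 3)], 4))                   # too few comparators
```

The check only makes sense for sorters whose sequence of comparisons is fixed in advance, which is one way to read the requirement that the sorter be based on moving data.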

15
MIMD-mesh (clocked)

min
max
Time 2n
16-36
systolic merge
  • sorted !!!

37
Characteristics of SAs
  • Extremely high cost-performance
  • no flexibility -- long development time
  • Suitable for special signal processing tasks ???
38
Systolic architectures I
39
Systolic architectures II
40-70
ISA merge
71
Hough transform on the ISA
  • good line detection method

Fast tomography
72
robot vision
  • stereo vision

projector
CCD
CCD
73
Use of the ISA
Special features
  • fast aggregate functions (sum, carry)
  • fast local communication
  • no local memory
  • typical improvement over PC: factor 20-30
Areas of application for ISA
  • automatic optical quality control
  • real-time signal processing
  • computer graphics / visualization
  • linear equations
  • cryptography --> tele-medicine ?

74
Instruction Systolic Array
75
PIPS (1990-94)
  • 32x32 torus
  • 16 bit parallel communication
  • 16 bit add
  • prefetch
  • 1 M bit, 1 M bit
  • memory control
  • BHP -- CSIRO -- NU -- ADFA, 1.4 M
76
Special features
  • local memory
  • SIMD-torus
  • memory pre-fetch
Applications
  • visualization
  • 3D-simulation (CFD, FEM)
78
PIPS
79
Use in industry ?
(Chart: performance in Gflops, research vs. industry, 1993 to 1996; labelled values 3675, 2121, 1327, 1168, 648, 693, 248, 126)
80
Investments
(Chart: investments into parallel computers in M, research vs. industry, 1993 to 1996; vertical axis 0 to 3500)
81
Concentration
(Chart: number of manufacturers, 1993 to 1996; labelled values 49, 21, 19, 11)
82
Degree of Parallelism
(Chart: number of new systems by degree of parallelism, in the classes 1 to 63, 64 to 255, 256 to 1023, and 1024 and more; half-yearly from May-93 to Nov-97)
83
Evaluation
Cost computation time
  • Parallel computers with standard components
  • Embedded parallel systems

84
reconfigurable mesh
reconfigurable mesh: mesh with interior connections
  • low cost
  • 15 positions
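One plausible reading of the 15 positions is the number of ways a processing element can connect its four ports (N, E, S, W) internally, which equals the number of partitions of a four-element set. The sketch below enumerates them; this interpretation and the names used are assumptions, not stated on the slide:

```python
def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partition


# Ports in the same block are connected to each other (assumed switch model).
settings = list(set_partitions(["N", "E", "S", "W"]))
print(len(settings))
for setting in settings:
    print(setting)
```

Under this model a setting such as [["N", "S"], ["E", "W"]] would pass two buses straight through the element, while four single-port blocks would leave all ports disconnected.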
85
global OR and modulo 3
log n on EREW-PRAM
log n / log log n on CRCW-PRAM
86
sorting with all-to-all mapping
Sorting
  • sort blocks
  • all-to-all (columns)
  • sort blocks
  • all-to-all (rows)
  • o-e-sort blocks
87
all-to-all mapping
n x n
88
vertical all-to-all
89
horizontal all-to-all
90
1 step
(k/2)^2 steps
2 steps
3 steps
3 steps
2 steps
1 step
91
sorting in optimal time
  • (k/2)^2 steps
  • k = n^(1/3)
  • each step takes n^(1/3) time
  • --> T = n/4

Sorting
  • sort blocks (O(n^(2/3)))
  • all-to-all (n/2)
  • sort blocks (O(n^(2/3)))
  • all-to-all (n/2)
  • sort blocks (O(n^(2/3)))
  • time: n + o(n)
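The bound T = n/4 follows from (k/2)^2 steps of n^(1/3) time each, under the reading k = n^(1/3) (the symbols are partly lost in the transcript, so this reading is an assumption). A small numeric check, with a hypothetical function name:

```python
def all_to_all_time(k):
    """Return (total time, n) for the all-to-all mapping, assuming k = n^(1/3) and k even."""
    n = k ** 3              # chosen so that n^(1/3) equals k exactly
    steps = (k // 2) ** 2   # (k/2)^2 steps
    time_per_step = k       # each step takes n^(1/3) time
    return steps * time_per_step, n


total, n = all_to_all_time(16)
print(total, n // 4)  # the two values should agree
```

For any even k the product (k/2)^2 * k equals k^3 / 4, so the total is exactly n/4 under these assumptions.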
92
Reconfigurable mesh
Special features
  • SIMD
  • constant diameter
  • faster than PRAM ?
Suitable applications
  • routing / sorting / load balancing
  • sparse matrix multiplication
  • segmentation / component labeling
  • feature extraction
  • image database ?
93
Reconfigurable mesh
94
Optical Highway
All-to-all connection
W1 P100 W100 P22
96
Features of optically connected meshes
  • SIMD/SPMD/MIMD
  • implement all major architectures
  • all-to-all communication in 2 steps
  • Bulk synchronous processing (BSP)
  • no latency hiding
  • no pin-limitation
Applications
  • coarse grain parallel computing only?
  • ray-tracing ?
  • ???
97
Optical Highway
1. H. Schröder et al., RMB --- A Reconfigurable Multiple Bus Network, HPCA 96, San Jose, 1996
2. H. Schröder, O. Sykora, I. Vrto, Optical All-to-All Communication for some Product Graphs, SOFSEM '97, Milovy, Czech Republic, 1997
98
Bisection-width / Diameter
99
Suitable problems ?
  • SA: suitable applications?
  • ISA: 2D-problems, aggregate functions, local communication
  • PIPS: 3D-problems, local communication
  • RM: diameter-bound > bisection-width-bound
  • OH: PRAM equivalent?
(Diagram labels: SA, ISA, PIPS, RM, OH; diameter log n; bisection width n)
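For comparison, the diameter and bisection width of the topologies named earlier in the talk can be tabulated. A sketch using the standard textbook values for a k x k mesh, a k x k torus (k even), and a hypercube with n = 2^d nodes; the function names are illustrative:

```python
def mesh(k):
    """k x k mesh without wrap-around links (k even)."""
    return {"nodes": k * k, "diameter": 2 * (k - 1), "bisection_width": k}


def torus(k):
    """k x k torus, i.e. a mesh with wrap-around links (k even)."""
    return {"nodes": k * k, "diameter": 2 * (k // 2), "bisection_width": 2 * k}


def hypercube(d):
    """d-dimensional hypercube with n = 2^d nodes."""
    return {"nodes": 2 ** d, "diameter": d, "bisection_width": 2 ** (d - 1)}


for name, net in [("mesh", mesh(32)), ("torus", torus(32)), ("hypercube", hypercube(10))]:
    print(name, net)
```

All three examples have 1024 nodes. The hypercube shows the "diameter log n, bisection width of order n" combination, while mesh and torus have both values of order sqrt(n).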
100
?
?
?
?