Research at the Computer Engineering Laboratory of Delft University of Technology - PowerPoint PPT Presentation

About This Presentation

Title:

Research at the Computer Engineering Laboratory of Delft University of Technology

Description:

Aerospace Engineering. Applied Sciences. Architecture. Civil Engineering and Geosciences. Design, Engineering and Production ... Computer Engineering ... – PowerPoint PPT presentation

Slides: 26

Provided by: benj4

Learn more at: https://research.ac.upc.edu

Transcript and Presenter's Notes

1
Research at the Computer Engineering Laboratory
of Delft University of Technology

Ben Juurlink

2
Outline

General Information
Group Location
Group Formation
Group Funding
Group Interests
Group Projects
Molen
D-Iliad
MOVE
Pamela
PUB library
Concluding Remarks

3
Group Location
7 faculties, 13,000 students, 2,100 researchers

Delft University of Technology
Aerospace Engineering
Applied Sciences
Architecture
Civil Engineering and Geosciences
Design, Engineering and Production
Information Technology and Systems
Technology, Policy and Management

Computer Science
Electrical Engineering
Mathematics
Telecommunication
Software Engineering
Microelectronics
Energy
Mediamatica
Mathematical Analysis
Control, Risk, Optimization, Stochastics, and Systems

4
Group Formation
5
Group Funding 94-98 (in Kfl)
Total financing: 6000 Kfl
6
Group Output (94-98)

Degrees
PhD theses: 9
Eng. degrees: 5
MSc: 87
Publications
Books/chapters: 7
Journal articles: 47
Conference papers: 165
Patents: 50
Five start-ups

7
Computer Engineering

Computer Engineering: analysis of data processing requirements for electronic data processing units and systems, and the design (synthesis) of their architecture, implementation, and realization
Architecture: determine the function to perform
Implementation: establish a method to achieve the function
Realization: use available means to materialize the method

8
Computer Engineering Interests
9
Group Projects
MOLEN: Embedded system architecture, multimedia, Java.
MOVE: Embedded system synthesis, compilers, hardware/software co-design.
PAMELA: Performance analysis and languages.
D-ILIAD: Computer architecture, implementation, computer arithmetic, switches.
10
MOLEN: Embedded System Design

Topics
Embedded Processor Architectures
Multimedia
Java
Embedded System Tools
Embedded Agents
Current Contributions
Java Processor
Multimedia Instructions
Specialized Units
FPGA Units
Future Directions
Reconfigurable embedded processors

11
Molen Multimedia Instruction and Functional Unit

Published at EUROMICRO98
Motion estimation, sum of absolute differences
s = 0;
for (j = 0; j < h; j++) {
  if ((v = p1[0] - p2[0]) < 0) v = -v; s += v;
  if ((v = p1[1] - p2[1]) < 0) v = -v; s += v;
  ...
  if ((v = p1[15] - p2[15]) < 0) v = -v; s += v;
  if (s > distlim) break;
  p1 += lx;
  p2 += lx;
}
Formula

12
Efficient Implementation of the SAD Operation

Straightforward approach
Compute Ai - Bi for all pairs of pixels
Take absolute values
Accumulate absolute values
Cost: 4 cycles
MOLEN solution
Observation: |Ai - Bi| = max(Ai, Bi) - min(Ai, Bi)
Problem: determining and negating min(Ai, Bi) takes > 1 cycle
Solution: pass min(Ai, Bi) to the accumulate stage and correct
Cost: 3 cycles
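The identity behind the MOLEN solution can be checked in software. The following is a minimal C sketch, not the hardware unit from the slide: the function names `sad_abs` and `sad_maxmin` and the 8-bit pixel type are assumptions made for illustration. It only shows that accumulating max minus min gives the same sum as the subtract, absolute value, accumulate sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward SAD: subtract, take the absolute value, accumulate. */
unsigned sad_abs(const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned s = 0;
    for (size_t i = 0; i < n; i++) {
        int v = (int)a[i] - (int)b[i];
        if (v < 0)
            v = -v;
        s += (unsigned)v;
    }
    return s;
}

/* Same sum using |a - b| = max(a, b) - min(a, b): the difference is
 * never negative, so no sign test or negation of the result is needed. */
unsigned sad_maxmin(const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned s = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned hi = a[i] > b[i] ? a[i] : b[i];
        unsigned lo = a[i] > b[i] ? b[i] : a[i];
        s += hi - lo;
    }
    return s;
}
```

For a 16-pixel row as in the motion estimation loop, both functions would be called with n = 16 on the two row pointers.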

13
MOVE
Semi-automatic generation of application-specific processors
14
MOVE

Current Contributions
Transport triggered architecture
Operational design framework (add any unit you like, no restrictions)
Several cheap designs (data logger, video-enhancer, MPEG-decoder, wireless communications)
Future Directions
Tune your application to suit your processor
System design
Multiprocessor TTA
Low-power processors

15
Transport Triggered Architecture

Published in, e.g., Jnl. of Systems Architecture 99
Transport triggered architecture
Only one instruction: MOVE!
FU operations are triggered by moving data to their input ports
Example
add r1,r2,r3
sub r4,r2,r6
st r4,r1
TTA code
r2 -> O1add.alu1; r3 -> O2add.alu1; r2 -> O1sub.alu2; r6 -> O2sub.alu2
Radd.alu1 -> r1; Rsub.alu2 -> r4
r1 -> O1st.ls; r4 -> O2st.ls
After bypassing
r2 -> O1add.alu1; r3 -> O2add.alu1; r2 -> O1sub.alu2; r6 -> O2sub.alu2
Radd.alu1 -> r1; Rsub.alu2 -> r4
Radd.alu1 -> O1st.ls; Rsub.alu2 -> O2st.ls
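The effect of bypassing can be illustrated with a toy software model. The C sketch below is not the MOVE framework. It makes several assumptions that the slide does not state: a move to port O2 triggers the operation, the store unit treats O1 as the address and O2 as the data, and the moves of one instruction are executed one after another. All type and function names are hypothetical.

```c
#include <assert.h>

enum { NREGS = 8, MEMSZ = 16 };

/* Hypothetical function unit: operand latch O1 and result register R. */
typedef struct { int o1; int r; } Alu;

typedef struct {
    int regs[NREGS];
    int mem[MEMSZ];
    Alu alu1, alu2;
    int ls_addr;                 /* O1 latch of the load/store unit */
} Machine;

/* A move to O1 only latches the operand. */
static void move_o1(Alu *fu, int v) { fu->o1 = v; }
/* A move to O2 is assumed to trigger the operation. */
static void move_o2_add(Alu *fu, int v) { fu->r = fu->o1 + v; }
static void move_o2_sub(Alu *fu, int v) { fu->r = fu->o1 - v; }
static void move_o1_st(Machine *m, int addr) { m->ls_addr = addr; }
static void move_o2_st(Machine *m, int data)
{
    assert(m->ls_addr >= 0 && m->ls_addr < MEMSZ);
    m->mem[m->ls_addr] = data;
}

/* First two instructions, identical in both versions of the TTA code. */
static void alu_moves(Machine *m)
{
    move_o1(&m->alu1, m->regs[2]); move_o2_add(&m->alu1, m->regs[3]);
    move_o1(&m->alu2, m->regs[2]); move_o2_sub(&m->alu2, m->regs[6]);
    m->regs[1] = m->alu1.r;
    m->regs[4] = m->alu2.r;
}

/* Without bypassing: the store reads its operands from r1 and r4. */
void run_plain(Machine *m)
{
    alu_moves(m);
    move_o1_st(m, m->regs[1]);
    move_o2_st(m, m->regs[4]);
}

/* With bypassing: the store reads the FU result registers directly. */
void run_bypassed(Machine *m)
{
    alu_moves(m);
    move_o1_st(m, m->alu1.r);
    move_o2_st(m, m->alu2.r);
}
```

Both versions leave the same value in memory. In the bypassed version the store no longer depends on the register file writes, so a scheduler could place the store moves in the same instruction as those writes.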

16
PAMELA: Performance Analysis of Computer Systems

Current Contributions
Specialized Languages
Simulation Tools & Methodology
Parallel Algorithms
Delft Architecture Workbench
Future Directions
Complete the Delft Architecture Workbench

17
Static Branch Prediction

Data dependent branches
for (i = 0; i < n-1; i++) {
  minIndex = i;
  for (j = i+1; j < n; j++)
    if (a[j] < a[minIndex])   /* branch B */
      minIndex = j;
  swap(a[i], a[minIndex]);
}
Oblivious static branch predictor: B will be taken 50% of the time
Bernoulli model with truth probability p (profiling): large variance prediction error
New model based on alternating renewal processes reduces variance prediction error by an order of magnitude
Let D (U) be the number of consecutive 0s (1s)
Then
E[PA] = E[U] / (E[D] + E[U])
Var[PA] = (E[D]^2 Var[U] + E[U]^2 Var[D]) / (E[D] + E[U])^2
Example: 110011001100
Then E[PA] = 0.5, Var[PA] = 0
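The slide example can be reproduced by measuring run lengths in a recorded branch outcome string. The C sketch below is an illustration under stated assumptions. It takes E[PA] = E[U] / (E[D] + E[U]), and it takes the variance expression as it reads on the slide, with (E[D] + E[U])^2 as the denominator; the exact normalization should be checked against the original publication. The names `PaStats`, `run_stats` and `pa_stats` are hypothetical.

```c
#include <stddef.h>

typedef struct { double mean_pa; double var_pa; } PaStats;

/* Mean and population variance of the lengths of runs of symbol c. */
static void run_stats(const char *bits, char c, double *mean, double *var)
{
    double sum = 0.0, sumsq = 0.0;
    size_t runs = 0, len = 0;
    for (const char *p = bits; ; p++) {
        if (*p == c) {           /* still inside a run of c */
            len++;
            continue;
        }
        if (len > 0) {           /* a run of c has just ended */
            sum += (double)len;
            sumsq += (double)len * (double)len;
            runs++;
            len = 0;
        }
        if (*p == '\0')
            break;
    }
    *mean = runs ? sum / (double)runs : 0.0;
    *var = runs ? sumsq / (double)runs - (*mean) * (*mean) : 0.0;
}

/* U = length of runs of 1s, D = length of runs of 0s. */
PaStats pa_stats(const char *bits)
{
    double eu, vu, ed, vd;
    PaStats s = {0.0, 0.0};
    run_stats(bits, '1', &eu, &vu);
    run_stats(bits, '0', &ed, &vd);
    double t = ed + eu;
    if (t > 0.0) {
        s.mean_pa = eu / t;
        /* Denominator exponent as read from the slide (assumption). */
        s.var_pa = (ed * ed * vu + eu * eu * vd) / (t * t);
    }
    return s;
}
```

For the string 110011001100 every run has length 2, so both run-length variances are zero and the sketch gives E[PA] = 0.5 and Var[PA] = 0, matching the slide.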
18
D-Iliad: High Performance General Purpose Computers

Topics
Uni- and Multiprocessors
Internet Processing
Computer Design
High Speed Switches
Current Contributions
Instruction level parallel machines (Superscalar, SCISM)
New Complex Instructions
New Designs of Arithmetic Processing
New Switch Design
Future Directions
New Architectural paradigm

19
Complex Streamed Instructions

See PACT01, EuroPar01
Drawbacks of MMX-like extensions
Multimedia (MM) register size is architecturally visible and fixed. Ways out:
add MM FUs and increase issue width: expensive
increase MM register size: existing codes have to be recompiled/rewritten, and not beneficial due to small sub-matrices
Overhead for converting between packed data types and for alignment
Proposed solution: Complex Streamed Instructions (CSI)
two-dimensional vector (stream) architecture, streams of arbitrary length
a stream is specified by a set of stream control registers
conversion between data types in hardware
no loop control and address generation overhead

20
The Need for a Parallel Computation Model

Parallel computing has not been very successful
One reason: lack of a standard parallel computation model
Properties that a suitable parallel computation model should possess:
Scalability
Portability
Predictability
Model proposed by Valiant (1990)
Bulk-Synchronous Parallel (BSP) model

21
BSP Model

BSP architectural model
set of p processors communicating by sending point-to-point messages
BSP programming model
computations proceed in phases (supersteps), separated by barrier synchronizations
BSP cost model
a superstep takes time w + g * h + L, where
w = max. work
h = max. messages (h-relation)
g = bandwidth reciprocal
L = latency/synchronization cost
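The cost model is simple enough to evaluate directly. The C sketch below computes the cost of one superstep as w + g * h + L and sums it over a sequence of supersteps. The function names and the choice of `double` for all quantities are assumptions made for illustration; they are not part of any BSP library.

```c
#include <stddef.h>

/* Cost of one superstep: max. local work w, plus g times the
 * h-relation h, plus the latency/synchronization cost L. */
double superstep_cost(double w, double h, double g, double L)
{
    return w + g * h + L;
}

/* Cost of a program with n supersteps; w[i] and h[i] are the maximum
 * work and the h-relation of superstep i, g and L are machine parameters. */
double program_cost(const double *w, const double *h, size_t n,
                    double g, double L)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += superstep_cost(w[i], h[i], g, L);
    return total;
}
```

With w = 100, h = 10, g = 4 and L = 50, one superstep costs 100 + 40 + 50 = 190 time units. Because g and L are the only machine-dependent parameters, the same w and h values can be used to predict the cost on a different machine.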

[Figure: processors (P) with local memories (M) connected by a communication network; supersteps separated by barrier syncs]
22
PUB Library

Paderborn University BSP (PUB) library (IPDPS99)
Basics
SPMD
no receive operation: barrier synchronization signifies end of all communication operations
only non-blocking communication primitives
buffered and unbuffered communication
message is placed in a buffer associated with the destination processor, from which it can be retrieved after the next barrier sync
Additional features
(non-blocking) collective communication primitives
ability to partition the processors
running different BSP computations on the same system (in different threads)

23
PUB Example: Parallel Binary Multisearch

[Figure: search butterfly over Proc 0 to Proc 3, each with a local search tree]
24
Parallel Binary Multisearch Using PUB
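void bin_search(int d, int m)
{
  /* keep the queries that stay on this side, send the others across */
  for (i = new_m = 0; i < m; i++)
    if (query[i] < gkey[d] && inRight(d, me) ||
        query[i] > gkey[d] && inLeft(d, me))
      bsp_send(bsp, Opposite(d, me), &query[i], sizeof(int));
    else
      query[new_m++] = query[i];
  bsp_sync(bsp);
  /* messages sent in the previous superstep are available now */
  for (i = 0; i < bsp_nmsgs(bsp); i++) {
    msg = bsp_getmsg(bsp, i);
    query[new_m++] = *(int *)(bspmsg_data(msg));
  }
  if (d == 0)
    local_search(new_m, query, n, key);
  else
    bin_search(d-1, new_m);
}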
25
Concluding Remarks