Title: Research at the Computer Engineering Laboratory of Delft University of Technology
1 Research at the Computer Engineering Laboratory of Delft University of Technology
2 Outline
- General Information
- Group Location
- Group Formation
- Group Funding
- Group Interests
- Group Projects
- Molen
- D-Iliad
- MOVE
- Pamela
- PUB library
- Concluding Remarks
3 Group Location
7 faculties, 13,000 students, 2,100 researchers
- Delft University of Technology
- Aerospace Engineering
- Applied Sciences
- Architecture
- Civil Engineering and Geosciences
- Design, Engineering and Production
- Information Technology and Systems
- Technology, Policy and Management
- Computer Science
- Electrical Engineering
- Mathematics
- Telecommunication
- Software Engineering
- Microelectronics
- Energy
- Mediamatica
- Mathematical Analysis
- Control, Risk, Optimization, Stochastics, and Systems
4 Group Formation
5 Group Funding 94-98 (in Kfl)
Total financing: 6000 Kfl
6 Group Output (94-98)
- Degrees
- PhD theses: 9
- Eng. degrees: 5
- MSc: 87
- Publications
- Books/chapters: 7
- Journal articles: 47
- Conference papers: 165
- Patents: 50
- Five start-ups
7 Computer Engineering
- Computer Engineering: analysis of data processing requirements for electronic data processing units and systems, and the design (synthesis) of their architecture, implementation, and realization
- Architecture: determine the function to perform
- Implementation: establish a method to achieve the function
- Realization: use available means to materialize the method
8 Computer Engineering Interests
9 Group Projects
- MOLEN: embedded system architecture, multimedia, Java
- MOVE: embedded system synthesis, compilers, hardware/software co-design
- PAMELA: performance analysis and languages
- D-ILIAD: computer architecture, implementation, computer arithmetic, switches
10 MOLEN: Embedded System Design
- Topics
- Embedded Processor Architectures
- Multimedia
- Java
- Embedded System Tools
- Embedded Agents
- Current Contributions
- Java Processor
- Multimedia Instructions
- Specialized Units
- FPGA Units
- Future Directions
- Reconfigurable embedded processors
11 Molen Multimedia Instruction and Functional Unit
- Published at EUROMICRO98
- Motion estimation, sum of absolute differences
- s = 0;
- for (j = 0; j < h; j++) {
- if ((v = p1[0] - p2[0]) < 0) v = -v; s += v;
- if ((v = p1[1] - p2[1]) < 0) v = -v; s += v;
- ...
- if ((v = p1[15] - p2[15]) < 0) v = -v; s += v;
- if (s > distlim) break;
- p1 += lx;
- p2 += lx;
- }
- Formula: $\mathrm{SAD} = \sum_i |A_i - B_i|$
12 Efficient Implementation of the SAD Operation
- Straightforward approach
- Compute $A_i - B_i$ for all pairs of pixels
- Take absolute values
- Accumulate absolute values
- Cost: 4 cycles
- MOLEN solution
- Observation: $|A_i - B_i| = \max(A_i, B_i) - \min(A_i, B_i)$
- Problem: determining and negating $\min(A_i, B_i)$ takes > 1 cycle
- Solution: pass $\min(A_i, B_i)$ to the accumulate stage and correct
- Cost: 3 cycles
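The observation behind the MOLEN solution can be checked in software with a minimal C sketch. The function names `sad_abs` and `sad_maxmin` are hypothetical and chosen only for illustration. The sketch demonstrates that the absolute difference equals max minus min; it does not model the pipeline stages or the cycle counts quoted on the slide.

```c
#include <stdlib.h>

/* Straightforward SAD: subtract, take the absolute value, accumulate. */
static unsigned sad_abs(const unsigned char *a, const unsigned char *b, int n)
{
    unsigned s = 0;
    for (int i = 0; i < n; i++)
        s += (unsigned)abs((int)a[i] - (int)b[i]);
    return s;
}

/* Same result via |a - b| = max(a, b) - min(a, b); no negation is needed. */
static unsigned sad_maxmin(const unsigned char *a, const unsigned char *b, int n)
{
    unsigned s = 0;
    for (int i = 0; i < n; i++) {
        unsigned char hi = a[i] > b[i] ? a[i] : b[i];
        unsigned char lo = a[i] > b[i] ? b[i] : a[i];
        s += (unsigned)(hi - lo);
    }
    return s;
}
```

Both functions return the same sum for any pair of pixel rows. The second form never produces a negative intermediate value.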
13 MOVE
Semi-automatic generation of application-specific processors
14 MOVE
- Current Contributions
- Transport triggered architecture
- Operational design framework (add any unit you like, no restrictions)
- Several cheap designs (data logger, video enhancer, MPEG decoder, wireless communications)
- Future Directions
- Tune your application to suit your processor
- System design
- Multiprocessor TTA
- Low-power processors
15 Transport Triggered Architecture
- Published in e.g. Jnl. of Systems Architecture 99
- Transport triggered architecture
- Only one instruction: MOVE!
- FU operations are triggered by moving data to their input ports
- Example
- add r1,r2,r3
- sub r4,r2,r6
- st r4,r1
- TTA code
- r2->O1add.alu1 r3->O2add.alu1 r2->O1sub.alu2 r6->O2sub.alu2
- Radd.alu1->r1 Rsub.alu2->r4
- r1->O1st.ls r4->O2st.ls
- After bypassing
- r2->O1add.alu1 r3->O2add.alu1 r2->O1sub.alu2 r6->O2sub.alu2
- Radd.alu1->r1 Rsub.alu2->r4 Radd.alu1->O1st.ls Rsub.alu2->O2st.ls
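The idea that a function unit fires when data is moved to one of its ports can be illustrated with a small C model. Everything here is a hypothetical sketch: the type `fu_t`, the helpers `move_to_operand` and `move_to_trigger`, and the assumption that the second port (O2) is the trigger port are not taken from the MOVE framework. The store is left out because the slide does not spell out its semantics.

```c
/* Hypothetical model of one function unit (FU):
   operand port O1, trigger port O2 (assumed), result register R. */
typedef enum { FU_ADD, FU_SUB } fu_op;

typedef struct {
    fu_op op;
    int o1; /* value latched on the operand port */
    int r;  /* result register, valid after a trigger move */
} fu_t;

/* A move to the operand port only latches the value. */
static void move_to_operand(fu_t *fu, int value)
{
    fu->o1 = value;
}

/* A move to the trigger port starts the operation. */
static void move_to_trigger(fu_t *fu, int value)
{
    fu->r = (fu->op == FU_ADD) ? fu->o1 + value : fu->o1 - value;
}

/* add r1,r2,r3 and sub r4,r2,r6 expressed purely as data moves. */
static void run_example(int reg[8])
{
    fu_t alu1 = { FU_ADD, 0, 0 };
    fu_t alu2 = { FU_SUB, 0, 0 };

    move_to_operand(&alu1, reg[2]); /* r2 -> O1 of alu1 */
    move_to_trigger(&alu1, reg[3]); /* r3 -> O2 of alu1, add fires */
    move_to_operand(&alu2, reg[2]); /* r2 -> O1 of alu2 */
    move_to_trigger(&alu2, reg[6]); /* r6 -> O2 of alu2, sub fires */

    reg[1] = alu1.r;                /* R of alu1 -> r1 */
    reg[4] = alu2.r;                /* R of alu2 -> r4 */
}
```

In this model, bypassing would correspond to reading `alu1.r` and `alu2.r` directly as inputs of the next unit instead of going through `reg[1]` and `reg[4]` first.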
16 PAMELA: Performance Analysis of Computer Systems
- Current Contributions
- Specialized Languages
- Simulation Tools & Methodology
- Parallel Algorithms
- Delft Architecture Workbench
- Future Directions
- Complete the Delft Architecture Workbench
17 Static Branch Prediction
- Data dependent branches
- for (i = 0; i < n-1; i++) {
- minIndex = i;
- for (j = i+1; j < n; j++)
- if (a[j] < a[minIndex]) /* B */
- minIndex = j;
- swap(a[i], a[minIndex]);
- }
- Oblivious static branch predictor: B will be taken 50%
- Bernoulli model with truth probability p (profiling): large variance prediction error
- New model based on alternating renewal processes reduces variance prediction error by order of magnitude
- Let D (U) = number of consecutive 0s (1s)
- Then
- $E[PA] = \frac{E[U]}{E[D] + E[U]}$
- $\mathrm{Var}[PA] = \frac{E[D]^2 \, \mathrm{Var}[U] + E[U]^2 \, \mathrm{Var}[D]}{(E[D] + E[U])^2}$
- Example: 110011001100
- Then $E[PA] = 0.5$, $\mathrm{Var}[PA] = 0$
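The example can be worked through with a short C sketch that extracts the run lengths of 1s (U) and 0s (D) from a recorded branch outcome string. It assumes the slide's expressions read E[PA] = E[U] / (E[D] + E[U]) and Var[PA] = (E[D]^2 Var[U] + E[U]^2 Var[D]) / (E[D] + E[U])^2. The helper names `run_stats`, `expected_pa`, and `variance_pa` are hypothetical, and the use of the population variance of the run lengths is likewise an assumption.

```c
#include <stddef.h>
#include <string.h>

typedef struct { double mean, var; } stats_t;

/* Mean and population variance of the lengths of all runs of c in s. */
static stats_t run_stats(const char *s, char c)
{
    double sum = 0.0, sumsq = 0.0;
    size_t runs = 0, len = 0, n = strlen(s);
    stats_t st = { 0.0, 0.0 };

    for (size_t i = 0; i <= n; i++) {
        if (i < n && s[i] == c) {
            len++;                       /* still inside a run of c */
        } else if (len > 0) {
            sum += (double)len;          /* a run just ended */
            sumsq += (double)len * (double)len;
            runs++;
            len = 0;
        }
    }
    if (runs > 0) {
        st.mean = sum / (double)runs;
        st.var = sumsq / (double)runs - st.mean * st.mean;
    }
    return st;
}

/* E[PA] = E[U] / (E[D] + E[U]); returns 0 for an empty outcome string. */
static double expected_pa(const char *outcomes)
{
    stats_t u = run_stats(outcomes, '1');
    stats_t d = run_stats(outcomes, '0');
    double total = d.mean + u.mean;
    return total > 0.0 ? u.mean / total : 0.0;
}

/* Var[PA] as read from the slide (an assumption, see the text above). */
static double variance_pa(const char *outcomes)
{
    stats_t u = run_stats(outcomes, '1');
    stats_t d = run_stats(outcomes, '0');
    double total = d.mean + u.mean;
    if (total <= 0.0)
        return 0.0;
    return (d.mean * d.mean * u.var + u.mean * u.mean * d.var) / (total * total);
}
```

For the outcome string 110011001100 every run has length 2, so both run-length variances vanish and the sketch reproduces the values on the slide.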
18 D-Iliad: High Performance General Purpose Computers
- Topics
- Uni- and Multiprocessors
- Internet Processing
- Computer Design
- High Speed Switches
- Current Contributions
- Instruction level parallel machines (Superscalar, SCISM)
- New Complex Instructions
- New Designs of Arithmetic Processing
- New Switch Design
- Future Directions
- New Architectural paradigm
19 Complex Streamed Instructions
- See PACT01, EuroPar01
- Drawbacks of MMX-like extensions
- Multimedia (MM) register size architecturally visible and fixed. Ways out:
- add MM FUs and increase issue width
- expensive
- increase MM register size
- existing codes have to be recompiled/rewritten
- not beneficial due to small sub-matrices
- overhead for converting between packed data types and alignment
- Proposed solution: Complex Streamed Instructions (CSI)
- two-dimensional vector (stream) architecture, streams of arbitrary length
- stream is specified by set of stream control registers
- conversion between data types in hardware
- no loop control and address generation overhead
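To make the notion of a stream described by a set of control registers more concrete, here is a minimal software sketch in C. The `stream_desc` fields are hypothetical and are not the actual CSI register set. The loops and address arithmetic written out below are exactly the overhead that, according to the slide, CSI moves into hardware.

```c
#include <stddef.h>

/* Hypothetical stream control registers; NOT the actual CSI register set. */
typedef struct {
    const unsigned char *base; /* address of the first element */
    size_t row_len;            /* elements per row */
    size_t rows;               /* number of rows */
    ptrdiff_t elem_stride;     /* distance between elements within a row */
    ptrdiff_t row_stride;      /* distance between the starts of two rows */
} stream_desc;

/* Sums a two-dimensional stream; each 8-bit element is widened to
   unsigned int as it is read, so no packed-type conversion code is needed. */
static unsigned stream_sum(const stream_desc *s)
{
    unsigned acc = 0;
    for (size_t r = 0; r < s->rows; r++) {
        const unsigned char *p = s->base + (ptrdiff_t)r * s->row_stride;
        for (size_t c = 0; c < s->row_len; c++)
            acc += p[(ptrdiff_t)c * s->elem_stride];
    }
    return acc;
}
```

A 2x2 sub-block of a 4-pixel-wide image, for instance, would be described by `row_len = 2`, `rows = 2`, `elem_stride = 1`, and `row_stride = 4`, with no restriction on how long the stream may be.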
20 The Need for a Parallel Computation Model
- Parallel computing has not been very successful
- One reason: lack of a standard parallel computation model
- Properties that a suitable parallel computation model should possess
- Scalability
- Portability
- Predictability
- Model proposed by Valiant (1990)
- Bulk-Synchronous Parallel (BSP) model
21 BSP Model
- BSP architectural model
- set of p processors communicating by sending point-to-point messages
- BSP programming model
- computations proceed in phases (supersteps), separated by barrier synchronizations
- BSP cost model
- superstep takes time
- $w + g \cdot h + L$
- where
- w = max. work
- h = max. messages (h-relation)
- g = bandwidth reciprocal
- L = latency/synchronization cost
[Figure: processors (P) with memories (M) connected by a communication network; supersteps separated by barrier syncs]
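The cost model can be turned into a small calculation. This C sketch assumes the superstep time reads w + g*h + L, with w, h, g, and L as defined on the slide. The function names are hypothetical, and the idea of summing the supersteps to get a program cost is the usual BSP convention rather than something stated on the slide.

```c
/* Cost of one superstep: w + g*h + L
   (w: max. work, h: max. messages, g: bandwidth reciprocal, L: latency). */
static double bsp_superstep_cost(double w, double h, double g, double L)
{
    return w + g * h + L;
}

/* Cost of a whole program: sum over its supersteps.
   g and L are machine parameters and stay fixed across supersteps. */
static double bsp_program_cost(const double *w, const double *h,
                               int supersteps, double g, double L)
{
    double total = 0.0;
    for (int i = 0; i < supersteps; i++)
        total += bsp_superstep_cost(w[i], h[i], g, L);
    return total;
}
```

With g = 4 and L = 50, a superstep with w = 100 and h = 10 costs 100 + 40 + 50 = 190 time units. Even a superstep without any communication still pays L for the barrier.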
22 PUB Library
- Paderborn University BSP (PUB) library (IPDPS99) basics
- SPMD
- no receive operation: barrier synchronization signifies end of all communication operations
- only non-blocking communication primitives
- buffered and unbuffered communication
- message is placed in buffer associated with destination processor, from which it can be retrieved after the next barrier sync
- Additional features
- (non-blocking) collective communication primitives
- ability to partition the processors
- running different BSP computations on the same system (in different threads)
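The buffered-message rule (a message becomes retrievable only after the next barrier sync) can be mimicked in a few lines of C. This is a toy single-process simulation under stated assumptions: `sim_t`, `sim_send`, and `sim_sync` are hypothetical names, not part of the PUB API, and the fixed sizes are arbitrary.

```c
#include <string.h>

#define SIM_PROCS  4   /* number of simulated processors (arbitrary) */
#define SIM_MAXMSG 16  /* buffer capacity per processor (arbitrary) */

/* Hypothetical simulator state; these names are not part of PUB. */
typedef struct {
    int pending[SIM_PROCS][SIM_MAXMSG]; /* sent during the current superstep */
    int npending[SIM_PROCS];
    int inbox[SIM_PROCS][SIM_MAXMSG];   /* readable after the barrier */
    int ninbox[SIM_PROCS];
} sim_t;

static void sim_init(sim_t *s)
{
    memset(s, 0, sizeof *s);
}

/* Non-blocking send: the value is buffered at the destination.
   Returns 0 on success, -1 if dest is invalid or the buffer is full. */
static int sim_send(sim_t *s, int dest, int value)
{
    if (dest < 0 || dest >= SIM_PROCS || s->npending[dest] >= SIM_MAXMSG)
        return -1;
    s->pending[dest][s->npending[dest]++] = value;
    return 0;
}

/* Barrier: messages of the finished superstep become readable,
   and messages of the superstep before that are discarded. */
static void sim_sync(sim_t *s)
{
    for (int p = 0; p < SIM_PROCS; p++) {
        memcpy(s->inbox[p], s->pending[p], sizeof s->pending[p]);
        s->ninbox[p] = s->npending[p];
        s->npending[p] = 0;
    }
}
```

Since there is no receive call, a sender never waits for its partner. The only synchronization point is the barrier, which matches the SPMD style listed above.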
23 PUB Example: Parallel Binary Multisearch
[Figure: Proc 0, Proc 1, Proc 2, Proc 3; local search tree]
24 Parallel Binary Multisearch Using PUB
void bin_search(int d, int m)
{
  for (i = new_m = 0; i < m; i++)
    if ((query[i] < gkey[d] && inRight(d, me)) ||
        (query[i] > gkey[d] && inLeft(d, me)))
      bsp_send(bsp, Opposite(d, me), &query[i], sizeof(int));
    else
      query[new_m++] = query[i];
  bsp_sync(bsp);
  for (i = 0; i < bsp_nmsgs(bsp); i++) {
    msg = bsp_getmsg(bsp, i);
    query[new_m++] = *(int *)(bspmsg_data(msg));
  }
  if (d == 0)
    local_search(new_m, query, n, key);
  else
    bin_search(d-1, new_m);
}
25 Concluding Remarks
- Not discussed
- testing
- ISA extensions for sparse matrix computations
- computer arithmetic using single-electron technology
- reconfigurable processors
- network processors
- low power
- ...
- For further information, please contact me (benj_at_ce.et.tudelft.nl) or see
- ce.et.tudelft.nl
- ce.et.tudelft.nl/benj
- www.upb.de/pub
Thank You