Research at the Computer Engineering Laboratory of Delft University of Technology - PowerPoint PPT Presentation

Slides: 26
Provided by: benj4
Transcript and Presenter's Notes


1
Research at the Computer Engineering Laboratory
of Delft University of Technology
  • Ben Juurlink

2
Outline
  • General Information
  • Group Location
  • Group Formation
  • Group Funding
  • Group Interests
  • Group Projects
  • Molen
  • D-Iliad
  • MOVE
  • Pamela
  • PUB library
  • Concluding Remarks

3
Group Location
7 faculties, 13,000 students, 2,100 researchers
  • Delft University of Technology
  • Aerospace Engineering
  • Applied Sciences
  • Architecture
  • Civil Engineering and Geosciences
  • Design, Engineering and Production
  • Information Technology and Systems
  • Technology, Policy and Management
  • Computer Science
  • Electrical Engineering
  • Mathematics
  • Telecommunication
  • Software Engineering
  • Microelectronics
  • Energy
  • Mediamatica
  • Mathematical Analysis
  • Control, Risk, Optimization, Stochastics, and
    Systems

4
Group Formation
5
Group Funding 94-98 (in Kfl)
Total financing 6000 Kfl
6
Group Output (94-98)
  • Degrees
  • PhD theses: 9
  • Eng. degrees: 5
  • MSc: 87
  • Publications
  • Books/chapters: 7
  • Journal articles: 47
  • Conference papers: 165
  • Patents: 50
  • Five start-ups

7
Computer Engineering
  • Computer Engineering: analysis of data processing
    requirements for electronic data processing units
    and systems, and the design (synthesis) of their
    architecture, implementation, and realization
  • Architecture: determine the function to perform
  • Implementation: establish a method to achieve the
    function
  • Realization: use available means to materialize
    the method

8
Computer Engineering Interests
9
Group Projects
  • MOLEN: embedded system architecture, multimedia, Java
  • MOVE: embedded system synthesis, compilers,
    hardware/software co-design
  • PAMELA: performance analysis and languages
  • D-ILIAD: computer architecture, implementation,
    computer arithmetic, switches
10
MOLEN: Embedded System Design
  • Topics
  • Embedded Processor Architectures
  • Multimedia
  • Java
  • Embedded System Tools
  • Embedded Agents
  • Current Contributions
  • Java Processor
  • Multimedia Instructions
  • Specialized Units
  • FPGA Units
  • Future Directions
  • Reconfigurable embedded processors

11
Molen Multimedia Instruction and Functional Unit
  • Published at EUROMICRO98
  • Motion estimation, sum of absolute differences
  • s = 0;
  • for (j = 0; j < h; j++) {
  •   if ((v = p1[0] - p2[0]) < 0) v = -v; s += v;
  •   if ((v = p1[1] - p2[1]) < 0) v = -v; s += v;
  •   ...
  •   if ((v = p1[15] - p2[15]) < 0) v = -v; s += v;
  •   if (s >= distlim) break;
  •   p1 += lx;
  •   p2 += lx;
  • }
  • Formula

12
Efficient Implementation of the SAD Operation
  • Straightforward approach
  • Compute $A_i - B_i$ for all pairs of pixels
  • Take absolute values
  • Accumulate absolute values
  • Cost: 4 cycles
  • MOLEN solution
  • Observation: $|A_i - B_i| = \max(A_i, B_i) - \min(A_i, B_i)$
  • Problem: determining and negating $\min(A_i, B_i)$ takes
    more than 1 cycle
  • Solution: pass $\min(A_i, B_i)$ to the accumulate stage and
    correct
  • Cost: 3 cycles
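
The identity behind the MOLEN observation can be checked in plain C. This is a minimal software sketch, not a model of the MOLEN functional unit or its cycle counts; the function names `sad16_abs` and `sad16_maxmin` are illustrative and do not come from the slides.

```c
#include <stdlib.h>

/* Straightforward approach: subtract, take the absolute value, accumulate. */
static int sad16_abs(const unsigned char *a, const unsigned char *b)
{
    int s = 0;
    for (int i = 0; i < 16; i++)
        s += abs(a[i] - b[i]);
    return s;
}

/* Same sum computed via |a - b| = max(a, b) - min(a, b). */
static int sad16_maxmin(const unsigned char *a, const unsigned char *b)
{
    int s = 0;
    for (int i = 0; i < 16; i++) {
        int max = a[i] > b[i] ? a[i] : b[i];
        int min = a[i] < b[i] ? a[i] : b[i];
        s += max - min;
    }
    return s;
}
```

Both functions take one row of 16 pixels from each block and return the same sum for any input.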

13
MOVE
Semi-automatic generation of application specific
processors
14
MOVE
  • Current Contributions
  • Transport triggered architecture
  • Operational design framework (add any unit you
    like, no restrictions)
  • Several cheap designs (data logger,
    video-enhancer, MPEG-decoder, wireless
    communications)
  • Future Directions
  • Tune your application to suit your processor
  • System design
  • Multiprocessor TTA
  • Low-power processors

15
Transport Triggered Architecture
  • Published in, e.g., Journal of Systems Architecture,
    1999
  • Transport triggered architecture
  • Only one instruction: MOVE!
  • FU operations are triggered by moving data to
    their input ports
  • Example
  • add r1,r2,r3
  • sub r4,r2,r6
  • st r4,r1
  • TTA code
  • r2 -> O1add.alu1; r3 -> O2add.alu1; r2 -> O1sub.alu2;
    r6 -> O2sub.alu2
  • Radd.alu1 -> r1; Rsub.alu2 -> r4
  • r1 -> O1st.ls; r4 -> O2st.ls
  • After bypassing
  • r2 -> O1add.alu1; r3 -> O2add.alu1; r2 -> O1sub.alu2;
    r6 -> O2sub.alu2
  • Radd.alu1 -> r1; Rsub.alu2 -> r4;
    Radd.alu1 -> O1st.ls; Rsub.alu2 -> O2st.ls
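
The transport-triggered idea can be illustrated with a small software model of the example. This is only a sketch under assumptions that the slide does not state: O1 is treated as a plain operand port and O2 as the trigger port, the store unit takes its address on O1 and its data on O2, and all type and function names (`struct alu`, `struct lsu`, `run_example`) are hypothetical.

```c
#include <stdint.h>

enum op { OP_ADD, OP_SUB };

/* Hypothetical ALU: operand port O1, trigger port O2, result register R. */
struct alu { int32_t o1, r; };

/* Hypothetical load/store unit: O1 latches an address, O2 triggers the store. */
struct lsu { int32_t o1; int32_t mem[16]; };

/* A move to the operand port only latches the value. */
static void alu_o1(struct alu *u, int32_t v) { u->o1 = v; }

/* A move to the trigger port starts the operation. */
static void alu_o2(struct alu *u, enum op op, int32_t v)
{
    u->r = (op == OP_ADD) ? u->o1 + v : u->o1 - v;
}

static void lsu_o1(struct lsu *u, int32_t addr) { u->o1 = addr; }

static void lsu_o2(struct lsu *u, int32_t data)
{
    u->mem[(uint32_t)u->o1 & 15u] = data;   /* 16-word toy memory */
}

/* add r1,r2,r3; sub r4,r2,r6; st r4,r1 expressed as data transports.
   With bypass != 0 the store operands come straight from the ALU result
   registers instead of being read back from r1 and r4.
   Returns the word that was stored. */
static int32_t run_example(int32_t r[8], int bypass)
{
    struct alu alu1 = {0, 0}, alu2 = {0, 0};
    struct lsu ls = {0, {0}};

    alu_o1(&alu1, r[2]); alu_o2(&alu1, OP_ADD, r[3]);
    alu_o1(&alu2, r[2]); alu_o2(&alu2, OP_SUB, r[6]);
    r[1] = alu1.r;
    r[4] = alu2.r;
    if (bypass) {
        lsu_o1(&ls, alu1.r); lsu_o2(&ls, alu2.r);
    } else {
        lsu_o1(&ls, r[1]);   lsu_o2(&ls, r[4]);
    }
    return ls.mem[(uint32_t)r[1] & 15u];
}
```

In this model both variants store the same word; what bypassing changes is that the store no longer has to wait for r1 and r4 to be written and read back.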

16
PAMELA: Performance Analysis of Computer Systems
  • Current Contributions
  • Specialized Languages
  • Simulation Tools and Methodology
  • Parallel Algorithms
  • Delft Architecture Workbench
  • Future Directions
  • Complete the Delft Architecture Workbench

17
Static Branch Prediction
  • Data dependent branches
  • for (i = 0; i < n-1; i++) {
  •   minIndex = i;
  •   for (j = i+1; j < n; j++)
  •     if (a[j] < a[minIndex]) /* B */
  •       minIndex = j;
  •   swap(a[i], a[minIndex]);
  • }
  • Oblivious static branch predictor: B will be
    taken 50% of the time
  • Bernoulli model with truth probability p
    (profiling): large variance of the prediction error
  • New model based on alternating renewal processes
    reduces the variance of the prediction error by an
    order of magnitude
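  • Let D (U) be the number of consecutive 0s (1s)
  • Then
  • $E[PA] = \frac{E[U]}{E[D] + E[U]}$
  • $\mathrm{Var}[PA] = \frac{E[D]^2\,\mathrm{Var}[U] + E[U]^2\,\mathrm{Var}[D]}{(E[D] + E[U])^2}$
  • Example: 110011001100
  • Then $E[PA] = 0.5$, $\mathrm{Var}[PA] = 0$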
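
The two formulas can be evaluated directly from a recorded outcome string of branch B. The sketch below is illustrative only: it estimates E[D], E[U], Var[D] and Var[U] from the run lengths in the string, assumes the string contains at least one 0 and one 1, and uses the variance expression as it could be read from the slide (the exponent of the denominator is an assumption). The names `run_lengths`, `expected_pa` and `variance_pa` are hypothetical.

```c
#include <stddef.h>

struct run_stats { double mean, var; };

/* Mean and population variance of the lengths of maximal runs of `bit`
   ('0' or '1') in the outcome string s. */
static struct run_stats run_lengths(const char *s, char bit)
{
    double sum = 0.0, sumsq = 0.0;
    size_t runs = 0, len = 0;

    for (size_t i = 0; ; i++) {
        if (s[i] == bit) {
            len++;
        } else {
            if (len > 0) {              /* a run just ended */
                sum += (double)len;
                sumsq += (double)len * (double)len;
                runs++;
                len = 0;
            }
            if (s[i] == '\0')
                break;
        }
    }

    struct run_stats st = {0.0, 0.0};
    if (runs > 0) {
        st.mean = sum / (double)runs;
        st.var = sumsq / (double)runs - st.mean * st.mean;
    }
    return st;
}

/* E[PA] = E[U] / (E[D] + E[U]) */
static double expected_pa(const char *s)
{
    struct run_stats d = run_lengths(s, '0'), u = run_lengths(s, '1');
    return u.mean / (d.mean + u.mean);
}

/* Var[PA] = (E[D]^2 Var[U] + E[U]^2 Var[D]) / (E[D] + E[U])^2 */
static double variance_pa(const char *s)
{
    struct run_stats d = run_lengths(s, '0'), u = run_lengths(s, '1');
    double t = d.mean + u.mean;
    return (d.mean * d.mean * u.var + u.mean * u.mean * d.var) / (t * t);
}
```

For the slide's example string 110011001100 every run has length 2, so both variances are zero and the functions reproduce the stated values.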
18
D-Iliad: High Performance General Purpose Computers
  • Topics
  • Uni- and Multiprocessors
  • Internet Processing
  • Computer Design
  • High Speed Switches
  • Current Contributions
  • Instruction level parallel machines (Superscalar,
    SCISM)
  • New Complex Instructions
  • New Designs of Arithmetic Processing
  • New Switch Design
  • Future Directions
  • New Architectural paradigm

19
Complex Streamed Instructions
  • See PACT '01, Euro-Par '01
  • Drawbacks of MMX-like extensions
  • Multimedia (MM) register size is architecturally
    visible and fixed. Ways out:
  • add MM FUs and increase issue width
  • expensive
  • increase MM register size
  • existing codes have to be recompiled/rewritten
  • not beneficial due to small sub-matrices
  • overhead for converting between packed data types
    and alignment
  • Proposed solution: Complex Streamed Instructions
    (CSI)
  • two-dimensional vector (stream) architecture,
    streams of arbitrary length
  • a stream is specified by a set of stream control
    registers
  • conversion between data types in hardware
  • no loop control and address generation overhead

20
The Need for a Parallel Computation Model
  • Parallel computing has not been very successful
  • One reason: lack of a standard parallel
    computation model
  • Properties that a suitable parallel computation
    model should possess:
  • Scalability
  • Portability
  • Predictability
  • Model proposed by Valiant (1990)
  • Bulk-Synchronous Parallel (BSP) model

21
BSP Model
  • BSP architectural model
  • set of p processors communicating by sending
    point-to-point messages
  • BSP programming model
  • computations proceed in phases (supersteps),
    separated by barrier synchronizations
  • BSP cost model
  • superstep takes time
  • $w + g \cdot h + L$
  • where
  • $w$: max. work
  • $h$: max. messages ($h$-relation)
  • $g$: bandwidth reciprocal
  • $L$: latency/synchronization cost
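
The cost model is simple enough to write down as code. The sketch below assumes all quantities are expressed in the same time unit and that the cost of a program is the sum of its superstep costs; the names `bsp_superstep_cost`, `bsp_program_cost` and `struct superstep` are illustrative.

```c
#include <stddef.h>

/* Cost of one superstep: w + g*h + L. */
static double bsp_superstep_cost(double w, double h, double g, double L)
{
    return w + g * h + L;
}

/* Per-superstep parameters: maximum local work and maximum number of
   messages sent or received by any processor. */
struct superstep { double w, h; };

/* Cost of a program as the sum of the costs of its n supersteps. */
static double bsp_program_cost(const struct superstep *s, size_t n,
                               double g, double L)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += bsp_superstep_cost(s[i].w, s[i].h, g, L);
    return total;
}
```

With w = 1000, h = 50, g = 4 and L = 200, for instance, a single superstep costs 1000 + 4 * 50 + 200 = 1400 time units.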

[Figure: processors (P) with local memories (M) connected by a communication network; supersteps separated by barrier syncs]
22
PUB Library
  • Paderborn University BSP (PUB) library (IPDPS '99):
    basics
  • SPMD
  • no receive operation: barrier synchronization
    signifies the end of all communication operations
  • only non-blocking communication primitives
  • buffered and unbuffered communication
  • message is placed in buffer associated with
    destination processor from which it can be
    retrieved after the next barrier sync
  • Additional features
  • (non-blocking) collective communication
    primitives
  • ability to partition the processors
  • running different BSP computations on the same
    system (in different threads)
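
The buffered-send semantics described above (a message becomes retrievable only after the next barrier) can be mimicked with a toy single-process model. This is not the PUB library and does not use its API: `struct sim`, `sim_send` and `sim_sync` are hypothetical names, and discarding unread messages at the following barrier is an assumption of the model.

```c
#include <stddef.h>

#define SIM_PROCS  4
#define SIM_MAXMSG 16

/* Hypothetical model of buffered BSP messaging for SIM_PROCS processors. */
struct sim {
    int pending[SIM_PROCS][SIM_MAXMSG];  /* sent in the current superstep */
    size_t npending[SIM_PROCS];
    int inbox[SIM_PROCS][SIM_MAXMSG];    /* readable in the current superstep */
    size_t ninbox[SIM_PROCS];
};

/* Non-blocking send: the value is only queued for the destination.
   Returns 0 on success, -1 if the destination buffer is full. */
static int sim_send(struct sim *s, int dest, int value)
{
    if (s->npending[dest] == SIM_MAXMSG)
        return -1;
    s->pending[dest][s->npending[dest]++] = value;
    return 0;
}

/* Barrier: everything sent during the superstep is delivered and
   replaces the previous inbox contents. */
static void sim_sync(struct sim *s)
{
    for (int p = 0; p < SIM_PROCS; p++) {
        for (size_t i = 0; i < s->npending[p]; i++)
            s->inbox[p][i] = s->pending[p][i];
        s->ninbox[p] = s->npending[p];
        s->npending[p] = 0;
    }
}
```

A value sent to processor 2 is invisible in its inbox until `sim_sync` has been called, which mirrors the absence of an explicit receive operation.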

23
PUB Example: Parallel Binary Multisearch
  • Search butterfly

[Figure: search butterfly over Proc 0 to Proc 3, each holding a local search tree]
24
Parallel Binary Multisearch Using PUB
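void bin_search(int d, int m)
{
  /* send queries that belong to the other half, keep the rest */
  for (i = new_m = 0; i < m; i++)
    if (query[i] < gkey[d] && inRight(d, me) ||
        query[i] > gkey[d] && inLeft(d, me))
      bsp_send(bsp, Opposite(d, me), &query[i], sizeof(int));
    else
      query[new_m++] = query[i];
  bsp_sync(bsp);
  /* collect the queries received from the opposite processor */
  for (i = 0; i < bsp_nmsgs(bsp); i++) {
    msg = bsp_getmsg(bsp, i);
    query[new_m++] = *(int *)(bspmsg_data(msg));
  }
  if (d == 0)
    local_search(new_m, query, n, key);
  else
    bin_search(d-1, new_m);
}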
25
Concluding Remarks
  • Not discussed
  • testing
  • ISA extensions for sparse matrix computations
  • computer arithmetic using single-electron
    technology
  • reconfigurable processors
  • network processors
  • low power
  • ...
  • For further information, please contact me
    (benj_at_ce.et.tudelft.nl) or see
  • ce.et.tudelft.nl
  • ce.et.tudelft.nl/benj
  • www.upb.de/pub

Thank You