Ben Abdallah Abderazek - PowerPoint PPT Presentation

Networks-on-Chip. Ben Abdallah Abderazek, The University of Aizu, Graduate School of Computer Science and Eng., Adaptive Systems Laboratory. E-mail: benab_at_u-aizu.ac.jp



1
Networks-on-Chip
  • Ben Abdallah Abderazek
  • The University of Aizu,
  • Graduate School of Computer Science and Eng.
  • Adaptive Systems Laboratory,
  • E-mail: benab_at_u-aizu.ac.jp

03/01/2010
2
Part I
  • Application requirements
  • Network on Chip: a paradigm shift in VLSI
  • Critical problems addressed by NoC
  • Traffic abstractions
  • Data abstraction
  • Network delay modeling
3
Application Requirements
  • Signal processing
    • Hard real time
    • Very regular load
    • High quality
    • Typically on DSPs
  • Media processing
    • Hard real time
    • Irregular load
    • High quality
    • SoC/media processors
  • Multimedia
    • Soft real time
    • Irregular load
    • Limited quality
    • PC/desktop

Very challenging!
4
What Does the Internet Need?
Demands: a rapidly growing volume of packets, plus routing, packet
classification, encryption, QoS, new applications and protocols, etc.
  • High processing power
  • Support for wire speed
  • Programmable
  • Scalable
  • Specialized for network applications

Candidate platforms
  • ASIC (large, expensive to develop, not flexible)
  • General-purpose RISC (not capable enough)
  • SoC, MCSoC?
5
Example - Network Processor (NP)
  • 16 pico-processors and 1 PowerPC
  • Each pico-processor
    • Supports 2 hardware threads
    • 3-stage pipeline: fetch/decode/execute
  • Dyadic Processing Unit
    • Two pico-processors
    • 2 KB shared memory
    • Tree search engine
  • Focus is on layers 2-4
  • PowerPC 405 for control plane operations
    • 16K I and D caches
  • Target is OC-48

IBM PowerNP
6
Example - Network Processor (NP)
  • An NP can be applied in various network layers and
    applications
    • Traditional apps: forwarding, classification
    • Advanced apps: transcoding, URL-based switching,
      security, etc.
    • New apps

7
Telecommunication Systems and NoC Paradigm
  • The current trend is to integrate telecommunication
    systems on complex multicore SoCs (MCSoC)
    • Network processors,
    • Multimedia hubs, and
    • Base-band telecom circuits
  • These applications have tight time-to-market and
    performance constraints

8
Telecommunication Systems and NoC Paradigm
  • A telecommunication multicore SoC is composed of 4
    kinds of components
    • Software tasks,
    • Processors executing software,
    • Specific hardware cores, and
    • Global on-chip communication network

This is the most challenging part.
10
Technology and Architecture Trends
  • Technology trends
    • Vast transistor budgets
    • Relatively poor interconnect scaling
    • Need to manage complexity and power
    • Build flexible designs (multi-/general-purpose)
  • Architectural trends
    • Go parallel!
    • Keep core complexity constant or simplify
    • Result is lots of modules (cores, memories,
      off-chip interfaces, specialized IP cores, etc.)

11
Wire Delay vs. Logic Delay
Operation                              Delay (0.13 µm)   Delay (0.05 µm)
32-bit ALU operation                   650 ps            250 ps
32-bit register read                   325 ps            125 ps
Read 32 bits from 8 KB RAM             780 ps            300 ps
Transfer 32 bits across chip (10 mm)   1400 ps           2300 ps
Transfer 32 bits across chip (20 mm)   2800 ps           4600 ps
Global on-chip communication to operation delay: 2:1, rising to 9:1 in 2010
Ref: W.J. Dally, HPCA panel presentation, 2002
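
The 2:1 and 9:1 figures can be reproduced from the table by dividing the 10 mm cross-chip transfer delay by the 32-bit ALU delay. The sketch below does just that; the dictionary keys and the function name are illustrative choices, not part of the original slides.

```python
# Delay figures in picoseconds, as quoted in the table above.
DELAYS_PS = {
    "0.13 um": {"alu_op": 650, "cross_chip_10mm": 1400},
    "0.05 um": {"alu_op": 250, "cross_chip_10mm": 2300},
}


def comm_to_op_ratio(node: str) -> float:
    """Ratio of 10 mm cross-chip transfer delay to 32-bit ALU delay."""
    d = DELAYS_PS[node]
    return d["cross_chip_10mm"] / d["alu_op"]


for node in DELAYS_PS:
    print(f"{node}: about {comm_to_op_ratio(node):.1f}:1")
```

Rounded to whole numbers, the two ratios are roughly 2:1 and 9:1, which matches the summary line on the slide.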
12
Communication Reliability
  • Information transfer is inherently unreliable at
    the electrical level, due to
    • Timing errors
    • Cross-talk
    • Electro-magnetic interference (EMI)
    • Soft errors
  • The problem will get increasingly worse as
    technology scales down

13
Evolution of on-chip communication
14
Traditional SoC nightmare
  • Variety of dedicated interfaces
  • Design and verification complexity
  • Unpredictable performance
  • Many underutilized wires

[Figure labels: DMA, CPU and DSP on a CPU bus with control signals; a bridge to a peripheral bus with three IO blocks; modules A, B and C]
15
Network on Chip: A Paradigm Shift in VLSI
From: dedicated signal wires
To: shared network
[Figure labels: point-to-point link, network switch, computing module]
16
NoC essentials
  • Communication by packets of bits
  • Routing of packets through several hops, via
    switches
  • Efficient sharing of wires
  • Parallelism

17
Characteristics of a paradigm shift
  • Solves a critical problem
  • Step-up in abstraction
  • Design is affected
  • Design becomes more restricted
  • New tools
  • The changes enable higher complexity and capacity
  • Jump in design productivity

We will look at the problem addressed by NoC.
19
Origins of the NoC concept
  • The idea was talked about in the 90s, but actual
    research came in the new millennium.
  • Some well-known early publications
    • Guerrier and Greiner (2000), "A generic
      architecture for on-chip packet-switched
      interconnections"
    • Hemani et al. (2000), "Network on chip: An
      architecture for billion transistor era"
    • Dally and Towles (2001), "Route packets, not
      wires: On-chip interconnection networks"
    • Wingard (2001), "MicroNetwork-based integration
      of SoCs"
    • Rijpkema, Goossens and Wielage (2001), "A router
      architecture for networks on silicon"
    • Kumar et al. (2002), "A Network on chip
      architecture and design methodology"
    • De Micheli and Benini (2002), "Networks on chip:
      A new paradigm for systems on chip design"

20
Don't we already know how to design
interconnection networks?
  • Many network topologies, router designs and much
    theory have already been developed for high-end
    supercomputers and telecom switches
  • Yes, and we'll cover some of this material, but
    the trade-offs on-chip lead to very different
    designs!!

21
Critical problems addressed by NoC
1) Global interconnect design problem: delay,
power, noise, scalability, reliability
2) System integration productivity problem
3) Chip Multi Processors (key to power-efficient
computing)
22
1(a) NoC and global wire delay
Long wire delay is dominated by resistance
Add repeaters
Repeaters become latches (with clock frequency
scaling)
Latches evolve to NoC routers
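
The reason repeaters help can be sketched numerically. The example below assumes the textbook distributed-RC approximation (delay of roughly 0.38·r·c·L² for an unrepeated wire) and hypothetical per-millimetre resistance, capacitance and repeater delay values; none of these numbers come from the slides.

```python
def unrepeated_delay_ps(length_mm, r_ohm_per_mm, c_ff_per_mm):
    """Approximate delay of a distributed RC wire (grows with length squared).

    ohm * fF = 1e-15 s, i.e. 1e-3 ps, hence the final scaling factor.
    """
    return 0.38 * r_ohm_per_mm * c_ff_per_mm * length_mm ** 2 * 1e-3


def repeated_delay_ps(length_mm, r_ohm_per_mm, c_ff_per_mm,
                      segments, repeater_ps):
    """Delay when the wire is cut into equal segments, each driven by a repeater."""
    seg_len = length_mm / segments
    seg_delay = unrepeated_delay_ps(seg_len, r_ohm_per_mm, c_ff_per_mm)
    return segments * (seg_delay + repeater_ps)


# Hypothetical technology parameters, for illustration only.
R, C, T_REP = 400.0, 200.0, 20.0

for length in (10, 20):
    plain = unrepeated_delay_ps(length, R, C)
    # One repeater per millimetre of wire.
    repeated = repeated_delay_ps(length, R, C, segments=length, repeater_ps=T_REP)
    print(f"{length} mm: unrepeated {plain:.0f} ps, repeated {repeated:.0f} ps")
```

Under these assumptions, doubling the wire length quadruples the unrepeated delay but only doubles the repeated one. Once each segment takes about a clock cycle, the repeaters have to be clocked, which is the step toward latches and then routers described above.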
23
1(b) Wire design for NoC
  • NoC links
  • Regular
  • Point-to-point (no fanout tree)
  • Can use transmission-line layout
  • Well-defined current return path
  • Can be optimized for noise / speed / power
  • Low swing, current mode, ...

24
1(c) NoC scalability
  • For the same performance, compare the wire area and
    power

Architecture      Wire area      Power
Simple bus        O(n^3 √n)      O(n √n)
NoC               O(n)           O(n)
Segmented bus     O(n^2 √n)      O(n √n)
Point-to-point    O(n^2 √n)      O(n √n)
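
To get a feel for these orders of growth, the sketch below evaluates each expression for two system sizes and reports how much the cost grows. It is only an illustration: asymptotic orders hide constant factors, so the output says nothing about absolute area or power, and the dictionary layout is an assumption made for this example.

```python
import math

# (wire area, power) growth orders from the table, as functions of n modules.
COST_ORDERS = {
    "Simple bus":     (lambda n: n ** 3 * math.sqrt(n), lambda n: n * math.sqrt(n)),
    "NoC":            (lambda n: n,                     lambda n: n),
    "Segmented bus":  (lambda n: n ** 2 * math.sqrt(n), lambda n: n * math.sqrt(n)),
    "Point-to-point": (lambda n: n ** 2 * math.sqrt(n), lambda n: n * math.sqrt(n)),
}


def growth(arch, n_small, n_large):
    """Factor by which area and power orders grow from n_small to n_large."""
    area, power = COST_ORDERS[arch]
    return area(n_large) / area(n_small), power(n_large) / power(n_small)


for arch in COST_ORDERS:
    area_x, power_x = growth(arch, 16, 64)
    print(f"{arch:15s} area x{area_x:.0f}, power x{power_x:.0f}")
```

Going from 16 to 64 modules, the NoC terms grow by 4x, while the simple-bus wire-area term grows by 128x.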
25
1(d) NoC and communication reliability
  • Fault tolerance and error correction

[Figure: two routers connected through micro-modem (UMODEM) link interfaces and the interconnect. Labels: input buffer, error correction, synchronization, ISI reduction, parallel-to-serial converter (widths n and m), modulation]
A. Morgenshtein, E. Bolotin, I. Cidon, A.
Kolodny, R. Ginosar, "Micro-modem reliability
solution for NOC communications", ICECS 2004
26
1(e) NoC and GALS (globally asynchronous, locally synchronous)
  • Modules in a NoC system may use different clocks
  • They may also use different voltages
  • NoC can take care of synchronization
  • NoC design may be asynchronous
  • No waste of power when the links and routers are
    idle

27
2 NoC and engineering productivity
  • NoC eliminates ad-hoc global wire engineering
  • NoC separates computation from communication
  • NoC supports modularity and reuse of cores
  • NoC is a platform for system integration,
    debugging and testing

28
3 NoC and CMP
  • Uniprocessors cannot provide power-efficient
    performance growth
    • Interconnect dominates dynamic power
    • Global wire delay doesn't scale
    • Instruction-level parallelism is limited
  • Power efficiency requires many parallel local
    computations
    • Chip Multi Processors (CMP)
    • Thread-Level Parallelism (TLP)

[Figures: uniprocessor dynamic power split into gate, interconnect and diff. components (Magen et al., SLIP 200); uniprocessor performance versus die area (or power)]
29
3 NoC and CMP
  • Uniprocessors cannot provide Power-efficient
    performance growth
  • Interconnect dominates dynamic power
  • Global wire delay doesn't scale
  • Instruction-level parallelism is limited
  • Power-efficiency requires many parallel local
    computations
  • Chip Multi Processors (CMP)
  • Thread-Level Parallelism (TLP)
  • Network is a natural choice for CMP!

31
Why Is Now the Time for NoC?
Difficulty of deep sub-micron (DSM) wire design
Productivity pressure
CMPs
32
Traffic abstractions
  • Traffic models are generally captured from actual
    traces of functional simulation
  • A statistical distribution is often assumed for
    messages
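
When no trace is available, a statistical traffic model can be sketched as follows. This example assumes exponentially distributed gaps between messages and uniformly random source/destination pairs; both distributions, and every name in the code, are assumptions chosen for illustration rather than the model used in the presentation.

```python
import random


def synthetic_trace(num_nodes, num_messages, mean_gap_cycles, seed=0):
    """Generate (time, source, destination) tuples for a synthetic workload."""
    rng = random.Random(seed)  # fixed seed keeps the trace reproducible
    now = 0.0
    trace = []
    for _ in range(num_messages):
        # Exponential inter-arrival times give a Poisson message stream.
        now += rng.expovariate(1.0 / mean_gap_cycles)
        src = rng.randrange(num_nodes)
        # Uniform destination, excluding the source itself.
        dst = rng.choice([n for n in range(num_nodes) if n != src])
        trace.append((now, src, dst))
    return trace


for time, src, dst in synthetic_trace(num_nodes=16, num_messages=5,
                                      mean_gap_cycles=10.0):
    print(f"t={time:7.2f}  node {src:2d} -> node {dst:2d}")
```

A trace captured from functional simulation could be stored in the same tuple format, so the same network simulator could consume either source.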

33
Data abstractions
34
Layers of abstraction in network modeling
  • Software layers
    • Application, OS
  • Network and transport layers
    • Network topology: e.g. crossbar, ring, mesh,
      torus, fat tree, ...
    • Switching: circuit / packet switching (SAF, VCT),
      wormhole
    • Addressing: logical/physical, source/destination,
      flow, transaction
    • Routing: static/dynamic, distributed/source,
      deadlock avoidance
    • Quality of service: e.g. guaranteed-throughput,
      best-effort
    • Congestion control, end-to-end flow control
  • Data link layer
    • Flow control (handshake)
    • Handling of contention
    • Correction of transmission errors
  • Physical layer
    • Wires, drivers, receivers, repeaters, signaling,
      circuits, ...

35
How to select an architecture?
  • The architecture choice depends on system needs.

[Figure: architecture options (ASIC, FPGA, ASSP, CMP/multicore) arranged by flexibility (single application to general-purpose or embedded systems) and by reconfiguration rate (at design time, at boot time, during run time)]
A large range of solutions!
37
Example: OASIS
  • ASIC assumed
  • Traffic requirements are known a priori
  • Features
    • Packet switching: wormhole
    • Quality of service
    • Mesh topology

K. Mori, A. Ben Abdallah, and K. Kuroda, "Design
and Evaluation of a Complexity Effective
Network-on-Chip Architecture on FPGA", The 19th
Intelligent System Symposium (FAN 2009),
pp.318-321, Sep. 2009.
S. Miura, A. Ben Abdallah, and K. Kuroda, "PNoC - Design
and Preliminary Evaluation of a Parameterizable NoC for
MCSoC Generation and Design Space Exploration",
The 19th Intelligent System Symposium (FAN 2009),
pp.314-317, Sep. 2009.
38
Perspective 1: NoC vs. Bus
NoC
  • Aggregate bandwidth grows
  • Link speed unaffected by N
  • Concurrent spatial reuse
  • Pipelining is built-in
  • Distributed arbitration
  • Separate abstraction layers
  • However
    • No performance guarantee
    • Extra delay in routers
    • Area and power overhead?
    • Modules need a network interface (NI)
    • Unfamiliar methodology
Bus
  • Bandwidth is limited, shared
  • Speed goes down as N grows
  • No concurrency
  • Pipelining is tough
  • Central arbitration
  • No layers of abstraction
    (communication and computation are coupled)
  • However
    • Fairly simple and familiar

39
Perspective 2: NoC vs. Off-chip Networks
Off-chip networks
  • Cost is in the links
  • Latency is tolerable
  • Traffic/applications unknown
  • Changes at runtime
  • Adherence to networking standards
NoC
  • Sensitive to cost
    • Area
    • Power
  • Wires are relatively cheap
  • Latency is critical
  • Traffic may be known a priori
  • Design-time specialization
  • Custom NoCs are possible

40
VLSI CAD problems
  • Application mapping
  • Floorplanning / placement
  • Routing
  • Buffer sizing
  • Timing closure
  • Simulation
  • Testing

41
VLSI CAD problems in NoC
  • Application mapping (map tasks to cores)
  • Floorplanning / placement (within the network)
  • Routing (of messages)
  • Buffer sizing (size of FIFO queues in the
    routers)
  • Timing closure (Link bandwidth capacity
    allocation)
  • Simulation (Network simulation,
    traffic/delay/power modeling)
  • Other NoC design problems (topology synthesis,
    switching, virtual channels, arbitration, flow
    control, ...)

42
Typical NoC design flow
Place Modules
Determine routing and adjust link capacities
43
Timing closure in NoC
  • Too low a link capacity results in poor QoS
  • Too high a link capacity wastes area
  • Uniform link capacities are a waste in an ASIP
    system
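
One way to picture non-uniform capacity allocation is to route every known flow, sum the bandwidth crossing each link, and size the link from that load. The sketch below assumes dimension-ordered (XY) routing on a 2D mesh and a hypothetical target utilization of 50%; the slides do not specify either, and the flow values are made up.

```python
from collections import defaultdict


def xy_route(src, dst):
    """Directed links visited by XY routing: move along x first, then y."""
    x, y = src
    dst_x, dst_y = dst
    links = []
    while x != dst_x:
        next_x = x + (1 if dst_x > x else -1)
        links.append(((x, y), (next_x, y)))
        x = next_x
    while y != dst_y:
        next_y = y + (1 if dst_y > y else -1)
        links.append(((x, y), (x, next_y)))
        y = next_y
    return links


def size_links(flows, target_utilization=0.5):
    """Return a capacity per link so that each runs at the target utilization."""
    load = defaultdict(float)
    for src, dst, bandwidth in flows:
        for link in xy_route(src, dst):
            load[link] += bandwidth
    return {link: bw / target_utilization for link, bw in load.items()}


# Hypothetical flows: (source, destination, bandwidth in MB/s).
flows = [((0, 0), (2, 0), 100.0), ((1, 0), (2, 1), 50.0)]
for link, capacity in sorted(size_links(flows).items()):
    print(link, f"{capacity:.0f} MB/s")
```

Links that carry no flow get no capacity at all here, which is the extreme form of the point that uniform capacities waste area when traffic is known in advance.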

44
Network delay modeling
  • Analysis of mean packet delay in a wormhole network
  • Multiple Virtual-Channels
  • Different link capacities
  • Different communication demands

45
NoC design requirements
  • High-performance interconnect
    • High throughput, latency, power, area
  • Complex functionality (performance again)
    • Support for virtual channels
    • QoS
    • Synchronization
    • Reliability, high throughput, low latency

46
ISO/OSI network protocol stack model
47
Part II
  • NoC topologies
  • Switching strategies
  • Routing algorithms
  • Flow control schemes
  • Clocking schemes
  • QoS
  • Basic building blocks
  • Status and open problems
48
NoC Topology
The connection map between processing elements (PEs)
  • Adopted from large-scale networks and parallel
    computing
  • Topology classifications
  • Direct topologies
  • Indirect topologies

49
Direct topologies
  • Each switch (SW) is connected to a single PE
  • As the number of nodes in the system increases, the
    total bandwidth also increases

50
Direct topologies: Mesh
  • 2D mesh is the most popular
  • All links have the same length
  • Eases physical design
  • Area grows linearly with the number of nodes

4x4 Mesh
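
A minimal sketch of mesh connectivity, assuming nodes are addressed by (x, y) coordinates (an addressing choice made for this example). It lists the neighbours of a node and counts the bidirectional links, which shows why the link count, and with it the wiring area, grows roughly linearly with the number of nodes.

```python
def mesh_neighbors(x, y, cols, rows):
    """Neighbours of node (x, y) in a cols x rows mesh (no wrap-around)."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < cols and 0 <= cy < rows]


def mesh_link_count(cols, rows):
    """Number of bidirectional links: horizontal ones plus vertical ones."""
    return rows * (cols - 1) + cols * (rows - 1)


print(mesh_neighbors(0, 0, 4, 4))   # corner node: two neighbours
print(mesh_neighbors(1, 1, 4, 4))   # interior node: four neighbours
print(mesh_link_count(4, 4))        # links in the 4x4 mesh
```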
51
Direct topologies: Torus and Folded Torus
  • Torus
    • Similar to a regular mesh
    • Excessive delay problem due to the long end-around
      connections
  • Folded torus
    • Overcomes the long link limitation of a 2-D
      torus
    • Links have the same size
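
The benefit of the end-around connections shows up in hop counts. The sketch below compares minimal hop distance in a mesh and in a torus of the same size, again assuming (x, y) node coordinates; it does not model the wire length of the wrap-around links, which is the problem the folded layout addresses.

```python
def mesh_hops(a, b):
    """Minimal hop count between two nodes of a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_hops(a, b, cols, rows):
    """Minimal hop count in a 2D torus: each dimension may wrap around."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, cols - dx) + min(dy, rows - dy)


corner_a, corner_b = (0, 0), (3, 3)
print("mesh :", mesh_hops(corner_a, corner_b))
print("torus:", torus_hops(corner_a, corner_b, 4, 4))
```

For opposite corners of a 4x4 network, the mesh needs 6 hops and the torus 2.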

52
Direct topologies: Octagon topology
  • Messages being sent between any 2 nodes require
    at most two hops
  • More octagons can be tiled together to
    accommodate larger designs
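
The two-hop bound can be checked with a breadth-first search. The sketch assumes the usual octagon structure, in which each of the 8 nodes links to its two ring neighbours and to the node directly across; the slide text itself does not spell out the wiring, so treat that as an assumption.

```python
from collections import deque


def octagon_neighbors(node):
    """Ring neighbours plus the node directly across the octagon."""
    return [(node + 1) % 8, (node - 1) % 8, (node + 4) % 8]


def hops(src, dst):
    """Minimal hop count between two octagon nodes, found by BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return dist[cur]
        for nxt in octagon_neighbors(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    raise ValueError("destination not reachable")


worst_case = max(hops(s, d) for s in range(8) for d in range(8))
print("worst-case hops:", worst_case)
```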

53
Indirect topologies
A set of PEs are connected to a switch (router).
  • Fat tree topology
  • Nodes are connected only to the leaves of the
    tree
  • More links near root, where bandwidth
    requirements are higher

54
Indirect topologies: k-ary n-fly butterfly network
  • Blocking multi-stage network: packets may be
    temporarily blocked or dropped in the network if
    contention occurs

Example: 2-ary 3-fly butterfly network
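
A k-ary n-fly has k^n terminals and n stages of k^(n-1) switches each. The sketch below computes those sizes and, assuming the common destination-tag routing scheme (not stated on the slide), lists which output port a packet takes at each stage: the base-k digits of the destination address, most significant first.

```python
def butterfly_size(k, n):
    """Return (terminals, switches) for a k-ary n-fly butterfly."""
    terminals = k ** n
    switches = n * k ** (n - 1)
    return terminals, switches


def destination_tag_ports(dest, k, n):
    """Output port chosen at each of the n stages under destination-tag routing."""
    digits = []
    for _ in range(n):
        digits.append(dest % k)
        dest //= k
    return digits[::-1]  # most significant digit is used at the first stage


print(butterfly_size(2, 3))            # the 2-ary 3-fly example
print(destination_tag_ports(5, 2, 3))  # route to terminal 5 (binary 101)
```

Because every source/destination pair has exactly one path, two packets that need the same intermediate link must contend for it, which is why the network is blocking.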
55
Indirect topologies: (m, n, r) symmetric Clos
network
  • 3-stage network in which each stage is made up of
    a number of crossbar switches
  • m: number of middle-stage switches
  • n: number of input/output nodes on each
    input/output switch
  • r: number of input and output switches
  • Example: (3, 3, 4) Clos network
  • Non-blocking network
  • Expensive (several full crossbars)
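
The cost and blocking behaviour of an (m, n, r) Clos network can be estimated from its parameters. The sketch below counts crosspoints and applies the classical conditions for unicast traffic (strictly non-blocking when m ≥ 2n − 1, rearrangeably non-blocking when m ≥ n); these conditions are standard results brought in for illustration and are not quoted from the slides.

```python
def clos_crosspoints(m, n, r):
    """Crosspoints in a symmetric 3-stage Clos network.

    r input switches (n x m), m middle switches (r x r), r output switches (m x n).
    """
    return 2 * r * n * m + m * r * r


def clos_blocking_class(m, n):
    """Classify the network using the classical unicast conditions."""
    if m >= 2 * n - 1:
        return "strictly non-blocking"
    if m >= n:
        return "rearrangeably non-blocking"
    return "blocking"


m, n, r = 3, 3, 4
ports = n * r
print("Clos crosspoints:    ", clos_crosspoints(m, n, r))
print("Single crossbar size:", ports * ports)
print("Class:               ", clos_blocking_class(m, n))
```

Under these conditions the (3, 3, 4) example would be non-blocking only if existing connections may be rearranged; making it strictly non-blocking would take m = 5 middle switches.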

56
Indirect topologies: Benes network
  • Rearrangeable network in which paths may have to
    be rearranged to provide a connection, requiring
    an appropriate controller
  • Clos topology composed of 2 x 2 switches

Example: (2, 2, 4) rearrangeable Clos network
constructed using two (2, 2, 2) Clos networks
with 4 x 4 middle switches.
57
Irregular Topologies: Customized
  • Customized for an application
  • Usually a mix of shared bus, direct, and indirect
    network topologies

[Figure: two arrangements of PEs and switches (sw). Example 1: reduced mesh. Example 2: cluster-based hybrid topology]
58
Example 1: Partially irregular 2D-Mesh topology
  • Contains oversized rectangularly shaped PEs.

59
Example 2: Irregular Mesh
  • This kind of chip does not limit the shape of the
    PEs or the placement of the routers. It may be
    considered a "custom" NoC

60
How to Select a Topology?
  • The application decides the topology type
    • If the number of PEs is a few tens: star or mesh
      topologies are recommended
    • If the number of PEs is 100 or more: hierarchical
      star or mesh topologies are recommended
  • Some topologies are better for certain designs
    than others
  • Most of the time, when one topology is better in
    performance, it is worse in power consumption!

62
NoC Switching Strategies
Switching determines how flits and packets flow
through routers in the network
  • There are two basic modes
  • Circuit switching
  • Packet switching

63
Circuit Switching
  • Network resources (channels) are reserved before
    a packet is sent
  • Entire path must be reserved first
  • The packets do not contain routing information,
    but rather data and information about the data.
  • Circuit-switched networks require no overhead for
    packetisation, packet header processing or packet
    buffering

64
Circuit Switching
[Figure: timing diagram of circuit switching across routers R1, R2 and R3. Labels: header, ACK, data, router delay, routing/switching delay, setup time, transfer time]
65
Circuit Switching
  • Once circuit is setup, router latency and control
    overheads are very low
  • Very poor use of channel bandwidth if lots of
    short packets must be sent to many different
    destinations
  • More commonly seen in embedded SoC applications
    where traffic patterns may be static and involve
    streaming large amounts of data between different
    IP blocks
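
The trade-off between setup overhead and message length can be put into numbers. The sketch below uses a deliberately simple model in which the header and the ACK each cost a fixed number of cycles per hop and the data then streams at one flit per cycle; the per-hop costs and the function name are assumptions made for this example.

```python
def circuit_switched_efficiency(hops, header_cycles_per_hop,
                                ack_cycles_per_hop, data_cycles):
    """Fraction of the circuit's holding time spent moving payload data."""
    # The header reserves the path hop by hop, then the ACK travels back.
    setup = hops * (header_cycles_per_hop + ack_cycles_per_hop)
    return data_cycles / (setup + data_cycles)


for data_cycles in (4, 64, 1000):
    eff = circuit_switched_efficiency(hops=3, header_cycles_per_hop=3,
                                      ack_cycles_per_hop=1,
                                      data_cycles=data_cycles)
    print(f"{data_cycles:5d} data cycles -> channel efficiency {eff:.2f}")
```

With these numbers a 4-cycle message uses the reserved path only a quarter of the time, while a long stream approaches full use of it, in line with the points above about short packets and streaming traffic.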

66
Packet Switching
  • We can aim to make better use of channel
    resources by buffering packets. We then
    arbitrate for access to network resources
    dynamically.
  • We distinguish between different approaches by
    the granularity at which we reserve resources
    (e.g. channels and buffers) and conditions that
    must be met for a packet to advance to the next
    node

67
Packet Switching
Packet-Buffer Flow Control
  • Store-and-forward (SaF): advance when the entire
    packet is buffered and L free flit buffers are
    available at the next node
  • Cut-through: advance when L free flit buffers are
    available at the next node
Flit-Buffer Flow Control
  • Wormhole: can advance when at least one flit buffer
    is available
L = packet length
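
The practical difference between these schemes is visible in the no-load latency. The sketch below uses the standard textbook approximations: store-and-forward pays the full packet serialization time at every hop, while cut-through and wormhole pay it once. It ignores contention entirely, and the parameter names and values are illustrative assumptions.

```python
def saf_latency(hops, router_delay, packet_flits, flits_per_cycle=1.0):
    """Store-and-forward: the whole packet is received before moving on."""
    return hops * (router_delay + packet_flits / flits_per_cycle)


def cut_through_latency(hops, router_delay, packet_flits, flits_per_cycle=1.0):
    """Cut-through / wormhole: the header moves on while the body follows."""
    return hops * router_delay + packet_flits / flits_per_cycle


hops, router_delay, length = 3, 2, 8
print("SaF        :", saf_latency(hops, router_delay, length), "cycles")
print("Cut-through:", cut_through_latency(hops, router_delay, length), "cycles")
```

Cut-through and wormhole have the same no-load latency in this model; they differ in buffering, since wormhole needs room for as little as one flit per hop.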
68
Packet Switching: Store and Forward (SAF)
  • Packet is sent from one router to the next only
    if the receiving router has buffer space for
    entire packet
  • Buffer size in the router is at least equal to
    the size of a packet

[Figure: store-and-forward switching. A packet (header flit followed by data flits) is forwarded packet by packet through three buffer/switch stages]
69
Packet Switching: Wormhole (WH)
  • Flit is forwarded to a router if space exists for
    that flit
  • Parts of the packet can be distributed among two
    or more routers
  • Buffer requirements are reduced to one flit,
    instead of an entire packet

[Figure: WH switching technique. A packet (header flit followed by data flits) is forwarded flit by flit and spread across three buffer/switch stages]
70
Packet Switching: Virtual Channel (VC)
  • Improve performance of WH routing, prevent a
    single packet blocking a free channel
  • e.g. if the green packet is blocked, the red
    packet may still make progress through the
    network
  • We can interleave flits from different packets
    over the same channel
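
A minimal sketch of flit interleaving: two virtual channels share one physical channel, which sends one flit per cycle and picks among the unblocked VCs in round-robin order. The round-robin policy, the queue names and the flit labels are assumptions for illustration; real routers also track credits and downstream buffer space.

```python
from collections import deque


def make_queues():
    """Two hypothetical packets, one per virtual channel."""
    return {"green": deque(["G0", "G1"]), "red": deque(["R0", "R1"])}


def share_channel(vc_queues, blocked, cycles):
    """Send at most one flit per cycle, round-robin over unblocked VCs."""
    order = list(vc_queues)
    start = 0
    sent = []
    for _ in range(cycles):
        for offset in range(len(order)):
            idx = (start + offset) % len(order)
            name = order[idx]
            if name not in blocked and vc_queues[name]:
                sent.append(vc_queues[name].popleft())
                start = (idx + 1) % len(order)  # next VC gets priority
                break
    return sent


print(share_channel(make_queues(), blocked=set(), cycles=4))
print(share_channel(make_queues(), blocked={"green"}, cycles=2))
```

In the second call the green packet is blocked downstream, yet the red flits still cross the link, which is the behaviour the bullets above describe.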
