A High Throughput Network-on-Chip Architecture for System-on-Chip Interconnect

1
A High Throughput Network-on-Chip Architecture
for System-on-Chip Interconnect
  • Abdelhafid Bouhraoua and M.E.S El-Rabaa
  • Computer Engineering Department (COE)
  • College of Computer Science and Engineering
    (CCSE)
  • King Fahd University of Petroleum and Minerals
    (KFUPM)
  • Dhahran, Eastern Province, Saudi Arabia

2
Outline
  • Overview and Motivation
  • Fat Tree Network Properties
  • Modified Fat Tree
  • Router Architecture
  • Performance Evaluation
  • Conclusion

4
Networks-on-Chip
  • "Route Packets, Not Wires" (William J. Dally)
  • Idea: Build a Complete On-Chip Network
  • Unified Communication Model (Similar to the OSI
    Stack)
  • No Ad-hoc Effort
  • Standardized Interfacing (May Be Provided by IP
    Vendors)
  • Unified Network Elements (Routers, Link
    Interfaces)
  • No Interconnect Design Required from SoC Teams
  • Flexible Interconnect and Reduced Global Wiring

5
NoC Requirements
  • Performance: How fast are packets moved across
    the network? How much traffic is carried at the
    same time, and for how long?
  • Overhead: How large is the network (in gates)?
  • Adaptivity: Does it adapt easily to new designs?
  • Complexity: How easy is it to interface to?

6
Previous Work
  • Mostly derived directly from other research
    (interconnection networks for parallel
    architectures).
  • Router architectures derived directly from
    inter-chip architectures, where each router was
    implemented on a single chip → substantial
    overhead.
  • Focused on the router architecture alone to
    achieve certain latency goals.
  • Circuit-switching techniques introduced to
    provide latency guarantees.
  • The added complexity needed to guarantee latency
    is overkill in the on-chip context.
  • Did not fully exploit the fact that the network
    is on-chip, where the main gain is the absence
    of pin limitations.

7
Which Network?
  • Most straightforward: Crossbar
  • Good throughput (maxes out at 66%)
  • Not scalable (cost grows quadratically with the
    number of ports)
  • Implementation complexity grows with the number
    of I/Os.
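As a quick illustration of the quadratic cost, the sketch below counts crosspoints, assuming a plain N × N crossbar with one crosspoint per input/output pair. The function name `crossbar_crosspoints` is hypothetical.

```python
def crossbar_crosspoints(ports: int) -> int:
    """Crosspoints in an N x N crossbar: one switch per (input, output) pair."""
    if ports < 1:
        raise ValueError("ports must be positive")
    return ports * ports


for n in (4, 8, 16, 32):
    # Doubling the port count quadruples the number of crosspoints.
    print(n, crossbar_crosspoints(n))
```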

8
2-D Mesh
  • Very popular topology in NoCs
  • Well suited to the 2-D nature of chip
    floorplanning (tiling)
  • Very high constraints on routing:
  • Simple routing algorithms (deadlock-free by
    construction) are inefficient
  • Efficient routing algorithms are complex to
    implement
  • Poor performance: saturation reached at 30%.

9
Analysis
  • Low throughput means latency cannot be
    guaranteed above the maximum throughput level.
  • Low throughput is caused by contention among
    several incoming packets for the same router
    output port.
  • Contention cannot be prevented, and it makes
    router architectures more complex because they
    must integrate buffering and prioritization
    logic.
  • Routers that implement both packet and circuit
    switching are even more complex.

10
Methodology
  • Take advantage of the on-chip context:
  • Design frozen before tape-out
  • No internal I/O limitations
  • Aim for a high-throughput architecture:
  • Circuitry used at 30% of its maximum is NOT an
    optimal solution (clock frequency, power).
  • Reduce router size:
  • Integrate a large number of routers
  • Wormhole routing vs. store-and-forward
  • Reduce the buffers required in routers
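The buffering argument can be made concrete with the textbook contention-free latency models. These formulas are standard estimates, not taken from the slides. Store-and-forward waits for the whole packet at every hop, while wormhole routing pipelines the packet across hops and needs only flit-sized buffers. All parameter names are illustrative.

```python
def store_and_forward_latency(hops: int, packet_bits: int, link_bw: float,
                              router_delay: float = 0.0) -> float:
    """Each router receives the entire packet before forwarding it."""
    return hops * (router_delay + packet_bits / link_bw)


def wormhole_latency(hops: int, packet_bits: int, link_bw: float,
                     router_delay: float = 0.0) -> float:
    """The header pays the per-hop delay; the body streams behind it once."""
    return hops * router_delay + packet_bits / link_bw


def buffer_bits_per_input(packet_bits: int, flit_bits: int, wormhole: bool) -> int:
    """Store-and-forward buffers a whole packet; wormhole buffers a single flit."""
    return flit_bits if wormhole else packet_bits


# 4 hops, 512-bit packet, 32 bits/cycle links, 1-cycle router delay
print(store_and_forward_latency(4, 512, 32, 1))  # 4 * (1 + 16) cycles
print(wormhole_latency(4, 512, 32, 1))           # 4 * 1 + 16 cycles
```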

11
Fat Tree
What topology resembles a crossbar? Banyans, or
multistage interconnection networks.
  • Bidirectional multistage (or folded multistage)
    networks
  • Bidirectional multistage networks come in two
    forms:
  • The Fat Tree (FT)
  • The butterfly
  • The fat tree outperforms the butterfly (previous
    work)
  • n1 Stages (or rows)
  • Size is
  • Routers n x 2n
  • Clients 2n1
  • Diameter 2logk 1 n log k

13
Routing in Fat Tree
  • Routing reduces to routing in a binary tree.

Binary Trees
  • Three Routing Directions
  • UP
  • RIGHT
  • LEFT

14
Routing in Fat Tree
  • Matrix of $n \times 2^{n-1}$ routers ($n$ rows,
    $2^{n-1}$ columns).
  • Router $(r, c)$:
  • $r$: row index (rows are indexed from 0 to $n-1$)
  • $c$: column index (columns are indexed from 0 to
    $2^{n-1}-1$)
  • The size of the client address space reachable
    through the downside ports is $2^r$.
  • It is always a contiguous interval of addresses
    of the form $[l, u)$.
  • Lower bound $l$: smallest address reached from
    router $(r, c)$.
  • Obtained by clearing the lowest $r$ bits of the
    column index $c$.
  • $l = \lfloor c / 2^r \rfloor \times 2^r$
  • Upper bound $u$: one past the largest address
    reached from router $(r, c)$ (exclusive bound).
  • Obtained by adding $2^r$ to the lower bound $l$.
  • $u = l + 2^r$
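Here is a minimal sketch of the bound computation. It assumes client addresses are integers and treats the range as half-open $[l, u)$, so that it holds exactly $2^r$ addresses. The function names are hypothetical.

```python
def matrix_shape(n: int) -> tuple[int, int]:
    """Router matrix of a fat tree with n rows: n rows x 2^(n-1) columns."""
    return n, 1 << (n - 1)


def reachable_range(r: int, c: int) -> tuple[int, int]:
    """Half-open range [l, u) of client addresses reachable downward from (r, c)."""
    l = (c >> r) << r      # clear the lowest r bits of c: floor(c / 2^r) * 2^r
    u = l + (1 << r)       # u = l + 2^r
    return l, u


rows, cols = matrix_shape(4)           # 4 rows x 8 columns
print(reachable_range(2, 5))           # (4, 8)
```

With $n = 4$, every router in the top row ($r = 3$) covers the whole range $[0, 8)$, which matches the $2^{n-1}$ columns.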

15
Routing in Fat Tree
(Figure: summit routers and alternate paths)
  • Routing up: adaptive
  • Routing down: deterministic

17
Contention in Fat Tree
(Figure: many choices for going UP; contention on
the way down over the LEFT and RIGHT links)
  • Packets arriving on the UP links are never routed
    back up.
  • Only packets arriving on the bottom links are
    routed up.
  • Since the number of UP links equals the number
    of bottom links, there can be no contention when
    routing up.
  • Contention occurs only when going down.
  • Bottom links are split into RIGHT and LEFT links,
    so a packet going down must take one specific
    side; with deterministic routing, several
    packets heading for the same side contend.
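The no-contention argument for upward routing can be sketched as a simple allocation. Each packet arriving on a bottom link that must go up takes any free up link (adaptive routing). Because there are as many up links as bottom links, a free one always exists. This is an illustrative model with hypothetical names, not the authors' router logic.

```python
def allocate_up_links(up_bound_inputs: list[int], num_up_links: int) -> dict[int, int]:
    """Assign each up-bound packet (keyed by its bottom input index) a distinct up link.

    Any free up link is acceptable, which is what makes upward routing adaptive.
    """
    free = list(range(num_up_links))
    assignment = {}
    for inp in up_bound_inputs:
        if not free:
            raise RuntimeError("no free up link: contention")
        assignment[inp] = free.pop(0)  # first free link; any choice would do
    return assignment


NUM_BOTTOM_LINKS = NUM_UP_LINKS = 4
# Worst case: every bottom link carries a packet that must go up in the same cycle.
print(allocate_up_links(list(range(NUM_BOTTOM_LINKS)), NUM_UP_LINKS))
```

The `RuntimeError` branch can only fire if more packets go up than there are up links. One packet per bottom link per cycle rules that out.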

18
Modified Fat Tree
  • Doubling the downward links eliminates contention

19
Outline
  • Overview and Motivation
  • Fat Tree Network Properties
  • Modified Fat Tree
  • Router Architecture
  • Performance Evaluation
  • Conclusion

20
Router Architecture
  • No crossbar
  • No buffers (pushed to the clients)
  • Every downstream input is simultaneously
    connected to two outputs.
  • Contention between the inputs going downstream
    is eliminated.
  • Number of outputs is 2k2 for k inputs (in the
    case where the router is a summit)
  • Router models differ from one another in only
    two respects:
  • Number of input and output ports on the down
    link
  • Routing function constants (r, c)
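The contrast between shared and dedicated downward outputs can be sketched as follows. The "shared" model is a simplified, hypothetical stand-in for a conventional router, in which all downstream inputs compete for a fixed number of links per side (`links_per_side`). The "dedicated" model wires input i to its own LEFT and RIGHT outputs, as described above, so no two inputs can ever select the same output.

```python
from collections import Counter


def blocked_with_shared_links(sides: list[str], links_per_side: int = 1) -> int:
    """Packets that must wait when all inputs share `links_per_side` links per side."""
    demand = Counter(sides)
    return sum(max(0, count - links_per_side) for count in demand.values())


def dedicated_outputs(sides: list[str]) -> list[tuple[int, str]]:
    """Modified router: input i always drives its own output (i, side)."""
    return [(i, side) for i, side in enumerate(sides)]


# Two downstream packets both heading LEFT in the same cycle.
requests = ["LEFT", "LEFT"]
print(blocked_with_shared_links(requests))  # one packet waits
outputs = dedicated_outputs(requests)
print(len(set(outputs)) == len(outputs))    # every packet gets its own output
```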

21
Routing Circuitry
  • All network elements are constant and frozen at
    design time.
  • All lower and upper bound values used to
    generate the routing functions are constants for
    each router.
  • These constants are fed as inputs to the routing
    function.
  • The routing function is implemented using
    comparators.
  • Constants needed by the routing function:
  • $l$
  • $L = l + 2^{r-1}$
  • $u$
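Because $l$, $L$ and $u$ are fixed per router, the routing function reduces to a few comparisons. The sketch below makes three assumptions:
  • Addresses in $[l, L)$ lie in the LEFT subtree.
  • Addresses in $[L, u)$ lie in the RIGHT subtree.
  • $u$ is an exclusive bound.
`make_routing_function` is a hypothetical name. In hardware, each comparison would be a comparator against a constant.

```python
from typing import Callable


def make_routing_function(l: int, L: int, u: int) -> Callable[[int], str]:
    """Build the routing function of one router from its design-time constants."""
    if not l < L < u:
        raise ValueError("expected l < L < u")

    def route(dest: int) -> str:
        if dest < l or dest >= u:   # outside this router's subtree
            return "UP"
        if dest < L:                # lower half of [l, u)
            return "LEFT"
        return "RIGHT"              # upper half of [l, u)

    return route


# Router (r=2, c=5): l = 4, L = 4 + 2^1 = 6, u = 4 + 2^2 = 8
route = make_routing_function(4, 6, 8)
print([route(d) for d in range(3, 9)])  # ['UP', 'LEFT', 'LEFT', 'RIGHT', 'RIGHT', 'UP']
```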

22
Client Interface
(Figure: up link; down links from the router;
client/IP block)
  • Buffers are pushed to the client interfaces.
  • Each incoming link is terminated with a FIFO
    memory.
  • The FIFO memories are connected to the client
    through a single shared bus.
  • The bus can be made wider, so that data is
    transferred out faster than it is received in
    the FIFOs.
  • FIFO sizes are customizable by the design team
    according to the specifications.
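Here is a behavioral sketch of such a client interface. It assumes one FIFO per incoming down link and round-robin arbitration on the shared bus. A bus that moves `bus_words` words per cycle models the wider bus. The class name, the arbitration policy and the overflow handling are all assumptions made for illustration.

```python
from collections import deque


class ClientInterface:
    """One FIFO per incoming link, drained to the client over a single shared bus."""

    def __init__(self, num_links: int, fifo_depth: int, bus_words: int) -> None:
        self.fifos = [deque() for _ in range(num_links)]
        self.fifo_depth = fifo_depth   # chosen by the design team
        self.bus_words = bus_words     # words the bus moves per cycle (wider bus > 1)
        self._next = 0                 # round-robin pointer (assumed arbitration)

    def receive(self, link: int, word) -> None:
        """Store a word arriving on `link`; a full FIFO signals an overflow."""
        fifo = self.fifos[link]
        if len(fifo) >= self.fifo_depth:
            raise OverflowError(f"FIFO {link} is full")
        fifo.append(word)

    def bus_cycle(self) -> list:
        """Move up to `bus_words` words to the client, visiting FIFOs round-robin."""
        out = []
        idle = 0                       # consecutive empty FIFOs seen
        while len(out) < self.bus_words and idle < len(self.fifos):
            fifo = self.fifos[self._next]
            if fifo:
                out.append(fifo.popleft())
                idle = 0
            else:
                idle += 1
            self._next = (self._next + 1) % len(self.fifos)
        return out


ci = ClientInterface(num_links=3, fifo_depth=4, bus_words=2)
ci.receive(0, "a")
ci.receive(0, "b")
ci.receive(2, "c")
print(ci.bus_cycle())  # ['a', 'c']
print(ci.bus_cycle())  # ['b']
```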

24
Simulation Conditions
  • Uniform traffic generation
  • Uniform distribution of destinations
  • Traffic rate: a constant fraction of the maximum
    link bandwidth
  • Variable packet size (within a predetermined
    range, e.g. 64 bytes ± 10)
  • Simulation platform: cycle-based, written in C,
    developed for this purpose
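A traffic generator matching these conditions might look like the sketch below. It is not the authors' C simulator, and several details are assumptions:
  • Links carry 1 byte per cycle.
  • The "± 10" is read as ±10 bytes.
  • Destinations are uniform over all clients other than the source.
  • Packets are spaced deterministically, so each client offers exactly `rate` × link bandwidth.
All names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    src: int
    dst: int
    size: int        # bytes
    time: float      # injection time in cycles


def generate_uniform_traffic(num_clients: int, rate: float, cycles: int,
                             link_bytes_per_cycle: float = 1.0,
                             mean_size: int = 64, jitter: int = 10,
                             seed: int = 0) -> list[Packet]:
    """Each client offers `rate` x link bandwidth to uniformly chosen destinations."""
    if num_clients < 2 or not 0.0 < rate <= 1.0:
        raise ValueError("need at least 2 clients and 0 < rate <= 1")
    rng = random.Random(seed)
    packets = []
    for src in range(num_clients):
        t = 0.0
        while t < cycles:
            size = rng.randint(mean_size - jitter, mean_size + jitter)
            dst = rng.randrange(num_clients - 1)
            if dst >= src:             # skip the source itself, keep the rest uniform
                dst += 1
            packets.append(Packet(src, dst, size, t))
            # Space packets so that bytes offered per cycle equal rate * bandwidth.
            t += size / (rate * link_bytes_per_cycle)
    return packets


pkts = generate_uniform_traffic(num_clients=8, rate=0.5, cycles=10_000)
offered = sum(p.size for p in pkts if p.src == 0) / 10_000
print(f"client 0 offered load: {offered:.3f} bytes/cycle")
```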

25
Throughput
  • More than 90% throughput achieved
  • Compared with the regular fat tree (figure)

26
Latency
  • Latency increases linearly with input load
  • A large part of the latency is spent in the
    receivers' FIFOs

27
Area and Speed
  • The buffer-less architecture is less costly

28
Client Buffer Utilization
  • Buffers are pushed to the client interfaces.
  • A considerable number of buffer lanes is
    necessary for every client interface.
  • Simulations show a linear progression of the
    maximum number of lanes used during operation.
  • The observed figures are an order of magnitude
    lower than the number required by the
    architecture.
  • The number of buffer lanes in the client
    interface can therefore be tailored to the class
    of applications at hand, reducing buffering
    area.

30
Conclusion
  • A contention-free modified FT architecture is
    proposed.
  • The proposed architecture achieves the maximum
    theoretical throughput and has lower latency
    than conventional FTs.
  • Latency increases linearly with input load.
  • The achieved performance is the actual
    performance of a contention-free network.
  • The network area is kept small because the
    router architecture has no buffers.
  • The number of buffer lanes in the client
    interfaces can be tailored to a specific
    platform and class of applications, reducing
    buffering area.