Energy-Aware Mapping for Tile-based NoC Architectures

1
Energy-Aware Mapping for Tile-based NoC
Architectures
  • Based on the paper by Jingcao Hu and Radu Marculescu
  • Ankit Mathur
  • 2000101

2
Motivation
  • Advances in semiconductor technology have made it
    possible to pack multiple processors on a single
    chip
  • CPUs, DSP cores, video stream processors, etc.
    can be placed on a single chip
  • The multiple cores on a single chip need to
    communicate and transfer data packets among
    themselves
  • The mapping and placement of cores on the chip
    can drastically affect the performance of the
    system
  • Placing a large number of inter-connections
    introduces problems such as crosstalk

3
Solutions to Packing Problem
  • The processor cores can be placed on a chip in a
    regular tile layout
  • Instead of dedicated inter-connects, each core
    also has a router, and direct communication occurs
    with immediate neighbours

View of the chip
4
The Architecture
  • The chip is composed of n x n tiles
  • The inter-connect is a 2D mesh network
  • Each tile is composed of a processor and a router

5
Routers
  • Routers in the tiles use registers for buffering
    instead of RAM
  • They have four external ports for inter-connects
  • They use static routing for packet switching
  • Adaptive routing would require a lot of logic and
    larger buffers to accommodate out-of-order
    arrival

6
The Energy Model
  • E_bit is the energy to transport one bit through a
    router: E_bit = E_Sbit + E_Bbit + E_Wbit
  • The three terms are the energy spent in the switch,
    in buffering, and on the inter-connects
  • For routers on a tile, the link energy E_Lbit is
    also included
  • Link energy >> buffer and inter-connect energy
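The model above can be illustrated with a minimal sketch. It assumes that the buffer and inter-connect terms are dropped because the link term dominates, that a bit crossing h links passes through h + 1 routers, and that routes follow the shortest path on the mesh. The function names and the default energy values are hypothetical, not taken from the slides.

```python
def hop_count(src, dst):
    """Number of links on a shortest mesh path between two (x, y) tiles."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def bit_energy(src, dst, e_sbit=1.0, e_lbit=1.0):
    """Energy to move one bit from tile src to tile dst.

    Assumption: only the switch energy (e_sbit, per router) and the link
    energy (e_lbit, per link) are counted; buffer and inter-connect
    energy are neglected.
    """
    links = hop_count(src, dst)
    routers = links + 1  # the source router plus one router per link crossed
    return routers * e_sbit + links * e_lbit
```

For example, `bit_energy((0, 0), (2, 1), e_sbit=2.0, e_lbit=3.0)` crosses three links and four routers, so it evaluates to 4 × 2.0 + 3 × 3.0 = 17.0.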

7
Problem Formulation
  • Application Characterization Graph (APCG) is a
    directed graph with each vertex c_i representing a
    processor core and each arc a_{i,j} representing
    communication from c_i to c_j
  • Architecture Characterization Graph (ARCG) is a
    directed graph with each vertex t_i representing
    one tile and each arc p_{i,j} representing the
    routing path from t_i to t_j
  • b(a_{i,j}) is the bandwidth of the arc
  • v(a_{i,j}) is the volume of communication over the
    arc
  • e(p_{i,j}) is the avg. energy consumed per bit over
    the path
  • L(p_{i,j}) is the set of links that make up the
    path
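To make these definitions concrete, here is a small sketch of how the link set of a path and the resulting link loads might be computed. It assumes X-then-Y routing as the static routing scheme (the slides only say "static"), and the example APCG, its core names and its numbers are invented for illustration.

```python
def links_on_path(src, dst):
    """Link set of the path from src to dst: directed links of an X-then-Y route."""
    links = []
    x, y = src
    while x != dst[0]:  # travel along X first
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:  # then along Y
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


# Hypothetical APCG: arc (c_i, c_j) -> (volume in bits, bandwidth demand)
apcg = {
    ("cpu", "dsp"): (400, 40),
    ("dsp", "mem"): (200, 30),
    ("cpu", "mem"): (100, 20),
}


def link_loads(apcg, mapping):
    """Sum the bandwidth demand of every arc over each link on its path."""
    loads = {}
    for (ci, cj), (_volume, bandwidth) in apcg.items():
        for link in links_on_path(mapping[ci], mapping[cj]):
            loads[link] = loads.get(link, 0) + bandwidth
    return loads
```

With the mapping `{"cpu": (0, 0), "dsp": (1, 0), "mem": (1, 1)}`, the cpu-to-mem traffic shares both links with the other two arcs, so the link from (0, 0) to (1, 0) carries a load of 60 and the link from (1, 0) to (1, 1) carries 50.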

8
Optimization function
  • Given size(APCG) ≤ size(ARCG)
  • Find a function map( ) from cores to tiles that
    minimizes the total communication energy
  • subject to the conditions that one tile can hold
    only one processor core and the load on any link
    is less than its bandwidth
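The formula itself did not survive in the transcript. One way the objective and the two conditions could be written with the symbols from the problem formulation is sketched below. The link bandwidth B(l_k), the indicator f, and the non-strict inequality in the bandwidth condition are notational assumptions rather than content taken from the slide.

```latex
\[
\begin{aligned}
\min_{\mathrm{map}} \quad
  & \sum_{a_{i,j}} v(a_{i,j}) \, e\bigl(p_{\mathrm{map}(c_i),\,\mathrm{map}(c_j)}\bigr) \\
\text{subject to} \quad
  & \mathrm{map}(c_i) \neq \mathrm{map}(c_j) \quad \forall\, c_i \neq c_j, \\
  & \sum_{a_{i,j}} b(a_{i,j}) \, f\bigl(l_k,\, p_{\mathrm{map}(c_i),\,\mathrm{map}(c_j)}\bigr) \le B(l_k) \quad \forall\, l_k,
\end{aligned}
\]
\[
f(l_k, p) =
\begin{cases}
1 & \text{if } l_k \in L(p), \\
0 & \text{otherwise.}
\end{cases}
\]
```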

9
Significance of the Problem
  • Experiments were conducted on various task
    graphs
  • Simulated annealing (SA) was used as the baseline
  • Results are reported as a ratio to the result
    from SA

10
The Algorithm
  • A branch-and-bound approach has been adopted
  • The data structure is a search tree in which
  • root → no core mapped,
  • internal node → partial mapping done,
  • leaf → complete mapping

The data-structure
11
Algorithm (contd.)
  • The tree is searched, and each newly generated
    child node assigns the next unmapped processor
    core to one of the free tiles, enumerating all
    choices
  • For the bound step, a Lower Bound Cost (LBC) and an
    Upper Bound Cost (UBC) are estimated for each node
    and branches are trimmed: a node whose LBC exceeds
    the lowest UBC found so far cannot lead to the
    best solution
  • Tighter LBC and UBC bounds lead to a better
    solution
  • There is a trade-off between avg. time per node
    and the number of nodes to be processed
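The search described above might be sketched as follows. This is a simplified illustration, not the authors' implementation: the lower bound is a deliberately crude one (exact cost for arcs between mapped cores, a one-hop cost for every other arc), the only upper bound used is the cost of complete mappings already reached, and bandwidth constraints are ignored. The energy values and all names are hypothetical.

```python
import heapq
from itertools import count


def bit_energy(a, b, e_sbit=1.0, e_lbit=1.0):
    """Per-bit energy between tiles: one switch per router, one link per hop."""
    hops = abs(a[0] - b[0]) + abs(a[1] - b[1])
    return (hops + 1) * e_sbit + hops * e_lbit


def lower_bound(apcg, mapping):
    """Exact cost for arcs between mapped cores, one-hop cost for all others."""
    one_hop = bit_energy((0, 0), (0, 1))  # distinct tiles are at least one hop apart
    total = 0.0
    for (ci, cj), volume in apcg.items():
        if ci in mapping and cj in mapping:
            total += volume * bit_energy(mapping[ci], mapping[cj])
        else:
            total += volume * one_hop
    return total


def branch_and_bound(apcg, cores, n):
    """Map cores (in the given order) onto an n x n mesh with minimum energy."""
    tiles = [(x, y) for x in range(n) for y in range(n)]
    best_cost, best_map = float("inf"), None
    tie = count()  # tie-breaker so the heap never compares dicts
    queue = [(lower_bound(apcg, {}), next(tie), {})]  # root: no core mapped
    while queue:
        lbc, _, mapping = heapq.heappop(queue)
        if lbc >= best_cost:
            continue  # trimmed: cannot beat the best complete mapping
        if len(mapping) == len(cores):  # leaf: bound equals the true cost
            best_cost, best_map = lbc, mapping
            continue
        core = cores[len(mapping)]  # next unmapped core
        used = set(mapping.values())
        for tile in tiles:
            if tile in used:
                continue  # one tile holds only one core
            child = {**mapping, core: tile}
            child_lbc = lower_bound(apcg, child)
            if child_lbc < best_cost:
                heapq.heappush(queue, (child_lbc, next(tie), child))
    return best_cost, best_map
```

For a hypothetical APCG `{("a", "b"): 10, ("b", "c"): 5, ("a", "c"): 1}` on a 2 x 2 mesh, any three tiles contain exactly one diagonal pair, so the cheapest mapping puts the lightest arc on the diagonal and costs 10 × 3 + 5 × 3 + 1 × 5 = 50.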

12
UBC calculation
  • The cost of any legal descendant leaf node can be
    taken as the UBC of a node
  • Using a greedy approach, the unmapped core with the
    largest communication demand is mapped next, to an
    estimated location (x, y)
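The estimation formula is missing from the transcript, so the sketch below substitutes an assumption: the ideal location of a core is taken to be the volume-weighted centroid of the already-mapped cores it communicates with, and the core is placed on the nearest free tile. This is one plausible greedy completion, not necessarily the one used in the paper. All names and energy values are hypothetical.

```python
def bit_energy(a, b, e_sbit=1.0, e_lbit=1.0):
    """Per-bit energy between tiles: one switch per router, one link per hop."""
    hops = abs(a[0] - b[0]) + abs(a[1] - b[1])
    return (hops + 1) * e_sbit + hops * e_lbit


def greedy_upper_bound(apcg, cores, partial, n):
    """Complete a partial mapping greedily; the leaf's cost is an upper bound."""
    mapping = dict(partial)
    free = [(x, y) for x in range(n) for y in range(n)
            if (x, y) not in mapping.values()]
    demand = {c: 0 for c in cores}
    for (ci, cj), volume in apcg.items():
        demand[ci] += volume
        demand[cj] += volume
    # Place the unmapped cores with the largest communication demand first.
    unmapped = sorted((c for c in cores if c not in mapping),
                      key=lambda c: -demand[c])
    for core in unmapped:
        wx = wy = w = 0.0
        for (ci, cj), volume in apcg.items():
            other = cj if ci == core else ci if cj == core else None
            if other in mapping:  # only already-placed neighbours count
                ox, oy = mapping[other]
                wx += volume * ox
                wy += volume * oy
                w += volume
        # Assumed estimate: volume-weighted centroid of mapped neighbours.
        ideal = (wx / w, wy / w) if w else free[0]
        tile = min(free, key=lambda t: abs(t[0] - ideal[0]) + abs(t[1] - ideal[1]))
        mapping[core] = tile
        free.remove(tile)
    cost = sum(volume * bit_energy(mapping[ci], mapping[cj])
               for (ci, cj), volume in apcg.items())
    return cost, mapping
```

Because the result is a legal complete mapping, its cost can never be lower than the optimum, which is what makes it usable as an upper bound during trimming.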

13
LBC calculation
  • LBC is estimated as the sum of three parts:
  • the cost of communication among mapped cores,
    among unmapped cores, and between mapped and
    unmapped cores
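A three-part bound of this shape could look like the sketch below. The individual estimates are assumptions chosen to be simple and safe, not the paper's formulas: arcs between mapped cores are costed exactly, arcs between two unmapped cores are charged a single hop, and arcs with one mapped end are charged the cheapest free tile as seen from the mapped end. Names and energy values are hypothetical.

```python
def bit_energy(a, b, e_sbit=1.0, e_lbit=1.0):
    """Per-bit energy between tiles: one switch per router, one link per hop."""
    hops = abs(a[0] - b[0]) + abs(a[1] - b[1])
    return (hops + 1) * e_sbit + hops * e_lbit


def lower_bound_cost(apcg, mapping, n):
    """LBC = mapped-mapped + unmapped-unmapped + mapped-unmapped estimates."""
    free = [(x, y) for x in range(n) for y in range(n)
            if (x, y) not in mapping.values()]
    one_hop = bit_energy((0, 0), (0, 1))
    c_mm = c_uu = c_mu = 0.0
    for (ci, cj), volume in apcg.items():
        i_mapped, j_mapped = ci in mapping, cj in mapping
        if i_mapped and j_mapped:
            # Both ends placed: the cost is known exactly.
            c_mm += volume * bit_energy(mapping[ci], mapping[cj])
        elif not i_mapped and not j_mapped:
            # Neither end placed: they will be at least one hop apart.
            c_uu += volume * one_hop
        else:
            # One end placed: the other can do no better than the closest free tile.
            anchor = mapping[ci] if i_mapped else mapping[cj]
            c_mu += volume * min(bit_energy(anchor, t) for t in free)
    return c_mm + c_uu + c_mu
```

Each part never overestimates the true cost of its arcs, so the sum stays a valid lower bound for every complete mapping below the node.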

14
Speed-up Techniques
  • IP ordering: the processor cores can be sorted
    in the order of their communication demand
  • Priority queue: used to sort nodes waiting
    to be branched based on cost. Expanding lower-cost
    nodes first makes it more likely that the minimum
    UBC decreases
  • Symmetry exploitation: only one representative of
    each set of similar nodes needs to be computed, as
    the others are mirror cases if there is symmetry
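Two of these techniques can be sketched briefly. The first function orders cores by total communication volume. The second lists one representative tile per symmetry class of a square mesh, which is only valid under the assumption that cost depends on hop distance alone (bandwidth constraints under a fixed routing order may break the diagonal symmetry). Both function names are hypothetical.

```python
def order_by_demand(apcg):
    """Sort cores by total communication volume, largest first."""
    demand = {}
    for (ci, cj), volume in apcg.items():
        demand[ci] = demand.get(ci, 0) + volume
        demand[cj] = demand.get(cj, 0) + volume
    # Ties are broken by name so the order is deterministic.
    return sorted(demand, key=lambda c: (-demand[c], c))


def canonical_tiles(n):
    """One tile per class under the mirror and rotation symmetries of an n x n mesh.

    Only these tiles need to be tried for the first core; every other
    placement is a mirrored or rotated copy of one of them.
    """
    return [(x, y) for x in range(n) for y in range(n)
            if 2 * y <= n - 1 and x <= y]
```

On a 4 x 4 mesh this reduces the 16 choices for the first core to 3: a corner, an edge tile and a centre tile.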

15
Experimental Results
  • Comparison was done with the Simulated Annealing
    process

Ratios of performance over SA for 10 different
applications
16
Results
  • Variation over number of tiles

17
Results with Multimedia Application
Comparison of power consumption
18
Conclusions
  • The algorithm provides an automated approach for
    mapping processor cores to tiles
  • Comparison with simulated annealing shows that the
    results are of good quality
  • Computation time is much lower than with SA
  • As future work, the 2D mesh can be extended to
    other topologies

19
Thank You