Title: Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections
- M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol
- Computer Engineering Department, Sharif University of Technology, Tehran, Iran
- modarressi_at_ce.sharif.edu
Outline
- Introduction and Motivations
- Virtual Point-to-Point (VIP) Connections
- Static VIP Construction Scheme
- Dynamic VIP Construction Scheme
- Setup Network
- Evaluation Results
- Conclusions and Future Work
Sharif University of Technology
On-Chip Communication Mechanisms
- Packet-Switched NoCs
  - Good Resource Utilization
  - Modest Design Effort/Time Due to Structured and Predictable Links
  - Some Power and Performance Overheads Due to Multi-Stage Pipelined Routers
- Dedicated Point-to-Point Links
  - Ideal Power and Performance
  - Poor Scalability: Significant Area Overhead for Large Systems
  - Significant Design Effort/Time Due to Non-Predictable Link Properties
→ Virtual Point-to-Point Connections in a Packet-Switched NoC
VIP Connections
- VIP: VIrtual Point-to-point Connections
  - Over One VC (Virtual Channel) of Each Physical Channel
  - Bypass Some Router Pipeline Stages
- Inexpensive Extensions to a Traditional Wormhole Router
  - Router Control Unit, Arbiter, and Buffers of the VIP Virtual Channels
Router Architecture
- The Buffer of Each VIP Virtual Channel Is Replaced by a Register (1-Flit Buffer)
- VIP Paths Are Kept by VIP Allocator Units at Output Ports
  - Determines Which Input Is Connected to This Port Along the VIP
  - Allocates the Output Port to the VIP When Control Signals Indicate That the VIP Has an Incoming Flit to Forward
  - A Flow-Control Mechanism Prevents Starvation of Packet-Switched Flits
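The per-output-port VIP allocator can be sketched in software as below. This is a minimal model under stated assumptions: the class name `VipAllocator`, the port labels, and especially the starvation guard `max_vip_burst` (the VIP has priority, but only for a bounded number of consecutive grants while a packet-switched flit is waiting) are hypothetical stand-ins, not the flow-control mechanism used in the actual router.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VipAllocator:
    """Hypothetical model of the VIP allocator at one output port.

    The starvation guard (max_vip_burst) is an assumed policy, not the
    flow-control mechanism described in the slides.
    """
    vip_input: Optional[str] = None  # input port chained to this output along the VIP
    max_vip_burst: int = 4           # assumed bound on VIP grants while a packet waits
    _burst: int = field(default=0, init=False, repr=False)

    def grant(self, vip_flit_ready: bool, ps_request: Optional[str]) -> Optional[str]:
        """Return the input port that drives this output port in the current cycle.

        vip_flit_ready: control signal saying the upstream VIP register holds a flit.
        ps_request: input port of a packet-switched flit that won regular switch
                    allocation for this output, or None if there is none.
        A VIP flit that loses must stay in its 1-flit register (back-pressure).
        """
        vip_wins = (
            self.vip_input is not None
            and vip_flit_ready
            and (ps_request is None or self._burst < self.max_vip_burst)
        )
        if vip_wins:
            # Count only the VIP grants that made a packet-switched flit wait.
            self._burst = self._burst + 1 if ps_request is not None else 0
            return self.vip_input
        self._burst = 0
        return ps_request


if __name__ == "__main__":
    alloc = VipAllocator(vip_input="W", max_vip_burst=2)
    print([alloc.grant(True, "N") for _ in range(4)])  # ['W', 'W', 'N', 'W']
```

Giving the VIP strict priority keeps its flits moving without buffering; the bounded burst is just one simple way to guarantee that a waiting packet-switched flit eventually wins the port.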
VIP Connections
- A VIP Is Constructed by Chaining the VIP Registers in the Routers Between the Source and Destination Nodes of a Communication Flow
  - Provides a Virtual Dedicated Pipelined Link With 1-Flit VIP Buffers as Staging Registers
  - Flits Only Travel Over the Crossbars and Links That Cover the Actual Physical Distance Between Their Source and Destination Nodes
  - Flits Skip the Buffer Read, Buffer Write, and Allocation Operations
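To see why skipping these stages matters, a simplified zero-load latency model can be sketched. The per-hop cycle counts below are assumptions for illustration only: a 5-stage pipelined wormhole router plus one link cycle for packet switching, versus one crossbar cycle plus one link cycle for a VIP hop. They are not measured values from this work.

```python
def zero_load_latency(hops: int, flits: int, cycles_per_hop: int) -> int:
    """Cycles for a `flits`-flit packet crossing `hops` routers with no contention.

    The head flit pays cycles_per_hop at every router; the remaining flits
    follow one per cycle (wormhole pipelining).
    """
    return hops * cycles_per_hop + (flits - 1)


# Assumed per-hop costs (illustrative, not values reported in the slides):
PS_CYCLES_PER_HOP = 5 + 1   # 5-stage pipelined wormhole router + 1 link cycle
VIP_CYCLES_PER_HOP = 1 + 1  # crossbar traversal + link traversal only

if __name__ == "__main__":
    for hops in (1, 3, 6):
        ps = zero_load_latency(hops, 5, PS_CYCLES_PER_HOP)
        vip = zero_load_latency(hops, 5, VIP_CYCLES_PER_HOP)
        print(f"{hops} hop(s): packet-switched {ps} cycles, VIP {vip} cycles")
```

Under these assumptions the gap grows linearly with distance, because the VIP removes the per-hop pipeline overhead while the serialization term stays the same.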
VIP Connections
- VIPs Are Not Allowed to Share a Common Link
  - To Remove Buffering, Arbitration, …
  - Hence, Only a Limited Number of VIPs Can Exist in a Network
- But VIPs Cover a Significant Portion of On-Chip Traffic Due to Communication Locality
  - In Most Multi-Core SoC Applications, Each Core Communicates With a Few Other Cores
  - In CMP Workloads, Each Node Tends to Have a Small Number of Favored Destinations for Its Messages
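The no-shared-link rule can be expressed as a simple reservation check. This sketch assumes links are directed (the two directions between neighboring routers count as different links) and represents a path as a list of mesh coordinates; the helper names are hypothetical.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]    # (x, y) router coordinates in a mesh
Link = Tuple[Node, Node]  # directed link from one router to its neighbour


def path_links(path: List[Node]) -> List[Link]:
    """Directed links traversed by a path given as a list of routers."""
    return list(zip(path, path[1:]))


def try_reserve_vip(path: List[Node], used: Set[Link]) -> bool:
    """Reserve the links of `path` for a new VIP unless one is already taken."""
    links = path_links(path)
    if any(link in used for link in links):
        return False  # the new VIP would share a link with an existing VIP
    used.update(links)
    return True


if __name__ == "__main__":
    used: Set[Link] = set()
    print(try_reserve_vip([(0, 0), (1, 0), (2, 0)], used))  # True
    print(try_reserve_vip([(1, 0), (2, 0), (2, 1)], used))  # False: shares (1,0)->(2,0)
```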
VIP Construction Algorithm - Static
- Based on the Application Traffic Pattern
  - Input Applications Are Described by a Task Graph (TG)
- A Heuristic Algorithm
  - Map the TG Cores onto the Nodes of a Mesh-Based NoC
  - Construct VIPs for TG Edges in Order of Their Communication Volumes
  - Find a Path Through the Packet-Switched Network for a TG Edge If There Are Not Sufficient Free Resources to Build a VIP for It
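The greedy part of the heuristic can be sketched as follows. Several assumptions are made here: the core-to-node mapping is taken as an input (the mapping step itself is not shown), the only candidate VIP routes tried are the two dimension-order paths (XY, then YX), and the function names are hypothetical. The actual path search in this work may differ.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]         # (x, y) mesh coordinates
Link = Tuple[Node, Node]       # directed link between neighbouring routers
Edge = Tuple[str, str, float]  # (source core, destination core, volume)


def dimension_order_path(src: Node, dst: Node, x_first: bool) -> List[Node]:
    """Minimal mesh path: all X hops then all Y hops (or the reverse)."""
    x, y = src
    path = [src]
    for axis in ("x", "y") if x_first else ("y", "x"):
        if axis == "x":
            while x != dst[0]:
                x += 1 if dst[0] > x else -1
                path.append((x, y))
        else:
            while y != dst[1]:
                y += 1 if dst[1] > y else -1
                path.append((x, y))
    return path


def build_static_vips(edges: List[Edge], mapping: Dict[str, Node]):
    """Greedy VIP construction over an already-mapped task graph.

    Returns (vips, packet_switched): VIP paths keyed by (src, dst) core pair,
    and the edges left to the packet-switched network.
    """
    used: Set[Link] = set()
    vips: Dict[Tuple[str, str], List[Node]] = {}
    packet_switched: List[Tuple[str, str]] = []
    # Heaviest communication first.
    for src, dst, _volume in sorted(edges, key=lambda e: e[2], reverse=True):
        for x_first in (True, False):  # try XY, then YX (assumed candidate routes)
            path = dimension_order_path(mapping[src], mapping[dst], x_first)
            links = list(zip(path, path[1:]))
            if not any(link in used for link in links):
                used.update(links)
                vips[(src, dst)] = path
                break
        else:
            # No link-disjoint candidate left: use regular packet switching.
            packet_switched.append((src, dst))
    return vips, packet_switched


if __name__ == "__main__":
    mapping = {"A": (0, 0), "B": (1, 1), "C": (1, 0), "D": (2, 1)}
    edges = [("A", "B", 100.0), ("A", "D", 80.0), ("C", "B", 5.0)]
    vips, ps = build_static_vips(edges, mapping)
    print(vips)
    print(ps)  # [('C', 'B')]
```

Processing edges by decreasing volume means that, when links run out, it is the lightest flows that fall back to packet switching.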
VIPs for the VOPD Application
- VIPs Cover 100% of the On-Chip Traffic for This Application
- Static VIP Construction Scheme
  - Benchmarks: VOPD, MWD, MPEG, MP3+H263
  - Up to 58% Reduction in Message Latency (39% on Average)
  - Up to 65% Reduction in Power Consumption (49% on Average)
VIPs vs. Physical Point-to-Point Connections
- VIPs Offer
  - Power and Performance Close to Dedicated Physical Point-to-Point Connections
  - More Flexibility
    - Dynamically Reconfigurable Based on the Traffic Pattern of the Running Application
  - Less Design Effort
    - Customized Dedicated Connections Over Regular Components
Dynamic VIP Construction
- An Alternative VIP Construction Scheme
  - Dynamically Changes the VIP Connections in Response to Communication Requirements Imposed by the Running Application
- Monitoring the NoC Traffic
  - Detecting High-Volume Communications and Constructing a VIP for Them
  - Selecting the Best Route for a VIP Using a Simple Setup Network
Setup Network
- Setup Network Structure
  - A Lightweight Control Network
  - Simple Node Structure and Small Bit-Width
  - The Same Topology as the Main Data Network
- Setup Network Operation
  - Keeps Track of the Number and Destination of Packets Sent by Each Node
  - Selects Traffic Flows Whose Weight Exceeds a Threshold (bit/s)
  - Finds a Path Along One of the Shortest Paths Between the Source and Destination Nodes of the Traffic Flow to Construct a VIP
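The tracking and threshold steps can be illustrated with a per-node counter. In this sketch the class `FlowMonitor`, the measurement window, and the threshold value are hypothetical; the slides only state that packets are counted per destination and that flows heavier than a bit/s threshold are selected.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class FlowMonitor:
    """Per-node counter of traffic sent to each destination (illustrative sketch)."""

    def __init__(self, threshold_bps: float, window_s: float):
        self.threshold_bps = threshold_bps  # hypothetical tuning parameter
        self.window_s = window_s            # hypothetical measurement window (seconds)
        self.bits_to: Dict[int, int] = defaultdict(int)

    def on_packet_sent(self, dst: int, size_bits: int) -> None:
        self.bits_to[dst] += size_bits

    def end_window(self) -> List[Tuple[int, float]]:
        """Return (dst, weight in bit/s) of flows above the threshold, heaviest first.

        Counters are reset so the next window starts from zero.
        """
        weights = [(dst, bits / self.window_s) for dst, bits in self.bits_to.items()]
        self.bits_to.clear()
        heavy = [(dst, w) for dst, w in weights if w > self.threshold_bps]
        return sorted(heavy, key=lambda dw: dw[1], reverse=True)


if __name__ == "__main__":
    m = FlowMonitor(threshold_bps=1000, window_s=1.0)
    for dst, bits in [(3, 800), (3, 800), (5, 500), (7, 4000)]:
        m.on_packet_sent(dst, bits)
    print(m.end_window())  # [(7, 4000.0), (3, 1600.0)]
```

Each returned flow would then become a VIP setup request for the setup network.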
Dynamic VIP Construction
- Establishing a New VIP May Tear Down Some Existing VIPs
  - Cost of a VIP: The Cumulative Weight (bit/s) of the VIPs That Will Be Torn Down by This New VIP
- Setup Network
  - Finds the Path With Minimum Cost
  - Sends the Cost to the Source Node to Decide on Establishing the New VIP
  - A New VIP Is Established If the Cumulative Weight of the Torn-Down VIPs Is Less Than the Weight of the Requesting Traffic Flow
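The cost definition and the establish-or-not rule above can be written down directly. This sketch counts each torn-down VIP once, even if it owns several links of the candidate path; the link representation and the names `vip_cost` and `should_establish` are hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[int, int]  # directed link between two router ids (illustrative)


def vip_cost(path_links: List[Link], link_owner: Dict[Link, str],
             vip_weight: Dict[str, float]) -> float:
    """Cumulative weight (bit/s) of the existing VIPs the new path would tear down."""
    torn_down = {link_owner[link] for link in path_links if link in link_owner}
    return sum(vip_weight[v] for v in torn_down)


def should_establish(flow_weight: float, cost: float) -> bool:
    """Build the new VIP only if it outweighs everything it displaces."""
    return cost < flow_weight


if __name__ == "__main__":
    owner = {(0, 1): "v1", (1, 2): "v1", (2, 3): "v2"}
    weight = {"v1": 300.0, "v2": 200.0}
    cost = vip_cost([(0, 1), (1, 2), (2, 3)], owner, weight)
    print(cost, should_establish(600.0, cost))  # 500.0 True
```

The strict inequality means a tie keeps the existing VIPs, which avoids reconfiguring for no net gain.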
Setup Network
- VIP Setup Procedure
  - Arbitrating Among VIP Setup Requests
  - Running the Distributed VIP Setup Algorithm
  - Setting Up a VIP in the Data Network by Configuring the VIP Allocators of the Nodes Along the VIP Path
  - Tearing Down Conflicting VIPs
- Each Setup Network Node Contains the Configuration Information of Its Corresponding Data Network Node
- Due to the Distributed Nature of the Algorithm → Short Reconfiguration Time
Select the Minimum Cost and Keep the Port From Which the Smaller Cost Is Received
[Figure: Example of the Distributed VIP Setup Algorithm on a Mesh; Each Port Is Labeled With Its Port Cost (the Weight of the VIP Using It)]
1. Add the Received Cost (4) to the Weights of the Ports Along the Shortest Path (the W and N Ports) Toward the Destination Node
2. Send the New Costs (9 and 12) to the Neighboring Nodes Toward the Destination Node
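The add-forward-select steps of this example amount to a dynamic program over the minimal-path rectangle between source and destination. The centralized sketch below mimics that computation; it assumes E increases x and N increases y, treats unlisted ports as free (cost 0), and, like the figure, simply adds port weights, so a VIP that occupies several ports on the same path is counted more than once. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]      # (x, y); assumed: E increases x, N increases y
PortKey = Tuple[Node, str]  # (router, output port 'E' | 'W' | 'N' | 'S')


def min_cost_minimal_path(
    src: Node, dst: Node, port_cost: Dict[PortKey, float]
) -> Tuple[float, List[Node]]:
    """Cheapest shortest path from src to dst, mimicking the setup-network search.

    port_cost[(node, port)] is the weight of the VIP currently using that
    output port; ports not listed are free (cost 0).
    """
    step_x = 1 if dst[0] > src[0] else -1
    step_y = 1 if dst[1] > src[1] else -1
    port_x = "E" if step_x == 1 else "W"
    port_y = "N" if step_y == 1 else "S"

    best: Dict[Node, float] = {src: 0}
    came_from: Dict[Node, Node] = {}
    # Visit the rectangle spanned by src and dst so that each node's
    # predecessors (one hop closer to src) are already settled.
    for x in range(src[0], dst[0] + step_x, step_x):
        for y in range(src[1], dst[1] + step_y, step_y):
            node = (x, y)
            if node == src:
                continue
            candidates = []
            if x != src[0]:  # cost arriving from the neighbour along X
                prev = (x - step_x, y)
                candidates.append((best[prev] + port_cost.get((prev, port_x), 0), prev))
            if y != src[1]:  # cost arriving from the neighbour along Y
                prev = (x, y - step_y)
                candidates.append((best[prev] + port_cost.get((prev, port_y), 0), prev))
            # Keep the smaller cost and remember which neighbour it came from.
            best[node], came_from[node] = min(candidates)
    path = [dst]
    while path[-1] != src:
        path.append(came_from[path[-1]])
    return best[dst], path[::-1]


if __name__ == "__main__":
    costs = {((0, 0), "E"): 5, ((0, 0), "N"): 8}
    print(min_cost_minimal_path((0, 0), (1, 1), costs))  # (5, [(0, 0), (1, 0), (1, 1)])
```

In hardware each setup-network node performs only its local step (add, forward, select), which is what keeps reconfiguration time short. The returned cost would then be compared with the weight of the requesting flow.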
Dynamic VIP Construction
- The Setup Network Operates in Parallel With Packet Transmission in the Packet-Switched Network
  - Hides the Setup Time
- The Setup Network Has a Small Bit-Width and Operates Infrequently (Only When a High-Volume Flow Is Detected)
  - Negligible Power and Area Overhead
Evaluation Results
- XMulator NoC Simulator (www.xmulator.org)
  - A C#-Based Simulator
  - Orion Power Library
- Comparison With a Conventional NoC (5-Stage Pipelined Wormhole Switch)
- Multi-Core SoC Traffic
  - H.263 Decoder+MP3 Decoder, H.263 Decoder+MP3 Encoder, MP3 Decoder+MP3 Encoder
  - 38% Reduction in Message Latency, 46% Reduction in Power Consumption
Evaluation Results
- Synthetic Traffic
  - N-Hot Traffic: 80% of Messages Go to Exactly N Destinations, 20% to Randomly Chosen Nodes
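A traffic generator for this pattern might look like the sketch below. The assumptions are that each source picks its N hot destinations once per run, and that the remaining 20% of messages go to a node drawn uniformly from all other nodes (which may occasionally be a hot one). The function name and parameters are hypothetical, not XMulator's API.

```python
import random
from typing import List, Optional, Tuple


def n_hot_destinations(
    src: int, num_nodes: int, n: int, num_messages: int,
    hot_fraction: float = 0.8, seed: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Draw message destinations for one source under N-hot traffic."""
    rng = random.Random(seed)
    others = [v for v in range(num_nodes) if v != src]
    hot = rng.sample(others, n)  # the N fixed favoured destinations of this source
    dests = [
        rng.choice(hot) if rng.random() < hot_fraction else rng.choice(others)
        for _ in range(num_messages)
    ]
    return hot, dests


if __name__ == "__main__":
    hot, dests = n_hot_destinations(src=0, num_nodes=64, n=4, num_messages=10000, seed=1)
    share = sum(d in hot for d in dests) / len(dests)
    print(hot, f"{share:.3f}")
```

With uniform background traffic the measured share to the hot set is slightly above 0.8, since some of the random 20% also lands on hot nodes.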
Summary and Future Work
- Adaptable Virtual Point-to-Point Connections in a Packet-Switched NoC
  - Benefit From the Advantages of Both Communication Methods
- Two VIP Construction Schemes: Static and Dynamic
- Significant Power/Latency Reduction
- Future Work
  - Comparing the Method With Related Work: Express Virtual Channels, Single-Cycle Routers, …
  - Precise Area/Power Results by Implementing the NoC in Hardware
    - Analytical Models Show Small Area Overhead
Thank You
- Questions?
- modarressi_at_ce.sharif.edu