VEAL: Virtualized Execution Accelerator for Loops - PowerPoint PPT Presentation


1
VEAL: Virtualized Execution Accelerator for Loops
  • Nate Clark (1), Amir Hormati (2), Scott Mahlke (2)
  • (1) Georgia Tech, (2) University of Michigan

2
How to get Efficiency?
  • Microarchitecture changes
  • Multi- / many-core
  • Heterogeneity

(Pictured: STI Cell, Intel Core 2 Duo)
3
How is Heterogeneity Used?
(Diagram: Program, Hetero., GPP)
  • Control statically placed in binary
4
Problem With Static Control
(Diagram: CPU, Program)
  • Not forward/backward compatible

5
Solution: Virtualization
  • Abstract accelerator features
  • Re-examine compiler algorithms
  • Key: do the hard stuff offline

(Diagram: Offline and Online phases)
6
This Paper
  • Examines loops as a heterogeneity target
    • ASICs often implement loops
  • Design a generalized loop accelerator
    • Not covered in this talk
  • Explore how to virtualize loop accelerators
    • i.e., abstract the accelerator interface

7
Loop Accelerator Template
8
Why More Efficient Than GPP?
  • Simple control flow
  • Decoupled memory accesses
  • I-Cache unnecessary
  • Customize execution resources for loops

9
Proposed Loop Accelerator
  • 1 CCA (configurable compute accelerator)
  • 2 Int units
  • 16 regs
  • Memory (4x)
  • 16 Input streams
  • 8 Output streams
  • 0.8 mm² in 90 nm
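These unit counts bound how often a new loop iteration can start. The sketch below illustrates this and is not the paper's code. It computes the resource-constrained minimum initiation interval (ResMII) by dividing each op class's count by its unit count. Two inputs are assumptions: reading "Memory (4x)" as four memory units, and the loop's op counts. The name `res_mii` is hypothetical.

```python
import math

# Unit counts read off the slide; treating "Memory (4x)" as four memory
# units is an assumption.
ACCEL_UNITS = {"cca": 1, "int": 2, "mem": 4}

def res_mii(op_counts, units=ACCEL_UNITS):
    """Resource-constrained lower bound on the initiation interval (II).

    A class with units[r] functional units can start at most units[r] ops
    per cycle, so op_counts[r] ops of that class need at least
    ceil(op_counts[r] / units[r]) cycles per loop iteration.
    """
    bound = 1
    for kind, count in op_counts.items():
        if count == 0:
            continue
        if units.get(kind, 0) == 0:
            raise ValueError(f"no functional unit for op class {kind!r}")
        bound = max(bound, math.ceil(count / units[kind]))
    return bound

# Hypothetical loop body: 2 CCA ops, 5 integer ops, 3 memory ops.
print(res_mii({"cca": 2, "int": 5, "mem": 3}))  # → 3
```

In this made-up loop, the two integer units are the bottleneck. Five ops on two units need 3 cycles, while the CCA needs 2 and memory needs 1.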

10
Modulo Scheduling
  • + High-quality software pipelining technique
  • + Simple control structure (low HW cost)
  • - Can be slow, i.e., hard to do dynamically
  • - Restricted loops: no side exits, no while loops, must be if-convertible
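Modulo scheduling's central data structure is the modulo reservation table. A new iteration starts every II cycles, so an op issued at cycle t occupies its unit in row t mod II of every iteration. The sketch below is a greedy, non-backtracking simplification, not VEAL's scheduler. It assumes fully pipelined units, ignores loop-carried dependences, and uses a made-up four-op loop. The name `modulo_schedule` is hypothetical.

```python
from collections import defaultdict

def modulo_schedule(order, preds, info, units, ii):
    """Greedily place one loop iteration's ops at a fixed initiation interval.

    order -- op names in priority order (each op after its predecessors)
    preds -- op -> list of intra-iteration predecessor ops
    info  -- op -> (resource class, latency in cycles)
    units -- resource class -> number of functional units
    ii    -- candidate initiation interval (cycles between iteration starts)

    Returns op -> issue cycle, or None if some op cannot be placed.
    """
    # Modulo reservation table: an op issued at cycle t holds its unit in
    # row t % ii, because a new iteration starts every ii cycles.
    mrt = defaultdict(int)
    start = {}
    for op in order:
        kind, _ = info[op]
        # Earliest cycle at which all operands are ready.
        earliest = max((start[p] + info[p][1] for p in preds.get(op, [])), default=0)
        # Trying ii consecutive cycles visits every MRT row once.
        for t in range(earliest, earliest + ii):
            if mrt[(kind, t % ii)] < units[kind]:
                mrt[(kind, t % ii)] += 1
                start[op] = t
                break
        else:
            return None  # a full scheduler would backtrack or retry with ii + 1
    return start

# Hypothetical loop body: load -> add -> sub -> store, one int unit, one memory unit.
info = {"ld": ("mem", 2), "add": ("int", 1), "sub": ("int", 1), "st": ("mem", 1)}
preds = {"add": ["ld"], "sub": ["add"], "st": ["sub"]}
units = {"int": 1, "mem": 1}
order = ["ld", "add", "sub", "st"]

print(modulo_schedule(order, preds, info, units, ii=1))  # → None
print(modulo_schedule(order, preds, info, units, ii=2))  # → {'ld': 0, 'add': 2, 'sub': 3, 'st': 5}
```

When placement fails, the usual response is to retry with a larger II. Repeated attempts like this are part of why the technique can be too slow to run naively at run time.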

11
Benchmark Execution Time
12
Modulo Scheduling Basics
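(Figure: modulo schedule of operations across functional units, e.g. FU C, with the repeating Kernel highlighted)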
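In a modulo schedule, one iteration is split into stages of II cycles each, and successive iterations start one stage apart. The minimal sketch below uses hypothetical function names and arbitrarily chosen iteration and stage counts. It lists which iteration runs which stage at each step. The prologue fills the pipeline, the kernel repeats with every stage busy, and the epilogue drains it.

```python
def pipeline_steps(num_iters, num_stages):
    """For each II-cycle step, list the (iteration, stage) pairs that execute.

    Iteration i starts at step i and runs its stage s at step i + s, so up to
    num_stages iterations overlap at once.
    """
    return [
        [(i, k - i) for i in range(num_iters) if 0 <= k - i < num_stages]
        for k in range(num_iters + num_stages - 1)
    ]

def phase(k, num_iters, num_stages):
    """Name the part of the software pipeline that step k belongs to
    (assumes num_iters >= num_stages)."""
    if k < num_stages - 1:
        return "prologue"   # pipeline still filling
    if k < num_iters:
        return "kernel"     # steady state: every stage busy
    return "epilogue"       # pipeline draining

for k, active in enumerate(pipeline_steps(num_iters=4, num_stages=3)):
    print(k, phase(k, 4, 3), active)
```

Only the kernel needs to be stored as repeating code. This is one reason a modulo-scheduled loop maps onto simple accelerator control.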


13
Modulo Scheduling Example
  1. CCA mapping
  2. II calculation
  3. Priority
  4. Scheduling
  5. Register assignment / communication
(Figure: schedule with columns CCA, Int, Int; operations 2-7 placed along a Time axis)
Priority order: 2, 4, 6, 3, 5, 7
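Steps 2 and 3 of the example can be sketched as follows. The minimum II is usually taken as max(ResMII, RecMII), where RecMII comes from dependence cycles that cross iterations. The slide does not say which priority heuristic VEAL uses, so this sketch substitutes height-based ordering, a common choice. The graph, latencies, and function names are hypothetical and do not reproduce the slide's ops 2-7.

```python
from functools import lru_cache

def rec_mii(recurrences):
    """Recurrence-constrained lower bound on II.

    recurrences -- (total latency, total iteration distance) for each
    dependence cycle; a cycle taking L cycles that spans D iterations
    forces II >= ceil(L / D).
    """
    return max(((lat + dist - 1) // dist for lat, dist in recurrences), default=1)

def height_priority(succs, latency):
    """Order ops by height (longest latency-weighted path to the end of the
    acyclic intra-iteration dependence graph); taller ops are scheduled first.
    With positive latencies this also places every op after its predecessors."""
    @lru_cache(maxsize=None)
    def height(op):
        return latency[op] + max((height(s) for s in succs.get(op, [])), default=0)
    return sorted(latency, key=lambda op: (-height(op), op))  # ties broken by name

# Hypothetical dependence graph: ld -> add -> st, mul -> st.
succs = {"ld": ["add"], "add": ["st"], "mul": ["st"]}
latency = {"ld": 2, "add": 1, "mul": 3, "st": 1}
print(height_priority(succs, latency))  # → ['ld', 'mul', 'add', 'st']
print(rec_mii([(4, 1), (5, 2)]))       # → 4
```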
14
Measured Scheduling Overhead
70% priority, 19% CCA mapping
15
Supporting Hybrid Compilation
(Figure: three versions of the same loop)
  • Loop: 1 ld, 2 add, 3 sub, 4 brl CCA, 5 or, 6 or, 7 add, 8 str; CCA: and, sub, xor, ret
  • Data: 0 1 4 6 3; Loop: 1 ld, 2 add, 3 sub, 4 brl CCA, 5 or
  • Loop: 1 ld, 2 add, 3 sub, and, sub, xor, 5 or, 6 or, 7 add, 8 str
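One reading of the figure is that the CCA subgraph (and, sub, xor) is outlined into a function called with brl. A processor without the accelerator can then still run the binary. A translator can either treat that call as one CCA operation or inline the ops, as the last loop version shows. The sketch below illustrates only that lowering step. It is an interpretation, not the paper's code, and it does not model the Data row. The function name `lower_loop` and the callee name `f0` are hypothetical.

```python
def lower_loop(body, cca_funcs, has_cca):
    """Produce the op list an online scheduler would see for one loop body.

    body      -- instructions as emitted offline; 'brl <fn>' calls an outlined
                 subgraph the offline compiler chose for the CCA
    cca_funcs -- fn -> ops in that subgraph (its trailing 'ret' omitted)
    has_cca   -- whether the accelerator being targeted has a CCA
    """
    ops = []
    for insn in body:
        if insn.startswith("brl "):
            fn = insn.split()[1]
            if has_cca:
                ops.append(f"cca({fn})")   # whole subgraph becomes one CCA op
            else:
                ops.extend(cca_funcs[fn])  # no CCA: schedule its ops individually
        else:
            ops.append(insn)
    return ops

# Loop from the slide; the callee name "f0" is hypothetical.
body = ["ld", "add", "sub", "brl f0", "or", "or", "add", "str"]
cca_funcs = {"f0": ["and", "sub", "xor"]}
print(lower_loop(body, cca_funcs, has_cca=True))
print(lower_loop(body, cca_funcs, has_cca=False))
```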
16
Speedups
17
Summary
  • Virtualization key to heterogeneity
  • VEAL speedup: 2.54×
    • 2.63× without translation (i.e., not binary compatible)
    • 2.17× fully dynamic
  • CCA mapping and priority: 89% of overhead
  • mpeg2dec 2.1 vs. 1.15

18
Thank you!
  • Questions?
  • http://www.cc.gatech.edu/ntclark
  • http://cccp.eecs.umich.edu/