Analysis of Quasi-Static Scheduling Techniques in a Virtualized Reconfigurable Machine


Title: Analysis of Quasi-Static Scheduling Techniques in a Virtualized Reconfigurable Machine


1
Analysis of Quasi-Static Scheduling Techniques in
a Virtualized Reconfigurable Machine
  • Yury Markovskiy, Eylon Caspi, Randy Huang,
  • Joseph Yeh, Michael Chu, John Wawrzynek
  • UC Berkeley
  • BRASS Group
  • André DeHon
  • California Institute of Technology

2
Outline
  • Hardware Virtualization
  • SCORE model
  • Run-time scheduler
  • Fully Dynamic
  • Quasi-Static
  • Results
  • 7x reduction in scheduling overhead
  • App performance improved by a factor of 2-7.
  • Conclusion

3
Hardware Virtualization
  • Traditional Mapping Tools
  • Expose resource constraints to designer
  • HW virtualization enables
  • App compatibility/longevity across a device
    family
  • Automatic performance scaling on larger devices

4
Stream Computation Organized for Reconfigurable
Execution (SCORE) (1)
  • Data-flow based framework
  • Programming Model
  • Execution Environment
  • Hardware Platform

5
Stream Computation Organized for Reconfigurable
Execution (SCORE) (2)
  • Array Reconfiguration

6
Run-time Scheduler
  • Run-time scheduling (late binding of resources)
  • Benefit: automatic performance scaling
  • Extra burden: the scheduler
  • Complex optimization with multiple simultaneous
    constraints (CPs, CMBs, and network) → NP-hard
    problem
  • Space of scheduling solutions
  • Range in quality and complexity
  • Tradeoffs: timeslice vs. asynchronous, dynamic
    vs. static
  • What is the right timeslice size?
  • Depends on an application's run-time behavior
  • Affected by the scheduler overhead (lower bound)
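
The lower-bound remark can be illustrated with a simple model. This is a sketch under an assumption not stated on the slide: each timeslice of useful array execution is followed by a fixed scheduling overhead during which no useful work is done. The function names are hypothetical.

```python
def useful_fraction(timeslice_cycles, overhead_cycles):
    """Fraction of wall-clock time spent on useful work, assuming a fixed
    overhead is paid once per timeslice (assumed additive model)."""
    return timeslice_cycles / (timeslice_cycles + overhead_cycles)


def min_timeslice(overhead_cycles, target_fraction):
    """Smallest timeslice that keeps the useful fraction at or above the
    target, obtained by solving T / (T + O) >= f for T."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    return overhead_cycles * target_fraction / (1.0 - target_fraction)


# With equal timeslice and overhead, only half the time is useful.
print(useful_fraction(100_000, 100_000))
# Keeping 90% of the time useful needs a timeslice about 9x the overhead.
print(min_timeslice(10_000, 0.9))
```

Under this model the minimum practical timeslice grows linearly with the overhead, which is why reducing scheduler overhead also lowers the usable timeslice size.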

7
Problem Statement
  • SCORE Micro-architecture
  • Parallel reconfiguration of independent CPs/CMBs
  • Reconfiguration time is thousands of cycles
  • Problem
  • Investigate scheduling cost
  • Reduce it to a minimum (comparable to
    reconfiguration time)
  • Understand its effect on application run-times.

8
Initial Scheduling Solution
  • Fully Dynamic Scheduler
  • Perform scheduling operation each timeslice
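
A minimal sketch of what "scheduling each timeslice" might look like. It assumes the dataflow graph is an adjacency dictionary, that a fixed number of compute pages (CPs) is available, and that operators are picked in breadth-first order (the Results Summary slide describes the dynamic scheduler's view as local, BFS). All names and the unit of work are hypothetical, and token availability, CMBs, and the network are ignored.

```python
from collections import deque


def bfs_order(graph, sources):
    """Return operators in breadth-first order starting from the sources."""
    seen, order, queue = set(sources), [], deque(sources)
    while queue:
        node = queue.popleft()
        order.append(node)
        for succ in graph.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return order


def run_fully_dynamic(graph, sources, num_pages, work):
    """Re-run the scheduling pass at every timeslice until all work is done.

    work maps each operator to its remaining timeslices of work
    (a hypothetical unit). Returns the resident set chosen per timeslice.
    """
    remaining = dict(work)
    timeline = []
    while any(remaining.values()):
        # The scheduling operation is repeated every timeslice; this
        # repeated pass is the per-timeslice scheduler overhead.
        candidates = [op for op in bfs_order(graph, sources) if remaining[op] > 0]
        resident = candidates[:num_pages]
        if not resident:
            break  # unfinished operators are unreachable from the sources
        for op in resident:
            remaining[op] -= 1
        timeline.append(resident)
    return timeline


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
timeline = run_fully_dynamic(graph, ["a"], num_pages=2, work={op: 1 for op in graph})
print(timeline)  # [['a', 'b'], ['c', 'd']]
```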

9
Fully Dynamic Scheduler (1)
  • Two types of overhead
  • Scheduler (avg. 124 Kcycles)
  • Reconfiguration array global controller (avg.
    3.5 Kcycles)
  • Average overhead per timeslice > 127 Kcycles

10
Fully Dynamic Scheduler (2)
  • Total Execution Time
  • Scheduler overhead is on average 36% of execution
    time
  • Timeslice size: 250 Kcycles.
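
As a rough cross-check of these figures, the average overheads from the previous slide can be combined with the 250 Kcycle timeslice. This sketch assumes the overhead is paid in addition to each timeslice; the slides do not say how the reported average was computed, so the estimate only needs to land in the same range as the reported figure of about 36 percent.

```python
SCHEDULER_KCYCLES = 124.0   # avg. scheduler overhead (from the slides)
RECONFIG_KCYCLES = 3.5      # avg. reconfiguration overhead (from the slides)
TIMESLICE_KCYCLES = 250.0   # timeslice size (from the slides)

# Total per-timeslice overhead, consistent with the "> 127 Kcycles" bullet.
overhead_kcycles = SCHEDULER_KCYCLES + RECONFIG_KCYCLES

# Assumed model: every timeslice costs timeslice + overhead cycles in total.
overhead_fraction = overhead_kcycles / (TIMESLICE_KCYCLES + overhead_kcycles)

print(overhead_kcycles)             # 127.5
print(f"{overhead_fraction:.1%}")   # 33.8%
```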

11
Quasi-Static Scheduler
  • Timeslice size
  • Dynamically controlled by array hardware stall
    detect.
  • Hardware continuously (or at small intervals)
    monitors array activity.
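
One way to picture stall-driven timeslices is the sketch below. It is not the array's actual stall-detect logic: the activity samples, the threshold, and the patience window are all hypothetical parameters chosen for illustration.

```python
def detect_stall(activity_samples, threshold=0.1, patience=3):
    """Return the number of samples after which the timeslice would end.

    activity_samples is a list of per-interval activity levels in [0, 1]
    (hypothetical measure of how busy the array is). The timeslice ends
    once `patience` consecutive samples fall below `threshold`; if that
    never happens, it runs for the full length of the sample list.
    """
    quiet = 0
    for i, level in enumerate(activity_samples):
        quiet = quiet + 1 if level < threshold else 0
        if quiet >= patience:
            return i + 1
    return len(activity_samples)


samples = [0.9, 0.8, 0.05, 0.7, 0.0, 0.0, 0.0, 0.9]
print(detect_stall(samples))  # 7
```

A single quiet sample (the third one above) does not end the timeslice; only a sustained stall does, so the timeslice length adapts to the application's behavior instead of being fixed in advance.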

12
Results (1)
  • A low overhead scheduling solution
  • Scheduler overhead (avg. 14 Kcycles)
  • Reconfiguration (avg. 4 Kcycles)
  • 7x average reduction in overhead
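
The 7x figure is consistent with the averages quoted on the slides, as this short calculation shows. It compares averages only; per-application ratios are not given here and may differ.

```python
def total_overhead(scheduler_kcycles, reconfig_kcycles):
    """Per-timeslice overhead as the sum of the two components."""
    return scheduler_kcycles + reconfig_kcycles


dynamic_kcycles = total_overhead(124.0, 3.5)      # fully dynamic scheduler
quasi_static_kcycles = total_overhead(14.0, 4.0)  # quasi-static scheduler
reduction = dynamic_kcycles / quasi_static_kcycles

print(f"{reduction:.2f}x")  # 7.08x
```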

13
Results (2)
  • 4.5x average application speedup
  • Reduction in overhead AND
  • Improvement in scheduling quality

14
Results Summary
  • Tested applications
  • Image de/compression: consists of both dynamic-
    and static-rate operators.
  • All demonstrate similar speedups under
    Quasi-Static scheduler.
  • Performance improvements can be attributed to
  • Reduced scheduler overhead
  • Improved scheduling quality
  • Global rather than local (BFS) view as in dynamic
    scheduler
  • Reduction of the lower bound of timeslice size
  • Expands the space of apps well suited for
    execution under a virtualized hardware
  • Retained powerful semantics of dynamic
    data-dependent dataflow

15
Conclusion
  • Run-time scheduler
  • Required for automatic scaling under hardware
    virtualization
  • Run-time overhead sets lower bound on the size of
    scheduling step (response time)
  • Restricting applicability of virtualized hardware
  • Makes this model impractical for some apps
  • Low overhead run-time scheduling is achievable
  • Without semantic restrictions
  • With higher (or comparable) scheduling quality.
  • 7x reduction in overhead and simultaneous
    performance improvement of 2-7x.
  • OS is a viable alternative to manual scheduling.

16
Thank You
  • Thanks to
  • DARPA, Xilinx and STMicro
  • For more information
  • http://brass.cs.berkeley.edu/SCORE