Clock-Frequency Assignment for Multiple Clock Domain Systems-on-a-Chip

1
Clock-Frequency Assignment for Multiple Clock
Domain Systems-on-a-Chip
  • Scott Sirowy, Yonghui Yu, Stefano Lonardi, Frank
    Vahid
  • Department of Computer Science and Engineering
  • University of California, Riverside
  • {ssirowy, yonghui, stelo, vahid}@cs.ucr.edu
  • Also with the Center for Embedded Computer
    Systems at UC Irvine
  • This work was supported in part by the National
    Science Foundation and the Semiconductor Research
    Corporation

2
Introduction: HW/SW Partitioning
  • Speedups of 2X to 10X common
  • [Balboni, Fornaciari, Sciuto, CODES'96], [Eles, Peng,
    Kuchcinski, Doboli, DAES'97], [Gajski, Vahid,
    Narayan], many others
  • Speedups of 1000X possible
  • E.g., Cameron project, FCCM'02

[Figure: SW with Accelerator A, Accelerator B, and Accelerator C]
3
Introduction: Multiple Clock Domains
[Figure: ASIC/FPGA]
4
Introduction: But ASICs/FPGAs Have Limited Clock Resources
  • Accelerators may have to share a clock
  • Some may run slower than their max frequency

[Figure: ASIC/FPGA with frequency labels 500 MHz, 233 MHz, 178 MHz, and 145 MHz; legend: Clock Frequency Module]
5
Clock Frequency Assignment: Problem Definition
Given a set of accelerators, each with its own maximum frequency and total clock cycles:

Accelerator      Max frequency   Total clock cycles
Accelerator A    1000 MHz        5
Accelerator B    500 MHz         10
Accelerator C    200 MHz         2

Also given: number of available clock frequencies = 2

For every accelerator, find a frequency that is <= the
accelerator's max frequency, with the number of distinct
frequency values <= the number of available frequencies, such that
execution time E is minimized. This is the Clock Frequency
Assignment problem.
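
The two constraints and the objective can be made concrete with a small checker. This is only a sketch: the names `accelerators` and `execution_time` are hypothetical, E is taken to be the sum of each accelerator's cycles divided by its assigned frequency (which is how the worked values later in the deck add up), and cycle counts are assumed to be in millions so that cycles / MHz comes out in seconds.

```python
# Hypothetical names; the numbers are the three-accelerator example from the slide.
# Assumption: cycle counts are in millions, so cycles / MHz yields seconds.
accelerators = {
    "A": {"max_mhz": 1000, "cycles": 5},
    "B": {"max_mhz": 500, "cycles": 10},
    "C": {"max_mhz": 200, "cycles": 2},
}

def execution_time(assignment, available_clocks):
    """Return E for a frequency assignment, or raise if a constraint is broken."""
    for name, mhz in assignment.items():
        if mhz > accelerators[name]["max_mhz"]:
            raise ValueError(f"{name} exceeds its maximum frequency")
    if len(set(assignment.values())) > available_clocks:
        raise ValueError("too many distinct frequencies")
    # Assumed objective: total time is the sum of per-accelerator times.
    return sum(accelerators[n]["cycles"] / mhz for n, mhz in assignment.items())

# Two feasible ways to spend two clocks on three accelerators.
shared_ab = execution_time({"A": 500, "B": 500, "C": 200}, available_clocks=2)
shared_bc = execution_time({"A": 1000, "B": 200, "C": 200}, available_clocks=2)
print(f"{shared_ab:.3f} {shared_bc:.3f}")  # prints: 0.040 0.065
```

Letting A and B share the 500 MHz clock beats letting B and C share the 200 MHz clock, even though A then runs at half its maximum frequency.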
6
Heuristics vs. Optimal Solution
  • Could develop a heuristic, but
  • Clock partitioning is likely a subpart of a larger
    (iterative) exploration
  • Suboptimal solutions to sub-parts could
    accumulate
  • Is there a fast optimal solution?
  • We developed a fast dynamic programming approach
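
For small inputs, an exhaustive search gives a reference answer to compare any faster method against. The sketch below is not from the slides: `brute_force` is a hypothetical helper, and it assumes that only the accelerators' own maximum frequencies need to be considered as clock values and that each accelerator takes the fastest chosen clock it can tolerate.

```python
from itertools import combinations

def brute_force(max_mhz, cycles, num_clocks):
    """Try every set of up to num_clocks candidate frequencies (exponential time)."""
    candidates = sorted(set(max_mhz))
    best = None
    for size in range(1, min(num_clocks, len(candidates)) + 1):
        for chosen in combinations(candidates, size):
            if min(chosen) > min(max_mhz):
                continue  # the slowest accelerator would have no usable clock
            total = 0.0
            for limit, count in zip(max_mhz, cycles):
                # Each accelerator uses the fastest chosen clock it tolerates.
                total += count / max(f for f in chosen if f <= limit)
            if best is None or total < best[0]:
                best = (total, chosen)
    return best

best_time, chosen = brute_force([1000, 500, 200], [5, 10, 2], num_clocks=2)
print(f"{best_time:.3f}", chosen)  # prints: 0.040 (200, 500)
```

The number of frequency sets grows combinatorially with the number of accelerators, which is why an efficient optimal method matters when this step sits inside an iterative exploration.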

[Figure: Accelerator A, Accelerator B, Accelerator C; Clock Frequency Assignment; Optimal Mapping]
7
Dynamic Programming Solution
[DP table: axes Clocks and Accelerators]
8
Dynamic Programming Solution
[DP table: axes Clocks and Accelerators]
X(1,1) = 5/1000 = .005
X(2,1) = (5+10)/500 = .030
X(3,1) = (5+10+2)/200 = .085
10
Dynamic Programming Solution
[DP table: axes Clocks and Accelerators]
Algorithm time complexity: O(nF^2)
X(1,2) = 5/1000 = .005
X(2,2) = 5/1000 + 10/500 = .025
X(3,2) = Min of:
    X(2,1) + 2/200 = .040
    X(1,1) + (10+2)/200 = .065
  = .040
E = .040 seconds
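
The worked values on these slides are consistent with the recurrence below. It is a sketch inferred from the example, not necessarily the authors' exact formulation. It assumes accelerators are indexed by decreasing maximum frequency f_i with total cycles c_i, that X(i,j) is the best execution time for the first i accelerators using at most j clocks, and that X(0,j) = 0 is an added boundary convention.

```latex
X(i,1) = \frac{\sum_{m=1}^{i} c_m}{f_i},
\qquad
X(i,j) = \min_{0 \le k < i}
  \left[\, X(k,\,j-1) + \frac{\sum_{m=k+1}^{i} c_m}{f_i} \,\right],
\qquad
X(0,j) = 0 .
```

For i = 3 and j = 2, the choice k = 2 gives .030 + 2/200 = .040, k = 1 gives .005 + (10+2)/200 = .065, and k = 0 gives (5+10+2)/200 = .085, so the minimum is .040.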
11
Dynamic Programming Solution
Accelerators   Clocks: 1   Clocks: 2
1              .005        .005
2              .030        .025
3              .085        .040
E = .040 seconds
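
The table can be reproduced with a short dynamic program. This is a minimal sketch under stated assumptions, not the authors' implementation: `assign_clocks` is a hypothetical name, accelerators are sorted by decreasing maximum frequency, each group sharing a clock runs at its slowest member's maximum frequency, and cycle counts are treated as millions so the results are in seconds.

```python
def assign_clocks(max_mhz, cycles, num_clocks):
    """Fill X[i][j]: best time for the first i accelerators with at most j clocks."""
    order = sorted(range(len(max_mhz)), key=lambda a: -max_mhz[a])
    freq = [max_mhz[a] for a in order]
    prefix = [0]  # prefix[i] = total cycles of the first i accelerators
    for a in order:
        prefix.append(prefix[-1] + cycles[a])
    n = len(order)
    X = [[float("inf")] * (num_clocks + 1) for _ in range(n + 1)]
    X[0] = [0.0] * (num_clocks + 1)  # no accelerators cost nothing
    for j in range(1, num_clocks + 1):
        for i in range(1, n + 1):
            # Accelerators k+1..i share one clock at accelerator i's max frequency.
            X[i][j] = min(
                X[k][j - 1] + (prefix[i] - prefix[k]) / freq[i - 1]
                for k in range(i)
            )
    return X

table = assign_clocks([1000, 500, 200], [5, 10, 2], num_clocks=2)
for i in range(1, 4):
    print(i, [f"{table[i][j]:.3f}" for j in (1, 2)])
# prints:
# 1 ['0.005', '0.005']
# 2 ['0.030', '0.025']
# 3 ['0.085', '0.040']
```

This direct version does work proportional to the number of clocks times the square of the number of accelerators. How that maps onto the n and F in the slide's complexity bound is not stated in the transcript.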
12
Example: H.264 Decoder

Function                        SW time (s)   HW cycles   HW max clk freq (MHz)
<MotionComp_00>                 0.040733      1           281
<InvTransform4x4>               0.034787      8           194
<FindHorizontalBS>              0.025026      1           140
<GetBits>                       0.024681      1           200
<FindVerticalBS>                0.02366       1           140
<MotionCompChromaFullXFullY>    0.023577      1           285
<FilterHorizontalLuma>          0.023559      4           134
<FilterVerticalLuma>            0.020008      4           138
<FilterHorizontalChroma>        0.018803      4           134
<CombineZerosInvQuantScan>      0.018438      1           120
<MotionCompensate>              0.016822      10          40
<FilterVerticalChroma>          0.016035      4           138
<MotionChromaFracXFracY>        0.016023      32          78
<ReadLeadingZerosAndOne>        0.015665      1           106
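
A quick calculation on the maximum-frequency column shows why a single shared clock is costly for this decoder. The sketch assumes only what the problem definition states, namely that a shared clock cannot exceed any sharing accelerator's maximum frequency. The names `max_mhz`, `shared`, and `slowdown` are hypothetical.

```python
# Max clock frequencies (MHz) copied from the slide's H.264 function list.
max_mhz = {
    "MotionComp_00": 281, "InvTransform4x4": 194, "FindHorizontalBS": 140,
    "GetBits": 200, "FindVerticalBS": 140, "MotionCompChromaFullXFullY": 285,
    "FilterHorizontalLuma": 134, "FilterVerticalLuma": 138,
    "FilterHorizontalChroma": 134, "CombineZerosInvQuantScan": 120,
    "MotionCompensate": 40, "FilterVerticalChroma": 138,
    "MotionChromaFracXFracY": 78, "ReadLeadingZerosAndOne": 106,
}

# With one shared clock, every accelerator is limited by the slowest one.
shared = min(max_mhz.values())
# Factor by which each accelerator runs below its own maximum frequency.
slowdown = {name: mhz / shared for name, mhz in max_mhz.items()}
worst = max(slowdown, key=slowdown.get)
print(shared, worst, slowdown[worst])  # prints: 40 MotionCompChromaFullXFullY 7.125
```

With one clock, the whole design is held to 40 MHz by a single function, so even one additional frequency lets most accelerators run much closer to their own limits.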

13
H.264 Results
Over 2.7x speedup over a design with only one clock frequency
Only a few distinct frequencies give most of the speedup
14
Results: Synthetic Benchmarks
< 1 sec even for largest examples
Over 3.5x speedup
[Plots: Algorithm runtimes; Application runtimes]
15
Conclusions
  • Multiple clock domains can yield significant
    speedups during partitioning
  • 1.5x to 3x speedups using just three clock
    frequencies
  • On top of speedups already gained from HW/SW partitioning
  • Efficient optimal algorithm is possible
  • Dynamic programming approach
  • Likely applies to other clock-frequency
    assignment problems