
Title: High-level Partitioning Of Discrete Signal Transforms For Distributed Hardware Architectures


1
High-level Partitioning Of Discrete Signal
Transforms For Distributed Hardware Architectures
  • Rafael Arce Nazario, PhD
  • Department of Physics and Electronics
  • University of Puerto Rico, Humacao
  • University of Puerto Rico, Rio Piedras Campus
  • November 13, 2007

2
Presentation Overview
  • Background / Motivation
  • Problem statement
  • Related Work
  • Research methodology
  • Partitioning Tools
  • Formulation Exploration
  • Results and discussion
  • Conclusions, Contributions, and Future Work

3
Motivation and Objective
Figure: 4P DFT, size 16
  • Discrete Signal Transforms (DSTs)
  • DFT, DCT: many applications
  • Hardware accelerated, but at high area cost
  • Example: 4P DFT formulation by Dr. Rodríguez
  • Distributed (dedicated) hardware architectures
    (DHAs)
  • Partitioning plays a key role
  • Partitioning beyond multi-device
  • Multi-core GPPs: IBM Cell BE, Intel Core Duo
  • System-on-Chip, Network-on-Chip
  • Need tools to aid in partition exploration,
    mapping, implementation → design automation

Figure: Virtex family FPGA
4
Philosophy
  • Hypothesis
  • Automated partitioning of DSTs can be improved by
    introducing high-level DST considerations, such
    as graph-level and algorithmic properties.

5
Problem statement - Inputs
  • Given
  • T: a high-level description of a DST,
    T = (T_0, T_1, ..., T_{M-1})
  • N: number of points, R: resolution, F: numerical format
  • A description of the target architecture as a
    hypergraph H(D, C)
  • D: set of devices
  • W_d(d_i): weight of device d_i
  • C: set of communication channels
  • B(c_i): bitwidth of channel c_i
  • W_c(c_i): weight of channel c_i

  • Determine a mapping function f: T → D such that
  • Minimize: latency of the overall transform
    implementation, i.e. the time from the beginning of
    computation until the last data point is processed.
  • Constraints
  • At any time, the bits sent from task i to task j do not
    exceed the communication resources.
  • The resources needed to implement the tasks mapped to a
    given device do not exceed that device's resources.
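
The two constraints can be read as a feasibility test on a candidate mapping. The sketch below is a simplified, static version: it sums demands instead of tracking them over time, it treats channels as directed device pairs, and every name in it (`area`, `capacity`, `traffic`, `bandwidth`) is hypothetical rather than taken from the presentation.

```python
def is_feasible(mapping, area, capacity, traffic, bandwidth):
    """Check a task-to-device mapping against device and channel limits.

    mapping:   task -> device
    area:      task -> resources the task needs (hypothetical units)
    capacity:  device -> resources available on that device
    traffic:   (task_i, task_j) -> bits sent from task_i to task_j
    bandwidth: (device_a, device_b) -> bits the channel can carry
    """
    # Device constraint: the tasks placed on a device must fit in it.
    used = {}
    for task, dev in mapping.items():
        used[dev] = used.get(dev, 0) + area[task]
    if any(used[dev] > capacity[dev] for dev in used):
        return False

    # Channel constraint: bits crossing between two devices must fit in
    # the channel that connects them (a missing channel carries 0 bits).
    load = {}
    for (src, dst), bits in traffic.items():
        a, b = mapping[src], mapping[dst]
        if a != b:
            load[(a, b)] = load.get((a, b), 0) + bits
    return all(bits <= bandwidth.get(ch, 0) for ch, bits in load.items())


# Toy instance: three tasks, two devices, one channel.
area = {"t0": 2, "t1": 2, "t2": 3}
capacity = {"d0": 4, "d1": 4}
traffic = {("t0", "t2"): 16, ("t1", "t2"): 16}
bandwidth = {("d0", "d1"): 32}
mapping = {"t0": "d0", "t1": "d0", "t2": "d1"}
print(is_feasible(mapping, area, capacity, traffic, bandwidth))
```

Latency minimization would then search only among mappings that pass this test.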

6
Background: Partitioning in CAD
CAD flow (figure): Design Idea → High-level Synthesis → Logic
Design → Technology Mapping → Place and Route → Fabrication or
Bit-stream Transfer
Figure labels: More information, More structure, DHA
  • High-level / behavioral: partitions a high-level
    specification which has been represented as a flow graph
  • Structural: partitions a netlist at the Register Transfer
    Level or gate level
  • We choose high-level partitioning (HLP)
  • More information for functional aspects
  • CAD: higher level, higher benefits

7
Previous Work: DHA Partitioning
  • Partitioning is an NP-complete problem [Garey76]
  • Use heuristics: stochastic and/or deterministic
  • Main limitations in previous strategies
  • Exploration limited to graph level and below
  • Most available methods are structural or pure DFG
  • Functional properties are not appropriately
    considered
  • Generic methodology
  • Good for the common case, not excellent
  • Representation granularity
  • Either finest granularity or user-specified
    coarse nodes
  • Comparison to other multi-device implementations
  • Lack of accepted benchmarks: algorithms and
    architectures.

8
DMAGIC Partitioning Methodology
DMAGIC: DST Mapping using Algorithmic and Graph
Interaction and Computation
DST features are introduced at two abstraction
levels as part of our methodology: the graph
level and the algorithmic level.
9
Inputs
Distributed Hardware Architecture (DHA)
optional ring connection
  • Representative of commercial and academic
    platforms
  • Practically scalable
  • Discrete Signal Transforms
  • Discrete Fourier Transform (DFT), Discrete
    Cosine Transform (DCT)
  • Focus on one-dimensional transforms
  • Extensible to multidimensional transforms through
    (anti)lexicographic ordering into a vector.
  • Kronecker Products Algebra (KPA) used for
    algorithmic representation
  • Compact framework: formulation implies
    structure
  • Exploration of formulations for various
    architectures [VanLoan92], [Johnson90]

10
Research methodology
11
Kronecker to Graph Conversion Tool (KTG)
  • Operator matrices: Identity, Transform,
    Permutation, Unitary, Unitary Transpose, Twiddle
  • KPA operations: Tensor product (⊗),
    Direct sum (⊕), Matrix multiplication
  • Output: weighted graph with level information
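
To illustrate why a KPA formulation implies structure, the sketch below implements the tensor product and direct sum on plain lists of rows and checks the standard identity I_2 ⊗ F_2 = F_2 ⊕ F_2: a tensor product with an identity on the left is a set of independent, parallel blocks. This is only an illustration of the algebra; it is not the KTG tool, and the helper names `kron` and `direct_sum` are assumptions.

```python
def kron(a, b):
    """Tensor (Kronecker) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def direct_sum(a, b):
    """Direct sum: block-diagonal matrix with a and b on the diagonal."""
    cols_a, cols_b = len(a[0]), len(b[0])
    top = [row + [0] * cols_b for row in a]
    bottom = [[0] * cols_a + row for row in b]
    return top + bottom


I2 = [[1, 0], [0, 1]]
F2 = [[1, 1], [1, -1]]      # 2-point DFT (butterfly)

parallel = kron(I2, F2)     # two independent butterflies on adjacent inputs
strided = kron(F2, I2)      # one butterfly pattern applied at stride 2

print(parallel == direct_sum(F2, F2))
for row in strided:
    print(row)
```

In a dataflow graph, `parallel` becomes two disconnected butterfly nodes, while `strided` connects inputs that are two positions apart, which is the kind of structural information a partitioner can exploit.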

12
Graph Partitioning/Placement Heuristic
Figure: Unpartitioned DFG → Partition/Placement (min cost
function) → Partitioned DFG
  • P/P inspired by the Kernighan-Lin bipartition
    heuristic
  • Extended to k-way partitioning for heterogeneous
    channels
  • Cost function sensitive to the main DHA concerns

Cost function (figure): Previous impl. vs. Our Part/Place
Legend: weight of channel i; required communications through
channel i
  • Better reflects the impact on DHA resources
  • Communication channels are heterogeneous in cost
    and given as input
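
One plausible reading of the legend above is a cost that sums, over all channels, the channel weight times the communication required through that channel. The sketch below follows that reading; the exact cost function on the slide is not available, and the data layout (`edges`, `placement`, `channel_weight`) is hypothetical.

```python
def weighted_cut_cost(edges, placement, channel_weight):
    """Sum over channels of (channel weight) x (bits routed through it).

    edges:          list of (node_u, node_v, bits) dataflow edges
    placement:      node -> device
    channel_weight: frozenset({dev_a, dev_b}) -> cost per bit (assumed)

    Assumes a direct channel exists between every pair of devices that
    exchange data; a missing channel raises KeyError.
    """
    required = {}  # channel -> bits that must cross it
    for u, v, bits in edges:
        du, dv = placement[u], placement[v]
        if du != dv:
            ch = frozenset((du, dv))
            required[ch] = required.get(ch, 0) + bits
    return sum(channel_weight[ch] * bits for ch, bits in required.items())


edges = [("a", "b", 16), ("b", "c", 16), ("a", "c", 8)]
placement = {"a": 0, "b": 0, "c": 1}
channel_weight = {frozenset((0, 1)): 2}
print(weighted_cut_cost(edges, placement, channel_weight))
```

A plain cut count would score this placement as 2 cut edges; weighting by channel cost and traffic lets an expensive or narrow channel dominate the decision.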

13
Partition/Placement Engine
DST considerations
  • Initial solution: balanced horizontal linear
    partitioning
  • Scheduling consideration: swap nodes from the same
    computational stages.
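
The two considerations can be sketched on a DFG laid out as a grid of (stage, row) nodes. The code assumes that "horizontal linear partitioning" means cutting the rows into k contiguous bands, which is an interpretation rather than something the slide states, and both function names are hypothetical.

```python
from itertools import combinations


def horizontal_partition(stages, rows, k):
    """Assign node (stage, row) to a device according to its row band.

    Rows are cut into k contiguous bands of near-equal size, so each
    device receives a similar number of nodes from every stage.
    """
    return {(s, r): r * k // rows
            for s in range(stages) for r in range(rows)}


def same_stage_swaps(placement):
    """Yield node pairs eligible for a swap: same stage, different device."""
    for u, v in combinations(sorted(placement), 2):
        if u[0] == v[0] and placement[u] != placement[v]:
            yield u, v


placement = horizontal_partition(stages=2, rows=4, k=2)
candidates = list(same_stage_swaps(placement))
print(len(candidates))
```

Restricting swaps to one stage keeps the per-stage load on each device unchanged, so a Kernighan-Lin style pass cannot unbalance the schedule while it reduces communication cost.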

14
Research methodology
15
Formulation Exploration
  • Objective: use DST rules to explore the space of
    equivalent formulations in search of one that better
    suits the target architecture.
  • Challenges: combinatorial explosion of the exploration
    space; finding rules amenable to hardware implementation.
  • Approach: conducted experiments to assess the impact of
    transformations on partition quality. Results
    used to devise an exploration strategy.

16
Experiments 1 and 2
  • Effect of inter-column permutations (ICP)

FFT variants compared (figure labels): Cooley-Tukey, G. Sande,
Stockham, Pease
ICP and granularity have an effect on solution
quality, yet it is hard to establish a heuristic.
17
Experiment 3: Breakdown strategy
  • Breakdown strategy: the order and divisors with
    which a transform is decomposed using a rule such
    as the Cooley-Tukey factorization algorithm, which
    splits a size-n transform into sizes m and p, where n = mp
  • Split trees: common graphical representation of
    breakdown strategy
  • Example: two split trees for a DFT of size 64.

Figure: split trees (a) and (b)
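
A split tree can be modeled as a nested pair: an integer leaf is a transform that is not split further, and a pair (left, right) records one application of the n = mp rule. The sketch below enumerates every split tree under the assumption that splitting continues down to prime-size leaves (size 2 for power-of-two DFTs); the slides do not state which leaf sizes were allowed, and `split_trees` is a hypothetical name.

```python
def split_trees(n):
    """Return all split trees of n, splitting until the leaves are prime.

    A leaf is an int; an internal node is a pair (left_tree, right_tree)
    whose sizes multiply to the size of the node.
    """
    pairs = [(m, n // m) for m in range(2, n) if n % m == 0]
    if not pairs:            # prime size: cannot be split further
        return [n]
    trees = []
    for m, p in pairs:
        for left in split_trees(m):
            for right in split_trees(p):
                trees.append((left, right))
    return trees


for size in (16, 32, 64):
    print(size, len(split_trees(size)))
```

The count grows quickly with the transform size, which is the combinatorial explosion that makes exhaustive exploration impractical for large transforms.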
18
Experiment 3: Results
  • Procedure
  • Exhaustive generation of split trees for DFT
    sizes n = 16 to 256.
  • Formulations partitioned for various topologies
    using the tools.
  • Results represented as mega-trees
  • Observation of split-tree decisions that lead to
    partition-friendly formulations → heuristics

Figure: mega-tree for 32-point FFT
19
Formulation Exploration Heuristic
Greedy top-down formulation exploration using
breakdown strategy.
Flowchart: Start → Gen. Initial Splits → DFG Partition →
Latency Improvement? If true: Det. Next Split Leaf →
Reformulate → DFG Partition again. If false: End.
  • Results compared against best results of
    Simulated Annealing heuristic.
  • Latency reduction: average 10.8%, peak 13.3%
  • Run-time reduction: average 96.3%, peak
    99.4%
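
The flowchart can be sketched as a greedy loop over split trees. Two simplifications are assumed here: the step that partitions the DFG and estimates latency is replaced by a toy stand-in (`toy_latency`), and the "next split leaf" is chosen by trying every leaf and keeping the best result. All function names are hypothetical, and the transform size must be composite.

```python
def top_splits(n):
    """All two-way splits (m, p) with m * p == n and m, p > 1."""
    return [(m, n // m) for m in range(2, n) if n % m == 0]


def expand_one_leaf(tree):
    """Yield every tree obtained by splitting exactly one leaf of `tree`."""
    if isinstance(tree, int):
        yield from top_splits(tree)
        return
    left, right = tree
    for new_left in expand_one_leaf(left):
        yield (new_left, right)
    for new_right in expand_one_leaf(right):
        yield (left, new_right)


def leaves(tree):
    """Leaf sizes of a split tree, left to right."""
    if isinstance(tree, int):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def toy_latency(tree):
    """Toy stand-in for 'partition the DFG and estimate latency'."""
    return sum(size * size for size in leaves(tree))


def greedy_explore(n, latency):
    """Top-down greedy search: keep splitting while latency improves."""
    best = min(top_splits(n), key=latency)       # generate initial splits
    best_latency = latency(best)
    while True:
        candidates = list(expand_one_leaf(best))  # reformulations
        if not candidates:
            break
        challenger = min(candidates, key=latency)
        if latency(challenger) >= best_latency:   # no improvement: stop
            break
        best, best_latency = challenger, latency(challenger)
    return best, best_latency


print(greedy_explore(16, toy_latency))
```

Each iteration evaluates only the neighbors of the current tree, so far fewer formulations are partitioned than in an exhaustive or annealing-based search.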
20
Discrete Cosine Transform
  • Motivation: encouraging results obtained from FFT
    formulation exploration with CT factorization.
  • Challenges: the DCT is not as regular as the FFT;
    there is no Cooley-Tukey equivalent for the DCT.
  • Approach: study existing DCT formulations and their
    appropriateness for distributed implementation.
  • Derived a formulation that allows proper
    exploration via CT-like decomposition.

21
Regular CT-like DCT Formulation
  • [Nikara04] formulation

Figure labels: Permutation Rules, CT-like Formulation
  • arbitrary decomposition of size 2n DCT into m- and
    k-sized components

22
DCT Experiment and Results
  • Compared latency and run time for DCT
    formulations

Figure note: log scale
  • Up to 18.3% latency reduction vs. the best of the
    other formulations; average 10.8%.
  • Up to 83.3% run-time reduction vs. the best of the
    other formulations; average 47.7%.
23
Research methodology
24
Results: Comparison with Previous Methodology
  • Latency reduction: average 23.3%, peak 34.1%
  • Run-time reduction: average 98.0%, peak 99.9%

25
Results: Comparison with Previous Methodology
  • Previous methodology: [Srinivasan01]
  • Features
  • Generic partitioning methodology for multi-FPGA
    boards
  • Reported results of partitioning small (16-point)
    DFT and DCT
  • Well documented → allows third-party implementation
  • Methodology
  • Input: (fully expanded) dataflow graph
  • Partition heuristic: genetic algorithms
    optimizing an objective function.

26
Conclusions
  • Hypothesis was proven correct!
  • Multiple opportunities to improve partitioning by
    taking advantage of DST and DHA features.
  • Graph level: regularity of permutations and
    operations.
  • Graph partitioning and area estimator can be made
    more sensitive to DHA and DST concerns
  • Algorithmic level
  • Reformulation has significant impact on partition
    quality
  • Improvements over generic methodology
  • Latency reduction (23.3% avg, 34.1% max)
  • Run-time reduction (98.8% avg, 99.9% max)

27
Main Contributions
  • Automated high-level partitioning methodology for
    DSTs onto distributed hardware architectures
  • Fusion of CAD / DSP knowledge
  • KTG: automated and extensible methodology for
    conversion of Kronecker product algebra (KPA)
    formulations into DFGs
  • Architectural model and high-level estimator for
    the implementation of distributed DSTs
  • k-way graph partitioning heuristic for mapping DSTs
    to DHAs
  • New knowledge of effect of DST formulations on
    their distributed implementation
  • New heuristic for exploring DST formulation space
  • New arbitrary decomposition DCT formulation

28
Future Work
  • Study partitioning in alternative architectures
    and for diverse objectives
  • Network-on-chip architectures
  • Power efficiency objective
  • Computer-driven search of heuristics
  • Machine learning / genetic programming
  • Development of a production-quality tool
  • Study of further DSTs
  • Partition-aware scheduling heuristics

29
Other future work
  • Application of EDA algorithms to bioinformatics
    problems
  • Analogy of abstraction levels: electronics vs.
    molecules
  • EDA: abstraction levels for transistors
  • differential equations at the electronic level
  • Boolean equations for digital electronics
  • Biology: abstraction models for molecular
    reactions
  • differential equations at the physical chemistry
    level
  • discrete models of molecular behavior with
    probabilistic considerations
  • Research intention: discover novel ways in which
    these two fields may benefit from each other.
  • Data structures
  • Algorithms

30
Examples
Discrete genetic modeling can benefit from
representation and manipulation techniques
commonly used for logic optimization and
minimization in digital circuits
[Ideker00] [Riedel].
Figure: cartoon of a Boolean network with two inputs per
node. Colors represent the state of a node ("on"
or "off"). At each time step, each color is
updated according to the node's truth table and
the states of its input nodes.
→ Binary Decision Diagram
  • Protein structure prediction from amino acid
    sequence information has been improved by using
    heuristics common in graph partitioning.
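
The Boolean network update described in the figure caption above can be sketched as a synchronous step over truth tables. The three-node network and its gate assignments are made up for illustration and do not come from the cited work.

```python
def step(state, inputs, truth_table):
    """Synchronously update every node of a two-input Boolean network.

    state:       node -> 0 or 1 ("off" / "on")
    inputs:      node -> (input_a, input_b)
    truth_table: node -> dict mapping (a_state, b_state) -> next state
    """
    return {node: truth_table[node][(state[a], state[b])]
            for node, (a, b) in inputs.items()}


AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Hypothetical network: each node reads the other two.
inputs = {"x": ("y", "z"), "y": ("x", "z"), "z": ("x", "y")}
tables = {"x": AND, "y": OR, "z": XOR}

state0 = {"x": 1, "y": 0, "z": 1}
state1 = step(state0, inputs, tables)
state2 = step(state1, inputs, tables)
print(state1, state2)
```

Because each node is just a truth table, the same structures used in logic synthesis, such as binary decision diagrams, can represent and simplify the update rules.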

31
Examples (cont.)
Protein structure prediction from amino acid
sequence information has been improved by using
heuristics common in graph partitioning.
Previously: use of stochastic methods (S.A., G.A.).
Recently: use of deterministic methods
inspired by EDA tasks [DeRonne07].
32
Electronic device security
  • Electronic devices are being used in
    security-sensitive situations
  • Need to be able to authenticate devices to prevent
    identity theft.

33
Traditional Authentication
  • Store secret key inside device.
  • Challenge-response (figure): send a random number; the
    device returns the encrypted number
    (figure labels: Private Key, Public Key).
  • Only the valid secret key can generate a
    valid response.
  • Expensive in terms of energy, fabrication
  • Key may be read using electromagnetic, microscopy
    methods.
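
The challenge-response flow can be sketched with the Python standard library. Note the substitution: the slide describes a private/public key pair, while this sketch swaps in HMAC-SHA256, a symmetric keyed hash, so the verifier here holds the same secret as the device. Function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_challenge():
    """Verifier side: pick a fresh random number for each attempt."""
    return secrets.token_bytes(16)


def device_response(secret_key, challenge):
    """Device side: answer the challenge using the stored secret key."""
    return hmac.new(secret_key, challenge, hashlib.sha256).digest()


def verify(secret_key, challenge, response):
    """Verifier side: recompute the answer and compare in constant time."""
    expected = hmac.new(secret_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


genuine_key = b"k" * 32
forged_key = b"x" * 32
challenge = make_challenge()
print(verify(genuine_key, challenge, device_response(genuine_key, challenge)))
print(verify(genuine_key, challenge, device_response(forged_key, challenge)))
```

Either way, the scheme depends on a key stored in the device, which is the weakness the slide points out: whoever reads the key out of the chip can answer every challenge.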

34
Physically Unclonable Functions
  • Use electrical characteristics of device as a
    digital fingerprint.
  • No two electronic devices are exactly alike.
  • Implement digital circuits that take advantage of
    this to distinguish one chip from the other
    (authenticate)

36
Physically Unclonable Functions (PUFs)
  • Only the authentic device will have a given
    response to a series of challenges.
  • Currently, a few logic circuits have been
    proposed to implement PUFs
  • Conjecture: Electronic Design Automation
    techniques can be applied to find improved PUFs
    for various types of architectures,
    optimizing for various characteristics.
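
A toy model shows how device variation turns into a fingerprint. The sketch assumes a simplified additive-delay model loosely inspired by arbiter-style PUF circuits: each chip has its own per-stage delay differences, a challenge bit decides the sign with which each stage contributes, and the response is which path wins. The model, the delay values, and the names are all illustrative assumptions, not a circuit from the presentation.

```python
def puf_response(delay_diffs, challenge):
    """One response bit: 1 if the accumulated delay difference is positive.

    delay_diffs: per-stage delay differences of one chip (arbitrary units)
    challenge:   tuple of bits, one per stage; a 1 flips that stage's sign
    """
    total = sum(d if bit == 0 else -d
                for d, bit in zip(delay_diffs, challenge))
    return 1 if total > 0 else 0


def fingerprint(delay_diffs, challenges):
    """Responses of one chip to a series of challenges."""
    return [puf_response(delay_diffs, c) for c in challenges]


# Two chips built from the same design, with different manufacturing variation.
chip_a = [30, -10, 20]
chip_b = [-20, 40, 10]
challenges = [(0, 0, 0), (1, 0, 0), (0, 1, 1)]

print(fingerprint(chip_a, challenges))
print(fingerprint(chip_b, challenges))
```

No key is stored: the response depends only on physical delays, so the optimization question raised in the conjecture is how to design circuits whose responses differ strongly between chips yet stay stable on the same chip.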

37
Foreseeable tasks
  • Define models to describe physical
    characteristics.
  • Objective functions
  • Optimization methodology
  • Heuristics
  • Stochastic optimization algorithms

38
Publications
Journal Articles
  • R. Arce Nazario, M. Jiménez, D. Rodríguez.
    Mapping of Discrete Cosine Transforms onto
    Distributed Hardware Architectures. Submitted to
    Journal of VLSI Signal Processing. April 2007.
    Springer. Status: under revision.
  • R. Arce Nazario, M. Jiménez, D. Rodríguez.
    Algorithmic-level Exploration of Discrete Signal
    Transforms for Partitioning to Distributed
    Hardware Architectures. Accepted for publication
    in IET Computers & Digital Techniques. April
    2007.

Articles in peer-reviewed IEEE/ACM conferences
  • R. Arce Nazario, M. Jiménez, D. Rodríguez.
    Partitioning Exploration for Automated Mapping
    of Discrete Cosine Transforms onto Distributed
    Hardware Architectures. Accepted to the 50th
    IEEE Midwest Symposium on Circuits and Systems.
    August 2007. Montreal, Canada.
  • R. Arce Nazario, M. Jiménez, D. Rodríguez.
    High-level Partitioning of Discrete Signal
    Transforms for Multi-FPGA Architectures. 16th
    IEEE International Conference on Field
    Programmable Logic and Applications. August
    2006. Madrid, Spain.
  • R. Arce Nazario, M. Jiménez, D. Rodríguez.
    Functionally-aware Partitioning of Discrete
    Signal Transforms for Distributed Hardware
    Architectures. 49th IEEE Midwest Symposium on
    Circuits and Systems. August 2006. San Juan, PR.
  • R. Arce Nazario, M. Jiménez, D. Rodríguez.
    Effects of High-Level Discrete Signal Transform
    Formulations on Partitioning for Distributed
    Hardware Architectures. IEEE Symposium on
    Field-Programmable Custom Computing Machines.
    April 2006. Napa, CA.

39
Publications
Articles in peer-reviewed IEEE/ACM conferences
(continued)
  • R. Arce Nazario, M. Jiménez, D. Rodríguez. An
    assessment of high-level partitioning techniques
    for implementing discrete signal transforms on
    distributed hardware architectures. 48th IEEE
    Midwest Symposium on Circuits and Systems.
    August 2005. Cincinnati, Ohio.
  • R. Lembach, R. Arce-Nazario, D. Eisenmenger, and
    C. Wood. A diagnostic method for detecting and
    assessing the impact of physical design
    optimizations on routing. ACM International
    Symposium on Physical Design. April 2005. San
    Francisco, CA.
  • R. Arce Nazario, M. Jiménez, Integer Pair
    Representation for Multiple Output Logic, 47th
    IEEE Midwest Symposium on Circuits and Systems.
    December 2003, Cairo, Egypt.

Other publications, posters and presentations
  • Rafael A. Arce-Nazario and Manuel Jiménez.
    High-Level Partitioning of Discrete Signal
    Transforms for Distributed Hardware
    Architectures. Poster presentation, Puerto Rico
    Interdisciplinary Scientific Meeting. February
    2007. Bayamón, Puerto Rico.
  • Rafael A. Arce-Nazario and Manuel Jiménez.
    High-Level Partitioning of Discrete Signal
    Transforms for Distributed Hardware
    Architectures. Poster presentation, Workshop on
    Grid Services, Automated Information Processing,
    and Wireless Sensor Networks. February 2007. San
    Juan, Puerto Rico.
  • Rafael A. Arce-Nazario and Manuel Jiménez.
    High-Level Partitioning of Discrete Signal
    Transforms for Distributed Hardware
    Architectures. Poster presentation, Puerto Rico
    Interdisciplinary Scientific Meeting. March 2006.
    Cayey, Puerto Rico.
  • Rafael A. Arce-Nazario, Manuel Jimenez, and
    Domingo Rodriguez. High-level Partitioning of
    Discrete Signal Transforms for Multi-FPGA
    Architectures. Poster presented at WALSAIP
    Project HP Labs research visit. October 2006.
    Mayagüez, Puerto Rico.
  • R. Arce Nazario, M. Jimenez, D. Rodriguez.
    High-Level Partitioning Techniques For
    Implementing Discrete Signal Transforms On
    Distributed Hardware Architectures. Poster
    presentation in Annual EPSCoR conference.
    September 2005. Rio Grande, PR.

40
Publications
Other publications, posters and presentations
(continued)
  • R. Arce Nazario, M. Jimenez, High-Level
    Partitioning Of DSP Algorithms For Multi-FPGA
    Systems. Poster presentation. GEM Consortium
    Future Faculty and Professionals Symposium. Las
    Vegas, NV, June 2004.
  • R. Arce Nazario, M. Jiménez, High-Level
    Partitioning Of DSP Algorithms For Multi-FPGA
    Systems - A First Approach, Proceedings of the
    Computing Research Conference, Mayagüez, PR,
    April 2004
  • R. Arce Nazario, M. Jimenez, Integer Pair
    Representation for Multiple Output Logic, PRSGC
    Second Congress on Integrating NASA Research and
    Education Projects in Puerto Rico, San Juan, PR,
    November 2003
  • R. Arce Nazario, M. Jiménez, Integer Pair
    Representation for Multiple Output Logic,
    Proceedings of the Computing Research Conference,
    Mayagüez, PR, April 2003.

41
References cited in presentation
  • [Garey76] M. Garey, D. Johnson, and L.
    Stockmeyer. Some simplified NP-complete graph
    problems. Theoretical Computer Science,
    (1):237-267, December 30, 1976.
  • [VanLoan92] Charles Van Loan. Computational
    Frameworks for the Fast Fourier Transform. SIAM,
    1992.
  • [Johnson90] J. Johnson, R. Johnson, D.
    Rodriguez, and R. Tolimieri. A Methodology for
    Designing, Modifying, and Implementing Fourier
    Transform Algorithms on Various Architectures.
    Circuits, Systems, and Signal Processing, 9,
    449-500, 1990.
  • [Srinivasan01] Vinoo Srinivasan, Sriram
    Govindarajan, and Ranga Vemuri. Fine-grained and
    coarse-grained behavioral partitioning with
    effective utilization of memory and design space
    exploration for multi-FPGA architectures. IEEE
    Trans. Very Large Scale Integr. Syst.,
    9(1):140-159, 2001.
  • [SPIRAL05] Püschel, et al. SPIRAL: Code
    Generation for DSP Transforms. Proceedings of the
    IEEE, special issue on "Program Generation,
    Optimization, and Adaptation," Vol. 93, No. 2,
    2005, pp. 232-275.
  • [Hagen95] Lars W. Hagen, Dennis J. H. Huang, and
    Andrew B. Kahng. Quantified suboptimality of VLSI
    layout heuristics. In Proceedings of the 32nd
    ACM/IEEE Conference on Design Automation, pages
    216-221, New York, NY, USA, 1995. ACM Press.

42
Questions