Title: Syllabus Summary
Microelectronics for the Global World
Collaborative Engineering ECE992/777
Lecture 1. Introduction
- Overview of Field Programmable Gate Array (FPGA) design development within the appropriate software/firmware component development environment.
- In the global design world, we will have to deal with Intellectual Property (IP), IP testing, trust and design efficiency.
- Differences in technology status, design environments and proficiency will lead to the need for tools for design-space excursions and optimizations.
- As a result of the taught design elements, globally distributed engineering can be accomplished.
- The course is divided into three parts:
  - collaborative design elements,
  - the collaborative development process, and
  - the subsequent approach for integration and optimization.
Part I. Collaborative Design Elements
Lecture 2. Outsourcing Economy
- Outsourcing has been approached fearfully, but it can also be approached as an opportunity for innovation.
- We will review traditional versus outsourcing-driven design methodology.
- Security/trust issues related to foreign "killer chips" will also be discussed.
Global Collaboration in Outsourcing
Disruptive Technologies
[Figure: performance versus time, showing the performance trajectory of present technology driven by sustaining technological improvements, the performance that customers can absorb or utilize, and the new performance trajectory of a disruptive technology.]
Clayton M. Christensen, The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail, HarperBusiness, 2000 (Revised Edition)
Lecture 3. Reconfigurable Computing
- Positioned in computing density between Application-Specific Integrated Circuits (ASICs) and Digital Signal Processors (DSPs), FPGAs provide increased flexibility in computational details such as degrees of parallelism and pipelining, as well as advantages in real estate and power consumption over DSPs and General-Purpose (GP) microprocessors.
Estrin's Fixed Plus Variable Structure Computer
Gerald Estrin, Organization of Computer Systems: The Fixed Plus Variable Structure Computer, Proc. WJCC, 1960
FPGA Architectural Developments
- Traditional Sea-of-CLBs (Xilinx, Altera)
- Extreme-DSP: FPGA with 192 embedded 18x18 multipliers (Xilinx, Altera), with embedded PowerPC cores and RapidIO cells (Xilinx)
- Fixed-plus-Variable, that is, a core processor with FPGA (Quicksilver, Stretch)
- Macro-Pipeline Processor (PipeRench)
- Sea-of-ALUs, chunky arrays (MorphoSys, MathStar)
- Dynamic Reconfiguration (IPFlex)
- DARPA-sponsored Polymorphous Computing Architectures (PCA) developments
Xilinx Virtex-4 FPGA Family
MathStar FPOA
- Chunky, coarse-grain array
- Five Silicon Object types:
  - Arithmetic Logic Unit (ALU)
  - Content Addressable Memory (CAM)
  - Cyclic Redundancy Check (CRC)
  - Multiply Accumulator (MAC)
  - Register File (RF)
- In addition, RAM resources are distributed in the array.
- The function and ratio of these different Silicon Objects are chosen based on a detailed study of the application space for the product offerings.
Processing Spectrum Continuum
[Figure: a spectrum running from ASIC through FPGA and DSP to GPU. Horizontal ranges mark the span covered by each architecture style (Sea-of-CLBs, Sea-of-ALUs, Fixed-plus-Variable, Macro-Pipeline, Dynamic Reconfiguration) and by each design language (VHDL/Verilog, C/C++, SystemC).]
Efficiency versus Application Space
[Figure: SWEPT efficiency versus application type (vectors/streaming, structured bit-operations, symbolic operations) for PCA, ASIC, FPGA and GP processors. Caption: Optimized Performance Over Broad Application Space.]
Native Stream Mode
Native Threaded Mode
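Application Flow
[Figure: application flow under a control block, with steps numbered 1 to 5. Labeled elements: stream processing on three MC-SM units, inter-chip I/O (crossbar), inter-chip memory transfer, threaded processing on two MC-TM units, and a parcel interface.]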
Lecture 4. Levels of Abstraction
- It is a misconception to expect that FPGA personalization (bit-level) code can be reused in order to update or upgrade a design.
- Too many technology-specific design decisions have been made to get to that particular synthesized code pattern.
- Only optimization at higher levels of abstraction will pay off in the long run.
- [LIEV01] P. Lieverse, P. van der Wolf, E. Deprettere, K. Vissers, A Methodology for Architecture Exploration of Heterogeneous Signal Processing Systems, Journal of VLSI Signal Processing, 29, 197-206 (2001), Kluwer Academic Publishers, Boston
The Design Pyramid [LIEV01]
Effect of Abstraction Level
[Figure: relative efficiency versus abstraction level (VHDL, SystemC, UML), with curves labeled Compiler Performance and Tradeoff Curve Optimization Potential.]
Lecture 5. Design Flow
- Design elements from UML down to VHDL, including SystemC, MathWorks Simulink and Xilinx SystemBuilder, will be reviewed, as well as general design/test flows.
SystemC-based Hardware/Software Co-Design
[Figure: co-design flow with the stages System Behavior and System Architecture, Mapping, Performance Simulation, Refine, and Implementation into Software and Hardware.]
K. Keutzer, S. Malik, R. Newton, J. Rabaey, A. Sangiovanni-Vincentelli, System Level Design: Orthogonalization of Concerns and Platform-Based Design, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(12), 2000
Lecture 6. Tools for Design
- The state of the art of design elements needed for collaborative design development, including verification, trade-off and optimization tools, will be described and evaluated.
MILAN
- MILAN is a model-based, extensible simulation framework that provides a unified environment capable of
  - modeling a large class of embedded systems and applications
  - seamlessly integrating different widely-used simulators into a single framework
  - enabling rapid evaluation of performance metrics such as power, latency, and throughput
  - facilitating simulation at various levels of granularity
  - rapidly evaluating a large design space
MILAN, Institute for Software Integrated Systems, Vanderbilt University, Nashville
The MILAN Architecture
[Figure: GME 2000 models are connected through model interpreters to Design Space Exploration Tools, Functional Simulators, High-level Power Estimators, Cycle-Accurate Power Simulators, and System Generation and Synthesis Tools that produce the Target System. One kind of model interpreter drives the simulators/tools, the other feeds results back.]
MILAN, Institute for Software Integrated Systems, Vanderbilt University, Nashville
Part II. Collaborative Development
Lecture 7. Intellectual Property (IP)
- The IP business model and some of its limitations will be reviewed. Several other business propositions, such as the open model and fabless design companies, will be analyzed.
- Business Proposition, Cost Model
- Re-use Potential, Patentable
- Hardcore or Softcore IP
- Hardware versus Software Components
Lecture 8. Open Standards: VIA, OCP, VSIA
- Interface standards defined and developed for System-on-Chip design in particular will be reviewed and analyzed for compatibility with IP component development.
- Open Core Protocol (OCP)-IP, www.ocpip.org
- Virtual Interface Architecture (VIA)
- Virtual Socket Interface Alliance (VSIA), www.vsia.org
Lecture 9. Component/System Testing
- Testability aspects of firmware components, including generation of test vectors, assessment of coverage, JTAG testing and the test monitor concept, will be illustrated.
- Intellitech (Durham, NH) TEST-IP
- Plug and Play Scan Components
- Boundary Scan
- Self-Test
- Observability
Lecture 10. Trusted Circuits
- The use of more globally developed ICs has increased the need for tools to support the trustable development of complex and performance-sensitive applications.
- "...develop enabling trusted assembly, integration, and test technologies that verify the correctness, reliability, and functionality of designed Integrated Circuits (ICs), i.e., approaches that enable IC users to fully trust the ICs they employ." (DARPA SBIR 2005.2)
Part III. Collaborative Integration and Optimization
Lecture 11. Component Tradeoffs
- In heterogeneous computing environments, the constituent functions and subsystems can be implemented at various points along their respective design space tradeoff curves.
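A tradeoff curve of this kind can be read as the set of non-dominated implementation points of a component. The sketch below is an illustration only and is not taken from the lecture material: the function name `pareto_front` and the sample (cost, time) points are hypothetical, and lower is assumed to be better on both axes.

```python
def pareto_front(points):
    """Return the non-dominated (cost, time) points, sorted by increasing cost.

    A point is kept only if it is strictly faster than every cheaper
    (or equally cheap) point that was already kept.
    """
    front = []
    for cost, time in sorted(points):
        if not front or time < front[-1][1]:
            front.append((cost, time))
    return front


# Hypothetical implementation alternatives for one component:
# cost could be the number of processors, time the execution time.
candidates = [(4, 10), (2, 30), (3, 35), (8, 5), (6, 12)]
curve = pareto_front(candidates)
print(curve)
```

Each point that survives is one place along the component's tradeoff curve where it could be implemented. The dominated points, such as (3, 35) here, cost more without being faster and can be discarded.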
Performance/Cost Tradeoffs
H.A.E. Spaanenburg, The Analysis of Processor-Time Trade-Off Opportunities in a Reconfigurable Multi-Processor System, Syracuse University, 1979
Lecture 12. Design Excursions (SPADE)
- In the University of Leiden approach [STEF02], particular computational instances are transformed by small perturbations in the design space. These techniques support a system designer in exploring alternative instances of an application mapped onto an architecture template.
- [STEF02] T. Stefanov, B. Kienhuis, E. Deprettere, Algorithmic Transformation Techniques for Efficient Exploration of Alternative Application Instances, Proceedings 10th International Symposium on Hardware/Software Codesign (CODES'02), Estes Park, Colorado, May 6-8, 2002
The Y-chart extended with the Application Transformation Layer [STEF02].
Alternative instances of the application have to be generated, mapped onto the architecture template and explored in order to evaluate the performance of the Application-Architecture pair [STEF02].
Simple example illustrating the unfolding and skewing transformations [STEF02].
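The figure for this slide is not reproduced in the transcript. As a stand-in, the following sketch shows the general idea of the two transformations on hypothetical loops. It is not the example from [STEF02], and all function names are assumptions. Unfolding splits the iterations of a loop over several instances. Skewing re-indexes a loop nest so that iterations on the same wavefront no longer depend on each other.

```python
def scale(x):
    """Original loop: one instance handles every iteration."""
    return [2 * v + 1 for v in x]


def scale_unfolded(x):
    """Unfolded by two: one instance takes even iterations, one takes odd."""
    y = [0] * len(x)
    for i in range(0, len(x), 2):  # instance 0
        y[i] = 2 * x[i] + 1
    for i in range(1, len(x), 2):  # instance 1
        y[i] = 2 * x[i] + 1
    return y


def recurrence(n, m):
    """Original nest: a[i][j] depends on a[i-1][j] and a[i][j-1]."""
    a = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def recurrence_skewed(n, m):
    """Skewed nest: the outer loop walks wavefronts t = i + j.

    All iterations with the same t read only values from wavefront t - 1,
    so they are mutually independent and could run in parallel.
    """
    a = [[1] * (m + 1) for _ in range(n + 1)]
    for t in range(2, n + m + 1):
        for i in range(max(1, t - m), min(n, t - 1) + 1):
            j = t - i
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a
```

Both transformed versions compute the same values as the originals. What changes is how the work is grouped, and that grouping is what an exploration tool varies when it generates alternative application instances.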
Lecture 13. Optimization (SPIRAL)
- SPIRAL [PUCH05], a program-generation technique developed at Carnegie Mellon University, automatically generates high-performance code that is tuned to the given platform. SPIRAL generates code for a broad set of DSP transforms including the discrete Fourier transform, other trigonometric transforms, filter transforms, and discrete wavelet transforms.
- [PUCH05] M. Püschel, J. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, SPIRAL: Code Generation for DSP Transforms, Proceedings of the IEEE, Special Issue on Program Generation, Optimization, and Adaptation, Vol. 93, No. 2, 2005, pp. 232-275
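The idea of tuning to the platform can be sketched in a heavily simplified form: keep several mathematically equivalent implementations of a transform, time them on the machine at hand, and select the fastest. The code below is a toy illustration of that idea only. It is not SPIRAL's generator or search engine, and the names `dft_naive`, `fft_radix2` and `pick_fastest` are hypothetical. The radix-2 variant assumes the input length is a power of two.

```python
import cmath
import time


def dft_naive(x):
    """Direct O(n^2) evaluation of the discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def pick_fastest(candidates, x, repeats=5):
    """Time each candidate on this machine and return the fastest one's name."""
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        start = time.perf_counter()
        for _ in range(repeats):
            fn(x)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


candidates = {"naive": dft_naive, "radix2": fft_radix2}
signal = [float(i % 7) for i in range(64)]
print("selected on this platform:", pick_fastest(candidates, signal))
```

The selected variant can differ between machines and input sizes, which is why the choice is made by measurement rather than fixed in advance.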
SPIRAL
Automates the
A library generator for highly optimized,
platform-adapted signal processing transforms
J. Moura et al, Generating Platform-Adapted DSP
Libraries Using SPIRAL, HPEC 2001
SPIRAL Methodology
[Figure: the SPIRAL flow, which takes two inputs as given: a DSP transform (DFT, DCT, wavelets, etc.) and a computer architecture.]
J. Moura et al, Generating Platform-Adapted DSP Libraries Using SPIRAL, HPEC 2001
SPIRAL vs. FFTW (lower is better)
[Figure: three comparison panels, for Pentium III/Linux/gcc, Athlon/Linux/gcc and Pentium III/Win2000/Intel compiler, annotated "comparable performance".]
J. Moura et al, Generating Platform-Adapted DSP Libraries Using SPIRAL, HPEC 2001
Lecture 14. System Optimization
- The total system solution can be evaluated for the right combination of design space points for its constituent elements.
- This procedure, applied within the total system constraint, allows for an efficient process of increasing benefits for the least incremental cost.
- These procedures especially facilitate the introduction of technology updates, since they allow for the reestablishment of the proper computational operating point for the combination of the old and new technology.
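One way to picture "increasing benefits for the least incremental cost" is a greedy marginal analysis over the components' tradeoff curves. The sketch below is an assumption about how such a procedure could look, not the method from the lecture: the function name `allocate`, the budget and the sample curves are hypothetical, and each curve is a list of (cost, benefit) points sorted by cost.

```python
def allocate(curves, budget):
    """Pick one operating point per component within a total cost budget.

    Start every component at its cheapest point, then repeatedly take the
    single upgrade step with the best benefit gained per unit of extra cost.
    """
    choice = {name: 0 for name in curves}
    spent = sum(points[0][0] for points in curves.values())
    if spent > budget:
        raise ValueError("budget is below the cheapest configuration")
    while True:
        best = None  # (benefit per cost, component name, extra cost)
        for name, points in curves.items():
            i = choice[name]
            if i + 1 >= len(points):
                continue  # already at the most capable point
            extra_cost = points[i + 1][0] - points[i][0]
            extra_benefit = points[i + 1][1] - points[i][1]
            if extra_cost <= 0 or spent + extra_cost > budget:
                continue
            ratio = extra_benefit / extra_cost
            if best is None or ratio > best[0]:
                best = (ratio, name, extra_cost)
        if best is None:
            return choice, spent
        _, name, extra_cost = best
        choice[name] += 1
        spent += extra_cost


# Hypothetical tradeoff curves for two subsystems.
curves = {
    "fft": [(1, 10), (2, 18), (4, 22)],
    "filter": [(1, 5), (3, 25)],
}
choice, spent = allocate(curves, budget=5)
print(choice, spent)
```

A greedy pass like this is simple but not guaranteed to find the best overall combination when the curves are not concave. When a technology update changes one component's curve, rerunning the allocation re-establishes the operating point for the whole system.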
Processor-Time System Tradeoffs
H.A.E. Spaanenburg, The Analysis of Processor-Time Trade-Off Opportunities in a Reconfigurable Multi-Processor System, Syracuse University, 1979
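Order-of-Magnitude Improvements
Insertion of a next-level processor into an embedded heterogeneous environment needs to present an order-of-magnitude improvement potential.
[Figure: efficiency (axis label as extracted: MOPS Kg.Watt, with ticks at 10, 100 and 1000) versus time for RISC, DSP, FPGA and ASIC, with a marked point X. The time axis is annotated ">3.3 x 18 months" and "5 years".]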
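The time-axis annotation (more than 3.3 times 18 months, 5 years) can be checked with a short calculation. This assumes that the 18 months is a performance-doubling period, which the slide does not state explicitly. The constant and function names are hypothetical.

```python
import math

DOUBLING_PERIOD_MONTHS = 18  # assumption: performance doubles every 18 months


def months_to_gain(factor, doubling_period=DOUBLING_PERIOD_MONTHS):
    """Months needed to reach an improvement factor at a fixed doubling period."""
    return math.log2(factor) * doubling_period


doublings = math.log2(10)    # doublings needed for a 10x improvement
months = months_to_gain(10)
print(f"{doublings:.2f} doublings, {months:.1f} months, {months / 12:.1f} years")
```

Under that assumption, a tenfold gain takes about 3.32 doublings, or roughly 5 years. A new processor that offers less than an order of magnitude is therefore overtaken by ordinary scaling of the existing technology within about that time.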
Lecture 15. Heterogeneous Systems
- Heterogeneous processing systems currently contain a continuum of processing alternatives, from general-purpose processors (GPP), to digital signal processors (DSP), to Field-Programmable Gate Arrays (FPGA) and Application-Specific Integrated Circuits (ASIC).
- The FPGA domain in particular has recently produced its own range of architectural alternatives along that processing continuum.
Not One Machine Does Everything
"Since no single architecture can satisfy the needs of all users, it has been desirable to have a compute system whose architecture can be defined and varied dynamically." S.S. Reddi and E.A. Feustel, A Conceptual Framework for Computer Architecture, Computing Surveys, Vol. 8, No. 2, June 1976
[Figure labels: Top of Empire State Building in New York; Airport; Airport; Top of Foshay Tower in Minneapolis.]
Performance-Flexibility Trades
[Figure: energy efficiency in MOPS/mW (or MIPS/mW), with ticks from 0.1 to 1000, versus flexibility (coverage); Dedicated ASICs are marked on the plot.]
Jan Rabaey, Pleiades: Ultra-Low Power Hybrid and Reconfigurable Computing, UC Berkeley, 1999
Lecture 16. Upgrade/Updates, Technology Transparency
- System developers must continue to reevaluate which combination of implementation alternatives will best meet their overall system requirements.
- This question is important not only for the initial design, but also for subsequent technology updates and upgrades, especially when they have to be implemented in the same constrained real estate.
Upgrade/Update Approach
[Figure: design flow from UML through a UML-to-SystemC front-end to SystemC, then through a SystemC-to-VHDL compiler to VHDL, then through a VHDL-to-FPGA synthesizer to a design in Technology 1 (e.g. Xilinx Virtex-4) or a design in Technology 2 (e.g. MathStar FPOA).]
Lecture 17. Virtualization
- A virtual middleware architecture can be carefully mapped onto an FPGA architecture.
- This approach results in effective performance of the virtual architecture, with maximum parallelism and throughput.
- To the system programmer, the virtual (middleware) machine will become the programming environment.
- Programming and code generation for the actual virtual machine will make use of conventional software tools, such as compilers and assemblers.
Virtual Middleware Concept
Virtual PSP Middleware Concept
Conclusion
- In a recent interview with Electronics Weekly (9 May 2005), Wim Roelandts, president and CEO of Xilinx, made the following observation:
- "The next step is really to make FPGAs disappear. Today our customers are hardware engineers. But FPGAs are programmable devices. If we can create a level of abstraction that appeals to software engineers, we can increase our customer base by at least 10x. That's really where our future is. As long as you have a set of interfaces that you can programme to, you don't have to know what the hardware looks like."