Title: Configurable Computing for Mainstream Software Applications
- William D. Bishop
- wdbishop_at_computer.org
2 Presentation Outline
- Introduction to configurable computing
- Motivation
- Definitions and concepts
- Niche applications
- Research into configurable computing for mainstream software
- Configurable computing experiments
- Test results
- Observations
- Conclusions
3 Motivation
"For a given class of problems, one set of basic instructions may be more efficient than another such set." (John von Neumann, 1958)
- In other words:
- Application-specific computer hardware may be
more efficient than general-purpose computer
hardware for solving a given class of problems
4 Introduction to Configurable Computing
- Definition of a configurable computer: a computing device that provides hardware that may be modified at runtime to efficiently compute a set of tasks.
- Configurable computers offer the following advantages:
- Increased control logic (a.k.a. processing unit) flexibility
- Increased datapath (a.k.a. wiring) flexibility
- Ability to specialize the computer hardware at runtime
5 Types of Configurable Computers
- Loosely-Coupled
- Configurable coprocessor connected to a host computer via a peripheral bus
- Tightly-Coupled
- Configurable coprocessor connected directly to the system bus of a host computer
- Configurable Processing Unit
- Processor utilizes configurable hardware to implement a dynamic instruction set
6 Niche Applications of Configurable Computers
- What are niche applications of configurable computers?
- Applications that use bit-wise computations or integer arithmetic
- Applications with coarse-grain computations
- Examples of niche applications include the following:
- Image processing: Athanas, 1995 (138x, 236x)
- Cryptography: Vuillemin, 1996 (10x, 1000x)
- Hardware emulation: Dubois, 1995 (123x, 207x)
- Performance improvements of 10x to 1000x are typical.
7 Research Goals
- Develop a model of a configurable computer
- Conduct experiments to quantify key factors that influence the performance of configurable computers
- Use the model to predict the performance of a configurable computer for mainstream software applications
- Propose a configurable computer architecture for mainstream software applications
8 Platform I: PC with ARC-PCI Board
- Processor: Pentium III
- 450 MHz Pentium III
- 512 MB of SDRAM (10 ns)
- L1 and L2 cache
- Coprocessor: ARC-PCI
- Three FLEX 10K50 devices
- 8,640 LEs (logic elements)
- 60 Kbits of SRAM (20 ns)
- Operating System: Windows NT 4.0
- Custom ARC-PCI Device Driver
Loosely-Coupled Configurable Computer
Photo Courtesy of Altera
9 Platform II: Excalibur Development Board
- Processor: 32-bit Nios
- 33 MHz Nios 2.0
- Optimized for speed
- Hardware multiplication
- 256 KB SRAM (30 ns)
- Coprocessor: APEX 20K200E
- One APEX 20K200E device
- 8,320 LEs
- 104 Kbits of SRAM (10 ns)
- Operating System: None
Tightly-Coupled Configurable Computer
Figure Courtesy of Altera
10 Configurable Computing Experiments
- The following experiments were conducted:
- Platform I Tests
- CSIM Coprocessor Tests
- Hardware Timer (SPEED) Tests
- Platform I and II Tests
- Pseudo-Random Number Generation (RAND) Tests
- Min Heap Insertion and Deletion (MIN) Tests
11 Hardware Timer (SPEED) Tests
- Hardware timer specifications
- Synthesizable VHDL design implemented on Platform I only
- Hardware timer with a 30 ns resolution
- Simple control/status register interface
- Developed application software to investigate the actual time required to transfer data
- Computes the transfer time between the application and the hardware
- Computes the transfer time between the device driver and the hardware
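A minimal sketch of how application software might turn raw readings from a timer like the SPEED design into transfer times. Only the 30 ns resolution comes from the slide; the 32-bit counter width, the wraparound handling, and all function names are hypothetical.

```python
# Hypothetical sketch: turning raw counts from a free-running hardware timer
# into elapsed times. TICK_NS comes from the slide; COUNTER_BITS and all
# function names are assumptions made for illustration.

TICK_NS = 30                 # 30 ns timer resolution (from the slide)
COUNTER_BITS = 32            # assumed counter width, not stated in the source
COUNTER_MOD = 1 << COUNTER_BITS


def elapsed_ticks(start: int, stop: int) -> int:
    """Ticks between two counter reads, tolerating at most one wraparound."""
    return (stop - start) % COUNTER_MOD


def elapsed_ns(start: int, stop: int) -> int:
    """Elapsed time in nanoseconds between two counter reads."""
    return elapsed_ticks(start, stop) * TICK_NS


def mean_transfer_ns(samples: list[tuple[int, int]]) -> float:
    """Average transfer time over a list of (start, stop) counter reads."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(elapsed_ns(s, e) for s, e in samples) / len(samples)


if __name__ == "__main__":
    # Two made-up transfers: 10 ticks, then 20 ticks across a wraparound.
    reads = [(1000, 1010), (COUNTER_MOD - 5, 15)]
    print(mean_transfer_ns(reads))  # 450.0
```

Any single reading is quantized to one 30 ns tick, so averaging over many transfers reduces the effect of that quantization on the reported time.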
12 SPEED Test Results
NOTE: These test results were obtained using Windows NT 4.0.
13 Pseudo-Random Number Generation (RAND) Tests
- Pseudo-random number generator specifications
- Synthesizable VHDL design suitable for both platforms
- Linear Congruential Generator (LCG)
- Generates 100 streams of 32-bit unsigned values
- Exploits parallelism through pre-calculation
- Developed application software to test the generator
- Computes a total of 500,000,000 pseudo-random numbers per test
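One common way an LCG can "exploit parallelism through pre-calculation" is the leapfrog technique: pre-compute constants that jump the generator ahead K steps, then advance K streams independently. The sketch below assumes that reading of the slide, and it also assumes the well-known constants a = 1664525, c = 1013904223, m = 2^32; the source does not state which parameters or stream scheme the VHDL design actually used. The class and function names are hypothetical.

```python
# Hypothetical leapfrog LCG sketch. The constants below (a = 1664525,
# c = 1013904223, m = 2**32) are an ASSUMPTION; the slide does not say
# which parameters the VHDL design used.

M = 1 << 32          # modulus: 32-bit unsigned arithmetic
A = 1664525          # assumed multiplier
C = 1013904223       # assumed increment


def lcg_step(x: int) -> int:
    """One serial LCG step: x' = (A*x + C) mod 2^32."""
    return (A * x + C) % M


def jump_constants(k: int) -> tuple[int, int]:
    """Pre-calculate (A_k, C_k) such that k serial steps collapse into one
    affine step: x[n+k] = (A_k * x[n] + C_k) mod 2^32."""
    a_k, c_k = 1, 0  # identity map
    for _ in range(k):
        # Compose one more serial step after the current k-step map.
        a_k = (A * a_k) % M
        c_k = (A * c_k + C) % M
    return a_k, c_k


class LeapfrogLCG:
    """K streams; stream i yields x[i+1], x[i+1+K], x[i+1+2K], ..."""

    def __init__(self, seed: int, streams: int = 100):
        self.a_k, self.c_k = jump_constants(streams)
        # Seed each stream with consecutive values of the serial sequence.
        self.state: list[int] = []
        x = seed % M
        for _ in range(streams):
            x = lcg_step(x)
            self.state.append(x)

    def next_block(self) -> list[int]:
        """Return one value per stream, then advance every stream by K serial
        steps. The updates are independent, so hardware could do them at once."""
        out = list(self.state)
        self.state = [(self.a_k * x + self.c_k) % M for x in self.state]
        return out


if __name__ == "__main__":
    gen = LeapfrogLCG(seed=12345, streams=100)
    block = gen.next_block()
    print(len(block), block[:3])
```

Concatenating successive blocks reproduces the serial LCG sequence exactly, so the parallel streams change throughput but not the numbers generated.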
14 RAND Test Results
15 Min Heap Insertion and Deletion (MIN) Tests
- Min heap specifications
- Synthesizable VHDL design suitable for both platforms
- Maximum heap size of 1000 entries
- Supports insertion and deletion
- Exploits parallelism
- Developed application software to test the min heap
- Performs a total of 5,000,000 insertions and deletions per test
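As a point of reference for the MIN tests, here is a minimal sequential software model of a bounded binary min heap with the slide's 1000-entry limit. It is a sketch, not the author's test software: the class name is hypothetical, and it does not model how the VHDL design exploits parallelism.

```python
# Hypothetical software reference model of a bounded binary min heap
# (capacity 1000, matching the MIN test slide). It is sequential; it does
# not model how the VHDL design exploits parallelism.


class BoundedMinHeap:
    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.items: list[int] = []  # implicit tree: children of i at 2i+1, 2i+2

    def __len__(self) -> int:
        return len(self.items)

    def insert(self, key: int) -> None:
        if len(self.items) >= self.capacity:
            raise OverflowError("heap is full")
        self.items.append(key)
        i = len(self.items) - 1
        # Sift up: swap with the parent while the parent is larger.
        while i > 0:
            parent = (i - 1) // 2
            if self.items[parent] <= self.items[i]:
                break
            self.items[parent], self.items[i] = self.items[i], self.items[parent]
            i = parent

    def delete_min(self) -> int:
        if not self.items:
            raise IndexError("heap is empty")
        root = self.items[0]
        last = self.items.pop()
        if self.items:
            # Move the last leaf to the root, then sift it down.
            self.items[0] = last
            i, n = 0, len(self.items)
            while True:
                left, right = 2 * i + 1, 2 * i + 2
                smallest = i
                if left < n and self.items[left] < self.items[smallest]:
                    smallest = left
                if right < n and self.items[right] < self.items[smallest]:
                    smallest = right
                if smallest == i:
                    break
                self.items[i], self.items[smallest] = self.items[smallest], self.items[i]
                i = smallest
        return root


if __name__ == "__main__":
    h = BoundedMinHeap()
    for k in [5, 3, 8, 1]:
        h.insert(k)
    print([h.delete_min() for _ in range(4)])  # [1, 3, 5, 8]
```

With at most 1000 entries the tree is at most 9 levels below the root, so each insertion or deletion performs at most 9 parent/child swaps in this sequential model.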
16 MIN Test Results
NOTE: These test results were obtained using a heap with a maximum of 1000 entries.
17 Observations
- Context-switching of the operating system
- Approximately 2 µs for Windows NT 4.0
- No operating system executes on Platform II
- Memory utilization and bandwidth
- The PC can read its memory at least twice as fast as the FLEX 10K50 can
- The Nios and the APEX 20K200E read memory at the same speed
- Bus utilization and bandwidth
- Under light loads, PCI bus reads take nearly twice as long on average as they should in theory (544 ns vs. 300 ns)
- Bus contention does not occur in Platform II
18 Observations (cont.)
- Processing power
- The Pentium III is approximately 75x to 100x faster than the Nios processor
- Clock speed alone accounts for only a factor of about 14x (450 MHz vs. 33 MHz)
- The superscalar architecture and cache subsystem account for the additional processing power of the Pentium III
- Exploitation of parallelism
- Depends upon the application and its granularity
- Can recover time lost to configuration, context-switching, memory utilization, and bus utilization
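As a rough check on the processing-power bullet, the clock rates listed on the platform slides (450 MHz Pentium III, 33 MHz Nios) can be separated from the measured 75x to 100x speedup. This treats the overall speedup as a clock-rate factor multiplied by a per-clock factor, which is a simplifying assumption rather than a measured breakdown:

$$
\frac{450\ \text{MHz}}{33\ \text{MHz}} \approx 13.6,
\qquad
\frac{75}{450/33} = 5.5
\quad\text{to}\quad
\frac{100}{450/33} \approx 7.3
$$

On this reading, the Pentium III does roughly 5.5 to 7.3 times as much work per clock cycle as the Nios on these tests, which is the portion attributed to its superscalar architecture and cache subsystem.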
19 Conclusions
- Loosely-coupled configurable computers are not suitable for mainstream software applications due to operating system overhead, communication latency, and poor memory bandwidth.
- Tightly-coupled configurable computers are suitable for mainstream software applications.
- Configurable computing may be useful for embedded systems.
20 Selected Configurable Computing References
- Katherine Compton and Scott Hauck. "Reconfigurable Computing: A Survey of Systems and Software." ACM Computing Surveys, Vol. 34, No. 2, pp. 171-210, June 2002.
- Peter M. Athanas and A. Lynn Abbott. "Addressing the Computational Requirements of Image Processing with a Custom Computing Machine: An Overview." In Proceedings of the Ninth International Parallel Processing Symposium, Special Workshop on Reconfigurable Architectures and Algorithms, Santa Barbara, California, April 1995.
- Jean E. Vuillemin, Patrice Bertin, Didier Roncin, Mark Shand, Hervé H. Touati, and Philippe Boucard. "Programmable Active Memories: Reconfigurable Systems Come of Age." IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 4(1):56-69, March 1996.
- Michel Dubois, Alain Gefflaut, Jaeheon Jeong, Adrian Moga, and Koray Oner. "Multiprocessor Emulation with RPM: Early Experience." Technical Report CENG95-23, University of Southern California, Los Angeles, California, December 1995.
- William Bishop. ARC-PCI Website. http://www.pads.uwaterloo.ca/wdbishop/arc-pci.html