Configurable Computing for Mainstream Software Applications

Description:

SPEED Test Results. NOTE: These test results were obtained using Windows NT 4.0. October 4, 2002 ... Clock speed only accounts for a factor of 15x ...

1
Configurable Computing for Mainstream Software
Applications
  • William D. Bishop
  • wdbishop_at_computer.org

2
Presentation Outline
  • Introduction to configurable computing
  • Motivation
  • Definitions and concepts
  • Niche applications
  • Research into configurable computing for
    mainstream software
  • Configurable computing experiments
  • Test results
  • Observations
  • Conclusions

3
Motivation
"For a given class of problems, one set of basic
instructions may be more efficient than another
such set" (John von Neumann, 1958)
  • In other words:
    • Application-specific computer hardware may be
      more efficient than general-purpose computer
      hardware for solving a given class of problems

4
Introduction to Configurable Computing
  • Definition of a configurable computer:
    • A configurable computer is a computing device
      that provides hardware that may be modified at
      runtime to efficiently compute a set of tasks.
  • Configurable computers offer the following
    advantages:
    • Increased control logic (a.k.a. processing units)
      flexibility
    • Increased datapath (a.k.a. wiring) flexibility
    • Ability to specialize the computer hardware at
      runtime

5
Building a Configurable Computer
  • The basic building block of a configurable
    computer is the High-Density Programmable Logic
    Device (HDPLD).
  • Suitable HDPLDs have the following features:
    • Large capacity for digital hardware designs
    • Electrically programmable in-system
    • Support for high-speed reconfiguration
    • SRAM-based device

A Modern HDPLD: The Altera 10K100 CPLD
Photo Courtesy of Altera
6
Types of Configurable Computers
  • Loosely-Coupled
    • Configurable coprocessor connected to a host
      computer via a peripheral bus
  • Tightly-Coupled
    • Configurable coprocessor connected directly to
      the system bus of a host computer
  • Configurable Instruction Set Computer
    • Processor utilizes configurable hardware to
      implement instructions

7
Niche Applications of Configurable Computers
  • What are niche applications of configurable
    computers?
    • Applications that use bit-wise computations or
      integer arithmetic
    • Applications with coarse-grain computations
  • Examples of niche applications include the
    following:
    • Image processing [Athanas, 1995] (138× to 236×)
    • Cryptography [Vuillemin, 1996] (10× to 1000×)
    • Hardware emulation [Dubois, 1995] (123× to 207×)
  • Performance improvements of 10× to 1000× are
    typical.

8
Research Goals
  • Develop a model of a configurable computer
  • Conduct experiments to quantify key factors that
    influence the performance of configurable
    computers
  • Use the model to predict the performance of a
    configurable computer for mainstream software
    applications
  • Propose a configurable computer architecture for
    mainstream software applications

9
Platform I: PC with ARC-PCI Board
  • Processor: Pentium III
    • 450 MHz Pentium III
    • 512 MB of SDRAM (10 ns)
    • L1 and L2 Cache
  • Coprocessor: ARC-PCI
    • Three FLEX 10K50 Devices
    • 8,640 LEs (Logic Elements)
    • 60 KB SRAM ( 20 ns)
  • Operating System: Windows NT 4.0
    • Custom ARC-PCI Device Driver

Figure: Loosely-Coupled Configurable Computer
Photo Courtesy of Altera
10
Platform II: Excalibur Development Board
  • Processor: 32-Bit Nios
    • 33 MHz Nios 2.0
    • Optimized for speed
    • Hardware multiplication
    • 256 KB SRAM ( 30 ns)
  • Coprocessor: APEX 20K200E
    • One APEX 20K200E Device
    • 8,320 LEs
    • 104 KB SRAM ( 10 ns)
  • Operating System: None

Figure: Tightly-Coupled Configurable Computer
Figure Courtesy of Altera
11
Configurable Computing Experiments
  • The following experiments were conducted:
  • Platform I Tests
    • CSIM Coprocessor Tests
    • Hardware Timer (SPEED) Tests
  • Platform I and II Tests
    • Pseudo-Random Number Generation (RAND) Tests
    • Min Heap Insertion and Deletion (MIN) Tests

12
Hardware Timer (SPEED) Tests
  • Hardware timer specifications
    • Synthesizable VHDL design
    • Hardware timer with a 30 ns resolution
    • Simple control / status register interface
    • Implemented on Platform I only
  • Developed application software to investigate the
    actual time required to transfer data
    • Computes transfer time between the application
      and the hardware
    • Computes transfer time between the device driver
      and the hardware
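
The two transfer-time measurements above can be compared with a small helper. This is only a sketch: the 30 ns resolution comes from the slide, while the 32-bit counter width, the function names, and any tick values passed in are assumptions made for illustration, not details of the actual ARC-PCI timer or driver.

```python
TIMER_RESOLUTION_NS = 30  # timer resolution stated on the slide


def ticks_to_ns(start_ticks, stop_ticks, counter_bits=32):
    """Elapsed time in ns between two timer readings.

    counter_bits is an assumed counter width; the modulo tolerates
    a single wrap-around of the counter between the two readings.
    """
    elapsed = (stop_ticks - start_ticks) % (1 << counter_bits)
    return elapsed * TIMER_RESOLUTION_NS


def software_overhead_ns(app_level_ns, driver_level_ns):
    """Time added by the path from the application down to the driver.

    Both arguments are transfer times to the hardware, measured once
    from the application and once from inside the device driver.
    """
    return app_level_ns - driver_level_ns
```

Subtracting the driver-level time from the application-level time isolates the cost of crossing from user code into the driver, which is the quantity a loosely-coupled design pays on every transfer.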

13
SPEED Test Results
NOTE: These test results were obtained using
Windows NT 4.0.
14
Pseudo-Random Number Generation (RAND) Tests
  • Pseudo-random number generator specifications
    • Synthesizable VHDL design suitable for both
      platforms
    • Linear Congruential Generator (LCG)
    • Generates 100 streams of 32-bit unsigned values
    • Exploits parallelism through pre-calculation
  • Developed application software to test the
    generator
    • Computes a total of 500,000,000 pseudo-random
      numbers per test
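
One common way to run many LCG streams in parallel is to pre-calculate the constants that advance the generator by k steps at once, so each stream can jump ahead independently. The sketch below illustrates that idea in software. The slides do not give the multiplier, increment, or stream organization, so the constants (a widely used 32-bit LCG parameter set) and the leapfrog layout are assumptions, not the VHDL design described above.

```python
M = 2**32        # 32-bit unsigned values, as on the slide
A = 1664525      # assumed multiplier (not from the slides)
C = 1013904223   # assumed increment (not from the slides)
STREAMS = 100    # number of streams, as on the slide


def lcg_step(x):
    """One ordinary LCG step: x -> (A*x + C) mod M."""
    return (A * x + C) % M


def jump_constants(k):
    """Pre-calculate (a_k, c_k) so that k steps equal a_k*x + c_k mod M."""
    a_k, c_k = 1, 0
    for _ in range(k):
        a_k, c_k = (a_k * A) % M, (c_k * A + C) % M
    return a_k, c_k


def leapfrog_streams(seed, count):
    """Return `count` values for each of STREAMS interleaved streams."""
    a_k, c_k = jump_constants(STREAMS)
    # Stream i starts at the (i+1)-th value of the sequential generator.
    starts = []
    x = seed
    for _ in range(STREAMS):
        x = lcg_step(x)
        starts.append(x)
    streams = []
    for s in starts:
        values = [s]
        for _ in range(count - 1):
            s = (a_k * s + c_k) % M  # jump STREAMS steps in one update
            values.append(s)
        streams.append(values)
    return streams
```

Because every stream only needs its own state and the shared pair (a_k, c_k), the per-stream updates have no dependencies on each other, which is what makes the scheme attractive for parallel hardware.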

15
RAND Test Results
16
Min Heap Insertion and Deletion (MIN) Tests
  • Min heap specifications
    • Synthesizable VHDL design suitable for both
      platforms
    • Maximum heap size of 1000 entries
    • Supports insertion and deletion
    • Exploits parallelism
  • Developed application software to test the min
    heap
    • Performs a total of 5,000,000 insertions and
      deletions per test
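
For reference, a software min heap with the same 1000-entry limit might look like the sketch below. It is a plain sequential array-based binary heap, not the parallel VHDL design; the class and method names are hypothetical, and "deletion" is assumed to mean removing the minimum entry.

```python
class BoundedMinHeap:
    """Array-based binary min heap with a fixed maximum size."""

    def __init__(self, capacity=1000):  # 1000 entries, as on the slide
        self.capacity = capacity
        self.items = []

    def insert(self, key):
        """Add key, then sift it up until its parent is not larger."""
        if len(self.items) >= self.capacity:
            raise OverflowError("heap is full")
        self.items.append(key)
        i = len(self.items) - 1
        while i > 0:
            parent = (i - 1) // 2
            if self.items[parent] <= self.items[i]:
                break
            self.items[parent], self.items[i] = self.items[i], self.items[parent]
            i = parent

    def delete_min(self):
        """Remove and return the smallest key, then sift the root down."""
        if not self.items:
            raise IndexError("heap is empty")
        smallest = self.items[0]
        last = self.items.pop()
        if self.items:
            self.items[0] = last
            i, n = 0, len(self.items)
            while True:
                left, right, child = 2 * i + 1, 2 * i + 2, i
                if left < n and self.items[left] < self.items[child]:
                    child = left
                if right < n and self.items[right] < self.items[child]:
                    child = right
                if child == i:
                    break
                self.items[i], self.items[child] = self.items[child], self.items[i]
                i = child
        return smallest
```

Both operations walk one root-to-leaf path, so each costs O(log n) comparisons in software; a hardware design can overlap the comparisons at different levels, which is presumably the parallelism the slide refers to.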

17
MIN Test Results
NOTE: These test results were obtained using a
heap with a maximum of 1000 entries.
18
Observations
  • Context-switching of the operating system
    • Approximately 2 µs for Windows NT 4.0
    • No operating system executing on Platform II
  • Memory utilization and bandwidth
    • PC can read its memory at least twice as fast as
      the FLEX 10K50
    • Nios and APEX 20K200E read memory at the same
      speed
  • Bus utilization and bandwidth
    • Under light loads, PCI bus reads take
      approximately twice as long on average as they
      should theoretically (544 ns vs. 300 ns)
    • Bus contention doesn't occur in Platform II
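
The figures above can be combined into a rough per-call overhead estimate for Platform I. The constants are taken from the slide; the additive cost model (context switches plus PCI reads) and the function names are simplifying assumptions for illustration, not the performance model developed in this research.

```python
# Figures quoted on the slide.
CONTEXT_SWITCH_NS = 2_000      # approx. 2 us for Windows NT 4.0
PCI_READ_MEASURED_NS = 544     # average measured PCI read, light load
PCI_READ_THEORETICAL_NS = 300  # theoretical PCI read time


def pci_read_slowdown():
    """Ratio of measured to theoretical PCI read time."""
    return PCI_READ_MEASURED_NS / PCI_READ_THEORETICAL_NS


def offload_overhead_ns(reads, context_switches=1):
    """Assumed additive model of the fixed cost of one coprocessor call."""
    return context_switches * CONTEXT_SWITCH_NS + reads * PCI_READ_MEASURED_NS
```

Under this model, a coprocessor call only pays off when the computation time it saves exceeds `offload_overhead_ns(...)` for the number of bus reads the call needs.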

19
Observations (cont.)
  • Processing power
    • Pentium III is approximately 75x to 100x faster
      than the Nios processor
    • Clock speed only accounts for a factor of 15x
    • Super-scalar architecture and cache subsystem
      account for the additional processing power of
      the Pentium III
  • Exploitation of parallelism
    • Depends upon the application and its granularity
    • Can recover time lost to configuration,
      context-switching, memory utilization, and bus
      utilization
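
The processing-power figures can be checked with simple arithmetic. The clock rates and speedup range come from the slides; the helper names are hypothetical. Note that the raw clock ratio of the two platforms is about 13.6, which the slide rounds to 15x.

```python
PENTIUM_III_MHZ = 450  # Platform I processor clock
NIOS_MHZ = 33          # Platform II processor clock


def clock_ratio():
    """Raw ratio of the two clock rates."""
    return PENTIUM_III_MHZ / NIOS_MHZ


def residual_factor(observed_speedup, clock_factor):
    """Speedup left over once the clock-rate difference is divided out."""
    return observed_speedup / clock_factor
```

Using the slide's rounded 15x, the observed 75x to 100x advantage leaves a residual factor of roughly 5x to 6.7x, which the slide attributes to the super-scalar architecture and cache subsystem.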

20
Conclusions
  • Loosely-coupled configurable computers are not
    suitable for mainstream software applications due
    to operating system overhead, communication
    latency and poor memory bandwidth.
  • Tightly-coupled configurable computers are
    suitable for mainstream software applications.
  • Configurable computing may be useful for embedded
    systems.

21
Selected Configurable Computing References
  • Katherine Compton and Scott Hauck. Reconfigurable
    Computing: A Survey of Systems and Software. ACM
    Computing Surveys, 34(2):171-210, June 2002.
  • Peter M. Athanas and A. Lynn Abbott. Addressing
    the Computational Requirements of Image
    Processing with a Custom Computing Machine: An
    Overview. In Proceedings of the Ninth
    International Parallel Processing Symposium
    Special Workshop on Reconfigurable Architectures
    and Algorithms, Santa Barbara, California, April
    1995.
  • Jean E. Vuillemin, Patrice Bertin, Didier Roncin,
    Mark Shand, Hervé H. Touati, and Philippe
    Boucard. Programmable Active Memories:
    Reconfigurable Systems Come of Age. IEEE
    Transactions on Very Large Scale Integration
    (VLSI) Systems, 4(1):56-69, March 1996.
  • Michel Dubois, Alain Gefflaut, Jaeheon Jeong,
    Adrian Moga, and Koray Oner. Multiprocessor
    Emulation with RPM: Early Experience. Technical
    Report CENG95-23, University of Southern
    California, Los Angeles, California, December
    1995.
  • William Bishop. ARC-PCI Website.
    http://www.pads.uwaterloo.ca/wdbishop/arc-pci.html