The SHARC
Presentation by Clare Smtih


1
The SHARC
  • Super Harvard Architecture Computer

2
The SHARC
  • Developed by Analog Devices
  • Optimized for demanding DSP and imaging
    applications.
  • 32-bit floating point, with 40-bit extended
    floating-point capabilities.
  • Large on-chip memory.
  • Ideal for scalable multi-processing applications.

3
Harvard Architecture
  • Program memory can store data.
  • Able to simultaneously read or write data at one
    location and get instructions from another place
    in memory.
  • 2 buses: a data memory bus and a program
    memory bus.
  • Either two separate memories or a single
    dual-port memory.

4
Super Harvard Architecture
  • Many processors employ a Harvard architecture by
    integrating two separate memories or caches into
    the processor chip.
  • The SHARC is unique in that its internal memory
    can hold a large program as well as a large
    amount of data. This is what makes it SUPER!

5
DSP
  • Digital Signal Processor.
  • High speed, low overhead data movement and rapid
    computations required.
  • Usually has small on-board ROM and RAM, and a
    single-cycle multiplier.
  • Designed to run single-stream, serial-in,
    serial-out signal processing applications very
    fast.

6
DSP Computations
  • The inner product of two vectors is a common
    computation for determining energy or
    correlation.
  • The following C code is an example:
        for (n = 0; n < length; n++)
            result += x[n] * y[n];
  • The processor that executes this loop in the
    fewest instruction cycles will have the best
    performance.
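A complete, compilable version of the loop may help. The function name `inner_product` and the sample vectors are illustrative, not from the presentation. Each iteration performs one multiply and one add, which is exactly the operation the SHARC's multiply-accumulate hardware targets.

```c
/* Inner (dot) product of two vectors: the slide's loop wrapped in a
   function. Each iteration is one multiply and one add. */
float inner_product(const float *x, const float *y, int length)
{
    float result = 0.0f;              /* running total (accumulator) */
    for (int n = 0; n < length; n++)
        result += x[n] * y[n];        /* multiply-accumulate */
    return result;
}
```

With x = {1, 2, 3, 4} and y = {0.5, 0.5, 0.5, 0.5}, `inner_product(x, y, 4)` returns 5 (a correlation term), and `inner_product(x, x, 4)`, the energy of x, returns 30.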

7
SHARC DSP
  • The SHARC incorporates features aimed at
    optimizing such loops.
  • High-Speed Floating Point Capability
  • Extended Floating Point
  • These features are DSP specific.
  • As a result, performance on non-DSP applications
    may be less than optimal.

8
Floating Point and Extended Floating Point
  • The SHARC supports 32-bit floating-point, 40-bit
    extended floating-point, and fixed-point
    arithmetic.
  • Floating-point computations take no additional
    clock cycles.
  • Data is automatically truncated when moved from
    the 40-bit internal registers to 32-bit memory,
    and zero padded when moved the other way.
  • Not precise enough for many scientific
    algorithms, but gives an excellent
    signal-to-noise ratio.
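To make the truncation and zero padding concrete, here is a bit-level sketch in C. It assumes (this is not stated on the slide) that a 40-bit extended value is the 32-bit IEEE single-precision word followed by 8 additional low-order mantissa bits, held here in the low 40 bits of a `uint64_t`. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: a 40-bit extended value lives in the low 40 bits of a
   uint64_t. Its top 32 bits are an IEEE single-precision word; its low
   8 bits are extra mantissa precision. */
#define EXT_MASK ((UINT64_C(1) << 40) - 1)

/* Memory -> register: the 32-bit word is zero padded with 8 low bits. */
uint64_t load_to_register(uint32_t mem_word)
{
    return ((uint64_t)mem_word << 8) & EXT_MASK;
}

/* Register -> memory: the 8 extra mantissa bits are truncated (dropped). */
uint32_t store_to_memory(uint64_t reg40)
{
    return (uint32_t)((reg40 & EXT_MASK) >> 8);
}

/* View a float's bit pattern (and back) without undefined behaviour. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Loading 1.5f gives a register value whose low 8 bits are zero; if a computation later sets those extra bits, storing the value back to memory discards them and 1.5f is recovered.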

9
SHARC's Internal Memory
  • Makes the SHARC unique.
  • Size: allows many complex functions to be
    performed on-chip, eliminating the need to move
    data between internal and external memory.
  • Memory size is significantly larger than that of
    most other high-speed computational devices.
  • Dual-block, dual-port: optimizes the Harvard
    architecture by allowing instructions to be
    fetched while data memory accesses are performed.

10
Multiply and Accumulate Instructions on the SHARC
  • Like most DSPs, the SHARC can compute a product
    and add it to a running total in a single clock
    cycle.
  • The SHARC's super instruction can multiply and
    accumulate while also adding, subtracting, or
    averaging data in two other registers.
  • These instructions give the SHARC its 120
    megaflop rating.
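The slide's description can be sketched as a functional C model. The type and function names are hypothetical, and the model shows only what one such instruction computes: on the SHARC the multiply-accumulate and the parallel add and subtract happen in the same clock cycle, whereas C executes them in sequence. Timing and instruction encoding are not modelled.

```c
/* Hypothetical register results of one "super instruction". */
typedef struct {
    float acc;   /* running total of products */
    float sum;   /* a + b, computed in parallel */
    float diff;  /* a - b, computed in parallel */
} SuperResult;

/* Multiply x by y and add the product to the running total, while also
   adding and subtracting two other operands a and b. On the chip these
   happen in one cycle; here they are ordinary sequential statements. */
void super_mac(SuperResult *r, float x, float y, float a, float b)
{
    r->acc += x * y;   /* multiply-accumulate */
    r->sum  = a + b;   /* parallel add */
    r->diff = a - b;   /* parallel subtract */
}
```

Calling it twice, first with x = 2, y = 3 and then with x = 1, y = 4 (a = 5, b = 1 both times), leaves acc = 10, sum = 6 and diff = 4. Producing the sum and difference of the same pair of operands is, for example, the core of a radix-2 FFT butterfly.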

11
Zero-Overhead Looping on the SHARC
  • A single instruction placed before the loop
    performs the loop set-up, informing the SHARC
    that a loop is coming.
  • The instruction also specifies the iteration
    count and the termination condition.
  • This keeps the pipeline full during loop
    execution and allows the termination condition
    to be tested in parallel.
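A rough cycle count shows why removing loop overhead matters for short loop bodies such as the inner product. This is a hypothetical model that assumes one cycle per instruction and a software loop spending `control` extra instructions per iteration (for example, a counter decrement and a conditional branch); it ignores branch penalties.

```c
/* Hypothetical model, one cycle per instruction.
   Software loop: every iteration runs the body plus `control`
   loop-management instructions (decrement, test/branch, ...). */
long software_loop_cycles(long iterations, long body, long control)
{
    return iterations * (body + control);
}

/* Zero-overhead loop: one set-up instruction before the loop; the
   count and termination test are then handled in hardware, so each
   iteration costs only its body. */
long zero_overhead_loop_cycles(long iterations, long body)
{
    return 1 + iterations * body;
}
```

For the inner product with a one-instruction body, 1000 iterations and two control instructions per iteration, the model gives 3000 cycles for the software loop versus 1001 for the zero-overhead loop.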

12
DAGs on the SHARC
  • Data Address Generators (DAGs) are integer
    computation units that manage the index
    registers used to address memory.
  • They allow the SHARC to fetch a value and update
    the index in one operation.
  • If the updated index exceeds a limit, the DAG
    adjusts it so that it wraps around.
  • This occurs in the same clock cycle as the read
    or write.
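The post-modify-with-wrap behaviour can be sketched in C. The `Dag` structure is a hypothetical stand-in for a DAG's registers (a base address, a buffer length, and the current index); the hardware performs the update and the wrap in the same cycle as the memory access, while C does them as ordinary statements. The same mechanism yields a circular buffer: here, a delay line in which each new sample overwrites the oldest one.

```c
#include <stddef.h>

/* Hypothetical model of the registers used for circular addressing. */
typedef struct {
    size_t base;    /* start of the circular buffer */
    size_t length;  /* number of entries in the buffer */
    size_t index;   /* current address, in [base, base + length) */
} Dag;

/* Post-modify with wrap: return the current address, then advance the
   index by `modify` (assumed 0 <= modify <= length), wrapping back into
   the buffer if it runs past the end. */
size_t dag_post_modify(Dag *d, size_t modify)
{
    size_t addr = d->index;
    d->index += modify;
    if (d->index >= d->base + d->length)
        d->index -= d->length;   /* wrap around */
    return addr;
}

/* Circular delay line: each new sample overwrites the oldest entry,
   so no data is ever shifted. */
void delay_line_push(float *mem, Dag *d, float sample)
{
    mem[dag_post_modify(d, 1)] = sample;
}
```

With a 4-entry buffer occupying `mem[2]` to `mem[5]`, pushing samples 1 to 6 leaves 5, 6, 3, 4 in those locations: samples 5 and 6 replaced the oldest entries, 1 and 2, and memory outside the buffer is untouched.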

13
DAG Capabilities
  • Circular Buffering
  • Rather than physically moving data in and out of
    a vector, a circular buffer is used.
  • By updating the index modulo the buffer length,
    the oldest entry can conveniently be replaced by
    the newest entry.
  • Bit Reverse Addressing
  • The bit pattern of a vector index is reversed.
  • Done automatically by the SHARC.
  • Required for the Fast Fourier Transform (FFT),
    which is often critical to DSP applications.
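Bit-reversed addressing can be sketched in C. The function names are hypothetical; on the SHARC the DAG applies the reversal as part of address generation, whereas here it is computed with a loop. An in-place radix-2 FFT needs its data in bit-reversed order at either the input or the output, which is why this permutation matters.

```c
/* Reverse the lowest `bits` bits of index i (bits = 3 for 8 points). */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);  /* move lowest bit of i into r */
        i >>= 1;
    }
    return r;
}

/* Reorder n = 2^bits samples into bit-reversed order, in place. */
void bit_reverse_permute(float *data, unsigned bits)
{
    unsigned n = 1u << bits;
    for (unsigned i = 0; i < n; i++) {
        unsigned j = bit_reverse(i, bits);
        if (j > i) {               /* swap each pair only once */
            float t = data[i];
            data[i] = data[j];
            data[j] = t;
        }
    }
}
```

For an 8-point FFT (`bits = 3`), index 1 (binary 001) maps to 4 (100) and index 3 (011) maps to 6 (110), so {0, 1, ..., 7} is reordered to {0, 4, 2, 6, 1, 5, 3, 7}.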

14
SHARC DSP
  • What Makes the SHARC unique?
  • It also has some features not directly related
    to optimizing numeric computations:
  • Pipelining
  • Handling Branches
  • Why has this not emerged sooner?
  • Only recently has technology made it economical
    to integrate these general computing features
    into a single device.

15
SHARC's Pipeline
  • 3 stages
  • Instruction Fetch
  • Decode
  • Execution
  • An instruction takes three clock cycles to
    propagate through the pipeline.
  • Because the stages overlap, the processor still
    executes one instruction per clock cycle, even
    though each instruction requires three clock
    cycles.
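The arithmetic behind one instruction per cycle can be checked with a small model. This sketch assumes an ideal three-stage pipeline with no stalls or branches; the function names are hypothetical.

```c
#include <stdio.h>

/* Cycles to run n instructions through an ideal 3-stage pipeline:
   3 cycles for the first instruction, then one completes per cycle. */
long pipeline_cycles(long n)
{
    return n <= 0 ? 0 : n + 2;
}

/* Print which instruction occupies each stage on each cycle. */
void print_pipeline(int n)
{
    const char *stage[3] = {"Fetch", "Decode", "Execute"};
    if (n <= 0)
        return;
    for (int cycle = 0; cycle < n + 2; cycle++) {
        printf("cycle %d:", cycle + 1);
        for (int s = 0; s < 3; s++) {
            int instr = cycle - s;   /* instruction in stage s now */
            if (instr >= 0 && instr < n)
                printf("  %s=I%d", stage[s], instr + 1);
        }
        printf("\n");
    }
}
```

For 100 instructions the model gives 102 cycles, so throughput approaches one instruction per cycle. `print_pipeline(3)` prints five lines; the third, `cycle 3:  Fetch=I3  Decode=I2  Execute=I1`, is the only cycle in which all three stages are busy.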

16
SHARC's Branch Handling: Delayed Branching
  • When a branch instruction is encountered, the
    two instructions following it, which have already
    been fetched and decoded, are executed before the
    branch takes effect.
  • This keeps the pipeline full and avoids
    discarding those two instructions and reloading
    the pipeline.
  • Beneficial in situations such as loops of only a
    few instructions, where the ratio of wasted clock
    cycles to instructions would otherwise be
    significant.

17
SHARC's Branch Handling: Non-delayed Branching
  • Traditional branching.
  • If the code cannot be reordered to make use of
    delayed branching, non-delayed branching saves
    space.
  • Uses only one word of storage.
  • However, it takes three cycles because the
    pipeline must be reloaded.
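The trade-off between the two branch types can be expressed as a cost model for one pass through a short loop that ends in a branch back to the top. It is a hypothetical sketch assuming one cycle per instruction, a 3-cycle non-delayed branch as on the slide, and two delay slots for a delayed branch; slots that cannot be filled with useful instructions hold NOPs, which cost both a cycle and a word of program memory.

```c
/* Hypothetical per-pass cost of a loop: `body` useful instructions
   followed by a branch back to the top. */
typedef struct {
    long cycles;  /* clock cycles per pass */
    long words;   /* program-memory words occupied by the loop */
} BranchCost;

/* Non-delayed: the two instructions fetched behind the branch are
   discarded and the pipeline is reloaded, so the branch costs 3
   cycles, but it occupies only one word. */
BranchCost nondelayed_branch(long body)
{
    BranchCost c = { body + 3, body + 1 };
    return c;
}

/* Delayed: the branch costs one cycle and the two instructions behind
   it still execute. `filled` (0 to 2, and at most `body`) of those
   slots hold useful instructions moved from the body; the rest hold
   NOPs, each costing one cycle and one word. */
BranchCost delayed_branch(long body, long filled)
{
    long nops = 2 - filled;
    BranchCost c = { body + 1 + nops, body + 1 + nops };
    return c;
}
```

For a two-instruction body, the non-delayed branch takes 5 cycles per pass, 2 of them wasted (40%). A delayed branch with both slots filled takes 3 cycles and no extra words; with neither slot filled it still takes 5 cycles and needs 2 extra words of NOPs, which is why the non-delayed form is the space-saving choice in that case.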

18
Multi-processing
  • The SHARC is uniquely equipped for
    multi-processing.
  • Its link ports provide very powerful
    multi-processing capabilities.
  • Two main programming models, depending on the
    application.
  • Adapts well to different multi-processing
    architectures.

19
Multi-processing: SHARC Links
  • The SHARC has 6 link ports that can transfer data
    at rates of up to 40 Mbytes/sec.
  • The links are designed for point-to-point
    connections.
  • Data can be transmitted in either direction, but
    not in both simultaneously.
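The link-port rate translates into transfer times with simple arithmetic. The sketch assumes that the 40 Mbytes/sec figure applies to each port, that 1 Mbyte means 10^6 bytes, and that a transfer can be split evenly over several ports; none of these assumptions is stated on the slide, and the function name is hypothetical.

```c
/* Time in microseconds to move `bytes` over `links` ports, each assumed
   to sustain `mbytes_per_sec` (10^6 bytes/sec = 1 byte per microsecond),
   with the data split evenly across the ports. */
double link_transfer_us(double bytes, double mbytes_per_sec, int links)
{
    double bytes_per_us = mbytes_per_sec * links;
    return bytes / bytes_per_us;
}
```

Moving 4000 bytes over one link at 40 Mbytes/sec takes 100 microseconds; if the same rate held on all six ports at once, 4800 bytes would take 20 microseconds.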

20
Multi-processing Program Model: MIMD
  • Multiple instruction, multiple data.
  • Good for applications that require multiple
    instruction threads to execute concurrently.
  • Processors operate individually.
  • Each processor executes different code.
  • Typically used for image reconstruction and
    multi-channel DSP.

21
Multi-processing Program Model: SIMD
  • Single instruction, multiple data.
  • Works best when all processors execute identical
    instruction sequences.
  • Requires no overhead for inter-processor
    synchronization.
  • Typically used for synthetic aperture radar and
    automatic target recognition.
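The difference between the two programming models can be sketched in C by simulating the processors sequentially, with a loop standing in for processors running concurrently. Everything here is hypothetical: four processors, the `Kernel` function-pointer type, and the two example kernels. No inter-processor communication or synchronization is modelled.

```c
#include <stddef.h>

#define NPROC 4  /* hypothetical number of processors */

typedef void (*Kernel)(float *data, size_t n);

/* Example kernels (hypothetical). */
static void scale2(float *d, size_t n) { for (size_t i = 0; i < n; i++) d[i] *= 2.0f; }
static void negate(float *d, size_t n) { for (size_t i = 0; i < n; i++) d[i] = -d[i]; }

/* SIMD style: every processor runs the same kernel on its own slice
   of one array (n assumed divisible by NPROC). */
void run_simd(Kernel k, float *data, size_t n)
{
    size_t chunk = n / NPROC;
    for (int p = 0; p < NPROC; p++)
        k(data + p * chunk, chunk);
}

/* MIMD style: each processor runs its own kernel on its own data. */
void run_mimd(const Kernel kernels[NPROC], float *data[NPROC],
              const size_t n[NPROC])
{
    for (int p = 0; p < NPROC; p++)
        kernels[p](data[p], n[p]);
}
```

In `run_simd` every simulated processor executes the same code on a different slice of one array; in `run_mimd` each executes different code on different data.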

22
Multi-processing Architectures: Cluster Design
  • Groups of up to 6 SHARCs in a cluster.
  • The most common way of joining multiple SHARCs.
  • All processors, global I/O, and global memory are
    connected to a common cluster bus.
  • Each SHARC can drive the bus.

23
Multi-processing Architectures: Mesh Design
  • All SHARCs are joined by their link ports and
    connected to a common bus.
  • In SIMD mode, a single master SHARC drives the
    bus.
  • In MIMD mode, the mesh architecture cannot
    function if the data is larger than the available
    on-chip memory.
  • Offers better scalability over a wider range of
    applications.

24
Summary of what makes the SHARC Super
  • It performs excellently for DSP applications.
  • Employs a Harvard architecture with very large
    on-chip memory.
  • Respectable megaflop rating.
  • Its multiprocessing capabilities.

25
How optimal is the SHARC for non-DSP Applications?
  • It is clearly geared toward DSP applications.
  • While it may fare better than other processors,
    it still lags behind those designed specifically
    for non-DSP applications.
