CPU And DSP - PowerPoint PPT Presentation

About This Presentation
Title:

CPU And DSP


Slides: 66
Provided by: QWX
Tags: cpu | dsp | processor


Transcript and Presenter's Notes


1
CPU And DSP
  • ???

2
First Problem
  • Purpose Processor
    • ARM, MIPS and others
  • Configurable Processor
    • Tensilica, ARC and others
  • CPU with configurable function
    • ARM, MIPS
  • DSP
    • ZSP, CEVA, Starcore
  • SIMD, VLIW, Superscalar

3
Superscalar vs. VLIW
  • Goal: CPI < 1
  • Superscalar
  • ???????????????(1 -8?)
  • ???????????
  • VLIW
  • ????????????(??)?? (4-16)
  • ??????,???????????????????

4
Superscalar vs. VLIW
  • ?????
  • ???????
  • ?????
  • ??????????????
  • ????????
  • ??????,??????????????????????

5
Analysis
  • Instructions
    • MUL/MAC
    • Load/Store
    • Acceleration instructions
  • Accesses
    • Registers (number, width, group)
    • Addressing mode
    • Memory accesses (number, width, group)
  • Other acceleration
    • SIMD
    • VLIW
    • Superscalar

6
Analysis
  • Base
  • Structure
  • Pipeline
  • Detail

7
Tensilica
  • Xtensa
  • Xtensa Core Vectra DSP Xtensa
  • Xtensa is a highly configurable RISC CPU
  • Diamond is based on Xtensa
  • Tensilica's processor series are based on Diamond
  • Most Tensilica CPUs have SIMD and VLIW
    instructions

8
Xtensa LX architecture
9
Xtensa Pipeline
10
Vectra
  • The Vectra LX engine is a three-slot FLIX
    machine.
  • 210 instructions in the Vectra LX engine
  • Vector load/store operations
  • Real and complex vector operations
  • Circular buffer management
  • Vector element select operations

11
Vectra VLIW
  • SIMD VLIW

12
Vectra SIMD
  • Vectra LX engine supports 128-bit loads and
    stores
  • Because a 128-bit interface to memory is
    required, it is necessary to configure the Xtensa
    core with a processor interface (PIF) width of
    128 when using the Vectra LX DSP engine.
  • The Vectra LX engine includes four 128-bit
    Alignment registers and four 32-bit specialized
    Select registers. It also includes the following
    four special state registers:
    • 5-bit Variable Shift Amount (VSAR)
    • 40-bit rounding register (ROUND)
    • 32-bit circular buffer support begin address
      (CBEGIN)
    • 32-bit circular buffer support end address
      (CEND)
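
The CBEGIN and CEND registers bound a circular buffer in memory. A minimal Python sketch of the wrap-around pointer update that such a register pair makes possible is given here. It assumes that CEND is exclusive and that the step is smaller than the buffer length; the function name is hypothetical and this is not the Vectra LX instruction semantics.

```python
def circular_advance(ptr: int, step: int, cbegin: int, cend: int) -> int:
    """Advance ptr by step bytes, wrapping inside [cbegin, cend).

    Assumes abs(step) < (cend - cbegin), so one correction is enough.
    """
    length = cend - cbegin
    if length <= 0:
        raise ValueError("CEND must be greater than CBEGIN")
    nxt = ptr + step
    if nxt >= cend:        # ran off the end: wrap to the start
        nxt -= length
    elif nxt < cbegin:     # ran off the start: wrap to the end
        nxt += length
    return nxt


if __name__ == "__main__":
    # 64-byte buffer at 0x1000, 16-byte (128-bit) vector steps
    p = 0x1000
    for _ in range(5):
        print(hex(p))
        p = circular_advance(p, 16, 0x1000, 0x1040)
```

Doing this wrap in hardware removes the compare-and-branch from filter and delay-line loops.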

13
570T
Diamond 570T Block Diagram
14
Core architecture
  • 5-stage pipeline with 64/24/16-bit ISA
  • 3-issue VLIW also operates as a 2-issue VLIW
  • Base ALU, barrel shifter
  • 32x32 integer single-cycle multiplier
  • 32-bit integer divider
  • Miscellaneous (NSA, MIN/MAX, SignExtend)
    instructions
  • 32 entry AR register file
  • Integrated interrupt controller with 22
    interrupts with 6 priority levels
  • Integrated timer with 3 timers

15
Memory architecture
  • 16K, 2-way associative instruction cache
  • 16K, 2-way associative write-back data cache
  • Single local, tightly-coupled instruction RAM
    (can be populated from 0 to 128KB)
  • Single local, tightly-coupled data RAM (can be
    populated from 0 to 128KB)
  • 64-bit XLMI general purpose memory and device
    interface

16
System
  • 64-bit PIF system interface with 8 write buffer
    entries
  • Inbound DMA on PIF
  • PIF to AHB-lite and PIF to AXI bridges available
  • On-chip debug (OCD) with JTAG interface
  • Trace port (version 3.0) with PC trace only

18
Slot Instructions
19
Slot Instructions
20
330HIFI
21
HiFi 2 Audio Engine
22
330HiFi Slot Instructions
23
HiFi2 ISA
  • Register
  • Instruction
    • Load and store
    • MUL and MAC
    • General arithmetic instructions
    • General logic instructions
    • Other instructions

24
HiFi2 Load/Store
  • Four types of Addressing Mode
  • Can Load/Store a pair of 16-bit or 24-bit vector
    data

25
HiFi2 Arithmetic Instructions
  • 24x24 or 16x16 single MUL/MAC
  • 24x24 dual MUL/MAC
  • 32x16 single or dual MUL/MAC
  • Add, subtract, advanced compare
  • (All are scalar; most are 24-bit; some can have
    Q)
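
A dual MUL/MAC adds two products to an accumulator in a single operation. The Python sketch below illustrates the arithmetic for signed 24x24 operands. The 56-bit accumulator width, the saturating behavior, and all function names are assumptions made for illustration; they are not taken from the slides.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saturate(value: int, bits: int) -> int:
    """Clamp value to the signed range of a `bits`-wide register."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


def dual_mac(acc, a_pair, b_pair, operand_bits=24, acc_bits=56):
    """acc + a0*b0 + a1*b1 with saturation (acc_bits is an assumed width)."""
    total = acc
    for a, b in zip(a_pair, b_pair):
        total += to_signed(a, operand_bits) * to_signed(b, operand_bits)
    return saturate(total, acc_bits)


if __name__ == "__main__":
    print(dual_mac(0, (2, 3), (4, 5)))         # 2*4 + 3*5
    print(dual_mac(10, (0xFFFFFF, 1), (1, 1)))  # 0xFFFFFF is -1 in 24 bits
```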

26
Bit Stream and Variable-Length Encode and Decode
Instructions
  • 32-bit Variable-Length Decode Table Entry
  • 16-bit Variable-Length Decode Table Entry

27
Other Instructions
  • Normalize Shift Amount Operations
  • Truncate, Round, Saturate, Convert, and Move
    Operations
  • Selection and Permutation Operations
  • Bitwise Logical Operations
  • Zero Operations
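
As an illustration of the normalize shift amount operations listed above, the following Python sketch computes how far a signed value can be shifted left before its sign bit would change, which is the count of redundant sign bits. It assumes the usual definition for a 32-bit two's-complement value and is not the HiFi 2 instruction definition; the function name is hypothetical.

```python
def normalize_shift_amount(value: int, bits: int = 32) -> int:
    """Number of redundant sign bits in a `bits`-wide signed value."""
    value &= (1 << bits) - 1
    sign = value >> (bits - 1)
    count = 0
    # Walk down from the bit just below the sign bit while it still
    # matches the sign bit.
    for pos in range(bits - 2, -1, -1):
        if (value >> pos) & 1 != sign:
            break
        count += 1
    return count


if __name__ == "__main__":
    for v in (1, 0x40000000, -2, 0, -1):
        print(v, normalize_shift_amount(v))
```

Shifting a value left by this amount gives the largest magnitude that still fits the register, which is the usual first step in block floating-point scaling.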

28
HiFi2 Instructions Sample
29
330HiFi architecture
  • 5-stage pipeline with 64/24/16-bit ISA
  • 2-issue VLIW
  • Base ALU, barrel shifter
  • 32x32 integer single-cycle multiplier
  • 32-bit integer divider
  • Miscellaneous (NSA, MIN/MAX, SignExtend)
    instructions
  • 32 entry AR register file
  • Integrated interrupt controller with 22
    interrupts with 6 priority levels
  • Integrated timer with 3 timers

30
Memory architecture
  • 4K, 2-way associative instruction cache
  • 8K, 2-way associative write-back data cache
  • Single local, tightly-coupled instruction RAM
    (can be populated from 0 to 128KB)
  • Dual local, tightly-coupled instruction RAM (can
    be populated from 0 to 128KB)
  • ??64bit load

32
ZSP500
  • The ZSP500 core is a four-way superscalar, dual
    MAC digital signal processor.
  • RISC-based superscalar architecture: the ZSP500
    architecture can prefetch eight instruction words
    each clock cycle. Running at 250 MHz, a
    ZSP500-based device can execute up to 1000 MIPS.
  • The ZSP500 core has three main types of
    registers:
    • operand registers
    • address registers
    • control registers
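
The 1000 MIPS figure follows from the four-way issue width and the 250 MHz clock. A small Python sketch of that arithmetic is given here; it is a peak figure that assumes every cycle issues a full group of four instructions, and the function name is hypothetical.

```python
def peak_mips(issue_width: int, clock_mhz: float) -> float:
    """Peak millions of instructions per second, assuming every cycle
    issues `issue_width` instructions with no stalls."""
    return issue_width * clock_mhz


if __name__ == "__main__":
    print(peak_mips(4, 250))   # four-way superscalar at 250 MHz
```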

33
ZSP500
35
General Purpose (ORF) Registers and Guard
Registers
  • The ORF has sixteen 16-bit registers, r0-r15. The
    ORF registers can be used in pairs to form eight
    32-bit operands, as shown in Figure 2.5. The odd
    numbered register in a register pair is the upper
    half of the 32-bit word. ORF register pairs can
    be extended to 40 bits by using an 8-bit guard
    register.
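
The register pairing described above can be sketched in Python as follows. The odd register supplies the upper 16 bits, as the slide states. Placing the guard register in the top 8 bits and reading the 40-bit result as a signed value are assumptions, and the function names are hypothetical.

```python
def pair_to_word(odd_reg: int, even_reg: int) -> int:
    """Combine a register pair into one 32-bit operand.
    The odd-numbered register is the upper half."""
    return ((odd_reg & 0xFFFF) << 16) | (even_reg & 0xFFFF)


def extend_with_guard(guard: int, odd_reg: int, even_reg: int) -> int:
    """Form a signed 40-bit value from an 8-bit guard register and a
    register pair (guard assumed to hold bits 39..32)."""
    raw = ((guard & 0xFF) << 32) | pair_to_word(odd_reg, even_reg)
    return raw - (1 << 40) if raw & (1 << 39) else raw


if __name__ == "__main__":
    print(hex(pair_to_word(0x1234, 0x5678)))
    print(extend_with_guard(0xFF, 0xFFFF, 0xFFFF))
```

The guard bits give headroom so that a long run of accumulations can exceed 32 bits without overflowing.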

36
Address Registers
  • The ARF contains eight 32-bit address registers,
    a0-a7. Each of these registers has an associated
    16-bit index register, n0-n7, shown in Figure
    2.6. These registers are intended for data
    pointers, but they can also be used for temporary
    storage. Although the ZSP500 address range is 24
    bits, the additional bits in the a0-a7 registers
    allow for compatibility with future 32-bit ZSP
    cores.

37
Instruction Grouping
  • A group, consisting of up to four instructions,
    can be passed from the GR stage to the RD stage
    in one processor cycle
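
To illustrate the idea of grouping, the Python sketch below packs a linear instruction stream into groups of up to four. The grouping rule used here, starting a new group when an instruction reads or rewrites a register written earlier in the same group, is a simplified assumption and not the ZSP500's actual grouping rules. The instruction representation is hypothetical.

```python
def form_groups(instructions, max_group=4):
    """Greedily pack (dest, sources) instructions into issue groups.

    A group is closed when it is full or when the next instruction
    depends on a register written inside the current group.
    """
    groups, current, written = [], [], set()
    for dest, sources in instructions:
        depends = dest in written or any(s in written for s in sources)
        if current and (len(current) == max_group or depends):
            groups.append(current)
            current, written = [], set()
        current.append((dest, sources))
        written.add(dest)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    program = [
        ("r0", ("r1", "r2")),
        ("r3", ("r4",)),
        ("r5", ("r0",)),   # reads r0, so it starts a new group
        ("r6", ()),
        ("r7", ()),
        ("r8", ()),
        ("r9", ()),        # fifth instruction, so it starts a new group
    ]
    for group in form_groups(program):
        print([dest for dest, _ in group])
```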

38
CEVA TeakliteIII
39
Parallel Instructions
  • CBU and DAAU
  • Logic/Arithmetic operation and load/store
  • DAAU
  • Parallel load/store
  • CBU

40
Pipeline
  • The core has a 10-stage pipeline, arranged in
    four groups. The stages are:
    • IF: Instruction Fetch, which is divided into
      two substages, IF1 and IF2
    • D: Decode, which is divided into two
      substages, D1 and D2
    • OF: Operand Fetch, which is divided into three
      substages, OF1, OF2 and OF3
    • E: Execute, which is divided into three
      substages, E1, E2 and E3
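
The four groups and their substages add up to the ten stages. The Python sketch below expands them and gives the cycle count for an ideal, stall-free pipeline; the stall-free assumption and the function names are illustrative and do not describe the core's real timing.

```python
# Substage counts per group, as listed on the slide.
PIPELINE_GROUPS = {"IF": 2, "D": 2, "OF": 3, "E": 3}


def stage_names(groups=PIPELINE_GROUPS):
    """Expand the groups into the ordered list of substage names."""
    return [f"{name}{i}" for name, count in groups.items()
            for i in range(1, count + 1)]


def ideal_cycles(n_instructions: int, groups=PIPELINE_GROUPS) -> int:
    """Cycles to finish n instructions issued one per cycle with no
    stalls: fill the pipeline once, then retire one per cycle."""
    depth = sum(groups.values())
    return depth + n_instructions - 1


if __name__ == "__main__":
    print(stage_names())
    print(ideal_cycles(100))
```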

41
Instruction Flow
  • One-Stage CBU Pipeline Flow
  • Three-Stage CBU Pipeline Flow

42
Instruction Flow
  • Scalar Unit Instruction Flow
  • Load Instruction Flow

43
PCU
44
DAAU
  • 8x32 General registers
  • 5x16 Special registers
  • 1x32 SP,2x32 Configuration Registers
  • 2 AGUs

45
AGU
  • Two AGUs
  • Address generation mechanism
  • Pointer modification mechanism
  • Addressing Mode
    • Indirect Addressing Mode
    • Indexed Addressing Mode
    • Direct Addressing Mode
    • Stack Addressing Mode
  • Memory Access
  • ?????

46
CBU
  • Four 36-bit accumulators
  • Each 36-bit accumulator is organized as two
    16-bit parts (marked as abXh and abXl) and four
    extension bits (marked as abXe). The extension
    bits are also referred to as guard bits.
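
The accumulator layout can be sketched in Python as below: four extension bits on top, then the high and low 16-bit parts. The field order (extension bits most significant) is an assumption consistent with their role as guard bits, and the function names are hypothetical.

```python
def split_accumulator(acc: int) -> dict:
    """Split a 36-bit accumulator into extension, high, and low parts."""
    acc &= (1 << 36) - 1
    return {
        "e": (acc >> 32) & 0xF,      # abXe: 4 extension (guard) bits
        "h": (acc >> 16) & 0xFFFF,   # abXh: high 16 bits
        "l": acc & 0xFFFF,           # abXl: low 16 bits
    }


def join_accumulator(e: int, h: int, l: int) -> int:
    """Rebuild the 36-bit accumulator value from its three parts."""
    return ((e & 0xF) << 32) | ((h & 0xFFFF) << 16) | (l & 0xFFFF)


if __name__ == "__main__":
    parts = split_accumulator(0xA12345678)
    print({k: hex(v) for k, v in parts.items()})
    print(hex(join_accumulator(**parts)))
```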

47
MUL MAC
48
Starcore
49
Other Acceleration
  • Exponent Acceleration (EXP)
  • Saturation Acceleration
  • Division Acceleration
  • FFT Acceleration
  • Viterbi Decoder Acceleration

50
Architecture
  • Arithmetic Unit
  • Data arithmetic and logic unit (DALU)
  • Address generation unit (AGU)
  • Program sequencer unit (PSEQ)
  • Data I/O Unit
  • Two data memory buses (address and data pairs
    XABA and XDBA, XABB and XDBB), with 128 bits of
    data width and 64 bits of address.
  • Program data and address buses (PDB and PAB) for
    carrying program words from the memory to the
    core.
  • Special buses to support tightly coupled external
    user-definable instruction set accelerators.

51
Pipeline
  • Pre-fetch stage
  • Fetch stage
  • Dispatch stage
  • Address generation stage
  • Execution stage

52
  • The first three stages are implemented in the
    program sequencer unit (PSEQ). The last two
    stages are implemented in the AGU and DALU,
    respectively.
  • To support parallel execution, the core uses a
    variable length execution set (VLES) architecture
    with a static grouping mechanism. Several
    instructions can be grouped together to form an
    execution set, which is dispatched to the
    execution units in parallel. The core contains
    four ALUs and two AAUs. An execution set can
    contain up to six instructions with a maximum of
    eight words. For many instructions, an execution
    set takes only one clock cycle.
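
The limits on an execution set can be expressed as a simple check. The Python sketch below uses the figures from the slide (up to six instructions, at most eight words, four ALUs and two AAUs). Treating the unit counts as per-set limits, and the instruction representation, are assumptions; the real grouping rules are more detailed.

```python
from collections import Counter

MAX_INSTRUCTIONS = 6                 # instructions per execution set
MAX_WORDS = 8                        # total words per execution set
UNIT_LIMITS = {"ALU": 4, "AAU": 2}   # assumed per-set limit per unit type


def is_valid_execution_set(instructions) -> bool:
    """instructions: list of (unit, word_count) tuples."""
    if len(instructions) > MAX_INSTRUCTIONS:
        return False
    if sum(words for _, words in instructions) > MAX_WORDS:
        return False
    used = Counter(unit for unit, _ in instructions)
    return all(used[unit] <= limit for unit, limit in UNIT_LIMITS.items())


if __name__ == "__main__":
    full_set = [("ALU", 1)] * 4 + [("AAU", 1)] * 2
    print(is_valid_execution_set(full_set))
    print(is_valid_execution_set([("ALU", 1)] * 5))
```

Because the grouping is static, a check of this kind belongs in the assembler or compiler rather than in the hardware.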

53
DALU
54
  • A register file of sixteen 40-bit registers, with
    Arithmetic Saturation Mode.
  • Four parallel ALUs, each containing a MAC unit
    and a BFU with a 40-bit barrel shifter
  • Eight data bus shifter/limiters that allow
    scaling and limiting of up to four 32-bit
    operands transferred over each of the XDBA and
    XDBB buses in a single cycle

55
Bit-Field Unit (BFU)
  • Multi-bit left/right shift (arithmetic or
    logical)
  • One-bit rotate (right or left)
  • Bit-field insert and extract
  • Count leading bits (ones or zeros)
  • Logical operations
  • Sign or zero extension operations
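
Three of the BFU operations above can be sketched in Python as follows. The 40-bit default width matches the DALU register width given earlier; the function names and argument order are hypothetical and do not correspond to the instruction mnemonics.

```python
def extract_field(value: int, offset: int, width: int) -> int:
    """Return `width` bits of value starting at bit `offset`."""
    return (value >> offset) & ((1 << width) - 1)


def insert_field(value: int, field: int, offset: int, width: int) -> int:
    """Replace `width` bits of value at bit `offset` with field."""
    mask = ((1 << width) - 1) << offset
    return (value & ~mask) | ((field << offset) & mask)


def count_leading(value: int, bit: int, bits: int = 40) -> int:
    """Count leading bits equal to `bit` (0 or 1) in a `bits`-wide word."""
    value &= (1 << bits) - 1
    count = 0
    for pos in range(bits - 1, -1, -1):
        if (value >> pos) & 1 != bit:
            break
        count += 1
    return count


if __name__ == "__main__":
    print(hex(extract_field(0xABCD, 4, 8)))
    print(hex(insert_field(0xABCD, 0x12, 4, 8)))
    print(count_leading(1, 0))
```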

56
AGU
57
AGU Register
  • Eight low bank address registers (R0-R7)
  • Eight high bank address registers (R8-R15), or
    alternatively, eight base address registers
    (B0-B7)
  • Two stack pointers (NSP, ESP), only one of
    which is active at a time (SP)
  • Four offset registers (N0-N3)
  • Four modifier registers (M0-M3)
  • A modifier control register (MCTL)
  • Two address arithmetic units (AAU)
  • One bit mask unit (BMU)

58
Variable Length Execution Set (VLES)
59
MIPS24K
60
Summary
  • 330HiFi
    • 32 registers, 3 MUL/MAC, SIMD, VLIW, Advanced
      Instructions, 32 bus cache
  • ZSP500
    • 168 registers, 2 MUL/MAC, SIMD, SuperScalar, 4
      instructions group, 128-bit address bus, 64x2 data
      bus TCM
  • TeakliteIII
    • 88 registers, 2 MUL/MAC, SIMD, SuperScalar, 23
      instructions group, 32x2 address bus, 32x2 data
      bus TCM
  • Sc140e
    • 16844 registers, 4 MUL/MAC, SIMD, SuperScalar
  • MIPS24K
    • RISC

61
BDTI Mark
62
BDTI Mark
63
Xtensa Features
64
Scalar Unit (SC)
65
Multiply-Accumulate (MAC) Unit