Hitachi SuperH SH-4 - PowerPoint PPT Presentation

Slides: 29
Provided by: Her19
Transcript and Presenter's Notes

Title: Hitachi SuperH SH-4


1
Hitachi SuperH SH-4
  • By Herman Sheremetyev
  • 5/10/2002

2
Inspiration
I chose the Hitachi SH-4 for this presentation
because it is the processor used in the Sega
Dreamcast video game system. I own a Dreamcast,
and after being assigned this project I became
very interested in its internal workings. During
my research I found that quite a bit of software
has been ported to this platform, starting with a
NetBSD port and followed by a Linux port that can
turn the Dreamcast into a usable X terminal.
These ports were possible largely because Hitachi
released the complete specifications as well as a
Programmer's Manual for the processor. What
follows are excerpts from the Hitachi Hardware
Manual that briefly describe the SH-4's most
interesting aspects, loosely tailored to the
Dreamcast implementation.
3
Sources
  • Most of the information in this presentation is
    taken from the Hitachi Hardware Manual on the
    SH-4 family of processors
  • The manual can be found at
    http://www.julesdcdev.com/ and probably on the
    Hitachi website

4
Features Summary
  • The SH-4 (SH7750 Series (SH7750, SH7750S)) has
    been developed as the top-end model in the
    SuperH RISC engine family, featuring a 128-bit
    graphic engine for multimedia applications and
    360 MIPS performance.

5
Features
  • In addition to single- and double-precision
    floating-point operation capability, the on-chip
    FPU has a 128-bit graphic engine that enables
    32-bit floating-point data to be processed 128
    bits at a time.
  • It also supports 4 × 4 array operations and inner
    product operations, enabling a performance of 1.4
    GFLOPS to be achieved.

6
Features
  • Operating frequency is 200 MHz
  • A superscalar architecture is employed that
    enables simultaneous execution of two
    instructions (including FPU instructions)
  • An 8-kbyte instruction cache and 16-kbyte data
    cache are also provided, and the on-chip memory
    management unit (MMU) handles translation from
    the 4-Gbyte virtual address space to the physical
    address space.

7
Registers
  • Sixteen 32-bit general registers (and eight
    32-bit shadow registers)
  • Seven 32-bit control registers
  • Four 32-bit system registers
  • Register operands are always longwords (32 bits).
    When a memory operand is only a byte (8 bits) or
    a word (16 bits), it is sign-extended into a
    longword when loaded into a register.

8
Data Formats in Memory
  • Memory data formats are classified into bytes,
    words, and longwords. Memory can be accessed in
    8-bit byte, 16-bit word, or 32-bit longword form.
    A memory operand less than 32 bits in length is
    sign-extended before being loaded into a
    register.
  • A word operand must be accessed starting from a
    word boundary (an address that is a multiple of
    2 bytes: address 2n), and a longword operand
    starting from a longword boundary (an address
    that is a multiple of 4 bytes: address 4n). An
    address error will result if this rule is not
    observed.
  • A byte operand can be accessed from any address.

9
Endianness
  • Big endian or little endian byte order can be
    selected for the data format. Big endian is the
    preferred method of operation.
  • The endianness is selected at power-on reset and
    cannot be changed dynamically.
  • Bit positions are numbered left to right from
    most-significant to least-significant. Thus, in a
    32-bit longword, the leftmost bit, bit 31, is the
    most significant bit and the rightmost bit, bit
    0, is the least significant bit.
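
As a small illustration of the two byte orders and the bit numbering described above, the following Python sketch lays out one longword both ways using the standard `struct` module. The value 0x12345678 is an arbitrary example.

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)     # most significant byte at the lowest address
little = struct.pack("<I", value)  # least significant byte at the lowest address

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# Bit 31 is the most significant bit, bit 0 the least significant.
msb = (value >> 31) & 1
lsb = value & 1
print(msb, lsb)      # 0 0
```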

10
Operand and Instruction Caches
  • The operand cache consists of 512 cache lines,
    each composed of a 19-bit tag, validity bit (V),
    dirty bit (U), and 32-byte data.
  • The instruction cache consists of 256 cache
    lines, each composed of a 19-bit tag, validity
    bit (V), and 32-byte data (16 instructions).
  • (Tag: stores the upper 19 bits of the 29-bit
    external memory address of the data line to be
    cached.)
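
The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes only what the slides state (line counts, 32-byte lines, 16-bit instructions, a 19-bit tag taken from the top of a 29-bit address); `tag_of` is a hypothetical helper and the sample address is arbitrary.

```python
LINE_BYTES = 32
INSTRUCTION_BYTES = 2  # 16-bit fixed-length instructions

operand_cache_bytes = 512 * LINE_BYTES      # 16 kbytes of data
instruction_cache_bytes = 256 * LINE_BYTES  # 8 kbytes of instructions
instructions_per_line = LINE_BYTES // INSTRUCTION_BYTES


def tag_of(address):
    """Upper 19 bits of the 29-bit external address (bits 28 to 10)."""
    return (address & 0x1FFFFFFF) >> 10


print(operand_cache_bytes, instruction_cache_bytes)  # 16384 8192
print(instructions_per_line)                         # 16
print(hex(tag_of(0x0C000400)))                       # 0x30001
```

The totals match the 16-kbyte data cache and 8-kbyte instruction cache quoted on the Features slide.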

11
Cache-Memory coherence
  • Coherency between cache and external memory
    should be assured by software.
  • Several cache operations instructions are
    provided, including a prefetch instruction

12
Cache operations (operand cache only)
  • Invalidate instruction: OCBI @Rn. Cache
    invalidation (no write-back)
  • Purge instruction: OCBP @Rn. Cache invalidation
    (with write-back)
  • Write-back instruction: OCBWB @Rn. Cache
    write-back
  • Allocate instruction: MOVCA.L R0,@Rn. Cache
    allocation
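
The difference between invalidate, purge, and write-back can be illustrated with a toy model of a single cache line. This is a simplified sketch under stated assumptions: one line, a dictionary standing in for external memory, and Python functions named after the mnemonics. It does not model the real hardware, and MOVCA.L is left out.

```python
class CacheLine:
    def __init__(self, data, valid=True, dirty=False):
        self.data = data
        self.valid = valid  # V bit
        self.dirty = dirty  # U bit: line differs from external memory


def ocbi(line, memory, addr):
    """Invalidate: drop the line without writing it back."""
    line.valid = False


def ocbwb(line, memory, addr):
    """Write-back: copy a dirty line to memory; the line stays valid."""
    if line.valid and line.dirty:
        memory[addr] = line.data
        line.dirty = False


def ocbp(line, memory, addr):
    """Purge: write back if dirty, then invalidate."""
    ocbwb(line, memory, addr)
    line.valid = False


memory = {0x100: "old"}
line = CacheLine("new", dirty=True)
ocbp(line, memory, 0x100)
print(memory[0x100], line.valid)  # new False
```

In this model, using `ocbi` on a dirty line would lose the "new" data, which is why software has to pick the operation that matches its coherency needs.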

13
Floating Point Unit (FPU)
  • Conforms to IEEE754 standard
  • 32 single-precision floating-point registers (can
    also be referenced as 16 double-precision
    registers)
  • Two rounding modes: Round to Nearest and Round to
    Zero
  • Two denormalization modes: Flush to Zero and
    Treat Denormalized Number
  • Six exception sources: FPU Error, Invalid
    Operation, Divide By Zero, Overflow, Underflow,
    and Inexact
  • Comprehensive instructions: single-precision,
    double-precision, graphics support, system
    control

14
FPU Data Formats
  • A floating-point number consists of the following
    three fields:
  • Sign (s)
  • Exponent (e)
  • Fraction (f)
  • 32-bit Single-Precision (s: 1 bit, e: 8 bits, f:
    23 bits)
  • 64-bit Double-Precision (s: 1 bit, e: 11 bits, f:
    52 bits)
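
To make the field layout concrete, here is a short Python sketch that splits a number into its sign, exponent, and fraction fields using the standard `struct` module. The helper names are made up for this example; the bit widths are the ones listed above.

```python
import struct


def single_fields(x):
    """Return (sign, exponent, fraction) of x as a 32-bit single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def double_fields(x):
    """Return (sign, exponent, fraction) of x as a 64-bit double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


print(single_fields(-1.5))  # (1, 127, 4194304)
print(double_fields(-1.5))  # (1, 1023, 2251799813685248)
```

The exponent is stored with a bias (127 for single, 1023 for double), so -1.5, which is -1.1 in binary times 2 to the power 0, shows the bias itself in the exponent field.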

15
FPU Rounding
  • Round to Nearest: The value is rounded to the
    nearest expressible value. If the unrounded
    value is $2^{E_{max}}(2 - 2^{-P})$ or more, the
    result will be infinity with the same sign as
    the unrounded value.
  • Round to Zero: The digits below the round bit of
    the unrounded value are discarded. If the
    unrounded value is larger than the maximum
    expressible absolute value, the value will be the
    maximum expressible absolute value.
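
The Round to Nearest overflow threshold, 2^Emax × (2 − 2^−P), can be evaluated numerically. The sketch below assumes the standard IEEE 754 single-precision parameters Emax = 127 and P = 24, which are not given on the slide, and uses hypothetical helper names.

```python
import struct


def overflow_threshold(emax, p):
    """Smallest magnitude that Round to Nearest turns into infinity."""
    return 2.0 ** emax * (2.0 - 2.0 ** -p)


def largest_finite(emax, p):
    """Largest finite value: all P significand bits set."""
    return 2.0 ** emax * (2.0 - 2.0 ** -(p - 1))


single_threshold = overflow_threshold(127, 24)
single_max = largest_finite(127, 24)

# Bit pattern 0x7F7FFFFF is the largest finite single-precision number.
from_bits = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]
print(single_max == from_bits)                             # True
print(single_threshold == (single_max + 2.0 ** 128) / 2)   # True
```

The threshold sits exactly halfway between the largest finite value and 2^128, the next value the format would need but cannot represent. Round to Zero, by contrast, clamps to `single_max` instead of producing infinity.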

16
FPU Graphics Support
  • The SH7750 Series supports two kinds of graphics
    functions:
  • Instructions for geometric operations
  • Pair single-precision transfer instructions that
    enable high-speed data transfer

17
FPU Geometric functions
  • Geometric operation instructions perform
    approximate-value computations. To enable
    high-speed computation with a minimum of
    hardware, the SH7750 Series ignores comparatively
    small values in the partial computation results
    of four multiplications.
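
The "four multiplications" refer to a 4-element inner product, which is also the building block of a 4 × 4 matrix transform. The Python sketch below shows the mathematics only: it computes exact results in ordinary Python floats, whereas the hardware works in approximate single precision. The function names are made up; to my recollection the corresponding SH-4 instructions are FIPR and FTRV, but that is not stated in the slides. The peak-rate line assumes one inner product can be issued per cycle at 200 MHz.

```python
def inner_product(a, b):
    """4-element dot product: four multiplications and three additions."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def transform(matrix, vector):
    """4x4 matrix times 4-vector, computed as four inner products."""
    return [inner_product(row, vector) for row in matrix]


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(inner_product([1, 2, 3, 4], [5, 6, 7, 8]))  # 70
print(transform(identity, [1, 2, 3, 4]))          # [1, 2, 3, 4]

# Assumed peak rate: 7 floating-point operations per cycle at 200 MHz.
flops_per_inner_product = 4 + 3
peak_flops = flops_per_inner_product * 200_000_000
print(peak_flops)                                 # 1400000000
```

Under that assumption the result is consistent with the 1.4 GFLOPS figure quoted on the Features slide.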

18
FPU Pair Single-Precision Data Transfer
  • In addition to the geometric operation
    instructions, the SH7750 Series also supports
    high-speed data transfer instructions.
  • These instructions enable two single-precision (2
    × 32-bit) data items to be transferred at once;
    that is, the transfer performance of these
    instructions is doubled.

19
Instruction Format
  • The instruction set is implemented with 16-bit
    fixed-length instructions.
  • Operations are basically executed using
    registers.
  • Except for bit-manipulation operations such as
    logical AND that are executed directly in memory,
    operands in an operation that requires memory
    access are loaded into registers and the
    operation is executed between the registers.

20
Instruction Format (contd)
  • Delayed Branches: Except for the two branch
    instructions BF and BT, branch instructions and
    RTE are delayed branches. (In a delayed branch,
    the instruction following the branch is executed
    before the branch destination instruction.)
  • Constant Values: An 8-bit constant value can be
    specified by the instruction code and an
    immediate value. 16-bit and 32-bit constant
    values can be defined as literal constant values
    in memory.
21
Addressing Modes
  • Register direct
  • Register indirect (supports post-increment and
    pre-decrement as well as displacement)
  • Indexed register indirect, i.e. the effective
    address is the sum of register Rn and R0
    contents.
  • Immediate
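
The effective-address calculations behind these modes can be sketched with a toy register file and memory. This is a minimal illustration, assuming a list of 16 registers and a dictionary as memory; the function names are made up, and the displacement is passed in bytes for simplicity.

```python
def load_post_increment(regs, mem, rm, size):
    """@Rm+ : read at Rm, then advance Rm by the operand size."""
    value = mem[regs[rm]]
    regs[rm] += size
    return value


def store_pre_decrement(regs, mem, rn, value, size):
    """@-Rn : lower Rn by the operand size, then write at Rn."""
    regs[rn] -= size
    mem[regs[rn]] = value


def ea_displacement(regs, rn, disp_bytes):
    """@(disp,Rn) : register contents plus a displacement."""
    return (regs[rn] + disp_bytes) & 0xFFFFFFFF


def ea_indexed(regs, rn):
    """@(R0,Rn) : sum of the contents of R0 and Rn."""
    return (regs[0] + regs[rn]) & 0xFFFFFFFF


regs = [0] * 16
regs[0], regs[4], regs[15] = 0x10, 0x1000, 0x2000
mem = {0x1000: 7}

print(load_post_increment(regs, mem, 4, 4), hex(regs[4]))  # 7 0x1004
store_pre_decrement(regs, mem, 15, 99, 4)
print(hex(regs[15]), mem[0x1FFC])                          # 0x1ffc 99
print(hex(ea_indexed(regs, 4)))                            # 0x1014
```

Pre-decrement stores followed by post-increment loads give a natural push and pop pair, which is why R15 is used here as a stack pointer in the example.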

22
Instruction Set
  • Over 100 different instructions including FP,
    mostly variations on MOV, ADD, etc. to
    accommodate different addressing modes.
  • Instruction mnemonic format: OP.Sz SRC,DEST
  • OP: Operation code
  • Sz: Size
  • SRC: Source
  • DEST: Source and/or destination operand

23
Instruction Level Parallelism
  • The SH7750 Series is a 2-ILP (instruction-level
    parallelism) superscalar pipelining
    microprocessor.
  • Instruction execution is pipelined, and two
    instructions can be executed in parallel.
  • Parallel execution depends on the instructions:
    not all instructions can be executed in parallel
    with all others.

24
Pipelining
  • The instruction pipeline has 5 stages:
  • Instruction fetch (I)
  • Decode and register read (D)
  • Execution (EX/SX/F0/F1/F2/F3)
  • Data access (NA/MA)
  • Write-back (S/FS)
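
A rough feel for what a 5-stage, two-way pipeline buys can be had from an idealized cycle count. The sketch below assumes no stalls, no branches, and that every pair of instructions can be issued together, which the real chip does not guarantee; `ideal_cycles` is a hypothetical helper.

```python
import math

STAGES = 5
ISSUE_WIDTH = 2


def ideal_cycles(n_instructions, issue_width=ISSUE_WIDTH, stages=STAGES):
    """Best-case cycles to finish n instructions in a full pipeline."""
    if n_instructions == 0:
        return 0
    issue_groups = math.ceil(n_instructions / issue_width)
    # The first group needs `stages` cycles; each later group adds one.
    return stages + issue_groups - 1


print(ideal_cycles(10))                 # 9
print(ideal_cycles(10, issue_width=1))  # 14
```

Real code falls somewhere between the two numbers, depending on how many instruction pairs are allowed to execute in parallel.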

25
ILP Illustration
http://www.hitachisemiconductor.com/sic/jsp/japan/eng/products/mpumcu/32bit/image/2_way.gif
26
Direct Memory Access
  • The SH7750 Series includes an on-chip
    four-channel direct memory access controller
    (DMAC).
  • The DMAC can be used in place of the CPU to
    perform high-speed data transfers among external
    devices equipped with DACK (DMA transfer end
    notification), external memories, memory mapped
    external devices, and on-chip peripheral modules
    (except the DMAC, BSC, and UBC).
  • Using the DMAC reduces the burden on the CPU and
    increases the operating efficiency of the chip.

27
Serial Communication Interface (SCI)
  • The SH7750 is equipped with a single-channel
    serial communication interface (SCI) and a
    single-channel serial communication interface
    with built-in FIFO registers (SCI with FIFO:
    SCIF).
  • The SCI can handle both asynchronous and
    synchronous serial communication. A function is
    also provided for serial communication between
    processors (multiprocessor communication
    function).

28
Smart Card Interface
  • An IC card (smart card) interface conforming to
    ISO/IEC 7816-3 (Identification Card) is supported
    as a serial communication interface (SCI)
    extension function.
  • Switching between the normal serial communication
    interface and the smart card interface is carried
    out by means of a register setting.