Embedded System HW - PowerPoint PPT Presentation

About This Presentation
Title:

Embedded System HW

Description:

Title: SHARC programming model. Author: Wayne Wolf. Created: 6/17/1995 11:31:02 PM. Format: PowerPoint presentation.

Slides: 138
Provided by: wayne74

Transcript and Presenter's Notes


1
Embedded System HW
  • Processor Technology

2
Processor Technology
  • General Purpose (software)
  • Application Specific
  • Single Purpose (Hardware)
  • IC technology
  • Full Custom/VLSI
  • Semi-custom ASIC (gate-array, standard cell)
  • PLD

3
Custom single-purpose processors: Hardware
4
Outline
  • Introduction
  • Combinational logic
  • Sequential logic
  • Custom single-purpose processor design
  • RT-level custom single-purpose processor design
  • Read Chapter 2 of Embedded System Design: A
    Unified Hardware/Software Introduction, by Frank
    Vahid and Tony Givargis.

5
Introduction
  • Processor
  • Digital circuit that performs a computation task
  • Controller and datapath
  • General-purpose: variety of computation tasks
  • Single-purpose: one particular computation task
  • Custom single-purpose: non-standard task
  • A custom single-purpose processor may be:
  • Fast, small, low power
  • But: high NRE (non-recurring engineering) cost,
    longer time-to-market, less flexible

6
Custom single-purpose processor basic model
7
Example: greatest common divisor
  • First create algorithm
  • Convert algorithm to "complex" state machine
  • Known as FSMD: finite-state machine with datapath
  • Can use templates to perform such conversion

(b) desired functionality
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }
(c) state diagram
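The desired functionality can be checked in software before any hardware is designed. The sketch below is a plain C transcription of statements 3-9, assuming positive inputs (with a zero input the loop never terminates, just as in the algorithm) and modelling the ports x_i, y_i and d_o as a function's parameters and return value; the name gcd_sub is hypothetical and the go_i handshake is omitted.

```c
/* GCD by repeated subtraction, mirroring statements 3-9 above.
   Assumes x_i > 0 and y_i > 0; go_i is not modelled. */
unsigned gcd_sub(unsigned x_i, unsigned y_i) {
    unsigned x = x_i;          /* 3: x = x_i */
    unsigned y = y_i;          /* 4: y = y_i */
    while (x != y) {           /* 5 */
        if (x < y)             /* 6 */
            y = y - x;         /* 7 */
        else
            x = x - y;         /* 8 */
    }
    return x;                  /* 9: d_o = x */
}
```

For example, gcd_sub(42, 56) walks through y = 14, x = 28, x = 14 and returns 14.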
8
State diagram templates
9
Creating the datapath
  • Create a register for any declared variable
  • Create a functional unit for each arithmetic
    operation
  • Connect the ports, registers and functional units
  • Based on reads and writes
  • Use multiplexors for multiple sources
  • Create a unique identifier for each datapath
    component control input and output

10
Creating the controller's FSM
  • Same structure as FSMD
  • Replace complex actions/conditions with datapath
    configurations

11
Splitting into a controller and datapath
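Controller (input go_i; 4-bit state codes):
  • 1 (0000): on 1 go to 2; on !1 go to 1-J
  • 2 (0001): stay while !go_i; on !(!go_i) go to 2-J
  • 2-J (0010): go to 3
  • 3 (0011): x_sel = 0, x_ld = 1
  • 4 (0100): y_sel = 0, y_ld = 1
  • 5 (0101): x_neq_y = 0 go to 5-J; x_neq_y = 1 go to 6
  • 6 (0110): x_lt_y = 1 go to 7; x_lt_y = 0 go to 8
  • 7 (0111): y_sel = 1, y_ld = 1, then go to 6-J
  • 8 (1000): x_sel = 1, x_ld = 1, then go to 6-J
  • 6-J (1001): go to 5
  • 5-J (1010): go to 9
  • 9 (1011): d_ld = 1
  • 1-J (1100): go to 1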
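To see how the controller and datapath cooperate, the following C sketch simulates one clock cycle per loop iteration using the 4-bit state codes above. The datapath model is an assumption for illustration: registers x, y and d; an x-mux that selects x_i (x_sel = 0) or x - y (x_sel = 1); a y-mux that selects y_i (y_sel = 0) or y - x (y_sel = 1); and comparators producing x_neq_y and x_lt_y. go_i is assumed to be held high, and gcd_fsmd is a hypothetical name.

```c
#include <stdbool.h>

/* 4-bit state codes from the controller diagram. */
enum state {
    S1 = 0x0, S2 = 0x1, S2J = 0x2, S3 = 0x3, S4 = 0x4, S5 = 0x5, S6 = 0x6,
    S7 = 0x7, S8 = 0x8, S6J = 0x9, S5J = 0xA, S9 = 0xB, S1J = 0xC
};

/* Datapath registers (hypothetical software model). */
struct gcd_datapath { unsigned x, y, d; };

/* Simulates controller + datapath, one clock cycle per iteration, and
   returns d_o once d_ld has been asserted. Assumes x_i, y_i > 0. */
unsigned gcd_fsmd(unsigned x_i, unsigned y_i) {
    struct gcd_datapath dp = {0, 0, 0};
    enum state s = S1;
    const bool go_i = true;            /* assumption: go held high */
    for (;;) {
        /* Status signals from the datapath comparators. */
        bool x_neq_y = dp.x != dp.y;
        bool x_lt_y = dp.x < dp.y;
        /* Control signals default to 0 in every state. */
        int x_sel = 0, x_ld = 0, y_sel = 0, y_ld = 0, d_ld = 0;
        enum state next = s;
        switch (s) {                   /* next-state and output logic */
        case S1:  next = S2; break;    /* while (1) is always true */
        case S2:  next = go_i ? S2J : S2; break;
        case S2J: next = S3; break;
        case S3:  x_sel = 0; x_ld = 1; next = S4; break;
        case S4:  y_sel = 0; y_ld = 1; next = S5; break;
        case S5:  next = x_neq_y ? S6 : S5J; break;
        case S6:  next = x_lt_y ? S7 : S8; break;
        case S7:  y_sel = 1; y_ld = 1; next = S6J; break;
        case S8:  x_sel = 1; x_ld = 1; next = S6J; break;
        case S6J: next = S5; break;
        case S5J: next = S9; break;
        case S9:  d_ld = 1; next = S1J; break;
        case S1J: next = S1; break;
        }
        /* Clock edge: each register loads its mux output if enabled. */
        unsigned x_mux = x_sel ? dp.x - dp.y : x_i;
        unsigned y_mux = y_sel ? dp.y - dp.x : y_i;
        if (x_ld) dp.x = x_mux;
        if (y_ld) dp.y = y_mux;
        if (d_ld) {
            dp.d = dp.x;               /* d_o = x */
            return dp.d;               /* stop after one result */
        }
        s = next;
    }
}
```

Separating the switch (next-state and output logic) from the register updates mirrors the split between the controller's combinational logic and the state register clocked with the datapath.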
12
Controller state table for the GCD example
13
Completing the GCD custom single-purpose
processor design
  • We finished the datapath
  • We have a state table for the next state and
    control logic
  • All that's left is combinational logic design
  • This is not an optimized design, but we see the
    basic steps

14
Summary
  • Custom single-purpose processors
  • Straightforward design techniques
  • Can be built to execute algorithms
  • Typically start with FSMD
  • CAD tools can be of great assistance

15
General-Purpose Processors: Software
16
Introduction
  • General-Purpose Processor
  • Processor designed for a variety of computation
    tasks
  • Low unit cost, in part because manufacturer
    spreads NRE over large numbers of units
  • Motorola sold half a billion 68HC05
    microcontrollers in 1996 alone
  • ARM: 1.5 billion processors
  • Carefully designed since higher NRE is acceptable
  • Can yield good performance, size and power
  • Low NRE cost, short time-to-market/prototype,
    high flexibility
  • User just writes software; no processor design
  • a.k.a. "microprocessor": "micro" used when they
    were implemented on one or a few chips rather
    than entire rooms

17
Why use microprocessors?
  • Alternatives: field-programmable gate arrays
    (FPGAs), custom logic, etc. (custom
    single-purpose processor or HW logic)
  • Microprocessors are often very efficient: can use
    the same logic to perform many different functions.
  • Microprocessors simplify the design of families
    of products.

18
The performance paradox
  • Microprocessors use much more logic to implement
    a function than does custom logic.
  • But microprocessors are often at least as fast
  • heavily pipelined
  • large design teams
  • aggressive VLSI technology.

19
Power
  • Custom logic is a clear winner for low power
    devices.
  • Modern microprocessors offer features to help
    control power consumption.
  • Software design techniques can help reduce power
    consumption.

20
Basic Architecture
21
Basic Architecture
  • Control unit and datapath
  • Note similarity to single-purpose processor
  • Key differences
  • Datapath is general
  • Control unit doesn't store the algorithm; the
    algorithm is programmed into the memory
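To make the key difference concrete, here is a minimal C sketch of a control unit that does not hard-wire the algorithm: the GCD computation becomes a program stored in memory, and the datapath is a general register file. The five-instruction toy ISA (OP_SUB, OP_BEQ, OP_BLT, OP_JMP, OP_HALT) and the names run_program and gcd_program are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical toy ISA, for illustration only. */
enum opcode { OP_HALT, OP_SUB, OP_BEQ, OP_BLT, OP_JMP };

/* For OP_SUB, c is the destination; for branches, c is the target address. */
struct instr { enum opcode op; int a, b, c; };

/* Control unit: fetch, decode and execute over a general register file. */
static void run_program(const struct instr *mem, int32_t reg[4]) {
    int pc = 0;
    for (;;) {
        struct instr ir = mem[pc++];          /* fetch into IR, advance PC */
        switch (ir.op) {                      /* decode and execute */
        case OP_HALT: return;
        case OP_SUB:  reg[ir.c] = reg[ir.a] - reg[ir.b]; break;
        case OP_BEQ:  if (reg[ir.a] == reg[ir.b]) pc = ir.c; break;
        case OP_BLT:  if (reg[ir.a] < reg[ir.b]) pc = ir.c; break;
        case OP_JMP:  pc = ir.c; break;
        }
    }
}

/* The GCD algorithm now lives in memory as data, not in the controller. */
int32_t gcd_program(int32_t x, int32_t y) {
    static const struct instr prog[] = {
        {OP_BEQ, 0, 1, 6},   /* 0: if r0 == r1 goto 6 */
        {OP_BLT, 0, 1, 4},   /* 1: if r0 < r1 goto 4 */
        {OP_SUB, 0, 1, 0},   /* 2: r0 = r0 - r1 */
        {OP_JMP, 0, 0, 0},   /* 3: goto 0 */
        {OP_SUB, 1, 0, 1},   /* 4: r1 = r1 - r0 */
        {OP_JMP, 0, 0, 0},   /* 5: goto 0 */
        {OP_HALT, 0, 0, 0},  /* 6: done, result in r0 */
    };
    int32_t reg[4] = {x, y, 0, 0};
    run_program(prog, reg);
    return reg[0];
}
```

Changing the algorithm only means changing the contents of prog; run_program (the "processor") stays the same.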

22
Superscalar and VLIW Architectures
  • Performance can be improved by
  • Faster clock (but there's a limit)
  • Pipelining: slice up instruction into stages,
    overlap stages
  • Multiple ALUs to support more than one
    instruction stream
  • Superscalar
  • Scalar: non-vector operations
  • Fetches instructions in batches, executes as many
    as possible
  • May require extensive hardware to detect
    independent instructions
  • VLIW: each word in memory has multiple
    independent instructions
  • Currently growing in popularity
  • Relies on the compiler to detect and schedule
    instructions

23
Pipelining Increasing Instruction Throughput
Figure: non-pipelined vs. pipelined dish cleaning (Wash and Dry stages, dishes 1-8, over time); pipelined instruction execution (Fetch-instr., Decode, Fetch ops., Execute, Store res. stages, instructions 1-8, over time)
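Assuming every stage takes one cycle and no stalls occur (idealizations the slide does not state), the benefit shown in the figure can be quantified: n instructions on a k-stage datapath take n x k cycles without pipelining, but only k + (n - 1) cycles with it. For the 8 instructions and 5 stages shown, that is 40 versus 12 cycles. The function names below are hypothetical.

```c
/* Cycles to run n instructions through a k-stage datapath, assuming
   every stage takes one cycle and there are no stalls or hazards. */
unsigned long cycles_unpipelined(unsigned long n, unsigned long k) {
    return n * k;              /* each instruction uses all k stages alone */
}

unsigned long cycles_pipelined(unsigned long n, unsigned long k) {
    if (n == 0)
        return 0;
    return k + (n - 1);        /* k cycles to fill, then one completion per cycle */
}
```

As n grows, the ratio approaches k, which is why throughput (not the latency of a single instruction) is what pipelining improves.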
24
Two Memory Architectures
  • Princeton
  • Fewer memory wires
  • Harvard
  • Simultaneous program and data memory access

25
Princeton vs. Harvard
  • Harvard can't use self-modifying code.
  • Harvard allows two simultaneous memory fetches.
  • Most DSPs use Harvard architecture for streaming
    data
  • greater memory bandwidth
  • more predictable bandwidth.

26
Cache Memory
  • Memory access may be slow
  • Cache is small but fast memory close to processor
  • Holds copy of part of memory
  • Hits and misses
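A minimal sketch of how hits and misses arise, assuming a direct-mapped cache with 16-byte lines and 8 lines (both sizes, and the names below, are hypothetical; the slide does not specify an organization). A hit finds a valid line whose tag matches; a miss fills that line with a copy of the block from memory.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 16u   /* hypothetical line size */
#define NUM_LINES  8u    /* hypothetical number of lines */

struct cache {
    bool valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    unsigned hits, misses;
};

/* Look up one byte address; on a miss, the line is (conceptually) filled
   from memory. Returns true on a hit. */
bool cache_access(struct cache *c, uint32_t addr) {
    uint32_t block = addr / LINE_BYTES;   /* which memory block */
    uint32_t index = block % NUM_LINES;   /* the only line it may occupy */
    uint32_t tag = block / NUM_LINES;     /* identifies the block in that line */
    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return true;
    }
    c->valid[index] = true;               /* fill the line, evicting any old block */
    c->tag[index] = tag;
    c->misses++;
    return false;
}
```

Reading 64 consecutive bytes one word at a time misses once per line (4 misses, 12 hits), because each miss brings the neighbouring words into the cache as well.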

27
Application-Specific Instruction-Set Processors
(ASIPs)
28
Application-Specific Instruction-Set Processors
(ASIPs)
  • General-purpose processors
  • Sometimes too general to be effective in
    demanding applications
  • e.g., video processing requires huge video
    buffers and operations on large arrays of data,
    inefficient on a GPP
  • But single-purpose processor has high NRE, not
    programmable
  • ASIPs targeted to a particular domain
  • Contain architectural features specific to that
    domain
  • e.g., embedded control, digital signal
    processing, video processing, network processing,
    telecommunications, etc.
  • Still programmable

29
Microprocessor varieties
  • Microcontroller: includes I/O devices, on-board
    memory.
  • Digital signal processor (DSP): microprocessor
    optimized for digital signal processing.
  • Typical embedded word sizes: 8-bit, 16-bit,
    32-bit.

30
Embedded Processors

Netsilicon NETARM Embedded Processor
31
Many Types of Programmable Processors
  • Past
  • Microprocessor
  • Microcontroller
  • DSP
  • Graphics Processor
  • Now / Future
  • Network Processor
  • Sensor Processor
  • Cryptoprocessor
  • Game Processor
  • Wearable Processor
  • Mobile Processor

32
A Common ASIP: Microcontroller
  • For embedded control applications
  • Reading sensors, setting actuators
  • Mostly dealing with events (bits): data is
    present, but not in huge amounts
  • e.g., VCR, disk drive, digital camera (assuming
    SPP for image compression), washing machine,
    microwave oven
  • Microcontroller features
  • On-chip peripherals
  • Timers, analog-digital converters, serial
    communication, etc.
  • Tightly integrated for programmer, typically part
    of register space
  • On-chip program and data memory
  • Direct programmer access to many of the chip's
    pins
  • Specialized instructions for bit-manipulation and
    other low-level operations

33
Another Common ASIP: Digital Signal Processors
(DSP)
  • For signal processing applications
  • Large amounts of digitized data, often streaming
  • Data transformations must be applied fast
  • e.g., cell-phone voice filter, digital TV, music
    synthesizer
  • DSP features
  • Several instruction execution units
  • Multiply-accumulate single-cycle instruction,
    other instructions
  • Efficient vector operations, e.g., add two
    arrays
  • Vector ALUs, loop buffers, etc.

34
Trend: Even More Customized ASIPs
  • In the past, microprocessors were acquired as
    chips
  • Today, we increasingly acquire a processor as
    Intellectual Property (IP)
  • e.g., synthesizable VHDL model
  • Opportunity to add a custom datapath hardware and
    a few custom instructions, or delete a few
    instructions
  • Can have significant performance, power and size
    impacts
  • Problem: need compiler/debugger for customized
    ASIP
  • Remember, most development uses structured
    languages
  • One solution: automatic compiler/debugger
    generation
  • e.g., www.tensillica.com
  • Another solution: retargetable compilers
  • e.g., www.improvsys.com (customized VLIW
    architectures)

35
Reconfigurable SoC
Other examples: Atmel's FPSLIC (AVR + FPGA), Altera's Nios (configurable RISC on a PLD)
  • Triscend's A7 CSoC

36
Selecting a Microprocessor
  • Issues
  • Technical: speed, power, size, cost
  • Other: development environment, prior expertise,
    licensing, etc.
  • Speed: how to evaluate a processor's speed?
  • Clock speed: but instructions per cycle may
    differ
  • Instructions per second: but work per instruction
    may differ
  • Dhrystone: synthetic benchmark, developed in
    1984; measured in Dhrystones/sec.
  • MIPS: 1 MIPS = 1757 Dhrystones per second (based
    on Digital's VAX 11/780). A.k.a. Dhrystone MIPS.
    Commonly used today.
  • So, 750 MIPS = 750 x 1757 = 1,317,750 Dhrystones
    per second
  • SPEC: set of more realistic benchmarks, but
    oriented to desktops
  • EEMBC: EDN Embedded Benchmark Consortium,
    www.eembc.org
  • Suites of benchmarks: automotive, consumer
    electronics, networking, office automation,
    telecommunications

37
Processors
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM websites/datasheets; Embedded Systems Programming,
Nov. 1998
38
Summary
  • General-purpose processors
  • Good performance, low NRE, flexible
  • Controller, datapath, and memory
  • Structured languages prevail
  • But some assembly level programming still
    necessary
  • Many tools available
  • Including instruction-set simulators, and
    in-circuit emulators
  • ASIPs
  • Microcontrollers, DSPs, network processors, more
    customized ASIPs
  • Choosing among processors is an important step
  • Designing a general-purpose processor is
    conceptually the same as designing a
    single-purpose processor

39
Instruction Sets
40
RISC vs. CISC
  • Complex instruction set computer (CISC)
  • many addressing modes
  • many operations.
  • Reduced instruction set computer (RISC)
  • load/store
  • pipelinable instructions.

41
CISC history
  • Intel processor history

Year  Processor    Transistors  Notes
1971  4004         2,250        Busicom
1972  8008         2,500        Mark-8
1974  8080         5,000        Altair
1978  8086/8088    29,000       IBM PC XT
1982  80286        120,000      IBM PC AT
1985  80386        275,000      32-bit
1989  80486        1,180,000
1993  Pentium      3,100,000
1995  Pentium Pro  5,500,000    Dynamic Execution
1997  Pentium 2    7,500,000    MMX
1999  Pentium 3    24,000,000   SIMD
2001  Itanium      25,000,000   64-bit, Explicitly Parallel Instruction Computing (EPIC)
2002  Pentium 4    55,000,000
2003  Itanium 2    410,000,000  Machine Check Architecture, EPIC, 6 MB L3 cache
42
CISC - History: Packaging
43
CISC - History
44
Instruction set characteristics
  • Fixed vs. variable length.
  • Addressing modes.
  • Number of operands.
  • Types of operands.

45
ARM data processing Instruction Format(RISC)
Data processing immediate shift
Data processing register shift
Data processing 32-bit immediate
46
Intel IA-32 Instruction Format (CISC)
47
Programming model
  • Programming model: registers visible to the
    programmer.
  • Some registers are not visible (IR).

48
Multiple implementations
  • Successful architectures have several
    implementations
  • varying clock speeds
  • different bus widths
  • different cache sizes
  • etc.

49
ARM Architecture
  • Advanced RISC Machines (1990)
  • (Acorn and Apple Computer)

50
ARM Architecture
  • ARM versions.
  • ARM assembly language.
  • ARM programming model.

51
ARM versions
  • ARM architecture has been extended over several
    versions.
  • We will concentrate on ARMv5

52
Evolution of the ARM architecture versions
53
ARMv6 Improvement
  • Memory management
  • Multiprocessing
  • Multimedia support: SIMD capability

54
Evolution of the ARM architecture
ARM11
55
Introduction
  • To allow very small, yet high-performance
    implementations
  • RISC
  • Large uniform register file
  • Load/store architecture
  • Simple addressing modes
  • Uniform and fixed-length instruction fields
  • Auto-increment and auto-decrement addressing modes
  • Conditional execution of all instructions

56
Extension of the RISC rules
  • High code density, low power, and small die size
  • Variable cycle execution
  • Multiple load and store
  • Improve code density, reduce instruction fetches,
    and reduce overall power consumption
  • Inline barrel shifter
  • Conditional execution
  • 16-bit Thumb instruction set
  • Enhanced DSP instructions
  • 16x16 multiply, arithmetic saturation
  • DSP-specific routines

57
ARM assembly language
  • Fairly standard assembly language
  • LDR r0,[r8] ; a comment
  • label ADD r4,r0,r1

58
Programming Model
59
ARM data types
  • Byte: 8 bits
  • Halfword: 16 bits
  • Must be aligned to two-byte boundaries
  • Word: 32 bits
  • Must be aligned to four-byte boundaries
  • ARM addresses can be 32 bits long.
  • Address refers to byte.
  • Address 4 starts at byte 4.
  • Can be configured at power-up as either little-
    or big-endian mode.

60
Processor modes
  • User (usr): normal program execution mode
  • FIQ (fiq): supports a high-speed data transfer or
    channel process
  • IRQ (irq): used for general-purpose interrupt
    handling
  • Supervisor (svc): a protected mode for the OS
  • Abort (abt): implements virtual memory and/or
    memory protection
  • Undefined (und): supports software emulation of
    hardware coprocessors
  • System (sys): runs privileged OS tasks
  • fiq, irq, svc, abt, und: exception modes

61
Registers
Figure: ARM registers r0-r15 and the 32-bit CPSR (bits 31..0); r14 is the link register and r15 is the PC; r0-r7 are unbanked registers, r8-r14 are banked registers (r8-r12 in FIQ mode only)
62
(No Transcript)
63
Endianness
  • Relationship between bit and byte/word ordering
    defines endianness

Figure:
little-endian: bit 31 ... bit 0 = byte 3, byte 2, byte 1, byte 0
big-endian: bit 0 ... bit 31 = byte 0, byte 1, byte 2, byte 3
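A C sketch of what the figure shows: the same 32-bit word laid out in memory in each byte order. The function names are hypothetical; the last function inspects the first byte of a stored 1 to report the byte order of whatever machine runs the code.

```c
#include <stdint.h>

/* Store a 32-bit word into 4 consecutive bytes, lowest address first. */
void store_little_endian(uint32_t w, uint8_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(w >> (8 * i));        /* byte 0 = least significant */
}

void store_big_endian(uint32_t w, uint8_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(w >> (8 * (3 - i)));  /* byte 0 = most significant */
}

/* Returns 1 if the machine running this code is little-endian, else 0. */
int host_is_little_endian(void) {
    const uint32_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}
```

For 0x11223344, a little-endian store puts 0x44 at the lowest address, while a big-endian store puts 0x11 there.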
64
ARM status bits
  • Every arithmetic, logical, or shifting operation
    may set CPSR (current program status register)
    bits:
  • N (negative), Z (zero), C (carry), V (overflow).
  • Examples:
  • -1 + 1 = 0: NZCV = 0110.
  • 2^31 - 1 + 1 = -2^31: NZCV = 1001.
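The flag values in the examples can be checked with a small C model of a 32-bit add that sets flags (ADDS). Packing NZCV into a 4-bit value with N as the most significant bit is an assumption chosen to match the notation above, and adds_nzcv is a hypothetical name.

```c
#include <stdint.h>

/* NZCV for the 32-bit addition a + b, packed as N = bit 3, Z = bit 2,
   C = bit 1, V = bit 0. */
unsigned adds_nzcv(uint32_t a, uint32_t b) {
    uint32_t r = a + b;                            /* wraps modulo 2^32 */
    unsigned n = r >> 31;                          /* sign bit of the result */
    unsigned z = (r == 0);
    unsigned c = (r < a);                          /* unsigned carry out of bit 31 */
    unsigned v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u; /* same-sign operands, different-sign result */
    return (n << 3) | (z << 2) | (c << 1) | v;
}
```

-1 + 1 is 0xFFFFFFFF + 1: the result is zero with a carry out (0110). 0x7FFFFFFF + 1 produces 0x80000000, negative with signed overflow and no carry (1001).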

65
ARM data processing operand addressing
  • Instruction syntax
  • <opcode>{<cond>}{S} <Rd>, <Rn>, <shifter-operand>
  • <shifter-operand> has 11 options

66
Condition field
  • Almost all ARM instructions can be conditionally executed
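Each condition suffix is a test on the NZCV flags. The sketch below follows the ARM 4-bit condition-field encoding (EQ = 0000 through AL = 1110); the nzcv packing (N = bit 3, V = bit 0) and the function name are assumptions for illustration, and the special 1111 encoding is simplified to "always".

```c
#include <stdbool.h>

/* Returns true if an instruction with condition field `cond` executes,
   given flags packed as N = bit 3, Z = bit 2, C = bit 1, V = bit 0. */
bool arm_cond_passed(unsigned cond, unsigned nzcv) {
    bool n = (nzcv >> 3) & 1u, z = (nzcv >> 2) & 1u;
    bool c = (nzcv >> 1) & 1u, v = nzcv & 1u;
    switch (cond & 0xFu) {
    case 0x0: return z;                 /* EQ: equal */
    case 0x1: return !z;                /* NE: not equal */
    case 0x2: return c;                 /* CS/HS: unsigned >= */
    case 0x3: return !c;                /* CC/LO: unsigned < */
    case 0x4: return n;                 /* MI: negative */
    case 0x5: return !n;                /* PL: positive or zero */
    case 0x6: return v;                 /* VS: overflow */
    case 0x7: return !v;                /* VC: no overflow */
    case 0x8: return c && !z;           /* HI: unsigned > */
    case 0x9: return !c || z;           /* LS: unsigned <= */
    case 0xA: return n == v;            /* GE: signed >= */
    case 0xB: return n != v;            /* LT: signed < */
    case 0xC: return !z && n == v;      /* GT: signed > */
    case 0xD: return z || n != v;       /* LE: signed <= */
    default:  return true;              /* AL (0xF simplified to always) */
    }
}
```

This is why BGE and the GE-suffixed instructions in the if-statement examples run exactly when CMP found a >= b.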

67
ARM data processing operand addressing
Data processing immediate shift
Data processing register shift
Data processing 32-bit immediate
68
Shifter operand
  • Immediate
  • 8-bit constant rotated right by twice a 4-bit
    value (rotation of 0, 2, 4, 6, ..., 30 bits)
  • mov r0, #0
  • add r9, r9, #1
  • Register operand
  • mov r2, r0
  • Shifted register operand
  • ASR, LSL, LSR, ROR, RRX (by one bit)
  • mov r2, r0, LSL #2 ; shift r0 left by 2, write
    to r2 (r2 = r0 x 4)
  • sub r10, r9, r8, LSR #4 ; r10 = r9 - r8/16
  • sub r10, r9, r8, ROR r3 ; r10 = r9 - (r8 rotated
    right by value of r3)
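Because an immediate is an 8-bit value rotated right by an even amount, only some 32-bit constants can be encoded directly. This C sketch decodes an (imm8, rot4) pair and searches for an encoding of a given constant; the function names are hypothetical, and when several encodings exist it returns the one with the smallest rotation, which may differ from an assembler's choice.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits. */
static uint32_t ror32(uint32_t x, unsigned n) {
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Value of an immediate operand: imm8 rotated right by 2 * rot4. */
uint32_t arm_imm_value(uint32_t imm8, unsigned rot4) {
    return ror32(imm8 & 0xFFu, 2u * (rot4 & 0xFu));
}

/* Find an (imm8, rot4) pair producing `value`; returns false if none exists. */
bool arm_imm_encode(uint32_t value, uint32_t *imm8, unsigned *rot4) {
    for (unsigned r = 0; r < 16; r++) {
        /* Undo a right rotation by 2r: rotate left by 2r (= right by 32 - 2r). */
        uint32_t candidate = ror32(value, (32u - 2u * r) & 31u);
        if (candidate <= 0xFFu) {
            *imm8 = candidate;
            *rot4 = r;
            return true;
        }
    }
    return false;
}
```

For example, 0x104 is encodable (0x41 rotated right by 30), whereas 0x101 is not, because its set bits are too far apart to fit in any 8-bit window; such a constant needs more than one instruction or a load from memory.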

69
ARM data-processing
  • AND
  • EOR
  • SUB: Rd = Rn - shifter operand
  • RSB: Rd = shifter operand - Rn
  • ADD
  • ADC (with carry)
  • SBC
  • RSC (reverse SBC)
  • TST: update flags after Rn AND shifter operand
  • TEQ
  • CMP
  • CMN: compare negated
  • ORR (logical OR)
  • MOV
  • BIC
  • MVN (move not)

70
ARM data-processing
  • Shifts and rotates are applied through the
    shifter operand
  • LSL, LSR: logical shift left/right
  • ASR: arithmetic shift right
  • ROR: rotate right
  • RRX: rotate right extended (through C)

71
Data operation varieties
  • Logical shift
  • fills with zeroes.
  • Arithmetic shift
  • fills with sign extension
  • RRX performs 33-bit rotate, including C bit from
    CPSR above sign bit.
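The fill rules can be written out in C on 32-bit unsigned values. This is a sketch for illustration: the function names are hypothetical, carry-out is modelled only for RRX, and ASR is built from unsigned operations because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Logical shifts: vacated bits are filled with zeroes. */
uint32_t lsl(uint32_t x, unsigned n) { return n >= 32 ? 0 : x << n; }
uint32_t lsr(uint32_t x, unsigned n) { return n >= 32 ? 0 : x >> n; }

/* Arithmetic shift right: vacated bits are copies of the sign bit. */
uint32_t asr(uint32_t x, unsigned n) {
    uint32_t sign = x >> 31;
    if (n >= 32)
        return sign ? 0xFFFFFFFFu : 0;
    uint32_t fill = sign ? ~(0xFFFFFFFFu >> n) : 0;  /* top n bits set */
    return (x >> n) | fill;
}

/* Rotate right: bits shifted out at the bottom re-enter at the top. */
uint32_t ror(uint32_t x, unsigned n) {
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* RRX: 33-bit rotate right by one, with the carry flag above bit 31. */
uint32_t rrx(uint32_t x, unsigned carry_in, unsigned *carry_out) {
    *carry_out = x & 1u;                       /* old bit 0 becomes C */
    return (x >> 1) | ((carry_in & 1u) << 31); /* old C becomes bit 31 */
}
```

For 0x80000000 shifted right by 4, LSR gives 0x08000000 while ASR gives 0xF8000000, preserving the negative sign.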

72
Load and Store instructions
  • Two types
  • 32-bit word or an 8-bit unsigned byte
  • Load and store halfword and load signed byte
  • Addressing modes
  • Base register
  • Any one of the GPRs (including the PC)
  • Offset
  • Three formats

73
Addressing modes
  • Offset
  • Immediate: unsigned number (12 bits or 8 bits)
  • Register: a GPR (not the PC)
  • Scaled register: a register shifted by an
    immediate value
  • LSL, LSR, ASR, ROR, RRX
  • Three ways to form the memory address
  • EA = base register ± offset
  • Offset
  • Pre-indexed
  • Post-indexed

74
Addressing modes
  • Base-plus-offset addressing
  • LDR r0,[r1,#16]
  • Loads from location r1+16
  • Pre-indexing increments base register
  • LDR r0,[r1,#16]!
  • Post-indexing fetches, then does offset
  • LDR r0,[r1],#16
  • Loads r0 from r1, then adds 16 to r1.
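The three ways of forming the address differ only in when the base register is updated. The C sketch below models r1 as a pointer to a register and memory as a small byte array; the array size, the use of the host's byte order, and all function names are assumptions for illustration, and no bounds checking is done.

```c
#include <stdint.h>
#include <string.h>

static uint8_t mem[64];   /* hypothetical byte-addressed memory */

static uint32_t load_word(uint32_t addr) {
    uint32_t w;
    memcpy(&w, &mem[addr], sizeof w);   /* host byte order */
    return w;
}

/* LDR r0,[r1,#off]  : EA = r1 + off, r1 unchanged */
uint32_t ldr_offset(uint32_t *r1, int32_t off) {
    return load_word(*r1 + (uint32_t)off);
}

/* LDR r0,[r1,#off]! : EA = r1 + off, and r1 is updated to EA */
uint32_t ldr_pre_indexed(uint32_t *r1, int32_t off) {
    *r1 += (uint32_t)off;
    return load_word(*r1);
}

/* LDR r0,[r1],#off  : EA = r1, then r1 = r1 + off */
uint32_t ldr_post_indexed(uint32_t *r1, int32_t off) {
    uint32_t value = load_word(*r1);
    *r1 += (uint32_t)off;
    return value;
}
```

Post-indexing is convenient for walking through an array: each load uses the current pointer and leaves r1 pointing at the next element.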

75
Load and store
  • LDR
  • LDRB
  • LDRH
  • LDRSB (signed byte)
  • LDRSH (signed halfw)
  • STR
  • STRB
  • STRH

76
Examples
  • LDR R1, [R0] ; load R1 from the address in R0
  • LDR R8, [R3, #4] ; EA = R3 + 4
  • LDR R8, [R3, #-4] ; EA = R3 - 4
  • STRB R10, [R7, -R4] ; EA = R7 - R4
  • LDR R11, [R3, R5, LSL #2] ; EA = R3 + (R5 x 4)
  • LDR R3, [R9], #4 ; EA = R9, then R9 = R9 + 4
    (post-indexed)
  • LDR R1, [R0, #2]! ; EA = R0 + 2, then R0 = R0 + 2
    (pre-indexed)
  • LDR R0, [PC, #0x40] ; load R0 from PC + 0x40
    (= address of the instruction + 8 + 0x40)

77
Load and store multiple
  • Addressing modes
  • IA: increment after
  • IB: increment before
  • DA: decrement after
  • DB: decrement before

78
Load and store multiple
  • LDM
  • STM
  • Examples
  • LDMIA r0, {r5-r8} ; load multiple r5-r8 from
    the address in r0
  • STMDA r1!, {r2, r5, r7-r9, r11} ; store, then
    update r1
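The four modes only change where the block of words starts and what is written back to the base register when "!" is used. A C sketch, assuming word-sized transfers; the enum and function names are hypothetical. In every mode the lowest-numbered register is transferred to or from the lowest address.

```c
#include <stdint.h>

/* Load/store-multiple addressing modes. */
enum multi_mode { MODE_IA, MODE_IB, MODE_DA, MODE_DB };

/* For a transfer of `count` registers with base value rn, compute the
   lowest address touched and the value written back to rn when the
   instruction uses '!'. */
void multi_addresses(enum multi_mode mode, uint32_t rn, unsigned count,
                     uint32_t *lowest, uint32_t *writeback) {
    uint32_t bytes = 4u * count;
    switch (mode) {
    case MODE_IA: *lowest = rn;              *writeback = rn + bytes; break;
    case MODE_IB: *lowest = rn + 4u;         *writeback = rn + bytes; break;
    case MODE_DA: *lowest = rn - bytes + 4u; *writeback = rn - bytes; break;
    case MODE_DB: *lowest = rn - bytes;      *writeback = rn - bytes; break;
    }
}
```

For the STMDA example, six registers (r2, r5, r7, r8, r9, r11) are stored; with r1 = 0x1000 they occupy 0xFEC through 0x1000, and r1 becomes 0xFE8.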

79
Branch instructions
  • Conditional branch forwards or backwards up to
    32 MB, with the target formed by:
  • Sign-extending the 24-bit imm_data to 32 bits
  • Shifting the result left two bits
  • Adding this to the PC (the address of the branch + 8)
  • Range: approximately ±32 MB
  • B, BL
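The three steps for forming the branch target can be written as one small C function. It is a sketch: arm_branch_target is a hypothetical name, and the PC value is taken as the branch's own address plus 8, as stated above.

```c
#include <stdint.h>

/* Target of an ARM B/BL: sign-extend imm24, shift left 2 bits, add to PC
   (which reads as the branch's own address + 8). Arithmetic is modulo 2^32. */
uint32_t arm_branch_target(uint32_t branch_addr, uint32_t imm24) {
    imm24 &= 0xFFFFFFu;
    uint32_t offset = (imm24 & 0x800000u) ? (imm24 | 0xFF000000u) : imm24;
    return branch_addr + 8u + (offset << 2);
}
```

An imm24 field of 0xFFFFFE (-2 words) therefore branches to the branch instruction itself, and the largest forward field, 0x7FFFFF words, reaches just under 32 MB ahead.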

80
Examples
  • B label
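  • BCC label ; branch if carry flag is clear
  • BEQ label ; branch if zero flag is set
  • MOV PC, #0 ; branch to location zero
  • BL func ; subroutine call
  • MOV PC,LR ; return
  • MOV LR, PC ; LR = address of the instruction after the LDR
  • LDR PC, =func ; jump to func (call through a loaded address)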

81
ARM ADR pseudo-op
  • Cannot refer to an address directly in an
    instruction.
  • Generate value by performing arithmetic on PC.
  • ADR pseudo-op generates instruction required to
    calculate address
  • ADR r1,FOO

82
Examples
  • start MOV r0, #10
  • ADR r4, start => SUB r4, pc, #0xc
  • start = pc - 4 - 8 = pc - 12 = pc - 0xc

83
Example C assignments
  • C
  • x = (a + b) - c;
  • Assembler
  • ADR r4,a ; get address for a
  • LDR r0,[r4] ; get value of a
  • ADR r4,b ; get address for b, reusing r4
  • LDR r1,[r4] ; get value of b
  • ADD r3,r0,r1 ; compute a+b
  • ADR r4,c ; get address for c
  • LDR r2,[r4] ; get value of c

84
C assignment, contd.
  • SUB r3,r3,r2 ; complete computation of x
  • ADR r4,x ; get address for x
  • STR r3,[r4] ; store value of x

85
Example C assignment
  • C
  • y = a*(b+c);
  • Assembler
  • ADR r4,b ; get address for b
  • LDR r0,[r4] ; get value of b
  • ADR r4,c ; get address for c
  • LDR r1,[r4] ; get value of c
  • ADD r2,r0,r1 ; compute partial result
  • ADR r4,a ; get address for a
  • LDR r0,[r4] ; get value of a

86
C assignment, contd.
  • MUL r2,r2,r0 ; compute final value for y
  • ADR r4,y ; get address for y
  • STR r2,[r4] ; store y

87
Example C assignment
  • C
  • z = (a << 2) | (b & 15);
  • Assembler
  • ADR r4,a ; get address for a
  • LDR r0,[r4] ; get value of a
  • MOV r0,r0,LSL #2 ; perform shift
  • ADR r4,b ; get address for b
  • LDR r1,[r4] ; get value of b
  • AND r1,r1,#15 ; perform AND
  • ORR r1,r0,r1 ; perform OR

88
C assignment, contd.
  • ADR r4,z ; get address for z
  • STR r1,[r4] ; store value for z

89
Example if statement
  • C
  • if (a < b) { x = 5; y = c + d; } else x = c - d;
  • Assembler
  • ; compute and test condition
  • ADR r4,a ; get address for a
  • LDR r0,[r4] ; get value of a
  • ADR r4,b ; get address for b
  • LDR r1,[r4] ; get value for b
  • CMP r0,r1 ; compare a < b
  • BGE fblock ; if a >= b, branch to false block

90
If statement, contd.
  • ; true block
  • MOV r0,#5 ; generate value for x
  • ADR r4,x ; get address for x
  • STR r0,[r4] ; store x
  • ADR r4,c ; get address for c
  • LDR r0,[r4] ; get value of c
  • ADR r4,d ; get address for d
  • LDR r1,[r4] ; get value of d
  • ADD r0,r0,r1 ; compute y
  • ADR r4,y ; get address for y
  • STR r0,[r4] ; store y
  • B after ; branch around false block

91
If statement, contd.
  • ; false block
  • fblock ADR r4,c ; get address for c
  • LDR r0,[r4] ; get value of c
  • ADR r4,d ; get address for d
  • LDR r1,[r4] ; get value for d
  • SUB r0,r0,r1 ; compute c-d
  • ADR r4,x ; get address for x
  • STR r0,[r4] ; store value of x
  • after ...

92
Example Conditional instruction implementation
  • ; true block
  • MOVLT r0,#5 ; generate value for x
  • ADRLT r4,x ; get address for x
  • STRLT r0,[r4] ; store x
  • ADRLT r4,c ; get address for c
  • LDRLT r0,[r4] ; get value of c
  • ADRLT r4,d ; get address for d
  • LDRLT r1,[r4] ; get value of d
  • ADDLT r0,r0,r1 ; compute y
  • ADRLT r4,y ; get address for y
  • STRLT r0,[r4] ; store y

93
Conditional instruction implementation, contd.
  • ; false block
  • ADRGE r4,c ; get address for c
  • LDRGE r0,[r4] ; get value of c
  • ADRGE r4,d ; get address for d
  • LDRGE r1,[r4] ; get value for d
  • SUBGE r0,r0,r1 ; compute c-d
  • ADRGE r4,x ; get address for x
  • STRGE r0,[r4] ; store value of x

94
Example FIR filter
  • C
  • for (i = 0, f = 0; i < N; i++)
  • f = f + c[i]*x[i];
  • Assembler
  • ; loop initiation code
  • MOV r0,#0 ; use r0 for i
  • MOV r8,#0 ; use separate index for arrays
  • ADR r2,N ; get address for N
  • LDR r1,[r2] ; get value of N
  • MOV r2,#0 ; use r2 for f

95
FIR filter, cont.d
  • ADR r3,c ; load r3 with base of c
  • ADR r5,x ; load r5 with base of x
  • ; loop body
  • loop LDR r4,[r3,r8] ; get c[i]
  • LDR r6,[r5,r8] ; get x[i]
  • MUL r4,r4,r6 ; compute c[i]*x[i]
  • ADD r2,r2,r4 ; add into running sum
  • ADD r8,r8,#4 ; add one word offset to array
    index
  • ADD r0,r0,#1 ; add 1 to i
  • CMP r0,r1 ; exit?
  • BLT loop ; if i < N, continue
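The same loop in C, written to mirror the register usage of the assembly: a single byte offset (r8) that advances by one 4-byte word per iteration indexes both arrays. This is an illustrative sketch; 32-bit int32_t elements and the name fir_offsets are assumptions. Unlike the assembly, which tests i < N only at the bottom of the loop, the C for loop checks first, so N = 0 yields f = 0.

```c
#include <stdint.h>
#include <string.h>

/* f = sum of c[i] * x[i] for i = 0 .. n-1, using one shared byte offset. */
int32_t fir_offsets(const int32_t *c, const int32_t *x, int32_t n) {
    const unsigned char *cb = (const unsigned char *)c;  /* r3: base of c */
    const unsigned char *xb = (const unsigned char *)x;  /* r5: base of x */
    int32_t f = 0;                    /* r2: running sum */
    uint32_t off = 0;                 /* r8: byte offset into both arrays */
    for (int32_t i = 0; i < n; i++) { /* r0 = i, r1 = N */
        int32_t ci, xi;
        memcpy(&ci, cb + off, sizeof ci);   /* LDR r4,[r3,r8] */
        memcpy(&xi, xb + off, sizeof xi);   /* LDR r6,[r5,r8] */
        f += ci * xi;                       /* MUL r4,r4,r6 ; ADD r2,r2,r4 */
        off += 4;                           /* ADD r8,r8,#4 */
    }
    return f;
}
```

With c = {1, 2, 3} and x = {4, 5, 6}, the result is 1*4 + 2*5 + 3*6 = 32.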

96
Nested subroutine calls
  • Nesting/recursion requires coding convention
  • f1 LDR r0,[r13] ; load arg into r0 from stack
  • ; call f2()
  • STR r14,[r13,#-4]! ; push f1's return adrs
  • STR r0,[r13,#-4]! ; push arg to f2 on stack
  • BL f2 ; branch and link to f2
  • ; return from f1()
  • ADD r13,r13,#4 ; pop f2's arg off stack
  • LDR r15,[r13],#4 ; restore return adrs into PC and return

97
Summary
  • Load/store architecture
  • Most instructions are RISCy, operate in single
    cycle.
  • Some multi-register operations take longer.
  • All instructions can be executed conditionally.

98
MPC850
99
Reference Manuals
  • MPC850 Family User Manual
  • PowerPC Programming Environment Manual
  • Course Home Page: http://calab.kaist.ac.kr/maeng/cs310/micro02.htm
  • Motorola Home Page
  • http://e-www.motorola.com

100
Overview
  • Versatile, one-chip, integrated communication
    processor
  • Embedded PowerPC core
  • Versatile memory controller
  • Communication processor module (CPM)
  • Serial communication controllers (SCCs)
  • One USB
  • Etc.

101
(No Transcript)
102
Embedded PowerPC core
  • Single issue, 32-bit version
  • Branch folding and prediction
  • 2-K byte I-cache, 1K byte D-cache
  • 2-way set-associative
  • Physical
  • MMUs with 8-entry TLBs
  • 4K, 16K, 256K, 512K, and 8MB page sizes

103
Other Features
  • Dynamic data bus sizing: 8-, 16-, 32-bit
  • CPU clock: 0-80 MHz
  • System Integration Unit (SIU)
  • Memory Controller
  • General Purpose timer
  • CPM, SCCs, SMCs, etc.

104
PowerPC Architecture
105
PowerPC instruction set
  • Overview
  • Operand Conventions
  • PowerPC Registers and programming model
  • Addressing Modes
  • Instruction Set
  • Cache model
  • Exception Model
  • Memory management model

106
PowerPC Architecture
  • Motorola, IBM, Apple Computer
  • Based on the POWER architecture (RS/6000 family)
  • 64-bit architecture with a 32-bit subset
  • Three levels of the architecture
  • Flexibility: degrees of software compatibility
  • UISA (User instruction set architecture)
  • VEA (Virtual environment architecture)
  • OEA (Operating environment architecture)

107
Features not defined by the PowerPC Architecture
  • For flexibility
  • System bus interface signals
  • Cache design
  • The number and the nature of execution units
  • Other internal micro-architecture issues

108
Endianness
  • Relationship between bit and byte/word ordering
    defines endianness

Figure:
little-endian: bit 31 ... bit 0 = byte 3, byte 2, byte 1, byte 0 (ARM, Intel)
big-endian: bit 0 ... bit 31 = byte 0, byte 1, byte 2, byte 3 (PowerPC, IBM, Motorola)
109
Programming Model Registers
110
(No Transcript)
111
PowerPC programming model - Register Set
  • User Model UISA (32-bit architecture)

GPR0-GPR31 (32 bits each): general-purpose registers
FPR0-FPR31 (64 bits each): floating-point registers
CR (32 bits): condition register
FPSCR (32 bits): FP status and control register
XER (32 bits): XER register
LR (64/32 bits): link register
CTR (64/32 bits): count register
112
Condition Registers (CR)
  • For testing and branching

CR: eight 4-bit fields, CR0 (bits 0-3) through CR7 (bits 28-31)
CR0 field (integer instructions with Rc = 1): bit 0 Negative (LT), bit 1 Positive (GT), bit 2 Zero (EQ), bit 3 Summary Overflow (SO)
CR1 field: floating-point
CRn field: set by compare instructions
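A sketch of how CR0 is derived when an integer instruction has Rc = 1: the result is compared with zero as a signed value, and SO is copied from XER. The 4-bit packing with LT as the most significant bit follows the bit order above (bit 0 = LT); ppc_cr0 is a hypothetical name, and the sketch assumes 32-bit results.

```c
#include <stdint.h>

/* CR0 as a 4-bit value: LT (bit 0 in IBM numbering) is the most
   significant, then GT, EQ, and SO copied from XER[SO]. */
unsigned ppc_cr0(int32_t result, unsigned xer_so) {
    unsigned lt = result < 0;
    unsigned gt = result > 0;
    unsigned eq = result == 0;
    return (lt << 3) | (gt << 2) | (eq << 1) | (xer_so & 1u);
}
```

Exactly one of LT, GT and EQ is set for any result, which is what lets a following conditional branch test a single CR bit.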
113
XER Register (XER)
114
XER Register (XER), contd
115
Link Register (LR), Count Register (CTR)
bclrx (branch conditional to link register): branch with link
update
116
Counter Register
  • Loop count

117
VEA Register Set Time Base
118
OEA Register Set
119
Machine State Register (MSR)
120
(No Transcript)
121
(No Transcript)
122
Addressing Modes
  • Effective Address Calculation
  • Register indirect with immediate index mode
  • Register indirect with index mode
  • Register indirect mode
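The three effective-address calculations, sketched in C for the 32-bit case. One detail not visible on the slides: in these forms a register field of 0 for rA means the value 0 rather than the contents of GPR0 (written (rA|0) in the PowerPC manuals). The function names and the gpr array are hypothetical.

```c
#include <stdint.h>

/* (rA|0): register field 0 selects the constant 0, not GPR0. */
static uint32_t ra_or_zero(const uint32_t gpr[32], unsigned ra) {
    return ra == 0 ? 0u : gpr[ra];
}

/* Register indirect with immediate index: EA = (rA|0) + sign-extended d. */
uint32_t ea_imm_index(const uint32_t gpr[32], unsigned ra, int16_t d) {
    return ra_or_zero(gpr, ra) + (uint32_t)(int32_t)d;
}

/* Register indirect with index: EA = (rA|0) + (rB). */
uint32_t ea_index(const uint32_t gpr[32], unsigned ra, unsigned rb) {
    return ra_or_zero(gpr, ra) + gpr[rb];
}

/* Register indirect: EA = (rA|0). */
uint32_t ea_indirect(const uint32_t gpr[32], unsigned ra) {
    return ra_or_zero(gpr, ra);
}
```

Because rA = 0 reads as zero, the immediate-index form with rA = 0 gives direct access to the low and high 32 KB of the address space.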

123
Register Indirect with Immediate Index Addressing
124
Register Indirect with Index
125
Register Indirect
126
Instruction Formats
  • 4 bytes long and word-aligned
  • Bits 0-5 always specify the primary opcode
  • Extended opcode

127
Instruction set
  • Integer
  • Floating-point
  • Load and store
  • Flow control
  • Processor control
  • Memory synchronization
  • Memory control
  • External control

128
Summary
  • UISA, VEA, OEA
  • Register set
  • Fixed size instruction - RISC
  • Load and store architecture
  • 3 addressing modes
  • Condition register update: Rc field
  • 8 condition register fields (CR0-CR7)
  • Branch addressing modes
  • BO, BI fields
  • Relative, absolute, LR, CTR

129
RISC Xscale Microarchitecture Features
  • ARM Architecture Version 5TE ISA
  • Clock speed up to 400 MHz
  • Modified Harvard Architecture
  • Separate instruction cache and data cache (2 caches)
  • 32KB Instruction Cache
  • 32KB Data Cache
  • Intel Media Processing Technology
  • Instruction and Data Memory Management Unit
  • Branch Target Buffer
  • Debug Capability via JTAG Port
  • 0.35 µm 3-layer-metal CMOS, 2.6 million transistors
  • 256 PBGA package (17 x 17mm)

130
RISC Xscale System Integration Features
  • Memory controller
  • Power management controller
  • Normal, idle, and sleep modes
  • USB client
  • Multi channel DMA controller
  • LCD controller
  • AC97 codec
  • Multimedia card serial interface to standard
    memory card, with FIFO
  • FIR (fast infrared) communication
  • Synchronous serial protocol port
  • I2C

131
RISC Xscale System Integration Features
  • 85 GPIO ports
  • IRQ and wake-up interrupts
  • UART
  • Real-time clock and timer
  • 32-bit counter, 32.7 kHz clock, +/- 5 s/month
  • OS timer with alarm register
  • Pulse width modulation
  • Interrupt controller

132
RISC XScale
  • Based on ARM Architecture V5TE

133
Internal Structure
134
RISC - XScale
  • Palm size device - Example

135
PXA255 Pin
Figure: Intel® XScale PXA250 256-pin package, with signal groups: Serial Channel 0 (USB), Serial Channel 1, Serial Channel 2 (IrDA), Serial Channel 3 (UART), Serial Channel 4 (CODEC), LCD Control, GPIO Ports, Memory Control, Power Management, Transceiver Control, PCMCIA Bus Signals, Clocks, Reset and Test, Address Bus (A<25:0>), Data Bus (D<31:0>), JTAG, Supply
136
RISC Xscale running modes
  • PXA255

137
PXA255 Processor
  • XScale Core
  • 32Bit RISC
  • 32Bit registers
  • 32Bit instructions
  • Longword aligned
  • 32Bit datapaths
  • 7/8-stage pipeline (7-stage integer, 8-stage memory)