Title: Embedded System HW
1Embedded System HW
2Why use microprocessors?
- Alternatives: field-programmable gate arrays (FPGAs), custom logic, etc.
- Microprocessors are often very efficient: the same logic can perform many different functions.
- Microprocessors simplify the design of families of products.
3The performance paradox
- Microprocessors use much more logic to implement a function than custom logic does.
- But microprocessors are often at least as fast:
- heavily pipelined
- large design teams
- aggressive VLSI technology.
4Power
- Custom logic is a clear winner for low-power devices.
- Modern microprocessors offer features to help control power consumption.
- Software design techniques can help reduce power consumption.
5Microprocessor varieties
- Microcontroller: includes I/O devices, on-board memory.
- Digital signal processor (DSP): microprocessor optimized for digital signal processing.
- Typical embedded word sizes: 8-bit, 16-bit, 32-bit.
6Many Types of Programmable Processors
- Past
- Microprocessor
- Microcontroller
- DSP
- Graphics Processor
- Now / Future
- Network Processor
- Sensor Processor
- Cryptoprocessor
- Game Processor
- Wearable Processor
- Mobile Processor
7Application-Specific Instruction Processors (ASIPs)
- Processors with instruction sets tailored to specific applications or application domains
- instruction-set generation as part of synthesis
- Pluses
- customization yields lower area, power, etc.
- Minuses
- higher h/w and s/w development overhead
- design, compilers, debuggers
- higher time to market
8Reconfigurable SoC
- Other examples:
- Atmel's FPSLIC (AVR + FPGA)
- Altera's Nios (configurable RISC on a PLD)
9Instruction Sets
10von Neumann architecture
- Memory holds data and instructions.
- Central processing unit (CPU) fetches instructions from memory.
- Separate CPU and memory distinguishes the programmable computer.
- CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc.
11CPU + memory
[Figure: CPU and memory connected by address and data lines. The PC holds address 200; memory location 200 holds ADD r5,r1,r3, which is fetched into the IR.]
12Harvard architecture
[Figure: CPU with a PC, connected to a separate data memory and program memory, each with its own address and data lines.]
13von Neumann vs. Harvard
- Harvard can't use self-modifying code.
- Harvard allows two simultaneous memory fetches.
- Most DSPs use Harvard architecture for streaming data:
- greater memory bandwidth
- more predictable bandwidth.
14RISC vs. CISC
- Complex instruction set computer (CISC)
- many addressing modes
- many operations.
- Reduced instruction set computer (RISC)
- load/store
- pipelinable instructions.
15Instruction set characteristics
- Fixed vs. variable length.
- Addressing modes.
- Number of operands.
- Types of operands.
16Programming model
- Programming model: registers visible to the programmer.
- Some registers are not visible (IR).
17Multiple implementations
- Successful architectures have several implementations:
- varying clock speeds
- different bus widths
- different cache sizes
- etc.
18ARM Architecture
- Advanced RISC Machines (1990)
- (Acorn and Apple Computer)
19ARM Architecture
- ARM versions.
- ARM assembly language.
- ARM programming model.
20ARM versions
- ARM architecture has been extended over several versions.
- We will concentrate on ARMv5.
21Evolution of the ARM architecture versions
22ARMv6 Improvement
- Memory management
- Multiprocessing
- Multimedia support: SIMD capability
23Evolution of the ARM architecture
ARM11
24Introduction
- To allow very small, yet high-performance implementations
- RISC
- Large uniform register file
- Load/store architecture
- Simple addressing modes
- Uniform and fixed-length instruction fields
- Auto-increment and auto-decrement addressing modes
- Conditional execution of all instructions
25ARM assembly language
- Fairly standard assembly language:
- LDR r0,[r8] ; a comment
- label ADD r4,r0,r1
26Programming Model
27ARM data types
- Byte: 8 bits
- Halfword: 16 bits
- Must be aligned to two-byte boundaries
- Word: 32 bits
- Must be aligned to four-byte boundaries
- ARM addresses can be 32 bits long.
- Address refers to byte.
- Address 4 starts at byte 4.
- Can be configured at power-up as either little- or big-endian mode.
28Processor modes
- User (usr): normal program execution mode
- FIQ (fiq): supports a high-speed data transfer or channel process
- IRQ (irq): used for general-purpose interrupt handling
- Supervisor (svc): a protected mode for the OS
- Abort (abt): implements VM and/or memory protection
- Undefined (und): supports software emulation of HW coprocessors
- System (sys): runs privileged OS tasks
- fiq, irq, svc, abt, und: exception modes
29Registers
[Figure: ARM registers r0 to r15 and the CPSR, each shown with bits 0 to 31; r14 is the link register and r15 is the PC; registers are grouped into unbanked and banked registers.]
31Endianness
- Relationship between bit and byte/word ordering defines endianness
[Figure: byte layout within a 32-bit word. Little-endian: bit 31 to bit 0, holding byte 3, byte 2, byte 1, byte 0. Big-endian: bit 0 to bit 31, holding byte 0, byte 1, byte 2, byte 3.]
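The byte ordering can be checked with a short Python sketch. The word value 0x0A0B0C0D is a hypothetical example, and the standard struct module stands in for what the memory system does on a real processor.

```python
import struct

# Hypothetical 32-bit word used only for illustration.
value = 0x0A0B0C0D

# Little-endian: least significant byte is stored at the lowest address.
little = struct.pack("<I", value)

# Big-endian: most significant byte is stored at the lowest address.
big = struct.pack(">I", value)

def read_word(data, byte_order):
    """Reassemble a 32-bit word from 4 bytes using the given order ('<' or '>')."""
    return struct.unpack(byte_order + "I", data)[0]
```

Reading the same four bytes with the wrong byte order yields a byte-swapped word, which is why the setting has to match between producer and consumer of the data.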
32ARM status bits
- Every arithmetic, logical, or shifting operation may set CPSR (current program status register) bits:
- N (negative), Z (zero), C (carry), V (overflow).
- Examples:
- -1 + 1 = 0: NZCV = 0110.
- (2^31 - 1) + 1 = -2^31: NZCV = 1001.
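The flag settings for a 32-bit addition can be worked out with a small Python model. This is a sketch of the arithmetic only, not of any particular ARM implementation; the function name add_with_flags is made up for this example.

```python
MASK32 = 0xFFFFFFFF

def add_with_flags(a, b):
    """Add two 32-bit values; return (result, flags) with flags as an 'NZCV' string."""
    a &= MASK32
    b &= MASK32
    full = a + b                  # unbounded sum, used to detect the carry out
    result = full & MASK32
    n = result >> 31              # N: sign bit of the result
    z = int(result == 0)          # Z: result is zero
    c = int(full > MASK32)        # C: unsigned carry out of bit 31
    # V: both operands have the same sign and the result's sign differs
    v = (((a ^ result) & (b ^ result)) >> 31) & 1
    return result, f"{n}{z}{c}{v}"
```

With this model, adding 0xFFFFFFFF (-1) and 1 gives flags 0110, and adding 0x7FFFFFFF (2^31 - 1) and 1 gives 0x80000000 with N and V set and C clear, that is 1001.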
33ARM data processing operand addressing
- Instruction syntax:
- <opcode>{<cond>}{S} <Rd>, <Rn>, <shifter-operand>
- <shifter-operand> has 11 options
34Condition field
- Almost all ARM instructions are conditionally executed
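As a rough illustration, the condition field can be modeled as a predicate over the N, Z, C, V flags. The Python sketch below uses the standard ARM condition mnemonics; the function name condition_passed is an assumption made for this example.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with condition `cond` would execute."""
    n, z, c, v = bool(n), bool(z), bool(c), bool(v)
    table = {
        "EQ": z,                  "NE": not z,         # equal / not equal
        "CS": c,                  "CC": not c,         # carry set / clear
        "MI": n,                  "PL": not n,         # negative / positive or zero
        "VS": v,                  "VC": not v,         # overflow / no overflow
        "HI": c and not z,        "LS": (not c) or z,  # unsigned higher / lower or same
        "GE": n == v,             "LT": n != v,        # signed >= / <
        "GT": (not z) and n == v, "LE": z or n != v,   # signed > / <=
        "AL": True,                                    # always
    }
    return table[cond]
```

For example, after comparing 3 with 5 the subtraction result is negative without overflow (N=1, V=0), so LT passes and GE fails.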
35ARM data processing operand addressing
Data processing immediate shift
Data processing register shift
Data processing 32-bit immediate
36Shifter operand
- Immediate
- 8-bit constant and a 4-bit rotate (0, 2, 4, ..., 30)
- mov r0, #0
- add r9, r9, #1
- Register operand
- mov r2, r0
- Shifted register operand
- ASR, LSL, LSR, ROR, RRX (by one bit)
- mov r2, r0, LSL #2 ; shift r0 left by 2, write to r2 (r2 = r0 x 4)
- sub r10, r9, r8, LSR #4 ; r10 = r9 - r8/16
- sub r10, r9, r8, ROR r3 ; r10 = r9 - (r8 rotated by value of r3)
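Whether a constant fits the immediate form (an 8-bit value rotated right by an even amount) can be tested with a short search. The Python sketch below assumes that encoding; the helper names ror32 and encode_immediate are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def encode_immediate(value):
    """Return (imm8, rot) with ror32(imm8, 2 * rot) == value, or None if not encodable."""
    value &= MASK32
    for rot in range(16):
        # Rotating left by 2*rot undoes a rotate right by 2*rot.
        imm8 = ror32(value, (32 - 2 * rot) % 32)
        if imm8 <= 0xFF:
            return imm8, rot
    return None
```

Under this model 0xFF000000 is encodable (0xFF rotated right by 8), while 0x101 is not, because its two set bits cannot be brought into a single 8-bit window by any rotation.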
37ARM data-processing
- AND
- EOR
- SUB: Rd = Rn - shifter operand
- RSB: Rd = shifter operand - Rn
- ADD
- ADC (with carry)
- SBC
- RSC (reverse SBC)
- TST: update flags after Rn AND shifter operand
- TEQ
- CMP
- CMN (compare negated)
- ORR (logical OR)
- MOV
- BIC
- MVN (mov not)
38ARM data-processing
- Shift and rotate are specified through the shifter-operand
- LSL, LSR: logical shift left/right
- ASR: arithmetic shift right
- ROR: rotate right
- RRX: rotate right extended with C
39Data operation varieties
- Logical shift
- fills with zeroes.
- Arithmetic shift
- fills with sign extension
- RRX performs a 33-bit rotate, including the C bit from the CPSR above the sign bit.
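The difference between these operations shows up clearly on a value with the sign bit set. The following Python sketch models them on 32-bit values, assuming shift amounts between 0 and 31; it is an illustration, not a cycle-accurate model of the barrel shifter.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeroes enter from the right."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right: zeroes enter from the left."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right: copies of the sign bit enter from the left."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret as a negative Python integer
    return (x >> n) & MASK32

def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    n %= 32
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rrx(x, carry_in):
    """33-bit rotate right by one: C enters bit 31, old bit 0 becomes the new C."""
    x &= MASK32
    return ((carry_in & 1) << 31) | (x >> 1), x & 1
```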
40Load and Store instructions
- Two types
- 32-bit word or an 8-bit unsigned byte
- Load and store halfword and load signed byte
- Addressing modes
- Base register
- Any one of the GPRs (including the PC)
- Offset
- Three formats
41Addressing modes
- Offset
- Immediate: unsigned number (12 bits or 8 bits)
- Register: a GPR (not the PC)
- Scaled register: shifted by an immediate value
- LSL, LSR, ASR, ROR, RRX
- Three ways to form the memory address
- EA = base register ± offset
- Offset
- Pre-indexed
- Post-indexed
42Addressing modes
- Base-plus-offset addressing:
- LDR r0,[r1,#16]
- Loads from location r1+16
- Pre-indexing increments base register:
- LDR r0,[r1,#16]!
- Post-indexing fetches, then does offset:
- LDR r0,[r1],#16
- Loads r0 from r1, then adds 16 to r1.
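The three forms differ only in which address is used and whether the base register is written back. A minimal Python sketch, with registers and memory modeled as dictionaries (an assumption made purely for illustration):

```python
def ldr_offset(regs, mem, rd, rn, offset):
    """LDR rd,[rn,#offset]: address is base + offset, base is unchanged."""
    regs[rd] = mem[regs[rn] + offset]

def ldr_pre_indexed(regs, mem, rd, rn, offset):
    """LDR rd,[rn,#offset]!: base is updated first, then used as the address."""
    regs[rn] += offset
    regs[rd] = mem[regs[rn]]

def ldr_post_indexed(regs, mem, rd, rn, offset):
    """LDR rd,[rn],#offset: address is the old base, then the base is updated."""
    regs[rd] = mem[regs[rn]]
    regs[rn] += offset
```

Starting from r1 = 0x1000, the offset and pre-indexed forms both read location 0x1010, but only the pre-indexed form leaves r1 at 0x1010; the post-indexed form reads 0x1000 and then moves r1 to 0x1010.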
43Load and store
- LDR
- LDRB
- LDRH
- LDRSB (signed byte)
- LDRSH (signed halfword)
44Examples
- LDR R1, [R0] ; load R1 from the address in R0
- LDR R8, [R3, #4] ; EA = R3 + 4
- LDR R8, [R3, #-4] ; EA = R3 - 4
- STRB R10, [R7, -R4] ; EA = R7 - R4
- LDR R11, [R3, R5, LSL #2] ; EA = R3 + (R5 x 4)
- LDR R3, [R9], #4 ; EA = R9, then R9 = R9 + 4 (post-indexed)
- LDR R1, [R0, #2]! ; EA = R0 + 2, R0 = R0 + 2 (pre-indexed)
- LDR R0, [PC, #0x40] ; load R0 from PC + 0x40 (= address of the instruction + 8 + 0x40)
45Load and store multiple
- Addressing modes
- IA increment after
- IB increment before
- DA decrement after
- DB decrement before
46Load and store multiple
- LDM
- STM
- Examples
- LDMIA r0, {r5-r8} ; load multiple r5-r8 from the address in r0
- STMDA r1!, {r2, r5, r7-r9, r11} ; update r1
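The four addressing modes determine which block of memory is touched and where the base register ends up if write-back (!) is requested. The Python sketch below assumes 4-byte words and the usual rule that the lowest-numbered register uses the lowest address; the function name is hypothetical.

```python
def block_transfer_addresses(mode, base, reg_list):
    """Return ({register: address}, written_back_base) for LDM/STM in the given mode."""
    n = len(reg_list)
    start = {
        "IA": base,              # increment after: first word at base
        "IB": base + 4,          # increment before: first word at base + 4
        "DA": base - 4 * n + 4,  # decrement after: last word at base
        "DB": base - 4 * n,      # decrement before: last word at base - 4
    }[mode]
    new_base = base + 4 * n if mode in ("IA", "IB") else base - 4 * n
    # The lowest-numbered register always goes to the lowest address.
    addresses = {r: start + 4 * i for i, r in enumerate(sorted(reg_list))}
    return addresses, new_base
```

For LDMIA with r0 = 0x1000 and registers r5 to r8, this gives addresses 0x1000 through 0x100C. For STMDA with r1 = 0x2000 and six registers, r11 is stored at 0x2000, r2 at 0x1FEC, and write-back leaves r1 = 0x1FE8.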
47Branch instructions
- Conditional branch forwards or backwards up to 32 MB
- Sign-extending the 24-bit imm_data to 32 bits
- Shifting the result left two bits
- Adding this to the PC (the address of the branch + 8)
- Approximately ±32 MB
- B, BL
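The target computation described above can be written out directly. This Python sketch follows the three steps (sign-extend, shift left by two, add to the PC value); the function name branch_target is made up for the example.

```python
def branch_target(instr_addr, imm24):
    """Compute the target of B/BL at `instr_addr` with a 24-bit immediate field."""
    imm24 &= 0xFFFFFF
    # Sign-extend the 24-bit field.
    offset = imm24 - (1 << 24) if imm24 & 0x800000 else imm24
    # Shift left two bits (word offset to byte offset) and add to PC,
    # where PC reads as the branch address + 8.
    return (instr_addr + 8 + (offset << 2)) & 0xFFFFFFFF
```

An immediate of 0xFFFFFE (that is, -2) makes a branch jump to itself, and the largest positive immediate, 0x7FFFFF, reaches just under 32 MB ahead.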
48Examples
- B label
- BCC label ; branch if carry flag is clear
- BEQ label ; branch if zero flag is set
- MOV PC, #0 ; branch to location zero
- BL func ; subroutine call
- MOV PC, LR ; return
- MOV LR, PC
- LDR PC, =func
49ARM ADR pseudo-op
- Cannot refer to an address directly in an instruction.
- Generate value by performing arithmetic on PC.
- ADR pseudo-op generates instruction required to calculate address:
- ADR r1,FOO
50Examples
- start MOV r0, #10
- ADR r4, start ; => SUB r4, pc, #0xc
- start = pc - 4 - 8 = pc - 12 = pc - 0xc
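The offset an assembler has to generate for ADR follows from the PC reading as the current instruction address plus 8. A small Python sketch of that arithmetic, with hypothetical addresses chosen for illustration:

```python
def adr_offset(adr_instr_addr, target_addr):
    """Offset to add to the PC so that an ADR at `adr_instr_addr` yields `target_addr`."""
    pc = adr_instr_addr + 8   # PC value seen by the executing instruction
    return target_addr - pc

# Hypothetical layout: 'start' at address 0, the ADR in the following word.
offset = adr_offset(adr_instr_addr=4, target_addr=0)
```

Here the offset is -12, which matches the SUB from pc with 0xc.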
51Example C assignments
- C:
- x = (a + b) - c;
- Assembler:
- ADR r4,a ; get address for a
- LDR r0,[r4] ; get value of a
- ADR r4,b ; get address for b, reusing r4
- LDR r1,[r4] ; get value of b
- ADD r3,r0,r1 ; compute a+b
- ADR r4,c ; get address for c
- LDR r2,[r4] ; get value of c
52C assignment, contd.
- SUB r3,r3,r2 ; complete computation of x
- ADR r4,x ; get address for x
- STR r3,[r4] ; store value of x
53Example C assignment
- C:
- y = a*(b+c);
- Assembler:
- ADR r4,b ; get address for b
- LDR r0,[r4] ; get value of b
- ADR r4,c ; get address for c
- LDR r1,[r4] ; get value of c
- ADD r2,r0,r1 ; compute partial result
- ADR r4,a ; get address for a
- LDR r0,[r4] ; get value of a
54C assignment, contd.
- MUL r2,r2,r0 ; compute final value for y
- ADR r4,y ; get address for y
- STR r2,[r4] ; store y
55Example C assignment
- C:
- z = (a << 2) | (b & 15);
- Assembler:
- ADR r4,a ; get address for a
- LDR r0,[r4] ; get value of a
- MOV r0,r0,LSL #2 ; perform shift
- ADR r4,b ; get address for b
- LDR r1,[r4] ; get value of b
- AND r1,r1,#15 ; perform AND
- ORR r1,r0,r1 ; perform OR
56C assignment, contd.
- ADR r4,z ; get address for z
- STR r1,[r4] ; store value for z
57Example if statement
- C:
- if (a < b) { x = 5; y = c + d; } else x = c - d;
- Assembler:
- ; compute and test condition
- ADR r4,a ; get address for a
- LDR r0,[r4] ; get value of a
- ADR r4,b ; get address for b
- LDR r1,[r4] ; get value for b
- CMP r0,r1 ; compare a < b
- BGE fblock ; if a >= b, branch to false block
58If statement, contd.
- ; true block
- MOV r0,#5 ; generate value for x
- ADR r4,x ; get address for x
- STR r0,[r4] ; store x
- ADR r4,c ; get address for c
- LDR r0,[r4] ; get value of c
- ADR r4,d ; get address for d
- LDR r1,[r4] ; get value of d
- ADD r0,r0,r1 ; compute y
- ADR r4,y ; get address for y
- STR r0,[r4] ; store y
- B after ; branch around false block
59If statement, contd.
- ; false block
- fblock ADR r4,c ; get address for c
- LDR r0,[r4] ; get value of c
- ADR r4,d ; get address for d
- LDR r1,[r4] ; get value for d
- SUB r0,r0,r1 ; compute c-d
- ADR r4,x ; get address for x
- STR r0,[r4] ; store value of x
- after ...
60Example Conditional instruction implementation
- ; true block
- MOVLT r0,#5 ; generate value for x
- ADRLT r4,x ; get address for x
- STRLT r0,[r4] ; store x
- ADRLT r4,c ; get address for c
- LDRLT r0,[r4] ; get value of c
- ADRLT r4,d ; get address for d
- LDRLT r1,[r4] ; get value of d
- ADDLT r0,r0,r1 ; compute y
- ADRLT r4,y ; get address for y
- STRLT r0,[r4] ; store y
61Conditional instruction implementation, contd.
- ; false block
- ADRGE r4,c ; get address for c
- LDRGE r0,[r4] ; get value of c
- ADRGE r4,d ; get address for d
- LDRGE r1,[r4] ; get value for d
- SUBGE r0,r0,r1 ; compute c-d
- ADRGE r4,x ; get address for x
- STRGE r0,[r4] ; store value of x
62Example FIR filter
- C:
- for (i=0, f=0; i<N; i++)
- f = f + c[i]*x[i];
- Assembler:
- ; loop initiation code
- MOV r0,#0 ; use r0 for i
- MOV r8,#0 ; use separate index for arrays
- ADR r2,N ; get address for N
- LDR r1,[r2] ; get value of N
- MOV r2,#0 ; use r2 for f
63FIR filter, contd.
- ADR r3,c ; load r3 with base of c
- ADR r5,x ; load r5 with base of x
- ; loop body
- loop LDR r4,[r3,r8] ; get c[i]
- LDR r6,[r5,r8] ; get x[i]
- MUL r4,r4,r6 ; compute c[i]*x[i]
- ADD r2,r2,r4 ; add into running sum
- ADD r8,r8,#4 ; add one word offset to array index
- ADD r0,r0,#1 ; add 1 to i
- CMP r0,r1 ; exit?
- BLT loop ; if i < N, continue
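For comparison, the same loop can be traced in Python with one variable per register. This is a sketch that mirrors the assembly structure, including the separate byte offset held in r8; the function name fir and the use of Python lists for the arrays are assumptions for illustration.

```python
def fir(c, x, n):
    """Mirror of the assembly loop: returns the sum of c[i]*x[i] for i in 0..n-1."""
    i = 0          # r0: loop counter
    offset = 0     # r8: byte offset into both arrays
    f = 0          # r2: running sum
    while True:
        ci = c[offset // 4]    # LDR r4,[r3,r8]
        xi = x[offset // 4]    # LDR r6,[r5,r8]
        f += ci * xi           # MUL, then ADD into the running sum
        offset += 4            # one word further into the arrays
        i += 1
        if not i < n:          # CMP r0,r1 / BLT loop
            break
    return f
```

Because the test sits at the bottom of the loop, the body runs once even when N is 0, so this structure assumes N is at least 1.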
64Nested subroutine calls
- Nesting/recursion requires coding convention:
- f1 LDR r0,[r13] ; load arg into r0 from stack
- ; call f2()
- STR r14,[r13]! ; store f1's return adrs
- STR r0,[r13]! ; store arg to f2 on stack
- BL f2 ; branch and link to f2
- ; return from f1()
- SUB r13,#4 ; pop f2's arg off stack
- LDR r15,[r13]! ; restore register and return
65Summary
- Load/store architecture
- Most instructions are RISCy, operate in single cycle.
- Some multi-register operations take longer.
- All instructions can be executed conditionally.
66MPC850
- Integrated Communication Microprocessor
67Reference Manuals
- MPC850 Family User Manual
- PowerPC Programming Environment Manual
- Course Home Page: http://calab.kaist.ac.kr/maeng/cs310/micro02.htm
- Motorola Home Page
- http://e-www.motorola.com
68Overview
- Versatile, one-chip, integrated communication processor
- Embedded PowerPC core
- Versatile memory controller
- Communication processor module (CPM)
- Serial communication controllers (SCCs)
- One USB
- Etc.
70Embedded PowerPC core
- Single issue, 32-bit version
- Branch folding and prediction
- 2-Kbyte I-cache, 1-Kbyte D-cache
- 2-way set-associative
- Physical
- MMUs with 8-entry TLBs
- 4K, 16K, 256K, 512K, and 8MB page sizes
71Other Features
- Dynamic data bus sizing: 8-, 16-, 32-bit
- CPU clock: 0-80 MHz
- System Integration Unit (SIU)
- Memory Controller
- General Purpose timer
- CPM, SCCs, SMCs, etc.
72PowerPC Architecture
73PowerPC instruction set
- Overview
- Operand Conventions
- PowerPC Registers and programming model
- Addressing Modes
- Instruction Set
- Cache model
- Exception Model
- Memory management model
74PowerPC Architecture
- Motorola, IBM, Apple Computer
- POWER Architecture: RS/6000 family
- 64-bit architecture with a 32-bit subset
- Three levels of the architecture
- Flexibility: degrees of SW compatibility
- UISA (User instruction set architecture)
- VEA (Virtual environment architecture)
- OEA (Operating environment architecture)
75Features not defined by the PowerPC Architecture
- For flexibility
- System bus interface signals
- Cache design
- The number and the nature of execution units
- Other internal micro-architecture issues
76Endianness
- Relationship between bit and byte/word ordering defines endianness
[Figure: byte layout within a 32-bit word. Little-endian (ARM, Intel): bit 31 to bit 0, holding byte 3, byte 2, byte 1, byte 0. Big-endian (PowerPC, IBM, Motorola): bit 0 to bit 31, holding byte 0, byte 1, byte 2, byte 3.]
77Programming Model Registers
79PowerPC programming model - Register Set
- User model: UISA (32-bit architecture)
[Figure: user-level register set. GPR0 to GPR31 (32 bits each); FGPR0 to FGPR31 (64 bits each); condition register CR (32 bits); FP status and control register FPSCR (32 bits); XER register XER (32 bits); link register LR (64/32 bits); count register CTR (64/32 bits).]
80Condition Registers (CR)
- For testing and branching
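[Figure: the 32-bit condition register (bits 0 to 31) divided into eight 4-bit fields, CR0 through CR7, with a field marked FP.]
- Condition register CRn field: set by compare instructions
- For all integer instructions: bit 0 = negative (LT), bit 1 = positive (GT), bit 2 = zero (EQ), bit 3 = summary overflow (SO)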
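How a signed compare fills one 4-bit CR field can be sketched in Python. PowerPC numbers bits from the most significant end, so bit 0 (LT) is the highest bit of the field; the function name compare_field is hypothetical, and the SO bit is simply passed in, since it is copied from the XER rather than computed by the compare.

```python
def compare_field(a, b, so=0):
    """Return the 4-bit CR field (LT, GT, EQ, SO) for a signed compare of a with b."""
    lt = int(a < b)    # bit 0 of the field
    gt = int(a > b)    # bit 1
    eq = int(a == b)   # bit 2
    # bit 3 is the summary-overflow bit, copied in unchanged
    return (lt << 3) | (gt << 2) | (eq << 1) | (so & 1)
```

Exactly one of LT, GT, and EQ is set after a compare, so the field value is 0b1000, 0b0100, or 0b0010 when SO is clear.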
81XER Register (XER)
82XER Register (XER), contd
83Link Register (LR), Count Register (CTR)
- bclrx (branch conditional to link register): branch with link update
84Counter Register
85VEA Register Set Time Base
86OEA Register Set
87Machine State Register (MSR)
90Addressing Modes
- Effective Address Calculation
- Register indirect with immediate index mode
- Register indirect with index mode
- Register indirect mode
91Register Indirect with Immediate Index Addressing
92Register Indirect with Index
93Register Indirect
94Instruction Formats
- 4 bytes long and word-aligned
- Bits 0-5 always specify the primary opcode
- Extended opcode
95Instruction set
- Integer
- Floating-point
- Load and store
- Flow control
- Processor control
- Memory synchronization
- Memory control
- External control
96Integer Instructions
- Arithmetic, compare, logical, rotate and shift
- Integer arithmetic, shift, rotate, and string move
- May update or read values from the XER
- The CR may be updated if the Rc bit is set.
- addic
- addic.
100Integer Compare
- Algebraically, logically
- crfD can be omitted if the result is to be placed in CR0
- crfD field: the target CR
- The L bit has no effect on 32-bit operations
101Integer compare, contd
102Integer Logical
103Integer Logical, contd
104Rotate and Shift Instructions
- SH: specifies the number of bits to rotate
- MB: mask start
- ME: mask stop
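The interplay of SH, MB, and ME can be illustrated with a Python model of a rotate-left-then-AND-with-mask instruction such as rlwinm. This is a sketch under the assumption of 32-bit registers and PowerPC bit numbering (bit 0 is the most significant bit); the helper names are made up.

```python
MASK32 = 0xFFFFFFFF

def mask(mb, me):
    """Ones from bit MB through bit ME inclusive; wraps around when MB > ME."""
    from_mb = MASK32 >> mb                    # ones in bits mb..31
    up_to_me = (MASK32 << (31 - me)) & MASK32 # ones in bits 0..me
    return (from_mb & up_to_me) if mb <= me else (from_mb | up_to_me)

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def rlwinm(rs, sh, mb, me):
    """Rotate left by SH, then keep only the bits selected by MB..ME."""
    return rotl32(rs, sh) & mask(mb, me)
```

For instance, rotating 0x12345678 left by 8 and masking bits 24 to 31 extracts the top byte, 0x12, into the low end of the result.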
105Integer Rotate
106Integer Shift
107Load and Store
- Integer load and store
- Integer load and store with byte-reverse
- Integer load and store multiple
- FP load and store
- Memory synchronization
112Branch and Flow Control
- EA calculation
- Branch relative
- Branch conditional to relative address
- Branch to absolute address
- Branch conditional to absolute address
- Branch conditional to link register
- Branch conditional to count register
113Branch Relative
114Branch conditional to relative
115Branch to Absolute
116Branch conditional to absolute
117Branch conditional to LR
118Branch conditional to count register
119Conditional Branch control
120Branch Instructions
121CR logical Instructions
122Trap, System Linkage
123Processor Control
125Memory Synchronization
126Example
- Test and Set
- loop: lwarx r5,0,r3 # load and reserve
- cmpwi r5,0 # done if word
- bne $+12 # not equal to 0
- stwcx. r4,0,r3 # try to store non-zero
- bne- loop # loop if lost reservation
127Summary
- UISA, VEA, OEA
- Register set
- Fixed size instruction - RISC
- Load and store architecture
- 3 addressing modes
- Condition register update: Rc field
- 8 condition register fields (CR0 to CR7)
- Branch addressing modes
- BO, BI fields
- Relative, absolute, LR, CTR