Embedded System HW

Original file title: SHARC programming model. Author: Wayne Wolf. Created: 6/17/1995 11:31:02 PM. Format: PowerPoint presentation.

Slides: 138

Provided by: wayne74

1
Embedded System HW

Processor Technology

2
Processor Technology

General Purpose (software)
Application Specific
Single Purpose (Hardware)
IC technology
Full Custom/VLSI
Semi-custom ASIC (gate-array, standard cell)
PLD

3
Custom single-purpose processors: Hardware
4
Outline

Introduction
Combinational logic
Sequential logic
Custom single-purpose processor design
RT-level custom single-purpose processor design
Reading: Chapter 2 of Embedded System Design: A Unified Hardware/Software Introduction, by Frank Vahid and Tony Givargis.

5
Introduction

Processor
Digital circuit that performs computation tasks
Controller and datapath
General-purpose: variety of computation tasks
Single-purpose: one particular computation task
Custom single-purpose: non-standard task
A custom single-purpose processor may be
Fast, small, low power
But high NRE, longer time-to-market, less flexible

6
Custom single-purpose processor basic model
7
Example: greatest common divisor

First create algorithm
Convert algorithm to complex state machine
Known as FSMD: finite-state machine with datapath
Can use templates to perform such conversion

(b) desired functionality
0: int x, y;
1: while (1) {
2:    while (!go_i);
3:    x = x_i;
4:    y = y_i;
5:    while (x != y) {
6:       if (x < y)
7:          y = y - x;
         else
8:          x = x - y;
      }
9:    d_o = x;
   }
(c) state diagram
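The desired functionality can be checked in software before it is turned into a state machine. The following is a minimal C sketch of the same subtraction loop. The function name gcd_subtract and the unsigned types are assumptions made for illustration, and the go_i handshake and the outer while (1) loop are left out.

```c
#include <assert.h>

/* Subtraction-based GCD, mirroring steps 3-9 of the desired functionality.
 * Assumption: both inputs are positive (the loop never ends if one is 0). */
unsigned gcd_subtract(unsigned x_i, unsigned y_i)
{
    assert(x_i > 0 && y_i > 0);
    unsigned x = x_i;           /* step 3: x = x_i */
    unsigned y = y_i;           /* step 4: y = y_i */
    while (x != y) {            /* step 5 */
        if (x < y)              /* step 6 */
            y = y - x;          /* step 7 */
        else
            x = x - y;          /* step 8 */
    }
    return x;                   /* step 9: d_o = x */
}
```

For example, gcd_subtract(12, 18) steps through (12, 18), (12, 6), (6, 6) and returns 6.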
8
State diagram templates
9
Creating the datapath

Create a register for any declared variable
Create a functional unit for each arithmetic operation
Connect the ports, registers and functional units
Based on reads and writes
Use multiplexors for multiple sources
Create a unique identifier for each datapath component control input and output

10
Creating the controller's FSM

Same structure as FSMD
Replace complex actions/conditions with datapath configurations

11
Splitting into a controller and datapath
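Controller (input: go_i; datapath status inputs: x_neq_y, x_lt_y)
State   Encoding   Control outputs / branch condition
1       0000       branches on 1 / !1
2       0001       branches on !go_i / !(!go_i)
2-J     0010
3       0011       x_sel = 0, x_ld = 1
4       0100       y_sel = 0, y_ld = 1
5       0101       branches on x_neq_y = 1 / x_neq_y = 0
6       0110       branches on x_lt_y = 1 / x_lt_y = 0
7       0111       y_sel = 1, y_ld = 1
8       1000       x_sel = 1, x_ld = 1
6-J     1001
5-J     1010
9       1011       d_ld = 1
1-J     1100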
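The split between controller and datapath can be sketched as a small C simulation. This is an illustration under assumptions, not the slide's design: the state transitions below are one reading of the state diagram (join states simply pass control on), the names tick and run_gcd are hypothetical, and go_i is held at 1 so the machine starts immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Controller states, named after the state diagram (J = join state). */
enum state { S1, S2, S2J, S3, S4, S5, S6, S7, S8, S6J, S5J, S9, S1J };

struct datapath { unsigned x, y, d; };   /* registers x, y and output d */

/* One clock tick: the controller drives the control inputs for the current
 * state, the datapath registers load, and the next state is returned. */
static enum state tick(enum state s, struct datapath *dp,
                       bool go_i, unsigned x_i, unsigned y_i)
{
    /* Datapath status outputs (combinational). */
    bool x_neq_y = dp->x != dp->y;
    bool x_lt_y  = dp->x <  dp->y;

    /* Controller outputs: deasserted unless the state sets them. */
    bool x_sel = 0, x_ld = 0, y_sel = 0, y_ld = 0, d_ld = 0;
    enum state next;

    switch (s) {
    case S1:  next = S2; break;
    case S2:  next = go_i ? S3 : S2J; break;
    case S2J: next = S2; break;
    case S3:  x_sel = 0; x_ld = 1; next = S4; break;
    case S4:  y_sel = 0; y_ld = 1; next = S5; break;
    case S5:  next = x_neq_y ? S6 : S9; break;
    case S6:  next = x_lt_y ? S7 : S8; break;
    case S7:  y_sel = 1; y_ld = 1; next = S6J; break;
    case S8:  x_sel = 1; x_ld = 1; next = S6J; break;
    case S6J: next = S5J; break;
    case S5J: next = S5; break;
    case S9:  d_ld = 1; next = S1J; break;
    default:  next = S1; break;          /* S1J */
    }

    /* Datapath: multiplexers pick each register's source, then load. */
    unsigned x_next = x_sel ? dp->x - dp->y : x_i;
    unsigned y_next = y_sel ? dp->y - dp->x : y_i;
    if (d_ld) dp->d = dp->x;
    if (x_ld) dp->x = x_next;
    if (y_ld) dp->y = y_next;
    return next;
}

/* Runs the machine until d has been loaded once; returns the value on d_o.
 * Assumption: both inputs are positive, otherwise the loop never ends. */
unsigned run_gcd(unsigned x_i, unsigned y_i)
{
    assert(x_i > 0 && y_i > 0);
    struct datapath dp = {0, 0, 0};
    enum state s = S1;
    while (s != S1J)
        s = tick(s, &dp, true, x_i, y_i);
    return dp.d;
}
```

The controller only ever sees the two status bits and only ever produces the five control bits, which is why its logic can be captured in a state table.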
12
Controller state table for the GCD example
13
Completing the GCD custom single-purpose
processor design

We finished the datapath
We have a state table for the next state and control logic
All that's left is combinational logic design
This is not an optimized design, but we see the basic steps

14
Summary

Custom single-purpose processors
Straightforward design techniques
Can be built to execute algorithms
Typically start with FSMD
CAD tools can be of great assistance

15
General-Purpose Processors: Software
16
Introduction

General-Purpose Processor
Processor designed for a variety of computation tasks
Low unit cost, in part because manufacturer spreads NRE over large numbers of units
Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
ARM processors: 1.5 billion processors
Carefully designed since higher NRE is acceptable
Can yield good performance, size and power
Low NRE cost, short time-to-market/prototype, high flexibility
User just writes software; no processor design
A.k.a. "microprocessor": "micro" used when they were implemented on one or a few chips rather than entire rooms

17
Why use microprocessors?

Alternatives: field-programmable gate arrays (FPGAs), custom logic, etc. (custom single-purpose processor or HW logic)
Microprocessors are often very efficient: can use same logic to perform many different functions.
Microprocessors simplify the design of families of products.

18
The performance paradox

Microprocessors use much more logic to implement a function than does custom logic.
But microprocessors are often at least as fast:
heavily pipelined
large design teams
aggressive VLSI technology.

19
Power

Custom logic is a clear winner for low power
devices.
Modern microprocessors offer features to help
control power consumption.
Software design techniques can help reduce power
consumption.

20
Basic Architecture
21
Basic Architecture

Control unit and datapath
Note similarity to single-purpose processor
Key differences
Datapath is general
Control unit doesn't store the algorithm; the algorithm is programmed into the memory

22
Superscalar and VLIW Architectures

Performance can be improved by
Faster clock (but there's a limit)
Pipelining: slice up instruction into stages, overlap stages
Multiple ALUs to support more than one instruction stream
Superscalar
Scalar: non-vector operations
Fetches instructions in batches, executes as many as possible
May require extensive hardware to detect independent instructions
VLIW: each word in memory has multiple independent instructions
Currently growing in popularity
Relies on the compiler to detect and schedule instructions

23
Pipelining: Increasing Instruction Throughput
[Figure: dish-cleaning analogy over time. Non-pipelined: each of dishes 1-8 is washed and dried before the next is started. Pipelined: one dish is dried while the next is washed.]
[Figure: pipelined instruction execution over time. Instructions 1-8 pass through the stages fetch instruction, decode, fetch operands, execute, store result.]
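The throughput gain can be put in numbers. The sketch below assumes an idealized pipeline in which every stage takes exactly one cycle and nothing ever stalls; the function names are made up for illustration.

```c
#include <assert.h>

/* Cycles to finish n instructions on a k-stage datapath, assuming every
 * stage takes one cycle and there are no stalls (an idealization). */
unsigned long cycles_nonpipelined(unsigned long n, unsigned long k)
{
    assert(k >= 1);
    return n * k;               /* each instruction runs start to finish alone */
}

unsigned long cycles_pipelined(unsigned long n, unsigned long k)
{
    assert(k >= 1);
    if (n == 0)
        return 0;
    return k + (n - 1);         /* fill the pipe once, then one result per cycle */
}
```

With the 8 instructions and 5 stages of the figure, this gives 40 cycles without pipelining and 12 cycles with it.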
24
Two Memory Architectures

Princeton
Fewer memory wires
Harvard
Simultaneous program and data memory access

25
Princeton vs. Harvard

Harvard can't use self-modifying code.
Harvard allows two simultaneous memory fetches.
Most DSPs use Harvard architecture for streaming data:
greater memory bandwidth
more predictable bandwidth.

26
Cache Memory

Memory access may be slow
Cache is small but fast memory close to processor
Holds copy of part of memory
Hits and misses

27
Application-Specific Instruction-Set Processors
(ASIPs)
28
Application-Specific Instruction-Set Processors
(ASIPs)

General-purpose processors
Sometimes too general to be effective in demanding application
e.g., video processing requires huge video buffers and operations on large arrays of data, inefficient on a GPP
But single-purpose processor has high NRE, not programmable
ASIPs: targeted to a particular domain
Contain architectural features specific to that domain
e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc.
Still programmable

29
Microprocessor varieties

Microcontroller: includes I/O devices, on-board memory.
Digital signal processor (DSP): microprocessor optimized for digital signal processing.
Typical embedded word sizes: 8-bit, 16-bit, 32-bit.

30
Embedded Processors

Netsilicon NETARM Embedded Processor
31
Many Types of Programmable Processors

Past
Microprocessor
Microcontroller
DSP
Graphics Processor

Now / Future
Network Processor
Sensor Processor
Cryptoprocessor
Game Processor
Wearable Processor
Mobile Processor

32
A Common ASIP: Microcontroller

For embedded control applications
Reading sensors, setting actuators
Mostly dealing with events (bits): data is present, but not in huge amounts
e.g., VCR, disk drive, digital camera (assuming SPP for image compression), washing machine, microwave oven
Microcontroller features
On-chip peripherals
Timers, analog-digital converters, serial communication, etc.
Tightly integrated for programmer, typically part of register space
On-chip program and data memory
Direct programmer access to many of the chip's pins
Specialized instructions for bit-manipulation and other low-level operations

33
Another Common ASIP: Digital Signal Processors (DSP)

For signal processing applications
Large amounts of digitized data, often streaming
Data transformations must be applied fast
e.g., cell-phone voice filter, digital TV, music synthesizer
DSP features
Several instruction execution units
Multiply-accumulate single-cycle instruction, other instrs.
Efficient vector operations, e.g., add two arrays
Vector ALUs, loop buffers, etc.

34
Trend: Even More Customized ASIPs

In the past, microprocessors were acquired as chips
Today, we increasingly acquire a processor as Intellectual Property (IP)
e.g., synthesizable VHDL model
Opportunity to add custom datapath hardware and a few custom instructions, or delete a few instructions
Can have significant performance, power and size impacts
Problem: need compiler/debugger for customized ASIP
Remember, most development uses structured languages
One solution: automatic compiler/debugger generation
e.g., www.tensillica.com
Another solution: retargetable compilers
e.g., www.improvsys.com (customized VLIW architectures)

35
Reconfigurable SoC
Triscend's A7 CSoC
Other examples: Atmel's FPSLIC (AVR + FPGA), Altera's Nios (configurable RISC on a PLD)

36
Selecting a Microprocessor

Issues
Technical: speed, power, size, cost
Other: development environment, prior expertise, licensing, etc.
Speed: how to evaluate a processor's speed?
Clock speed: but instructions per cycle may differ
Instructions per second: but work per instruction may differ
Dhrystone: synthetic benchmark, developed in 1984. Measured in Dhrystones/sec.
MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital's VAX 11/780). A.k.a. Dhrystone MIPS. Commonly used today.
So, 750 MIPS = 750 × 1757 = 1,317,750 Dhrystones per second
SPEC: set of more realistic benchmarks, but oriented to desktops
EEMBC: EDN Embedded Benchmark Consortium, www.eembc.org
Suites of benchmarks: automotive, consumer electronics, networking, office automation, telecommunications
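The Dhrystone MIPS conversion is a single multiplication or division by 1757. A minimal sketch in C follows; the function and macro names are assumptions chosen for illustration.

```c
/* 1 Dhrystone MIPS is defined as 1757 Dhrystones per second (VAX 11/780). */
#define DHRYSTONES_PER_SEC_PER_MIPS 1757UL

/* Converts a Dhrystone MIPS rating into Dhrystones per second. */
unsigned long dmips_to_dhrystones_per_sec(unsigned long dmips)
{
    return dmips * DHRYSTONES_PER_SEC_PER_MIPS;
}

/* Converts a measured Dhrystones-per-second score into Dhrystone MIPS. */
double dhrystones_per_sec_to_dmips(double dhrystones_per_sec)
{
    return dhrystones_per_sec / (double)DHRYSTONES_PER_SEC_PER_MIPS;
}
```

Calling dmips_to_dhrystones_per_sec(750) reproduces the 1,317,750 Dhrystones per second worked out on the slide.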

37
Processors
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM websites/datasheets; Embedded Systems Programming, Nov. 1998
38
Summary

General-purpose processors
Good performance, low NRE, flexible
Controller, datapath, and memory
Structured languages prevail
But some assembly level programming still
necessary
Many tools available
Including instruction-set simulators, and
in-circuit emulators
ASIPs
Microcontrollers, DSPs, network processors, more
customized ASIPs
Choosing among processors is an important step
Designing a general-purpose processor is
conceptually the same as designing a
single-purpose processor

39
Instruction Sets
40
RISC vs. CISC

Complex instruction set computer (CISC)
many addressing modes
many operations.
Reduced instruction set computer (RISC)
load/store
pipelinable instructions.

41
CISC

Intel microprocessors

Year  Processor     Transistors    Notes
1971  4004          2,250          Busicom
1972  8008          2,500          Mark-8
1974  8080          5,000          Altair
1978  8086/8088     29,000         IBM-PC XT
1982  80286         120,000        IBM-PC AT
1985  80386         275,000        32-bit
1989  80486         1,180,000
1993  Pentium       3,100,000
1995  Pentium Pro   5,500,000      Dynamic Execution
1997  Pentium 2     7,500,000      MMX
1999  Pentium 3     24,000,000     SIMD
2001  Itanium       25,000,000     64-bit, Explicitly Parallel Instruction Computing (EPIC)
2002  Pentium 4     55,000,000
2003  Itanium 2     410,000,000    Machine Check Architecture, EPIC, 6MB L3 cache
42
CISC - History: Packaging
43
CISC - History
44
Instruction set characteristics

Fixed vs. variable length.
Addressing modes.
Number of operands.
Types of operands.

45
ARM data processing Instruction Format(RISC)
Data processing immediate shift
Data processing register shift
Data processing 32-bit immediate
46
Intel IA-32 Instruction Format (CISC)
47
Programming model

Programming model: registers visible to the programmer.
Some registers are not visible (IR).

48
Multiple implementations

Successful architectures have several
implementations
varying clock speeds
different bus widths
different cache sizes
etc.

49
ARM Architecture

Advanced RISC Machines(1990)
(ACORN and Apple Computer)

50
ARM Architecture

ARM versions.
ARM assembly language.
ARM programming model.

51
ARM versions

ARM architecture has been extended over several
versions.
We will concentrate on ARMv5

52
Evolution of the ARM architecture versions
53
ARMv6 Improvement

Memory management
Multiprocessing
Multimedia support SIMD capability

54
Evolution of the ARM architecture
ARM11
55
Introduction

To allow very small, yet high-performance implementations
RISC
Large uniform register file
Load/store architecture
Simple addressing modes
Uniform and fixed-length instruction fields
Auto-increment and auto-decrement addressing modes
Conditional execution of all instructions

56
Extension of the RISC rules

High code density, low power, and small die size
Variable cycle execution
Multiple load and store
Improves code density, reduces instruction fetches, and reduces overall power consumption
Inline barrel shifter
Conditional execution
16-bit Thumb instruction set
Enhanced DSP instructions
16×16 multiply, arithmetic saturation
DSP-specific routines

57
ARM assembly language

Fairly standard assembly language:
        LDR r0,[r8]   ; a comment
label   ADD r4,r0,r1

58
Programming Model
59
ARM data types

Byte: 8 bits
Halfword: 16 bits
Must be aligned to two-byte boundaries
Word: 32 bits
Must be aligned to four-byte boundaries
ARM addresses can be 32 bits long.
Address refers to byte.
Address 4 starts at byte 4.
Can be configured at power-up as either little- or big-endian mode.

60
Processor modes

Mode        Abbrev.  Description
User        usr      Normal program execution mode
FIQ         fiq      Supports a high-speed data transfer or channel process
IRQ         irq      Used for general-purpose interrupt handling
Supervisor  svc      A protected mode for the OS
Abort       abt      Implements VM and/or memory protection
Undefined   und      Supports software emulation of HW coprocessors
System      sys      Runs privileged OS tasks
fiq, irq, svc, abt, und: exception modes

61
Registers
[Figure: register set. r0-r15, each 32 bits wide (bit 31 down to bit 0), plus CPSR. r14 is the link register and r15 is the PC. The figure distinguishes unbanked registers from banked registers.]
62
(No Transcript)
63
Endianness

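Relationship between bit and byte/word ordering defines endianness

[Figure: a 32-bit word drawn with bit 31 on the left and bit 0 on the right]
little-endian: byte 3, byte 2, byte 1, byte 0 (byte 0, at the lowest address, holds bits 7-0)
big-endian: byte 0, byte 1, byte 2, byte 3 (byte 0, at the lowest address, holds bits 31-24)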
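Byte order only becomes visible when a word is accessed one byte at a time. The C sketch below shows one way to observe it on the host and one way to read a word under either convention; the function names are assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte at the lowest
 * address (little-endian), 0 otherwise. */
int host_is_little_endian(void)
{
    uint32_t word = 0x03020100u;        /* byte n of the word holds the value n */
    uint8_t first;
    memcpy(&first, &word, 1);           /* read the byte at the lowest address */
    return first == 0x00;
}

/* Reads a 32-bit word from 4 bytes in memory under either convention,
 * independent of the host's own byte order. */
uint32_t load_word(const uint8_t mem[4], int little_endian)
{
    if (little_endian)
        return (uint32_t)mem[0] | (uint32_t)mem[1] << 8 |
               (uint32_t)mem[2] << 16 | (uint32_t)mem[3] << 24;
    return (uint32_t)mem[3] | (uint32_t)mem[2] << 8 |
           (uint32_t)mem[1] << 16 | (uint32_t)mem[0] << 24;
}
```

The same four bytes 0x11, 0x22, 0x33, 0x44 read as 0x44332211 in little-endian mode and as 0x11223344 in big-endian mode.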
64
ARM status bits

Every arithmetic, logical, or shifting operation may set CPSR (current program status register) bits
N (negative), Z (zero), C (carry), V (overflow).
Examples
-1 + 1 = 0: NZCV = 0110.
2^31 - 1 + 1 = -2^31: NZCV = 1001.
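The four flags of an addition can be derived from the operands and the 32-bit result. The following C sketch models a flag-setting 32-bit add; the function name add_nzcv and the packing of the flags into one integer are assumptions for illustration.

```c
#include <stdint.h>

/* Returns the NZCV flags (N in bit 3 down to V in bit 0) that a 32-bit
 * flag-setting addition a + b would produce. */
unsigned add_nzcv(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;                         /* wraps modulo 2^32 */
    unsigned n = r >> 31;                       /* result is negative as a signed value */
    unsigned z = (r == 0);                      /* result is zero */
    unsigned c = (r < a);                       /* unsigned carry out of bit 31 */
    unsigned v = (~(a ^ b) & (a ^ r)) >> 31;    /* same-sign inputs, different-sign result */
    return n << 3 | z << 2 | c << 1 | v;
}
```

Adding 0xFFFFFFFF (-1) and 1 gives a zero result with a carry out, so NZCV = 0110. Adding 0x7FFFFFFF (2^31 - 1) and 1 gives 0x80000000: the result is negative and the signed range has overflowed, but there is no carry out, so NZCV = 1001.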

65
ARM data processing operand addressing

Instruction syntax
<opcode>{<cond>}{S} <Rd>, <Rn>, <shifter-operand>
<shifter-operand> has 11 options

66
Condition field

Almost all ARM instructions can be conditionally executed

67
ARM data processing operand addressing
Data processing immediate shift
Data processing register shift
Data processing 32-bit immediate
68
Shifter operand

Immediate
8-bit constant and a 4-bit rotate (rotate right by 0, 2, 4, ..., 30)
mov r0, #0
add r9, r9, #1
Register operand
mov r2, r0
Shifted register operand
ASR, LSL, LSR, ROR, RRX (by one bit)
mov r2, r0, LSL #2 ; shift r0 left by 2, write to r2 (r2 = r0 × 4)
sub r10, r9, r8, LSR #4 ; r10 = r9 - r8/16
sub r10, r9, r8, ROR r3 ; r10 = r9 - (r8 rotated by value of r3)
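Because an immediate is an 8-bit constant rotated right by an even amount, not every 32-bit value can be written directly in an instruction. A minimal C check is sketched below; the function names are made up for illustration.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32). */
static uint32_t rol32(uint32_t v, unsigned n)
{
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Returns 1 if value can be written as an 8-bit constant rotated right by
 * an even amount (0, 2, ..., 30), 0 otherwise. */
int is_arm_immediate(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by rot; an 8-bit constant must remain. */
        if (rol32(value, rot) <= 0xFFu)
            return 1;
    }
    return 0;
}
```

For example, 0xFF000000 and 0x3FC are encodable, while 0x101 (nine bits wide) and 0x1FE (needs an odd rotation) are not.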

69
ARM data-processing

AND
EOR
SUB: Rd = Rn - shifter operand
RSB: Rd = shifter operand - Rn
ADD
ADC (add with carry)
SBC (subtract with carry)
RSC (reverse SBC)

TST: update flags after Rn AND shifter operand
TEQ
CMP
CMN (compare negated)
ORR (logical OR)
MOV
BIC
MVN (move NOT)

70
ARM data-processing

Shift, rotate: used in the shifter operand
LSL, LSR: logical shift left/right
ASR: arithmetic shift right
ROR: rotate right
RRX: rotate right extended with C

71
Data operation varieties

Logical shift: fills with zeroes.
Arithmetic shift: fills with sign extension.
RRX performs 33-bit rotate, including C bit from CPSR above sign bit.
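The difference between the shift varieties is only in what enters the vacated bit positions. The C sketch below models the right-shifting operations on 32-bit values; the function names are assumptions, and shift amounts are restricted to 1-31 to keep the C well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Logical shift right: vacated high bits are filled with zeroes. */
uint32_t lsr32(uint32_t v, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return v >> n;
}

/* Arithmetic shift right: vacated high bits copy the sign bit. Built from
 * unsigned operations because >> on a negative int is implementation-defined. */
uint32_t asr32(uint32_t v, unsigned n)
{
    assert(n >= 1 && n <= 31);
    uint32_t fill = (v & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0;
    return (v >> n) | fill;
}

/* Rotate right: bits shifted out at the bottom re-enter at the top. */
uint32_t ror32(uint32_t v, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return (v >> n) | (v << (32 - n));
}

/* RRX: 33-bit rotate right by one bit through the carry flag. */
uint32_t rrx32(uint32_t v, unsigned carry_in, unsigned *carry_out)
{
    *carry_out = v & 1u;                                   /* bit 0 leaves into C */
    return (v >> 1) | ((uint32_t)(carry_in & 1u) << 31);   /* old C enters at bit 31 */
}
```

Shifting 0x80000000 right by 4 gives 0x08000000 with LSR but 0xF8000000 with ASR, since ASR preserves the sign.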

72
Load and Store instructions

Two types
32-bit word or an 8-bit unsigned byte
Load and store halfword, and load signed byte
Addressing modes
Base register
Any one of the GPRs (including the PC)
Offset
Three formats

73
Addressing modes

Offset
Immediate: unsigned number (12 bits or 8 bits)
Register: GPR (not the PC)
Scaled register: register shifted by an immediate value
LSL, LSR, ASR, ROR, RRX
Three ways to form the memory address
EA = base register +/- offset
Offset
Pre-indexed
Post-indexed

74
Addressing modes

Base-plus-offset addressing
LDR r0,[r1,#16]
Loads from location r1+16
Pre-indexing increments base register
LDR r0,[r1,#16]!
Post-indexing fetches, then does offset
LDR r0,[r1],#16
Loads r0 from r1, then adds 16 to r1.
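The three addressing forms differ in which address is used for the access and in whether the base register is written back. The C model below is a sketch only: the toy memory, its size, and all function names are assumptions made for illustration.

```c
#include <stdint.h>

/* A toy word memory addressed by byte address; size is illustrative only. */
enum { MEM_WORDS = 64 };
static uint32_t mem[MEM_WORDS];

uint32_t load(uint32_t byte_addr)              { return mem[(byte_addr / 4) % MEM_WORDS]; }
void     store(uint32_t byte_addr, uint32_t v) { mem[(byte_addr / 4) % MEM_WORDS] = v; }

/* LDR rd,[rn,#off]  : address = rn + off, rn unchanged */
uint32_t ldr_offset(uint32_t *rn, int32_t off)
{
    return load(*rn + (uint32_t)off);
}

/* LDR rd,[rn,#off]! : rn = rn + off first, then load from the new rn */
uint32_t ldr_pre_indexed(uint32_t *rn, int32_t off)
{
    *rn += (uint32_t)off;
    return load(*rn);
}

/* LDR rd,[rn],#off  : load from rn, then rn = rn + off */
uint32_t ldr_post_indexed(uint32_t *rn, int32_t off)
{
    uint32_t value = load(*rn);
    *rn += (uint32_t)off;
    return value;
}
```

With r1 = 0 and an offset of 16, the first two forms read the word at address 16 and the third reads the word at address 0; only the last two leave r1 = 16.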

75
Load and store

LDR
LDRB
LDRH
LDRSB (signed byte)
LDRSH (signed halfw)

STR
STRB
STRH

76
Examples

LDR R1, [R0] ; load R1 from the address in R0
LDR R8, [R3, #4] ; EA = R3 + 4
LDR R8, [R3, #-4] ; EA = R3 - 4
STRB R10, [R7, -R4] ; EA = R7 - R4
LDR R11, [R3, R5, LSL #2] ; EA = R3 + (R5 × 4)
LDR R3, [R9], #4 ; EA = R9, then R9 = R9 + 4 (post-indexed)
LDR R1, [R0, #2]! ; EA = R0 + 2, R0 = R0 + 2 (pre-indexed)
LDR R0, [PC, #0x40] ; load R0 from PC + 0x40 (= address of the instruction + 8 + 0x40)

77
Load and store multiple

Addressing modes
IA: increment after
IB: increment before
DA: decrement after
DB: decrement before

78
Load and store multiple

LDM
STM
Examples
LDMIA r0, {r5-r8} ; load multiple r5-r8 from the address in r0
STMDA r1!, {r2, r5, r7-r9, r11} ; update r1

79
Branch instructions

Conditional branch forwards or backwards up to 32 MB
Target address is formed by
Sign-extending the 24-bit imm_data to 32 bits
Shifting the result left two bits
Adding this to the PC (the address of the branch + 8)
Range is approximately ±32 MB
B, BL
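The target calculation (sign-extend, shift left by two, add to PC) can be written out in C. This is a sketch; the function name and argument layout are assumptions for illustration.

```c
#include <stdint.h>

/* Computes the target of a B/BL instruction located at instr_addr, given
 * the 24-bit immediate field taken from the instruction word. */
uint32_t branch_target(uint32_t instr_addr, uint32_t imm24)
{
    uint32_t offset = imm24 & 0x00FFFFFFu;
    if (offset & 0x00800000u)           /* sign-extend bit 23 to 32 bits */
        offset |= 0xFF000000u;
    offset <<= 2;                       /* word offset to byte offset */
    return instr_addr + 8u + offset;    /* PC reads as branch address + 8 */
}
```

An immediate of 0xFFFFFE (-2 words) makes a branch jump to itself, because the -8 byte offset cancels the +8 in the PC value.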

80
Examples

B label
BCC label ; branch if carry flag is clear
BEQ label ; branch if zero flag is set
MOV PC, #0 ; branch to location zero
BL func ; subroutine call
MOV PC, LR ; return
MOV LR, PC
LDR PC, =func

81
ARM ADR pseudo-op

Cannot refer to an address directly in an instruction.
Generate value by performing arithmetic on PC.
ADR pseudo-op generates instruction required to calculate address:
ADR r1,FOO

82
Examples

start   MOV r0, #10
        ADR r4, start   => SUB r4, pc, #0xc
start = pc - 4 - 8 = pc - 12 = pc - 0xc

83
Example: C assignments

C
x = (a + b) - c;
Assembler
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b, reusing r4
LDR r1,[r4] ; get value of b
ADD r3,r0,r1 ; compute a+b
ADR r4,c ; get address for c
LDR r2,[r4] ; get value of c

84
C assignment, cont'd.

SUB r3,r3,r2 ; complete computation of x
ADR r4,x ; get address for x
STR r3,[r4] ; store value of x

85
Example: C assignment

C
y = a*(b+c);
Assembler
ADR r4,b ; get address for b
LDR r0,[r4] ; get value of b
ADR r4,c ; get address for c
LDR r1,[r4] ; get value of c
ADD r2,r0,r1 ; compute partial result
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a

86
C assignment, cont'd.

MUL r2,r2,r0 ; compute final value for y
ADR r4,y ; get address for y
STR r2,[r4] ; store y

87
Example: C assignment

C
z = (a << 2) | (b & 15);
Assembler
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
MOV r0,r0,LSL #2 ; perform shift
ADR r4,b ; get address for b
LDR r1,[r4] ; get value of b
AND r1,r1,#15 ; perform AND
ORR r1,r0,r1 ; perform OR

88
C assignment, cont'd.

ADR r4,z ; get address for z
STR r1,[r4] ; store value for z

89
Example: if statement

C
if (a < b) { x = 5; y = c + d; } else x = c - d;
Assembler
; compute and test condition
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b
LDR r1,[r4] ; get value of b
CMP r0,r1 ; compare a < b
BGE fblock ; if a >= b, branch to false block

90
If statement, cont'd.

; true block
MOV r0,#5 ; generate value for x
ADR r4,x ; get address for x
STR r0,[r4] ; store x
ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value of d
ADD r0,r0,r1 ; compute y
ADR r4,y ; get address for y
STR r0,[r4] ; store y
B after ; branch around false block

91
If statement, cont'd.

; false block
fblock ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value of d
SUB r0,r0,r1 ; compute c-d
ADR r4,x ; get address for x
STR r0,[r4] ; store value of x
after ...

92
Example: conditional instruction implementation

; true block
MOVLT r0,#5 ; generate value for x
ADRLT r4,x ; get address for x
STRLT r0,[r4] ; store x
ADRLT r4,c ; get address for c
LDRLT r0,[r4] ; get value of c
ADRLT r4,d ; get address for d
LDRLT r1,[r4] ; get value of d
ADDLT r0,r0,r1 ; compute y
ADRLT r4,y ; get address for y
STRLT r0,[r4] ; store y

93
Conditional instruction implementation, cont'd.

; false block
ADRGE r4,c ; get address for c
LDRGE r0,[r4] ; get value of c
ADRGE r4,d ; get address for d
LDRGE r1,[r4] ; get value of d
SUBGE r0,r0,r1 ; compute c-d
ADRGE r4,x ; get address for x
STRGE r0,[r4] ; store value of x

94
Example: FIR filter

C
for (i = 0, f = 0; i < N; i++)
    f = f + c[i]*x[i];
Assembler
; loop initiation code
MOV r0,#0 ; use r0 for i
MOV r8,#0 ; use separate index for arrays
ADR r2,N ; get address for N
LDR r1,[r2] ; get value of N
MOV r2,#0 ; use r2 for f

95
FIR filter, cont'd.

ADR r3,c ; load r3 with base of c
ADR r5,x ; load r5 with base of x
; loop body
loop LDR r4,[r3,r8] ; get c[i]
LDR r6,[r5,r8] ; get x[i]
MUL r4,r4,r6 ; compute c[i]*x[i]
ADD r2,r2,r4 ; add into running sum
ADD r8,r8,#4 ; add one word offset to array index
ADD r0,r0,#1 ; add 1 to i
CMP r0,r1 ; exit?
BLT loop ; if i < N, continue
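For comparison with the assembler, the same loop can be wrapped in a self-contained C function. This is a sketch: the function name fir and the int element type are assumptions for illustration.

```c
#include <stddef.h>

/* Computes f = c[0]*x[0] + ... + c[n-1]*x[n-1] for an n-tap filter. */
int fir(const int *c, const int *x, size_t n)
{
    int f = 0;
    for (size_t i = 0; i < n; i++)
        f = f + c[i] * x[i];        /* one multiply-accumulate per tap */
    return f;
}
```

The assembler version keeps two counters where the C version has one: r0 counts i in steps of 1 for the loop test, while r8 advances in steps of 4 because it is a byte offset into arrays of 32-bit words.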

96
Nested subroutine calls

Nesting/recursion requires coding convention (here r13 is the stack pointer and the stack is assumed to grow upward):
f1      LDR r0,[r13]        ; load arg into r0 from stack
        ; call f2()
        STR r14,[r13,#4]!   ; store f1's return adrs
        STR r0,[r13,#4]!    ; store arg to f2 on stack
        BL f2               ; branch and link to f2
        ; return from f1()
        SUB r13,r13,#4      ; pop f2's arg off stack
        LDR r15,[r13],#-4   ; restore register and return

97
Summary

Load/store architecture
Most instructions are RISCy, operate in single
cycle.
Some multi-register operations take longer.
All instructions can be executed conditionally.

98
MPC850
99
Reference Manuals

MPC850 Family User Manual
PowerPC Programming Environment Manual
Course Home Page: http://calab.kaist.ac.kr/maeng/cs310/micro02.htm
Motorola Home Page: http://e-www.motorola.com

100
Overview

Versatile, one-chip, integrated communication
processor
Embedded PowerPC core
Versatile memory controller
Communication processor module (CPM)
Serial communication controllers (SCCs)
One USB
Etc.

101
(No Transcript)
102
Embedded PowerPC core

Single issue, 32-bit version
Branch folding and prediction
2-K byte I-cache, 1K byte D-cache
2-way set-associative
Physical
MMUs with 8-entry TLBs
4K, 16K, 256K, 512K, and 8MB page sizes

103
Other Features

Dynamic data bus sizing 8-, 16-, 32-bit
CPU clock 0-80MHz
System Integration Unit (SIU)
Memory Controller
General Purpose timer
CPM, SCCs, SMCs, etc.

104
PowerPC Architecture
105
PowerPC instruction set

Overview
Operand Conventions
PowerPC Registers and programming model
Addressing Modes
Instruction Set
Cache model
Exception Model
Memory management model

106
PowerPC Architecture

Motorola, IBM, Apple computer
Power Architecture RS/6000 family
64-bit architecture with a 32-bit subset
Three Levels of the architecture
Flexibility degrees of SW compatibility
UISA (User instruction set architecture)
VEA (Virtual environment architecture)
OEA (Operating environment architecture)

107
Features not defined by the PowerPC Architecture

For flexibility
System bus interface signals
Cache design
The number and the nature of execution units
Other internal micro-architecture issues

108
Endianness

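Relationship between bit and byte/word ordering defines endianness

[Figure: a 32-bit word drawn with bit 31 on the left and bit 0 on the right]
little-endian: byte 3, byte 2, byte 1, byte 0 (byte 0, at the lowest address, holds bits 7-0). Used by ARM, Intel.
big-endian: byte 0, byte 1, byte 2, byte 3 (byte 0, at the lowest address, holds bits 31-24). Used by PowerPC, IBM, Motorola.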
109
Programming Model Registers
110
(No Transcript)
111
PowerPC programming model - Register Set

User Model UISA (32-bit architecture)

General-purpose registers: GPR0 to GPR31 (32 bits each)
Floating-point registers: FGPR0 to FGPR31 (64 bits each)
Condition register: CR (32)
FP status and control register: FPSCR (32)
XER register: XER (32)
Link register: LR (64/32)
Count register: CTR (64/32)
112
Condition Registers (CR)

For testing and branching

[Figure: the 32-bit CR, bits 0 to 31, divided into eight 4-bit fields CR0 to CR7; CR1 is labeled FP]
Condition register CRn field: set by compare instructions
CR0, for all integer instructions: Bit 0 Negative (LT), Bit 1 Positive (GT), Bit 2 Zero (EQ), Bit 3 Summary Overflow (SO)
113
XER Register (XER)
114
XER Register (XER), contd
115
Link Register (LR), Count Register (CTR)
bclrx (bc to link register) Branch with link
update
116
Counter Register

Loop count

117
VEA Register Set Time Base
118
OEA Register Set
119
Machine State Register (MSR)
120
(No Transcript)
121
(No Transcript)
122
Addressing Modes

Effective Address Calculation
Register indirect with immediate index mode
Register indirect with index mode
Register indirect mode

123
Register Indirect with Immediate Index Addressing
124
Register Indirect with Index
125
Register Indirect
126
Instruction Formats

4 bytes long and word-aligned
Bits 0-5 always specify the primary opcode
Extended opcode

127
Instruction set

Integer
Floating-point
Load and store
Flow control
Processor control
Memory synchronization
Memory control
External control

128
Summary

UISA, VEA, OEA
Register set
Fixed size instruction - RISC
Load and store architecture
3 addressing modes
Condition Register Update Rc field
8 condition registers
Branch addressing modes
BO, BI fields
Relative, absolute, LR, CTR

129
RISC Xscale Microarchitecture Features

ARM Architecture Version 5TE ISA
Clock speed: 400MHz
Modified Harvard Architecture
Separate instruction cache and data cache (2 caches)
32KB Instruction Cache
32KB Data Cache
Intel Media Processing Technology
Instruction and Data Memory Management Unit
Branch Target Buffer
Debug Capability via JTAG Port
0.35µm 3-layer metal CMOS, 2.6 million transistors
256 PBGA package (17 x 17mm)

130
RISC Xscale System Integration Features

Memory controller
Power management controller
Normal, idle, sleep modes
USB client
Multi-channel DMA controller
LCD controller
AC97 codec
Multimedia card: serial interface to standard memory card, FIFO
FIR communication
Synchronous serial protocol port
I2C

131
RISC Xscale System Integration Features

85 GPIO ports
IRQ, wake-up interrupt
UART
Real-time clock and timer
32-bit, 32.7kHz, +/- 5 sec/month
OS timer with alarm register
Pulse width modulation
Interrupt controller

132
RISC XScale ???

Architecture V5TE? ??

133
Internal Structure
134
RISC - Xscale ??

Palm size device - Example

135
PXA255 Pin
[Figure: pin groups of the Intel XScale PXA250, 256 pins]
Serial Channel 0 (USB): UDC+, UDC-
Serial Channel 1: RXD_1, TXD_1
Serial Channel 2 (IrDA): RXD_2, TXD_2
Serial Channel 3 (UART): RXD_3, TXD_3
Serial Channel 4 (CODEC): TXD_C, RXD_C, SFRM_C, SCLK_C
LCD Control: L_DD(15:0), L_FCLK, L_LCLK, L_PCLK, L_BIAS
GPIO Ports: GP(27:0)
Memory Control: nCAS/DQM(3:0), nRAS/nSDCS(3:0), nOE, nWE, nCS(5:0), RDY, nSDRAS, nSDCAS, SDCKE<1:0>, SDCLK<2:0>
Transceiver Control: RD/nWR
PCMCIA Bus Signals: nPOE, nPWE, nPIOR, nPIOW, nPCE<2:1>, PSKTSEL, nPREG, nPWAIT, nIOIS16
Power Management: BATT_FAULT, VDD_FAULT, PWR_EN
Clocks, Reset and Test: TCK_BYP, TESTCLK, PEXTAL, PXTAL, TEXTAL, TXTAL, nRESET, nRESET_OUT, SMROM_EN, ROM_SEL
Address Bus: A<25:0>
Data Bus: D<31:0>
JTAG: TCK, TDI, TDO, TMS, nTRST
Supply: VDD, VDDX, VSS/VSSX
136
RISC Xscale running modes