Title: A bit about computer architecture
1A bit about computer architecture
- CS 147, Fall Semester 2007
- Robert Correll
2Overview
- RISC microprocessor design
- Diagnostic testing
- Software development
- Microprocessor features
- System-on-Chip (SoC)
3RISC microprocessor design
- 12 members on the team
- Design Manager (1)
- ASIC Design Engineers (9)
- Diagnostics Manager (1)
- Software Engineer (1)
- Culture
- High-tech (Verilog)
- Very quiet
4Embedded 32-bit microprocessor
- Earns Editor's Choice Award
- Microprocessor Report Names IDT's RC32364 Best Embedded Processor for Price/Performance (Volume 12, Number 7, June 1, 1998)
5Embedded processor-based applications
- Low-end routers and switches
- Cellular base stations
- Consumer multimedia game systems
6Device Overview
- MIPS-II RISC architecture with enhancements
- Scalar 5-stage pipeline minimizes branch and load delays
- DSP engine capable of doing 1 multiply-accumulate instruction every 2 clock cycles
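To make the multiply-accumulate rate concrete, here is a minimal sketch of a dot product built from MAC steps, with a cycle estimate at the stated rate of one MAC every 2 clock cycles. The 64-bit accumulator width and the function name are assumptions for illustration, not details taken from the slides.

```python
MASK64 = (1 << 64) - 1  # assumed accumulator width (hypothetical)

def mac_dot_product(xs, ys, cycles_per_mac=2):
    """Accumulate x*y pairs and estimate the cycles spent issuing MACs."""
    if len(xs) != len(ys):
        raise ValueError("operand vectors must have the same length")
    acc = 0
    for x, y in zip(xs, ys):
        # One multiply-accumulate step: acc <- acc + x * y
        acc = (acc + x * y) & MASK64
    return acc, cycles_per_mac * len(xs)
```

Under these assumptions, a 3-element dot product costs 6 cycles of MAC issue, not counting loads or loop overhead.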
7Device Overview (continued)
- Enhanced instruction set architecture
- MIPS-IV compatible conditional move instructions
- MIPS-IV superset PREF (prefetch) instruction
- Fast multiplier with atomic multiply-add, multiply-sub
- Count leading zero/one instructions
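As a rough illustration of what the count leading zero/one instructions compute, the sketch below models them on 32-bit values. The function names are hypothetical; the hardware does this in a single instruction.

```python
def clz32(x: int) -> int:
    """Count leading zero bits of a 32-bit value (32 when x is 0)."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()

def clo32(x: int) -> int:
    """Count leading one bits: the leading zeros of the bitwise complement."""
    return clz32(~x & 0xFFFFFFFF)
```

A typical use is normalizing a value before a fixed-point divide or finding the highest-priority set bit in an interrupt mask.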
8Device Overview (continued)
- Large, efficient on-chip caches
- Separate 8KB Instruction cache and 2KB Data cache
- 2-way set associative
- Write-back and write-through support on a per-page basis
- Optional cache locking, with per-line resolution, to facilitate deterministic response
- Simultaneous instruction and data fetch in each clock cycle achieves over 1 GB/sec bandwidth
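The cache figures above can be turned into a small worked example. The slides give the cache sizes and associativity but not the line size, so the 16-byte line used in the usage note is an assumption, as is the 32-bit width of each fetch in the bandwidth estimate.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Derive set count and address field widths for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2(line size)
        "index_bits": sets.bit_length() - 1,         # log2(number of sets)
    }

def split_address(addr: int, geom: dict) -> tuple:
    """Split an address into (tag, set index, byte offset)."""
    offset = addr & ((1 << geom["offset_bits"]) - 1)
    index = (addr >> geom["offset_bits"]) & (geom["sets"] - 1)
    tag = addr >> (geom["offset_bits"] + geom["index_bits"])
    return tag, index, offset

def min_clock_hz(target_bytes_per_sec, bytes_per_fetch=4, fetches_per_cycle=2):
    """Clock rate needed to reach a bandwidth with two fetches per cycle."""
    return target_bytes_per_sec / (bytes_per_fetch * fetches_per_cycle)
```

With an assumed 16-byte line, `cache_geometry(8 * 1024, 2, 16)` gives 256 sets for the instruction cache and `cache_geometry(2 * 1024, 2, 16)` gives 64 sets for the data cache. If each fetch is 32 bits wide, 1 GB/sec corresponds to a pipeline clock of 125 MHz.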
9Device Overview (continued)
- Flexible MMU with 32-page TLB
- Variable page size
- Enhanced write algorithm support
- Variable number of locked entries
- No performance penalty for address translation
10Device Overview (continued)
- Flexible bus interface allows simple, low-cost designs
- Bus interface runs at a fraction of pipeline rate
- Programmable port-width interface (8-, 16-, 32-bit memory and I/O regions)
- Programmable bus turnaround (BTA) times
- Supports single datum or burst transactions
- Selectable system byte-ordering
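A simplified model may help show how port width and byte ordering interact. The sketch below splits one 32-bit word into the sequence of transfers a narrower port would need, in ascending address order. It is an illustrative model only; the function name and the transfer ordering are assumptions, not the device's documented bus protocol.

```python
def bus_transfers(word: int, port_bits: int, big_endian: bool) -> list:
    """Split a 32-bit word into port-width transfers, lowest address first."""
    if port_bits not in (8, 16, 32):
        raise ValueError("port width must be 8, 16, or 32 bits")
    order = "big" if big_endian else "little"
    # Lay the word out in memory in the selected byte order.
    data = (word & 0xFFFFFFFF).to_bytes(4, order)
    step = port_bits // 8
    # Each transfer carries one port-width slice of that memory image.
    return [int.from_bytes(data[i:i + step], order) for i in range(0, 4, step)]
```

For example, an 8-bit port needs four transfers per word, and the selected byte ordering decides whether the most or least significant byte goes out first.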
11RC32364 Block Diagram
12Diagnostic Testing
- Began with 300 tests and behavior model
- Downloaded 10 to 40 new tests per day
- One test per directory
- Build each test
- Run each test on an RTL model
- Debug and track failures
- Finished with more than 3,000 tests
13Software Development
- Test Release System
- Automated regression process
- Distributed jobs based upon cycle counts
- Provided customized history reports
- Accumulated load per signal utility
- Test vectors
- Many other value-added scripts
- Diagnostic tests
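One plausible way to distribute regression jobs based on cycle counts is a greedy "longest job first" assignment: sort tests by cycle count and always give the next test to the least-loaded host. This is a sketch of that general technique, not the actual Test Release System; the test and host names are hypothetical.

```python
import heapq

def distribute_jobs(cycle_counts: dict, hosts: list) -> dict:
    """Assign tests to hosts, balancing the total simulated cycles per host."""
    assignment = {host: [] for host in hosts}
    # Min-heap of (accumulated cycles, host) so the lightest host pops first.
    heap = [(0, host) for host in hosts]
    heapq.heapify(heap)
    # Place the longest tests first; ties are broken by test name.
    ordered = sorted(cycle_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for test, cycles in ordered:
        load, host = heapq.heappop(heap)
        assignment[host].append(test)
        heapq.heappush(heap, (load + cycles, host))
    return assignment
```

Using cycle counts from the previous run as the cost estimate keeps the longest simulations from piling up on one machine.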
14CPU Instruction Set
15Load Link Store Conditional Opcodes
li 9, 1
sw 9, 0(6)
.word 0xc0850000    # opcode ll 5, 0(4)
bne 5, 0, Fail      # verify sem 0
li 5, 2
li 9, 2
sw 9, 0(6)
.word 0xe0850000    # opcode sc 5, 0(4)
bne 5, 8, Fail      # verify sc indicates success
li 8, 2
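The two .word values on this slide are hand-encoded instructions. A short decoder for the MIPS I-type format (6-bit opcode, 5-bit base, 5-bit rt, 16-bit signed offset) can confirm that they match the annotated ll and sc instructions. The helper names below are hypothetical, and only the two opcodes used here are named.

```python
MNEMONICS = {0x30: "ll", 0x38: "sc"}  # load linked, store conditional

def decode_itype(word: int) -> tuple:
    """Split a 32-bit I-type instruction into (opcode, base, rt, offset)."""
    opcode = (word >> 26) & 0x3F
    base = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    offset = word & 0xFFFF
    if offset & 0x8000:          # sign-extend the 16-bit offset
        offset -= 0x10000
    return opcode, base, rt, offset

def disassemble(word: int) -> str:
    """Render the instruction in the register-number style used on the slide."""
    opcode, base, rt, offset = decode_itype(word)
    name = MNEMONICS.get(opcode, "op%#04x" % opcode)
    return "%s %d, %d(%d)" % (name, rt, offset, base)
```

Decoding `0xc0850000` and `0xe0850000` this way yields `ll 5, 0(4)` and `sc 5, 0(4)`, which agrees with the slide's annotations.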
16CPU Pipeline Architecture
17CPU Pipeline Stages
- 1I - Instruction Fetch, Phase one
- Instruction address translation begins
- 2I - Instruction Fetch, Phase two
- Instruction cache fetch begins
- Instruction address translation continues
18CPU Pipeline Stages (continued)
- 1R - Register Fetch, Phase one
- The instruction cache fetch finishes.
- The instruction cache tag is checked against the
physical page frame number obtained from the
address translation.
19CPU Pipeline Stages (continued)
- 2R - Register Fetch, Phase two
- The instruction decoder decodes the instruction.
- Any required operands are fetched from the register file.
- Make a decision to either issue or slip (for an interlock condition).
- For a branch, the branch address is calculated.
20CPU Pipeline Stages (continued)
- 1A - Execution, Phase one
- Any results from the A or D stages are bypassed.
- The arithmetic logic unit (ALU) starts the integer arithmetic, logical or shift operation.
- The ALU calculates the data virtual address for load and store instructions.
- The ALU determines whether the branch condition is true.
21CPU Pipeline Stages (continued)
- 2A - Execution, Phase two
- The integer arithmetic, logical or shift operation will complete.
- A data cache access will start.
- Store data is shifted to the specified byte position(s).
- The data virtual to physical address translation will start.
22CPU Pipeline Stages (continued)
- 1D - Data Fetch, Phase one
- The data cache access will continue.
- The data address translation completes.
- 2D - Data Fetch, Phase two
- The data cache access will finish and the data is then shifted down and extended.
- The data cache tag is checked against the physical address for any data cache access.
23CPU Pipeline Stages (continued)
- 1W - Write Back, Phase one
- The processor uses this phase internally to resolve all exceptions in preparation for the register file write.
- 2W - Write Back, Phase two
- For register-to-register and load instructions, the result is written back to the register file.
- Branch instructions perform no operation during this stage.
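The five stages described above (I, R, A, D, W) overlap across consecutive instructions. The sketch below draws a simple occupancy chart at stage granularity, assuming one instruction issues per cycle with no stalls or slips; the two phases within each stage are not modeled, and the function names are hypothetical.

```python
STAGES = ["I", "R", "A", "D", "W"]

def total_cycles(n_instructions: int) -> int:
    """Cycles to drain n instructions through an ideal 5-stage pipeline."""
    if n_instructions == 0:
        return 0
    return n_instructions + len(STAGES) - 1

def pipeline_chart(n_instructions: int) -> list:
    """One row per instruction; each column is a cycle, '.' means idle."""
    width = total_cycles(n_instructions)
    chart = []
    for i in range(n_instructions):
        row = ["."] * width
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i enters stage s in cycle i + s
        chart.append("".join(row))
    return chart
```

Printing `"\n".join(pipeline_chart(3))` shows each instruction starting one cycle after the previous one, so in the ideal case n instructions finish in n + 4 cycles.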
24Activities during each ALU pipeline stage...
25...for load, store, and branch instructions.
26Stall Conditions
- Detected after the R pipe-stage.
- The processor will resolve the condition.
- Detect cache miss
- Start moving dirty cache line data to write buffer
- Get first doubleword into cache and restart pipeline
- Load remainder of cache line into cache
27Slip Conditions
- Slipped instructions are retried on subsequent cycles
- Detect cache miss
- Get entire cache line into cache
- Continue pipeline
- Inserted NOP instructions
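To illustrate how a slip shows up as inserted NOPs, the sketch below models one common interlock: an instruction that reads a register loaded by the immediately preceding load. The one-cycle load-use slip, the `lw` mnemonic check, and all names here are assumptions for illustration; the slides do not list the exact interlock conditions.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    op: str
    dest: Optional[int]      # destination register number, if any
    srcs: Tuple[int, ...]    # source register numbers

NOP = Instr("nop", None, ())

def insert_slips(program: list) -> list:
    """Return the issue sequence with a NOP wherever a load-use slip occurs."""
    issued = []
    for instr in program:
        prev = issued[-1] if issued else None
        # Assumed interlock: the previous instruction is a load whose
        # destination register is read by the current instruction.
        if prev is not None and prev.op == "lw" and prev.dest in instr.srcs:
            issued.append(NOP)
        issued.append(instr)
    return issued
```

In this model the dependent instruction is simply delayed by one cycle while earlier instructions continue, which is the key difference from a stall that holds the whole pipeline.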
28Memory Management Unit (MMU)
- Generates translation lookaside buffer (TLB) exceptions such as
- TLB refill
- TLB invalid
- TLB modified
- Offers the following advantages
- Variable page size
- Enhanced Write Algorithm support
- Mapping of a larger portion of the virtual address space
- Variable number of locked entries
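The three TLB exceptions and the variable page size can be sketched with a small lookup model. This is a simplification under stated assumptions: each entry maps a single page (real MIPS-style entries are more involved), and the class and field names are hypothetical.

```python
from dataclasses import dataclass

class TLBRefill(Exception):
    """No entry matches the virtual address."""

class TLBInvalid(Exception):
    """An entry matches but is marked invalid."""

class TLBModified(Exception):
    """A store hit a matching entry that is not marked writable (dirty)."""

@dataclass
class TLBEntry:
    vbase: int          # virtual base address, aligned to page_size
    pbase: int          # physical base address, aligned to page_size
    page_size: int      # bytes, a power of two; may differ per entry
    valid: bool = True
    dirty: bool = True  # dirty bit doubles as write permission

def translate(tlb: list, vaddr: int, is_store: bool = False) -> int:
    """Translate a virtual address or raise the matching TLB exception."""
    for entry in tlb:
        mask = entry.page_size - 1
        if (vaddr & ~mask) == entry.vbase:
            if not entry.valid:
                raise TLBInvalid(hex(vaddr))
            if is_store and not entry.dirty:
                raise TLBModified(hex(vaddr))
            return entry.pbase | (vaddr & mask)
    raise TLBRefill(hex(vaddr))
```

Because the mask is derived per entry, a single 1 MB entry covers the same range as 256 entries of 4 KB, which is how variable page sizes let a small TLB map a larger portion of the address space.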
29 32-bit Virtual Address Translation
30TLB Register Format
31TLB Register Field Descriptions
32MMU Register Descriptions
33Range of wired and random entries
34User Mode Address Space
35Kernel Mode Address Space
36CPU Exception Processing
- Begins when the processor receives and detects exceptions such as
- address translation errors
- arithmetic overflows
- I/O interrupts
- system calls
- Processor suspends normal instruction sequence
and enters Kernel mode
37CPU Exception Processing (continued)
- Processor then disables interrupts,
- Forces execution of a software handler, which is located at a fixed address.
- The handler may save processor context
- program counter contents
- current operating mode (User or Kernel mode)
- interrupt status (enabled or disabled)
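The exception entry sequence described on these two slides can be summarized in a small state model. The handler address below is a placeholder, and the way the pre-exception context is handed to the handler is an assumption; the slides only say that the handler may save the program counter, operating mode, and interrupt status.

```python
from dataclasses import dataclass, replace

HANDLER_ADDRESS = 0x80000180  # placeholder fixed vector (assumption)

@dataclass(frozen=True)
class CpuState:
    pc: int
    kernel_mode: bool
    interrupts_enabled: bool

def take_exception(state: CpuState) -> tuple:
    """Return (state on handler entry, context available for saving)."""
    saved = state  # pc, operating mode, and interrupt status before the event
    entered = replace(
        state,
        pc=HANDLER_ADDRESS,        # force execution of the software handler
        kernel_mode=True,          # enter Kernel mode
        interrupts_enabled=False,  # disable interrupts
    )
    return entered, saved

def return_from_exception(saved: CpuState) -> CpuState:
    """Resume the interrupted instruction stream from the saved context."""
    return saved
```

Keeping the saved context separate from the live state is what lets the handler return to User mode with interrupts re-enabled once the exception has been serviced.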
38Exception Processing Registers...
39Basic CP0 Registers
40Exception Priority
41Cache Organization, Operation, and Coherency
42Primary I-Cache Line Format
43Primary D-Cache Line Format
44Conceptual Primary Cache Lookup Seq.
45Primary Cache Data and Tag Organization
46Primary Cache States
47Clocking, Reset, and Initialization Interfaces
48Timing Illustration of MasterClock-to-PClock
Multiply by 2
49EJTAG (In-circuit Emulator) Interface
50EJTAG Block Diagram
51System-on-Chip (SoC)
52SoC (continued)
53SoC (continued)
54Summary
- RISC microprocessor design
- Diagnostic testing
- Software development
- Microprocessor features
- System-on-Chip (SoC)
55References
- IDT 79RC32364 RISController Advanced Architecture, 32-bit Embedded Microprocessor, Users Reference Manual, 1999, http://www.idt.com/products/files/10750/79RC32364_MA_38374.pdf?CFID1729583CFTOKEN95787432
- IDT Interprise 79RC32351 Integrated Communications Processor Data Sheet, 2004, http://www.idt.com/products/files/10702/RC32351_DS_23066.pdf?CFID1729583CFTOKEN95787432
56References (continued)
- IDT Interprise 79RC32365 Integrated Communications Processors User Reference Manual, 2004, http://www.idt.com/products/files/10712/79RC32365_MA_12022.pdf?CFID1729583CFTOKEN95787432
- IDT Interprise 79RC32435 Integrated Communications Processor Data Sheet, 2006, http://www.idt.com/products/files/571508/32435_ds.pdf?CFID1729583CFTOKEN95787432