designKilla: The 32-bit pipelined processor - PowerPoint PPT Presentation

1 / 34
About This Presentation
Title:

designKilla: The 32-bit pipelined processor

Description:

designKilla: The 32-bit pipelined processor Brought to you by: Victoria Farthing Dat Huynh Jerry Felker Tony Chen Supervisor: Young Cho – PowerPoint PPT presentation

Number of Views:116
Avg rating:3.0/5.0
Slides: 35
Provided by: dat133
Learn more at: https://www.isi.edu
Category:

less

Transcript and Presenter's Notes

Title: designKilla: The 32-bit pipelined processor


1
designKillaThe 32-bit pipelined processor
  • Brought to you by
  • Victoria Farthing
  • Dat Huynh
  • Jerry Felker
  • Tony Chen
  • Supervisor Young Cho

2
32-Bit RISC Pipelined Processor
  • Reduced Instruction Set allows for faster
    execution of simple, frequently used instructions
    which can be combined to achieve the same result
    as a single, slower CISC instruction
  • Pipelining allows a faster clock cycle and less
    wasted resources

3
Datapath Pipeline Stages
  • 5 Stages
  • Instruction Fetch
  • Instruction Decode
  • Execution
  • Memory Write
  • Write Back

4
(No Transcript)
5
(No Transcript)
6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
(No Transcript)
10
(No Transcript)
11
Unique Data Path Features
  • Next instruction address calculation
  • For basic incrementation, the address is
    calculated by a counter

12
Address Jump Calculations
  • For address jumps, there is a 19-bit load port on
    the counter
  • The loaded address comes from an adder with
    multiplexed inputs
  • Load bit is controlled by a comparator (beq)
    or-ed with the absolute jump control bit

13
Double Clocked Memory Interface
  • Problem One Memory for both Instruction and
    Data
  • Solution Double Clock!
  • Access the memory twice during one clock cycle

14
Double Clocked Memory Interface
Write Enable
Fast Clock
Clock
Fetch Instruction
Fetch Data
Fetch Instruction
Write Data
  • Fetches Instruction in First Cycle
  • Fetches or Writes Data In Second Cycle
  • Data is output by end of Clock Cycle

15
Unique Data Path Features
  • Structural Multiplier
  • 16 X 16 bit
  • Multi-level creation
  • Four 8 X 8 bit multipliers
  • Each containing four 4 X 4 bit multipliers
  • Each comprised of a cascaded network of full and
    half adders, built on logic gates

16
16-Bit Multiplier Unit
  • Based On Hand Multiplication
  • Made Up of Networkof AND Gatesand Adders

17
Why 32 ? 16 bit?
  • 32bit x 32bit 64 bits!
  • Multiple complex changes to existing architecture
    would be required
  • Only one register can be written per clock cycle
  • Could hold value for next cycle or stall the
    pipeline
  • Would require pseudoinstruction as well as new
    hardware and multiple control signals

18
Use pseudo-code instruction mult32
19
Improve the Multiplier
  • Can decrease the latency of a combinational
    multiplier with carry-look ahead adding methods.
  • Small amount of extra hardware needed, worth it
    if multiplier has largest latency.

20
Other Multiplier Topologies
  • Shifting multiplication
  • Shift multiplicand several times based on
    multiplier bits
  • Add intermediate shifted values

21
Other Multiplier Topologies
  • Pipelined multiplication
  • Store intermediate sums
  • Allows for faster clock cycle if traditional
    combinational multiplication presents the
    critical path

22
Other Multiplier Topologies
  • Pipelined multiplication
  • Sequential multiplication
  • Useful to minimize hardware waste if
    multiplication is an infrequent operation
  • Continues to allow for faster clock cycle if
    traditional combinational multiplication presents
    the critical path

23
Instruction Set Architecture
R-Type
24
I-Type
J-Type
25
The Assembler
  • Converts assembly code to binary representation

Add 3,1,2 gt 0000000001000100001100000000000
16-bit wide memory modules Split into high and
low bits for output
000000000100010 gt High 0001100000000000 gt Low
26
Assembler Features
  • Allows for labels to be used in loops
  • Automatically calculates offsets based on label
    position
  • LABEL add 1,2,3
  • jmp LABEL
  • Resolves hazards created by pipelining
  • Automatically determines the appropriate number
    of NO-OPS to insert based on relative position of
    consecutive instructions

27
Design allows for pseudo-instructions to be
used Pseudo Instruction HLT Actual
Instructions H1 JMP H1 NOP NOP
28
Topic 2 Design Compiler
  • Bison - Parser
  • A compiler compiler
  • A grammar generator
  • -------------------------
  • Flex Lexer
  • A Fast lexical analyzer
  • Tool used in pattern matching on text

29
Compiling The C Language
  • Interface Lexer and Parser
  • Lex will feed tokens to Bison (YACC)
  • A grammar tree is generated

30
Source code to run-time
31
A simple program
  • A simple C program
  • void main ( void )
  • int b
  • int d
  • int x
  • int y 3
  • int g
  • x b d
  • g y x
  • Assembly Code Equivalent
  • lwi 4, 0, 3
  • add 6, 1, 2
  • sw 3, 6, 0
  • add 6, 4, 3
  • sw 5, 6, 0
  • Machine Code Instructions
  • Memory High
  • 0 0000110000000100
  • 1 0000000000100010
  • 2 0000000000000000
  • 3 0000000000000000
  • 4 0000000000000000
  • 5 0000000000000000
  • 6 0000000000000000
  • 7 0000100011000011
  • 8 0000000000000000
  • 9 0000000000000000
  • 10 0000000000000000
  • 11 0000000010000011
  • 12 0000000000000000
  • 13 0000000000000000
  • 14 0000000000000000
  • 15 0000000000000000
  • 16 0000000000000000
  • 17 0000100011000101
  • Memory Low
  • 0 0000000000000011
  • 1 0011000000000000
  • 2 0000000000000000
  • 3 0000000000000000
  • 4 0000000000000000
  • 5 0000000000000000
  • 6 0000000000000000
  • 7 0000000000000000
  • 8 0000000000000000
  • 9 0000000000000000
  • 10 0000000000000000
  • 11 0011000000000000
  • 12 0000000000000000
  • 13 0000000000000000
  • 14 0000000000000000
  • 15 0000000000000000
  • 16 0000000000000000
  • 17 0000000000000000

32
Could Use a Little Work
  • Currently the Processor could use a little work
    to improve performance.
  • Decreased memory latency would be largest and
    most direct improvement to processor.
  • Must optimize ALU as well as multiplier unit.
  • All in all, will work but not ready for
    commercial usage.

33
References
Computer Organization and Design The Hardware
Software Interface (2nd Ed) Patterson, David A.
and Hennessy, John L. Morgan Kaufman
Publishers, 1997 Introduction to
Compilers http//cs.wwc.edu/aabyan/221_2/PLBOOK/
Translation.html Aaby, Anthony A., 1998 The
Compiler Design Handbook Srikant, Y. N. and
Shankar, Priti CRC Press, 2002
34
THE END
  • Questions?
Write a Comment
User Comments (0)
About PowerShow.com