Title: More Intel machine language
1 More Intel machine language
- and one more look at other architectures
2 Data manipulation instructions
- Includes add, sub, cmp, and, or, not
- First operand is a register; second may require
a memory fetch
- Takes 8-17 clock cycles, as described on the next
slides
3 Data manipulation instructions
- Fetch instruction (1)
- Update IP (1)
- Decode (1)
- If required, fetch operand from memory
  - if BX mode (0)
  - if xxxx, xxxx BX or xxxx address is even (1)
  - if xxxx address odd (2)
- If required, update IP to point beyond operand
(0-1)
4 Data manipulation instructions
- Compute address of operand
  - if not BX or xxxxBX (0)
  - if BX (1)
  - if xxxxBX (2)
- Get value of operand; send to ALU
  - if constant (0)
  - if register (1)
  - if word-aligned RAM (2)
  - if odd-addressed RAM (3)
5 Data manipulation instructions
- Fetch value of first operand (register); send to
ALU (1)
- Perform operation (1)
- Store result in 1st operand (register) (1)
6 Data movement operation with RAM destination
- Takes 5-11 clock cycles
- As with most instructions, the variation is due
to the number of memory fetches that may be
required during execution
7 MOV memory location, register
- Fetch instruction (1)
- Update IP to point to next byte (1)
- Decode instruction
- If required, fetch operand from memory (0-2)
- If required, update IP to point beyond operand
(0-1; 0 if no operand)
- Compute operand address, if necessary (0-2)
- Get value (of register) to store (1)
- Store fetched value into destination (1-3)
8 Fetch/execute cycle pipelining
- In the examples we've looked at thus far, an
underlying theme has been the use of one or more
clock cycles per instruction, with additional
cycles necessary to control details within
certain steps
- Modern CPUs break the fetch/execute cycle into
smaller steps, some of which can be performed in
parallel, speeding up execution
- This method of overlapping instructions is called
pipelining
9 Pipelining
- We can break the fetch/execute cycle into 6
general steps
  - Fetch instruction
  - Decode
  - Calculate operand address(es)
  - Fetch operands
  - Execute instruction
  - Store result
- Each step can be considered a pipeline stage;
the goal is to balance time taken by each stage, so
that slower parts of the process don't bog down
faster parts
10 Standard von Neumann model vs. pipelining
Source: http://www.cs.cmu.edu/afs/cs/academic/class/15745-s06/web/handouts/11.pdf
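The difference between the two models can be made concrete with a small cycle count. The sketch below is illustrative only: it assumes an ideal pipeline in which each of the six stages takes exactly one clock cycle and no conflicts ever stall it. The function names are made up for this example.

```python
# Ideal-case comparison: sequential fetch/execute vs. a pipeline.
# Assumption: every stage takes one clock cycle and nothing stalls.
STAGES = [
    "Fetch instruction",
    "Decode",
    "Calculate operand address(es)",
    "Fetch operands",
    "Execute instruction",
    "Store result",
]


def sequential_cycles(n_instructions, n_stages=len(STAGES)):
    """Each instruction completes every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Fill the pipeline once, then finish one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def schedule(n_instructions, n_stages=len(STAGES)):
    """Map each cycle number to the (instruction, stage) pairs active in it."""
    table = {}
    for i in range(n_instructions):
        for s in range(n_stages):
            # Instruction i enters stage s in cycle i + s + 1.
            table.setdefault(i + s + 1, []).append((i, s))
    return table


if __name__ == "__main__":
    n = 4
    print("sequential cycles:", sequential_cycles(n))
    print("pipelined cycles: ", pipelined_cycles(n))
    for cycle, work in sorted(schedule(n).items()):
        print(cycle, [f"I{i}: {STAGES[s]}" for i, s in work])
```

With these assumptions, n instructions need 6n cycles sequentially but only 6 + (n - 1) cycles when pipelined; real pipelines fall short of this because of the conflicts discussed next.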
11 Pipelining issues
- Although not all instructions require every stage
of the pipeline (e.g. no operand), all instructions
proceed through all stages
- Pipeline conflicts
  - resource conflicts
  - data dependencies
  - conditional branch statements
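As a rough illustration of a data dependency, the sketch below scans a short program for instructions that read a register written by a nearby earlier instruction. The three-address instruction format, the register names, and the `window` parameter (how many following instructions are assumed to overlap in the pipeline) are all hypothetical; the check also ignores intervening writes to the same register.

```python
from collections import namedtuple

# Hypothetical three-address instruction: dest <- op(srcs)
Instr = namedtuple("Instr", "op dest srcs")


def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each
    read-after-write dependency between instructions at most
    `window` positions apart."""
    hazards = []
    for i, producer in enumerate(program):
        last = min(i + 1 + window, len(program))
        for j in range(i + 1, last):
            if producer.dest in program[j].srcs:
                hazards.append((i, j, producer.dest))
    return hazards


program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("sub", "r4", ("r1", "r5")),  # reads r1 written by instruction 0
    Instr("and", "r6", ("r7", "r8")),
    Instr("or", "r9", ("r4", "r6")),   # reads r4 (instr 1) and r6 (instr 2)
]

if __name__ == "__main__":
    for producer, consumer, reg in raw_hazards(program):
        print(f"instruction {consumer} needs {reg} from instruction {producer}")
```

A pipeline that detects such a dependency must delay the consuming instruction (or forward the result) until the value is available.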
12 Intel pipelining
- 8086-80486 were single-stage pipeline
architectures
- Pentium: two five-stage pipelines
- Pentium II: increased to 12 (mostly for MMX)
- Pentium III: 14
- Pentium IV: 24
13 MIPS: a RISC architecture
- Little-endian
- Word-addressable
- Fixed-length instructions
- Load-store architecture
  - only load and store operations have RAM access
  - all other instructions must have register operands
  - requires large register set
- 5 or 8 stage pipelining
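The load-store idea can be sketched with a toy register machine. This is not MIPS itself: the class, its method names, and the memory size are invented for illustration, and memory is modeled as a simple list of words. The point is only that `load` and `store` are the sole operations that touch memory, while arithmetic works on registers.

```python
class LoadStoreMachine:
    """Toy load-store machine (illustrative only, not real MIPS)."""

    def __init__(self, memory_words=16, n_registers=32):
        self.mem = [0] * memory_words
        self.reg = [0] * n_registers
        self.memory_accesses = 0  # counts RAM reads and writes

    def load(self, rd, addr):
        """Copy a word from memory into register rd."""
        self.memory_accesses += 1
        self.reg[rd] = self.mem[addr]

    def store(self, rs, addr):
        """Copy register rs into a word of memory."""
        self.memory_accesses += 1
        self.mem[addr] = self.reg[rs]

    def add(self, rd, rs, rt):
        """Register-only arithmetic: no memory access at all."""
        self.reg[rd] = self.reg[rs] + self.reg[rt]


# Compute mem[2] = mem[0] + mem[1]
m = LoadStoreMachine()
m.mem[0], m.mem[1] = 7, 5
m.load(1, 0)    # r1 <- mem[0]
m.load(2, 1)    # r2 <- mem[1]
m.add(3, 1, 2)  # r3 <- r1 + r2
m.store(3, 2)   # mem[2] <- r3

if __name__ == "__main__":
    print(m.mem[2], m.memory_accesses)
```

Adding two values from memory takes four instructions here, which is why a load-store design benefits from a large register set: intermediate values can stay in registers instead of being reloaded.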
14 One more architecture: the Java Virtual Machine
- Java compiler is platform-independent; makes no
assumptions about characteristics of underlying
hardware
- JVM required to run Java byte code
- Works as a wrapper around a real machine's
architecture, so the JVM itself is extremely
platform dependent
15 How it works
- Java compiler translates source code into JBC
- JVM acts as interpreter - translates specific
byte codes into machine instructions specific to
the hardware platform it's running on
- Acts like giant switch/case structure: each
bytecode instruction triggers jump to a specific
block of code that implements the instruction in
the architecture's native machine language
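The switch/case idea can be sketched as a tiny stack-based interpreter. This is a minimal illustration, not how a production JVM is built: it handles only four instructions (bipush, iadd, imul, ireturn, with the opcode values given in the JVM specification), ignores 32-bit overflow, and uses Python branches in place of the native code blocks a real JVM would jump to. The function name `run` is made up for the example.

```python
# Opcode values for four JVM instructions.
BIPUSH = 0x10   # push the following signed byte onto the stack
IADD = 0x60     # pop two ints, push their sum
IMUL = 0x68     # pop two ints, push their product
IRETURN = 0xAC  # pop an int and return it


def run(code):
    """Interpret a byte string using a switch-like dispatch on each opcode."""
    stack = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        pc += 1
        if opcode == BIPUSH:
            value = code[pc]  # one-byte operand follows the opcode
            pc += 1
            if value > 127:   # reinterpret as a signed byte
                value -= 256
            stack.append(value)
        elif opcode == IADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif opcode == IMUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif opcode == IRETURN:
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode 0x{opcode:02x}")
    raise ValueError("reached end of code without ireturn")


# (2 + 3) * 4
program = bytes([BIPUSH, 2, BIPUSH, 3, IADD, BIPUSH, 4, IMUL, IRETURN])

if __name__ == "__main__":
    print(run(program))
```

Note that every operand lives on the stack rather than in a named register, so the same byte sequence runs unchanged on any platform that provides the interpreter.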
16 Characteristics of JVM and JBC
- Stack-based language and machine
- Instructions consist of one-byte opcode followed
by 0 or more operands
- 4 registers
17 Characteristics of JVM and JBC
- All memory references based on register offsets -
neither pointers nor absolute addresses are used
- No general-purpose registers
  - means more memory fetches, detrimental to performance
  - tradeoff is high degree of portability