CH14 Instruction Level Parallelism and Superscalar Processors

Provided by: DrB63

Learn more at: http://www2.latech.edu

1
CH14 Instruction Level Parallelism and
Superscalar Processors

Decode and issue more than one instruction at a
time
Executing more than one instruction at a time
More than one Execution Unit

TECH Computer Science
2
What is Superscalar?

Common instructions (arithmetic, load/store,
conditional branch) can be initiated and executed
independently
Equally applicable to RISC and CISC
In practice usually RISC

3
Why Superscalar?

Most operations are on scalar quantities (see
RISC notes)
Improve these operations to get an overall
improvement

4
General Superscalar Organization
5
Superpipelined

Many pipeline stages need less than half a clock
cycle
Double internal clock speed gets two tasks per
external clock cycle
Superscalar allows parallel fetch and execute

6
Superscalar v Superpipeline
7
Limitations

Instruction level parallelism
Compiler based optimisation
Hardware techniques
Limited by
True data dependency
Procedural dependency
Resource conflicts
Output dependency
Antidependency

8
True Data Dependency

ADD r1, r2 (r1 ← r1 + r2)
MOVE r3, r1 (r3 ← r1)
Can fetch and decode second instruction in
parallel with first
Can NOT execute second instruction until first is
finished
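
A minimal sketch of how this check could be expressed, assuming each instruction is described by one destination register and a tuple of source registers. The Instr record and the has_true_dependency helper are hypothetical names used only for illustration; they are not part of the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, any number of sources."""
    name: str
    dest: str
    srcs: tuple


def has_true_dependency(first: Instr, second: Instr) -> bool:
    """True if `second` reads the register that `first` writes (read-after-write)."""
    return first.dest in second.srcs


# The two instructions from the slide: ADD writes r1, MOVE reads r1.
add = Instr("ADD r1, r2", dest="r1", srcs=("r1", "r2"))
move = Instr("MOVE r3, r1", dest="r3", srcs=("r1",))

print(has_true_dependency(add, move))   # True: MOVE must wait for ADD's result
print(has_true_dependency(move, add))   # False: ADD does not read r3
```

Fetch and decode never consult this check; only the execute stage has to wait for the producing instruction.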

9
Procedural Dependency

Cannot execute instructions after a branch in
parallel with instructions before a branch
Also, if instruction length is not fixed,
instructions have to be decoded to find out how
many fetches are needed
This prevents simultaneous fetches

10
Resource Conflict

Two or more instructions requiring access to the
same resource at the same time
e.g. two arithmetic instructions
Can duplicate resources
e.g. have two arithmetic units

11
Dependencies
12
Design Issues

Instruction level parallelism
Instructions in a sequence are independent
Execution can be overlapped
Governed by data and procedural dependency
Machine Parallelism
Ability to take advantage of instruction level
parallelism
Governed by number of parallel pipelines

13
Instruction Issue Policy

Order in which instructions are fetched
Order in which instructions are executed
Order in which instructions change registers and
memory

14
In-Order Issue In-Order Completion

Issue instructions in the order they occur
Not very efficient
May fetch >1 instruction
Instructions must stall if necessary

15
In-Order Issue In-Order Completion, e.g.
16
In-Order Issue Out-of-Order Completion, e.g.
17
In-Order Issue Out-of-Order Completion

Output dependency
R3 ← R3 + R5 (I1)
R4 ← R3 + 1 (I2)
R3 ← R5 + 1 (I3)
I2 depends on result of I1 - data dependency
If I3 completes before I1, I1's write lands last and
R3 ends up with the wrong value - output
(write-write) dependency
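
The effect of the output dependency can be illustrated with a small sketch that applies the two writes to R3 (from the first and third instructions above) in either order. The starting register values and the final_r3 helper are assumptions made for the example.

```python
def final_r3(write_order):
    """Apply the R3 writes of I1 and I3 in the given completion order."""
    regs = {"R3": 10, "R5": 2}          # assumed starting values
    # Both results come from the initial values: I3 does not read I1's output.
    results = {
        "I1": regs["R3"] + regs["R5"],  # R3 <- R3 + R5
        "I3": regs["R5"] + 1,           # R3 <- R5 + 1
    }
    for name in write_order:
        regs["R3"] = results[name]
    return regs["R3"]


print(final_r3(["I1", "I3"]))  # 3: program order, I3's value survives
print(final_r3(["I3", "I1"]))  # 12: I1 completes last and leaves a stale value
```

Later instructions that read R3 expect the value written by I3, so the second ordering has to be prevented.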

18
Out-of-Order Issue Out-of-Order Completion

Decouple decode pipeline from execution pipeline
Can continue to fetch and decode until this
pipeline is full
When a functional unit becomes available an
instruction can be executed
Since instructions have been decoded, processor
can look ahead
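
A rough sketch of why looking ahead helps, under simplifying assumptions that are not in the slides: every instruction takes one cycle, any functional unit can run any instruction, the window holds the whole program, and only true data dependencies are tracked (as if registers had already been renamed). The function names are hypothetical.

```python
def producers(program):
    """For each instruction, the indices of earlier instructions it truly depends on."""
    last_writer = {}
    deps = []
    for i, (dest, srcs) in enumerate(program):
        deps.append({last_writer[s] for s in srcs if s in last_writer})
        last_writer[dest] = i
    return deps


def cycles_to_issue(program, units, in_order):
    """Count cycles needed to issue every instruction, at most `units` per cycle."""
    deps = producers(program)
    done = set()                      # instructions whose results are available
    pending = list(range(len(program)))
    cycles = 0
    while pending:
        cycles += 1
        issued = []
        for i in pending:
            if len(issued) == units:
                break                 # all functional units busy this cycle
            if deps[i] <= done:
                issued.append(i)      # operands ready: issue
            elif in_order:
                break                 # a stalled instruction blocks those behind it
        done.update(issued)
        pending = [i for i in pending if i not in issued]
    return cycles


# (destination, sources); immediates are omitted from the source lists.
program = [
    ("R1", ["R2", "R3"]),  # I0
    ("R4", ["R1"]),        # I1 needs I0
    ("R5", ["R6"]),        # I2 independent
    ("R7", ["R5"]),        # I3 needs I2
]

print(cycles_to_issue(program, units=2, in_order=True))   # 3
print(cycles_to_issue(program, units=2, in_order=False))  # 2
```

With in-order issue the stalled I1 holds back the independent I2; with out-of-order issue I2 fills the second unit in the first cycle.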

19
Out-of-Order Issue Out-of-Order Completion e.g.
20
Antidependency

Read-write (write-after-read) dependency
R3 ← R3 + R5 (I1)
R4 ← R3 + 1 (I2)
R3 ← R5 + 1 (I3)
R7 ← R3 + R4 (I4)
I3 cannot complete before I2 starts, as I2 needs
a value in R3 and I3 changes R3
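
The hazard can be sketched by letting I3 write R3 either after or before I2 has read it. The starting register values and the r4_after helper are assumptions chosen for illustration.

```python
def r4_after(i3_writes_before_i2_reads: bool) -> int:
    """Return the value I2 stores in R4 for the two possible orderings."""
    regs = {"R3": 10, "R5": 2}             # assumed starting values
    regs["R3"] = regs["R3"] + regs["R5"]   # I1: R3 <- R3 + R5
    if i3_writes_before_i2_reads:
        regs["R3"] = regs["R5"] + 1        # I3 overwrites R3 too early
        regs["R4"] = regs["R3"] + 1        # I2 reads the overwritten value
    else:
        regs["R4"] = regs["R3"] + 1        # I2 reads I1's result
        regs["R3"] = regs["R5"] + 1        # I3 writes afterwards
    return regs["R4"]


print(r4_after(False))  # 13: the value the program order intends
print(r4_after(True))   # 4: I2 picked up I3's value instead of I1's
```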

21
Register Renaming

Output and antidependencies occur because
register contents may not reflect the correct
ordering from the program
May result in a pipeline stall
Registers allocated dynamically
i.e. registers are not specifically named

22
Register Renaming example

R3b ← R3a + R5a (I1)
R4b ← R3b + 1 (I2)
R3c ← R5a + 1 (I3)
R7b ← R3c + R4b (I4)
Without subscript refers to logical register in
instruction
With subscript is hardware register allocated
Note R3a, R3b, R3c: three hardware registers
allocated for logical register R3
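
One way to produce such an allocation is sketched below, assuming every logical register starts in its "a" copy and each write allocates the next letter. The rename function and the tuple encoding of instructions are hypothetical; real hardware draws from a finite pool of physical registers rather than an alphabet.

```python
from string import ascii_lowercase


def rename(program):
    """Give every write a fresh hardware copy; reads use the current copy."""
    version = {}  # logical register -> index of its current hardware copy

    def current(reg):
        return reg + ascii_lowercase[version.setdefault(reg, 0)]

    renamed = []
    for dest, srcs in program:
        # Sources are looked up before the destination gets its new copy.
        new_srcs = [current(s) if s.startswith("R") else s for s in srcs]
        version[dest] = version.get(dest, 0) + 1
        renamed.append((current(dest), new_srcs))
    return renamed


# (destination, sources) for I1..I4; "1" is an immediate and is left alone.
program = [
    ("R3", ["R3", "R5"]),
    ("R4", ["R3", "1"]),
    ("R3", ["R5", "1"]),
    ("R7", ["R3", "R4"]),
]

for dest, srcs in rename(program):
    print(dest, "<-", " + ".join(srcs))
# R3b <- R3a + R5a
# R4b <- R3b + 1
# R3c <- R5a + 1
# R7b <- R3c + R4b
```

After renaming, I3 writes R3c while I2 still reads R3b, so the antidependency between them disappears and only true dependencies remain.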

23
Machine Parallelism

Duplication of Resources
Out of order issue
Renaming
Not worth duplicating functions without register
renaming
Need instruction window large enough (more than 8)

24
Branch Prediction

80486 fetches both next sequential instruction
after branch and branch target instruction
Gives two cycle delay if branch taken

25
RISC - Delayed Branch

Calculate result of branch before unusable
instructions pre-fetched
Always execute single instruction immediately
following branch
Keeps pipeline full while fetching new
instruction stream
Not as good for superscalar
Multiple instructions need to execute in delay
slot
Instruction dependence problems
Revert to branch prediction

26
Superscalar Execution
27
Superscalar Implementation

Simultaneously fetch multiple instructions
Logic to determine true dependencies involving
register values
Mechanisms to communicate these values
Mechanisms to initiate multiple instructions in
parallel
Resources for parallel execution of multiple
instructions
Mechanisms for committing process state in
correct order

28
Required Reading

Stallings chapter 13
Manufacturers web sites
IMPACT web site
research on predicated execution
