Title: Lecture 18: Core Design, Parallel Algos
1. Lecture 18: Core Design, Parallel Algos
- Today: innovations for ILP, TLP, power, and parallel algorithms
- Sign up for class presentations
2. SMT Pipeline Structure
[Figure: SMT pipeline, with the front-end (I-Cache, Bpred) either private or shared, per-thread Rename/ROB, and a shared execution engine (IQ, Regs, FUs, DCache)]
SMT maximizes utilization of the shared execution engine
3. SMT Fetch Policy
- Fetch policy has a major impact on throughput: depends on cache/bpred miss rates, dependences, etc.
- Commonly used policy: ICOUNT, where every thread has an equal share of resources (a selection sketch follows below)
  - faster threads will fetch more often, improving throughput
  - slow threads with dependences will not hoard resources
  - low probability of fetching wrong-path instructions
  - higher fairness
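A minimal sketch of ICOUNT-style thread selection, assuming each thread simply reports how many of its instructions are currently in the front-end and issue queue (the counters and structure here are illustrative, not from the lecture):

```python
# ICOUNT-style fetch selection (illustrative sketch). Each cycle, fetch from
# the thread with the fewest instructions in the decode/rename/issue stages,
# so stalled threads cannot hoard front-end and issue-queue resources.

def icount_select(inflight_counts):
    """Return the id of the thread that should fetch this cycle."""
    return min(range(len(inflight_counts)), key=lambda t: inflight_counts[t])

# Example: thread 2 is stalled (e.g., on a cache miss) and has piled up
# 30 in-flight instructions, so a less-backed-up thread gets priority.
print(icount_select([12, 5, 30, 8]))   # -> 1
```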
4. Area Effect of Multi-Threading
- The curve is linear for a while
- Multi-threading adds a 5-8% area overhead per thread (primary caches are included in the baseline)
From Davis et al., PACT 2005
5. Single Core IPC
[Figure: IPC range for different L1 sizes; the 4 bars correspond to 4 different L2 sizes]
6. Maximal Aggregate IPCs
7. Power/Energy Basics
- Energy = Power x time
- Power = Dynamic power + Leakage power
- Dynamic Power ∝ a C V² f (see the sketch below)
  - a: switching activity factor
  - C: capacitances being charged
  - V: voltage swing
  - f: processor frequency
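A minimal sketch of these formulas with made-up constants (the values are illustrative, not from the lecture):

```python
# Dynamic power ~ a * C * V^2 * f, and energy = power * time.
def dynamic_power(a, C, V, f):
    """a: activity factor, C: switched capacitance (F), V: voltage (V), f: frequency (Hz)."""
    return a * C * V ** 2 * f

def energy(power, time):
    return power * time

P = dynamic_power(a=0.2, C=1e-9, V=1.0, f=2e9)   # 0.4 W with these toy numbers
print(P, energy(P, time=10.0))                   # 0.4 W, 4 J over 10 s
```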
8. Guidelines
- Dynamic frequency scaling (DFS) can impact power, but has little impact on energy (illustrated in the sketch below)
- Optimizing a single structure for power/energy is good for overall energy only if execution time is not increased
- A good metric for comparison: ED (because DVFS is an alternative way to play with the E-D trade-off)
- Clock gating is commonly used to reduce dynamic energy; DFS is very cheap (a few cycles), while DVFS and power gating are more expensive (micro-seconds or tens of cycles, fewer margins, higher error rates)
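To see why DFS saves power but not (dynamic) energy, while DVFS saves both, here is a toy comparison; it ignores leakage, assumes a perfectly frequency-bound workload, and uses made-up constants:

```python
# DFS vs. DVFS on a fixed amount of work (toy model, dynamic power only).
def run(a, C, V, f, work_cycles):
    power = a * C * V ** 2 * f           # dynamic power
    time = work_cycles / f               # execution time for the given work
    return power, time, power * time     # energy = power * time

base = run(0.2, 1e-9, 1.0, 2e9, 2e10)
dfs  = run(0.2, 1e-9, 1.0, 1e9, 2e10)    # halve f only: power halves, time doubles, energy unchanged
dvfs = run(0.2, 1e-9, 0.7, 1e9, 2e10)    # halve f and lower V: energy drops by ~V^2

for name, (p, t, e) in [("base", base), ("DFS", dfs), ("DVFS", dvfs)]:
    print(f"{name:5s} power={p:.2f} W  time={t:.1f} s  energy={e:.2f} J")
```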
9. Criticality Metrics
- Criticality has many applications: performance and power; usually more useful for power optimizations
- QOLD: instructions that are the oldest in the issue queue are considered critical (a sketch follows below)
  - can be extended to oldest-N
  - does not need a predictor
  - young instrs are possibly on mispredicted paths
  - young instruction latencies can be tolerated
  - older instrs are possibly holding up the window
  - older instructions have more dependents in the pipeline than younger instrs
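A minimal sketch of the QOLD / oldest-N heuristic over a hypothetical issue-queue snapshot (the representation is illustrative, not from the lecture):

```python
# Tag the N oldest instructions waiting in the issue queue as critical.
# No predictor is needed: age in the queue is the only input.
def mark_critical(issue_queue, n=1):
    """issue_queue: list of (age_in_cycles, instruction) entries."""
    oldest_first = sorted(issue_queue, key=lambda entry: entry[0], reverse=True)
    return {instr for _, instr in oldest_first[:n]}

iq = [(14, "load r1"), (3, "add r2"), (9, "mul r3"), (1, "br r4")]
print(mark_critical(iq, n=2))   # the two oldest entries are tagged critical
```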
10. Other Criticality Metrics
- QOLDDEP: producing instructions for the oldest in the issue queue
- ALOLD: oldest instr in the ROB
- FREED-N: instr completion frees up at least N dependent instrs
- Wake-Up: instr completion triggers a chain of wake-up operations
- Instruction types: cache misses, branch mispreds, and instructions that feed them
11. Parallel Algorithms: Processor Model
- High communication latencies → pursue coarse-grain parallelism (the focus of the course so far)
- Next, focus on fine-grain parallelism
- VLSI improvements → enough transistors to accommodate numerous processing units on a chip and (relatively) low communication latencies
- Consider a special-purpose processor with thousands of processing units, each with small-bit ALUs and limited register storage
12. Sorting on a Linear Array
- Each processor has bidirectional links to its neighbors
- All processors share a single clock (asynchronous designs will require minor modifications)
- At each clock, processors receive inputs from neighbors, perform computations, generate output for neighbors, and update local storage
[Figure: linear array of processors, with input entering at one end and output leaving at the other]
13. Control at Each Processor
- Each processor stores the minimum number it has seen
- Initial value in storage and on network is ∞, which is bigger than any input and also means no signal
- On receiving number Y from left neighbor, the processor keeps the smaller of Y and current storage Z, and passes the larger to the right neighbor (a simulation sketch follows below)
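A minimal software simulation of this control (illustrative code, not from the lecture), using math.inf for the ∞ / no-signal value:

```python
import math

INF = math.inf   # bigger than any input; also means "no signal"

def linear_array_sort(values):
    n = len(values)
    store = [INF] * n          # per-processor storage, all start at infinity
    wires = [INF] * n          # wires[p]: value arriving at processor p this clock
    stream = list(values)      # numbers waiting to enter the leftmost processor
    for _ in range(2 * n):     # 2N - 1 clocks suffice; run 2N for simplicity
        wires[0] = stream.pop(0) if stream else INF
        new_wires = [INF] * n
        for p in range(n):
            y = wires[p]
            if y < store[p]:
                store[p], y = y, store[p]   # keep the smaller, forward the larger
            if p + 1 < n:
                new_wires[p + 1] = y
        wires = new_wires
    return store               # processor p ends up holding the p-th smallest value

print(linear_array_sort([5, 2, 8, 1]))      # -> [1, 2, 5, 8]
```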
14. Sorting Example
15. Result Output
- The output process begins when a processor receives a non-∞, followed by an ∞
- Each processor forwards its storage to its left neighbor, and then forwards subsequent data it receives from its right neighbor
- How many steps does it take to sort N numbers?
- What is the speedup and efficiency? (a rough accounting follows below)
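A rough accounting under the word model, assuming one comparison per clock (a sketch, not the lecture's official answer):

```latex
\begin{align*}
T_{\text{parallel}} &\approx \underbrace{2N-1}_{\text{input + sort}} \;+\; \underbrace{N}_{\text{output}} \;=\; O(N) \\
T_{\text{serial}}   &= O(N \log N) \quad \text{(best comparison sort on one processor)} \\
\text{Speedup}      &= \frac{O(N \log N)}{O(N)} = O(\log N), \qquad
\text{Efficiency}    = \frac{\text{Speedup}}{N} = O\!\left(\tfrac{\log N}{N}\right)
\end{align*}
```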
16. Output Example
17. Bit Model
- The bit model affords a more precise measure of complexity: we will now assume that each processor can only operate on a bit at a time
- To compare N k-bit words, you may now need an N x k 2-d array of bit processors
18. Comparison Strategies
- Strategy 1: bits travel horizontally, keep/swap signals travel vertically; after at most 2k steps, each processor knows which number must be moved to the right; 2kN steps in the worst case
- Strategy 2: use a tree to communicate information on which number is greater; after 2 log k steps, each processor knows which number must be moved to the right; 2N log k steps
- Can we do better? (a quick numerical comparison follows below)
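As a quick sanity check with hypothetical sizes (N = 1024 numbers, k = 32 bits; these values are not from the slides):

```latex
\begin{align*}
\text{Strategy 1:}\quad 2kN       &= 2 \cdot 32 \cdot 1024 = 65{,}536 \text{ steps} \\
\text{Strategy 2:}\quad 2N\log k  &= 2 \cdot 1024 \cdot 5  = 10{,}240 \text{ steps}
\end{align*}
```

Both still pay the log k (or k) factor on every one of the N comparison phases; pipelining the bit-level work (next slides) is what removes that factor.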
19. Strategy 2: Column of Trees
20. Pipelined Comparison
[Figure: pipelined bit-serial comparison of the input numbers 3, 4, 2]
21. Complexity
- How long does it take to sort N k-bit numbers?
- Roughly (2N - 1) + (k - 1) + N (for output) steps, i.e., O(N + k)
- (With a 2d array of processors) Can we do even better?
- How do we prove optimality?
22. Lower Bounds
- Input/Output bandwidth: Nk bits are being input/output with k pins; requires Ω(N) time
- Diameter: the comparison at processor (1,1) influences the value of the bit stored at processor (N,k); for example, N-1 of the numbers are 011..1 and the last number is either 000..0 or 100..0; it takes at least N + k - 2 steps for information to travel across the diameter
- Bisection width: if processors in one half require the results computed by the other half, the bisection bandwidth imposes a minimum completion time (the three bounds are restated compactly below)
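The same three bounds, restated compactly for an N x k array of bit processors (a sketch using the slide's quantities):

```latex
\begin{align*}
\text{I/O bandwidth:} \quad & \frac{Nk \text{ bits}}{k \text{ pins}} \;\Rightarrow\; \Omega(N) \text{ steps} \\
\text{Diameter:}      \quad & (N-1) + (k-1) = N + k - 2 \text{ steps for a bit to cross the array} \\
\text{Bisection:}     \quad & \frac{\text{bits that must cross the cut}}{\text{bisection width}} \text{ steps, at minimum}
\end{align*}
```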
23. Counter Example
- N 1-bit numbers that need to be sorted with a binary tree
- Since bisection bandwidth is 2 and each number may be in the wrong half, will any algorithm take at least N/2 steps?
24. Counting Algorithm
- It takes O(logN) time for each intermediate node to add the contents in its subtree and forward the result to the parent, one bit at a time
- After the root has computed the number of 1s, this number is communicated to the leaves; the leaves accordingly set their output to 0 or 1 (a sketch follows below)
- Each half only needs to know the number of 1s in the other half (logN-1 bits); therefore, the algorithm takes Ω(logN) time
- Careful when estimating lower bounds!
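A minimal sketch of the counting algorithm (illustrative code, not from the lecture): leaves send their bits up a log-depth reduction tree, the root's count is broadcast back down, and each leaf sets its output to 0 or 1.

```python
def tree_count_sort(bits):
    """Sort N 1-bit numbers by counting the 1s (the tree reduction is
    modeled here by sum(); in hardware it takes O(log N) steps)."""
    n = len(bits)
    ones = sum(bits)                       # upward phase: count the 1s
    return [0] * (n - ones) + [1] * ones   # downward phase: leaves set outputs

print(tree_count_sort([1, 0, 1, 1, 0, 0, 1, 0]))   # -> [0, 0, 0, 0, 1, 1, 1, 1]
```

Only the count itself (about log N bits) ever needs to cross the bisection, which is why the N/2 estimate on the previous slide was too pessimistic.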