Extending Amdahl's Law in the Multicore Era presentation

1
Extending Amdahl's Law in the Multicore Era

Erlin Yao, Yungang Bao, Guangming Tan and Mingyu Chen
Institute of Computing Technology, Chinese Academy of Sciences
yaoerlin_at_gmail.com, {baoyg, tgm, cmy}_at_ncic.ac.cn

2
A Brief Intro of ICT, CAS
ICT has developed the Loongson CPU.
ICT has built the fastest HPC system in China, the Dawning 5000, which delivers 233.5 TFlops and ranks 10th in the Top500.
3
Outline

I. Background and Related Works
II. Model of Multicore Scalability
III. Symmetrical Multicore Chips
IV. Asymmetrical Multicore Chips
V. Dynamic Multicore Chips
VI. Conclusion and Future Work

4
We are in the Multi-Core Era

The mainstream market is already dominated by multicore processors:
Intel: 2-core Core Duo, 4-core i7
AMD: 2-core Athlon, 4-core Opteron
IBM: 2-core POWER6, 9-core Cell
Sun: 8-core T1/T2

5
Many-Core is coming

Some processor vendors have announced or released many-core processors:
Tilera: 64-core
Intel: 80-core
GPGPU: 100x-core

6
Revisiting Amdahl's Law in the Multi/Many-Core Era

Assume that a fraction f of a program's execution time is infinitely parallelizable with no scheduling overhead, while the remaining fraction, 1 - f, is totally sequential. Using p processors to accelerate the parallel fraction gives
Speedup(f, p) = 1 / ((1 - f) + f/p)
This is a fixed-size speedup: the amount of work to be executed is independent of the number of processors.
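As a minimal sketch of the fixed-size model on this slide, the function below evaluates the standard form of Amdahl's law, Speedup = 1 / ((1 - f) + f/p). The name `amdahl_speedup` and the sample values of f and p are illustrative choices, not taken from the slides.

```python
def amdahl_speedup(f: float, p: float) -> float:
    """Fixed-size speedup for parallel fraction f on p processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    # Sequential part takes (1 - f); parallel part shrinks to f / p.
    return 1.0 / ((1.0 - f) + f / p)


if __name__ == "__main__":
    # Illustrative values: 90% parallel work on a growing processor count.
    for p in (1, 4, 16, 64):
        print(f"p = {p:3d}: speedup = {amdahl_speedup(0.9, p):.3f}")
```

The work is held constant while p grows, which is what makes this a fixed-size speedup.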

7
Implications of Amdahl's Law

Despite its simplicity, Amdahl's law applies broadly and gives important insights, such as:
(i) Attack the common case: when f is small, optimizations will have little effect.
(ii) The aspects you ignore also limit speedup: even if p approaches infinity, speedup is bounded by 1/(1 - f).
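The bound in (ii) follows directly from the standard form of Amdahl's law. The numeric value f = 0.9 below is only an illustrative choice.

```latex
\lim_{p \to \infty} \frac{1}{(1-f) + \dfrac{f}{p}} \;=\; \frac{1}{1-f},
\qquad\text{so for } f = 0.9 \text{ the speedup can never exceed } \frac{1}{1-0.9} = 10 .
```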

8
Mark Hill et al.'s Insights

Hill and Marty apply Amdahl's law to multicore hardware by constructing a cost model for the number and performance of cores on one chip.
→ Obtaining optimal multicore performance requires further research both in extracting more parallelism and in making sequential cores faster.
Woo and Lee have extended Hill's work by taking power and energy into account.

9
Motivation of Our Work

The revised Amdahl's law model provides a better understanding of multicore scalability.
However, there is little work on its theoretical analysis.
This work presents a theoretical analysis of multicore scalability and attempts to find the optimal results under different conditions.

10
Model of Multicore Scalability

We adopt the cost model for multicore hardware proposed by Hill and Marty, which makes two assumptions:
First, a multicore chip of given size and technology generation can contain at most n base core equivalents (BCEs).
Second, an individual core with more resources (r BCEs) can achieve better sequential performance, perf(r), with
1 < perf(r) < r for r > 1.
The architecture of multicore chips can be classified into three types:
Symmetric
Asymmetric
Dynamic

11
Model-Symmetrical

A symmetric multicore chip requires that all its cores have the same cost.
Example: given 16 BCEs,
r = 8 → 2 cores, 8 BCEs/core
r = 4 → 4 cores, 4 BCEs/core
Given a resource budget of n BCEs, we have n/r cores, each with r BCEs. The performance of each core is perf(r). Then we get
Speedup_symmetric(f, n, r) = 1 / ((1 - f)/perf(r) + f·r/(perf(r)·n))
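A minimal sketch of the symmetric speedup under Hill and Marty's cost model follows. The default perf(r) = sqrt(r) is an assumption of this sketch (it is the example function used by Hill and Marty, not something stated on this slide), and the function name is hypothetical.

```python
import math
from typing import Callable


def speedup_symmetric(f: float, n: int, r: int,
                      perf: Callable[[float], float] = math.sqrt) -> float:
    """Speedup of a symmetric chip: n/r cores, each built from r BCEs.

    Sequential phase: one core runs at perf(r).
    Parallel phase: all n/r cores run, for a total of perf(r) * n / r.
    perf defaults to sqrt(r), which is an assumption of this sketch.
    """
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    sequential_time = (1.0 - f) / perf(r)
    parallel_time = f * r / (perf(r) * n)
    return 1.0 / (sequential_time + parallel_time)


if __name__ == "__main__":
    # The 16-BCE example: 2 cores of 8 BCEs versus 4 cores of 4 BCEs.
    for r in (8, 4):
        s = speedup_symmetric(0.5, 16, r)
        print(f"r = {r}: {16 // r} cores, speedup = {s:.3f}")
```

With r = 1 the expression reduces to classic Amdahl's law on n processors, which is a quick sanity check on the formula.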

12
Model-Asymmetrical

In an asymmetric multicore chip, several cores are more powerful than the others.
Example: given 16 BCEs,
1 four-BCE core and 12 base cores, or
1 six-BCE core and 10 base cores.
Given a resource budget of n BCEs, we have 1 + n - r cores: one larger core (with r BCEs) and n - r base cores (with 1 BCE each). Then we get
Speedup_asymmetric(f, n, r) = 1 / ((1 - f)/perf(r) + f/(perf(r) + n - r))
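The asymmetric case can be sketched the same way, again following Hill and Marty's cost model. The default perf(r) = sqrt(r) and the function name are assumptions made for illustration.

```python
import math
from typing import Callable


def speedup_asymmetric(f: float, n: int, r: int,
                       perf: Callable[[float], float] = math.sqrt) -> float:
    """Speedup of an asymmetric chip: one r-BCE core plus n - r base cores.

    Sequential phase: only the large core runs, at perf(r).
    Parallel phase: the large core and all n - r base cores run together,
    for a total performance of perf(r) + (n - r).
    perf defaults to sqrt(r), which is an assumption of this sketch.
    """
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    sequential_time = (1.0 - f) / perf(r)
    parallel_time = f / (perf(r) + n - r)
    return 1.0 / (sequential_time + parallel_time)


if __name__ == "__main__":
    # The 16-BCE example: one 4-BCE core with 12 base cores.
    print(f"{speedup_asymmetric(0.5, 16, 4):.3f}")
```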

13
Model-Dynamic

A dynamic multicore chip can dynamically combine up to r cores into one core in order to boost sequential performance.
In sequential mode, it executes with performance perf(r) when the dynamic techniques use r BCEs.
In parallel mode, it obtains performance n by using all base cores in parallel.
Then we get
Speedup_dynamic(f, n, r) = 1 / ((1 - f)/perf(r) + f/n)
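A minimal sketch of the dynamic speedup, following the two modes described on this slide. The default perf(r) = sqrt(r) and the function name are assumptions of this sketch.

```python
import math
from typing import Callable


def speedup_dynamic(f: float, n: int, r: int,
                    perf: Callable[[float], float] = math.sqrt) -> float:
    """Speedup of a dynamic chip that fuses r BCEs for sequential code.

    Sequential mode: the fused core runs at perf(r).
    Parallel mode: all n base cores run, for a total performance of n.
    perf defaults to sqrt(r), which is an assumption of this sketch.
    """
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    sequential_time = (1.0 - f) / perf(r)
    parallel_time = f / n
    return 1.0 / (sequential_time + parallel_time)


if __name__ == "__main__":
    # 16 BCEs, all fused in sequential mode.
    print(f"{speedup_dynamic(0.5, 16, 16):.3f}")
```

Unlike the symmetric and asymmetric cases, the parallel term does not depend on r, so a larger r costs nothing in parallel mode.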

14
Symmetrical Multicore Chips

For fixed n and r, speedup is an increasing function of f.
For fixed f and r, speedup is an increasing function of n.
→ Increasing both the parallel fraction (f) and the number of BCEs (n) can improve the speedup of a symmetric multicore chip.
For fixed f and n, we have the following theorem.

15
Symmetrical Multicore Chips

Assume perf(r) = r^c. For any fixed f and c:
if f < c, the maximum speedup is achieved at r = n;
if f > c and n is not big, the maximum speedup is achieved at r = 1;
if f > c and n is big enough, the maximum speedup is achieved between the extremes: to obtain optimal multicore performance, part of the BCE resources should be dedicated to each core so that individual cores offer reasonable performance.
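The three cases can be checked numerically. The sketch below assumes perf(r) = r^c with 0 < c < 1 and treats r as a continuous variable. The closed-form stationary point r* = c(1 - f)n / ((1 - c)f) is derived here by setting the derivative of the Hill and Marty symmetric speedup to zero; it is not quoted from the slides, and all function names are hypothetical.

```python
def speedup_symmetric(f: float, n: float, r: float, c: float) -> float:
    """Symmetric speedup with perf(r) = r**c (assumed form)."""
    perf = r ** c
    return 1.0 / ((1.0 - f) / perf + f * r / (perf * n))


def optimal_r_symmetric(f: float, n: float, c: float) -> float:
    """Maximizer of the symmetric speedup over r in [1, n].

    The speedup is unimodal in r, so the stationary point clipped to
    [1, n] is the maximizer. Requires 0 < f < 1 and 0 < c < 1.
    """
    if f <= c:
        return float(n)  # speedup keeps increasing up to r = n
    r_star = c * (1.0 - f) * n / ((1.0 - c) * f)
    return min(float(n), max(1.0, r_star))


def brute_force_r(f: float, n: float, c: float, steps: int = 20_000) -> float:
    """Grid search over [1, n], used only to cross-check the closed form."""
    candidates = (1.0 + (n - 1.0) * i / steps for i in range(steps + 1))
    return max(candidates, key=lambda r: speedup_symmetric(f, n, r, c))


if __name__ == "__main__":
    for f, n, c in ((0.3, 256, 0.5), (0.9, 4, 0.5), (0.9, 256, 0.5)):
        print(f, n, c, round(optimal_r_symmetric(f, n, c), 2))
```

In a real chip n/r must be an integer, so the continuous optimum is only a guide to which integer values of r are worth evaluating.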

16
Symmetrical Multicore Chips

If n is big enough, will the maximum speedup always be achieved between the extremes for any perf(x) < x?
Counterexamples:
(i) perf(x) = kx, for any 0 < k < 1
(ii) perf(x) = x^c, for any f < c < 1.
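For the linear case (i), a short calculation shows why the maximum sits at the extreme r = n. This is a sketch using the Hill and Marty symmetric speedup with perf(r) = kr substituted in; the derivation is not taken from the slides.

```latex
S_{\text{sym}}(f,n,r)
  = \frac{1}{\dfrac{1-f}{kr} + \dfrac{f\,r}{kr\,n}}
  = \frac{k\,r}{(1-f) + \dfrac{f\,r}{n}},
\qquad
\frac{\partial S_{\text{sym}}}{\partial r}
  = \frac{k\,(1-f)}{\left((1-f) + \dfrac{f\,r}{n}\right)^{2}} > 0
  \quad\text{for } 0 \le f < 1 .
```

Since the speedup is strictly increasing in r for every n, the maximum is always at r = n, no matter how large n is.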

17
Asymmetrical Multicore Chips

Similarly, increasing both the parallel fraction (f) and the number of BCEs (n) can improve the speedup of an asymmetric multicore chip.
For fixed f and n, we have the following theorem.

18
Asymmetrical Multicore Chips

If f > c and n is not big, the maximum speedup is achieved at r = 1.
If f < c and n is not big, the maximum speedup is achieved at r = n.
For any fixed f and c, if n is big enough, the maximum speedup is achieved at some r0 with 1 < r0 < n.
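The three regimes can be observed with a small exhaustive search over integer r. The sketch assumes perf(r) = r^c and the Hill and Marty asymmetric speedup; the function names and the sample values of f, n and c are illustrative.

```python
def speedup_asymmetric(f: float, n: int, r: int, c: float) -> float:
    """Asymmetric speedup with perf(r) = r**c (assumed form)."""
    perf = r ** c
    return 1.0 / ((1.0 - f) / perf + f / (perf + n - r))


def best_r_asymmetric(f: float, n: int, c: float) -> int:
    """Integer r in [1, n] that maximizes the asymmetric speedup."""
    return max(range(1, n + 1),
               key=lambda r: speedup_asymmetric(f, n, r, c))


if __name__ == "__main__":
    # f > c with small n, f < c with small n, and large n.
    for f, n, c in ((0.9, 2, 0.5), (0.3, 2, 0.5), (0.9, 256, 0.5)):
        r0 = best_r_asymmetric(f, n, c)
        print(f"f = {f}, n = {n}, c = {c}: best r = {r0}")
```

An exhaustive search is used because the optimum has no closed form; for chip-scale values of n it costs only n evaluations.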

19
Asymmetrical Multicore Chips

Note that the optimal r0 in Theorem 2 cannot be solved analytically.
r0 is approximately linear in n, and if n is big enough, r0 can approach n arbitrarily closely.
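A rough numerical look at how r0 scales with n is sketched below. It assumes perf(r) = r^c with the illustrative values f = 0.9 and c = 0.5, restricts r to integers, and uses hypothetical function names; it illustrates the trend and does not prove it.

```python
def speedup_asymmetric(f: float, n: int, r: int, c: float) -> float:
    """Asymmetric speedup with perf(r) = r**c (assumed form)."""
    perf = r ** c
    return 1.0 / ((1.0 - f) / perf + f / (perf + n - r))


def optimal_fraction(f: float, n: int, c: float) -> float:
    """Return r0 / n, where r0 is the best integer r in [1, n]."""
    r0 = max(range(1, n + 1),
             key=lambda r: speedup_asymmetric(f, n, r, c))
    return r0 / n


if __name__ == "__main__":
    for n in (64, 256, 1024, 4096):
        print(f"n = {n:5d}: r0 / n = {optimal_fraction(0.9, n, 0.5):.3f}")
```

For these sample values the share of the chip given to the large core grows with n while staying below 1.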

20
Asymmetrical Multicore Chips

If n is big enough, will the maximum speedup always be achieved between the extremes for any perf(x) < x?
Counterexample:
perf(x) = kx, for any f < k < 1.
For saturated functions,
like p(x) = x^c or p(x) = kx^c + mx^c', where c, c' < 1.

21
Asymmetrical Multicore Chips

Based on the simplistic assumptions of Amdahl's law, it makes most sense to devote extra resources to increasing only one core's capability.
In fact, we have the following theorem.
Although the asymmetric architecture of one large core and many base cores was originally assumed for simplicity, it is indeed the optimal architecture in the sense of speedup.

22
Dynamic Multicore Chips

We should increase both f and n to enhance the speedup of a dynamic multicore chip.
For fixed f and n, if perf(r) is an increasing function, speedup is also an increasing function of r
→ the maximum speedup is always achieved at r = n.
→ Dynamic multicore chips can offer potential speedups that are greater than, and never worse than, those of symmetric or asymmetric multicore chips with identical perf(r) functions.
So researchers should continue to investigate methods that approximate a dynamic multicore chip.
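The ordering between the three chip types can be checked numerically. The sketch below uses the Hill and Marty speedup expressions with perf(r) = sqrt(r); the perf function, the function names, and the sample values f = 0.9 and n = 64 are all assumptions made for illustration.

```python
import math


def speedup_symmetric(f: float, n: int, r: int) -> float:
    perf = math.sqrt(r)  # assumed perf(r)
    return 1.0 / ((1.0 - f) / perf + f * r / (perf * n))


def speedup_asymmetric(f: float, n: int, r: int) -> float:
    perf = math.sqrt(r)  # assumed perf(r)
    return 1.0 / ((1.0 - f) / perf + f / (perf + n - r))


def speedup_dynamic(f: float, n: int, r: int) -> float:
    perf = math.sqrt(r)  # assumed perf(r)
    return 1.0 / ((1.0 - f) / perf + f / n)


def best_speedup(speedup, f: float, n: int) -> float:
    """Maximum of the given speedup function over integer r in [1, n]."""
    return max(speedup(f, n, r) for r in range(1, n + 1))


if __name__ == "__main__":
    f, n = 0.9, 64
    for name, fn in (("symmetric", speedup_symmetric),
                     ("asymmetric", speedup_asymmetric),
                     ("dynamic", speedup_dynamic)):
        print(f"{name:10s}: best speedup = {best_speedup(fn, f, n):.2f}")
```

The ordering holds for every r, not only at the optimum, because perf(r) <= r makes the parallel term smallest for the dynamic chip and largest for the symmetric one.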

23
Potentials of Maximum Speedups

Recall that in Amdahl's law, even if the number of processors approaches infinity, the speedup is bounded by 1/(1 - f).
In the multicore model, increasing n improves the speedup continuously. Under the assumption perf(r) = r^c, when n approaches infinity the speedup can also approach infinity, even if the performance index c is small.
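One way to see the unbounded growth is to evaluate each chip type at r = n. This is a sketch using the Hill and Marty speedup expressions with perf(r) = r^c and 0 < c < 1; the derivation is not taken from the slides.

```latex
S_{\text{sym}}(f,n,n) = S_{\text{asym}}(f,n,n)
  = \frac{1}{\dfrac{1-f}{n^{c}} + \dfrac{f}{n^{c}}} = n^{c},
\qquad
S_{\text{dyn}}(f,n,n)
  = \frac{1}{\dfrac{1-f}{n^{c}} + \dfrac{f}{n}} \;\ge\; n^{c}.
```

In each case the maximum over r is at least n^c, which grows without bound as n approaches infinity for any c > 0, in contrast to the fixed ceiling 1/(1 - f) of classic Amdahl's law.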

24
Implications and Results

We present a theoretical analysis of multicore scalability and give quantitative conditions for obtaining optimal multicore performance.
The theorems and corollary provide computer architects with a better understanding of multicore design types, enabling them to make more informed tradeoffs.
However, our precise quantitative results are suspect because the real world is much more complex. The model considered here ignores many important structures.
This theoretical analysis attempts to provide insights for future work.

25
Future Work

In real applications, the parallel fraction f cannot be infinitely parallelizable. The degree of parallelism can be bounded by some constant d, or even be random in some circumstances.
Introduce practical structures, such as the memory hierarchy, shared caches, etc.
More cores might allow more parallelism for larger problem sizes. Fixed-time speedup, as in Gustafson's law, should be considered.
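For contrast with the fixed-size model, a minimal sketch of Gustafson's fixed-time (scaled) speedup is given below. Here f denotes the fraction of execution time spent in parallel work on the p-processor system, which is not the same quantity as Amdahl's f; the function name and sample values are illustrative.

```python
def gustafson_speedup(f: float, p: float) -> float:
    """Fixed-time (scaled) speedup: the problem size grows with p.

    f is the parallel fraction of the time measured on the parallel
    system; the sequential part (1 - f) is assumed not to grow with p.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    return (1.0 - f) + f * p


if __name__ == "__main__":
    for p in (1, 10, 100):
        print(f"p = {p:3d}: scaled speedup = {gustafson_speedup(0.9, p):.1f}")
```

Because the work scales with the processor count, this speedup grows linearly in p instead of saturating at 1/(1 - f).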

26
Acknowledgements

We would like to thank Professor Mark Hill for his valuable comments and suggestions.
We also appreciate the help of Dr. Mark Squillante and the arrangements made by the MAMA organizers for this video presentation.
