Title: Measuring and Modeling Hyper-threaded Processor Performance


1
Measuring and Modeling Hyper-threaded Processor
Performance
  • Ethan Bolker
  • UMass-Boston
  • September 17, 2003

2
  • Joint work with Yiping Ding, Arjun Kumar (BMC
    Software)
  • Accepted for presentation at CMG32, December 2003
  • Paper (with references) available on request

3
Improving Processor Performance
  • Speed up clock
  • Invent revolutionary new architecture
  • Replicate processors (parallel application)
  • Remove bottlenecks (use idle ALU)
  • caches
  • pipelining
  • prefetch

4
Hyper-threading Technology (HTT): default for new
Intel high-end chips
  • One ALU
  • Duplicate state of computation (registers) to
    create two logical processors (chip
    size × 1.05)
  • Parallel instruction preparation (decode)
  • ALU should see ready work more often
  • (provided there are two active threads)

5
The path to instruction execution
Intel Technology Journal, Volume 06 Issue 01,
February 14, 2002, p8
6
How little must we understand?
  • Treat processor as a black box
  • Experiment to observe behavior
  • Model to predict behavior
  • Batch workload: repeated dispatch of identical
    compute-intensive jobs
  • vary number of threads
  • measure throughput (jobs/second)
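
The batch experiment above can be sketched as a small harness. This is an illustration only, not the authors' benchmark: the names `cpu_job` and `batch_throughput` and the job size are hypothetical, and the sketch uses Python threads, which in standard CPython share one interpreter lock, so it shows how the measurement is structured rather than reproducing hyper-threading speedups.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_job(iterations):
    """One compute-intensive job: a fixed amount of integer arithmetic."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def batch_throughput(num_threads, num_jobs, iterations):
    """Run num_jobs identical jobs on num_threads workers; return jobs/second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(cpu_job, [iterations] * num_jobs))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed


if __name__ == "__main__":
    # Vary the number of threads and report throughput for each setting.
    for n in (1, 2, 4):
        rate = batch_throughput(n, num_jobs=8, iterations=200_000)
        print(n, "threads:", round(rate, 2), "jobs/sec")
```

Holding the job size fixed while only the thread count changes keeps throughput as the single measured quantity, which is what the black-box approach asks for.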

7
Batch throughput
8
Transaction processing
  • More interesting than batch
  • Random size jobs arrive at random times
  • M/M/1
  • M = Markov
  • M/·/·: arrival stream is Poisson, rate λ
  • ·/M/·: job size exponentially distributed, mean
    s
  • ·/·/1: single processor

9
M/M/1 model evaluation
  • Utilization U = λs
  • U is dimensionless: (jobs/sec) × (sec/job)
  • U < 1, else saturation
  • Response time r = s/(1-U)
  • randomness ⇒ each job sees a (virtual) processor
    slowed down (by other jobs) by a factor 1/(1-U), so
    accumulating s seconds of real work takes r =
    s/(1-U) seconds of real time
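
As a quick check of the M/M/1 relations on this slide, a minimal sketch follows. The function names are hypothetical; `arrival_rate` stands for the arrival rate (lambda) and `mean_service` for the mean job size s.

```python
def mm1_utilization(arrival_rate, mean_service):
    """U = arrival rate * mean service time (dimensionless)."""
    return arrival_rate * mean_service


def mm1_response_time(arrival_rate, mean_service):
    """r = s / (1 - U); only defined below saturation (U < 1)."""
    u = mm1_utilization(arrival_rate, mean_service)
    if not 0 <= u < 1:
        raise ValueError("utilization must satisfy 0 <= U < 1")
    return mean_service / (1 - u)


if __name__ == "__main__":
    # With s = 1 second, half-loaded processor: each job takes twice as long.
    print(mm1_response_time(0.5, 1.0))
```

At U = 0.5 the slowdown factor 1/(1-U) is 2, so a job needing 1 second of work takes 2 seconds on average.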

10
Benchmark
  • Java driver
  • chooses interarrival times and service times from
    exponential distributions,
  • dispatches each job in its own thread,
  • records actual job CPU usage, response time
  • Input parameters
  • job arrival rate λ
  • mean job service time s
  • Fix s = 1 second, vary λ (hence U), track r
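
The idea behind the driver can be illustrated with a small simulation. This is a sketch under stated assumptions, not the Java benchmark itself: it simulates a single first-come-first-served server instead of dispatching real threads, and the function name and seed are arbitrary. For an M/M/1 queue the mean response time is s/(1-U) under both first-come-first-served and processor-sharing scheduling, so the comparison with the formula still applies.

```python
import random


def simulate_mm1(arrival_rate, mean_service, num_jobs, seed=1):
    """Simulate an M/M/1 FCFS queue; return the mean response time."""
    rng = random.Random(seed)
    arrival = 0.0          # arrival time of the current job
    server_free_at = 0.0   # time at which the server finishes earlier work
    total_response = 0.0
    for _ in range(num_jobs):
        # Exponential interarrival and service times, as in the benchmark.
        arrival += rng.expovariate(arrival_rate)
        service = rng.expovariate(1.0 / mean_service)
        start = max(arrival, server_free_at)
        server_free_at = start + service
        total_response += server_free_at - arrival
    return total_response / num_jobs


if __name__ == "__main__":
    s = 1.0
    for lam in (0.2, 0.5, 0.8):
        measured = simulate_mm1(lam, s, num_jobs=200_000)
        predicted = s / (1 - lam * s)
        print(lam, round(measured, 3), round(predicted, 3))
```

Fixing s and sweeping the arrival rate mirrors the experiment design on the slide: each rate gives one utilization and one measured response time to compare with theory.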

11
Benchmark validation
12
Theory vs practice
  • "In theory, there is no difference between theory
    and practice. In practice, there is no
    relationship between theory and practice."
    (Grant Gainey)
  • "The gap between theory and practice in practice
    is much larger than the gap between theory and
    practice in theory." (Jeff Case)

13
Explain/remove discrepancy
  • Examine, tune benchmark driver
  • Compute actual coefficients of variation,
    incorporate in corrected M/M/1 formula
  • Nothing helps
  • Postpone worrying in the meantime
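
One way to fold measured coefficients of variation into the M/M/1 formula is sketched below. The transcript does not say which correction the authors used; this sketch assumes the standard Kingman-style G/G/1 approximation, in which the queueing delay is scaled by the average of the squared coefficients of variation of interarrival and service times. The function names are hypothetical.

```python
import statistics


def coefficient_of_variation(samples):
    """Standard deviation divided by mean (1.0 for an exponential distribution)."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


def gg1_response_time(arrival_rate, mean_service, cv_arrival, cv_service):
    """Approximate mean response time with measured coefficients of variation.

    wait ~ (U / (1 - U)) * ((ca^2 + cs^2) / 2) * s, response = s + wait.
    With ca = cs = 1 this reduces to the M/M/1 result s / (1 - U).
    """
    u = arrival_rate * mean_service
    if not 0 <= u < 1:
        raise ValueError("utilization must satisfy 0 <= U < 1")
    variability = (cv_arrival ** 2 + cv_service ** 2) / 2
    wait = (u / (1 - u)) * variability * mean_service
    return mean_service + wait


if __name__ == "__main__":
    print(gg1_response_time(0.5, 1.0, 1.0, 1.0))  # M/M/1 case
    print(gg1_response_time(0.5, 1.0, 1.0, 0.0))  # constant job sizes
```

If the measured coefficients are close to 1, the correction changes little, which is consistent with the slide's observation that this step did not remove the discrepancy.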

14
HTT on vs HTT off
  • Use this benchmark to measure the effect of
    hyper-threading on response time
  • Use throughput (λ) as the independent variable
  • Utilization is ambiguous (digression)

15
HTT on vs HTT off
16
What's happening
  • Hyper-threading allows more of the application
    parallelism to make its way to the ALU
  • Can we understand this quantitatively?

17
Model HTT architecture
18
Theory vs practice
s1 = 0.13, s2 = 0.81
19
Model parameters
  • To compute response time r from model, need
    (virtual) service parameters s1, s2 (λ is
    known)
  • Finding s1, s2
  • eyeball measured data
  • fit two data points
  • maximum likelihood
  • derive from first principles
  • s1 = 0.13, s2 = 0.81 make sense:
    15% of work is preparatory, 85%
    execution

20
Benchmark validation (reprise)
  • Chip hardware unchanged when HTT off
  • Assume one path used
  • Tandem queue
  • Parameter estimation as before
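
A tandem queue with one path can be sketched as two servers in series. The transcript does not include the authors' exact model, so this sketch assumes Poisson arrivals and exponential service at each stage, in which case the mean response time is the sum of two M/M/1 terms. The function name is hypothetical; `s1` and `s2` are the (virtual) preparation and execution service times.

```python
def tandem_response_time(arrival_rate, s1, s2):
    """Mean response time of two M/M/1 stages in series.

    r = s1 / (1 - lambda * s1) + s2 / (1 - lambda * s2)
    Assumes exponential service at both stages and Poisson arrivals.
    """
    total = 0.0
    for s in (s1, s2):
        u = arrival_rate * s
        if not 0 <= u < 1:
            raise ValueError("each stage must have utilization below 1")
        total += s / (1 - u)
    return total


if __name__ == "__main__":
    # Parameter values quoted on the following slide for HTT off.
    for lam in (0.2, 0.5, 0.8):
        print(lam, round(tandem_response_time(lam, 0.045, 0.878), 3))
```

The slower stage dominates: as the arrival rate approaches 1/s2 the second term grows without bound while the first stays small.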

21
Theory vs practice

s1 = 0.045, s2 = 0.878
22
Future work
  • Do serious statistics
  • Does the 1→1 tandem queue model predict
    hyper-threading response as well as the complex 2→1
    model?
  • Understand two-processor machine puzzle
  • Explore how s1 and s2 vary with application
    (e.g. fixed vs floating point)
  • Find ways to estimate s1 and s2 from first
    principles

23
Summary
  • Hyper-threading is
  • Abstraction (modelling) leverages information:
    you can often understand a lot even when you know
    very little
  • r = s/(1-U) is worth remembering
  • You do need to connect theory and practice, and
    practice is harder than theory
  • Questions?