Title: Influence of heavy-tailed distributions on load balancing
1. Scheduling Your Network Connections
Mor Harchol-Balter, Carnegie Mellon University
Joint work with Bianca Schroeder
2. Q: Which minimizes mean response time?
[Diagram: queues of jobs; size = service requirement; load ρ < 1]
3. Q: Which best represents scheduling in web servers?
[Diagram: jobs queued under FCFS, PS, and SRPT; size = service requirement; load ρ < 1]
4. IDEA
How about using SRPT instead of PS in web servers?
[Diagram: clients 1-3 send "Get File 1/2/3" across the Internet to the WEB SERVER (Apache) running on the Linux O.S.]
5. Immediate Objections
1) Can't assume known job size.
   Many servers receive mostly static web requests ("GET FILE"). For static web requests the file size is known, so the service requirement of the request is approximately known.
2) But the big jobs will starve ...
6. Outline of Talk
1) Analysis of SRPT Scheduling: Investigating Unfairness
2) Size-based Scheduling to Improve Web Performance
3) Web servers under overload: How scheduling can help
www.cs.cmu.edu/~harchol/
7. SRPT has a long history ...
1966: Schrage & Miller derive the M/G/1/SRPT response time.
1968: Schrage proves optimality.
1979: Pechinkin, Solovyev, Yashkov generalize.
1990: Schassberger derives the distribution of queue length.
BUT WHAT DOES IT ALL MEAN?
8. SRPT has a long history (cont.)
1990-97: 7-year-long study at Univ. of Aachen under Schreiber. SRPT WINS BIG ON MEAN!
1998, 1999: Slowdown for SRPT under adversary (Rajmohan, Gehrke, Muthukrishnan, Rajaraman, Shaheen, Bender, Chakrabarti, etc.). SRPT STARVES BIG JOBS!
Various O.S. books (Silberschatz, Stallings, Tanenbaum) warn about starvation of big jobs.
... Kleinrock's Conservation Law: "Preferential treatment given to one class of customers is afforded at the expense of other customers."
9. Unfairness Question
Let ρ = 0.9. Let G = Bounded Pareto(α = 1.1, max = 10^10).
Question: Which queue (PS or SRPT) does the biggest job prefer?
10. Our Analytical Results (M/G/1)
All-Can-Win Theorem: Under workloads with the heavy-tailed (HT) property, ALL jobs, including the very biggest, prefer SRPT to PS, provided the load is not too close to 1.
Almost-All-Win-Big Theorem: Under workloads with the HT property, 99% of all jobs perform orders of magnitude better under SRPT.
11. What's a Heavy Tail?
[Figure: fraction of jobs with CPU duration > x, versus x; Berkeley Unix process CPU lifetimes [HD96]]
12. What's the Heavy-Tail property?
Defn: A heavy-tailed distribution satisfies
    Pr{X > x} ~ x^(-α),   0 < α < 2.
Many real-world workloads are well modeled by a truncated HT distribution.
Key property (HT property): the largest 1% of jobs comprise half the load.
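As a quick illustration of the HT property, the short calculation below asks what fraction of the load the largest 1% of jobs carry under a Bounded Pareto distribution with α = 1.1 and max = 10^10 (the parameters used on the unfairness-question slide); the minimum job size K = 1 is an illustrative choice of mine, not taken from the talk.

```python
# How much of the load do the largest 1% of jobs carry under a
# Bounded Pareto(alpha = 1.1, max = 10^10) distribution?
# K (the minimum job size) is an illustrative choice.

ALPHA, K, P = 1.1, 1.0, 1e10
norm = 1.0 - (K / P) ** ALPHA

# 99th percentile of job size, from inverting the Bounded-Pareto CDF
#   F(x) = (1 - (K/x)^alpha) / norm
x99 = K * (1.0 - 0.99 * norm) ** (-1.0 / ALPHA)

# Fraction of total work in jobs larger than x99:
#   int_{x99}^{P} t f(t) dt / int_{K}^{P} t f(t) dt   (closed form below)
share = ((x99 ** (1.0 - ALPHA) - P ** (1.0 - ALPHA))
         / (K ** (1.0 - ALPHA) - P ** (1.0 - ALPHA)))

print(f"99th-percentile job size: {x99:.1f}")
print(f"largest 1% of jobs carry {100.0 * share:.0f}% of the load")
```

With these illustrative parameters the share works out to roughly 60%, which is the kind of "largest 1% of jobs comprise half the load" behaviour the slide describes.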
13. Our Analytical Results (M/G/1)
All-Can-Win Theorem: Under workloads with the heavy-tailed (HT) property, ALL jobs, including the very biggest, prefer SRPT to PS, provided the load is not too close to 1.
Almost-All-Win-Big Theorem: Under workloads with the HT property, 99% of all jobs perform orders of magnitude better under SRPT.
14. Our Analytical Results (M/G/1)
All-Distributions-Win Theorem: If load ρ < 0.5, then for every job size distribution, ALL jobs prefer SRPT to PS.
Bounding-the-Damage Theorem: For any load ρ, for every job size distribution, for every size x,
    E[T(x)]_SRPT < (1 + ρ / (2(1-ρ))) · E[T(x)]_PS.
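As a sanity check on both theorems, the sketch below evaluates the standard M/G/1 formulas numerically: E[T(x)]_PS = x/(1-ρ) for PS, and the Schrage-Miller waiting-plus-residence expression for SRPT, under the Bounded Pareto(α = 1.1, max = 10^10) workload from the unfairness-question slide at ρ = 0.9. The minimum job size K = 1, the grid resolution, and the helper names are illustrative choices of mine, not part of the talk.

```python
# Numerical check of "All-Can-Win" and "Bounding-the-Damage" for an M/G/1 queue
# with a Bounded Pareto(alpha = 1.1, max = 10^10) job-size distribution at rho = 0.9.
# Formulas used (standard M/G/1 results):
#   E[T(x)]_PS   = x / (1 - rho)
#   E[T(x)]_SRPT = lam*(int_0^x t^2 dF(t) + x^2*(1-F(x))) / (2*(1 - rho(x))^2)
#                  + int_0^x dt / (1 - rho(t))             (Schrage & Miller, 1966)
# where rho(x) = lam * int_0^x t dF(t).

import numpy as np

ALPHA, K, P, RHO = 1.1, 1.0, 1e10, 0.9        # K (minimum job size) is illustrative

norm = 1.0 - (K / P) ** ALPHA                 # Bounded-Pareto normalization
t = np.logspace(np.log10(K), np.log10(P), 20_000)
f = ALPHA * K ** ALPHA * t ** (-ALPHA - 1.0) / norm        # pdf on [K, P]
F = (1.0 - (K / t) ** ALPHA) / norm                        # cdf

def cumtrapz0(y, x):
    """Cumulative trapezoidal integral of y dx, starting from 0."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))))

work_up_to = cumtrapz0(t * f, t)              # int_K^x t f(t) dt
lam = RHO / work_up_to[-1]                    # arrival rate so that lam * E[S] = rho
rho_x = lam * work_up_to                      # load made up by jobs of size <= x
m2_x = cumtrapz0(t ** 2 * f, t)               # int_K^x t^2 f(t) dt

T_ps = t / (1.0 - RHO)
T_srpt = (lam * (m2_x + t ** 2 * (1.0 - F)) / (2.0 * (1.0 - rho_x) ** 2)
          + K + cumtrapz0(1.0 / (1.0 - rho_x), t))   # the first K of the residence
                                                     # integral has rho(t) = 0

ratio = T_srpt / T_ps
print("biggest job:  E[T]_SRPT / E[T]_PS =", ratio[-1])     # < 1: All-Can-Win
print("worst ratio over all x =", ratio.max())
print("damage bound 1 + rho/(2(1-rho)) =", 1.0 + RHO / (2.0 * (1.0 - RHO)))
```

With these parameters the ratio for the very biggest job should come out just under 1 (so even it prefers SRPT), and the worst-case ratio stays far below the damage bound 1 + ρ/(2(1-ρ)) = 5.5 at ρ = 0.9.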
15. From theory to practice
What does SRPT mean within a Web server?
- Many devices: where to do the scheduling?
- No longer one job at a time.
16. Server's Performance Bottleneck
The site buys a limited fraction of its ISP's bandwidth.
[Diagram: clients 1-3 send "Get File 1/2/3" through the rest of the Internet and the ISP to the WEB SERVER (Apache) running on the Linux O.S.]
We model the bottleneck by limiting the bandwidth on the server's uplink.
17. Network/O.S. insides of a traditional Web server
[Diagram: Clients 1-3 each have a socket (Sockets 1-3) feeding the Network Card; the uplink is the BOTTLENECK.]
Sockets take turns draining --- FAIR = PS.
18. Network/O.S. insides of our improved Web server
[Diagram: Clients 1-3 each have a socket (Sockets 1-3); the sockets feed the Network Card through S (1st), M (2nd), and L (3rd) priority queues; the uplink is the BOTTLENECK.]
The socket corresponding to the file with the smallest remaining data gets to feed first (see the sketch below).
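As a concrete user-space illustration of that feeding rule, the sketch below always serves the connection with the least remaining data; Connection, CHUNK, and feed_uplink are hypothetical names invented for this sketch, since the server described in the talk enforces the rule inside the kernel with the priority queues pictured above.

```python
# Minimal sketch of the feeding rule: whenever the uplink can take another chunk,
# feed the socket whose file has the least remaining data (SRPT order).
# Connection, CHUNK and feed_uplink are illustrative names, not the real server's code.

import heapq
from dataclasses import dataclass, field

CHUNK = 1500  # bytes handed to the network card per turn (illustrative)

@dataclass(order=True)
class Connection:
    remaining: int                        # bytes of the file still to send
    sock_id: int = field(compare=False)   # which socket this is

def feed_uplink(conns):
    """Drain all connections, always feeding the one with the least remaining data."""
    heap = list(conns)
    heapq.heapify(heap)                   # min-heap keyed on remaining bytes
    schedule = []
    while heap:
        c = heapq.heappop(heap)
        sent = min(CHUNK, c.remaining)    # the real server would write these bytes
        schedule.append((c.sock_id, sent))
        c.remaining -= sent
        if c.remaining > 0:
            heapq.heappush(heap, c)       # re-queue with its smaller remainder
    return schedule

# The 1 KB file completes before the larger files are fed at all.
print(feed_uplink([Connection(50_000, 1), Connection(20_000, 2), Connection(1_000, 3)])[:3])
```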
19. Experimental Setup
Implementation of SRPT-based scheduling:
1) Modifications to the Linux O.S.: 6 priority levels.
2) Modifications to the Apache Web server.
3) Priority algorithm design (a sketch follows below).
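The slides do not spell out the priority algorithm, so the following is only one plausible shape for it: bucket each response into one of 6 bands by the bytes still to be sent, and refresh the socket's band as the remainder shrinks. The cutoff values are invented, and using setsockopt(SO_PRIORITY) from user space is an assumption of this sketch (it is a Linux-specific option whose effect depends on the configured queueing discipline); the talk's actual mechanism is a modification inside the Linux kernel.

```python
# One plausible shape for the priority algorithm: bucket each response into one of
# 6 bands by the bytes still to be sent, and refresh the socket's band as the
# remainder shrinks. Cutoffs are invented; SO_PRIORITY as the user-space hook is an
# assumption of this sketch, not the talk's actual kernel modification.

import socket

CUTOFFS = [2_000, 10_000, 50_000, 200_000, 1_000_000]   # bytes; illustrative only

def band_for(remaining_bytes: int) -> int:
    """Return one of 6 bands: 0 for the least remaining data, 5 for the most."""
    for band, cutoff in enumerate(CUTOFFS):
        if remaining_bytes <= cutoff:
            return band
    return len(CUTOFFS)

def reprioritize(sock: socket.socket, remaining_bytes: int) -> None:
    # SO_PRIORITY (Linux-specific) tags outgoing packets with a priority value;
    # how the bands are drained then depends on the configured queueing discipline.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY, band_for(remaining_bytes))
```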
20. Experimental Setup
[Diagram: Apache (and Flash) Web server on the Linux O.S.; 10 Mbps and 100 Mbps uplinks; switch; clusters of 200 Linux client machines behind WAN emulators (WAN EMU); workload generators: Surge and trace-based; open and partly-open system models; geographically-dispersed clients.]
Trace-based workload:
- Number of requests made: 1,000,000
- Size of file requested: 41 B -- 2 MB
- Distribution of file sizes requested has the HT property.
Load < 1; transient overload.
Other effects: initial RTO, user abort/reload, persistent connections, etc.
21. Preliminary Comments
[The same experimental-setup diagram as on the previous slide.]
- Job throughput, byte throughput, and bandwidth utilization were the same under SRPT and FAIR scheduling.
- The same set of requests complete.
- No additional CPU overhead under SRPT scheduling.
- The network was the bottleneck in all experiments.
22. Results: Mean Response Time
[Figure: mean response time (sec) vs. load, FAIR vs. SRPT]
23. Results: Mean Slowdown
[Figure: mean slowdown vs. load, FAIR vs. SRPT]
24. Mean Response Time vs. Size Percentile
[Figure: mean response time (ms) vs. percentile of request size at load 0.8, FAIR vs. SRPT]
25. Summary so far ...
- SRPT scheduling yields significant improvements in mean response time at the server.
- Negligible starvation.
- No CPU overhead.
- No drop in throughput.
26. More questions
- So far we have only shown LAN results. Are the effects of SRPT in a WAN as strong?
- So far we have only shown load < 1. What happens under SRPT vs. FAIR when the server runs under transient overload?
  -> new analysis
  -> implementation study
27. WAN EMU results
Propagation delay has an additive effect. It reduces the improvement factor.
[Figure: FAIR vs. SRPT]
28. WAN EMU results
Loss has a quadratic effect. It reduces the improvement factor a lot.
[Figure: FAIR vs. SRPT]
29. WAN results
Geographically-dispersed clients.
[Figures: load 0.9 and load 0.7]
30. Overload: a 5-minute overview
[Image: a person under overload]
31. Q: What happens under overload? A: A buildup in the number of connections.
[Figure: FAIR vs. SRPT]
Q: What happens to response time?
32. Web server under overload
When the SYN-queue limit is reached, the server drops all connection requests.
[Diagram: clients send connection requests to the server; requests pass through the SYN-queue and the ACK-queue before reaching the Apache processes.]
33. Transient Overload
[Figure: load over time alternates between periods with ρ > 1 and periods with ρ < 1]
34. Transient Overload - Baseline: Mean response time
[Figure: FAIR vs. SRPT]
35. Transient overload: Response time as a function of job size
[Figure: FAIR vs. SRPT]
Small jobs win big! Big jobs aren't hurt!
WHY?
36. FACTORS
- Baseline case
- WAN propagation delays: RTT = 0 - 150 ms
- WAN loss: loss = 0 - 15%
- WAN loss & delay: loss = 0 - 15%, RTT = 0 - 150 ms
- Persistent connections: 0 - 10 requests/conn.
- Initial RTO value: RTO = 0.5 sec - 3 sec
- SYN cookies: ON/OFF
- User abort/reload: abort after 3 - 15 sec, with 2, 4, 6, 8 retries
- Packet length: 536 - 1500 bytes
- Realistic scenario: RTT = 100 ms; loss = 5%; 5 requests/conn.; RTO = 3 sec; pkt len = 1500 B; user aborts after 7 sec and retries up to 3 times.
37. Transient Overload - Realistic: Mean response time
[Figure: FAIR vs. SRPT]
38. Conclusion
- SRPT scheduling is a promising solution for reducing the mean response time seen by clients, particularly when the load at the server bottleneck is high.
- SRPT results in negligible or zero unfairness to large requests.
- SRPT is easy to implement.
- Results are corroborated via implementation and analysis.