Title: CS551: Queue Management
1. CS551 Queue Management
- Christos Papadopoulos
- (http://netweb.usc.edu/cs551/)
2. Congestion control vs. resource allocation
- A network's key role is to allocate its transmission resources to users or applications
- Two sides of the same coin:
  - Let the network do resource allocation (e.g., VCs)
    - Difficult to allocate distributed resources; can be wasteful of resources
  - Let sources send as much data as they want and recover from congestion when it occurs
    - Easier to implement, but may lose packets
3. Connectionless Flows
- How can a connectionless network allocate anything to a user? It doesn't know about users or applications
- Flow: a sequence of packets between the same source-destination pair, following the same route
  - A flow is visible to routers; it is not a channel, which is an end-to-end abstraction
  - Routers may maintain soft state for a flow
  - A flow can be implicitly defined or explicitly established (similar to a VC)
  - Different from a VC in that routing is not fixed
4. Taxonomy
- Router-centric vs. host-centric
  - Router-centric: address the problem from inside the network; routers decide what to forward and what to drop
    - A variant not captured in the taxonomy: adaptive routing!
  - Host-centric: address the problem at the edges; hosts observe network conditions and adjust behavior
  - Not always a clear separation: hosts and routers may collaborate, e.g., routers advise hosts
5. Taxonomy (cont.)
- Reservation-based vs. feedback-based
  - Reservations: hosts ask for resources, the network responds yes/no
    - Implies router-centric allocation
  - Feedback: hosts send with no reservation and adjust according to feedback
    - Either router- or host-centric; explicit (e.g., ICMP source quench) or implicit (e.g., loss) feedback
6. Taxonomy (cont.)
- Window-based vs. rate-based
  - Both tell the sender how much data to transmit
  - Window: TCP flow/congestion control
    - Flow control: advertised window
    - Congestion control: cwnd
  - Rate: still an open area of research
    - May be the logical choice for a reservation-based system
7. Service Models
- In practice, fewer than eight choices
- Best-effort networks
  - Mostly host-centric, feedback-based, window-based
  - TCP as an example
- Networks with flexible Quality of Service
  - Router-centric, reservation-based, rate-based
8. Queuing Disciplines
- Each router MUST implement some queuing discipline, regardless of what the resource allocation mechanism is
- Queuing allocates bandwidth, buffer space, and promptness:
  - Bandwidth: which packets get transmitted
  - Buffer space: which packets get dropped
  - Promptness: when packets get transmitted
9. FIFO Queuing
- FIFO: first-in-first-out (or FCFS: first-come-first-served)
- Arriving packets get dropped when the queue is full, regardless of flow or importance; this implies drop-tail (see the sketch below)
- Important distinction:
  - FIFO: scheduling discipline (which packet to serve next)
  - Drop-tail: drop policy (which packet to drop next)
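To make the scheduling-vs-drop-policy distinction concrete, here is a minimal sketch in Python (not from the slides); the class name and the fixed capacity parameter are illustrative assumptions.

```python
from collections import deque

class DropTailFifo:
    """FIFO scheduling with a drop-tail drop policy (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity          # max packets the queue can hold
        self.queue = deque()

    def enqueue(self, packet):
        # Drop policy: if the queue is full, the *arriving* packet is dropped,
        # regardless of which flow it belongs to or how important it is.
        if len(self.queue) >= self.capacity:
            return False                  # packet dropped (tail drop)
        self.queue.append(packet)
        return True

    def dequeue(self):
        # Scheduling discipline: serve packets strictly in arrival order.
        return self.queue.popleft() if self.queue else None
```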
10. Dimensions
- Scheduling: single class (FIFO), per-connection state, class-based queuing
- Drop position: tail, head, random location
- Early drop vs. overflow drop
11. FIFO (cont.)
- FIFO with drop-tail is the simplest queuing algorithm
  - Used widely in the Internet
- Leaves responsibility for congestion control to the edges (e.g., TCP)
- FIFO lets a large user get more data through, but shares congestion with others
  - Does not provide isolation between different flows; no policing
12. Fair Queuing
13. Fair Queuing
- Main idea:
  - Maintain a separate queue for each flow currently flowing through the router
  - The router services the queues in round-robin fashion
- Changes the interaction between packets from different flows
- Provides isolation between flows
  - Ill-behaved flows cannot starve well-behaved flows
- Allocates buffer space and bandwidth fairly
14. FQ Illustration
- [Diagram: flows 1 through n arrive at the input (I/P), each is placed in its own queue, and the queues are served onto the output (O/P)]
- Variation: Weighted Fair Queuing (WFQ)
15. Some Issues
- What constitutes a user?
  - There are several granularities at which one can define flows
  - For now, assume the granularity of a source-destination pair, but this assumption is not critical
- Packets are of different lengths
  - A source sending longer packets can still grab more than its share of resources
  - We really need bit-by-bit round-robin
  - Fair Queuing simulates bit-by-bit RR; it is not feasible to interleave bits!
16. Bit-by-bit RR
- The router maintains a local clock
- Single flow: suppose the clock ticks when a bit is transmitted. For packet i:
  - Pi = length, Ai = arrival time, Si = begin transmit time, Fi = finish transmit time
  - Fi = Si + Pi
  - Fi = max(Fi-1, Ai) + Pi
- Multiple flows: the clock ticks when a bit from all active flows is transmitted
17. Fair Queuing
- While we cannot actually perform bit-by-bit interleaving, we can compute Fi for each packet and then use Fi to schedule packets (see the sketch below)
  - Transmit the packet with the earliest Fi first
- Still not completely fair
  - But the difference is now bounded by the size of the largest packet
  - Compare with the previous approach
18. Fair Queuing Example
- Cannot preempt the packet currently being transmitted
19. Delay Allocation
- Aim: give less delay to those using less than their fair share
- Advance finish times for sources whose queues drain temporarily
  - Bi = Pi + max(Fi-1, Ai - d)
- Schedule the packet with the earliest Bi first
20. Allocate Promptness
- Bi = Pi + max(Fi-1, Ai - d)
- d gives added promptness (see the sketch below):
  - If Ai < Fi-1, the conversation is active and d does not affect it: Fi = Pi + Fi-1
  - If Ai > Fi-1, the conversation is inactive and d determines how much history to take into account
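Relative to the enqueue sketch above, delay allocation changes only the bid computation; the helper below is an illustrative one-liner, with d written as delta.

```python
def bid(prev_finish, arrival_clock, length, delta):
    # B_i = P_i + max(F_{i-1}, A_i - delta): a flow that has been idle for
    # at least delta ticks has its bid pulled back toward its arrival time,
    # so recently-inactive (low-rate) flows see less delay.
    return length + max(prev_finish, arrival_clock - delta)
```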
21. Notes on FQ
- FQ is a scheduling policy, not a drop policy
- Still achieves statistical multiplexing: one flow can fill the entire pipe if there are no contenders (FQ is work-conserving)
- WFQ is a possible variation: need to learn the weights offline. The default is one bit per flow, but sending more bits is possible
22. More Notes on FQ
- The router does not send explicit feedback to the source; we still need end-to-end congestion control
- FQ isolates ill-behaved users by forcing users to share overload with themselves
  - A "user" may be a flow, a transport protocol, etc.
- The optimal behavior at the source is to keep one packet in the queue
- But maintaining per-flow state can be expensive
  - Flow aggregation is a possibility
23. Congestion Avoidance
- TCP's approach is reactive
  - Detect congestion after it happens
  - Increase load, trying to maximize utilization, until loss occurs
  - TCP has a congestion-avoidance phase, but that's different from what we're talking about here
- Alternatively, we can be proactive
  - Try to predict congestion and reduce the rate before loss occurs
  - This is called congestion avoidance
24. Router Congestion Notification
- Routers are well positioned to detect congestion
  - A router has a unified view of queuing behavior
  - Routers can distinguish between propagation delay and persistent queuing delay
  - Routers can decide on transient congestion, based on workload
- Hosts themselves are limited in their ability to infer these from perceived behavior
25. Router Mechanisms
- Congestion notification
  - The DEC-bit scheme
    - Explicit congestion feedback to the source
  - Random Early Detection (RED)
    - Implicit congestion feedback to the source
    - Well suited for TCP
26. Design Choices for Feedback
- What kind of feedback?
  - Separate packets (source quench)
  - Mark packets; the receiver propagates the marks in ACKs
- When to generate feedback?
  - Based on router utilization
    - You can be near 100% utilization without seeing throughput degradation
  - Based on queue lengths
    - But which queue lengths (instantaneous, average)?
27. A Binary Feedback Scheme for Congestion Control in Computer Networks (DEC-bit)
28. The DEC-bit Scheme
29. The DEC-bit Scheme
- Basic ideas
  - On congestion, the router sets a congestion-indication (CI) bit on the packet
  - The receiver relays the bit to the sender in acknowledgements
  - The sender uses the feedback to adjust its sending rate
- Key design questions
  - Router: feedback policy (how and when does a router generate feedback?)
  - Source: signal filtering (how does the sender respond?)
30. Why Queue Lengths?
- It is desirable to implement FIFO
  - Fast implementations possible
  - Shares delay among connections
  - Gives low delay during bursts
- FIFO queue length is then a natural choice for detecting the onset of congestion
31. The Use of Hysteresis
- If we use queue lengths, at what queue lengths should we generate feedback?
- Threshold or hysteresis?
- Surprisingly, simulations showed that if you want to increase power:
  - Use no hysteresis
  - Use an average queue length threshold of 1
  - This maximizes the power function: power = throughput/delay
32. Computing Average Queue Lengths
- Possibilities
  - Instantaneous: premature, unfair
  - Averaged over a fixed time window, or exponential average: can be unfair if the time window differs from the round-trip time
- Solution
  - Adaptive queue length estimation over busy/idle cycles (see the sketch below)
  - But need to account for long current busy periods
33. Sender Behavior
- How often should the source change its window?
- In response to what received information should it change its window?
- By how much should the source change its window?
  - We already know the answer to this: AIMD
  - The DEC-bit scheme uses a multiplicative factor of 0.875
34. How Often to Change the Window?
- Not on every ACK received
  - The window size would oscillate dramatically, because it takes time for a window change's effects to be felt
  - If the window changes to W, it takes (W + 1) packets for feedback about that window to be received
- Correct policy: wait for (W + W') ACKs
  - Where W is the window size before the update and W' is the size after the update
35. Using Received Information
- Use the CI bits from the last W ACKs to decide whether congestion still persists
- Clearly, if some fraction of the bits are set, then congestion exists
- What fraction?
  - Depends on the policy used to set the threshold
  - When the queue-size threshold is 1, the cutoff fraction should be 0.5
  - This has the nice property that the resulting power is relatively insensitive to this choice
36. Changing the Sender's Window
- Sender policy (see the sketch below)
  - Monitor packets within a window
  - Make a change only if more than 50% of the packets had CI set
  - If < 50% had CI set, then increase the window by 1
  - Else, new window = window * 0.875
  - Additive increase, multiplicative decrease for stability
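A minimal sketch of this sender policy, folding in the "wait for (W + W') ACKs" rule from slide 34 and the 50% cutoff from slide 35; class and attribute names are illustrative assumptions.

```python
class DecBitSender:
    """Illustrative DEC-bit sender: AIMD window adjustment driven by CI bits."""

    def __init__(self, initial_window=2):
        self.window = initial_window
        self.ci_bits = []                      # CI bits seen since the last adjustment
        self.acks_to_wait = initial_window     # examine one window's worth of ACKs first

    def on_ack(self, ci_bit):
        self.ci_bits.append(1 if ci_bit else 0)
        if len(self.ci_bits) < self.acks_to_wait:
            return
        old_window = self.window
        set_fraction = sum(self.ci_bits) / len(self.ci_bits)
        if set_fraction >= 0.5:
            self.window = max(1.0, self.window * 0.875)   # multiplicative decrease
        else:
            self.window = self.window + 1                  # additive increase
        # Slide 34: wait for (W + W') ACKs before adjusting again.
        self.acks_to_wait = int(round(old_window + self.window))
        self.ci_bits = []
```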
37. DEC-bit Evaluation
- Relatively easy to implement
- No per-connection state
- Stable
- Assumes cooperative sources
- Conservative window increase policy
- Some analytical intuition to guide the design
  - Most design parameters determined by extensive simulation
38. Random Early Detection (RED)
39. Random Early Detection (RED)
- Motivation
  - High bandwidth-delay flows need large queues to accommodate transient congestion
  - TCP detects congestion from loss, after queues have built up and increased delay
- Aim
  - Keep throughput high and delay low
  - Accommodate bursts
40. Why Active Queue Management? (RFC 2309)
- Lock-out problem
  - Drop-tail allows a few flows to monopolize the queue space, locking out other flows (due to synchronization)
- Full-queues problem
  - Drop-tail maintains full or nearly-full queues during congestion, but queue limits should reflect the size of the bursts we want to absorb, not steady-state queuing
41. Other Options
- Random drop
  - A packet arriving when the queue is full causes some random packet to be dropped
- Drop front
  - On a full queue, drop the packet at the head of the queue
- Random drop and drop front solve the lock-out problem but not the full-queues problem
42. Solving the Full-Queues Problem
- Drop packets before the queue becomes full (early drop)
- Intuition: notify senders of incipient congestion
- Example: early random drop (ERD)
  - If qlen > drop level, drop each new packet with a fixed probability p
  - Does not control misbehaving users
43. Differences From DEC-bit
- Random marking/dropping of packets
- Exponentially weighted queue lengths
- Senders react to a single marked/dropped packet
- Rationale
  - Exponential weighting is better for high-bandwidth connections
  - No bias when the weighting interval differs from the round-trip time, since packets are marked randomly
  - Random marking avoids bias against bursty traffic
44. RED Goals
- Detect incipient congestion, allow bursts
- Keep power (throughput/delay) high
- keep average queue size low
- assume hosts respond to lost packets
- Avoid window synchronization
- randomly mark packets
- Avoid bias against bursty traffic
- Some protection against ill-behaved users
45. RED Operation
- [Plot: drop probability P(drop) vs. average queue length. P(drop) is 0 below minthresh, rises linearly up to maxP as the average length approaches maxthresh, and jumps to 1.0 beyond maxthresh]
46. Queue Estimation
- Standard EWMA: avg = (1 - wq) * avg + wq * qlen (see the sketch below)
- Upper bound on wq depends on minth
  - Want to set wq to allow a certain burst size
- Lower bound on wq to detect congestion relatively quickly
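A one-function sketch of the EWMA above; the default weight shown is only an example value, and the bounds on wq discussed on this slide are summarized in the comment.

```python
def update_avg(avg, qlen, wq=0.002):
    # Standard exponentially weighted moving average of the queue length:
    # avg = (1 - wq) * avg + wq * qlen. A small wq filters out short bursts
    # (upper bound tied to minth); too small a wq reacts to congestion slowly.
    return (1.0 - wq) * avg + wq * qlen
```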
47. Thresholds
- minth determined by the utilization requirement
  - Needs to be high for fairly bursty traffic
- maxth set to twice minth
  - Rule of thumb
- The difference must be larger than the queue size increase in one RTT
  - Bandwidth dependence
48. Packet Marking
- Marking probability based on average queue length
  - Pb = maxp * (avg - minth) / (maxth - minth)
- Marking based on Pb alone can lead to clustered marking -> global synchronization
- Better to bias Pb by the history of unmarked packets (see the sketch below)
  - Pa = Pb / (1 - count * Pb)
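A sketch of the marking decision described above, assuming avg comes from the EWMA sketch earlier and that the caller carries count (packets accepted since the last mark) between calls; the function signature is an illustrative assumption.

```python
import random

def red_mark(avg, count, minth, maxth, maxp):
    """Decide whether to mark/drop an arriving packet (illustrative sketch).

    Returns (mark, new_count), where count is the number of packets accepted
    since the last mark; the caller keeps count between calls."""
    if avg < minth:
        return False, 0                      # no congestion indication
    if avg >= maxth:
        return True, 0                       # beyond maxth: mark/drop every arrival
    # Base probability grows linearly between the thresholds.
    pb = maxp * (avg - minth) / (maxth - minth)
    # Bias by the number of unmarked packets so marks are spread out
    # (avoids clustered marks and global synchronization).
    pa = pb / max(1e-9, 1.0 - count * pb)
    if random.random() < pa:
        return True, 0
    return False, count + 1
```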
49. RED Algorithm
50. RED Variants
- FRED: Fair Random Early Drop (SIGCOMM 1997)
  - Maintain per-flow state only for active flows (ones having packets in the buffer)
- CHOKe (CHOose and Keep/Kill) (INFOCOM 2000) (see the sketch below)
  - Compare the new packet with a random packet in the queue
  - If from the same flow, drop both
  - If not, use RED to decide the fate of the new packet
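A sketch of the CHOKe admission step described above, assuming queued packets expose a flow_id attribute and that a separate callable implements the normal RED accept/drop decision; all names are illustrative.

```python
import random

def choke_enqueue(queue, new_pkt, red_admit):
    """CHOKe admission sketch: compare the arrival with a random queued packet.

    queue     -- list of queued packets, each with a .flow_id attribute
    new_pkt   -- the arriving packet
    red_admit -- callable implementing the normal RED accept/drop decision
    """
    if queue:
        victim_index = random.randrange(len(queue))
        if queue[victim_index].flow_id == new_pkt.flow_id:
            # Same flow: drop both the queued packet and the arrival, so
            # high-rate flows are penalized in proportion to their backlog.
            del queue[victim_index]
            return False
    # Different flows (or empty queue): fall back to the RED decision.
    if red_admit(new_pkt):
        queue.append(new_pkt)
        return True
    return False
```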
51. Extending RED for Flow Isolation
- Problem: what to do with non-cooperative flows?
- Fair queuing achieves isolation using per-flow state, which is expensive at backbone routers
- Pricing can have a similar effect
  - But needs much infrastructure to be developed
- How can we isolate unresponsive flows without per-flow state?
52. RED Penalty Box
- With RED, monitor the history of packet drops and identify flows that use a disproportionate share of the bandwidth
- Isolate and punish those flows
53. Flows That Must Be Regulated
- Unresponsive
  - Fail to reduce load in response to increased loss
- Not TCP-friendly
  - Long-term usage exceeds that of TCP under the same conditions
- Using disproportionate bandwidth
  - Use disproportionately more bandwidth than other flows during congestion
- Assumptions
  - We can monitor a flow's arrival rate
54. Identifying Flows to Regulate
- Not TCP-friendly: use a TCP model (see the sketch below)
  - TCP tput = (1.5 * sqrt(0.66) * B) / (RTT * sqrt(p))
  - B = packet size in bytes, p = packet drop rate
  - A better approximation appears in the Padhye et al. paper
  - Problems: needs bounds on packet sizes and RTTs
- Unresponsive
  - If the drop rate increases by a factor of x, then the arrival rate should decrease by a factor of sqrt(x)
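A sketch of the TCP-friendliness check implied by the formula above; the helper names are illustrative, and the Padhye et al. model mentioned on the slide would replace tcp_friendly_rate with a more accurate estimate.

```python
import math

def tcp_friendly_rate(pkt_size_bytes, rtt_sec, drop_rate):
    # Simple TCP model from the slide: tput = 1.5 * sqrt(0.66) * B / (RTT * sqrt(p)).
    return 1.5 * math.sqrt(0.66) * pkt_size_bytes / (rtt_sec * math.sqrt(drop_rate))

def is_tcp_friendly(measured_rate, pkt_size_bytes, rtt_sec, drop_rate):
    # A flow is flagged if its long-term arrival rate exceeds what a
    # conformant TCP would obtain under the same drop rate and RTT.
    return measured_rate <= tcp_friendly_rate(pkt_size_bytes, rtt_sec, drop_rate)
```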
55. Flows to Regulate (cont.)
- Flows using disproportionate bandwidth (see the sketch below)
  - Assume additive-increase, multiplicative-decrease-only flows
  - Assume cwnd = W at loss
  - It can be shown that the loss probability < 8 / (3 * W^2)
  - For segment size B: tput < 0.75 * W * B / RTT
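Putting the two bounds on this slide together, a router could flag a flow as using disproportionate bandwidth roughly as follows; the sketch is illustrative and simply inverts p < 8/(3*W^2) to bound W before applying tput < 0.75*W*B/RTT.

```python
import math

def aimd_bandwidth_bound(pkt_size_bytes, rtt_sec, drop_rate):
    # From p < 8 / (3 * W^2): the largest window consistent with the
    # observed drop rate is W = sqrt(8 / (3 * p)).
    w = math.sqrt(8.0 / (3.0 * drop_rate))
    # From tput < 0.75 * W * B / RTT: the corresponding throughput bound.
    return 0.75 * w * pkt_size_bytes / rtt_sec

def uses_disproportionate_bandwidth(measured_rate, pkt_size_bytes, rtt_sec, drop_rate):
    # Flag flows whose measured rate exceeds what any AIMD flow could
    # achieve at this drop rate (illustrative check, not from the slides).
    return measured_rate > aimd_bandwidth_bound(pkt_size_bytes, rtt_sec, drop_rate)
```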