Title: The Congestion Manager
1. The Congestion Manager
draft-ietf-ecm-cm-01.txt
- Hari Balakrishnan, MIT LCS
- Srinivasan Seshan, CMU
- http://nms.lcs.mit.edu/
2. CM architecture
[Diagram: applications (HTTP, RTP/RTCP, NNTP) and transports (TCP1, UDP, TCP2, SCTP) use the CM API; the Congestion Manager sits between them and IP.]
- Integrates congestion management across all applications (transport protocols and user-level apps)
- Exposes API for application adaptation, accommodating ALF applications
- This draft: sender-only module
3. Outline
- Draft overview (tutorial for slackers!)
- Terminology
- System components
- Abstract CM API
- Applications
- Issues for discussion
4. Assumptions and terminology
- Application: any protocol that uses the CM
- Well-behaved application: incorporates application-level receiver feedback, e.g., TCP (ACKs), RTP (RTCP RRs)
- Stream: group of packets with five things in common: src_addr, src_port, dst_addr, dst_port, ip_proto (see the struct sketch below)
- Macroflow: group of streams sharing the same congestion control and scheduling algorithms (a congestion group)
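For concreteness, a minimal C sketch of the five-tuple that defines a stream; the draft names only the five fields, so the struct layout and type choices here are assumptions.

    #include <stdint.h>

    /* Hypothetical layout: the draft names only the five components. */
    struct stream_info {
        uint32_t src_addr;   /* source IP address */
        uint32_t dst_addr;   /* destination IP address */
        uint16_t src_port;   /* source port */
        uint16_t dst_port;   /* destination port */
        uint8_t  ip_proto;   /* IP protocol number (TCP, UDP, ...) */
    };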
5. Architectural components
[Diagram: the CM exposes an API to the streams on a macroflow and comprises two components, a congestion controller and a scheduler.]
- CM scope is per-macroflow; not on the data path
- Congestion controller algorithm MUST be TCP-friendly (see Floyd document)
- Scheduler apportions bandwidth to streams
6. Congestion Controller
- One per macroflow
- Addresses two issues
- WHEN can macroflow transmit?
- HOW MUCH data can be transmitted?
- Uses app notifications to manage state
- cm_update() from streams
- cm_notify() from IP output whenever a packet is sent
- Standard API for scheduler interoperability
- query(), notify(), update() (sketched below)
- A large number of controllers are possible
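As a rough illustration (not from the draft), the controller's side of the standard API could be a table of function pointers in C; the struct shape and argument types below are assumptions.

    /* Sketch of a pluggable congestion controller behind the standard
     * query()/notify()/update() interface; layout and types are assumptions. */
    struct cm_congestion_controller {
        /* WHEN / HOW MUCH: report the current rate and window estimates */
        void (*query)(void *state, double *rate_bps, unsigned *cwnd_bytes);
        /* driven by cm_notify(): nsent bytes just left via IP output */
        void (*notify)(void *state, unsigned nsent);
        /* driven by cm_update(): receiver feedback from a stream */
        void (*update)(void *state, unsigned nrecd, unsigned nlost,
                       int lossmode, int rtt_us);
        void *state;   /* per-macroflow private state */
    };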
7. Scheduler
- One per macroflow
- Addresses one issue
- WHICH stream on the macroflow gets to transmit?
- Standard API for congestion controller interoperability
- schedule(), query_share(), notify() (sketched below)
- This does not presume any scheduler sophistication
- A large number of schedulers are possible
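Analogously, a scheduler module might sit behind the three calls named above; again, the C shape is an assumption, not the draft's specification.

    /* Sketch of a pluggable scheduler behind schedule()/query_share()/notify();
     * layout and types are assumptions. */
    struct cm_scheduler {
        int    (*schedule)(void *state);               /* WHICH: next sid to send */
        double (*query_share)(void *state, int sid);   /* stream's bandwidth share */
        void   (*notify)(void *state, int sid,
                         unsigned nsent);              /* stream sid transmitted */
        void *state;   /* per-macroflow scheduling state */
    };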
8. Sharing
- All streams on macroflow share congestion state
- What should granularity of macroflow be?
- Discussed at the November '99 IETF
- Default is all streams to a given destination address
- Grouping/ungrouping API allows this to be changed by an application program
9. Abstract CM API
- State maintenance
- Data transmission
- Application notification
- Querying
- Sharing granularity
10. State maintenance
- stream_info is a platform-dependent data structure containing src_addr, src_port, dst_addr, dst_port, ip_proto
- cm_open(stream_info) returns stream ID, sid (usage sketched below)
- cm_close(sid) SHOULD be called at the end
- cm_mtu(sid) gives path MTU for stream
- Add a call for sid -> stream_info (so non-apps can query too)
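A minimal usage sketch of these calls; the call names come from the draft, but the exact C prototypes of the libcm binding are assumptions.

    #include <stdint.h>

    struct stream_info {                 /* five-tuple, as on slide 4 */
        uint32_t src_addr, dst_addr;
        uint16_t src_port, dst_port;
        uint8_t  ip_proto;
    };

    extern int  cm_open(struct stream_info *info);  /* returns stream ID, sid */
    extern int  cm_mtu(int sid);                    /* path MTU for the stream */
    extern void cm_close(int sid);                  /* SHOULD be called at the end */

    void stream_lifetime(struct stream_info *info)
    {
        int sid = cm_open(info);     /* register the stream with the CM */
        int mtu = cm_mtu(sid);       /* size packets to the path MTU */
        (void)mtu;
        /* ... transmit via the callback or synchronous API ... */
        cm_close(sid);               /* release per-stream CM state */
    }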
11. Data transmission
- Two API modes, neither of which buffers data
- Accommodates ALF-oriented applications
- Callback-based
- Application controls WHAT to send at any point in time
12. Callback-based transmission
[Diagram: the application calls (1) cm_request(); the CM later invokes the (2) cmapp_send() callback.]
- Useful for ALF applications
- TCP too
- On a callback, decide what to send (e.g., a retransmission), independent of previous requests (sketched below)
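A sketch of the callback pattern in C; cm_request() and cmapp_send() are the draft's names, while the prototypes and the udp_send_packet() helper are hypothetical.

    extern void cm_request(int sid);      /* ask the CM for one transmission grant */
    extern void udp_send_packet(int sid, const void *buf,
                                int len); /* hypothetical send helper */

    /* Invoked by the CM when this stream may put one packet on the wire. */
    void cmapp_send(int sid)
    {
        /* ALF style: decide what matters NOW, e.g., prefer a pending
         * retransmission over new data, independent of earlier requests. */
        static const char payload[] = "...";
        udp_send_packet(sid, payload, sizeof payload);
    }

    void app_has_data(int sid)
    {
        cm_request(sid);   /* the grant arrives later as a cmapp_send() callback */
    }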
13. Synchronous transmission
- Applications that transmit off a (periodic) timer loop
- Send callbacks wreck the timing structure
- Use a different callback
- First, register rate and RTT thresholds
- cm_setthresh() per stream
- cmapp_update(newrate, newrtt, newrttdev) when values change
- Application adjusts period, packet size, etc. (sketched below)
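A sketch of the synchronous mode, assuming C prototypes and units (microseconds, bytes per second) that the slides do not pin down.

    extern void cm_setthresh(int sid, double rate_thresh_bps,
                             int rtt_thresh_us);   /* register thresholds */

    static int g_period_us    = 20000;   /* app's timer period (20 ms) */
    static int g_payload_size = 160;     /* app's packet size in bytes */

    /* Invoked by the CM only when rate/RTT cross the registered thresholds,
     * so the app's timer loop keeps its own timing structure. */
    void cmapp_update(int sid, double newrate_bps, int newrtt_us, int newrttdev_us)
    {
        (void)sid; (void)newrtt_us; (void)newrttdev_us;
        if (newrate_bps > 0)   /* re-derive the period from the sustainable rate */
            g_period_us = (int)((g_payload_size * 8 / newrate_bps) * 1e6);
    }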
14. Application notification
- Tell the CM of successful transmissions and congestion (a usage sketch follows)
- cm_update(sid, nrecd, nlost, lossmode, rtt)
- nrecd, nlost counted since the last cm_update call
- lossmode specifies the type of congestion as a bit-vector: CM_PERSISTENT, CM_TRANSIENT, CM_ECN
- Should we define more specifics?
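A sketch of how a well-behaved app might fold its receiver feedback into cm_update(); the constant names and parameters are the draft's, but the bit values, units, and the on_tcp_ack() hook are assumptions.

    #define CM_TRANSIENT  0x1    /* e.g., three duplicate ACKs */
    #define CM_PERSISTENT 0x2    /* e.g., a TCP timeout */
    #define CM_ECN        0x4    /* ECN echoed from the receiver */

    extern void cm_update(int sid, int nrecd, int nlost, int lossmode, int rtt_us);

    /* Hypothetical hook: called as a TCP sender processes an incoming ACK. */
    void on_tcp_ack(int sid, int acked_bytes, int dupacks, int rtt_sample_us)
    {
        int lossmode = (dupacks >= 3) ? CM_TRANSIENT : 0;
        int nlost    = (dupacks >= 3) ? 1 : 0;   /* one segment presumed lost */
        cm_update(sid, acked_bytes, nlost, lossmode, rtt_sample_us);
    }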
15. Notification of transmission
- cm_notify(stream_info, nsent) from the IP output routine
- Allows the CM to estimate outstanding bytes
- Each cmapp_send() grant has an expiration
- max(RTT, CM_GRANT_TIME)
- If the app decides NOT to send on a grant, it SHOULD call cm_notify(stream_info, 0) (sketched below)
- The CM congestion controller MUST be robust to broken or crashed apps that forget to do this
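A sketch of both cm_notify() cases, assuming the C prototype shown.

    struct stream_info;   /* five-tuple, as on slide 4 */

    extern void cm_notify(struct stream_info *info, int nsent);

    /* Normal case: the IP output routine reports every departing packet so
     * the CM can track outstanding bytes. */
    void ip_output_hook(struct stream_info *info, int pkt_len)
    {
        cm_notify(info, pkt_len);
    }

    /* Declined grant: the app got a cmapp_send() callback but has nothing
     * to send, so it SHOULD report zero bytes rather than let the grant
     * sit until it expires after max(RTT, CM_GRANT_TIME). */
    void decline_grant(struct stream_info *info)
    {
        cm_notify(info, 0);
    }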
16. Querying
- cm_query(sid, rate, srtt, rttdev) fills in the values (sketched below)
- Note: the CM may not maintain rttdev, so consider removing this?
- An invalid or non-existent estimate is signaled by a negative value
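A usage sketch, assuming out-parameter C prototypes; the negative-value convention for invalid estimates follows the slide, and pick_packet_size() is a hypothetical caller.

    extern void cm_query(int sid, double *rate_bps, int *srtt_us, int *rttdev_us);

    /* Hypothetical use: choose a packet size from the current estimates. */
    int pick_packet_size(int sid, int path_mtu)
    {
        double rate;
        int srtt, rttdev;
        cm_query(sid, &rate, &srtt, &rttdev);
        (void)srtt; (void)rttdev;
        if (rate < 0)          /* no valid estimate yet: stay conservative */
            return 512;
        return path_mtu;       /* otherwise fill the path MTU */
    }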
17. Sharing granularity
- cm_getmacroflow(sid) returns the mflow identifier
- cm_setmacroflow(mflow_id, sid) sets the macroflow for a stream
- If mflow_id is -1, a new macroflow is created
- Iteration over flows allows grouping (sketched below)
- Each call overrides the previous mflow association
- This API sets grouping, not sharing policy
- Such policy is scheduler-dependent
- Examples include proxy destinations, client prioritization, etc.
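A sketch of regrouping with the two calls; the -1 convention for creating a new macroflow is from the slide, while the C prototypes are assumptions.

    extern int cm_getmacroflow(int sid);                 /* current mflow of sid */
    extern int cm_setmacroflow(int mflow_id, int sid);   /* returns the mflow id */

    /* Move a set of streams onto one freshly created macroflow so they
     * share congestion state; each call overrides the old association. */
    void group_streams(const int *sids, int n)
    {
        if (n == 0)
            return;
        int mflow = cm_setmacroflow(-1, sids[0]);   /* -1: create a new macroflow */
        for (int i = 1; i < n; i++)
            cm_setmacroflow(mflow, sids[i]);
    }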
18. Example applications
- TCP/CM
- Like RFC 2140, TCP-INT, TCP sessions
- Congestion-controlled UDP
- Real-time streaming applications
- Synchronous API, esp. for audio
- HTTP server
- Uses TCP/CM for concurrent connections
- cm_query() to pick content formats
19. Linux implementation
[Diagram: application streams link against libcm.a, a user-level library that implements the API, issuing stream requests and updates via system calls (e.g., ioctl) and receiving cmapp_*() callbacks over a control socket; in the kernel, TCP and UDP-CC share CM macroflows through the kernel API, the CM comprises the congestion controller and scheduler, and ip_output() calls cm_notify() as packets leave via IP.]
20. Server performance
[Graph: CPU seconds to send 200K packets vs. packet size (bytes), comparing cmapp_send(), buffered UDP-CC, TCP (with and without delayed ACKs), and TCP/CM (with and without delayed ACKs).]
21. Security issues
- Incorrect reports of losses or congestion; absence of reports when there's congestion
- A malicious application can wreck other flows in the macroflow
- These are all examples of NOT-well-behaved applications
- RFC 2140 has a list
- Will be incorporated in the next revision
- Also, draft-ietf-ipsec-ecn-02.txt has relevant material
22. Issues for discussion
- Prioritization to override cwnd limitation
- cm_request(num_packets)
- Request multiple transmissions in a single call
- Reporting variances
- Should all CM-to-app reports include a variance?
- Reporting congestion state
- Should we try to define persistent congestion?
- Sharing policy interface
- Scheduler-dependent (many possibilities)
23. Overriding cwnd limitations
- Prioritization
- Suppose a TCP loses a packet due to congestion
- Sender calls cm_update()
- This causes CM to cut window
- Now, outstanding data exceeds cwnd
- What happens to the retransmission?
- Solution(?)
- Add a priority parameter to cm_request() (sketched below)
- At most one high-priority packet per RTT?
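A sketch of what the proposed priority parameter could look like; this is a discussion item, so everything below (the enum, the cm_request_prio() name, the one-per-RTT bookkeeping) is hypothetical.

    enum cm_prio { CM_PRIO_NORMAL = 0, CM_PRIO_HIGH = 1 };

    extern void cm_request_prio(int sid, enum cm_prio prio);  /* hypothetical */

    /* A TCP retransmission after a window cut: request a high-priority grant
     * so the retransmission goes out even though outstanding > cwnd, but
     * spend at most one high-priority packet per RTT. */
    void request_retransmit_slot(int sid, int *hi_prio_used_this_rtt)
    {
        if (!*hi_prio_used_this_rtt) {
            *hi_prio_used_this_rtt = 1;
            cm_request_prio(sid, CM_PRIO_HIGH);
        } else {
            cm_request_prio(sid, CM_PRIO_NORMAL);
        }
    }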
24. A more complex cm_request()?
- Issue raised by Joe Touch
- cm_request(num_packets)
- Potential advantage: higher performance due to fewer protection-boundary crossings
- Disadvantage: makes the internals complicated
- Observe that:
- Particular implementations MAY batch together libcm-to-kernel calls, preserving the simple app API
- Benefits may be small (see graph)
25. Reporting variances
- Some CM calls do not include variances, e.g., no rate variance is reported
- There are many ways to calculate variances
- These are perhaps better done by each application (e.g., by a TCP)
- The CM does not need to maintain variances to do congestion control
- In fact, our implementation of the CM doesn't even maintain rttdev...
26. Semantics of congestion reports
- CM_PERSISTENT
- Persistent congestion (e.g., TCP timeouts)
- Causes the CM to go back into slow start
- CM_TRANSIENT: transient congestion, e.g., three duplicate ACKs
- CM_ECN: ECN echoed from the receiver
- Should we more precisely define when CM_PERSISTENT should be reported?
- E.g., no feedback for an entire RTT (window) (sketched below)
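A sketch of the proposed rule; the draft has not fixed this definition, so the feedback timer, the bit value, and the prototype below are assumptions.

    #define CM_PERSISTENT 0x2   /* bit value is an assumption */

    extern void cm_update(int sid, int nrecd, int nlost, int lossmode, int rtt_us);

    /* Hypothetical per-stream timer armed for one smoothed RTT: if it fires
     * with no receiver feedback in the meantime, report persistent
     * congestion and let the CM drop back into slow start. */
    void feedback_timer_expired(int sid, int srtt_us)
    {
        cm_update(sid, 0, 0, CM_PERSISTENT, srtt_us);
    }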
27. Sharing policy
- Sender talking to a proxy receiver
- See, e.g., MUL-TCP
- Client prioritization/differentiation
- These are scheduler issues
- Particular schedulers may provide interfaces for these and more
- The scheduler interface specified here is intentionally simple and minimalist
- Vern will talk more about the scheduler
28. Future Evolution
- Support for non-well-behaved applications
- Likely use of separate headers
- Policy interfaces for sharing
- Handling QoS-enabled paths
- E.g., delay- and loss-based divisions
- Aging of congestion information for idle periods
- Expanded sharing of congestion information
- Within cluster and across macroflows