Title: Principles in Practice
1Principles in Practice
- CS 685 Network Algorithmics
- Spring 2006
2Scheduling for ATM VC's
- Background
- Asynchronous Transfer Mode virtual-circuit
service - Individual circuits may be flow-controlled for
various reasons - bandwidth allocation
- end-to-end flow control
- At each output port, each v.c. has its own queue
of packets waiting for transmission - A scheduler selects which packet to transmit
next, according to per-v.c. state
transmit
scheduler
3Scheduling for ATM VC's
- Background
- To be eligible for selection a v.c. must have
- at least one "credit" (i.e. it has not yet used
up its allocation) - at least one packet queued
- "Credits" may be accumulated by
- passage of time (for BW allocation)
- transmission from upstream (for end-to-end flow
control)
transmit
scheduler
4Scheduling for ATM VC's
- Problem
- Too expensive to find next ready queue via linear
search, if there are 1000's of virtual circuits - How can we avoid spending time looking at many
queues that are not ready? - Hint P12 "add state for speed"
transmit
scheduler
5Scheduling for ATM VC's
- Solution (version 0)
- Maintain a queue of ready v.c.'s
- When a flow becomes ready, add it via tail
pointer - Flow becomes ready when
- pkts gt 0, credits 0 ? 1
- credits gt 0, pkts 0 ? 1
- When a flow is serviced (i.e. a packet is
transmitted), remove it via head pointer - If still ready after xmission, place it back in
queue via tail
Tail
Head
scheduler
6Scheduling for ATM VC's
- Problem
- Credits and packets arrive asynchronously
- How to ensure v.c. is not added to the list
multiple times? - Ensure that events that may cause transition
between "ready" states are "atomic" - But locking is expensive!
Tail
Head
scheduler
7Scheduling for ATM VC's
- Solution 1
- Add a ready/not ready bit to each v.c.'s state
- bit set means the v.c. is already in the queue
- clear bit before adding to/removing from the
queue
Tail
Head
scheduler
8Scheduling for ATM VC's
- Problem
- Extend the scheme to allow some v.c.'s to get
"more opportunities to send", based on
administratively-assigned "weights" - E.g. a v.c.'s with a weight of 2 would get twice
as many opportunities as a v.c. with a weight of
1 - Possible Solution
- Multiply credits by weight when they arrive
Tail
Head
scheduler
9Passive Monitor Using Bridge Hardware
- Background
- Ethernet bridge box with connections to multiple
Ethernets - Keeps track of which interface a 48-bit MAC
address lives on - Forwards incoming packets out the right i/f for
the destination address - Contains special "CAM" hardware
- Keeps track of up to 64000 (addr, i/f ) pairs
- Given addr, returns i/f in just 1.4 ?sec
10Bridge Hardware Lookup
Address 00062012CDBB
20AB0C716FA8
I/F 4
0200206AF319
I/F 1
up to 64000 entries
00062012CDBB
I/F 6
Address 00062012CDBB ? I/F 6
11Passive Monitor Using Bridge Hardware
- Problem
- Want to build a monitor to count the number of
packets sent between source, destination MAC
address pairs - Keep a counter per (S,D) pair, where S, D are
48-bit MAC addresses - Need to exploit the existing bridge hardware
(P4c) to do the lookup in less than 64 ?sec
(minimum inter-packet time on 10Mbps Ethernet) - There are about 1000 possible sources and 1000
destinations, but only around 20000 pairs active
at any time
12Passive Monitor Using Bridge Hardware
- Naive Solution
- Map each 48-bit address to a smaller one (say 10
bits) using one lookup each - Store 2-D array indexed by 10-bit quantities
- Bad array size 1000000
- Better
- Map each 48-bit address to a smaller one (say 24
bits) using one lookup each - Concatenate 24-bit IDs to get a single 48-bit ID
for the connection - Retrieve counter using another hardware lookup
with that ID
13Passive Monitor Solution
Src 123456789ABC
Dest 123456789ABC
CounterID 99981B0122FE
Counter 322
SrcID 99981B
DestID 0122FE
14Refinement I
- Set of active source-destination pairs changes
over time. How to handle this without storing
state for every pair that ever communicated since
power-on? - Solution
- Run a "sweeper" process in the background, which
does the following - Mark every counter periodically
- Mark is cleared when counter is incremented
- Once per period, dump marked counters to
secondary storage and reclaim them (make them
available or use with new pairs)
15Challenge
- Can we solve the problem using only two lookups
with bridge hardware, without requiring extra
memory?
16Tries With Node Compression
- Background
- Trie tree data structure
- Each node is an array of M 2c elements (M lt 32)
- Each entry is either a key (value) or a pointer
to another trie node, or empty - Empty space is wasted
Key X
Key X
Key Q
5 out of 24 entries used ? apply P1
17Tries With Node Compression
- To search the trie
- Input string broken into c-bit "chunks"
- First chunk used as index into root node's array
- If the indexed entry is
- null search terminates (not found)
- Key search terminates (found, result Key)
- Pointer to another node search continues there
with next chunk
Key X
Key X
Key Q
18Tries With Node Compression
- Problem how to compress the tree to remove
wasted space without slowing search more than a
small factor? - Can we replace the node array with a linked list?
- Not good takes up to factor of M longer to search
Key X
Key X
Key Q
19Tries With Node Compression
- Idea
- Use a bitmap (P14) to indicate which entries are
missing - Replace M-element array with array containing
only non-null entries - To search with c-bit index k (indices start at
1) - Convert c-bit index k to smaller index
- If bit k of bitmap is 0 ? 0, else ? of 1's in
first k bits
Original index 4 New index 2
Key X
Key X
10010010
00010000
Key Q
00000100
20Tries With Node Compression
- Efficiency
- Can count 1's in bitmap using special hardware
(or s/w) - Requires two memory accesses
- one for bitmap
- one to read array entry
- Slowdown factor of 2
Original index 4 New index 2
Key X
Key X
10010010
00010000
Key Q
00000100
21Challenge I
- Challenge Can we use table lookup to speed up
counting of bits set in bitmap in software
(P2a, P14)? - Solution (for M8, c3)
- There are 256 possible bitmap values
- Keep a table indexed by bitmap value, position,
containing the (precomputed) number of 1 bits up
to that position (3 bits per table entry) - Total 256 x 8 x 3 6144 bits ? probably doable
in hardware
...
22Challenge II
- Challenge What if the bitmap is really huge?
(e.g. c16) - Table in previous solution would have 264K rows!
- How can we speed up counting bits for such a
large bitmap, with at most one additional memory
access? - Solution Apply P12 "add state for speed" and
P12a "compute incrementally" - Logically divide bitmap into chunks (say 32 bits
each) - For each node, keep an array B with one element
per chunk - For 64K bitmap in 32-bit chunks B has 2K
elements of 16 bits each 1K 32-bit words (still
much smaller than 64K entries!) - Bj contains of 1 bits in positions 1 through
32(j-1) - To count 1's up through position k Bk 1's
in chunk k - Count 1's in chunk k individually
23Packet Classification
- Background
- TCP/IP "flows" identified by some set of fields
in packet headers, including - Source, destination IP address
- Source, destination port number
- Protocol (e.g. TCP or UDP)
- Higher-level (application) information
- Packet "filter" specification of fields that
define a set of packets "of interest" to some
receiver - Example (simple)
- Source IP X, Dest IP
- Protocol UDP
- Source port , Dest port Z
24Packet Classification
- Background
- Consider a "router" that serves a group of
receivers - Receivers specify packets they want to receive by
giving filters - Router takes each incoming packet, matches it
against all filters, and forwards to each
receiver that has some matching filter - Example
- To receive NBC's video transmission from Winter
Olympics specify source address of NBC host in
Turin, UDP protocol, and a specific destination
port
25Packet Classification
- Problem
- How to compare each packet's headers against many
(possibly thousands) of filter specifications? - Observation
- Caching (P11a) is not a very good solution
- Caching is good for storing pairs of the form (x,
?(x)) in order to avoid re-computing (or looking
up) ?(x) - Store is keyed by x, which is typically small (48
bits or less) - Here the key could be
- How could we solve this if header contained a
"flow identifier" field F in the network layer
header? - Note IPv6 has such a field
26Packet Classification
- Problem How to assign/use "flow ID" field?
- Solution
- Let the sender assign it!
- E.g. keep a local counter, increment for each
flow - Keep (in addition to the regular list of filters)
a "fast" mapping from (source address, flow ID)
to sets of receivers - When a packet arrives, first search the "fast"
list - If present, dispatch to indicated set of
receivers - Otherwise search the "slow" list to determine set
G of receivers - Then add (source address, flow ID, G) to fast list
27Classification Tricky Bits
- If a source crashes and restarts flows with
different flow IDs, chaos could result - Solution
- Time out "Fast list" entries periodically
- Ensure rebooting host does not re-use old entries
(e.g. start with last used flowID 1) - Each "slow list" entry has a pointer to its "fast
list" entry - When a packet fails to match any "fast list"
entry, but does match a "slow list" entry,
existing fast list filter should be invalidated - When a receiver adds a new filter, "fast list"
entries may need to be updated - How to do this?