Title: STRESS: Systematic Testing of Protocol Robustness by Evaluation of Synthesized Scenarios
1. STRESS: Systematic Testing of Protocol Robustness by Evaluation of Synthesized Scenarios
D. Estrin, S. Gupta, A. Helmy, S. Begum, M. Sharma, K. Seada, P. Jain
University of Southern California
http://catarina.usc.edu/stress
2. Research Statement
- Objective: design tools to aid development of robust, high-performance protocols
- Observation: much of protocol complexity is about dealing with network failures
- Fault-oriented approach is useful
- Approach: systematically
- Test protocol robustness
- Synthesize worst/best case performance scenarios
- Add value to any complete simulation framework
- Suggest protocol enhancements
3. Anticipated Benefits
- Network manageability scenario: as new faults are identified during operation, STRESS techniques may be applied to identify other system failures that could result from similar faults
- STRESS is anticipated to contribute to each stage of the life cycle of a protocol
- During design: more thorough debugging, analysis, and characterization
- After design: more predictable behavior
- After deployment: more manageable networks, continued robustness enhancements
4. STRESS Framework
[Figure: STRESS framework diagram; the only surviving label is "Testing"]
5. Previous Results
- Robustness study for multicast routing (PIM-DM/SM)
- Uncovered looping, blackholes, overhead, and join latency problems. Some were encountered two years after our results
- End-to-end multicast (SRM)
- Synthesized worst/best case performance scenarios for the timer suppression mechanism
- Wireless ad-hoc routing
- Generated scenarios for network partition for DSR and AODV
- Multicast over ATM networks (MARS)
- Generated scenarios that maximize join/leave latency and blackholes
- Discovered packet losses due to different views of the group
- Mobile IP
- Generated scenarios of blackholes and duplication (duration depends on sequencing and/or binding policy)
- Discovered an unrecoverable error under a crash scenario
6. Summary of Current Research
- Multicast Congestion Control (pgmcc case study)
- NACK suppression leads to wrong acker selection
- Scenarios leading to unnecessary starvation (with acker leave/crash), and duplicates/gaps with reboots
- Wireless ad-hoc MAC protocols (802.11 and MACAW studies)
- Systematic analysis of 1st and 2nd order collisions
- Discovered:
- Collisions
- Under-utilization due to lack of processing during wait state, even without mobility or loss!
7. Ongoing Work and Future Directions
- Complete automation and analysis of:
- PGMCC
- 802.11
- MACAW
- Extend topology generation for wireless to accommodate asymmetry, multiple power levels, and mobility
- Automate the generation of complete STRESS scenarios for wireless protocols
- Develop more robust and efficient versions of wireless protocols
8. Background / Review
9. Problem Dimensions
- Problem Statement
- Automate synthesis of stress test scenarios
- A Test Scenario may include
- Topology (LAN, regular, random)
- Host Events (Joining, Leaving, Sending)
- Faults (losses, crashes)
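The three parts of a test scenario listed above can be captured in a small record. This is only an illustrative sketch: the class and field names (TestScenario, HostEvent, Fault) and the example values are hypothetical and are not taken from STRESS.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class HostEvent:
    time: float   # when the event happens
    host: str     # which host acts
    kind: str     # "join", "leave", or "send"


@dataclass(frozen=True)
class Fault:
    time: float   # when the fault is injected
    kind: str     # "loss" or "crash"
    target: str   # message name for a loss, node name for a crash


@dataclass
class TestScenario:
    topology: List[Tuple[str, str]]   # links as (node, node) pairs
    host_events: List[HostEvent] = field(default_factory=list)
    faults: List[Fault] = field(default_factory=list)

    def timeline(self):
        """Host events and faults merged in time order."""
        return sorted(self.host_events + self.faults, key=lambda e: e.time)


# Hypothetical LAN scenario: a join, a lost Join message, then a leave.
scenario = TestScenario(
    topology=[("r1", "lan"), ("r2", "lan"), ("r3", "lan")],
    host_events=[HostEvent(1.0, "host1", "join"), HostEvent(3.0, "host1", "leave")],
    faults=[Fault(2.0, "loss", "Join")],
)
```

Keeping faults and host events as separate lists mirrors the slide; the merged timeline is what a simulator would replay.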
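10. FSM Model: Simple Multicast
[Figure: state-transition diagram for a simple multicast protocol. Node labels: S, host1, host2. States: F, NF, R, NR. Transition labels: Join, Prune, Leave, HostJoin, Data]
F = Forwarder, NF = Non-forwarder, R = Receiver, NR = Non-receiver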
11. Modeling: Global FSM Model
- FSM = <States, I/O Stimuli, Transitions>
- States = {F, NF, R, NR}, Stimuli = {Join, Prune, Leave}
- Global FSM
- Global State
- e.g. G = {F_1, R_2, NR_3, R_4}
- Correctness
- iff R exists then exactly one F must exist
F = Forwarder, NF = Non-forwarder, R = Receiver, NR = Non-receiver
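As a minimal sketch, the global state and the correctness condition can be written as follows. The dictionary representation and the function name is_correct are assumptions made for illustration. Only the unambiguous direction of the condition is checked (a receiver requires exactly one forwarder); what should hold when no receiver exists is left open here.

```python
from collections import Counter


def is_correct(global_state):
    """global_state maps router id -> one of "F", "NF", "R", "NR"."""
    counts = Counter(global_state.values())
    if counts["R"] == 0:
        # Assumption: nothing is checked when no receiver exists.
        return True
    return counts["F"] == 1


G = {1: "F", 2: "R", 3: "NR", 4: "R"}     # the example global state
no_forwarder = {1: "NF", 2: "R"}          # receiver without a forwarder
two_forwarders = {1: "F", 2: "F", 3: "R"}  # duplicate forwarders
```

Here is_correct(G) holds, while the other two states violate the condition.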
12. Modeling: Transition Table
13. Modeling: Transition Table (Contd.)
- Add subscripts to denote different routers
- Add semantics to denote multicast, unicast, and broadcast messages
- Derive pre/post-condition table automatically
Table entries recovered from the slide (column headers lost in extraction):
Join_i     Prune_j.R_i
Prune_j    Leave_j.NR_j
14. Forward Search-based Technique
- Reachability Analysis
- Start from initial states and expand the space
- Reduce complexity using equivalence classes
- Exploit the symmetry of multicast routing, e.g.
- G_1 = {F_1, R_2, NR_3, R_4}, G_2 = {R_1, NR_2, R_3, F_4}
- Equivalent state: {F^1, R^2, NR^1} (superscripts count the routers in each state)
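The reduction can be sketched by mapping every global state to a canonical form that keeps only how many routers are in each state. The sketch assumes all routers are interchangeable, as the example suggests; the function name canonical is hypothetical.

```python
from collections import Counter
from itertools import product

STATES = ("F", "NF", "R", "NR")


def canonical(global_state):
    """Collapse a tuple of per-router states into per-state counts."""
    counts = Counter(global_state)
    return tuple((s, counts[s]) for s in STATES if counts[s])


g1 = ("F", "R", "NR", "R")   # G_1 from the slide, routers 1..4
g2 = ("R", "NR", "R", "F")   # G_2 from the slide

# Enumerate every global state for 4 routers and group by canonical form.
all_states = list(product(STATES, repeat=4))
classes = {canonical(g) for g in all_states}
```

Both g1 and g2 map to the same class (one F, two R, one NR). For four routers the 256 raw global states collapse into 35 classes, which is the number of multisets of size 4 over 4 symbols.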
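15. Equivalence to Reduce Search Complexity: Example
[Figure: search tree rooted at {NF_1, NR_2, NR_3}, with branches labeled S_1 and J_2 and successor states {F_1, NR_2, NR_3} and {NF_1, R_2, NR_3}; regions are marked "Equivalent Subspace" and "Pruned Space"]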
16. FOTG Algorithm Overview
- 0) Start from the fault (e.g. message to be lost)
- 1) Construct a global state needed to trigger the message
- 2) Use forward search to check for errors in the presence of message loss
- 3) Search backward from the error to obtain a sequence from an initial state
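Step 2 can be illustrated with a toy LAN model. The transition rules below are assumptions chosen for illustration, not the full protocol: a Prune moves the forwarder to NF and makes every receiver answer with a Join, and a Join moves a non-forwarder back to F. Steps 1 and 3 (constructing the global state and the backward search) are not covered by this sketch.

```python
def deliver(state, msg):
    """Apply one multicast message on the LAN; return (new_state, triggered)."""
    new = dict(state)
    triggered = []
    if msg == "Prune":
        for router, s in state.items():
            if s == "F":
                new[router] = "NF"        # forwarder stops forwarding
            elif s == "R":
                triggered.append("Join")  # a receiver overrides the prune
    elif msg == "Join":
        for router, s in state.items():
            if s == "NF":
                new[router] = "F"         # forwarder resumes
    return new, triggered


def has_error(state):
    """Error: a receiver exists but there is not exactly one forwarder."""
    values = list(state.values())
    return "R" in values and values.count("F") != 1


def forward_search(state, msg, lost):
    """Run msg and everything it triggers; messages named in `lost` are dropped."""
    pending = [msg]
    while pending:
        m = pending.pop(0)
        if m in lost:
            continue
        state, more = deliver(state, m)
        pending.extend(more)
    return state


start = {"i": "R", "j": "NR", "k": "F"}   # global state before the Prune
ok = forward_search(start, "Prune", lost=set())
bad = forward_search(start, "Prune", lost={"Join"})
```

Without loss the Join restores the forwarder; with the Join lost, the search ends in a state that still has a receiver but no forwarder.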
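17. FOTG: Test Generation Example
[Figure: FOTG example for loss of a Join message, drawn along a time axis.
Trigger chain: Join_i, Prune_j.R_i, Prune_j, Leave_j.NR_j, Leave_j, Host Event.
Constructed topology with global state G_I = {NF_k, NR_j, R_i}.
No loss of Join: G_{I+1} = {NR_j, R_i, F_k}.
Loss of Join: G_{I+1} = {NR_j, R_i, NF_k} (error state).
Backward search through Prune_j: G_{I-1} = {NR_j, R_i, F_k}, continuing back to the initial state.]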
18. Timer-suppression Mechanism (TSM)
[Figure: a Requester sends a Request (q) and receives a Response (p) over a V. LAN]
- Example problem statement
- Find the topology (delay matrix) that experiences worst-case behavior (i.e. max number of responses, or no suppression)
19. End-to-End Algorithm Overview
- Identify target event (e.g. sending response)
- Identify wanted/unwanted conditions to max/min the target event for worst/best case
- Obtain sequences leading to the conditions using backward search
- Calculate the time values for these sequences
- Obtain relations/inequalities to satisfy the wanted conditions
20. Worst-case Overhead Example
t(pt_i) > t(pr_{i,j}) and
t(Wanted) > t(Cond) and t(Wanted) < t(Unwanted),
or t(Cond) > t(Unwanted)
t(pt_i) < t(pr_{i,j}), or
t(qr_i) > t(pr_{i,j})
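The timing relations can be tried out on a small example. The sketch below reads qr_i as member i receiving the request, pt_i as member i transmitting its response, and pr_{i,j} as member i receiving the response sent by member j; that reading, the function name responders, and all delay and timer values are assumptions. A member is treated as suppressed only when a response reaches it after the request and before its own timer expires.

```python
def responders(req_delay, timer, delay):
    """Return the members that send a response, in sending order.

    req_delay[i]: request propagation delay to member i, so t(qr_i) = req_delay[i]
    timer[i]:     response timer chosen by member i
    delay[j][i]:  propagation delay of a response from member j to member i
    """
    send_time = {i: req_delay[i] + timer[i] for i in req_delay}   # t(pt_i)
    sent = []
    # Earlier senders are decided first; a later sender cannot suppress them.
    for i in sorted(send_time, key=send_time.get):
        suppressed = any(
            req_delay[i] < send_time[j] + delay[j][i] < send_time[i]
            for j in sent
        )
        if not suppressed:
            sent.append(i)
    return sent


members = ["a", "b", "c"]
req_delay = {m: 1.0 for m in members}
delay = {j: {i: 1.0 for i in members if i != j} for j in members}

# Spread-out timers: the first response suppresses the others.
best = responders(req_delay, {"a": 1.0, "b": 5.0, "c": 5.0}, delay)
# Equal timers: every timer expires before any response arrives.
worst = responders(req_delay, {"a": 2.0, "b": 2.0, "c": 2.0}, delay)
```

With equal timers each member satisfies t(pt_i) < t(pr_{i,j}) for every j, so all three respond; with spread-out timers only the first member does.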
21. Deriving Inequalities
- Times derived in terms of delays and timer expirations
- Worst-case inequalities
- Best-case inequalities
22. Reliable Multicast Congestion Control: PGMCC
[Figure: Sender, Receiver, and Acker exchanging Data, NACK, and ACK messages]
Sender sends packet
Packet lost by one receiver
Receiver sends NACK and Acker sends ACK
If the NACK is from a receiver worse than the Acker, then that receiver is designated as the new Acker
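The acker switch can be sketched as a throughput comparison. The estimate used here, proportional to 1/(RTT * sqrt(loss rate)), is an assumed simplified TCP-style model; the function names, the bias parameter, and the report values are hypothetical.

```python
from math import sqrt


def est_throughput(rtt, loss_rate):
    """Relative throughput estimate; requires rtt > 0 and loss_rate > 0."""
    return 1.0 / (rtt * sqrt(loss_rate))


def select_acker(current, nacker, reports, bias=1.0):
    """Return the acker after a NACK arrives from `nacker`.

    reports[name] = (rtt_seconds, loss_rate).
    bias < 1 makes switching harder (illustrative parameter).
    """
    t_cur = est_throughput(*reports[current])
    t_new = est_throughput(*reports[nacker])
    # A "worse" receiver is one with lower estimated throughput.
    return nacker if t_new < bias * t_cur else current


reports = {"r1": (0.05, 0.01), "r2": (0.20, 0.04), "r3": (0.02, 0.01)}
after_r2_nack = select_acker("r1", "r2", reports)   # r2 is worse than r1
after_r3_nack = select_acker("r1", "r3", reports)   # r3 is better than r1
```

A NACK from the slower, lossier receiver r2 moves the acker to r2, while a NACK from r3 leaves r1 in place.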
23. Characteristics of FSM Model
- Sequence numbers, window, tokens, sender buffer size
- Channel model
- List of data packets, ACKs, NACKs between sender and each receiver
- Delay matrix (for the multicast tree)
- List of packets received by each receiver
- List of packets sent by sender
- List of ACKs received by sender
- Acker selection (throughput comparison)
- Steady state vs. transient state
- Complexity
- Proportional to the number of receivers
- Proportional to sender buffer size
24. Evaluation Criteria
- Unnecessary starvation
- Duplicate packets received by receiver
- Gaps in sequence number by sender
- Out-of-order ACKs
- Wrong Acker selection (fairness problem)
- Packet loss without recovery
Scenarios Generated
- Unnecessary starvation
- Acker leave, crash
- Wrong Acker selection
- NACK suppression
- Loss ratio weighting
- Duplicate packets, sequence gaps
- Receiver, sender reboot
- Out-of-order ACKs
- Switch from a far Acker to a close Acker
25. Unnecessary Starvation
26. NAK Suppression
27. Automation
- Forward search and fault-oriented search to be implemented
- Key challenges
- Impact of sequence numbers on search space
- Channel delay (dynamic)
- Congestion modeling (Traffic)
- Is this the best way to achieve fairness with TCP?
28. Wireless MAC
FSM Model
- The neighborhood of a node i is modeled as a set G_i, where only the nodes in G_i hear a transmission from i
- A message from node i is a broadcast that affects only the nodes in G_i
- Mobility is modeled by changing G_i
- The power of a node is modeled using discrete values, which are updated according to its state, the duration in that state, and the type of message transmitted/received
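A minimal sketch of the neighborhood model, assuming each G_i is stored as a set keyed by node name. The helper names broadcast and move are hypothetical, and the power model is left out.

```python
def broadcast(neighborhood, sender):
    """Nodes that hear a transmission from `sender`, i.e. its set G_i."""
    return set(neighborhood[sender])


def move(neighborhood, node, new_neighbors):
    """Model mobility by replacing G_node; returns a new mapping.

    Only G_node changes, so links may become asymmetric unless the
    other sets are updated as well.
    """
    updated = dict(neighborhood)
    updated[node] = set(new_neighbors)
    return updated


G = {"i": {"j", "k"}, "j": {"i"}, "k": {"i", "l"}, "l": {"k"}}
heard_before = broadcast(G, "i")
G_moved = move(G, "i", {"l"})       # node i moves next to l
heard_after = broadcast(G_moved, "i")
```

Storing one set per node rather than an undirected graph keeps asymmetric links representable.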
29. Correctness Criteria
- When a node i is transmitting to some node j, there must be only one transmission in the transmission range of node i
- Violation leads to a collision; two types:
- Collision at the intended receiver causes packet loss, triggering retransmission in general
- Collision at nodes other than the intended receiver causes packet loss, resulting in different views of the channel
- When the channel is reserved by a message from time t1 to time t2, some transmission must be going on during that period from the source of the message to the destination
- Violation leads to unnecessary deference
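The first criterion can be checked mechanically. The sketch below assumes neighborhoods are given as sets G_i and that a node hearing two or more concurrent transmissions experiences a collision; the function name and the example topology are hypothetical.

```python
def collisions(neighborhood, transmissions):
    """Classify collisions among concurrent transmissions.

    neighborhood[i]: set G_i of nodes that hear node i
    transmissions:   list of (source, destination) pairs active at the same time
    Returns (at_receivers, at_others): nodes hearing two or more transmissions,
    split by whether the node is an intended receiver.
    """
    heard = {}
    for src, _dst in transmissions:
        for node in neighborhood[src]:
            heard[node] = heard.get(node, 0) + 1
    collided = {n for n, count in heard.items() if count > 1}
    receivers = {dst for _src, dst in transmissions}
    return collided & receivers, collided - receivers


# b is the destination of a and is also in range of c.
at_rx, at_other = collisions(
    {"a": {"b"}, "c": {"b", "d"}},
    [("a", "b"), ("c", "d")],
)
# x only overhears both transmitters.
at_rx2, at_other2 = collisions(
    {"a": {"b", "x"}, "c": {"d", "x"}},
    [("a", "b"), ("c", "d")],
)
```

The first call reports a collision at the intended receiver b; the second reports one at the bystander x, which is the case that leads to different views of the channel.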
30. Unnecessary Defer in 802.11
No loss, static nodes
- GFSM: nodes j and m are deferring because of a neighboring transmission; i sends RTS to j
- When the RTS is received by the nodes in G_i:
- j does not reply to the RTS as it is deferring
- k and l go to defer state
- Unnecessary deference by all nodes in the range of the transmitter, as the destination ignores the RTS while deferring
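The scenario can be replayed with a two-state node model. This is a sketch under stated assumptions: nodes are either "idle" or "defer", a deferring destination ignores the RTS, and idle overhearing nodes defer. The function names and the neighborhood set are hypothetical.

```python
def send_rts(states, neighborhood, src, dst):
    """Apply one RTS under the 802.11 rule described above.

    states[n] is "idle" or "defer". Returns (new_states, cts_sent).
    """
    new = dict(states)
    cts_sent = False
    for node in neighborhood[src]:
        if node == dst:
            # A deferring destination ignores the RTS, so no CTS follows.
            cts_sent = states[node] == "idle"
        elif states[node] == "idle":
            new[node] = "defer"   # overhearing nodes reserve the channel
    return new, cts_sent


def unnecessary_defers(before, after, cts_sent):
    """Nodes that newly deferred for an exchange that never takes place."""
    if cts_sent:
        return set()
    return {n for n in after if after[n] == "defer" and before[n] == "idle"}


states = {"i": "idle", "j": "defer", "k": "idle", "l": "idle", "m": "defer"}
G = {"i": {"j", "k", "l"}}        # nodes that hear i
after, cts = send_rts(states, G, "i", "j")
wasted = unnecessary_defers(states, after, cts)
```

No CTS is sent because j is deferring, yet k and l have moved to the defer state, which is the unnecessary deference the slide describes.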
31. Automation
- Complexity of forward search is O(n), where n is the number of nodes in the network
- If nodes are dispersed, the complexity of the search decreases as the number of nodes in the transmission range of either the transmitter or the receiver, or in the overlapping region, decreases
- The number of errors increases with the number of nodes in the neighborhood
32. Comparison of MAC Protocols
802.11 vs. MACAW
- Channel utilization of MACAW is better than that of 802.11
- 802.11: while in defer state, a node ignores an RTS destined for it
- MACAW: while in defer state, a node initiates the exchange after the defer period expires
- MACAW has higher delays in error recovery as it has more timers
- More errors in MACAW than in 802.11 as it has more control frames and timers
- The current implementation of the search does not allow a collision to be detected
33. RTS-CTS Collision
Topology Generation for MACAW
Due to the RTS-CTS collision, B is unaware of the successful RTS-CTS communication between C and D
34. RTS-DS Collision
Due to the RTS-DS collision, B is unaware of the successful RTS-CTS communication between C and D
35. Data-Data Collision
36. Conclusion
- Reasons for collisions
- Simultaneous transmissions
- Not updating timers correctly during wait state
- Secondary collisions caused following a primary
collision
37. Types of Collisions