Title: Performability Modeling and Fault-Tolerant Communication Systems
1Performability Modeling and Fault-Tolerant
Communication Systems
MURI Review @ Berkeley, June 2001
- Dr. Kishor S. Trivedi
- Dr. Yonghuan Cao
- Center for Advanced Computing and Communication (CACC)
- Dept. of Electrical and Computer Engineering
- Duke University
- Email: kst@ee.duke.edu
2Agenda
- Introduction
- Motivation, Objective and Methodology
- Accomplishment
- Performability modeling of wireless mobile networks
- Multicast Logical Information Feedback Tree (LIFT)
- Progress in TCP performance modeling w/ SDE
- Looking Forward
- Ongoing and Future work
- Potential Connections w/ others
- Conclusion
3Introduction
- Overview of research at Duke CACC: our motivation, objective, and methodology
4A Research Triangle
Theory: SRN, MRSPN, FSPN, NHCTMC, SDE
Applications: Real-time Systems, Fault-tolerant Systems, Computer Networks, Wireless Communication
Tools: SPNP, SHARPE, SREPT
5Stochastic Modeling
- Stochastic Processes
- Discrete-time Markov Chains (DTMC)
- Continuous-time Markov Chains (CTMC)
- Semi-Markov Processes (SMP)
- Markov Regenerative Process (MRGP)
- Stochastic Formalisms
- Stochastic Reward Nets (SRN)
- Markov Regenerative Stochastic Reward Nets (MRSRN)
- Fluid Stochastic Petri Nets (FSPN)
- Stochastic Differential Equations (SDE)
- Automated Tool
- Stochastic Petri Net Package (SPNP)
6 Architecture of SPNP
[Diagram: Stochastic Reward Net (SRN) models (Markovian stochastic Petri net, non-Markovian stochastic Petri net, fluid stochastic Petri net (FSPN)) are solved either through the reachability graph with an analytic-numeric method or through discrete event simulation (DES). Both paths offer steady-state and transient solutions.]
Analytic-numeric methods: SOR, Gauss-Seidel, power method, standard uniformization, Fox-Glynn method, stiff uniformization
Simulation methods: batch means, independent replication, regenerative simulation, restart
Fast methods: importance sampling, importance splitting
7What Degrades Service?
- Resource limit (resource FULL): channels, buffer, bandwidth
- Effects: long waiting time, time-out, service blocking
- Outage and recovery (resource LOSS): failures, upgrades, maintenance, human errors
- Effects: incomplete service, loss of information
8Need Performability Modeling
- New technologies, services, and standards need new models
- Traditional performance models may not be applicable without proper treatment
- Pure performance modeling is too optimistic!
- Outage-and-recovery behavior is not considered
Performability modeling: Performance + Availability = Performability
- A more complete and balanced picture
- Both steady-state and transient solutions are informative
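As a hedged illustration of combining performance with availability, the sketch below weights a pure performance measure (Erlang-B blocking) by the steady-state probability of each availability state. Everything specific here is an assumption made for illustration and is not taken from the slides: the function names, the numeric parameters, and the premise that each channel fails and is repaired independently. It is a hierarchical approximation, not the composite model used in the talk.

```python
from math import comb

def erlang_b(offered_load, channels):
    """Erlang-B blocking probability via the standard recursion."""
    b = 1.0  # with zero channels every call is blocked
    for n in range(1, channels + 1):
        b = offered_load * b / (n + offered_load * b)
    return b

def performability_blocking(offered_load, channels, fail_rate, repair_rate):
    """Expected blocking when channels can be down.

    Assumption (illustrative only): each channel is an independent
    two-state up/down component, so the number of working channels is
    binomial with per-channel availability repair / (fail + repair).
    """
    avail = repair_rate / (fail_rate + repair_rate)
    total = 0.0
    for up in range(channels + 1):
        p_up = comb(channels, up) * avail**up * (1 - avail)**(channels - up)
        total += p_up * erlang_b(offered_load, up)  # reward of this state
    return total

pure = erlang_b(5.0, 10)
combined = performability_blocking(5.0, 10, fail_rate=0.01, repair_rate=1.0)
print(f"pure performance blocking: {pure:.5f}")
print(f"performability blocking:   {combined:.5f}")
```

Because blocking grows as channels are lost, the availability-weighted value is never below the pure performance value, which is the sense in which pure performance modeling is optimistic.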
9Accomplishment-1
- Performability Modeling of Wireless Mobile Systems
10Wireless Mobile Challenges
- Restricted Spectrum
- Scarce bandwidth ( 10Kbps 100Kbps 4Mbps )
- Error-prone link
- Channel fading, multipath, building blockage, ...
- High mobility
- Needs mobility management
- Complex distributed location DBs
- Further complicated by data services and mobile IP
- Service diversity
- Traditional voice/paging
- Increasing demand for data services (email, stock, www, ...)
11Topics Studied
- Performability of control channel protection in cellular systems
- Uplink performance of wireless packet-switched data (a 2.5G system, GPRS)
- Performance of wireless downlink scheduling policies
- The performance impact of access delay on capacity-on-demand multiple access
12Performability Modeling and Optimization of
Cellular Systems with Control Channel Failure and
Automatic Protection Switch (APS)
- Y. Cao, H.-R. Sun and K. S. Trivedi, "Performability Analysis of TDMA Cellular Systems," PQNet2000, Japan, Nov. 2000.
- H.-R. Sun, Y. Cao, K. S. Trivedi and J. J. Han, "Method and Apparatus for Control Channel Restoration in Cellular Systems," patent filed, 2000.
13A TDMA Cellular System
- Each cell has Nb base repeaters (BR)
- Each BR provides M TDM channels
- One control channel resides in one of the BRs
14Traffic In a Cell
Common Channel Pool
A Cell
15Automatic Protection Switch (APS)
- Upon control_down, the failed control channel is
automatically switched to a channel on a working
base repeater.
16Performance Measures
- New call blocking probability, Pb
- Percentage of new calls rejected
- Handoff call dropping probability, Pd
- Percentage of calls forcefully terminated when crossing cells
- Channel utilization, Uc
- Fraction of time in which the available channel resource is in use
Pb, Pd, and Uc are determined not only by system
parameters (such as no. of channels, call
admission control scheme, etc.), but also
incoming traffic characteristics and call
duration distributions.
17Model of System w/o APS
CTMC state (b, k): b = no. of BRs up, k = no. of talking channels
18Model of System w/ APS
CTMC state (b, k): b = no. of BRs up, k = no. of talking channels
A Segment of the Composite Markov Chain Model
19Numerical Results
Handoff Call Blocking Probability Improvement
by APS
Unavailability in handoff call dropping
probability
20Packet-level Performance Analysis of ALOHA
Reservation-based MAC in GPRS under Bursty Data
Traffic
Y. Cao, H.-R. Sun and K. S. Trivedi, "Performance Analysis of Reservation-based Media Access Protocol with Access Queue and Serving Queue under Bursty Traffic in GPRS/EGPRS," Wireless Networks (in review), January 2001.
21Background
- GPRS, a 2.5G system, to evolve today's TDMA-based GSM and tdmaOne towards 3G.
- Circuit-switched voice and packet-switched data services coexist. Voice has higher priority.
- Capacity-on-demand concept and multi-slot capability. Theoretical data rate up to 172 kbps.
22Uplink Data Transfer
- Slotted-ALOHA Reservation Protocol
- Capture capability to reduce collision
- Access queue (AQ) to alleviate contention
- Serving queue (SQ)
- Cross the TDMA frame boundaries, dynamic channel allocation
- Bursty data traffic
23GPRS/GSM Architecture
[Diagram: GPRS/GSM architecture with nodes MS, BTS, BSC, MSC/VLR, HLR, EIR, SS7, and PSTN]
24Protocol Stack Segmentation
[Diagram: protocol stack of the mobile terminal, top to bottom: Application, IP/X.25, SNDCP, LLC, RLC, MAC, GSM RF]
25The SRN Model
[Diagram: SRN model of the tagged mobile and the remaining (N-1) mobiles, with on-off LLC arrivals, a finite buffer, a connection, and the pmf of the LLC frame size]
26Model Accuracy
Simulation: 95% CI, written in C
SRN model: using SPNP
27Components of Frame Delay
- Waiting time in the access queue dominates delay (due to limited channels).
- Contention delay is negligible due to AQ and capture.
28Performance of Queue Length and Channel Quality Based Wireless Scheduling Policies
Y. Cao, H.-R. Sun and K. S. Trivedi, "Performance of Queue Length and Channel Quality Based Wireless Scheduling Policies," CACC Technical Report, March 2001.
29The Problem
[Diagram: a scheduling scenario with sources A, B, C, a wired network, and mobile terminals a, b, c]
In one time slot, only one of the three downlink streams (A-a, B-b, C-c) is allowed to transmit! Which to choose?
30Another Look
[Diagram: incoming traffic is queued at the base station; a scheduler serves terminals a, b, c over the wireless link]
31Harder Than Wire-line
Wire-line scheduling always assumes error-free
links w/ high bandwidth.
- Wireless Link
- High error rates / bursty errors
- Location-dependent capacity
- Time-varying link quality
- Very low bandwidth
Wireless scheduling needs to consider
time-varying channel quality.
32A Quality-aware Scheduler
- Two Schedulers
- Naïve Round Robin (NRR)
- Best-Quality-First (BQF)
[Figure: link capacity at time t for terminals a, b, c, and throughput of NRR vs. BQF under backlogged traffic]
BQF: throughput optimal!!
33Problem with BQF
Starvation may occur to queues with low average
quality.
Good channel
Bad channels
Queues with bad channels blow up.
A scheduler needs to take into account not only
link quality but also queue length.
34GWQL Scheduling
[Diagram: n queues with lengths q1(t), q2(t), ..., qn(t) and link qualities m1(t), m2(t), ..., mn(t)]
Generalized Weighted Queue Length (GWQL) Scheduling
Define the score $w_i = z_i \, q_i(t) \, m_i(t)$, $z_i > 0$. In each time slot, data is transmitted to the mobile with the highest score. In case of a tie, one of them is randomly chosen.
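The per-slot decision rule can be sketched in a few lines. This is a minimal illustration assuming the score is the product of weight, queue length, and link quality; the function name `gwql_select` and the sample values are hypothetical.

```python
import random

def gwql_select(queue_lengths, link_qualities, weights, rng=random):
    """Return the index of the mobile to serve in this slot under GWQL.

    score_i = z_i * q_i(t) * m_i(t); ties are broken uniformly at random.
    """
    scores = [z * q * m
              for z, q, m in zip(weights, queue_lengths, link_qualities)]
    best = max(scores)
    candidates = [i for i, s in enumerate(scores) if s == best]
    return rng.choice(candidates)

# Illustrative slot with three mobiles (hypothetical values).
q = [4, 10, 2]        # packets waiting, q_i(t)
m = [1.0, 0.2, 1.0]   # link quality, m_i(t)
z = [1.0, 1.0, 1.0]   # weights z_i > 0
print(gwql_select(q, m, z))  # mobile 0 has the top score
```

With all link qualities equal the same function picks the longest queue, and with all queue lengths equal it picks the best link, so the simpler policies fall out as special cases of the one rule.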
35What GWQL Can Be?
GWQL (Generalized Weighted Queue Length): $w_i = z_i \, q_i(t) \, m_i(t)$
- $z_i = 1$: WQL (Weighted Queue Length)
- WQL with $m_i(t) = m$: LQF (Longest-queue-first)
- WQL with $q_i(t) = q$: BQF (Best-quality-first)
36What Makes a Good Scheduler?
Ideal scheduler property: does GWQL have it?
- Simplicity: Yes. Only queue length and link status!
- Flow isolation: Depends on traffic pattern and channel variation?
- Optimal throughput/utilization: Yes. Throughput optimal! [Tassiulas92, McKeown96, Wasserman97]
- Heterogeneous QoS guarantee: Need to set zi properly?
37Need A Performance Model
- To study the performance impact of traffic burstiness and channel variation.
- To evaluate the capability of satisfying heterogeneous QoS requirements.
38Traffic Model
- Traffic Model
- Markov Modulated Poisson Process (MMPP) [FMH92]
- Able to capture inter-arrival correlations
- Able to characterize traffic burstiness
- Yet still analytically tractable!
[Diagram: a 2-state MMPP traffic model with OFF and ON states]
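To make the traffic model concrete, here is a minimal simulation sketch of a 2-state MMPP. The function name, parameter names, and numeric values are assumptions for illustration; the slides do not give the parameters used in the study.

```python
import random

def simulate_mmpp2(rate_on, rate_off, switch_on_to_off, switch_off_to_on,
                   horizon, seed=1):
    """Simulate a 2-state MMPP and return the sorted list of arrival times.

    While the modulating chain is ON, arrivals are Poisson with rate_on;
    while it is OFF, with rate_off (0 gives an interrupted Poisson process).
    Both switching rates must be positive.
    """
    rng = random.Random(seed)
    t, on, arrivals = 0.0, True, []
    while True:
        arrival_rate = rate_on if on else rate_off
        switch_rate = switch_on_to_off if on else switch_off_to_on
        total = arrival_rate + switch_rate
        t += rng.expovariate(total)        # time to the next event
        if t >= horizon:
            return arrivals
        if rng.random() < arrival_rate / total:
            arrivals.append(t)             # the event is an arrival
        else:
            on = not on                    # the event is a state switch

arrivals = simulate_mmpp2(10.0, 0.0, 1.0, 1.0, horizon=5000.0)
p_on = 1.0 / (1.0 + 1.0)                   # stationary probability of ON
print(f"empirical rate:   {len(arrivals) / 5000.0:.2f}")
print(f"theoretical rate: {10.0 * p_on:.2f}")
```

The sketch relies on the competing-exponentials property: in each state the next event is an arrival or a switch with probability proportional to its rate, which keeps the code short and exact.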
39Channel Model
- Bursty-error Channel Model
- The well-known Gilbert-Elliott model [Gil60]
- A two-state Markov chain (Good, Bad)
- Extension to the GE model
- Finite-state Markov channel [Wang95]
- Model parameters can be derived from the channel fading distribution and mobile speed.
[Diagram: two-state channel with states Good (m1) and Bad (m2) and transition rates a0 and a1]
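A small sketch of the two-state channel follows, assuming a continuous-time chain. Which of the slide's rates a0 and a1 belongs to which direction cannot be recovered from the diagram residue, so the code uses the explicit, hypothetical names `rate_good_to_bad` and `rate_bad_to_good`, and the capacities are illustrative.

```python
def gilbert_elliott_stationary(rate_good_to_bad, rate_bad_to_good):
    """Stationary probabilities (good, bad) of a two-state CTMC channel."""
    total = rate_good_to_bad + rate_bad_to_good
    return rate_bad_to_good / total, rate_good_to_bad / total

def mean_capacity(rate_good_to_bad, rate_bad_to_good, cap_good, cap_bad):
    """Long-run average link capacity over the two channel states."""
    p_good, p_bad = gilbert_elliott_stationary(rate_good_to_bad,
                                               rate_bad_to_good)
    return p_good * cap_good + p_bad * cap_bad

# Illustrative values: the channel leaves Good at rate 1 and Bad at rate 3,
# so the mean Good duration is 1.0 and the mean Bad duration is 1/3.
p_good, p_bad = gilbert_elliott_stationary(1.0, 3.0)
print(p_good, p_bad)
print(mean_capacity(1.0, 3.0, cap_good=1.0, cap_bad=0.0))
```

Scaling both rates by the same factor leaves the stationary probabilities unchanged but shortens or lengthens the Good and Bad durations, which is how burstiness of errors can be varied at a fixed average capacity.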
40A Stochastic Petri Net Model
Building the Markov chain by hand is tedious and
not necessary. We use stochastic Petri net (SPN).
[Diagram: SPN composed of a two-state MMPP traffic model, a finite buffer, and a link model; labels in the figure include L, a0, s0, a1, m (g), and l]
GWQL scheduling policy is embodied in the guard
function (g).
41Measures of Interest
- Blocking Probability
- The probability that an arriving packet sees a full queue.
- Packet Delay
- The response time experienced by a packet accepted to the queue.
- Individual and System Throughput
- The number of packets transmitted per unit time.
42Burstiness Effect
[Figure: blocking probability vs. burstiness (IDC). Setup: queue q1(t) with Poisson arrivals and queue q2(t) with MMPP arrivals, over the same channels m1(t) and m2(t)]
Burstiness measure of MMPP: Index of Dispersion for Counts (IDC). IDC = 1 for Poisson, IDC > 1 for MMPP.
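For reference, the IDC is defined from the number of arrivals N(t) in (0, t]. The limiting value for a two-state MMPP below is the standard textbook result, stated here as a hedged aid; the symbols are notation introduced for this note and do not come from the slides: λ1 and λ2 are the arrival rates in the two states, and r1 and r2 are the rates of leaving state 1 and state 2.

```latex
\mathrm{IDC}(t) = \frac{\mathrm{Var}[N(t)]}{E[N(t)]},
\qquad
\lim_{t \to \infty} \mathrm{IDC}(t)
  = 1 + \frac{2\, r_1 r_2 (\lambda_1 - \lambda_2)^2}
             {(r_1 + r_2)^2 (\lambda_1 r_2 + \lambda_2 r_1)}
```

When λ1 = λ2 the process is Poisson and the second term vanishes, which gives IDC = 1; any difference between the two rates makes the limit strictly larger than 1.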
43Burstiness Impact
[Figures: packet delay and throughput vs. burstiness (IDC)]
44Channel Variation
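[Figure: blocking probability vs. bad duration. Setup: queues q1(t) and q2(t), both with Poisson arrivals, over channels m1(t) and m2(t)]
Measure of channel variation: squared coefficient of variation $C_m^2 = \mathrm{Var}[m] / E[m]^2$. $C_m^2$ is varied through $1/a_1$ (the bad duration) with $E[m]$ held fixed.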
45Effect of Channel Variation
[Figures: throughput and packet delay vs. bad duration]
46Tuning GWQL
Performance of an individual mobile is bounded.
- Upper bound: when zi → very large, the queue always has the highest priority and is served whenever it is not empty.
- Lower bound: when zi → very small, the queue always has the lowest priority and is served only when all other queues are empty.
Each bound is also determined by the channel.
47QoS Tuning Capability
[Figure: blocking probability vs. z2/z1. Setup: queues q1(t) and q2(t) with weights z1 and z2, both with Poisson arrivals, over identical channels m1(t) and m2(t)]
48GWQL Tuning Capability
[Figures: packet delay and throughput vs. z2/z1]
49GWQL Conclusion
- Traffic burstiness deteriorates not only one mobile, but also the other mobiles sharing the same link.
- Traffic regulation is needed for flow isolation.
- Large channel variation has a significant negative impact on all mobiles.
- Second-moment channel information may improve.
- Tuning capability is bounded. Performance appears sensitive to the values of zi.
- The model developed is useful in the search for proper zi.
50The Effect of Access Delay in Capacity-on-demand
over a Wireless Link Under Bursty
Packet-switched Data
Y. Cao, H.-R. Sun and K. S. Trivedi, "The Effect of Access Delay in Capacity-on-demand over a Wireless Link Under Bursty Packet-switched Data," Performance Evaluation (submitted), March 2001.
51Problem Definition
[Diagram: access scenario with mobiles a, b, c]
Radio resource (the number of channels) is limited. A number of mobiles with data to send compete for radio links. A mobile may experience access delay. How does access delay affect individual performance?
52Capacity-on-demand
Today's common wireless data applications (www, email, stock, ...)
[Figure: activity of Traffic A and Traffic B over the call (session) duration]
For Traffic A, it is worth dedicating a channel for the entire call duration. For Traffic B, that is not a good idea: resources are wasted in silent periods.
Capacity-on-demand optimizes the utilization of radio links: only establish a connection when there is data to send, and release the connection once the data is emptied.
53Impact of Access Delay
[Figure: traffic timeline of a connection alternating access delay (A) and service (S) periods]
A packet may be dropped if access takes too long.
A: access delay, S: service
Access delay may cause buffer overflow, long waiting time, etc.
54Cause of Access Delay
- Access delay is determined by a strongly coupled system
- The number of mobiles,
- Traffic pattern on each mobile (user behaviors),
- Available radio resource (number of channels)
- The particular multiple access (MA) mechanism
The distribution of access delay is virtually unknown and can be arbitrarily general.
55Objective
Random variable A: access delay
Want to understand:
1. How the distribution (shape) of access delay may affect performance.
2. Is the mean value E[A] enough?
3. Can a simple distribution (such as exponential) be used for good approximation?
56A Queueing Model
[Diagram: MMPP arrivals feed a buffer of size L served by a general (G) server with activation]
MMPP/G/1/L with server activation
Note:
1. MMPP arrivals: bursty traffic
2. Service time (G): arriving packets of different sizes
3. Server activation (A): link access delay
57Exhaustive Principle
Once connection is established, all buffered data
and arrivals during the connection will be
transmitted. Connection is released immediately
after buffer is emptied.
58Model Analysis
A state (l, s, m):
- l: no. of packets in buffer (0, 1, ..., L)
- s: server off/on (0, 1)
- m: MMPP state (1, 2, ..., M)
State-space based approach
59Two Types of Transitions
[Diagram: MMPP counting process with the server off and the server on. Exponential transitions and general transitions are drawn with different arrow styles.]
60An MRGP
In a semi-Markov process (SMP), state does not
change between two consecutive regenerative
points. When a general transition is enabled,
the exponential transitions (of the MMPP counting
process) keep going on and state may change. The
process is more complicated than an SMP. It is a
Markov regenerative process (MRGP).
61CTMC → SMP → MRGP
[Figure: sample paths (state vs. t) for a CTMC (T ~ Exp), an SMP (T ~ Gen), and an MRGP (T1 ~ Gen, with exponential transitions T ~ Exp in between)]
62MRGP Analysis
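Two kernels:
- Global kernel $\mathbf{K}(t) = [K_{ij}(t)]$, $K_{ij}(t) = \Pr\{Y_1 = j,\ T_1 < t \mid Y_0 = i\}$
- Local kernel $\mathbf{E}(t) = [E_{ij}(t)]$, $E_{ij}(t) = \Pr\{Z(t) = j,\ T_1 > t \mid Z(0) = i\}$
Define $\mathbf{V}(t) = [V_{ij}(t)]$, $V_{ij}(t) = \Pr\{Z(t) = j \mid Y_0 = i\}$. Then
$\mathbf{V}(t) = \mathbf{E}(t) + \mathbf{K} * \mathbf{V}(t)$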
63Steady-state Solution
1. Steady-state solution of the embedded DTMC with $\mathbf{P} = \mathbf{K}(\infty)$
2. The integral
Uniformization method used
3. The steady-state probability vector
4. Measures of interest can be derived from p
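The formulas on this slide did not survive extraction. As a hedged aid, the standard steady-state result for an MRGP from Markov renewal theory is given below; it may differ in notation from what the slide showed. The symbols v (stationary vector of the embedded DTMC), e (column vector of ones), and α (expected time spent in each state between regeneration points) are introduced here.

```latex
\begin{align}
  \mathbf{v} &= \mathbf{v}\,\mathbf{P}, \qquad \mathbf{v}\,\mathbf{e} = 1,
      \qquad \mathbf{P} = \mathbf{K}(\infty) \\
  \alpha_{ij} &= \int_0^{\infty} E_{ij}(t)\, dt \\
  p_j &= \frac{\sum_k v_k\, \alpha_{kj}}{\sum_k v_k \sum_l \alpha_{kl}}
\end{align}
```

The numerator is the expected time spent in state j per regeneration cycle and the denominator is the expected cycle length, so p is a time-weighted version of the embedded chain's stationary vector.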
64Measures of Interest
- Blocking Probability (Pb)
- The probability that an arriving packet sees a full queue.
- Packet Delay (t)
- The response time experienced by a packet accepted to the queue.
- Activation Rate (rA)
- Number of times that the server needs to set up per unit time. Overhead of capacity-on-demand.
65Effect of Access Delay
1. Traffic model: two-state MMPP
2. Service time: pmf of packet size
- Distributions of Access delay
- Exponential
- 2-stage Erlang
- 3-stage Erlang
- Deterministic
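The four access-delay distributions can be compared at a common mean, which is what isolates the effect of shape. The sketch below samples each one and estimates its squared coefficient of variation; the helper names, the mean of 2.0, and the sample size are illustrative assumptions.

```python
import random

def sample_access_delay(kind, mean, rng):
    """Draw one access delay with the given mean from the named distribution."""
    if kind == "exponential":
        return rng.expovariate(1.0 / mean)
    if kind.startswith("erlang"):
        stages = int(kind[len("erlang"):])   # "erlang2" -> 2 stages
        return rng.gammavariate(stages, mean / stages)
    if kind == "deterministic":
        return mean
    raise ValueError(f"unknown distribution: {kind}")

def squared_cv(samples):
    """Variance divided by the squared mean of the samples."""
    n = len(samples)
    avg = sum(samples) / n
    var = sum((x - avg) ** 2 for x in samples) / n
    return var / avg ** 2

rng = random.Random(7)
results = {}
for kind in ["exponential", "erlang2", "erlang3", "deterministic"]:
    xs = [sample_access_delay(kind, 2.0, rng) for _ in range(20000)]
    results[kind] = (sum(xs) / len(xs), squared_cv(xs))
    print(kind, round(results[kind][0], 2), round(results[kind][1], 2))
```

In theory the squared coefficient of variation is 1 for the exponential, 1/k for a k-stage Erlang, and 0 for the deterministic case, so the list spans the range from highly variable to constant delay.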
66Blocking Probability
[Figure: blocking probability vs. E[A]]
67Packet Delay
[Figure: mean packet delay vs. E[A]]
68Server Busy/Idle Time
[Figures: mean busy time and mean idle time vs. E[A]]
69Effect of Traffic Pattern
Comparison of Poisson arrivals and MMPP (same average rates)
[Figures: mean packet delay and blocking probability vs. E[A]]
70Activation Rate
[Figure: activation rate for Poisson and MMPP arrivals]
71Access Delay Conclusion
- A general queueing model with server activation is used to study the impact of access delay on bursty wireless data applications.
- We have developed an efficient numerical method to solve the model.
- From the numerical results, steady-state performance measures appear not very sensitive to the distribution of access delay.
- Good news for further system-level evaluation.
72Accomplishment-2
- Logical Information Feedback Tree (LIFT) for
Many-to-many Distribution
A. Rodriguez, "LIFT: Logical Information Feedback Tree for Information Dissemination in Wide-Area Networks," Master's Thesis, May 2001.
73The Problem
- Solutions to native IP multicast problems: router replacement (too costly, ✗) or network overlays (✓)?
- Overlay nodes should be organized in a hierarchical fashion (such as a tree) to limit overlay maximum width and thus reduce propagation delay.
- Applications: Content Distribution Networks (CDN), application layer multicast, reliable multicast protocols, etc.
74Logical Information Feedback Tree
- Basis of LIFT
- Hyper-Chromatic Tree, a distributed parallel version of the Red-Black Tree (RBT) [Messeguer & Valles 98]
- LIFT beyond Hyper-Chromatic Tree
- Node synchronization to allow the tree to resemble the underlying substrate network topology
- Properties
- Balanced: height bounded by O(log n).
- Decentralized: no centralized control/info. needed
- Robust: dynamic node insertion/deletion, etc.
75Preliminary Results
- A proposal of the LIFT protocol.
- LIFT protocol simulator (based on ns-2).
- Studied:
- control packet overhead,
- tree convergence rate,
- tree transient behavior under network dynamics (reaction to node deletion, insertion, etc.),
- capability of matching substrate network topology, etc.
76Accomplishment-3
- TCP Performance Analysis using Stochastic
Differential Equations (SDE)
77TCP Performance (Throughput and Goodput) Study
using Stochastic Differential Equations (SDE)
Yiguang Hong, Yonghuan Cao and K. S. Trivedi, "A Note on TCP Throughput and Goodput," submitted to IEEE Communications Letters, April 2001.
78TCP Goodput & Throughput
- Previous studies assume a backlogged source (may not be true for interactive connections).
- Previous studies focus more on throughput, but goodput may be more important to a user.
[Diagram: a single TCP connection over an unreliable link. TX sends traffic X(t) through the network to RX, with packet loss (p)]
79An SDE Formulation
- TCP source traffic X(t) is the input to buffer Q(t)
- Packet loss is Poisson (modeled as a Poisson counter, N)
- TCP window size W(t) increases/decreases due to packet loss
R: round trip time (RTT)
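The equations on this slide were lost in extraction. As a hedged illustration only, the sketch below simulates the widely used additive-increase/multiplicative-decrease form dW = dt/R - (W/2) dN, where N is a Poisson counter of loss events. This form, the constant `loss_rate`, and the function name are assumptions and are not necessarily the authors' formulation.

```python
import random

def mean_window(rtt, loss_rate, num_losses=20000, seed=3):
    """Time-averaged window of dW = dt/rtt - (W/2) dN with Poisson losses.

    Between losses W grows linearly with slope 1/rtt, so each inter-loss
    interval is integrated exactly instead of using a time-stepping scheme.
    """
    rng = random.Random(seed)
    w, area, elapsed = 1.0, 0.0, 0.0
    for _ in range(num_losses):
        gap = rng.expovariate(loss_rate)            # time to the next loss
        area += w * gap + gap * gap / (2.0 * rtt)   # integral of W over gap
        elapsed += gap
        w = (w + gap / rtt) / 2.0                   # halve window at the loss
    return area / elapsed

estimate = mean_window(rtt=0.1, loss_rate=1.0)
print(f"simulated mean window: {estimate:.2f}")
print(f"stationary mean 2/(loss_rate*rtt): {2.0 / (1.0 * 0.1):.2f}")
```

Taking expectations of the assumed SDE in steady state gives 1/R = loss_rate · E[W]/2, so the simulated average should settle near 2/(loss_rate · R); the sketch is a sanity check of that relation rather than a throughput or goodput model.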
80The Critical Point
- There exists a critical point of loss probability (p*), before and after which TCP performs differently.
- Goodput = E[X] when p < p*.
- The well-known formula is only valid after the critical point (p*).
81Looking Forward
- Ongoing/future work and potential connections
with others
82Wireless Data Networks
Ongoing Work
- A joint analysis of multiple layers in wireless networks is under way.
- The work accomplished so far is the basis for joint analysis. (GPRS from PHY to LLC)
83Why Joint Analysis?
- High availability and high performance (HA/HP) at a low layer does not necessarily mean HA/HP at a higher layer.
- For example, even a Gb/s network will crash if there is no effective routing protocol to evenly distribute traffic across the network.
- Joint analysis can achieve optimal protocol design.
84Challenges
- Stiffness
- Different time scales: connection level, burst level, packet level, etc.
- Largeness
- Different layers have different states; state space explosion with a large number of users
- Potential solutions
- Hierarchical decomposition, state aggregation, layer abstraction, etc.
85Potential Connections
- Joint research with Mostafa (Gatech, QoS team)?
86Performance Modeling Dynamic Voting System
Ongoing Work
- Performance modeling was ignored in early studies due to complexity.
- Oversimplification in the few performance studies of dynamic voting systems.
- We build a comprehensive model with the help of a stochastic modeling tool (SPNP) to study different realistic dynamic voting schemes.
- Probabilistic nature of link/node failures, delays, etc.
87Potential Connection
- Joint research with Nancy's group (MIT, Group Communication Service)?
88Extension to TCP Performance
Future Work
- Analytical performance modeling (using FSPN/SDE) from a single connection to multiple connections
- Congestion control mechanisms: AQM (Active Queue Management), RED, etc.
- Emphasis on fairness, stability analysis, and optimization.
89Extensions to LIFT
Future Work
- Fault-tolerance in LIFT
- LIFT is inherently fault-tolerant in detecting node disappearance, but failure during a rebalancing operation still needs to be considered.
- Simulation has been conducted, but an analytical model is needed to study performance.
- Need to study LIFT-based congestion control.
90Potential Connections
- Joint research with?
- Dr. A. Zakhor
- Real-time multicast video over packet-switched networks [Tan & Zakhor]
- Dr. K. Shin
- Feedback synchronization for multicast ABR flow control [Zhang & Shin]
91The End Thank you!