1. Theory of Complex Networks
- John Doyle
- Control and Dynamical Systems
- Caltech
2. Collaborators and contributors (partial list)
- AfCS: Simon, Sternberg, Arkin
- Biology: Csete, Yi, Borisuk, Bolouri, Kitano, Kurata, Khammash, El-Samad, Gross, Ingalls, Sauro
- Turbulence: Bamieh, Dahleh, Gharib, Marsden, Bobba
- Theory: Parrilo, Paganini, Carlson, Lall, Barahona, D'Andrea
- Web/Internet: Low, Effros, Zhu, Yu, Chandy, Willinger
- Physics: Mabuchi, Doherty, Marsden, Asimakapoulos
- Engineering CAD: Ortiz, Murray, Schroder, Burdick, Barr
- Disturbance ecology: Moritz, Carlson, Robert
- Power systems: Verghese, Lesieutre
- Finance: Primbs, Yamada, Giannelli, Martinez
- and casts of thousands
3. Message
- Possible beginnings of a coherent theory of Internet applications and protocols
- Builds on the work of many others
- The wired Internet is still interesting, and a good warm-up for more challenging problems
- The Internet is a good meeting place for communications and control theorists
- More this afternoon: TCP (Low), biology (Doyle)
4. Warning, caveat, apology
- The initial results might seem pretty weird
- Probably my fault for a bad explanation
- I'm going to try to be very tutorial, but I'll need lots of help from you (to make sense of what I'm trying to say)
- Not much big picture today
5. Network protocols
[Diagram: the protocol stack. Files are sent over HTTP, which runs on TCP over IP; IP breaks the data into packets that are forwarded by routers.]
6. Web/Internet traffic
[Diagram: web clients request files from web servers; the web traffic is streamed out on the net, creating Internet traffic.]
7. Web traffic
Let's look at some web traffic.
[Diagram: as above, web servers streaming files out onto the net.]
8. [Log-log plot (decimated data, log base 10): cumulative frequency vs. size of events for data compression codewords (Huffman), WWW file sizes in Mbytes (Crovella), and forest fires in 1000 km² (Malamud).]
9. Probability that a file is bigger than x
[Same log-log plot: cumulative frequency vs. size of events for web files, codewords, and fires.]
11. [Same plot, annotated with the sample sizes behind the statistics: > 1e5 WWW files (Crovella) and > 4e3 forest fires (Malamud).]
12. 20th century's 100 largest disasters worldwide
[Log-log plot: cumulative frequency vs. size for technological disasters (unit: $10B), natural disasters (unit: $100B), and US power outages (tens of millions of customers).]
13. [Same data, axes labeled: log(cumulative frequency), i.e. log(rank), vs. log(size).]
14. [Linear plot: rank (0 to 100) vs. size for the technological ($10B) and natural ($100B) disaster data.]
15. [The same rank vs. size data replotted on log-log axes.]
16. 20th century's 100 largest disasters worldwide
[Log-log plot repeated: technological ($10B), natural ($100B), and US power outages (tens of millions of customers).]
17. [Log-log plot (decimated data, log base 10) repeated: cumulative frequency vs. size of events for data compression, WWW files (Mbytes), and forest fires (1000 km²).]
18. [Same plot with fitted tails: data compression is exponential; WWW files follow a power law of slope -1; forest fires follow a power law of slope -1/2.]
19. Plotting power laws and exponentials
[Log-log plot: power laws with slopes 0.5 and 1 appear as straight lines on log-log axes, while an exponential curves sharply downward.]
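To make the plotting convention concrete, here is a minimal illustrative sketch (my addition, not from the talk) of complementary cumulative distributions for two power laws and an exponential on log-log axes; all parameters are arbitrary.

```python
# Illustrative sketch: P(X > x) for power laws vs. an exponential,
# plotted on log-log axes as in the slide.  Parameters are arbitrary.
import numpy as np
import matplotlib.pyplot as plt

x = np.logspace(0, 3, 200)            # event sizes from 1 to 1000

plt.loglog(x, x ** -0.5, label="power law, slope -1/2")
plt.loglog(x, x ** -1.0, label="power law, slope -1")
plt.loglog(x, np.exp(-x / 10.0), label="exponential")
plt.xlabel("size of events (log scale)")
plt.ylabel("cumulative frequency P(X > x)")
plt.legend()
plt.show()
```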
20. [Same plot, highlighting the exponential (data compression) case: all events are close in size.]
21. [Same plot, highlighting the power-law cases: WWW files with slope -1 and forest fires with slope -1/2.]
22. Progress
- Unified view of web and Internet protocols
  - A good place to start
  - Adds feedback and dynamics to communications
  - Observations: fat tails (Willinger, Paxson, Floyd, Crovella)
  - Theory: source coding and web layout (Doyle)
  - Theory: channel coding and congestion control (Low)
- Unified view of robustness and computation
  - Anecdotes from engineering and biology
  - New theory (especially Parrilo)
  - Not enough time today
23. [Same log-log plot, annotated "Robust" near the body of the distributions and "Yet fragile" in the heavy tails (slopes -1 and -1/2).]
24. Robustness of HOT systems
[Diagram: robustness plotted against uncertainties. HOT systems are robust to known and designed-for uncertainties, yet fragile to unknown or rare perturbations.]
25. Large-scale phenomena are extremely non-Gaussian
- The microscopic world is largely exponential
- The laboratory world is largely Gaussian, because of the central limit theorem
- Large-scale phenomena have heavy (fat) tails and (roughly) power laws
- The CLT gives Gaussians and power laws
- (Multiplicatively, it gives lognormals too.)
- Power laws are just exponentials in log(x), whereas lognormals are normal (Gaussian) in log(x), as the identities below make explicit
- Statistically, power laws are no big surprise
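To spell out that last step, these standard identities (my addition) show that a power-law tail is exponential in u = log x, while a lognormal is Gaussian in u:

```latex
P(X > x) = \left(\frac{x}{x_0}\right)^{-\alpha} = e^{-\alpha (u - u_0)},
\qquad u = \log x,\; u_0 = \log x_0,
```

whereas $X$ is lognormal exactly when $\log X \sim \mathcal{N}(\mu, \sigma^2)$.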
26. The HOT view of power laws
- Engineers design (and evolution selects) for systems with certain typical properties
- Optimized for average (mean) behavior
- Optimizing the mean often (but not always) yields high variance and heavy tails
- Power laws arise from heavy tails when there is enough aggregate data
- One symptom of "robust, yet fragile"
- Joint work with Jean Carlson, Physics, UCSB
27. Robustness of HOT systems
[Diagram: pushing down the mean makes the system more robust, but the other moments go up, making it more fragile.]
28. HOT and fat tails?
- Surprisingly good explanation of the statistics (given the severity of the abstraction)
- But statistics are of secondary importance
- Not mere curve fitting; the insights lead to new designs
- Understanding → design
29. Examples of HOT fat tails?
- Power outages
- Web/Internet file traffic
- Forest fires
- Commercial aviation delays/cancellations
- Disk files, CPU utilization, ...
- Deaths or dollars lost due to man-made or natural disasters?
- Financial market volatility?
- Ecosystem and species extinction events?
- Other mechanisms, examples?
30. Examples with additional mechanisms?
- Word rank (Zipf's law)
- Income and wealth of individuals and companies
- Citations, papers
- Social and professional networks
- City sizes
- Many others
- (Simon, Mandelbrot, ...)
31. Data
[Log-log plot: cumulative frequency vs. size for the data compression (DC) and WWW file data.]
32. Data + Model/Theory
[The same plot with the model/theory curves overlaid on the DC and WWW data.]
33. [Log-log plot of WWW file sizes in Mbytes (Crovella), decimated data, annotated: most files are small (mice), but most packets are in large files (elephants).]
34. [Diagram: sources sending traffic into the network as a mix of mice (many small flows) and elephants (few large flows).]
35. Router queues
[Diagram: mice and elephants share router queues; mice are delay sensitive, elephants are bandwidth sensitive.]
36. [Diagram: log(bandwidth) vs. log(delay); low-bandwidth, high-delay service is cheap, while high-bandwidth, low-delay service is expensive.]
- We'll focus to begin with on similar tradeoffs in internetworking between bandwidth and delay.
- We'll assume TCP (via retransmission) eliminates loss, and will return to this issue later.
37. [Diagram: log(bandwidth) vs. log(delay); bulk transfers (most packets) are bandwidth (BW) sensitive, while web navigation and voice (most files) are delay sensitive.]
- Mice: many small files of a few packets, which the user presumably wants ASAP
- Elephants: few large files of many packets, for which average bandwidth matters more than individual packet delay
- Most files are mice, but most packets are in elephants
- This is the manifestation of fat tails in the web and Internet
38. [Same bandwidth/delay diagram.]
- Claim I: Current traffic is dominated by these two types of flows.
- Claim II: They are an intrinsic feature of many future network applications.
- Claim III: They are ideal traffic for a properly controlled network (Low, Paganini, and Doyle).
39. Data
[Log-log plot repeated: DC and WWW data.]
40. Data + Model/Theory
[The same plot with the model/theory curves overlaid.]
41. [WWW file-size plot repeated: most files are small (mice); most packets are in large files (elephants).]
42. [Plot repeated: the exponential (data compression) case, where all events are close in size.]
43. Source coding for data compression
44. Shannon coding
- Ignore the value of information; consider only surprise
- Compress average codeword length (over stochastic ensembles of source words rather than actual files)
- Constraint on codewords: unique decodability
- Equivalent to building barriers in a zero-dimensional tree
- Optimal distribution (exponential) and optimal cost are $p_i = 2^{-l_i}$ and $\bar{l} = H(p) = -\sum_i p_i \log_2 p_i$
45. Web layout as generalized source coding
- Keep parts of the Shannon abstraction
- Minimize downloaded file size
- Averaged over an ensemble of user accesses
- Equivalent to building 0-dimensional barriers in a 1-dimensional tree of content
- Thanks (and apologies) to Michelle Effros for helpful discussions
46. A toy website model (1-d grid HOT design)
[Diagram: a document laid out on a 1-d grid.]
47. Optimize 0-dimensional cuts in a 1-dimensional document
[Diagram: cut points (links) split the document into files.]
48. Probability of user access
[Diagram: the access probability distribution over the document.]
49. [Diagram: the site split into pages: Homepage, Publications, Projects, People.]
50. [Diagram: the Homepage linking to lists of titles.]
51. [Diagram: a title list linking to an individual paper page (title plus abstract).]
52. [Diagram: the same layout, one level deeper.]
53. [Diagram: the full tree at once; Homepage → section pages (Publications, Projects, People) → title lists → abstracts → full papers.]
54. High resolution
[Diagram: the fully expanded site, down to a Pictures page.]
55. Probability of user access
- Navigate with feedback
- Limit clicks / constrain depth
- Minimize average file size
- File transfer sizes come out heavy-tailed
- File sizes are even more heavy-tailed
- Robust to the user access distribution (power law, exponential, Gaussian); a toy optimization sketch follows this list
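Here is the toy sketch referenced above. It is my illustration, under assumptions the slides do not state: a 1-d document with Zipf-like per-unit access probabilities is split into k files by dynamic programming so as to minimize the expected downloaded size. The function `optimal_cuts`, the Zipf weights, and k = 12 are all hypothetical choices.

```python
# Hypothetical toy model: split a 1-d "document" of N units, with
# per-unit access probabilities p, into k files by choosing cut points.
# A request for unit i downloads the whole file containing i, so we
# minimize  sum over files f of  P(f) * len(f)  by dynamic programming.
import numpy as np

def optimal_cuts(p, k):
    """Minimize expected downloaded size when splitting units into k files."""
    N = len(p)
    P = np.concatenate([[0.0], np.cumsum(p)])      # prefix sums of p
    cost = lambda i, j: (P[j] - P[i]) * (j - i)    # P(file) * len(file)
    INF = float("inf")
    best = np.full((k + 1, N + 1), INF)
    arg = np.zeros((k + 1, N + 1), dtype=int)
    best[0, 0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, N + 1):
            for i in range(m - 1, j):
                c = best[m - 1, i] + cost(i, j)
                if c < best[m, j]:
                    best[m, j], arg[m, j] = c, i
    cuts, j = [], N
    for m in range(k, 0, -1):                      # recover the cut points
        cuts.append((arg[m, j], j))
        j = arg[m, j]
    return best[k, N], sorted(cuts)

# Zipf-like access probabilities over 200 units, split into 12 files.
p = 1.0 / np.arange(1, 201)
p /= p.sum()
J, cuts = optimal_cuts(p, 12)
print("expected download:", round(float(J), 3))
print("file sizes:", [j - i for i, j in cuts])     # small up front, huge in tail
```

Running it shows the optimum puts popular content into many small files and relegates the rare tail to a few huge ones: exactly the mice/elephants mix described earlier.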
56. Probability of user access
[Diagram: a layout mismatched to the access distribution is wasteful.]
57. Probability of user access
[Diagram: another mismatched layout is hard to navigate.]
58. More complete website models (Zhu, Yu)
- Detailed models of
  - user behavior
  - content and hyperlinks
- Necessary for real web layout optimization
- Statistics consistent with the simpler models
- Improved protocol design (TCP)
- Commercial implications still unclear
59. More complete website models (Zhu, Yu)
Is there a simpler abstraction that captures the essence of this problem, and is consistent with more believable models?
60. Generalized coding problems
- Data compression: minimize average file transfer; no feedback; discrete (0-d) topology
- Web: minimize average file transfer; feedback; 1-d topology
61. Feedback
- The user navigates the web with feedback: click, download, look, repeat
- Data compression, by contrast, involves no feedback loop
62. Topology
- Data compression: discrete topology; the source is unordered
- Web: content determines topology; information is connected
- Trees are 1-d; hyperlinking makes d < 1
63. Generalized coding problems
- Optimizing (d-1)-dimensional cuts in d-dimensional spaces
- To minimize the average size of files
- Models of greatly varying detail all give a consistent story
- Power laws have exponent α ∝ 1/d (data compression and web as special cases)
64. Source coding for data compression
65. Shannon coding (recap)
- Ignore the value of information; consider only surprise
- Compress average codeword length (over stochastic ensembles of source words rather than actual files)
- Constraint on codewords: unique decodability
- Equivalent to building barriers in a zero-dimensional tree
- Optimal distribution (exponential) and optimal cost are $p_i = 2^{-l_i}$ and $\bar{l} = H(p) = -\sum_i p_i \log_2 p_i$
66. Shannon source coding
Minimize the expected length $\sum_i p_i l_i$ subject to Kraft's inequality $\sum_i 2^{-l_i} \le 1$.
67. Codewords
[Diagram: binary prefix-code tree with codewords 0, 100, 101, 110, 11100, 11101, 11110, 11111 at the leaves and internal nodes 1, 10, 11, 111, 1110, 1111.]
Kraft's inequality ⟺ prefix-less (prefix-free) code.
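A quick sanity check (my addition) that the codewords on this slide satisfy Kraft's inequality with equality and form a prefix-free code:

```python
# Verify Kraft's inequality and prefix-freeness for the slide's codewords.
codewords = ["0", "100", "101", "110", "11100", "11101", "11110", "11111"]

kraft = sum(2.0 ** -len(c) for c in codewords)
print("Kraft sum:", kraft)                       # 1.0: the inequality is tight

prefix_free = not any(
    a != b and b.startswith(a) for a in codewords for b in codewords
)
print("prefix-free:", prefix_free)               # True
```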
68. Codewords
[The same prefix-code tree, labeled as a 0-dimensional (discrete) tree.]
69. Coding = building barriers
[Diagram: source coding builds barriers in the code tree (Kraft's inequality, prefix-less code); channel coding builds barriers against channel noise.]
70. Control = building barriers
71. Minimize $\sum_i p_i l_i$ subject to $\sum_i 2^{-l_i} \le 1$.
This leads to the optimal codeword lengths $l_i = -\log_2 p_i$, with optimal cost $\bar{l} = H(p) = -\sum_i p_i \log_2 p_i$.
Equivalent to optimal barriers on a discrete (zero-dimensional) tree.
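For concreteness, a minimal Huffman coder (my sketch, not the talk's construction) showing that optimal integer codeword lengths land within one bit of the entropy bound; the probabilities are arbitrary:

```python
# Minimal Huffman coder: integer codeword lengths come within one bit
# of the entropy bound H(p) = -sum p_i log2 p_i.
import heapq
import math

def huffman_lengths(probs):
    """Return Huffman codeword lengths for each symbol probability."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:            # every merge adds one bit to members
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

p = [0.4, 0.2, 0.2, 0.1, 0.05, 0.05]
L = huffman_lengths(p)
avg = sum(pi * li for pi, li in zip(p, L))
H = -sum(pi * math.log2(pi) for pi in p)
print("lengths:", L)
print("average length:", avg, " entropy:", round(H, 4))   # H <= avg < H + 1
```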
72.
- Compressed files look like white noise
- Compression improves robustness to limitations in bandwidth and memory resources
- Compression makes everything else much more fragile (see the small demo after this list):
  - loss or errors in the compressed file
  - statistics of the source file
- Information theory also addresses these issues, at the expense of (much) greater complexity
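As referenced above, a tiny demo (my addition) of that fragility: flip a single bit in a zlib-compressed stream and decompression fails outright, whereas the same flip in the raw text would damage only one character.

```python
# One bit error in a compressed stream breaks the whole file.
import zlib

raw = b"robust yet fragile " * 200
comp = bytearray(zlib.compress(raw))
comp[len(comp) // 2] ^= 0x01            # single bit flip mid-stream
try:
    zlib.decompress(bytes(comp))
except zlib.error as e:
    print("decompression failed:", e)   # inflate error or checksum mismatch
```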
73. To compare with data.
74. To compare with data.
76. Data
[Log-log plot of the DC (codeword) data. How well does the model predict the data?]
77. Data + Model
[The model overlaid on the DC data fits closely. Not surprising, because the file was compressed using Shannon theory; the small discrepancy is due to integer codeword lengths.]
78. Why is this a good model?
- Lots of models will reproduce an exponential distribution, so the fit to data is nothing special
- Many popular models are just fancy random number generators
- Shannon source coding lets us systematically produce optimal and easily decodable compressed files
- Fitting the data is necessary but far from sufficient for a good model
- A good theory says why a different distribution is intrinsically bad, and how to fix the design
79. Generalized coding problems
- Data compression: minimize average file transfer; no feedback; discrete (0-d) topology
- Web: minimize average file transfer; feedback; 1-d topology
80. PLR optimization
Minimize the expected loss $J = \sum_i p_i l_i$ subject to the resource constraint $\sum_i r_i \le R$.
81. [Diagram: a 1-d document; r = density of links or files, l = size of files.]
82. d-dimensional case
- $l_i$ = volume enclosed
- $r_i$ = barrier density
- $p_i$ = probability of event
- Resource/loss relationship: $l_i \propto r_i^{-\beta}$ with $\beta = d$ (a d-dimensional region of volume $l$ is enclosed by (d-1)-dimensional barriers of density $r \sim l^{-1/d}$)
83. PLR optimization
Minimize $\sum_i p_i l_i$ with $l_i = r_i^{-\beta}$ and $\sum_i r_i \le R$:
- β = 0: data compression
- β = 1: web layout
- β = dimension d
84. PLR optimization
- β → 0: data compression (in this limit the loss becomes $l_i = \log(1/r_i)$)
- β → 0 is Shannon source coding
85. Minimize the average cost $\sum_i p_i l_i = \sum_i p_i r_i^{-\beta}$ subject to $\sum_i r_i \le R$, using standard Lagrange multipliers.
This leads to the optimal resource allocations $r_i \propto p_i^{1/(1+\beta)}$ and the relationship $p_i \propto l_i^{-(1+1/\beta)}$ between the event probabilities and sizes, with optimal cost $J = R^{-\beta}\big(\sum_i p_i^{1/(1+\beta)}\big)^{1+\beta}$.
86. [Repeat of the previous slide: the Lagrange solution for the resource allocations, the probability/size relationship, and the optimal cost.]
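A numeric sketch (my addition) of this Lagrange solution: compute the closed-form allocation for β = 2 on random event probabilities, confirm the stated optimal cost, and check that feasible perturbations never do better.

```python
# PLR sketch: loss l_i = r_i**(-beta), resources sum to R.
# Closed-form optimum: r_i = R * p_i**(1/(1+beta)) / sum_j p_j**(1/(1+beta)),
# giving p_i ~ l_i**-(1 + 1/beta): a power law with cumulative exponent 1/beta.
import numpy as np

rng = np.random.default_rng(0)
beta, R = 2.0, 1.0                     # beta = d = 2, e.g. forest fires
p = rng.dirichlet(np.ones(1000))       # arbitrary event probabilities

w = p ** (1.0 / (1.0 + beta))
r = R * w / w.sum()                    # optimal resource allocation
J = (p * r ** -beta).sum()             # achieved expected loss
print("optimal cost:", J)
print("closed form :", w.sum() ** (1.0 + beta) / R ** beta)

# Feasible perturbations (same total resource) never do better,
# since the objective is convex in r:
for _ in range(5):
    d = rng.normal(size=r.size)
    r2 = np.clip(r + 1e-4 * d, 1e-12, None)
    r2 *= R / r2.sum()
    assert (p * r2 ** -beta).sum() >= J * (1.0 - 1e-9)
print("random perturbations confirm optimality")
```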
87. To compare with data.
88. To compare with data.
90. Data
[Log-log plot: DC and WWW data.]
91. Data + Model/Theory
[The model/theory curves overlaid on the DC and WWW data.]
92. Why is this a good model?
- Lots of models will reproduce a heavy-tailed distribution, so the fit to data is nothing special
- Many popular models are just fancy random number generators
- A good theory says why a different distribution is intrinsically bad, and how to fix the design
- A good theory explains variations that are seen in the statistics
- May not be much different from the d = 0 case, just less familiar
- One model includes both data compression and web layout (though the latter very abstractly)
- The HOT web layout model captures the geometry of web browsing, not just the statistics
- Multiple models of varying complexity and fidelity all give consistent statistics
93. Data + Model/Theory
[WWW plot. Are individual websites distributed like this? Roughly, yes.]
94. Data + Model/Theory
[DC and WWW plot. How has the data changed since 1995?]
95. More complete website models (Zhu, Yu)
- More complex hyperlink structure leads to steeper distributions with 1 < α < 2
- Optimize file sizes within a fixed topology:
  - Tree: α → 1
  - Random graph: α → 2
- No analytic solutions
96. Typical web traffic
[Diagram: heavy-tailed web traffic, $p \propto s^{-\alpha}$ with α > 1.0 on a log(freq > size) vs. log(file size) plot, is streamed out on the net by web servers, creating fractal Gaussian Internet traffic (Willinger, ...).]
97. Fat-tailed web traffic
[Diagram: fat-tailed transfers, streamed onto the Internet over time, create long-range correlations in the aggregate traffic.]
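A small simulation sketch (my illustration, not the authors' code) of this mechanism: superposing many ON/OFF sources whose ON periods are Pareto with tail index α in (1, 2) yields aggregate traffic whose variance decays anomalously slowly under time aggregation, the signature of long-range dependence. The classical result of Taqqu, Willinger, and Sherman gives Hurst parameter H = (3 − α)/2; all constants below are illustrative.

```python
# Superpose heavy-tailed ON/OFF sources and check the variance-time decay.
import numpy as np

rng = np.random.default_rng(1)
alpha, T, n_sources = 1.5, 20000, 50

def on_off_trace(T):
    """0/1 activity trace: Pareto ON periods, exponential OFF periods."""
    x, t = np.zeros(T), 0
    while t < T:
        on = int(rng.pareto(alpha) + 1)               # heavy-tailed transfer
        x[t:t + on] = 1.0
        t += on + int(rng.exponential(10.0)) + 1      # idle gap
    return x

traffic = sum(on_off_trace(T) for _ in range(n_sources))

# Crude variance-time diagnostic: Var of the m-aggregated series ~ m**(2H - 2).
for m in (1, 10, 100, 1000):
    v = traffic[: T // m * m].reshape(-1, m).mean(axis=1).var()
    print(f"m={m:5d}  var={v:.4f}")    # decays slower than 1/m when H > 1/2
```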
98. The broader Shannon abstraction
- Information = surprise, and therefore ignoring:
  - the value or timeliness of information
  - the topology of information
- Separate source and channel coding:
  - data compression
  - error-correcting codes (expansion)
- Eliminate time and space
- Stochastic relaxation (ensembles)
- Asymptopia
- Brilliantly elegant and applicable, but brittle?
- A better departure point than Kolmogorov et al.?
99. Thinking about the Internet like Shannon might?
- Perhaps existing information theory results don't apply (beyond network coding)
- But much of the style and inspiration might still be relevant
- Maybe: don't apply Shannon, but instead try to approach this like Shannon might
- Just because this messy, clunky presentation may miss the mark, don't give up thinking about the Internet as a generalized coding problem
100. What can we keep? What must we change?
Keep:
- Separation
  - Source and channel
  - Congestion control and error correction
  - Estimation and control
- Tractable relaxations
  - Stochastic embeddings
  - Convex relaxations
Change:
- Add to information:
  - Value
  - Time and dynamics
  - Topology
- Feedback!!!
- More subtle treatment of computational complexity
  - Naïve formulations are intractable
101. [Diagram repeated: sources send mice and elephants into the network.]
102. Router queues
[Diagram repeated: delay-sensitive mice and bandwidth-sensitive elephants share router queues.]
103. Router queues
[Same diagram, with control added between the sources and the queues.]
104. [Bandwidth/delay diagram: bulk transfers (most packets) are BW sensitive; web navigation and voice (most files) are delay sensitive.]
Claim (channel): We can tweak TCP using ECN and REM to make these flows co-exist very nicely (a sketch of the REM rule follows this list).
- Specifically:
  - Mice/elephants are ideal traffic
  - Keep queues empty (ECN/REM)
  - BW slightly improved (less packet loss)
  - Delay greatly improved (less queuing)
  - Provision the network for BW
  - Free QoS for delay
  - The network level stays simple
Currently, delays are aggravated by queuing delay and packet drops from congestion caused by BW traffic.
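As referenced above, here is a hedged sketch of the REM (random exponential marking) rule as described by Athuraliya and Low: each link maintains a price driven by backlog and rate mismatch, and marks packets with probability 1 − φ^(−price). The constants and the open-loop traffic model are my illustrative choices; a real deployment closes the loop through TCP sources reacting to the marks.

```python
# Sketch of REM price and marking updates at a single link.
import numpy as np

rng = np.random.default_rng(2)
gamma, alpha, phi, capacity = 0.001, 0.1, 1.1, 100.0
price, backlog = 0.0, 0.0

for t in range(2000):
    arrivals = rng.poisson(110)                   # offered load > capacity
    backlog = max(0.0, backlog + arrivals - capacity)
    price = max(0.0, price + gamma * (alpha * backlog
                                      + arrivals - capacity))
    mark_prob = 1.0 - phi ** (-price)             # ECN marking probability
    if t % 500 == 0:
        print(f"t={t:4d} backlog={backlog:8.1f} mark_prob={mark_prob:.3f}")

# As the price rises, marking tells TCP sources to slow down, which
# (in the closed loop, not modeled here) keeps queues nearly empty.
```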
105. [Same bandwidth/delay diagram: the expensive corner is high bandwidth at low delay.]
- The rare traffic that can't or won't be coded this way will be expensive, and will essentially pay for the rest.
Claim (source): Many (future) applications are naturally and intrinsically coded into exactly this kind of fat-tailed traffic. It will get more extreme (not less).
106. Fat-tailed traffic is intrinsic
- Two types of application traffic are important: communications and control
- Communication to and/or from humans (from the web to virtual reality)
- Sensing and/or control of dynamical systems
- Claim: both can be naturally coded into fat-tailed BW + delay traffic
- This claim needs more research
107. Abstraction
[Bandwidth/delay diagram: BW-sensitive and delay-sensitive traffic, with the expensive corner excluded.]
- Separate source and channel coding
- The source is coded into:
  - delay-sensitive mice
  - bandwidth-sensitive elephants
- Channel coding = congestion control
- Sources (applications) love mice/elephants
- The channel (controlled network) loves mice/elephants
- The best of all possible worlds
108. A HOT forest fire abstraction
Fire suppression mechanisms must stop a 1-d front. Optimal strategies must trade off resources against risk.
109. Generalized coding problems
- Optimizing (d-1)-dimensional cuts in d-dimensional spaces
- To minimize the average size of files or fires, subject to a resource constraint
- Models of greatly varying detail (data compression, web) all give a consistent story
110. Theory
- d = 0: data compression
- d = 1: web layout
- d = 2: forest fires
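Reconstructed from the PLR solution on slides 85-86 (a derivation, not new data), the cumulative event-size distributions these three cases predict are:

```latex
P(L > l) \;\propto\; l^{-1/d}:
\qquad
\begin{cases}
d \to 0 & \text{exponential tail (data compression)},\\
d = 1 & \text{log-log slope } -1 \text{ (web files)},\\
d = 2 & \text{log-log slope } -1/2 \text{ (forest fires)}.
\end{cases}
```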
111. Data
[Log-log plot: DC, WWW, and forest fire (FF) data.]
112. Data + Model/Theory
[The d = 0, 1, 2 theory curves overlaid on the DC, WWW, and FF data.]
113. Forest fires?
Fire suppression mechanisms must stop a 1-d front.
114. Forest fires?
Geography could make d < 2.
115. California geography: further irresponsible speculation
- Rugged terrain, mountains, deserts
- Fractal dimension d ≈ 1?
- Dry Santa Ana winds drive large (≈ 1-d) fires
116. Data + HOT Model/Theory
[Plot: the national forest fire (FF) data fit d = 2; the California brushfire data are steeper, consistent with d ≈ 1.]
117. Thank you for your indulgence
118. Data + HOT + SOC
[Plot comparing the data with both the HOT and SOC (self-organized criticality) predictions.]
119. Critical/SOC exponents are way off
- Data: α > .5
- SOC: α < .15
120. Cumulative distributions
[Plot: the SOC prediction, α ≈ .15, is far shallower than the data.]
121. [Reference: "Forest Fires: An Example of Self-Organized Critical Behavior," Bruce D. Malamud, Gleb Morein, Donald L. Turcotte, 18 Sep 1998; 4 data sets.]
122. [Log-log plot: the HOT FF d = 2 prediction overlaid on an additional 3 data sets.]
124. Fires are compact regions of nontrivial area.
[Maps: fires 1930-1990 and fires 1991-1995.]
125. SOC and HOT have very different power laws.
[Plot: HOT and SOC exponents compared, each marked at d = 1.]
- HOT: α decreases with dimension
- SOC: α increases with dimension
126.
- HOT yields compact events of nontrivial size
- SOC has infinitesimal, fractal events
[Diagram: event size distributions; HOT events are large, SOC events infinitesimal.]
127. SOC and HOT are extremely different.
[Comparison diagram: HOT vs. SOC.]
128. SOC and HOT are extremely different.
[The same comparison, continued.]