Title: Robust Distributed Systems: an inference based approach
1 Robust Distributed Systems: an inference-based approach
Willem de Bruijn
Vrije Universiteit Amsterdam
wdb_at_few.vu.nl
2 Problem in distributed systems: too much complexity
Solution: minimize human intervention, which is costly, error-prone, and slow
3 Direction
Simplify interaction with the environment
Example tasks:
  patch all hosts running daemon X
  keep all daemons patched to their latest version
  keep all daemons in the system operating correctly
  plot a map of the current network
  set up a cross-institutional grid
  plug a network flood
  I want to print Y at the nearest printer
  I want to watch program Z at home tonight
  I want to schedule a meeting for our group next week
4 Challenge, Method, Architecture, Implementation, Case Studies, Conclusions
5 Method
Bottom-up: move from low-level operations towards higher-level tasks
Inference-based: construct tasks from formal templates
Adaptive: optimize code using runtime system state
6 Architecture
7 Reasoner Architecture
constraint solver: creates tasks from templates and state
conductor: communicates with fabric, peers and users
9 Task Description
basic operations
10 Task Description
composite tasks
A = task(...), B = task(...), C = task(...),
D = task(seq(A,B,C),_), task(par(D,D,A),_)
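The seq and par composition on this slide can be illustrated with a small Python sketch. It rests on assumptions: the names Task, Seq, Par, and run are hypothetical and do not come from the system presented here, and par children are executed one after another to keep the example deterministic, whereas a real implementation would presumably run them concurrently.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Task:
    """A basic operation: a name plus a callable that does the work (hypothetical)."""
    name: str
    action: Callable[[], str]


@dataclass
class Seq:
    """Children run strictly in order, as in seq(A,B,C)."""
    children: List["Node"]


@dataclass
class Par:
    """Children are independent, as in par(D,D,A); run one by one in this sketch."""
    children: List["Node"]


Node = Union[Task, Seq, Par]


def run(node: Node) -> List[str]:
    """Execute a task tree and return the basic operations' results in execution order."""
    if isinstance(node, Task):
        return [node.action()]
    results: List[str] = []
    for child in node.children:
        results.extend(run(child))
    return results


a = Task("A", lambda: "a done")
b = Task("B", lambda: "b done")
c = Task("C", lambda: "c done")
d = Seq([a, b, c])    # corresponds to the composite D on the slide
top = Par([d, d, a])  # corresponds to the final par(...) task on the slide
```

Calling run(top) walks the tree and executes seven basic operations: D twice (three each) and A once. Representing composites as plain data keeps the description separate from the execution strategy.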
11 Task Description
extensions
recurrent tasks
event handling
12 Task Adaptation
13 Template-driven Adaptation
14 Model-driven Adaptation
dot implements a plotter
graph implements a plotter
dot is available at nodeX
dot produces compressed SVG
dot produces PNG
dot accepts x/dotsrc files
nodeX has Y free cycles
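One way to picture how such statements could drive adaptation is to store them as triples and query them. The sketch below is an illustration only: the triple encoding, the relation names, and the functions query and plotters_for are assumptions, not the modeling language used by the system.

```python
from typing import List, Optional, Set, Tuple

Fact = Tuple[str, str, str]

# The statements from the slide as (subject, relation, object) triples.
# "nodeX has Y free cycles" is left out because Y is not given.
FACTS: Set[Fact] = {
    ("dot", "implements", "plotter"),
    ("graph", "implements", "plotter"),
    ("dot", "available_at", "nodeX"),
    ("dot", "produces", "compressed SVG"),
    ("dot", "produces", "PNG"),
    ("dot", "accepts", "x/dotsrc"),
}


def query(subject: Optional[str], relation: str, obj: Optional[str]) -> List[Fact]:
    """Return all facts matching the pattern; None acts as a wildcard."""
    return sorted(
        f for f in FACTS
        if f[1] == relation
        and (subject is None or f[0] == subject)
        and (obj is None or f[2] == obj)
    )


def plotters_for(output: str, node: str) -> List[str]:
    """Programs that implement a plotter, produce `output`, and are available at `node`."""
    candidates = [s for s, _, _ in query(None, "implements", "plotter")]
    return [
        p for p in candidates
        if query(p, "produces", output) and query(p, "available_at", node)
    ]
```

With the facts above, plotters_for("PNG", "nodeX") selects dot, because graph has no recorded output format or location.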
16 Case Studies
location-aware printing service
distributed make tool
adaptive webserver
distributed system monitor
distributed system controller
resource-aware job scheduler
17 Case Studies: a self-organising distributed webserver
location(dot,Host) :- available(dot,Host),
    free(Host,FreePct), FreePct > 10.
18 Case Studies: a self-organising distributed webserver
19 Case Studies: OpenPBS Cluster Control
background_task(pbslog_read,300)
event(node_down)
task(svc_restart)
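The three statements suggest a pattern in which a recurring background task reads a log, raises an event, and the event triggers a task. The sketch below shows that pattern in Python under several assumptions: the second argument of background_task is taken to be a period in seconds, the "<node> down" log line format is invented and is not the real OpenPBS log format, and svc_restart only records the node instead of restarting anything.

```python
from typing import Callable, Dict, List

handlers: Dict[str, List[Callable[[str], None]]] = {}
restarted: List[str] = []


def on(event: str, handler: Callable[[str], None]) -> None:
    """Register a handler to be called whenever `event` is emitted."""
    handlers.setdefault(event, []).append(handler)


def emit(event: str, node: str) -> None:
    """Call every handler registered for `event`."""
    for handler in handlers.get(event, []):
        handler(node)


def pbslog_read(log_lines: List[str]) -> None:
    """Scan log lines and emit node_down for each line of the assumed form '<node> down'."""
    for line in log_lines:
        parts = line.split()
        if len(parts) == 2 and parts[1] == "down":
            emit("node_down", parts[0])


def svc_restart(node: str) -> None:
    """Stand-in for the restart task; it only records which node it was called for."""
    restarted.append(node)


on("node_down", svc_restart)

# One polling round; a scheduler would repeat this at the configured period.
pbslog_read(["node07 down", "node08 up"])
```

After this single round, restarted contains only node07, since the second line does not match the assumed failure format.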
21 Unique Features
practical: generic solution for existing systems
reusable: open, integrated modeling language
dependable: builds on proven technology
adaptive: generates solutions based on runtime state
22 Shortcomings and Future Work
23 Concluding Remarks
practical, general-purpose automation support for existing hard- and software
simplifies interaction with the environment
case studies seem promising, but implementation needs work
more info: www.few.vu.nl/wdb/betagis
24 A Semantic Knowledge Plane
25 Case Studies: ubiquitous printing service