Title: Cost Aware Resource Management for Decentralized Network Services
- Venugopalan Ramasubramanian (Rama)
- Microsoft Research Silicon Valley / Cornell University
2. Introduction
- decentralized services have become increasingly important
- e.g. name systems, CDNs, publish-subscribe
- they require low latency, constant availability, and high scalability
- current services often fall short of the required performance
- they rely on ad hoc techniques
3. Problems with Ad hoc Techniques
- no performance guarantees
- unable to quantify/bound performance
- unable to tune resource utilization to meet performance targets
- tailored to specific workloads
- e.g. opportunistic caching based on the 90/10 rule
- heavy-tailed popularity distributions
- mutable objects
4. Principled Approach
- fundamental cost-performance tradeoff
- e.g. lookup latency vs. memory / bandwidth consumption
- resource allocation problem
- which node hosts which object?
- depends on popularity, size, update rate, etc.
5. Prior Work
- Scalability
- high complexity even to express the problem
- number of objects x number of nodes (M x N)
- Decentralization
- objects are distributed among multiple nodes
- expensive to perform resource allocation centrally
6. Cost-Aware Resource Management Framework
- high performance, robust, and scalable services
- Mathematical Optimization
- system-wide performance goals become constraints to optimization problems
- Min. cost s.t. performance meets target
- Max. performance s.t. cost stays within a limit
- Structured Overlays
- decentralization and self-organization
- well-defined topology with bounded diameter and node degree
7. Decentralized Internet Services
- name service for the Internet
- Cooperative Domain Name System (CoDoNS)
- content distribution network
- Cooperative Beehive Web (CoBWeb)
- on-line data monitoring
- Cornell On-line News Aggregator (CorONA)
8. Scalable Resource Allocation
- structured overlay
- each object has a home node
- DAG rooted at the home node reaching all nodes
- uniform branching factor
- allocate resources at well-defined levels
- level l means all nodes l hops away from the home node
- low-complexity resource allocation
- number of objects × diameter (e.g. M × log N)
- practical and scalable
9. Structured Overlays: Pastry
- prefix matching: $\log_b N$ hops
10. Opportunistic Caching in Pastry
11. Structured Resource Allocation
- analytically model the performance-overhead tradeoff
- object replicated at all nodes with l matching prefix digits
- lookup latency: l hops
- replicas: $N/b^l$
- inexpensive to locate and update replicas
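The $N/b^l$ replica count can be checked with a small sketch. It assumes a fully populated base-b ID space (every possible ID is a live node), which real Pastry deployments are not; `shared_prefix_len`, `replica_holders`, and the object ID are hypothetical.

```python
from itertools import product


def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading digits two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def replica_holders(object_id: str, node_ids: list, level: int) -> list:
    """Nodes sharing at least `level` prefix digits with the object ID.

    Prefix routing fixes at least one more digit per hop, so a query
    reaches one of these nodes within `level` hops.
    """
    return [n for n in node_ids if shared_prefix_len(n, object_id) >= level]


# Hypothetical, fully populated ID space: base b = 3, 4-digit IDs, N = 81.
b, digits = 3, 4
nodes = ["".join(p) for p in product("012", repeat=digits)]
obj = "2012"

for level in range(digits + 1):
    holders = replica_holders(obj, nodes, level)
    # Measured replica count next to N / b**level.
    print(level, len(holders), len(nodes) // b**level)
```

Each printed row shows the measured holder count matching $N/b^l$, from all 81 nodes at level 0 down to a single node at level 4.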
12. Outline
- Introduction
- Honeycomb Framework
- Optimization Analysis
- Implementation
- Applications
- Evaluation
- Conclusions
13. Analytical Modeling
- level of allocation (l)
- object hosted at all nodes l hops from the home node
- optimization problem: find optimal values of $l_i$
- $\min \sum_i C_i(l_i)$ s.t. $\sum_i P_i(l_i) \le T$
- $\max \sum_i P_i(l_i)$ s.t. $\sum_i C_i(l_i) \le T$
- performance variables
- lookup latency, update latency
- cost variables
- memory consumption, network overhead, number of nodes
14. Optimization Problem: Lookup Latency
- $\min \sum_i c_i \, b^{l_i}$ s.t. $\sum_i q_i (D - l_i) \le T_L$
- objective: total overhead; constraint: average lookup latency
- $T_L$: target lookup latency in hops
- $q_i$: relative query frequency of object i
- $c_i$: replication cost of object i
- M objects, N nodes, branching factor b, diameter D
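To make the two sums concrete, this sketch evaluates the objective and the constraint for one candidate allocation. The function names and all numbers (b, D, $q_i$, $c_i$, levels) are hypothetical; only the formulas come from the slide.

```python
def total_overhead(c, levels, b):
    """Objective: sum_i c_i * b**l_i (replica count grows by b per level)."""
    return sum(ci * b**li for ci, li in zip(c, levels))


def avg_lookup_latency(q, levels, D):
    """Constraint term: sum_i q_i * (D - l_i), with q normalized to sum to 1."""
    return sum(qi * (D - li) for qi, li in zip(q, levels))


b, D = 16, 4             # hypothetical overlay: 16**4 = 65536 nodes
q = [0.5, 0.3, 0.2]      # hypothetical relative query frequencies
c = [1, 1, 1]            # cost metric: number of nodes (c_i = 1)
levels = [3, 2, 0]       # the most popular object is replicated farthest

print(total_overhead(c, levels, b))      # 16**3 + 16**2 + 16**0
print(avg_lookup_latency(q, levels, D))  # 0.5*1 + 0.3*2 + 0.2*4 hops
```

Raising any $l_i$ by one multiplies that object's cost term by b while saving $q_i$ hops of average latency, which is the tradeoff the optimization balances.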
15. Resource Allocation for Lookup Performance
- target average lookup latency (in hops)
- sub-one-hop, fractional values (e.g., 0.5 hops)
- indirectly specifies the cache hit ratio
- worst-case lookup latency
- lower bound on l
- optimizes multiple overhead metrics
- number of nodes: $c_i = 1$
- memory: $c_i$ = size of object
- bandwidth: $c_i$ = size × update rate
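A minimal sketch of how one optimization serves several overhead metrics by swapping the cost coefficient $c_i$, and how a worst-case latency bound W turns into a lower bound on l (latency $D - l \le W$ gives $l \ge D - W$). `ObjectInfo`, its fields, and the metric names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ObjectInfo:
    """Hypothetical per-object attributes a node might track."""
    size_bytes: int
    updates_per_sec: float


def cost_coefficient(obj: ObjectInfo, metric: str) -> float:
    """Per-replica cost c_i for the overhead metric being minimized."""
    if metric == "nodes":
        return 1.0                                   # count replicas only
    if metric == "memory":
        return float(obj.size_bytes)                 # bytes stored per replica
    if metric == "bandwidth":
        return obj.size_bytes * obj.updates_per_sec  # update traffic per replica
    raise ValueError(f"unknown metric: {metric}")


def min_level(D: int, worst_case_hops: int) -> int:
    """Lookup latency is D - l, so D - l <= W requires l >= D - W."""
    return max(0, D - worst_case_hops)


page = ObjectInfo(size_bytes=20_000, updates_per_sec=0.01)
for metric in ("nodes", "memory", "bandwidth"):
    print(metric, cost_coefficient(page, metric))
print(min_level(D=4, worst_case_hops=2))
```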
16. Analytical Optimization (Beehive)
- Zipf popularity distribution (e.g. DNS, Web, RSS)
- analytically tractable (one parameter, α)
- closed-form solution
- inexpensive to compute and apply
Ramasubramanian and Sirer NSDI 04
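For reference, a Zipf distribution gives the i-th most popular object a query frequency proportional to $1/i^\alpha$; this sketch produces the normalized $q_i$ used in the lookup-latency formulation. It does not reproduce Beehive's closed-form solution, and the values of M and α are illustrative.

```python
def zipf_frequencies(M: int, alpha: float) -> list:
    """Relative query frequency q_i of the i-th most popular object,
    proportional to 1 / i**alpha and normalized to sum to 1."""
    weights = [1.0 / i**alpha for i in range(1, M + 1)]
    total = sum(weights)
    return [w / total for w in weights]


q = zipf_frequencies(M=10_000, alpha=0.9)  # alpha = 0.9 is illustrative
top_share = sum(q[:100])                   # share of queries to the top 1%
print(f"top 1% of objects draw {top_share:.0%} of queries")
```

The single parameter α controls how skewed popularity is, which is what keeps the analytical model tractable.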
17. Numerical Optimization
- general-purpose approach
- any popularity distribution (including Zipf)
- many cost metrics (e.g. fine-grained bandwidth consumption)
- many performance metrics (e.g. update latency)
- the optimization problem is NP-hard
- multiple-choice knapsack problem
- discrete, convex, and separable
- fast and accurate approximation algorithm
- O(M D log(M D)) running time
- at most one object per node more or less than the optimum
18. Numerical Optimization 2
- Lagrange multiplier
- $\min \sum_m C(l_m) + \lambda \left( \sum_m P(l_m) - T \right)$
- bisection-based bracketing algorithm
- upper and lower bound solutions that differ in one channel yield a near-optimal solution
- pre-computing and sorting the values of λ before iterating yields an O(MD log(MD)) algorithm
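The Lagrangian idea can be illustrated with a simple sketch: for a fixed λ the relaxed problem is separable, so each object independently picks the level minimizing $c_i b^{l} + \lambda q_i (D - l)$, and bisection on λ finds the smallest multiplier that meets the latency target. This is plain bisection on λ, not the pre-sorted O(MD log(MD)) bracketing algorithm from the slide; function names and inputs are hypothetical.

```python
def best_level(ci, qi, b, D, lam):
    """Level minimizing c_i*b**l + lam*q_i*(D - l); ties go to the lowest level."""
    return min(range(D + 1), key=lambda l: ci * b**l + lam * qi * (D - l))


def allocate(c, q, b, D, lam):
    # The relaxed problem is separable: each object picks its level alone.
    return [best_level(ci, qi, b, D, lam) for ci, qi in zip(c, q)]


def avg_latency(q, levels, D):
    return sum(qi * (D - li) for qi, li in zip(q, levels))


def solve(c, q, b, D, target, iters=60):
    """Bisect on lambda. A larger lambda weights latency more, so chosen
    levels never decrease as lambda grows; invariant: `hi` is feasible."""
    if target < 0:
        raise ValueError("target latency must be non-negative")
    lo, hi = 0.0, 1.0
    while avg_latency(q, allocate(c, q, b, D, hi), D) > target:
        lo, hi = hi, hi * 2  # widen the bracket until the target is met
    for _ in range(iters):
        mid = (lo + hi) / 2
        if avg_latency(q, allocate(c, q, b, D, mid), D) > target:
            lo = mid
        else:
            hi = mid
    return allocate(c, q, b, D, hi)


q = [0.4, 0.3, 0.2, 0.1]  # hypothetical relative query frequencies
c = [1.0] * 4             # cost metric: number of replica nodes
levels = solve(c, q, b=16, D=4, target=2.0)
print(levels, avg_latency(q, levels, 4))
```

Because the levels are discrete, the allocation at the final λ can overshoot the target; that gap is what bracketing between upper and lower bound solutions is meant to close.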
19. Honeycomb
- cost-aware resource allocation framework for structured overlays
- properties
- system-wide performance goals
- scalability and failure resilience
- quick adaptation to workload
- fast update propagation
20. Scalable Resource Management
- independent decisions
- local aggregation
- estimate popularity
- communication only with overlay neighbors
- replicas managed by one-hop neighbors
22. Decentralized Optimization
- a global optimum requires global information
- using local knowledge alone leads to sub-optimal solutions
- solution
- approximate tradeoffs for non-local channels
- aggregate coarse-grained information between neighbors
23. Decentralized Optimization 2
- approximate parameters
- cluster channels with similar values of P(l) / C(l)
- constant number of clusters per level
24. Decentralized Optimization 3
- aggregating clusters
- exchange clusters with one-hop neighbors
- hierarchical aggregation through the structured overlay
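One way to picture clustering and cluster exchange is the sketch below: channels are summarized by their P(l)/C(l) ratio in a fixed number of log-spaced buckets, and neighbors merge summaries bucket by bucket. The log-spaced binning, the bounds, and all names are assumptions for illustration, not Honeycomb's actual clustering scheme.

```python
import math
from collections import defaultdict


def cluster_channels(ratios, num_clusters=8, lo=1e-4, hi=1e4):
    """Summarize channels by their P(l)/C(l) tradeoff ratio.

    Hypothetical scheme: ratios are clamped to [lo, hi] and binned into
    `num_clusters` log-spaced buckets holding (count, mean ratio). The
    summary size is bounded by num_clusters, whatever the channel count.
    """
    span = math.log(hi) - math.log(lo)
    acc = defaultdict(lambda: [0, 0.0])
    for r in ratios:
        r = min(max(r, lo), hi)
        k = min(int((math.log(r) - math.log(lo)) / span * num_clusters),
                num_clusters - 1)
        acc[k][0] += 1
        acc[k][1] += r
    return {k: (n, s / n) for k, (n, s) in acc.items()}


def merge_clusters(a, b):
    """Combine summaries from two nodes (e.g. one-hop neighbors) per bucket."""
    out = dict(a)
    for k, (n, mean) in b.items():
        if k in out:
            n0, m0 = out[k]
            out[k] = (n0 + n, (n0 * m0 + n * mean) / (n0 + n))
        else:
            out[k] = (n, mean)
    return out


mine = cluster_channels([0.5, 0.6, 40.0])   # hypothetical local ratios
theirs = cluster_channels([0.55, 0.002])    # hypothetical neighbor summary
combined = merge_clusters(mine, theirs)
print(combined)
```

Since a node only ever sends a bounded summary, repeating the merge level by level through the overlay keeps the exchanged state constant per level.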
25. Adaptation to Workload Changes
- popularity of objects may change drastically
- flash crowds, denial-of-service attacks
- nodes measure popularity for local objects and aggregate popularity estimates with neighbors
26. Adaptation to Workload Changes 2
- orders-of-magnitude difference in query rates of popular and unpopular objects
- solution: combine inter-arrival times and query counts
- estimation time inversely proportional to the query rate of the object
- monitoring overhead proportional to the query rate of the object
- quick detection of large increases in query rate
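A minimal sketch of combining inter-arrival times with query counts, under stated assumptions: the class name, the choice of k, and the fallback window are hypothetical. A popular object gets a rate estimate after only k queries, while a rarely queried one is measured by its count over a long window.

```python
from collections import deque


class RateEstimator:
    """Hypothetical per-object query-rate estimator.

    Popular objects: rate = k / (time since the k-th most recent query),
    available after only k queries. Unpopular objects: fall back to the
    query count over a long window.
    """

    def __init__(self, k=8, window=3600.0):
        self.k = k
        self.window = window
        self.recent = deque(maxlen=k)  # last k arrival times
        self.arrivals = deque()        # arrivals inside the long window

    def record(self, t):
        self.recent.append(t)
        self.arrivals.append(t)

    def rate(self, now):
        # Drop arrivals that have aged out of the long window.
        while self.arrivals and self.arrivals[0] <= now - self.window:
            self.arrivals.popleft()
        if len(self.recent) == self.k and now > self.recent[0]:
            return self.k / (now - self.recent[0])
        return len(self.arrivals) / self.window


hot = RateEstimator(k=8)
for i in range(8):
    hot.record(0.1 * i)          # 8 queries within 0.7 seconds
print(hot.rate(now=0.8))         # about 10 queries per second
```

Storing every arrival inside the window makes the bookkeeping cost grow with the query rate, in line with the monitoring-overhead bullet above.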
27. Honeycomb: Fast Update Propagation
- a single integer (the replication level) indicates the locations of all replicas of an object
- no TTL required
- proactively propagate updates
- use neighbors in the underlying overlay
- increasing version numbers differentiate versions
- lazy updates in the background
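Because the replication level alone says where replicas live, an update can be pushed from the home node outward for exactly that many hops. The sketch below assumes a tree-shaped slice of the DAG and hypothetical `Node`/`apply` names; version numbers let a node drop stale or duplicate deliveries.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)  # neighbors one hop farther from home
    store: dict = field(default_factory=dict)     # key -> (version, value)

    def apply(self, key, version, value, hops_left):
        """Accept only newer versions, then push to neighbors one hop
        farther from the home node while still inside the replication level."""
        current = self.store.get(key)
        if current is not None and current[0] >= version:
            return  # stale or duplicate update: ignore and do not forward
        self.store[key] = (version, value)
        if hops_left > 0:
            for child in self.children:
                child.apply(key, version, value, hops_left - 1)


home, a, b, c = Node("home"), Node("a"), Node("b"), Node("c")
home.children = [a, b]
a.children = [c]

# Replication level 1: the home node and its one-hop neighbors hold the record.
home.apply("www.example.com", 1, "10.0.0.1", hops_left=1)
home.apply("www.example.com", 2, "10.0.0.2", hops_left=1)
print({n.name: n.store.get("www.example.com") for n in (home, a, b, c)})
```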
28. Outline
- Introduction
- Honeycomb Framework
- Applications
- Name service (CoDoNS)
- Content distribution network (CoBWeb)
- On-line data monitoring system (CorONA)
- Evaluation
- Conclusions
29. CoDoNS: Cooperative Domain Name System
- legacy DNS has fundamental problems
- poor failure resilience due to limited replication
- high response times due to multi-hop lookups
- no support for spontaneous updates
- cooperative cache for DNS bindings
[Figure: Legacy DNS]
Ramasubramanian and Sirer SIGCOMM 04
30. CoDoNS: Cooperative Domain Name System
- structured, proactive caching of name-data mappings
- targets an average lookup latency of 0.5 hops
- minimizes memory consumption
- updates pushed proactively to all caching nodes
- self-certifying data to preserve integrity (DNSSEC)
- incremental deployment path
- safety net for legacy DNS
- deployed on PlanetLab
31. CobWeb: Cooperative Beehive Web
- Web caches
- passive, client driven
- Content Distribution Networks
- active, replication driven
- e.g. Akamai, Digital Island (commercial), CoDeeN, CoralCDN (academia)
- web caching solutions are based on heuristics
- ideal cache hit rate: 60-70% (Wolman et al. 01)
- achieved cache hit rate: 20-40% (Breslau et al. 99, Wolman et al. 01)
32. CobWeb: Cooperative Beehive Web
- CobWeb is a cooperative web cache
- high cache hit rate through structured, proactive caching
- low network overhead by accounting for object size and update rate
- adaptation to flash crowds
- CobWeb performance goals
- min. network bandwidth s.t. cache hit rate meets a target
- max. cache hit rate s.t. all available network bandwidth is used
33. CobWeb: Cooperative Beehive Web
- user interfaces
- append cob-web.org to URLs
- e.g., http://slashdot.org.cob-web.org:8888
- DNS redirection, URL rewriting
- Meridian finds the closest node to the client
- deployed on PlanetLab
- more than 10 million requests per day
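The URL-rewriting interface can be sketched with the standard library: keep the path and query, and append the cob-web.org suffix and port 8888 from the example to the host name. The function name is hypothetical, and dropping any explicit port or credentials in the original URL is a simplifying assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def cobweb_url(url: str, suffix: str = "cob-web.org", port: int = 8888) -> str:
    """Rewrite a URL so the request goes through the cooperative cache."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"not an absolute URL: {url}")
    # hostname drops any explicit port/credentials of the original URL.
    netloc = f"{parts.hostname}.{suffix}:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(cobweb_url("http://slashdot.org"))
print(cobweb_url("http://slashdot.org/article?id=1"))
```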
34. Corona: Monitoring Online Data
- continuously monitoring and detecting changes is crucial
- e.g., web pages, sensors, databases
- content servers only provide a query-based interface
- naïve approach: repeated, independent polling
- poor update performance
- high server load
35. Corona: Monitoring Online Data
- publish-subscribe interface for monitoring web URLs
- cooperative polling
- resource allocation decides how many nodes poll each channel
Ramasubramanian, Peterson, and Sirer NSDI 06
36. Corona Performance Goals
- Corona Lite
- min. update detection time s.t. network load is bounded
- Corona Fast
- min. network load s.t. update detection time meets a target
- Corona Fair
- min. relative update detection time s.t. network load is bounded
- relative update detection time: ratio of update detection time to update interval
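To see how the number of pollers trades load against detection time, here is a sketch under an assumed polling model not stated on the slide: n nodes each poll a channel every τ seconds, polls are evenly staggered, and updates arrive at uniformly random times. Then the mean detection delay is $\tau / (2n)$ and the server sees $n / \tau$ polls per second. All function names are hypothetical.

```python
import math


def detection_time(tau: float, n: int) -> float:
    """Mean delay: polls are tau/n apart, and a uniformly timed update
    waits half that gap on average."""
    return tau / (2 * n)


def server_load(tau: float, n: int) -> float:
    """Polls per second reaching the content server."""
    return n / tau


def relative_detection_time(tau: float, n: int, update_interval: float) -> float:
    """Corona Fair's metric: detection time divided by the update interval."""
    return detection_time(tau, n) / update_interval


def pollers_for_target(tau: float, target: float) -> int:
    """Fewest pollers meeting a detection-time target (a Corona Fast-style
    question): tau / (2n) <= target  =>  n >= tau / (2 * target)."""
    return max(1, math.ceil(tau / (2 * target)))


tau = 600.0  # hypothetical per-node polling interval in seconds
for n in (1, 10):
    print(n, detection_time(tau, n), server_load(tau, n))
print(pollers_for_target(tau, target=25.0))
```

Under this model, load grows linearly with n while detection time shrinks as 1/n, which is the tradeoff each Corona variant resolves differently.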
37. Outline
- Introduction
- Honeycomb Framework
- Applications
- Evaluation
- Conclusions
38. CoDoNS Lookup Latency
MIT DNS trace: 265,111 queries, 30,000 names, 65 nodes
CoDoNS gives 1.5 to 2 times lower latency
39. CoBWeb Lookup Performance
NLANR workload: 1024 nodes, 10,000 objects, 100,000 queries
40. CoBWeb vs. Opportunistic Caching: Lookup Latency
41. CoBWeb vs. Opportunistic Caching: Storage Overhead
42. CoBWeb Flash Crowd: Lookup Latency
43. CoBWeb Flash Crowd: Network Bandwidth
44. Corona Update Performance
Corona improves update detection time from 15 min to 45 sec
Corona keeps load lower than legacy RSS
45. Corona Update Performance: Heuristics vs. Corona
46. Conclusions
- enables high performance, robust, and scalable network services
- principled approach for achieving performance goals in distributed systems
- mathematical optimization and structured overlays
- CoDoNS, CobWeb, and Corona
47. Other Research in Wireless Networks
- SHARP: hybrid adaptive routing protocol for mobile ad hoc networks (MobiHoc 03)
- combines proactive and reactive approaches to routing to achieve high performance efficiently
- SRL: bidirectional abstraction to support routing protocols on asymmetric mobile ad hoc networks (INFOCOM 02)
- Anonymous Gossip: improving multicast reliability in mobile ad hoc networks (ICDCS 01)