Transcript and Presenter's Notes

Title: SECURE INFORMATION DISCOVERY


1
SECURE INFORMATION DISCOVERY WITH SELF-ORGANIZING OVERLAYS
Emin Gün Sirer, Cornell University
2
Introduction
  • high-performance systems involve non-trivial
    tradeoffs
  • performance achieved vs. resources consumed
  • typical systems use ad hoc techniques to resolve
    these tradeoffs
  • many heuristics are possible
  • usually determined by matching a trace
  • little to keep heuristics in check

3
Heuristics Considered Harmful
  • heuristics are ubiquitous in distributed systems
  • Akamai, other CDNs
  • DNS
  • publish-subscribe systems
  • filesystems
  • operating systems
  • undergraduate textbooks discuss heuristics at
    length
  • LRU vs. MRU vs. FIFO vs. LIFO vs. clock vs. ...
  • bad science
  • doesn't work well, can't say for sure

4
Approach
  • abolish all heuristics!
  • treat systems problems as mathematical
    optimization problems
  • not like the usual optimization we do in
    computer science
  • formalize fundamental tradeoffs
  • maximize performance or minimize cost subject to
    constraints
  • handle multiple constraints
  • solve analytically or numerically

5
Domain
  • novel approach, general, applicable to many
    problems
  • in contrast with typical practices
  • from the OS level up to large scale distributed
    systems
  • particularly compelling for domains where
    resources are at a premium
  • network infrastructure services

6
Overview
  • examine tradeoffs in peer-to-peer systems
  • lookup performance vs. storage overhead
  • update performance vs. network overhead
  • analytically model performance-overhead tradeoff
  • pose optimization problems
  • min overhead s.t. performance ≥ target
  • max performance s.t. overhead ≤ available resources

7
Self-Organizing Infrastructure Services
  • peer-to-peer architectures provide
  • failure resilience,
  • scalability
  • bounded worst case latency
  • but not good average-case performance!
  • Honeycomb: an analysis-driven resource management
    framework for structured overlays
  • layered on top of a p2p substrate

8
Self-Organizing Overlays
[Figure: overlay ring; object 0121 = hash(www.cnn.com), located from node 2012 by prefix-matching routing]
  • prefix-matching: log_b N hops
  • fault-tolerant, scalable
  • good worst-case performance, but log N hops is
    still large
9
Self-Organizing Overlays
Standard approach to caching does not work!
10
Honeycomb Caching
  • analytically model performance-overhead tradeoff
  • object replicated at all nodes with i matching
    prefix-digits
  • lookup latency: i hops
  • replicas: ≈ N / b^i
  • inexpensive to locate and update replicas

[Figure: object 0121 replicated at prefix-matching nodes 0021, 0112, 0122; query issued from node 2012]
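A quick way to make this slide's tradeoff concrete: with N nodes and base b, replicating an object at all nodes that share i prefix digits with it leaves roughly N / b^i replicas, and prefix routing reaches one of them within i hops. A minimal sketch (parameter names are illustrative, not from the talk):

```python
# Hedged sketch: replica count and lookup hops implied by a replication level.
# num_nodes, base, and level are illustrative parameter names.

def replicas_at_level(num_nodes: int, base: int, level: int) -> float:
    """Object replicated at all nodes sharing `level` prefix digits:
    roughly N / b^level replicas."""
    return num_nodes / (base ** level)

def lookup_hops(level: int) -> int:
    """Prefix routing resolves one digit per hop, so a query reaches a
    replica within `level` hops."""
    return level

# Example: N = 10_000 nodes, base b = 4, object replicated at level i = 2
print(replicas_at_level(10_000, 4, 2))  # ~625 replicas
print(lookup_hops(2))                   # found within 2 hops
```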
11
Outline
  • Introduction
  • Honeycomb Framework
  • Optimization Analysis
  • Implementation
  • Applications
  • Evaluation
  • Conclusions

12
Analytical Modeling
  • level of replication (l)
  • object replicated at all nodes within l hops of
    the home node
  • optimization problem: find optimal values of l_i
  • min. Σ_i C_i(l_i)   s.t.   Σ_i P_i(l_i) ≥ T
  • max. Σ_i P_i(l_i)   s.t.   Σ_i C_i(l_i) ≤ T
  • performance variables
  • lookup latency, time between updates
  • cost variables
  • storage overhead, network overhead, load

13
Optimization Problem: Lookup Latency
  • min. Σ_i c_i · b^(l_i)   s.t.   Σ_i q_i · (K − l_i) ≤ T

(objective: total overhead; constraint: avg. lookup latency)
T: target lookup latency in hops; q_i: relative query
frequency; c_i: replication cost of object i;
M objects, N nodes, branching factor b, diameter K
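A minimal sketch of how the two sides of this problem can be evaluated for a candidate assignment of replication levels; the function names and toy numbers below are assumptions for illustration, not part of the talk:

```python
# Hedged sketch of the slide's optimization quantities.
# q[i]: relative query frequency, c[i]: replication cost, l[i]: level of
# object i; K: overlay diameter, b: branching factor. Names are illustrative.

def total_overhead(c, l, b):
    """Objective: sum_i c_i * b^(l_i)."""
    return sum(ci * b ** li for ci, li in zip(c, l))

def avg_lookup_latency(q, l, K):
    """Constraint side: sum_i q_i * (K - l_i), in hops."""
    return sum(qi * (K - li) for qi, li in zip(q, l))

# Toy instance: 3 objects, diameter K = 4, base b = 4.
q = [0.7, 0.2, 0.1]   # popular objects have higher query frequency
c = [1.0, 1.0, 1.0]   # unit replication cost per replica
l = [3, 1, 0]         # replicate the popular object more aggressively
print(total_overhead(c, l, b=4))       # overhead of this assignment
print(avg_lookup_latency(q, l, K=4))   # must stay at or below the target T
```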
14
Analysis-Driven Caching for Lookup Performance
  • enables applications to tune performance
  • targets: avg. lookup latency (hops), cache hit ratio
  • better than one hop (e.g., 0.5 hops)
  • failure resilience
  • lower bound on li forces minimal extent of
    replication
  • optimizes multiple overhead metrics
  • replicas: c_i = 1
  • storage: c_i = object size
  • bandwidth: c_i = object size × update rate

15
Analytical Solution (Beehive)
  • Zipf popularity distribution (e.g. DNS, Web)
  • analytically tractable (one parameter α)
  • closed-form solution
  • optimal
  • inexpensive to compute and apply

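For Zipf-distributed popularity the optimum has a closed form; the sketch below follows the expression from the Beehive analysis (d = b^((1-α)/α); x_i is the fraction of objects replicated at level i or lower, given a target average lookup latency C). The exact expression and parameter names are reproduced as a hedged sketch, assuming α < 1, not a definitive implementation:

```python
# Hedged sketch of Beehive-style closed-form replication fractions for a
# Zipf(alpha) popularity distribution, alpha < 1. x[i] is the fraction of
# objects replicated at level i or lower; C is the target avg. lookup hops.

def replication_fractions(alpha: float, b: int, K: int, C: float):
    d = b ** ((1 - alpha) / alpha)
    for k_prime in range(K, 0, -1):          # shrink K' until all x_i <= 1
        if k_prime <= C:                     # target latency trivially met
            break
        denom = sum(d ** j for j in range(k_prime))   # 1 + d + ... + d^(K'-1)
        x = [(d ** i * (k_prime - C) / denom) ** (1 / (1 - alpha))
             for i in range(k_prime)]
        if all(0.0 <= xi <= 1.0 for xi in x):
            return x + [1.0] * (K - k_prime)  # levels >= K' cover all objects
    return [1.0] * K                          # fall back: replicate everywhere

# Example: base 16, 5 levels, target of 0.5 hops on average
print(replication_fractions(alpha=0.9, b=16, K=5, C=0.5))
```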
16
Latency vs. Overhead Tradeoff
[Figure: latency vs. overhead tradeoff for 100 × 10^6 objects (overhead axis in units of 10^6)]
17
Numerical Solution
  • optimization problem is NP-Hard
  • integral levels of replication
  • developed fast and accurate approximation
    algorithm
  • O(M log(M) log(N)) running time
  • solution within at most one object per node of
    the optimal (above or below)
  • advantages
  • any popularity distribution (including Zipf)
  • use object size to minimize storage overhead
  • use update rate to minimize network overhead

NSDI 06
18
Numerical Algorithm Overview
  • Lagrange multiplier (λ)
  • min. Σ_i C_i(l_i) − λ · (Σ_i P_i(l_i) − T)
  • bisection method on λ
  • candidate λ values limited to the distinct
    C_i(l_i)/P_i(l_i) ratios (at most MK values)
  • only log(MK) iterations, each iteration O(MK)
    time
  • determine upper-bound solution and lower bound
    solution
  • differ in li for exactly one object

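A hedged sketch of the bisection idea above: for a fixed multiplier λ, each object independently picks the level that minimizes C_i(l) − λ·P_i(l), and λ is bisected until the performance constraint is just met. The function names, bracketing bounds, and stopping rule are illustrative assumptions, not the paper's code:

```python
# Hedged sketch of the Lagrange-multiplier / bisection scheme.
# cost(i, l) and perf(i, l) stand in for C_i(l_i) and P_i(l_i).

def best_levels(lmbda, M, K, cost, perf):
    """For a fixed multiplier, each object picks its level independently."""
    return [min(range(K + 1), key=lambda l: cost(i, l) - lmbda * perf(i, l))
            for i in range(M)]

def solve(M, K, cost, perf, target, iters=60):
    lo, hi = 0.0, 1e9                        # assumed bracket for lambda
    for _ in range(iters):                   # ~log of the candidate count
        mid = (lo + hi) / 2
        levels = best_levels(mid, M, K, cost, perf)
        total_perf = sum(perf(i, levels[i]) for i in range(M))
        if total_perf >= target:
            hi = mid                         # feasible: try a cheaper solution
        else:
            lo = mid                         # infeasible: weight performance more
    return best_levels(hi, M, K, cost, perf) # feasible (upper-bound) solution
```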
19
Honeycomb Implementation
  • built Honeycomb with Pastry as the underlying overlay
  • properties of Honeycomb
  • high performance
  • scalability and failure resilience
  • quick adaptation to workload
  • fast update propagation

20
Honeycomb Scalability and Failure Resilience
  • scalability
  • independent decisions
  • bloom filters
  • failure resilience
  • communication with overlay neighbors

21
Honeycomb Adaptation to Workload Changes
  • popularity of objects may change drastically
  • flash-crowds, denial of service attacks
  • orders of magnitude difference in query rates of
    popular and unpopular objects
  • each object is monitored independently
  • monitoring overhead proportional to the query
    rate of the object

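One way to see why the monitoring overhead tracks the query rate: access counts exist only for objects that are actually queried and can be aggregated periodically, so an object with no queries generates no monitoring traffic. A minimal sketch with hypothetical names, not Honeycomb's actual interface:

```python
# Hedged sketch: per-object popularity monitoring whose cost is proportional
# to the query rate. Names and the aggregation round are illustrative.
from collections import defaultdict

class PopularityMonitor:
    def __init__(self):
        self.counts = defaultdict(int)     # object id -> accesses this round

    def record_access(self, obj_id: str):
        self.counts[obj_id] += 1           # one counter update per query

    def aggregate_round(self):
        """Report and reset; the report size is proportional to the number
        of distinct objects actually queried this round."""
        report, self.counts = dict(self.counts), defaultdict(int)
        return report

mon = PopularityMonitor()
for _ in range(5):
    mon.record_access("hash(www.cnn.com)")
print(mon.aggregate_round())   # {'hash(www.cnn.com)': 5}
```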
22
Honeycomb Fast Update Propagation
  • a single integer (the replication level) determines
    the locations of all replicas of an object
  • no TTL required
  • proactively propagate updates
  • use neighbors in the underlying overlay

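A hedged sketch of the idea: since the replication level alone determines which nodes hold replicas of an object, a fresh value can be pushed level by level through routing-table neighbors instead of waiting for TTLs to expire. The node and routing-table structure below is an illustrative assumption, not Honeycomb's code:

```python
# Hedged sketch: proactive update push driven by the replication level.
# A node holding the object at `level` matching prefix digits pushes it to
# routing-table neighbors one level lower until the object's replication
# level is reached. Duplicate deliveries are ignored for simplicity.
from dataclasses import dataclass, field

@dataclass
class Node:
    routing_table: dict = field(default_factory=dict)  # row index -> [Node]
    store: dict = field(default_factory=dict)          # obj_id -> value

def push_update(node: Node, obj_id: str, value, level: int, target_level: int):
    node.store[obj_id] = value          # install the fresh copy locally
    if level <= target_level:           # reached the object's replication level
        return
    for neighbor in node.routing_table.get(level - 1, []):
        push_update(neighbor, obj_id, value, level - 1, target_level)
```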
23
Outline
  • Introduction
  • Honeycomb Framework
  • Applications
  • Cooperative Domain Name System (CoDoNS)
  • Cooperative Beehive CDN (CoBWeb)
  • Web MicroNews Aggregator (CorONA)
  • Evaluation
  • Conclusions

24
Problems with DNS
  • DNS is critical to the Internet
  • DNS architecture is based on delegations
  • control for names is delegated to nameservers
    designated by the name owner
  • delegations enable decentralized administration
    and large scale
  • what about security?

25
Dependencies for www.fbi.gov
[Figure: name dependency graph for www.fbi.gov, spanning gov, com, net, gtld-servers.net, nstld.com, zoneedit.com, gov.zoneedit.com, fbi.edgesuite.net, edgesuite.net, a33.g.akamai.net, g.akamai.net, akamai.net, akam.net, akamaitech.net]
26
Most Vulnerable Names
27
Vulnerability to Security Flaws
28
State of DNS Domains
  • DNS TCB is large

29
Most Valuable Servers
30
CoDoNS
  • safety net for DNS
  • high failure resilience
  • self-organization around failures
  • no manual administration
  • low lookup latency
  • fast update propagation

[Figure: CoDoNS alongside legacy DNS]
31
CoDoNS Security
  • data integrity through DNSSEC
  • cryptographic certificates for bindings
  • use existing certificates where available
  • certification authority signs legacy DNS bindings
  • cryptographic delegation vs. physical delegation
  • opens up competition for name-space management

32
CoDoNS Deployment
  • incremental deployment path
  • backwards compatibility in interface
  • relies on legacy DNS bindings during transition
  • provides data integrity through DNSSEC
  • deployed on PlanetLab (60-130 nodes)
  • http://beehive.cs.cornell.edu/
  • planned expansion to ISPs
  • e.g. CNNIC, China
  • 500,000 domain names under .cn

33
CoBWeb
  • Content Distribution Network for Web objects
  • e.g. Akamai
  • heuristics-driven caching
  • large number of heuristics have been proposed for
    web caching
  • e.g. CoDeeN, CoralCDN
  • Cooperative Beehive Web
  • high cache hit rate through analysis driven
    caching
  • adaptation to flash crowds
  • low network overhead using object size and update
    rate

34
CoBWeb Architecture
  • user interfaces
  • Web proxy
  • DNS redirection, URL rewriting
  • just add .cobweb.org:8888 to a URL
  • data integrity
  • threshold cryptography
  • trust distributed among a quorum of nodes
  • CoBWeb clients need to get redirected to the
    nearest server

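The URL-rewriting interface amounts to editing the hostname; a minimal sketch (the .cobweb.org:8888 suffix is taken from the slide, the helper name is hypothetical):

```python
# Hedged sketch of CoBWeb-style URL rewriting: append the CoBWeb suffix and
# port from the slide to the hostname. Helper name is illustrative.
from urllib.parse import urlsplit, urlunsplit

def cobwebify(url: str) -> str:
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=f"{parts.hostname}.cobweb.org:8888"))

print(cobwebify("http://www.cnn.com/index.html"))
# http://www.cnn.com.cobweb.org:8888/index.html
```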
35
Meridian Operation
  • Solve node selection directly without computing
    coordinates
  • Combine query routing with active measurements
  • Framework
  • Loosely structured overlay network
  • Algorithms
  • Solve network location problems in O(log D) hops
  • Language
  • General-purpose language for expressing network
    location requirements

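Meridian's loose structure is a set of per-node latency rings with exponentially growing radii, each holding a handful of measured peers; a hedged sketch of that bookkeeping, with the ring bounds and capacities as illustrative assumptions rather than Meridian's actual constants:

```python
# Hedged sketch of Meridian-style latency rings: ring i holds up to k peers
# whose measured latency falls in [alpha * s^i, alpha * s^(i+1)).
# Constants and names are illustrative assumptions.
import math

class RingSet:
    def __init__(self, alpha_ms: float = 1.0, s: float = 2.0, k: int = 8):
        self.alpha, self.s, self.k = alpha_ms, s, k
        self.rings = {}                        # ring index -> {peer: latency}

    def ring_index(self, latency_ms: float) -> int:
        latency_ms = max(latency_ms, self.alpha)
        return int(math.log(latency_ms / self.alpha, self.s))

    def insert(self, peer: str, latency_ms: float):
        ring = self.rings.setdefault(self.ring_index(latency_ms), {})
        if len(ring) < self.k:                 # keep at most k members per ring
            ring[peer] = latency_ms

rings = RingSet()
rings.insert("peer-A", latency_ms=12.0)        # lands in ring 3 (8-16 ms)
print(rings.rings)
```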
36
CorONA
  • Cornell Online News Aggregator
  • publish-subscribe based delivery of Web MicroNews
  • news aggregation is currently done through
    polling
  • clients do not see updates quickly
  • servers have to handle huge load because of
    sticky traffic
  • CorONA provides publish-subscribe interface
  • asynchronous update notification
  • quick update detection through cooperative polling

37
Analytical Model for News Aggregation
  • minimize time between news updates, while not
    exceeding load on the content servers
  • min. Σ_i 1 / (r_i · b^(l_i))   s.t.   Σ_i r_i · b^(l_i) ≤ T

(objective: avg. time between updates; constraint: total load on content servers)
r_i: server-imposed limit on the polling rate for object i;
T: total load imposed by clients;
M objects, N nodes, branching factor b, diameter K
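A minimal sketch of evaluating both sides of this model for a candidate assignment of levels, reading r_i as the per-node polling rate for channel i (an assumption about the model); names and numbers are illustrative:

```python
# Hedged sketch of the CorONA model quantities on this slide.
# r[i]: per-node polling rate for channel i, l[i]: its replication level,
# b: overlay branching factor. Names are illustrative.

def objective_update_interval(r, l, b):
    """Slide's objective: sum_i 1 / (r_i * b^(l_i)) -- with b^(l_i) nodes
    polling channel i at rate r_i, updates are detected roughly that often."""
    return sum(1.0 / (ri * b ** li) for ri, li in zip(r, l))

def total_polling_load(r, l, b):
    """Constraint side: sum_i r_i * b^(l_i), the aggregate poll rate hitting
    the content servers; must stay at or below the budget T."""
    return sum(ri * b ** li for ri, li in zip(r, l))

r = [1 / 30.0, 1 / 300.0]   # polls per second per node, two channels
l = [2, 0]                  # replicate the popular channel at more nodes
print(objective_update_interval(r, l, b=4))
print(total_polling_load(r, l, b=4))
```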
38
CorONA Operation
  • instant messenger interface as the front-end GUI
  • user adds cornellcorona as a buddy
  • sends a URL that she would like to monitor
  • Corona monitors URL, sends IMs containing recent
    news
  • difference engine in the back-end
  • can extract relevant updates from news and blog
    sites
  • works with RSS feeds as well as regular URLs
  • No change to existing web publishing
    infrastructure
  • easy to deploy and use

39
CorONA Operation
  • a set of nodes cooperatively monitor data sources
    with near-optimal frequency

40
Outline
  • Introduction
  • Honeycomb Framework
  • Applications
  • Evaluation
  • lookup latency
  • adaptation to flash crowds
  • update intervals
  • Conclusions

41
CoDoNS Lookup Latency
MIT DNS trace: 265,111 queries, 30,000 names, 65 nodes
CoDoNS gives 2 to 18 times better latency
42
CoBWeb Lookup Latency
IRCache records: 100,000 queries, 25,000 URLs, 75 nodes
CoBWeb gives 1.5 to 2 times better latency
43
CorONA Update Performance
1,000 channels, 100,000 subscriptions, 600 clients, 60 nodes
CorONA gives 3 orders of magnitude improvement
44
CorONA Server Load
CorONA does not exceed bounds on server load
45
Conclusions
  • principled approach can provide qualitative
    improvements
  • heavy-tailed distributions long thought to be a
    difficult problem
  • Combining self-organizing substrates with
    cost-conscious resource management yields new
    services with unprecedented performance and
    robustness
  • mathematical optimization can be applied to many
    other systems contexts
  • latency, throughput, failure resilience
  • optimization under multiple constraints

46
For more info
  • http://www.cs.cornell.edu/People/egs/

47
(No Transcript)
48
Infrastructure Services
  • name service for the Internet
  • Cooperative Domain Name System (CoDoNS)
  • content distribution system for the Web
  • Cooperative Beehive Web (CoBWeb)
  • network positioning (Meridian / closestnode.com)
  • publish-subscribe system for Web Micronews
  • Cornell On-line News Aggregator (CorONA)

49
CoDoNS Flash Crowds
[Figure: flash-crowd experiment; reverse popularity distribution injected]
50
CoDoNS Lookup Latency (hops)
MIT DNS trace: 265,111 queries, 30,000 names, 65 nodes
51
CoBWeb Lookup Latency (hops)
52
CoBWeb Storage Overhead
53
CoDoNS Lookup Latency
MIT DNS trace: 265,111 queries, 30,000 names, 1,024 nodes (simulation)
54
CoDoNS Network Overhead
55
CoBWeb Network Overhead