1
Querying and Routing in Next-Generation Networks
  • Boon Thau Loo
  • Ph.D. Qualifying Exam Proposal
  • 24 Aug 2004

2
Current Internet
  • Current Internet Architecture
  • Host-centric protocols defined in terms of IP
    addresses.
  • Routing functionality is embedded in
    infrastructure.
  • Two limitations
  • Difficult to locate dynamic objects by names.
  • Applications have little control over the path
    followed by their packets.

3
Data-centric Networks
  • Data independence: allow users to name (and
    query) data regardless of location.
  • E.g., Distributed Hash Tables (DHT).
  • Addresses the first limitation, but not the
    second (inflexible routing).

4
Evolution of my work
  • PIER: Relational Query Processor on DHTs.
  • Querying the Internet with PIER.
  • Application
  • Enhancing P2P File-Sharing with PIER.
  • Network Monitoring
  • Querying (Gathering) Network Topologies.
  • Influence base network functionality
  • Customizable Routing with Queries.

5
Customizable Routing
  • Emerging topic in networking community.
  • Why is customizing routing important?
  • Flexibility: support different application
    requirements.
  • Evolvability of routing infrastructure.
  • Some existing solutions
  • Overlay networks (multicast, RON).
  • Active Networks.
  • Recent ideas: i3 Routing Service, NIRA.
  • Proposed solution: Customizable Routing with
    Declarative (Recursive) Queries.

6
My Unifying Theme
  • The synergy between Query Processing and Routing
    in networks.
  • Three main contributions
  • Compare P2P search performance for two P2P
    architectures.
  • Querying Network Topologies with Declarative
    (Recursive) Queries.
  • Customizable Routing with Declarative (Recursive)
    Queries.

7
Roadmap
  • P2P Search: A Comparative Study.
  • Querying and Routing in Networks with Recursive
    Queries.
  • Research Plans and Timeline.

8
P2P Search: A Comparative Study
  • Qualifying Exam Proposal
  • Part I

9
Problem Statement
  • P2P Search
  • Flooding (Unstructured) vs. DHTs (Structured)?
  • Lots of debate, lots of papers, little
    consensus.
  • Why study P2P Search?
  • Canonical P2P application. Live workloads.
  • Good stress test on any P2P design.
  • More robustness. No single point of failure.
  • Social Issues
  • More resistant than centralized systems to
    censorship and manipulated rankings.
  • RIAA.

10
Distributed Hash Tables (DHT)
  • Hash table Interface
  • put(key,object), get(key)
  • Properties
  • If object exists in network, it can always be
    found.
  • Scalable: O(log n) hops and state.
  • Robust: self-configuring and resilient to
    failures and churn.
  • DHT Search
  • Inverted Lists indexed by Keyword.
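
The put/get interface and keyword-indexed inverted lists above can be sketched in a few lines. This is a single-process toy, not PIER or any real DHT: the class name `ToyDHT`, the helpers `publish` and `search`, and the modulo placement of keys are all assumptions made for illustration (a real DHT would route each key in O(log n) hops).

```python
import hashlib
from collections import defaultdict


class ToyDHT:
    """Hypothetical stand-in for a DHT: each key is hashed onto one of N local buckets."""

    def __init__(self, num_nodes=8):
        # One dictionary per simulated node: key -> set of stored objects.
        self.nodes = [defaultdict(set) for _ in range(num_nodes)]

    def _owner(self, key):
        # Assumption: simple modulo placement instead of real DHT routing.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key, obj):
        self.nodes[self._owner(key)][key].add(obj)

    def get(self, key):
        # Return a copy so callers cannot mutate the stored posting list.
        return set(self.nodes[self._owner(key)].get(key, set()))


def publish(dht, filename):
    """Index a file under every keyword of its name (one inverted list per keyword)."""
    for keyword in filename.lower().split():
        dht.put(keyword, filename)


def search(dht, query):
    """Answer a multi-keyword query by intersecting the keyword posting lists."""
    postings = [dht.get(keyword) for keyword in query.lower().split()]
    return set.intersection(*postings) if postings else set()
```

A query for several keywords fetches one inverted list per keyword and intersects them, which is why an item that exists in the index is found regardless of how rare it is.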

11
Two Workloads
  • P2P Web Search (IPTPS 03)
  • P2P File Sharing (IPTPS 04, VLDB 04)
  • Less demanding application
  • Smaller dataset (millions of files).
  • Index by filenames and metadata.
  • Less stringent user requirements.
  • Replicas of items follow a long-tailed
    distribution.
  • Popular items at head of distribution.
  • Rare items at tail of distribution.

12
Gnutella Measurements
  • Main areas of study
  • Gnutella Topology
  • Crawl from multiple vantage points on PlanetLab.
  • Search Quality
  • Measure query results size and latency.
  • Reissue Gnutella queries from 30 LimeWire
    Ultrapeers simultaneously on PlanetLab to
    approximate the perfect answer.

13
Summary of Measurements
  • Queries with few results are searching for rare
    items.
  • Searching on Gnutella
  • Highly effective for popular items.
  • Less effective for rare items.
  • Significant opportunity to do better.
  • Large fraction of queries return few or no
    results even when they exist.
  • Bad response times for queries on rare items.

14
Hybrid Solution
Flood-based Network (All items)
DHT (Index Rare Items)
15
PlanetLab Deployment
[Figure: deployment diagram. Legend: L = Gnutella Leaf; U = Gnutella Ultrapeer (U1, U2); P = Hybrid Ultrapeer running PIER and Gnutella (P1, P2, P3). Edges are either Gnutella links or PIER links. The diagram also marks the horizon of P1.]
16
Gnutella Measurement Study: Important Lessons
  • Gathering Gnutella Topology requires recursive
    link traversals.
  • Knowledge of topology can improve the quality of
    search.
  • Challenge
  • Can we perform topology discovery efficiently and
    accurately?
  • If so, what other functionality can we provide?

17
Querying and Routing in Networks with Recursive
Queries
  • Qualifying Exam Proposal
  • Part II

18
Introduction
  • The Internet is made up of distributed graphs!
  • IP Routers.
  • Overlay networks.
  • WWW Hypertext structures.
  • Recursive queries could be used to discover
    topologies.
  • A recursive query engine is an attractive routing
    infrastructure!
  • We'll see that routing protocols (DV, DSR) are
    just recursive queries.
  • Customizable: end-hosts can express their own
    route desires.
  • Efficient? Query Optimization Techniques.

19
Outline
  • Intro to Recursive Queries.
  • Applications.
  • Recursive Query Processing and Optimization
    Techniques.
  • Current Status.
  • Research Agenda and Timeline.

20
Background: Datalog Program
R1: reachable(S,D) :- link(S,D)
R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
Query: reachable(M,N)
  • R1: 1-hop reachable
  • R2: 2 or more hops
  • reachable(a,N): nodes reachable from node a

[Figure: nodes S, Z, and D, annotated with L(S,Z), R(Z,D), R(S,Z), and R(S,D).]
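
As a rough illustration of how rules R1 and R2 are evaluated bottom-up, the sketch below repeats the join of link with reachable until no new facts appear. The function name `reachable` and the tuple encoding of facts are assumptions; `EXAMPLE_LINKS` is the nine-node graph as read off the execution slides later in this deck.

```python
# Link facts of the example graph used on the execution slides (nodes a..i).
EXAMPLE_LINKS = {
    ("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"), ("b", "g"), ("c", "e"),
    ("d", "f"), ("e", "f"), ("g", "f"), ("f", "h"), ("h", "i"),
}


def reachable(links):
    """Naive bottom-up evaluation of R1 and R2 until a fixpoint is reached."""
    reach = set(links)  # R1: every link is a reachable fact
    while True:
        # R2: join link(S,Z) with reachable(Z,D) on Z.
        derived = {(s, d) for (s, z) in links for (z2, d) in reach if z == z2}
        new_facts = derived - reach
        if not new_facts:
            return reach
        reach |= new_facts


def query_from(reach, source):
    """Answer reachable(source, N): all nodes N reachable from source."""
    return {d for (s, d) in reach if s == source}
```

Calling `query_from(reachable(EXAMPLE_LINKS), "a")` returns every other node in the example graph, since all of them lie downstream of a.
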
21
Datalog Facts
  • Base Facts
  • Supplied to query processor.
  • node(nodeID, load, ...): nodeID is either an IP
    address or DHT identifier.
  • link(source, destination, cost, ...)
  • Derived Facts
  • Intermediate data
  • reachable(source, destination)
  • path(source, destination, path, cost)
  • Result Facts
  • Sent back as query results, or stored in
    network.
  • E.g., ShortestPath, NextHop

22
Execution Model
  • Distributed Query Processing
  • Each node embeds query processing functionality.
    PIER is an example system but this model is not
    constrained to DHTs.
  • Each query processor has access to local base
    facts.
  • Query Execution
  • Recursive Query is issued by one of the nodes.
  • Disseminated to all or subset of other nodes for
    execution.
  • Wrapper for external networks
  • PIER runs a recursive crawl query to gather
    information on external network.
  • Each node is responsible for monitoring a subset
    of external network nodes.

23
New Challenges
  • Different Metrics
  • Centralized: I/O, CPU, number of facts.
  • Distributed: communication overhead, latency.
  • Network is dynamic and soft-state.
  • Long running, concurrent queries.

24
Parallel and Distributed Deductive Databases
  • Parallel
  • Hash-based partitioning (data fragmentation).
  • Direct (matrix methods).
  • Distributed
  • Semantic Fragmentation (Disconnection Set).
  • Main differences
  • Multi-hop environment
  • Relationship to routing algorithms.
  • Setting up network state.
  • Dynamic graphs.
  • Long-running, concurrent queries.
  • Smaller scale compared to ours.

25
App 1: Network Topology Monitoring
  • Gnutella Monitoring Service
  • Search horizon statistics (number of nodes,
    files)
  • Diameter of the network.
  • Robustness of the network.
  • Direct search query towards high degree nodes.
  • Study a DHT under churn
  • Dynamic Resilience: how many possible live
    paths are there between any two nodes?
  • Average Path Length: given a routing algorithm,
    what is the average number of hops between any
    two nodes?
  • Check for invariants.
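
Two of the monitoring metrics listed above, diameter and average path length, can be computed once the topology has been gathered. The sketch below is a centralized stand-in that assumes shortest-hop routing over a directed link set; the names `hop_counts` and `diameter_and_average` are hypothetical, and a DHT study would substitute the hop counts of its own routing algorithm.

```python
from collections import deque


def hop_counts(links):
    """Breadth-first search from every node; returns {(src, dst): hops} for reachable pairs."""
    out = {}
    for s, d in links:
        out.setdefault(s, set()).add(d)
        out.setdefault(d, set())
    dist = {}
    for src in out:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in out[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for dst, hops in seen.items():
            if dst != src:
                dist[(src, dst)] = hops
    return dist


def diameter_and_average(links):
    """Longest shortest path and mean hop count over all reachable ordered pairs."""
    hops = list(hop_counts(links).values())
    if not hops:
        return 0, 0.0
    return max(hops), sum(hops) / len(hops)
```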

26
App 2: Customizable Routing Infrastructure
  • Best-Path routing
  • Shortest Path (Distance Vector)
  • Shortest-k-paths
  • Least-loaded path
  • Disjoint-Paths greedy routing
  • Disjoint-k-paths (edge and node disjoint)
  • Dynamic Source Routing (DSR)
  • Policy Decision
  • Paths that include/exclude certain nodes
  • Do not carry/trust traffic from certain nodes.
  • What are Datalog's limitations?

27
Recursive Query Processing in Networks
  • Introduction to Recursive Queries
  • Applications
  • Recursive Query Processing and Optimization
  • The Basics
  • Datalog → Query Plan
  • Query Execution
  • Query Optimization Techniques
  • Work-Sharing
  • Queries over Dynamic Graphs
  • Current Status
  • Research Plan

28
Datalog → Query Plan
  • R1: reachable(S,D) :- link(S,D)
  • R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
  • Query: reachable(M,N)

[Figure: query plan with dataflow operators for R1 and R2; annotation: "Ship tuples to table.field".]
29
Query Execution
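  • R1: reachable(S,D) :- link(S,D)
  • R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
  • Query: reachable(M,N)

[Figure: example graph with nodes a through i, 0th iteration. Link facts l(...) and reachable facts r(...) shown at each node:]
  • a: l(a,b), l(a,c); r(a,b), r(a,c)
  • b: l(b,g), l(b,d), l(b,e); r(b,g), r(b,d), r(b,e)
  • c: l(c,e); r(c,e)
  • d: l(d,f); r(d,f)
  • e: l(e,f); r(e,f)
  • f: l(f,h); r(f,h)
  • g: l(g,f); r(g,f)
  • h: l(h,i); r(h,i)
  • i: no facts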
30
Query Execution
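  • R1: reachable(S,D) :- link(S,D)
  • R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
  • Query: reachable(M,N)

[Figure: example graph with nodes a through i, 1st iteration. Reachable facts shown at each node:]
  • a: r(a,b), r(a,c)
  • b: r(b,d), r(b,e), r(b,g)
  • c: r(c,e)
  • d: r(d,f)
  • e: r(e,f)
  • f: r(f,h)
  • g: r(g,f)
  • h: r(h,i)
  • Link tuples shown in the graph: l(a,b), l(a,c),
    l(b,d), l(b,e), l(b,g), l(c,e), l(d,f), l(e,f),
    l(g,f), l(f,h), l(h,i)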
31
Network-Reachability Query
  • R1: reachable(S,D) :- link(S,D)
  • R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
  • Query: reachable(M,N)

[Figure: example graph with nodes a through i, 2nd iteration. Reachable facts shown at each node:]
  • a: r(a,b), r(a,c), r(a,e), r(a,d), r(a,g)
  • b: r(b,d), r(b,e), r(b,g), r(b,f)
  • c: r(c,e), r(c,f)
  • d: r(d,f), r(d,h)
  • e: r(e,f), r(e,h)
  • f: r(f,h), r(f,i)
  • g: r(g,f), r(g,h)
  • h: r(h,i)
  • Link tuples shown in the graph: l(a,b), l(a,c),
    l(b,d), l(b,e), l(b,g), l(c,e), l(d,f), l(e,f),
    l(g,f), l(f,h), l(h,i)
32
Network-Reachability Query
  • R1: reachable(S,D) :- link(S,D)
  • R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
  • Query: reachable(M,N)

[Figure: example graph with nodes a through i, 3rd iteration. Reachable facts shown at each node:]
  • a: r(a,b), r(a,c), r(a,e), r(a,d), r(a,g), r(a,f)
  • b: r(b,d), r(b,e), r(b,g), r(b,f), r(b,h)
  • c: r(c,e), r(c,f), r(c,h)
  • d: r(d,f), r(d,h), r(d,i)
  • e: r(e,f), r(e,h), r(e,i)
  • f: r(f,h), r(f,i)
  • g: r(g,f), r(g,h), r(g,i)
  • h: r(h,i)
  • Link tuples shown in the graph: l(a,b), l(a,c),
    l(b,d), l(b,e), l(b,g), l(c,e), l(d,f), l(e,f),
    l(g,f), l(f,h), l(h,i)
33
Remarks
  • Resembles Distance Vector Protocol
  • Computation begins with each node forming its
    initial reachable set and shipping it to all
    neighbors. (R1)
  • Each neighbor updates the reachable set with its
    own neighborhood set, and forwards the resulting
    reachable set to its neighbors. (R2)
  • Computes all-pairs paths: work is shared by all
    nodes.
  • Converges after Network-Diameter rounds of
    communication.

R1: reachable(S,D) :- link(S,D)
R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
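
The distance-vector flavor of this execution can be made concrete with a round-based simulation in which every node keeps its own reachable set and forwards only newly learned destinations to its upstream neighbors. This is a sketch under assumptions: the function name `distributed_reachable`, the per-node tables, and the message counter are illustrative, and all nodes are simulated in one process.

```python
from collections import defaultdict


def distributed_reachable(links):
    """Simulate R1/R2 with one reachable table per node and synchronous rounds."""
    upstream = defaultdict(set)  # z -> nodes s with link(s, z)
    table = defaultdict(set)     # s -> destinations known to be reachable from s
    for s, z in links:
        upstream[z].add(s)
        table[s].add(z)          # R1: direct neighbors
    delta = {s: set(dests) for s, dests in table.items()}
    rounds, messages = 1, 0      # the R1 step counts as the first round
    while delta:
        inbox = defaultdict(set)
        for z, new_dests in delta.items():
            for s in upstream[z]:            # R2: z tells s what z can now reach
                inbox[s] |= new_dests
                messages += len(new_dests)
        delta = {}
        for s, dests in inbox.items():
            fresh = dests - table[s]
            if fresh:
                table[s] |= fresh
                delta[s] = fresh
        if delta:
            rounds += 1
    return dict(table), rounds, messages
```

On the deck's nine-node example graph the simulation stops after 5 productive rounds, which matches the longest shortest path in that graph (a to i).
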
34
Distance Vector Routing
  • Routing Table Formation
  • An entry in the routing table:
    nextHop(src,dst,nexthop,cost)
  • R1: path(S,D,D,C) :- link(S,D,C)
  • R2: path(S,D,Z,C) :- link(S,Z,C1),
    path(Z,D,W,C2), C = C1 + C2
  • R3: bestPathLength(S,D,min<C>) :-
    path(S,D,Z,C)
  • R4: nextHop(S,D,Z,C) :- path(S,D,Z,C),
    bestPathLength(S,D,C)
  • Changes
  • New rules R3 and R4.
  • Path stores the next hop in path.
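
A minimal sketch of how the routing-table rules on this slide could be evaluated, assuming positive link costs and reading R4 as selecting the path entries whose cost equals bestPathLength. To keep the computation finite on cyclic graphs, the sketch keeps only the cheapest entry per (source, destination) while iterating, which is a form of aggregate selection; the function name `next_hops` and the dictionary encoding are assumptions.

```python
def next_hops(links):
    """links: {(src, dst): cost}. Returns {(src, dst): (next_hop, cost)} for best paths."""
    # R1: a direct link is a path whose next hop is the destination itself.
    best = {(s, d): (d, c) for (s, d), c in links.items()}
    changed = True
    while changed:
        changed = False
        for (s, z), c1 in links.items():
            # R2: extend a known path from z with the link (s, z); next hop becomes z.
            for (z2, d), (_, c2) in list(best.items()):
                if z2 != z or d == s:
                    continue
                cost = c1 + c2
                # R3/R4: keep only the minimum-cost entry for each (s, d).
                if (s, d) not in best or cost < best[(s, d)][1]:
                    best[(s, d)] = (z, cost)
                    changed = True
    return best
```

Because only strictly cheaper entries replace existing ones and costs are positive, the loop stops once every routing-table entry is at its minimum.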

35
Query Optimization Techniques
  • The Basics
  • Query Optimization
  • Reduce communication overhead
  • Magic Sets Rewrite
  • Left-Right Recursion Rewrite
  • Aggregate Selections
  • Reduce latency
  • Squaring Algorithm
  • Work Sharing
  • Queries over Dynamic Graphs

36
Opt 1: Magic Sets Rewrite
  • R1: magicSource(D) :- magicSource(S), link(S,D)
  • R2: reachable(S,D) :- magicSource(S), link(S,D)
  • R3: reachable(S,D) :- magicSource(S), link(S,Z),
    reachable(Z,D)
  • R4: magicSource(b)
  • R5: magicSource(e)
  • Query: reachable(M,N)

What if we do not need the recursive query to be
computed on a portion of the graph?
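
One way to picture the magic sets rewrite: first grow the set of magic nodes from the seed sources, then run the reachable rules only over links that leave a magic node. The sketch below follows that reading; the function name `magic_reachable` and the set encoding are assumptions, and everything runs in one process rather than across nodes.

```python
def magic_reachable(links, sources):
    """Restrict the reachable computation to the part of the graph seen from `sources`."""
    # Seed facts (magicSource(b), magicSource(e) on the slide) plus the propagation rule:
    # magicSource(D) :- magicSource(S), link(S,D).
    magic = set(sources)
    frontier = set(sources)
    while frontier:
        frontier = {d for (s, d) in links if s in frontier} - magic
        magic |= frontier
    # Only links whose source is a magic node take part in the query.
    relevant = {(s, d) for (s, d) in links if s in magic}
    reach = set(relevant)  # non-recursive reachable rule
    while True:
        new_facts = {(s, d) for (s, z) in relevant
                     for (z2, d) in reach if z == z2} - reach
        if not new_facts:
            return magic, reach
        reach |= new_facts
```

With sources b and e on the deck's example graph, nodes a and c never become magic, so no reachable facts are computed for them.
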
37
Combine common expressions
  • R1: magicSource(D) :- s(S,D)
  • R2: reachable(S,D) :- s(S,D)
  • R3: reachable(S,D) :- s(S,Z), reachable(Z,D)
  • R4: magicSource(b)
  • R5: magicSource(e)
  • R6: s(S,D) :- magicSource(S), link(S,D)
  • Query: reachable(M,N)

38
Optimized Magic Rewrite Plan I
R1: magicSource(D) :- s(S,D)
R2: reachable(S,D) :- s(S,D)
R3: reachable(S,D) :- s(S,Z), reachable(Z,D)
R4: magicSource(b)
R5: magicSource(e)
R6: s(S,D) :- magicSource(S), link(S,D)
Query: ?- reachable(M,N)

[Figure: query plan; rule operators shown in the order R3, R2, R1, R6.]
39
Optimized Magic Rewrite Plan II
R1: magicSource(D) :- s(S,D)
R2: reachable(S,D) :- s(S,D)
R3: reachable(S,D) :- s(S,Z), reachable(Z,D)
R4: magicSource(b)
R5: magicSource(e)
R6: s(S,D) :- magicSource(S), link(S,D)
Query: ?- reachable(M,N)

[Figure: query plan; rule operators shown in the order R3, R1, R2, R6.]
40
Executing Magic Reachable
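[Figure: example graph with nodes a through i, 0th iteration.]
  • Link facts at each node: a: l(a,b), l(a,c);
    b: l(b,d), l(b,e), l(b,g); c: l(c,e); d: l(d,f);
    e: l(e,f); f: l(f,h); g: l(g,f); h: l(h,i)
  • Reachable facts at b: r(b,d), r(b,e), r(b,g)
  • Reachable facts at e: r(e,f)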
41
Executing Magic Reachable
[Figure: example graph with nodes a through i, 1st iteration.]
  • Reachable facts shown: r(b,d), r(b,e), r(b,g);
    r(d,f); r(f,h); r(g,f)
  • Link tuples shown: l(b,e), l(b,d), l(b,g), l(e,f)
42
Executing Magic Reachable
[Figure: example graph, 2nd iteration.]
  • Reachable facts shown: r(b,d), r(b,e), r(b,g),
    r(b,f); r(e,f), r(e,h); r(d,f); r(f,h); r(g,f);
    r(h,i)
  • Link tuples shown: l(b,d), l(b,e), l(b,g),
    l(d,f), l(e,f), l(g,f), l(f,h)
43
Remarks on Magic Rewrite
  • Magic sets limit computation to a portion of the
    graph.
  • Limit by source and/or destination.
  • A few problems:
  • Does not help for undirected graphs.
  • Still some redundant fact generation.
  • Increased number of iterations.

[Figure: example graph with nodes a through i.]
44
Opt 2: Left-Right Recursion Rewrite
  • R1: reachable(S,D) :- magicSource(S), link(S,D)
  • R2: reachable(S,D) :- reachable(S,Z), link(Z,D)
  • R3: magicSource(b)
  • R4: magicSource(e)
  • Query: ?- reachable(M,N)

[Figure: query plan with operators for R2 and R1.]
45
Executing Left-Recursion Plan
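[Figure: example graph with nodes a through i, 1st iteration.]
  • Reachable facts at b: r(b,d), r(b,e), r(b,g)
  • Reachable facts at e: r(e,f)
  • Reachable tuples shown in the graph: r(b,e),
    r(b,d), r(b,g), r(e,f)
  • Link facts shown: l(d,f), l(f,h), l(g,f), l(h,i)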
46
Executing Left-Recursion Plan
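[Figure: example graph with nodes a through i, 2nd iteration.]
  • Reachable facts at b: r(b,d), r(b,e), r(b,g),
    r(b,f)
  • Reachable facts at e: r(e,f), r(e,h)
  • Reachable tuples shown in the graph: r(b,f)
    (three times), r(e,h)
  • Link facts shown: l(d,f), l(f,h), l(g,f)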
47
Executing Left-Recursion Plan
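[Figure: example graph with nodes a through i, 3rd iteration.]
  • Reachable facts at b: r(b,d), r(b,e), r(b,g),
    r(b,f), r(b,h)
  • Reachable facts at e: r(e,f), r(e,h), r(e,i)
  • Reachable tuples shown in the graph: r(b,h),
    r(e,i)
  • Link facts shown: l(d,f), l(f,h), l(g,f), l(h,i)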
48
Remarks on Left Rewrite
  • Resembles Dynamic Source Routing (DSR): same
    query, different routing protocol!
  • Lower communication overhead when there are few
    source nodes.
  • Sharing is not implicit in the query. Each node
    computes its facts independently of other nodes.

R1: reachable(S,D) :- magicSource(S), link(S,D)
R2: reachable(S,D) :- reachable(S,Z), link(Z,D)
R3: magicSource(b)
R4: magicSource(e)
Query: ?- reachable(M,N)

[Figure: example graph with nodes a through i.]
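
The left-recursive plan can be sketched as a frontier expansion that starts only at the magic sources: each newly derived reachable(S,Z) is sent to Z, joined with Z's local links, and the results extend S's set. The function name `left_recursive_reachable` and the `shipped` counter (which counts distinct tuples, not the duplicate deliveries drawn on the slides) are assumptions for illustration.

```python
from collections import defaultdict


def left_recursive_reachable(links, sources):
    """Evaluate reachable(S,D) :- reachable(S,Z), link(Z,D) for the given sources only."""
    out = defaultdict(set)  # z -> destinations of z's local link facts
    for z, d in links:
        out[z].add(d)
    # Base rule: reachable(S,D) :- magicSource(S), link(S,D).
    reach = {(s, d) for s in sources for d in out[s]}
    delta = set(reach)
    shipped = 0
    while delta:
        shipped += len(delta)  # each new reachable(S,Z) tuple is shipped to node Z
        derived = {(s, d) for (s, z) in delta for d in out[z]}  # join at Z
        delta = derived - reach
        reach |= delta
    return reach, shipped
```

Only facts whose source is b or e are ever produced, which is where the lower communication overhead for few source nodes comes from.
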
49
Work Sharing
  • Sharing among queries with identical rules
  • If all nodes are running the same query, use
    right-recursion.
  • If only a few nodes are running the same query,
    use left-recursion.
  • Switching from left to right recursion can be
    expressed as a rewrite:

R1: reachable(S,D) :- magicSource(S), link(S,D)
R2: reachable(S,D) :- ¬magicSource(Z),
    reachable(S,Z), link(Z,D)
R3: reachable(S,D) :- magicSource(Z),
    reachable(S,Z), reachable(Z,D)

[Figure: example graph with nodes a through i.]
50
Work Sharing
  • Merge common expressions for queries with
    different rules

R1: path(S,D,P,C,L) :- link(S,D,C2,L2),
    path(S,D,P1,C1,L1),
    P = concatPath(link(X,Z), P1),
    C = FUN1(C1,C2), L = FUN2(L1,L2)
R2: path(S,D,P,C) :- link(X,Y,C),
    P = concatPath(link(X,Y), nil)
R3: bestPath(S,D,AGG1<C>,AGG2<L>) :-
    path(S,D,P,C,L)
51
Queries over Dynamic Graphs
  • In practice, queries (both routing and
    monitoring) are long running.
  • Timestamp derived facts
  • Each base fact is maintained as soft-state with
    timestamp.
  • Timestamp derived facts based on oldest base fact
    used in computation.
  • Maintain state of long-running queries
  • Incremental computation when new base facts are
    added.
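
A rough sketch of the timestamping idea above: base link facts carry a timestamp and expire after a time-to-live, and every derived reachable fact is stamped with the oldest base fact used to derive it. The function name `derive_with_timestamps`, the `ttl` parameter, and the choice to keep the freshest derivation when several exist are assumptions, not details from the slides.

```python
def derive_with_timestamps(links, now, ttl):
    """links: {(src, dst): timestamp}. Returns {(src, dst): timestamp} for reachable facts."""
    # Soft state: base facts older than ttl are treated as expired.
    live = {edge: t for edge, t in links.items() if now - t <= ttl}
    reach = dict(live)  # one-hop facts inherit the timestamp of their link
    changed = True
    while changed:
        changed = False
        for (s, z), t1 in live.items():
            for (z2, d), t2 in list(reach.items()):
                if z2 != z:
                    continue
                stamp = min(t1, t2)  # oldest base fact used in this derivation
                # Assumption: among alternative derivations, keep the freshest one.
                if stamp > reach.get((s, d), float("-inf")):
                    reach[(s, d)] = stamp
                    changed = True
    return reach
```

A derived fact then expires no later than the base fact it depends on, so a long-running query can discard it with the same soft-state rule.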

52
Current Status
  • PIER supports cycles in query plans.
  • Naïve implementations of web and Gnutella
    crawlers. Test-runs on PlanetLab.
  • PIER hand-optimized dataflows
  • Left vs Right recursion.
  • Magic Sets.
  • Semi-naïve vs Squaring algorithm.
  • Aggregate Selections.
  • Reachable, Shortest Path, Diameter Queries.
  • Static graphs, no sharing.
  • Initial PIER simulation results detailed in tech
    report
  • Querying Network Graphs with Recursive Queries.
    UC Berkeley Technical Report UCB//CSD-4-1332, Jun
    2004.

53
Datalog → Optimized Plan
  • Bulk of research work.
  • Data placement.
  • Magic Sets
  • Directed vs Undirected.
  • How many participating nodes?
  • Left vs Right recursion
  • How many participating nodes?
  • Aggregate Selections
  • Density of network?
  • Monotonic aggregate?
  • Semi-naïve vs Squaring
  • Density of network.
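
The semi-naïve vs squaring tradeoff listed above can be illustrated by counting iterations. Semi-naïve evaluation extends paths by one link per iteration, while squaring joins reachable with itself and so roughly doubles the covered path length each time, at the price of joining many more tuples. Both function names below are hypothetical, and the iteration counts exclude the initial one-hop step.

```python
def seminaive_iterations(links):
    """Extend only the newly derived facts by one link per iteration."""
    reach, delta, iterations = set(links), set(links), 0
    while True:
        delta = {(s, d) for (s, z) in links
                 for (z2, d) in delta if z == z2} - reach
        if not delta:
            return reach, iterations
        reach |= delta
        iterations += 1


def squaring_iterations(links):
    """Join the reachable set with itself: reachable(S,D) from reachable(S,Z), reachable(Z,D)."""
    reach, iterations = set(links), 0
    while True:
        new_facts = {(s, d) for (s, z) in reach
                     for (z2, d) in reach if z == z2} - reach
        if not new_facts:
            return reach, iterations
        reach |= new_facts
        iterations += 1
```

On the deck's nine-node example graph, whose longest shortest path is 5 hops, semi-naïve needs 4 productive iterations and squaring needs 3; the gap widens on graphs with larger diameter.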

54
Datalog → Optimized Plan
  • Work-sharing / Materialization
  • How many participating nodes? Sources?
    Destinations?
  • Presence of similar overlapping queries?
  • Presence of queries with common rules or
    correlated metrics?
  • Presence of hub nodes?
  • Queries restricted to a domain?
  • Good vs Best plan?
  • What if conditions change during query execution?
    The optimal plan may become sub-optimal, which
    calls for adaptive optimization techniques.

55
Research Plan (22-month timeline)
Infrastructure
  • Phase I (Sept 04 - Dec 04)
  • Hand-optimized dataflows
  • Tradeoffs of different query plans
  • Basic sharing
  • Long-running queries
  • Phase II (Jan 05 - Dec 05)
  • Datalog parser
  • Automatic plan generation
  • Enhanced sharing
  • Other advanced features
Applications
  • Phase I (Nov 04 - Aug 05)
  • Gnutella Monitoring Service
  • Routing Infrastructure Service
  • Phase II (Sept 05 - Dec 05)
  • Decentralized Focused Web Crawler
  • Study of Different DHTs under churn

Jan-Jun 2006: Wrap up