Querying and Routing in Next-Generation Networks

1
Querying and Routing in Next-Generation Networks

Boon Thau Loo
Ph.D. Qualifying Exam Proposal
24 Aug 2004

2
Current Internet

Current Internet Architecture
Host-centric protocols defined in terms of IP
addresses.
Routing functionality is embedded in
infrastructure.
Two limitations:
Difficult to locate dynamic objects by name.
Applications have little control over the path
followed by their packets.

3
Data-centric Networks

Data independence: allow users to name (and
query) data regardless of location.
E.g., Distributed Hash Tables (DHT).
Addresses the first limitation, but not the
second (inflexible routing).

4
Evolution of my work

PIER: Relational Query Processor on DHTs.
Querying the Internet with PIER.
Application:
Enhancing P2P File-Sharing with PIER.
Network Monitoring:
Querying (Gathering) Network Topologies.
Influencing base network functionality:
Customizable Routing with Queries.

5
Customizable Routing

Emerging topic in networking community.
Why is customizing routing important?
Flexibility: support different application
requirements.
Evolvability of routing infrastructure.
Some existing solutions:
Overlay networks (multicast, RON).
Active Networks.
Recent ideas: i3, Routing Service, NIRA.
Proposed solution: Customizable Routing with
Declarative (Recursive) Queries.

6
My Unifying Theme

The synergy between Query Processing and Routing
in networks.
Three main contributions:
Compare P2P search performance for two P2P
architectures.
Querying Network Topologies with Declarative
(Recursive) Queries.
Customizable Routing with Declarative (Recursive)
Queries.

7
Roadmap

P2P Search: A Comparative Study.
Querying and Routing in Networks with Recursive
Queries.
Research Plans and Timeline.

8
P2P Search: A Comparative Study

Qualifying Exam Proposal
Part I

9
Problem Statement

P2P Search
Flooding (Unstructured) vs. DHTs (Structured)?
Lots of debate, lots of papers, little
consensus.
Why study P2P Search?
Canonical P2P application. Live workloads.
Good stress test on any P2P design.
More robustness. No single point of failure.
Social issues:
More resistant than centralized systems to
censorship and manipulated rankings.
RIAA.

10
Distributed Hash Tables (DHT)

Hash table interface:
put(key, object), get(key)
Properties:
If an object exists in the network, it can always be
found.
Scalable: O(log n) hops and state.
Robust: self-configuring and resilient to
failures and churn.
DHT search:
Inverted lists indexed by keyword.
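To make the put/get interface and keyword search concrete, here is a minimal single-process sketch, assuming SHA-1 identifiers on a ring and successor-based key placement. The names ToyDHT, ring_id, publish, and search are hypothetical, and the O(log n) multi-hop routing of a real DHT is deliberately not modelled: owner() computes the successor directly.

```python
import hashlib
import re
from bisect import bisect_left
from typing import Dict, List, Set


def ring_id(name: str, bits: int = 32) -> int:
    """Hash a node name or key onto a 2**bits identifier ring (truncated SHA-1)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (2 ** bits)


class ToyDHT:
    """Hypothetical stand-in for a DHT: each key is stored on its successor node."""

    def __init__(self, node_names: List[str]):
        self.ring = sorted((ring_id(n), n) for n in node_names)
        self.ids = [i for i, _ in self.ring]
        self.store: Dict[str, Dict[str, Set[str]]] = {n: {} for n in node_names}

    def owner(self, key: str) -> str:
        # First node whose identifier is >= the key's identifier, wrapping around.
        idx = bisect_left(self.ids, ring_id(key)) % len(self.ring)
        return self.ring[idx][1]

    def put(self, key: str, obj: str) -> None:
        self.store[self.owner(key)].setdefault(key, set()).add(obj)

    def get(self, key: str) -> Set[str]:
        return set(self.store[self.owner(key)].get(key, set()))


def keywords(filename: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", filename.lower()))


def publish(dht: ToyDHT, filename: str) -> None:
    # One inverted-list posting per keyword: put(keyword, filename).
    for kw in keywords(filename):
        dht.put(kw, filename)


def search(dht: ToyDHT, query: str) -> Set[str]:
    # Fetch the inverted list of every query keyword and intersect them.
    lists = [dht.get(kw) for kw in keywords(query)]
    return set.intersection(*lists) if lists else set()
```

For example, after publishing "led zeppelin stairway.mp3" and "zeppelin kashmir.mp3" into ToyDHT([f"node{i}" for i in range(8)]), a search for "zeppelin stairway" intersects the two inverted lists and returns only the first file.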

11
Two Workloads

P2P Web Search (IPTPS 03)
P2P File Sharing (IPTPS 04, VLDB 04)
Less demanding application:
Smaller dataset (millions of files).
Index by filenames and metadata.
Less stringent user requirements.
Replicas of items follow a long-tailed
distribution.
Popular items at head of distribution.
Rare items at tail of distribution.

12
Gnutella Measurements

Main areas of study:
Gnutella topology
Crawl from multiple vantage points on PlanetLab.
Search quality
Measure query result size and latency.
Reissue Gnutella queries from 30 LimeWire
Ultrapeers simultaneously on PlanetLab.
Approximate the perfect answer.
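One way to read "approximate the perfect answer" is to take the union of the results that all vantage points return for a query as a stand-in for the complete answer, then measure each vantage point against it. The sketch below assumes exactly that; approximate_recall and the result format (vantage-point name mapped to a set of result identifiers) are hypothetical.

```python
from typing import Dict, Set


def approximate_recall(results: Dict[str, Set[str]]) -> Dict[str, float]:
    """Treat the union of all vantage points' results for one query as the
    approximate perfect answer and report each vantage point's recall."""
    perfect = set().union(*results.values())
    if not perfect:
        raise ValueError("no vantage point returned any result")
    return {peer: len(found) / len(perfect) for peer, found in results.items()}
```

With three hypothetical ultrapeers returning {f1, f2}, {f2, f3}, and {f1, f2, f3, f4}, the approximate perfect answer has four items and the recalls are 0.5, 0.5, and 1.0.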

13
Summary of Measurements

Queries with few results are searching for rare
items.
Searching on Gnutella:
Highly effective for popular items.
Less effective for rare items.
Significant opportunity to do better.
Large fraction of queries return few or no
results even when they exist.
Poor response times for queries on rare items.

14
Hybrid Solution
Flood-based Network (All items)
DHT (Index Rare Items)
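The hybrid idea can be sketched as a query path that floods first and falls back to the DHT index of rare items only when flooding returns few results. The function hybrid_search and the threshold of 20 results are illustrative assumptions, not parameters taken from the study; flood and dht_lookup are placeholders for the two search mechanisms.

```python
from typing import Callable, Set


def hybrid_search(query: str,
                  flood: Callable[[str], Set[str]],
                  dht_lookup: Callable[[str], Set[str]],
                  threshold: int = 20) -> Set[str]:
    """Flood first (cheap and effective for popular items); if the flood
    returns fewer than `threshold` results, the item is probably rare, so
    also consult the DHT index of rare items."""
    results = flood(query)
    if len(results) < threshold:
        results = results | dht_lookup(query)
    return results
```

The design choice is that popular queries never touch the DHT, so the structured index only needs to hold (and answer queries for) the long tail of rare items.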
15
PlanetLab Deployment
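[Figure: hybrid deployment. Gnutella leaves (L) attach to Gnutella ultrapeers (U1, U2) and to hybrid ultrapeers (P1, P2, P3, PIER Gnutella) over Gnutella links; the hybrid ultrapeers are also connected by PIER links. The search horizon of P1 is marked.]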
16
Gnutella Measurement Study: Important Lessons

Gathering the Gnutella topology requires recursive
link traversals.
Knowledge of the topology can improve the quality of
search.
Challenge:
Can we perform topology discovery efficiently and
accurately?
If so, what other functionality can we provide?
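Topology discovery as a recursive link traversal can be sketched as a breadth-first crawl that asks each newly discovered peer for its neighbor list. In this sketch crawl, get_neighbors, and max_nodes are hypothetical; get_neighbors stands in for whatever protocol exchange returns a peer's neighbors, and is assumed to raise ConnectionError for unreachable peers.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set


def crawl(seed: str,
          get_neighbors: Callable[[str], Iterable[str]],
          max_nodes: int = 10_000) -> Dict[str, Set[str]]:
    """Breadth-first topology crawl: each discovered peer is asked for its
    neighbors, and those neighbors are queued in turn (recursive traversal)."""
    topology: Dict[str, Set[str]] = {}
    frontier = deque([seed])
    while frontier and len(topology) < max_nodes:
        peer = frontier.popleft()
        if peer in topology:
            continue                      # already crawled via another path
        try:
            neighbors = set(get_neighbors(peer))
        except ConnectionError:
            neighbors = set()             # unreachable: record it with no known links
        topology[peer] = neighbors
        frontier.extend(n for n in neighbors if n not in topology)
    return topology
```

A single crawler like this sees the network only from one vantage point and at one moment; the measurement study's use of multiple PlanetLab vantage points is one way to improve coverage.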

17
Querying and Routing in Networks with Recursive
Queries

Qualifying Exam Proposal
Part II

18
Introduction

The Internet is made up of distributed graphs!
IP Routers.
Overlay networks.
WWW Hypertext structures.
Recursive queries could be used to discover
topologies.
A recursive query engine is an attractive routing
infrastructure!
We'll see that routing protocols (DV, DSR) are just
recursive queries.
Customizable: end-hosts can express their own
route preferences.
Efficient? Query optimization techniques.

19
Outline

Intro to Recursive Queries.
Applications.
Recursive Query Processing and Optimization
Techniques.
Current Status.
Research Agenda and Timeline.

20
Background: Datalog Program
R1: reachable(S,D) :- link(S,D)
R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
Query: ?- reachable(M,N)

R1: 1-hop reachability.
R2: reachability over 2 or more hops.
Query reachable(a,N): all nodes reachable from node a.

[Figure: R2 joins link L(S,Z) with reachable R(Z,D) to derive R(S,D).]
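A centralized reference point helps before distributing anything. The sketch below evaluates R1 and R2 with semi-naive evaluation: each round joins link only with the facts that were new in the previous round, so no derivation is recomputed. The function name reachable and the tuple encoding of facts are illustrative choices.

```python
from typing import Set, Tuple

Fact = Tuple[str, str]


def reachable(links: Set[Fact]) -> Set[Fact]:
    """Semi-naive evaluation of
         R1: reachable(S,D) :- link(S,D)
         R2: reachable(S,D) :- link(S,Z), reachable(Z,D)"""
    result = set(links)                    # R1
    delta = set(links)                     # facts new in the last round
    while delta:
        # R2, joined only against the delta of the previous round.
        new = {(s, d) for (s, z) in links for (z2, d) in delta if z == z2}
        delta = new - result               # keep only genuinely new facts
        result |= delta
    return result
```

For the example network used in the following slides (links a→b, a→c, b→d, b→e, b→g, c→e, d→f, e→f, g→f, f→h, h→i), answering reachable(a,N) means selecting the facts with source a, which yields b, c, d, e, f, g, h, and i.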
21
Datalog Facts

Base Facts
Supplied to query processor.
node(nodeID, load, ...), where nodeID is either an IP
address or a DHT identifier.
link(source, destination, cost, ...)
Derived Facts
Intermediate data
reachable(source, destination)
path(source, destination, path, cost)
Result Facts
Sent back as query results, or stored in
network.
E.g., ShortestPath, NextHop

22
Execution Model

Distributed Query Processing
Each node embeds query processing functionality.
PIER is an example system, but this model is not
constrained to DHTs.
Each query processor has access to local base
facts.
Query execution:
A recursive query is issued by one of the nodes.
It is disseminated to all or a subset of the other nodes for
execution.
Wrappers for external networks:
PIER runs a recursive crawl query to gather
information on the external network.
Each node is responsible for monitoring a subset of
external network nodes.

23
New Challenges

Different metrics:
Centralized: I/O, CPU, number of facts.
Distributed: communication overhead, latency.
Network is dynamic and soft-state.
Long running, concurrent queries.

24
Parallel and Distributed Deductive Databases

Parallel:
Hash-based partitioning (data fragmentation).
Direct (matrix methods).
Distributed:
Semantic fragmentation (disconnection set).
Main differences:
Multi-hop environment.
Relationship to routing algorithms.
Setting up network state.
Dynamic graphs.
Long-running, concurrent queries.
Smaller scale compared to ours.

25
App 1: Network Topology Monitoring

Gnutella Monitoring Service:
Search horizon statistics (number of nodes,
files).
Diameter of the network.
Robustness of the network.
Direct search queries towards high-degree nodes.
Study a DHT under churn:
Dynamic resilience: how many possible live
paths are there between any two nodes?
Average path length: given a routing algorithm,
what is the average number of hops between any
two nodes?
Check for invariants.
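Two of the monitoring statistics above, diameter and average path length, reduce to all-pairs hop distances over the gathered topology. This sketch assumes unit hop costs, computes distances with one breadth-first search per source, and averages only over ordered pairs that are actually connected; hop_distances and topology_stats are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Tuple


def hop_distances(adj: Dict[str, List[str]], src: str) -> Dict[str, int]:
    """Breadth-first search: minimum hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def topology_stats(adj: Dict[str, List[str]]) -> Tuple[int, float]:
    """Return (diameter, average path length) over connected ordered pairs."""
    lengths: List[int] = []
    for s in adj:
        lengths += [h for t, h in hop_distances(adj, s).items() if t != s]
    if not lengths:
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)
```

On an undirected four-node line a-b-c-d, the twelve connected ordered pairs have hop counts summing to 20, so the diameter is 3 and the average path length is 20/12.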

26
App 2: Customizable Routing Infrastructure

Best-path routing:
Shortest path (distance vector)
Shortest-k-paths
Least-loaded path
Disjoint-paths greedy routing
Disjoint-k-paths (edge- and node-disjoint)
Dynamic Source Routing (DSR)
Policy decisions:
Paths that include/exclude certain nodes
Do not carry/trust traffic from certain nodes.
What are Datalog's limitations?
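As an example of a policy decision, "paths that exclude certain nodes" can be sketched as a shortest-path search that never enters an avoided node. Here Dijkstra's algorithm stands in for the declarative evaluation; policy_shortest_path, the adjacency format, and the avoid parameter are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple


def policy_shortest_path(adj: Dict[str, List[Tuple[str, float]]],
                         src: str, dst: str,
                         avoid: Set[str] = frozenset()) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's algorithm restricted to paths that exclude the nodes in avoid.
    Returns (cost, path) or None if no policy-compliant path exists."""
    if src in avoid or dst in avoid:
        return None
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:                       # reconstruct the path via predecessors
            path = [u]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist[u]:
            continue                       # stale heap entry
        for v, w in adj.get(u, []):
            if v in avoid:
                continue                   # policy: never route through v
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None
```

Excluding a node simply prunes it from the search, so the same query serves different end-host policies by changing only the avoid set.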

27
Recursive Query Processing in Networks

Introduction to Recursive Queries
Applications
Recursive Query Processing and Optimization
The Basics
Datalog → Query Plan
Query Execution
Query Optimization Techniques
Work-Sharing
Queries over Dynamic Graphs
Current Status
Research Plan

28
Datalog → Query Plan

R1: reachable(S,D) :- link(S,D)
R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
Query: ?- reachable(M,N)

[Figure: query plan for R1 and R2; derived tuples are shipped to table.field.]
29
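Query Execution
0th Iteration

R1: reachable(S,D) :- link(S,D)
R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
Query: ?- reachable(M,N)

[Figure: example network with links l(a,b), l(a,c), l(b,d), l(b,e), l(b,g), l(c,e), l(d,f), l(e,f), l(g,f), l(f,h), l(h,i). In the 0th iteration each node applies R1 to its local links: a derives r(a,b), r(a,c); b derives r(b,d), r(b,e), r(b,g); c derives r(c,e); d derives r(d,f); e derives r(e,f); g derives r(g,f); f derives r(f,h); h derives r(h,i).]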
30
Query Execution
1st Iteration

R1: reachable(S,D) :- link(S,D)
R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
Query: ?- reachable(M,N)

[Figure: same network in the 1st iteration; the facts derived in the 0th iteration (e.g., r(b,d), r(b,e), r(b,g) at b) are shipped across the links (e.g., l(a,b)) for the R2 join.]
31
Network-Reachability Query
2nd Iteration

R1: reachable(S,D) :- link(S,D)
R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
Query: ?- reachable(M,N)

[Figure: facts known after the 2nd iteration. a: r(a,b), r(a,c), r(a,e), r(a,d), r(a,g); b: r(b,d), r(b,e), r(b,g), r(b,f); c: r(c,e), r(c,f); d: r(d,f), r(d,h); e: r(e,f), r(e,h); f: r(f,h), r(f,i); g: r(g,f), r(g,h); h: r(h,i).]
32
Network-Reachability Query
3rd Iteration

R1: reachable(S,D) :- link(S,D)
R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
Query: ?- reachable(M,N)

[Figure: facts known after the 3rd iteration. a: r(a,b), r(a,c), r(a,e), r(a,d), r(a,g), r(a,f); b: r(b,d), r(b,e), r(b,g), r(b,f), r(b,h); c: r(c,e), r(c,f), r(c,h); d: r(d,f), r(d,h), r(d,i); e: r(e,f), r(e,h), r(e,i); f: r(f,h), r(f,i); g: r(g,f), r(g,h), r(g,i); h: r(h,i).]
33
Remarks

Resembles the distance-vector protocol:
Computation begins with the initial reachable set,
which each node ships to all of its neighbors (R1).
Each neighbor updates the reachable set with its own
neighborhood set and forwards the resulting reachable
set to its neighbors (R2).
Computes all-pairs paths: work is shared by all
nodes.
Converges after a number of communication rounds
equal to the network diameter.

R1: reachable(S,D) :- link(S,D)
R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
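The convergence remark can be checked with a round-based simulation, assuming synchronous rounds and one message per (sender, receiver) pair per round. Node S keeps its own reachable(S,*) facts; in each round every node Z ships the facts it learned in the previous round to each node S with link(S,Z), and S applies R2 locally. The function distributed_reachable and the message accounting are hypothetical simplifications, not PIER's actual dataflow.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def distributed_reachable(links: Set[Tuple[str, str]]):
    """Simulate the right-recursive plan; returns (known, rounds, messages)."""
    in_nbrs: Dict[str, Set[str]] = defaultdict(set)
    for s, z in links:
        in_nbrs[z].add(s)
    nodes = {n for link in links for n in link}
    known: Dict[str, Set[str]] = {n: set() for n in nodes}  # known[S] = {D : reachable(S,D)}
    for s, d in links:                                       # R1, evaluated locally
        known[s].add(d)
    delta = {n: set(known[n]) for n in nodes}                # facts not yet shipped
    rounds = messages = 0
    while any(delta.values()):
        rounds += 1
        inbox: Dict[str, Set[str]] = defaultdict(set)
        for z, facts in delta.items():
            if not facts:
                continue
            for s in in_nbrs[z]:            # ship reachable(Z,*) over link(S,Z)
                messages += 1
                inbox[s] |= facts           # R2 at S: reachable(S,D) for each D
        delta = {n: inbox[n] - known[n] for n in nodes}
        for n in nodes:
            known[n] |= delta[n]
    return known, rounds, messages
```

On the example network, facts with a shortest hop distance of k are learned in round k-1, so the longest one, r(a,i) at 5 hops, arrives in round 4 and round 5 learns nothing new: the simulation runs 5 rounds, matching the diameter.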
34
Distance Vector Routing

Routing table formation:
An entry in the routing table: nextHop(src,dst,nexthop,cost)
R1: path(S,D,D,C) :- link(S,D,C)
R2: path(S,D,Z,C) :- link(S,Z,C1),
path(Z,D,W,C2), C = C1 + C2
R3: bestPathLength(S,D,min<C>) :-
path(S,D,Z,C)
R4: nextHop(S,D,Z,C) :- path(S,D,Z,C),
bestPathLength(S,D,C)

Changes:
New rules R3 and R4.
path stores the next hop of the path.
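The effect of R1 to R4 can be sketched with a Bellman-Ford-style fixpoint. Instead of materializing every path(S,D,Z,C) fact and then taking the minimum, the sketch keeps only the cheapest path per (S,D) as it is derived, which is an aggregate-selection-style pruning. Assumptions: link costs are non-negative, and on ties the first path found is kept, whereas R4 would emit one nextHop fact per tied path. distance_vector is a hypothetical name.

```python
import math
from typing import Dict, Optional, Tuple


def distance_vector(links: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], Tuple[str, float]]:
    """Return {(S, D): (nextHop, cost)} for every reachable pair S != D."""
    best: Dict[Tuple[str, str], Tuple[float, Optional[str]]] = {}
    for (s, d), c in links.items():                  # R1: path(S,D,D,C)
        if c < best.get((s, d), (math.inf, None))[0]:
            best[(s, d)] = (c, d)
    changed = True
    while changed:                                   # iterate R2 to a fixpoint
        changed = False
        for (s, z), c1 in links.items():
            for (z2, d), (c2, _) in list(best.items()):
                if z2 != z or d == s:
                    continue
                c = c1 + c2                          # C = C1 + C2
                if c < best.get((s, d), (math.inf, None))[0]:
                    best[(s, d)] = (c, z)            # keep only the best (R3); next hop is Z
                    changed = True
    # R4: the surviving entries are the nextHop facts.
    return {(s, d): (z, c) for (s, d), (c, z) in best.items()}
```

For links a→b (1), b→c (1), a→c (5), and c→d (1), the direct link a→c loses to the two-hop route, so nextHop for (a,c) is b with cost 2, and for (a,d) it is b with cost 3.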

35
Query Optimization Techniques

The Basics
Query Optimization
Reduce communication overhead:
Magic Sets Rewrite
Left-Right Recursion Rewrite
Aggregate Selections
Reduce latency:
Squaring Algorithm
Work Sharing
Queries over Dynamic Graphs
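The latency argument for the squaring algorithm can be sketched by counting iterations. Linear recursion extends known paths by one link per iteration, while squaring, reachable(S,D) :- reachable(S,Z), reachable(Z,D), doubles the maximum known path length each iteration. Both functions below are hypothetical centralized sketches; the squaring version recomputes the full join every iteration, trading extra work per round for fewer rounds.

```python
from typing import Set, Tuple

Fact = Tuple[int, int]


def closure_linear(links: Set[Fact]) -> Tuple[Set[Fact], int]:
    """Semi-naive linear recursion; returns (facts, iterations)."""
    result, delta, iters = set(links), set(links), 0
    while delta:
        iters += 1
        new = {(s, d) for (s, z) in links for (z2, d) in delta if z == z2}
        delta = new - result
        result |= delta
    return result, iters


def closure_squaring(links: Set[Fact]) -> Tuple[Set[Fact], int]:
    """Squaring: join reachable with itself until nothing new appears."""
    result, iters = set(links), 0
    while True:
        iters += 1
        new = {(s, d) for (s, z) in result for (z2, d) in result if z == z2}
        if new <= result:
            return result, iters
        result |= new
```

On a 16-link chain (diameter 16), both produce the same 136 facts, but linear recursion needs 16 iterations while squaring needs 5 (known path lengths grow 1, 2, 4, 8, 16, plus one final check).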

36
Opt 1: Magic Sets Rewrite

R1: magicSource(D) :- magicSource(S), link(S,D)
R2: reachable(S,D) :- magicSource(S), link(S,D)
R3: reachable(S,D) :- magicSource(S), link(S,Z),
reachable(Z,D)
R4: magicSource(b)
R5: magicSource(e)
Query: ?- reachable(M,N)

What if we only need the recursive query to be
computed on a portion of the graph?
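A centralized sketch of the magic-sets rewrite: R1 first grows the magic set along links from the seed facts R4 and R5, and R2/R3 then derive reachable facts only for sources in that set. Because every node reachable from a magic source is itself magic, R3 still computes the complete reachable set of each magic source. magic_reachable is a hypothetical name.

```python
from typing import Iterable, Set, Tuple

Fact = Tuple[str, str]


def magic_reachable(links: Set[Fact], seeds: Iterable[str]) -> Tuple[Set[str], Set[Fact]]:
    """Return (magic set, reachable facts restricted to magic sources)."""
    magic = set(seeds)                                  # R4, R5
    frontier = set(magic)
    while frontier:                                     # R1: magicSource(D) :- magicSource(S), link(S,D)
        frontier = {d for (s, d) in links if s in frontier} - magic
        magic |= frontier
    relevant = {(s, d) for (s, d) in links if s in magic}
    result, delta = set(relevant), set(relevant)        # R2
    while delta:                                        # R3, evaluated semi-naively
        new = {(s, d) for (s, z) in relevant for (z2, d) in delta if z == z2}
        delta = new - result
        result |= delta
    return magic, result
```

With seeds b and e on the example network, nodes a and c never become magic, so no reachable facts are derived for them.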
37
Combine common expressions

R1: magicSource(D) :- s(S,D)
R2: reachable(S,D) :- s(S,D)
R3: reachable(S,D) :- s(S,Z), reachable(Z,D)
R4: magicSource(b)
R5: magicSource(e)
R6: s(S,D) :- magicSource(S), link(S,D)
Query: ?- reachable(M,N)

38
Optimized Magic Rewrite Plan I
R1: magicSource(D) :- s(S,D)
R2: reachable(S,D) :- s(S,D)
R3: reachable(S,D) :- s(S,Z), reachable(Z,D)
R4: magicSource(b)
R5: magicSource(e)
R6: s(S,D) :- magicSource(S), link(S,D)
Query: ?- reachable(M,N)
[Figure: dataflow plan I with operators for R3, R2, R1, and R6.]
39
Optimized Magic Rewrite Plan II
R1: magicSource(D) :- s(S,D)
R2: reachable(S,D) :- s(S,D)
R3: reachable(S,D) :- s(S,Z), reachable(Z,D)
R4: magicSource(b)
R5: magicSource(e)
R6: s(S,D) :- magicSource(S), link(S,D)
Query: ?- reachable(M,N)
[Figure: dataflow plan II with operators for R3, R1, R2, and R6.]
40
Executing Magic Reachable
0th Iteration
[Figure: same network; only the magic sources derive facts. b derives r(b,d), r(b,e), r(b,g) from its links and e derives r(e,f); no facts are derived at a or c.]
41
Executing Magic Reachable
1st Iteration
[Figure: the magic set grows along links to d, f, and g, which derive r(d,f), r(f,h), and r(g,f); b holds r(b,d), r(b,e), r(b,g).]
42
Executing Magic Reachable
2nd Iteration
[Figure: facts derived so far: r(b,d), r(b,e), r(b,g), r(b,f); r(e,f), r(e,h); r(d,f); r(f,h); r(g,f); r(h,i).]
43
Remarks on Magic Rewrite
Magic sets limit computation to a portion of the
graph.
Limit by source and/or destination.
A few problems:
Does not help for undirected graphs (in a connected
undirected graph, the magic set grows to cover every node).
Still some redundant fact generation.
Increased number of iterations.
44
Opt 2: Left-Right Recursion Rewrite

R1: reachable(S,D) :- magicSource(S), link(S,D)
R2: reachable(S,D) :- reachable(S,Z), link(Z,D)
R3: magicSource(b)
R4: magicSource(e)
Query: ?- reachable(M,N)

[Figure: query plan for the left-recursive rules R1 and R2.]
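The left-recursive rules can be sketched as each magic source extending its own frontier one link at a time. Carrying the accumulated path with each fact shows why this resembles DSR route discovery; because the frontier grows breadth-first, the first route recorded for each (S,D) pair has the fewest hops. left_recursive_routes and the path payload are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def left_recursive_routes(links: Iterable[Tuple[str, str]],
                          sources: Iterable[str]) -> Dict[Tuple[str, str], List[str]]:
    """R1: reachable(S,D) :- magicSource(S), link(S,D)
       R2: reachable(S,D) :- reachable(S,Z), link(Z,D)
    Returns {(S, D): source route from S to D}."""
    out: Dict[str, List[str]] = defaultdict(list)
    for z, d in links:
        out[z].append(d)
    routes: Dict[Tuple[str, str], List[str]] = {}
    frontier: List[Tuple[str, str]] = []
    for s in sources:                                  # R1
        for d in out[s]:
            if (s, d) not in routes:
                routes[(s, d)] = [s, d]
                frontier.append((s, d))
    while frontier:                                    # R2, one link per round
        next_frontier = []
        for s, z in frontier:
            for d in out[z]:
                if (s, d) not in routes:               # keep the first (fewest-hop) route
                    routes[(s, d)] = routes[(s, z)] + [d]
                    next_frontier.append((s, d))
        frontier = next_frontier
    return routes
```

Each source's computation is independent of the others, which matches the later remark that sharing is not implicit in the left-recursive query.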
45
Executing Left-Recursion Plan
1st Iteration
[Figure: b ships r(b,d), r(b,e), r(b,g) to d, e, and g; e ships r(e,f) to f.]
46
Executing Left-Recursion Plan
2nd Iteration
[Figure: d, e, and g each extend b's paths over their links to f, so r(b,f) arrives at f three times; r(e,h) is derived. b now holds r(b,d), r(b,e), r(b,g), r(b,f); e holds r(e,f), r(e,h).]
47
Executing Left-Recursion Plan
3rd Iteration
[Figure: r(b,h) and r(e,i) are derived. b now holds r(b,d), r(b,e), r(b,g), r(b,f), r(b,h); e holds r(e,f), r(e,h), r(e,i).]
48
Remarks on Left Rewrite
Resembles Dynamic Source Routing (DSR). Same
query, different routing protocol!
Lower communication overhead when there are few
source nodes.
Sharing is not implicit in the query. Each node
computes its facts independently of other nodes.

R1: reachable(S,D) :- magicSource(S), link(S,D)
R2: reachable(S,D) :- reachable(S,Z), link(Z,D)
R3: magicSource(b)
R4: magicSource(e)
Query: ?- reachable(M,N)
49
Work Sharing

Sharing among queries with identical rules:
If all nodes are running the same query, use
right recursion.
If only a few nodes are running the same query, use
left recursion.
Switching from left to right recursion can be
expressed as a rewrite:

R1: reachable(S,D) :- magicSource(S), link(S,D)
R2: reachable(S,D) :- ¬magicSource(Z),
reachable(S,Z), link(Z,D)
R3: reachable(S,D) :- magicSource(Z),
reachable(S,Z), reachable(Z,D)
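The rewrite can be sketched as a fixpoint over the three rules, reading the first condition of R2 as negation of magicSource(Z). Negation here applies only to the seed facts, so the program is stratified. When a source reaches another magic source Z, R3 reuses Z's reachable set instead of R2 re-exploring the graph beyond Z one link at a time. shared_left_recursion is a hypothetical name, and this centralized loop does not model where the facts are stored.

```python
from typing import Iterable, Set, Tuple

Fact = Tuple[str, str]


def shared_left_recursion(links: Set[Fact], sources: Iterable[str]) -> Set[Fact]:
    """Fixpoint of
       R1: reachable(S,D) :- magicSource(S), link(S,D)
       R2: reachable(S,D) :- not magicSource(Z), reachable(S,Z), link(Z,D)
       R3: reachable(S,D) :- magicSource(Z), reachable(S,Z), reachable(Z,D)"""
    magic = set(sources)
    facts = {(s, d) for (s, d) in links if s in magic}          # R1
    while True:
        r2 = {(s, d) for (s, z) in facts if z not in magic       # extend link by link
              for (z2, d) in links if z2 == z}
        r3 = {(s, d) for (s, z) in facts if z in magic           # reuse Z's results
              for (z2, d) in facts if z2 == z}
        new = (r2 | r3) - facts
        if not new:
            return facts
        facts |= new
```

With sources b and e on the example network, b reaches the magic source e and picks up e's facts through R3, while the final result is still the full reachable set of each source.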
50
Work Sharing

Merge common expressions for queries with
different rules:

R1: path(S,D,P,C,L) :- link(S,Z,C2,L2),
path(Z,D,P1,C1,L1),
P = concatPath(link(S,Z), P1),
C = FUN1(C1,C2), L = FUN2(L1,L2)
R2: path(S,D,P,C,L) :- link(S,D,C,L),
P = concatPath(link(S,D), nil)
R3: bestPath(S,D,AGG1<C>,AGG2<L>) :- path(S,D,P,C,L)
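A small sketch of the sharing idea: the path table, carrying both metrics, is materialized once, and two different queries then aggregate over it. The metric choices are hypothetical: C combines by addition (FUN1, a cost), L combines by min (FUN2, a bottleneck bandwidth), AGG1 is min over C, and AGG2 is max over L. Only loop-free paths are enumerated, which is exponential in general, so this is meant for tiny graphs.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]
PathRow = Tuple[str, str, Tuple[str, ...], float, float]   # (S, D, P, C, L)


def shared_paths(links: Dict[Link, Tuple[float, float]]) -> List[PathRow]:
    """Materialize path(S,D,P,C,L) once, for all loop-free paths."""
    out: Dict[str, List[str]] = {}
    for (s, d) in links:
        out.setdefault(s, []).append(d)
    rows: List[PathRow] = []

    def extend(path: List[str], cost: float, bw: float) -> None:
        last = path[-1]
        for nxt in out.get(last, []):
            if nxt in path:
                continue                        # keep paths loop-free
            c, b = links[(last, nxt)]
            p = path + [nxt]
            rows.append((path[0], nxt, tuple(p), cost + c, min(bw, b)))  # FUN1 = +, FUN2 = min
            extend(p, cost + c, min(bw, b))

    for s in out:
        extend([s], 0.0, float("inf"))
    return rows


def best_paths(rows: List[PathRow]):
    """Two queries over the same table: AGG1 = min cost, AGG2 = max bandwidth."""
    cheapest: Dict[Link, Tuple[Tuple[str, ...], float]] = {}
    widest: Dict[Link, Tuple[Tuple[str, ...], float]] = {}
    for s, d, p, c, l in rows:
        if (s, d) not in cheapest or c < cheapest[(s, d)][1]:
            cheapest[(s, d)] = (p, c)
        if (s, d) not in widest or l > widest[(s, d)][1]:
            widest[(s, d)] = (p, l)
    return cheapest, widest
```

For a→b→d (cost 1 per link, bandwidth 10) and a→c→d (cost 3 per link, bandwidth 100), the cheapest a-to-d path is a→b→d with cost 2, while the widest is a→c→d with bandwidth 100, both computed from one shared path table.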
51
Queries over Dynamic Graphs

In practice, queries (both routing and
monitoring) are long-running.
Timestamp derived facts:
Each base fact is maintained as soft state with a
timestamp.
Each derived fact is timestamped with the oldest base
fact used in its computation.
Maintain state of long-running queries:
Incremental computation when new base facts are
added.
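The soft-state scheme can be sketched for the reachable query. Each derivation of a fact is stamped with the oldest link it uses, and the fact keeps the freshest stamp across its derivations, so expiring by timestamp removes exactly the facts that no longer have a derivation built from unexpired links. Adding or refreshing a link propagates only the changes it causes instead of recomputing from scratch. The class SoftStateReachable, integer timestamps, and the ttl-based expire() are assumptions of this sketch.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


class SoftStateReachable:
    """Long-running reachable query over soft-state links (illustrative only)."""

    def __init__(self) -> None:
        self.links: Dict[Edge, int] = {}   # link -> time of last refresh
        self.reach: Dict[Edge, int] = {}   # reachable fact -> freshest derivation stamp

    def add_link(self, s: str, d: str, ts: int) -> None:
        """Insert or refresh link(s,d) and propagate only the resulting changes."""
        self.links[(s, d)] = ts
        work: List[Tuple[str, str, int]] = [(s, d, ts)]           # R1 on the new link
        work += [(s, v, min(ts, t))                               # R2: the new link joined
                 for (u, v), t in self.reach.items() if u == d]   # with reachable(d, v)
        while work:
            u, v, t = work.pop()
            if t <= self.reach.get((u, v), float("-inf")):
                continue                   # not fresher than the stored stamp: stop
            self.reach[(u, v)] = t
            for (x, y), tl in self.links.items():
                if y == u:                 # R2: link(x,u), reachable(u,v)
                    work.append((x, v, min(tl, t)))

    def expire(self, now: int, ttl: int) -> None:
        """Drop base and derived soft state older than now - ttl."""
        cutoff = now - ttl
        self.links = {e: t for e, t in self.links.items() if t >= cutoff}
        self.reach = {e: t for e, t in self.reach.items() if t >= cutoff}
```

For example, with link(a,b) at time 1 and link(b,c) at time 5, reachable(a,c) is stamped 1; refreshing link(a,b) at time 6 raises it to 5, and expiring at time 10 with a TTL of 4 removes link(b,c) together with reachable(b,c) and reachable(a,c).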

52
Current Status

PIER supports cycles in query plans.
Naïve implementations of web and Gnutella
crawlers. Test runs on PlanetLab.
PIER hand-optimized dataflows:
Left vs. right recursion.
Magic Sets.
Semi-naïve vs. squaring algorithm.
Aggregate Selections.
Reachable, Shortest Path, Diameter Queries.
Static graphs, no sharing.
Initial PIER simulation results are detailed in the tech
report:
Querying Network Graphs with Recursive Queries.
UC Berkeley Technical Report UCB//CSD-4-1332, Jun
2004.

53
Datalog → Optimized Plan

Bulk of research work.
Data placement.
Magic Sets
Directed vs. undirected graphs?
How many participating nodes?
Left vs. right recursion
How many participating nodes?
Aggregate Selections
Density of network?
Monotonic aggregate?
Semi-naïve vs. squaring
Density of network?

54
Datalog → Optimized Plan

Work-sharing / Materialization
How many participating nodes? Sources?
Destinations?
Presence of similar overlapping queries?
Presence of queries with common rules or
correlated metrics?
Presence of hub nodes?
Queries restricted to a domain?
Good vs. best plan?
What if conditions change during query execution?
The optimal plan may become sub-optimal: adaptive
optimization techniques.