IMAGINE-P2P: A Scalable P2P Platform for the Knowledge Grid

1
IMAGINE-P2P: A Scalable P2P Platform for the
Knowledge Grid

Hai Zhuge, Xiaoping Sun et al.
China Knowledge Grid Research Group
Institute of Computing Technology
Chinese Academy of Sciences

2
Main work

IMAGINE-P2P: Integrated Multi-disciplinary Autonomous Global Innovation Networking Environment on P2P network

A platform that efficiently supports index-based path queries by incorporating a semantic overlay on a structured P2P network
A scalable distributed trie index for broadcast queries on key strings
A decentralized load balancing method for improving system utilization
A replication method for improving the availability of the distributed index
3
Outline

Background
Design Rationale
Architecture of IMAGINE-P2P
Deployment of Distributed Trie Index
Performance Improvements
Experiment Results
Conclusion

4
Background

Motivation

Sharing: Extend the services of resource sharing and cooperation from local distributed systems to large-scale, geographically distributed systems.

5
Background

A Challenge

Scalability: a simple goal (Jim Gray, 2003), to scale up and scale out systems in large-scale and dynamic distributed environments.

6
Background

Current Situation

7
Background

Our Goal: To build IMAGINE-P2P, a scalable P2P platform for the Knowledge Grid

Provide architectural extensibility for different types of complex queries
Achieve scalable query performance
Improve utilization and availability
8
Design Rationale

Make reasonable trade-offs to achieve an
acceptable scalability of the whole system.

Distributed index: topology dependent vs. topology independent
Topology: complexity vs. efficiency/robustness
Query routing: complexity vs. store/query efficiency
Utilization: load balancing vs. query efficiency
Availability: fault tolerance vs. store/query efficiency
9
Architecture of IMAGINE-P2P

Layered Architecture (top to bottom)

Future Knowledge Grid applications built on various distributed indexes
Distributed Trie Index: a distributed trie index supporting scalable wildcard and broadcast queries on objects
Semantic Overlay: distributed indexes supporting scalable semantics-rich path queries on objects
Object Overlay: a P2P overlay network providing scalable management of resources
10
Architecture of IMAGINE-P2P

Object Overlay: Topology Consideration

Theorem 1: Comparison-based structured overlays have to build a linear-order relation on their ID spaces to allow deterministic routing.
Theorem 2: Constructing a comparison-based structured overlay is the same as sorting the IDs of nodes and objects by a linear-order relation, which has a lower bound of Ω(N log N) comparisons, where N is the number of nodes.
Decision: A ring topology is the most direct and simple way to build a comparison-based structured overlay network. Chord is one such case.
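
The link between a linear order on IDs and ring routing can be illustrated with a minimal sketch. It assumes integer IDs in a hypothetical 8-bit space; the `Ring` class and its `successor` method are illustrative names, not part of IMAGINE-P2P or Chord.

```python
from bisect import bisect_left


class Ring:
    """Toy ring overlay: node IDs are kept in sorted (linear) order."""

    def __init__(self, node_ids, bits=8):
        self.size = 2 ** bits                # assumed size of the ID space
        self.nodes = sorted(set(node_ids))   # building the overlay = sorting IDs

    def successor(self, key):
        """Return the first node ID >= key, wrapping around the ring."""
        i = bisect_left(self.nodes, key % self.size)
        return self.nodes[i % len(self.nodes)]


ring = Ring([200, 17, 96, 140])
owner_of_100 = ring.successor(100)   # key between node 96 and node 140
owner_of_250 = ring.successor(250)   # key past the largest ID wraps around
```

Because the node list is sorted once, every key has exactly one responsible node, which is what makes routing deterministic in this toy model.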
11
Architecture of IMAGINE-P2P

Object Overlay Topology

Chord routes a lookup in O(log N) hops, and its stabilization has been proved correct in dynamic environments
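
A rough, centralized simulation of Chord-style finger routing is sketched below to show where the short hop counts come from. It assumes an 8-bit identifier space and computes fingers from a global sorted node list, which a real deployment would not have; `M`, `successor`, `between`, and `lookup` are illustrative names, not Chord's API.

```python
from bisect import bisect_left

M = 8                 # assumed number of identifier bits
SPACE = 2 ** M        # identifiers live in [0, 2^M)


def successor(nodes, key):
    """First node ID >= key on the ring (nodes must be sorted)."""
    i = bisect_left(nodes, key % SPACE)
    return nodes[i % len(nodes)]


def between(x, a, b, inclusive_right=False):
    """True if x lies in the ring interval (a, b), or (a, b] if requested."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b      # the interval wraps around zero


def lookup(nodes, start, key):
    """Route from node `start` to the owner of `key`; return (owner, hops)."""
    key %= SPACE
    n, hops = start, 0
    while True:
        succ = successor(nodes, n + 1)
        if between(key, n, succ, inclusive_right=True):
            return succ, hops + 1          # final hop to the owner
        # Finger i points at the successor of n + 2^i.
        fingers = [successor(nodes, n + 2 ** i) for i in range(M)]
        # Jump to the farthest finger that still precedes the key.
        n = next(f for f in reversed(fingers) if between(f, n, key))
        hops += 1


nodes = sorted([17, 96, 140, 200, 33, 250, 5, 120])
owner, hops = lookup(nodes, start=17, key=130)
```

Each jump picks the farthest finger that does not overshoot the key, so the remaining distance shrinks quickly; the sketch does not model joins, failures, or stabilization.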
12
Architecture of IMAGINE-P2P

Semantic Overlay: Basic structure

[Figure: the distributed indexing structure. A semantic overlay of objects O1 to O7 is built on top of the object overlay, whose physical nodes N1, N2, ..., Ni act as indexing nodes on the key ring. The example marks a query for the semantic path sp(O1 O2 O6 O7).]
Semantic object: $SO(a, R, b)$
Semantic path: $sp(a_1 R_1 a_2 R_2 \dots a_{n-1} R_{n-1} a_n)$
13
Architecture of IMAGINE-P2P

Semantic Overlay: Querying

Semantic object $SO(a, R, b)$: either $a$ or $b$, or both, can be used as keys by the DHT function.
Semantic path: a query $q = a_1 R_1 a_2 R_2 \dots a_{n-1} R_{n-1} a_n$ is decomposed into $n - 1$ subqueries: $q_1 = a_1 R_1 a_2$, $q_2 = a_1 a_2 R_2 a_3$, ..., and $q_{n-1} = a_1 a_2 \dots a_{n-1} R_{n-1} a_n$.
Query cost: $O(\log N)$ for a semantic object; $O(\log N + L)$ for a semantic path of length $L$ in the best case; $O(\log N \times L)$ for a semantic path of length $L$ in the worst case.
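
The decomposition of a path query into subqueries can be sketched as follows. The list encoding of a path (objects and relations alternating) and the `decompose` function are assumptions made for illustration; the prefix of each subquery is kept as a tuple rather than a concatenated string.

```python
def decompose(path):
    """Split [a1, R1, a2, ..., R(n-1), an] into n - 1 subqueries.

    Each subquery is (prefix, relation, target), where prefix is the tuple
    (a1, ..., ai) that would be hashed as the key of subquery q_i.
    """
    objects = path[0::2]
    relations = path[1::2]
    if len(objects) != len(relations) + 1:
        raise ValueError("a path must alternate objects and relations")
    return [
        (tuple(objects[: i + 1]), relations[i], objects[i + 1])
        for i in range(len(relations))
    ]


subqueries = decompose(["a1", "R1", "a2", "R2", "a3", "R3", "a4"])
```

In the best case consecutive subqueries land on nearby nodes, and in the worst case each one needs its own DHT lookup, which is where the two path costs differ.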
14
Architecture of IMAGINE-P2P

Semantic Overlay Basic query operations

15
Deployment of Distributed Trie Index

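Distributed Trie Index: Basic Structure

[Figure: a full trie index rooted at // over the keys big, back, dark, and create, with one trie node per character and / marking the end of a key. An edge is stored as a semantic object, for example SO1(d, S, a). The query dark follows the trie path tp(dark).]
$L = O(\log_m N)$, where $m$ is the size of the attribute set and $N$ is the number of keys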
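
A full trie index can be modeled as a set of semantic objects of the form (prefix, S, next character), following the SO1(d, S, a) example. The sketch below is a local, non-distributed illustration; the `END` marker "/", the omission of the // root, and the function names are assumptions.

```python
END = "/"   # assumed end-of-key marker


def full_trie_objects(keys):
    """Return the set of (prefix, "S", next_char) objects of a full trie."""
    objs = set()
    for key in keys:
        for i in range(1, len(key)):
            objs.add((key[:i], "S", key[i]))
        objs.add((key, "S", END))
    return objs


def trie_path(objs, key):
    """Return the objects visited when querying `key`, or None if absent."""
    path = []
    for i in range(1, len(key)):
        so = (key[:i], "S", key[i])
        if so not in objs:
            return None
        path.append(so)
    last = (key, "S", END)
    return path + [last] if last in objs else None


index = full_trie_objects(["big", "back", "dark", "create"])
tp_dark = trie_path(index, "dark")
```

In a distributed setting each object would be placed by hashing its prefix, so a query for a key of length L touches up to L index entries.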
16
Deployment of Distributed Trie Index

Trie Index: Two basic types

[Figure: a full trie index and a pruned trie index, both rooted at //, over the keys big, back, dark, create, computer, and computing.]
17
Deployment of Distributed Trie Index

Trie Index: Compressed pruned trie index

To avoid splitting and moving existing indexing nodes

[Figure: a pruned trie index and a compressed pruned trie index over the keys back, big, computer, computing, create, and dark.]
A key object is defined as $KO(a_1 a_2 \dots a_j, S, K)$, where key $K = a_1 a_2 \dots a_j \dots a_n$ and $a_j$ is the leaf trie node of the trie path of $K$.
18
Deployment of Distributed Trie Index

Trie Index: Publishing a compressed pruned trie index

If there is no $SO(a_1, S, e)$ or $SO(a_1, S, a_2)$, $SO(a_1, S, e)$ is published and the key $K$ is published by $KO(a_1, S, K)$.
If there is $SO(a_1, S, e)$ but no $KO(a_1, S, K_1)$ where $K_1 = a_1 b_2 b_3 \dots b_n$ ($b_2 \neq a_2$), the key $K$ is published by $KO(a_1, S, K)$.
If there are already $SO(a_1, S, e)$ and a $KO(a_1, S, K_1)$ that shares a prefix with $K$, where $K_1 = a_1 a_2 \dots a_j b_{j+1} \dots b_m$, $j \geq 2$, and $b_{j+1} \neq a_{j+1}$, $SO(a_1, S, e)$ is changed to $SO(a_1, S, a_2)$ and two objects are published: $SO(a_1 a_2, S, e)$ and $KO(a_1 a_2, S, K)$.
If there is already an $SO(a_1, S, a_2)$, the key $K$ is forwarded along the trie path $tp(a_1 a_2 \dots a_m e)$ until it reaches $SO(a_1 a_2 \dots a_m, S, e)$ ($m < n$). If there is no $KO(a_1 a_2 a_3 \dots a_m, S, K_2)$ such that $K_2 = a_1 a_2 \dots a_m a_{m+1} b_{m+2} \dots b_p$, $KO(a_1 a_2 a_3 \dots a_m, S, K)$ is published. Otherwise $SO(a_1 a_2 \dots a_m, S, e)$ is changed to $SO(a_1 a_2 \dots a_m, S, a_{m+1})$ and the objects $SO(a_1 a_2 a_3 \dots a_m a_{m+1}, S, e)$ and $KO(a_1 a_2 a_3 \dots a_m a_{m+1}, S, K)$ are published.

Objects shown in the same color share the same prefix and thus can be published in one message.
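
One possible reading of these publishing rules is sketched below as a local simulation. It assumes that a key stays at the current trie node unless a key already stored there shares its next character, in which case a single new child node is created and existing keys are left in place. The dictionaries stand in for the DHT, and the class and method names are hypothetical.

```python
class CompressedPrunedTrie:
    """Local stand-in for the published SO and KO objects."""

    def __init__(self):
        self.children = {}   # prefix -> next characters, i.e. SO(prefix, S, c)
        self.keys = {}       # prefix -> stored keys, i.e. KO(prefix, S, K)

    def publish(self, key):
        """Store `key` and return the prefix of the trie node that holds it."""
        if not key:
            raise ValueError("key must be non-empty")
        prefix = key[0]
        if prefix not in self.children:          # no node for a1 yet
            self.children[prefix] = set()
            self.keys[prefix] = [key]
            return prefix
        while len(prefix) < len(key):
            nxt = key[len(prefix)]
            if nxt in self.children[prefix]:     # forward along the trie path
                prefix += nxt
                continue
            shares = any(
                k != key and len(k) > len(prefix) and k[len(prefix)] == nxt
                for k in self.keys[prefix]
            )
            if shares:                           # extend the path by one node
                self.children[prefix].add(nxt)
                prefix += nxt
                self.children[prefix] = set()
                self.keys[prefix] = []
            break
        if key not in self.keys[prefix]:
            self.keys[prefix].append(key)
        return prefix

    def lookup(self, key):
        """Return the prefix of the node holding `key`, or None."""
        prefix = key[:1]
        while prefix in self.keys:
            if key in self.keys[prefix]:
                return prefix
            if len(prefix) == len(key):
                return None
            nxt = key[len(prefix)]
            if nxt not in self.children[prefix]:
                return None
            prefix += nxt
        return None


trie = CompressedPrunedTrie()
words = ["create", "computer", "computing", "dark", "big", "back"]
placed = [trie.publish(w) for w in words]
```

Under this reading a lookup has to check the keys stored at every node along the trie path, because earlier keys are never moved when a node gains a child.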
19
Deployment of Distributed Trie Index

Trie Index: Multi-access on physical nodes

[Figure: on a full trie index and a pruned trie index, the query abcde follows the index entries (a, b), (ab, c), (abc, d), (abcd, e), and (abcde, e), which are stored on physical nodes A, B, and C.]
20
Deployment of Distributed Trie Index

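Trie Index: Avoiding multi-access

[Figure: on a full trie index and a pruned trie index, the query abcde over the index entries (a, b), (ab, c), (abc, d), (abcd, e), and (abcde, e) on physical nodes A, B, and C, routed so that repeated access to a physical node is avoided.]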
21
Deployment of Distributed Trie Index

Trie Index: Multi-access

[Figure: on a compressed pruned trie index, the query abcdef follows the index entries (a, b), (ab, c), (abc, d), (abcd, e), and (abcde, e), which are stored on physical nodes A, B, and C.]
22
Deployment of Distributed Trie Index

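Trie Index: Avoiding multi-access

[Figure: on a compressed pruned trie index, the query abcdef over the index entries (a, b), (ab, c), (abc, d), (abcd, e), and (abcde, e) on physical nodes A, B, and C, routed so that repeated access to a physical node is avoided.]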
23
Performance Improvements

Utilization Improvement: Decentralized load balancing

Target: for each node $n_i$ ($i = 1, 2, \dots, N$), [formula]
Action: $n_i$ moves load to neighbor nodes $n_j$ selected from its neighbor node set according to:
Which object should be moved
When the object should be moved
Where the object should be moved
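
The selection formulas are not preserved in this transcript, so the sketch below swaps in a generic neighbor-diffusion heuristic rather than the authors' method. It assumes unit-size objects, equal node capacities, and a ring where each node only sees its two neighbors; `balance_step` is a hypothetical name.

```python
from statistics import pvariance


def balance_step(loads):
    """One round: each node may move one unit-size object to a ring neighbor.

    When: only if the lighter neighbor holds at least 2 fewer objects.
    Where: the lighter of the two ring neighbors.
    Which: any single object, since all objects are assumed equal in size.
    """
    n = len(loads)
    new = list(loads)
    for i in range(n):
        left, right = (i - 1) % n, (i + 1) % n
        j = left if new[left] <= new[right] else right
        if new[i] - new[j] >= 2:
            new[i] -= 1
            new[j] += 1
    return new


history = [[40, 0, 0, 0, 0, 0, 0, 0]]
for _ in range(100):
    history.append(balance_step(history[-1]))
variances = [pvariance(loads) for loads in history]
```

Every move goes from a heavier node to one at least two objects lighter, so the variance of the load can only shrink from one round to the next; no node needs global knowledge of the system.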
24
Performance Improvements

Availability Improvement: Path key replication is used to improve the availability of semantic paths and distributed trie paths.

A semantic object $SO(a, R, b)$ is duplicated by publishing it under both key $a$ and key $b$.
The path key of a semantic object contains the path information of the objects published before it on the same path, so a semantic object can be recovered from any semantic object published later on the same semantic path.
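
The recovery idea can be sketched as follows. The slides do not give the encoding of a path key, so this example assumes it is simply the full path prefix up to the object's first element; `publish_path` and `recover` are hypothetical names.

```python
def publish_path(path):
    """Turn [a1, R1, a2, ..., an] into semantic objects carrying path keys."""
    objects, relations = path[0::2], path[1::2]
    published = []
    for i, rel in enumerate(relations):
        published.append({
            # Assumed encoding: everything on the path before this relation.
            "path_key": tuple(path[: 2 * i + 1]),
            "so": (objects[i], rel, objects[i + 1]),
        })
    return published


def recover(later, index):
    """Rebuild the index-th (0-based) semantic object from a later object."""
    full = later["path_key"] + later["so"][1:]
    triple = full[2 * index: 2 * index + 3]
    if len(triple) != 3:
        raise IndexError("object lies beyond the path known to this copy")
    return triple


path = ["a1", "R1", "a2", "R2", "a3", "R3", "a4"]
published = publish_path(path)
lost = published[0]["so"]                  # pretend this copy became unavailable
restored = recover(published[2], 0)        # rebuild it from a later object
```

The longer the path, the more later objects carry a copy of an early object, which is consistent with longer paths benefiting more from this kind of replication.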
25
Experiment Results

An event-driven simulation environment
Simulation on a ring network with 200 and 2000
nodes.
Different distributions of object loads and node
capacities are tested.

26
Experiment Results
Trie index properties compared with those of the B-tree and B+-tree. The compressed trie index has a very short average depth.
27
Experiment Results
The size of a trie index is sensitive only to the key string distribution. Its independence from the network size and the number of keys makes it scalable in large-scale and dynamic environments.
28
Experiment Results
Average search hops of a broadcast query for all the keys on the network using distributed trie indexes, in networks with different sizes and numbers of keys.
29
Experiment Results
An optimized search on trie indexes with 2349 PDF
file names as keys
30
Experiment Results
The load balancing process: the variance of the system load decreases as the load balancing iterations proceed, under different load distributions.
31
Experiment Results
Chord uses virtual servers to improve the load balance: each physical node holds more than one virtual server, and data objects are mapped by the DHT function to virtual servers instead of physical nodes. The Chord authors proposed that log N virtual servers per physical node can be optimal with high probability when only the number of keys is considered.
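
The virtual-server mapping can be sketched as below. The SHA-1 based 32-bit hash, the `node#index` naming of virtual servers, and the choice of base 2 for the logarithm are assumptions made for illustration.

```python
import hashlib
import math
from bisect import bisect_left
from collections import Counter


def ring_id(name):
    """Hash a name to a 32-bit ring identifier (assumed hash choice)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:4], "big")


def build_ring(physical_nodes, virtual_per_node):
    """Return sorted (virtual server ID, physical node) pairs."""
    return sorted(
        (ring_id(f"{node}#{v}"), node)
        for node in physical_nodes
        for v in range(virtual_per_node)
    )


def owner(ring, key):
    """Physical node hosting the virtual server that succeeds hash(key)."""
    ids = [vid for vid, _ in ring]
    i = bisect_left(ids, ring_id(key)) % len(ring)
    return ring[i][1]


physical = [f"node{i}" for i in range(16)]
per_node = max(1, round(math.log2(len(physical))))   # log N virtual servers
ring = build_ring(physical, per_node)
counts = Counter(owner(ring, f"key{i}") for i in range(1000))
```

Inspecting `counts` for `per_node = 1` and for `per_node = log N` is one way to see how spreading each physical node over several ring positions affects the key distribution.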
32
Experiment Results
The load balancing process works effectively for distributed trie indexes, which cause heavily imbalanced load distributions
33
Experiment Results
34
Experiment Results
If each extra hop incurred by load balancing does not significantly delay a query, the average query latency under load balancing can be reduced when only the storage consumption of objects is considered.
35
Experiment Results
With replication, the availability of the full trie is better than that of the pruned trie, because the pruned trie has much shorter paths and therefore fewer copies under path key replication. Without replication, however, the pruned trie has better availability, because its much shorter search paths are less likely to be broken under the same failure distribution.
36
Conclusion

Publishing distributed indexes using semantic overlay methods can be a solution for supporting complex queries with high-level semantics.
Many conflicting factors must be traded off when designing a P2P system to achieve a scalable solution.
The distributed trie index can be scalable in large-scale and dynamic environments where the key string distribution is relatively stable.
Decentralized load balancing in large-scale and dynamic distributed systems can work effectively.
Future work still faces challenges in building more efficient distributed indexes, relieving hot spots on distributed indexes, and improving availability while keeping the system decentralized and scalable.
Future theoretical work should show to what extent the trade-offs can be made to achieve acceptable scalability.

This work has been published in IEEE Transactions on Knowledge and Data Engineering
37
Questions and Comments
Thanks!
The full paper is available in IEEE Transactions on Knowledge and Data Engineering
