IMAGINE-P2P: A Scalable P2P Platform for the Knowledge Grid

1
IMAGINE-P2P: A Scalable P2P Platform for the
Knowledge Grid
  • Hai Zhuge, Xiaoping Sun et al.
  • China Knowledge Grid Research Group
  • Institute of Computing Technology
  • Chinese Academy of Sciences

2
Main work
  • IMAGINE-P2P: Integrated Multi-disciplinary
    Autonomous Global Innovation Networking
    Environment on P2P network
  • A platform to efficiently support index-based
    path queries by incorporating a semantic overlay
    on a structured P2P network
  • The deployment of a scalable distributed trie
    index for broadcast queries on key strings
  • A decentralized load balancing method for
    improving system utilization
  • A replication method to improve the
    availability of the distributed index
3
Outline
  • Background
  • Design Rationale
  • Architecture of IMAGINE-P2P
  • Deployment of Distributed Trie Index
  • Performance Improvements
  • Experiment Results
  • Conclusion

4
Background
  • Motivation
  • Sharing: expand resource-sharing and
    cooperation services from local distributed
    systems to large-scale, geographically
    distributed systems.

5
Background
  • A Challenge
  • Scalability: a simple goal (Jim Gray, 2003), to
    scale up and scale out systems in large-scale and
    dynamic distributed environments.

6
Background
  • Current Situation

7
Background
  • Our Goal: to build a scalable P2P platform for
    the Knowledge Grid, IMAGINE-P2P
  • Provide architectural extensibility for different
    types of complex queries
  • Achieve scalable query performance
  • Improve utilization and availability
8
Design Rationale
  • Make reasonable trade-offs to achieve an
    acceptable scalability of the whole system.
  • Distributed index: topology dependent vs.
    topology independent
  • Topology: complexity vs. efficiency/robustness
  • Query routing: complexity vs. store/query
    efficiency
  • Utilization: load balancing vs. query efficiency
  • Availability: fault tolerance vs. store/query
    efficiency
9
Architecture of IMAGINE-P2P
  • Layered Architecture (top to bottom)
  • Applications: future Knowledge Grid applications
    built on various distributed indexes
  • Distributed Trie Index: a distributed trie index
    supporting scalable wildcard and broadcast
    queries on objects
  • Semantic Overlay: distributed indexes supporting
    scalable semantics-rich path queries on objects
  • Object Overlay: a P2P overlay network providing
    scalable management of resources
10
Architecture of IMAGINE-P2P
  • Object Overlay: Topology Consideration

Theorem 1: Comparison-based structured overlays
have to build a linear-order relation on their ID
spaces to allow deterministic routing.
Theorem 2: Constructing a comparison-based
structured overlay is equivalent to sorting the IDs
of nodes and objects by a linear-order relation,
which has a lower bound of Ω(N log N)
comparisons, where N is the number of nodes.
Decision: A ring topology is the most direct and
simple way to build a comparison-based structured
overlay network. Chord is one such case.
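As a rough illustration of the sorting argument and the ring decision, the sketch below sorts node IDs once and then maps a key to its clockwise successor on the ring. The function names `build_ring` and `successor` and the small integer IDs are assumptions made for this example; they are not taken from IMAGINE-P2P or from Chord.

```python
import bisect


def build_ring(node_ids):
    """Sort the node IDs once; this is the comparison-based ordering step."""
    return sorted(set(node_ids))


def successor(ring, key_id):
    """Return the first node ID at or after key_id, wrapping around the ring."""
    pos = bisect.bisect_left(ring, key_id)
    return ring[pos % len(ring)]


ring = build_ring([200, 8, 120, 40])
print(successor(ring, 9))    # node responsible for key 9
print(successor(ring, 201))  # a key past the largest ID wraps to the start
```

Once the IDs are in a linear order, every key has exactly one owner, which is what makes the routing deterministic.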
11
Architecture of IMAGINE-P2P
  • Object Overlay: Topology

Chord routes in O(log N) hops and has proven
correctness of stabilization in dynamic environments
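The logarithmic hop count can be seen in a small simulation of finger-table routing. This is a simplified sketch, not Chord's actual protocol: finger tables are computed from a global sorted list instead of being maintained by stabilization, and the 8-bit ID space, the evenly spaced node IDs, and all function names are assumptions for the example.

```python
import bisect

M = 8            # bits in the ID space (assumed for the example)
SIZE = 1 << M


def in_interval(x, a, b, inclusive_right=False):
    """True if x lies in the clockwise ring interval (a, b) or (a, b]."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past zero, or a == b


def successor_of(ring, ident):
    """First node ID at or after ident (mod SIZE) on the sorted ring."""
    pos = bisect.bisect_left(ring, ident % SIZE)
    return ring[pos % len(ring)]


def fingers(ring, node):
    """Finger i of a node points at the successor of node + 2**i."""
    return [successor_of(ring, node + (1 << i)) for i in range(M)]


def lookup(ring, start, key):
    """Route from start towards key; return (owner, hop count)."""
    node, hops = start, 0
    while True:
        table = fingers(ring, node)
        succ = table[0]
        if in_interval(key, node, succ, inclusive_right=True):
            return succ, hops + 1
        # Jump to the farthest finger that still precedes the key.
        node = next(f for f in reversed(table) if in_interval(f, node, key))
        hops += 1


ring = list(range(0, SIZE, 4))   # 64 evenly spaced nodes
owner, hops = lookup(ring, ring[0], 123)
print(owner, hops)
```

Each jump uses a strictly smaller finger index than the previous one, so a lookup in this sketch takes at most M forwarding hops plus the final step to the owner.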
12
Architecture of IMAGINE-P2P
  • Semantic Overlay: Basic structure

[Figure: the semantic overlay (a distributed indexing
structure whose indexing nodes link objects O1 to O7
by key) on top of the object overlay (physical nodes
N1, N2, ..., Ni), with an example query for the
semantic path sp(O1O2O6O7).]
Semantic object: SO = (a, R, b)
Semantic path: sp(a1 R1 a2 R2 ... an-1 Rn-1 an)
13
Architecture of IMAGINE-P2P
  • Semantic Overlay: Querying

Semantic object SO = (a, R, b): either a or b,
or both, can be used as keys by the DHT
function.
Semantic path query: a query
q = a1 R1 a2 R2 ... an-1 Rn-1 an
is decomposed into n - 1 subqueries, q1 =
a1 R1 a2, q2 = a1a2 R2 a3, ..., and qn-1 =
a1a2...an-1 Rn-1 an.
O(log N) for a semantic object. O(log N + L)
for a semantic path of length L in the best
case. O(L log N) for a semantic path of
length L in the worst case.
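The decomposition of a path query into prefix-keyed subqueries can be sketched as follows. The list encoding of a path (objects and relations alternating) and the function name `decompose` are assumptions for illustration only.

```python
def decompose(path):
    """Split [a1, R1, a2, R2, ..., an] into n - 1 subqueries.

    Subquery i is (a1a2...ai, Ri, a(i+1)): the concatenated prefix of
    objects serves as the key, followed by the relation and its target.
    """
    objects, relations = path[0::2], path[1::2]
    if len(objects) != len(relations) + 1:
        raise ValueError("path must alternate objects and relations")
    subqueries = []
    for i, relation in enumerate(relations):
        prefix_key = "".join(objects[: i + 1])
        subqueries.append((prefix_key, relation, objects[i + 1]))
    return subqueries


for sub in decompose(["a1", "R1", "a2", "R2", "a3", "R3", "a4"]):
    print(sub)
```

Each subquery can then be resolved by one DHT lookup on its prefix key.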
14
Architecture of IMAGINE-P2P
  • Semantic Overlay Basic query operations

15
Deployment of Distributed Trie Index
  • Distributed Trie Index: Basic Structure

[Figure: a full trie index over the keys big, back,
dark, and create. The query "dark" follows the trie
path tp(dark), whose first edge is the semantic
object SO1(d, S, a).]
Trie path length: L = O(log_m N), where m is the
size of the attribute set and N is the number of
keys.
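A single-machine sketch of a full trie may help to picture the trie path. It is not the distributed deployment from the slides: nested dictionaries stand in for indexing nodes, the "/" end marker is this sketch's own convention (keys are assumed not to contain it), and the function names are hypothetical.

```python
def build_trie(keys):
    """Build a full trie; every character of every key gets its own node."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node["/"] = key  # end marker holding the complete key
    return root


def trie_path(root, key):
    """Return the prefixes visited while searching key, or None if absent."""
    node, path = root, []
    for i, ch in enumerate(key, 1):
        if ch not in node:
            return None
        node = node[ch]
        path.append(key[:i])
    return path if "/" in node else None


def keys_with_prefix(root, prefix):
    """Collect every key below a prefix, as a broadcast-style query would."""
    node = root
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        for label, child in current.items():
            if label == "/":
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)


root = build_trie(["big", "back", "dark", "create"])
print(trie_path(root, "dark"))
print(keys_with_prefix(root, "b"))
```

In a full trie the path length equals the key length, which is the cost that the pruned variants reduce.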
16
Deployment of Distributed Trie Index
  • Trie Index: Two basic types

[Figure: a full trie index and a pruned trie index,
side by side, with the example keys big, back, dark,
create, computer, and computing.]
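One common reading of a pruned trie is that each key is stored at the shortest prefix that distinguishes it from all other keys. The sketch below computes those prefixes for the example keys; the function name and the assumption that keys are distinct are mine, not the slides'.

```python
from collections import Counter


def shortest_unique_prefixes(keys):
    """Map each key to the shortest prefix that no other key shares."""
    counts = Counter(k[:i] for k in keys for i in range(1, len(k) + 1))
    result = {}
    for key in keys:
        for i in range(1, len(key) + 1):
            if counts[key[:i]] == 1:
                result[key] = key[:i]
                break
        else:
            # The key is itself a prefix of another key; keep it whole.
            result[key] = key
    return result


keys = ["big", "back", "dark", "create", "computer", "computing"]
for key, prefix in shortest_unique_prefixes(keys).items():
    print(key, "->", prefix)
```

Keys with no close neighbors, such as "dark", end up one level below the root, while keys that share a long prefix, such as "computer" and "computing", still need long paths.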
17
Deployment of Distributed Trie Index
  • Trie Index: Compressed pruned trie index

Purpose: to avoid splitting and moving existing
indexing nodes.
[Figure: a pruned trie index and the corresponding
compressed pruned trie index over the keys back,
big, dark, create, computer, and computing.]
A key object is defined as KO(a1a2...aj, S, K),
where key K = a1a2...aj...an and aj is the leaf trie
node of the trie path of K.
18
Deployment of Distributed Trie Index
  • Trie Index: Publishing a compressed pruned trie
    index (key K = a1a2...an)
  • If there is no SO(a1, S, ε) or SO(a1, S, a2),
    SO(a1, S, ε) is published and the key K is
    published by KO(a1, S, K).
  • If there is SO(a1, S, ε) but no KO(a1, S, K1)
    that shares a longer prefix with K (every
    K1 = a1b2b3...bn has b2 ≠ a2), the key K is
    published by KO(a1, S, K).
  • If there are already SO(a1, S, ε) and a
    KO(a1, S, K1) that shares a longer prefix with K,
    where K1 = a1a2...aj b_{j+1}...bm, j ≥ 2, and
    b_{j+1} ≠ a_{j+1}, SO(a1, S, ε) is changed to
    SO(a1, S, a2) and two objects are published: one
    is SO(a1a2, S, ε), the other is KO(a1a2, S, K).
  • If there is already an SO(a1, S, a2), forward the
    key K along the trie path tp(a1a2...amε) until
    reaching SO(a1a2...am, S, ε) (m < n). If there is
    no KO(a1a2a3...am, S, K2) with
    K2 = a1a2...am a_{m+1} b_{m+2}...bp, just publish
    KO(a1a2a3...am, S, K). Otherwise change
    SO(a1a2...am, S, ε) to SO(a1a2...am, S, a_{m+1})
    and publish the objects
    SO(a1a2a3...am a_{m+1}, S, ε) and
    KO(a1a2a3...am a_{m+1}, S, K).

Objects shown in the same color share the same
prefix and thus can be published in one message.
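The publishing cases above can be approximated in memory as follows. This is only one reading of the rules, under stated assumptions: two dictionaries stand in for the SO and KO objects that would be stored in the DHT, an empty child set stands in for the end marker, a trie node may have several children, and the class and method names are hypothetical. The point it illustrates is that an existing key object is never moved; only the new key is pushed one level deeper when it clashes.

```python
class CompressedPrunedTrie:
    """In-memory stand-in for the SO/KO objects of the distributed index."""

    def __init__(self):
        self.so = {}  # trie prefix -> set of child characters
        self.ko = {}  # trie prefix -> keys stored at that prefix

    def lookup(self, key):
        """Return the prefix under which key is stored, or None."""
        for m in range(1, len(key) + 1):
            prefix = key[:m]
            if key in self.ko.get(prefix, []):
                return prefix
            if m == len(key) or key[m] not in self.so.get(prefix, set()):
                return None
        return None

    def publish(self, key):
        """Publish key and return the prefix it was stored under."""
        if not key:
            raise ValueError("key must be non-empty")
        existing = self.lookup(key)
        if existing is not None:
            return existing
        if key[0] not in self.so:
            # No trie node for the first character yet: create it.
            self.so[key[0]] = set()
            self.ko[key[0]] = [key]
            return key[0]
        # Forward the key along the existing trie path.
        m = 1
        while m < len(key) and key[m] in self.so[key[:m]]:
            m += 1
        prefix = key[:m]
        stored = self.ko.setdefault(prefix, [])
        clash = m < len(key) and any(
            len(other) > m and other[m] == key[m] for other in stored
        )
        if not clash:
            stored.append(key)
            return prefix
        # A stored key shares the next character: extend the path by one
        # node and put only the new key there; the old key stays in place.
        deeper = key[: m + 1]
        self.so[prefix].add(key[m])
        self.so[deeper] = set()
        self.ko[deeper] = [key]
        return deeper


trie = CompressedPrunedTrie()
for k in ["big", "back", "dark", "create", "computer", "computing"]:
    print(k, "->", trie.publish(k))
```

Because earlier keys stay where they were first published, a lookup has to check the key objects at every prefix it passes, which `lookup` does.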
19
Deployment of Distributed Trie Index
  • Trie Index: Multi-access on physical nodes

[Figure: on a full trie index and a pruned trie
index, the query "abcde" follows the index entries
(a, b), (ab, c), (abc, d), (abcd, e), and (abcde, ε),
which are spread over Node A, Node B, and Node C.]
20
Deployment of Distributed Trie Index
  • Trie Index: Avoiding multi-access

[Figure: on a full trie index and a pruned trie
index, the query "abcde" over the index entries
(a, b), (ab, c), (abc, d), (abcd, e), and (abcde, ε)
on Node A, Node B, and Node C.]
21
Deployment of Distributed Trie Index
  • Trie Index: Multi-access

[Figure: on a compressed pruned trie index, the
query "abcdef" follows the index entries (a, b),
(ab, c), (abc, d), (abcd, e), and (abcde, ε), which
are spread over Node A, Node B, and Node C.]
22
Deployment of Distributed Trie Index
  • Trie Index: Avoiding multi-access

[Figure: on a compressed pruned trie index, the
query "abcdef" over the index entries (a, b),
(ab, c), (abc, d), (abcd, e), and (abcde, ε) on
Node A, Node B, and Node C.]
23
Performance Improvements
  • Utilization Improvement: Decentralized load
    balancing

Target: for each node ni (i = 1, 2, ..., N),
[target condition not preserved]
Action: ni moves load to neighbor nodes nj
selected from its neighbor node set. The rules
[formulas not preserved] decide:
  • which object should be moved,
  • when the object should be moved, and
  • where the object should be moved.
24
Performance Improvements
  • Availability Improvement: Using path key
    replication to improve the availability of
    semantic paths and distributed trie paths.

Duplicate a semantic object SO(a, R, b) by
publishing it under both key a and key b.
A path key of a semantic object contains the path
information of the objects published before it on
the same path, so a semantic object can be
recovered from any semantic object published
later on the same semantic path.
25
Experiment Results
  • An event-driven simulation environment
  • Simulation on a ring network with 200 and 2000
    nodes.
  • Different distributions of object loads and node
    capacities are tested.

26
Experiment Results
Trie index properties compared with B-tree and
B+-tree. The compressed trie index has a very short
average depth.
27
Experiment Results
The size of a trie index is sensitive only to the
key string distribution. Its independence from the
network size and the number of keys makes it
scalable in large-scale and dynamic environments.
28
Experiment Results
Average search hops of a broadcast query for all
the keys on the network using distributed trie
indexes, in networks of different sizes and with
different numbers of keys.
29
Experiment Results
An optimized search on trie indexes with 2349 PDF
file names as keys
30
Experiment Results
Load balancing process: the variance of the
system load decreases with the load balancing
iterations under different load distributions.
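The qualitative effect, load variance falling as neighbors exchange load, can be reproduced with a generic neighbor-diffusion rule. The selection rules used in IMAGINE-P2P are not preserved in this transcript, so the rule below (push half of the gap to the lighter ring neighbor) is purely an assumption, as are the unit-sized objects, the 20-node ring, and the function names.

```python
def variance(loads):
    """Population variance of the per-node loads."""
    mean = sum(loads) / len(loads)
    return sum((x - mean) ** 2 for x in loads) / len(loads)


def balance_round(loads):
    """One pass over the ring; each node may push load to its lighter neighbor."""
    n = len(loads)
    for i in range(n):
        left, right = (i - 1) % n, (i + 1) % n
        j = left if loads[left] <= loads[right] else right
        gap = loads[i] - loads[j]
        if gap >= 2:
            moved = gap // 2  # number of unit-sized objects to move
            loads[i] -= moved
            loads[j] += moved


loads = [120] + [0] * 19      # one hot node on a 20-node ring
history = [variance(loads)]
for _ in range(30):
    balance_round(loads)
    history.append(variance(loads))
print(history[0], history[-1])
```

Every transfer moves two loads closer together without changing the total, so the variance in this sketch can never increase from one round to the next.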
31
Experiment Results
Chord uses virtual servers to improve the load
balance: each physical node holds more than
one virtual server, and data objects are mapped by
the DHT function to virtual servers instead of
physical nodes. The Chord authors proposed that
log N virtual servers per physical node can be
optimal with high probability when only the number
of keys is considered.
32
Experiment Results
The load balancing process works effectively for
distributed trie indexes that cause heavily
imbalanced load distributions.
33
Experiment Results
34
Experiment Results
If each extra hop incurred by the load balancing
does not significantly delay a query, the average
query latency under load balancing can be reduced
when only considering storage consumption of
objects.
35
Experiment Results
The availability of the full trie with
replication is better than that of the pruned
trie, because the pruned trie has a much shorter
path length and therefore fewer copies in path
key replication. Without replication, however, the
pruned trie has better availability, because
it has much shorter search paths, i.e., a path is
less likely to be broken under the same failure
distribution.
36
Conclusion
  • Publishing distributed indexes using semantic
    overlay methods can be a solution for supporting
    complex queries with high-level semantics.
  • Many conflicting factors have to be traded off
    when designing a P2P system to achieve
    a scalable solution.
  • The distributed trie index can be scalable in
    large-scale and dynamic environments where the
    key string distribution is relatively stable.
  • Decentralized load balancing in large-scale and
    dynamic distributed systems can work effectively.
  • Future work still faces challenges in building
    more efficient distributed indexes, relieving hot
    spots on distributed indexes, and improving
    availability while keeping the system
    decentralized and scalable.
  • Future theoretical work should show to what
    extent the trade-offs can be made to achieve an
    acceptable scalability.

This work has been published in IEEE Transactions
on Knowledge and Data Engineering
37
Questions and Comments
Thanks!
Full paper is available at IEEE Transactions on
Knowledge and Data Engineering