Title: IMAGINE-P2P: A Scalable P2P Platform for the Knowledge Grid
1IMAGINE-P2P: A Scalable P2P Platform for the
Knowledge Grid
- Hai Zhuge, Xiaoping Sun et al.
- China Knowledge Grid Research Group
- Institute of Computing Technology
- Chinese Academy of Sciences
2Main work
- IMAGINE-P2P: Integrated Multi-disciplinary
Autonomous Global Innovation Networking
Environment on a P2P network
- A platform that efficiently supports index-based
path queries by incorporating a semantic overlay
on a structured P2P network
- A scalable distributed trie index deployed for
broadcast queries on key strings
- A decentralized load balancing method that
improves system utilization
- A replication method that improves the
availability of the distributed index
3Outline
- Background
- Design Rationale
- Architecture of IMAGINE-P2P
- Deployment of Distributed Trie Index
- Performance Improvements
- Experiment Results
- Conclusion
4Background
- Sharing: Expand services of resource sharing and
cooperation from local distributed systems to
large-scale, geographically distributed
systems.
5Background
- Scalability: A simple goal (Jim Gray, 2003): to
scale up and scale out systems in large-scale and
dynamic distributed environments.
6Background
7Background
- Our Goal: To build IMAGINE-P2P, a scalable P2P
platform for the Knowledge Grid
- Provide architectural extensibility for different
types of complex queries
- Achieve scalable query performance
- Improve utilization and availability
8Design Rationale
- Make reasonable trade-offs to achieve an
acceptable scalability of the whole system.
Distributed index: Topology dependent vs.
Topology independent
Topology: Complexity vs. Efficiency/Robustness
Query routing: Complexity vs. Store/Query
Efficiency
Utilization: Load balancing vs. Query Efficiency
Availability: Fault-tolerance vs. Store/Query
Efficiency
9Architecture of IMAGINE-P2P
Layers, from top to bottom:
- Future Knowledge Grid applications built on
various distributed indexes
- Distributed Trie Index: A distributed trie index
supporting scalable wild-card and broadcasting
queries on objects
- Semantic Overlay: Distributed indexes supporting
scalable semantics-rich path queries on objects
- Object Overlay: A P2P overlay network providing
scalable management of resources
10Architecture of IMAGINE-P2P
- Object Overlay: Topology Consideration
Theorem 1: Comparison-based structured overlays
have to build a linear-order relation on their ID
spaces to allow deterministic routing.
Theorem 2: Constructing a comparison-based
structured overlay is the same as sorting the IDs
of nodes and objects by a linear-order relation,
which has a lower bound of $\Omega(N \log N)$
comparisons, where $N$ is the number of nodes.
Decision: A ring topology is the most direct and
simple way to build a comparison-based structured
overlay network. Chord is such a case.
11Architecture of IMAGINE-P2P
Chord routes in $O(\log N)$ hops and has a proven
correct stabilization protocol for dynamic environments
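A minimal sketch of Chord-style routing on a ring may help make the hop count concrete. It assumes a static ring with no joins or failures, an 8-bit ID space, and helper names (`in_interval`, `successor`, `build_fingers`, `lookup`) that are illustrative only and not taken from the IMAGINE-P2P code.

```python
from bisect import bisect_left

M = 8                 # identifier bits (demo value, an assumption)
SIZE = 2 ** M         # size of the circular ID space


def in_interval(x, a, b):
    """True if x lies in the ring interval (a, b], walking clockwise from a."""
    dx = (x - a) % SIZE or SIZE
    db = (b - a) % SIZE or SIZE
    return dx <= db


def successor(nodes, ident):
    """First node at or clockwise after ident; nodes must be sorted."""
    i = bisect_left(nodes, ident % SIZE)
    return nodes[i % len(nodes)]


def build_fingers(nodes):
    """Finger k of node n points to successor(n + 2^k), as in Chord."""
    return {n: [successor(nodes, n + 2 ** k) for k in range(M)] for n in nodes}


def lookup(fingers, start, key):
    """Route from start to the node responsible for key; return (owner, hops)."""
    node, hops = start, 0
    while True:
        succ = fingers[node][0]
        if in_interval(key, node, succ):
            return succ, hops + 1
        # Closest preceding finger: the farthest finger strictly before key.
        nxt = next((f for f in reversed(fingers[node])
                    if f != key and in_interval(f, node, key)), succ)
        node, hops = nxt, hops + 1
```

Each forwarding step moves strictly closer to the key in the clockwise direction, so the loop always terminates. The logarithmic hop bound depends on the finger spacing and is not checked here.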
12Architecture of IMAGINE-P2P
- Semantic Overlay: Basic structure
[Figure: distributed indexing structure. Physical
nodes N1, N2, ..., Ni form the object overlay along
the key space; objects O1 to O7 are linked in the
semantic overlay. Legend: indexing node, object,
physical node, key. Example: query for a semantic
path sp(O1 O2 O6 O7).]
Semantic object: $SO(a, R, b)$
Semantic path: $sp(a_1 R_1 a_2 R_2 \dots a_{n-1} R_{n-1} a_n)$
13Architecture of IMAGINE-P2P
- Semantic Overlay: Querying
Semantic object: for $SO(a, R, b)$, either $a$ or
$b$, or both, can be used as keys by the DHT
function.
Semantic path: a query $q = a_1 R_1 a_2 R_2 \dots a_{n-1} R_{n-1} a_n$
is decomposed into $n - 1$ subqueries:
$q_1 = a_1 R_1 a_2$, $q_2 = a_1 a_2 R_2 a_3$, ..., and
$q_{n-1} = a_1 a_2 \dots a_{n-1} R_{n-1} a_n$.
Cost: $O(\log N)$ for a semantic object;
$O(\log N + L)$ for a semantic path of length $L$ in
the best case; $O(\log N \times L)$ for a semantic
path of length $L$ in the worst case.
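The decomposition into subqueries can be sketched as follows. The list-based path encoding and the function name `decompose` are assumptions made for illustration; each subquery is returned as a (prefix, relation, target) triple.

```python
def decompose(path):
    """Split [a1, R1, a2, ..., an] into n - 1 subqueries.

    Subquery i is (prefix a1..ai, relation Ri, target a(i+1)).
    """
    objects, relations = path[0::2], path[1::2]
    if len(objects) != len(relations) + 1:
        raise ValueError("path must alternate objects and relations")
    return [
        (tuple(objects[: i + 1]), relations[i], objects[i + 1])
        for i in range(len(relations))
    ]
```

For a path of three objects this yields two subqueries, the second carrying the prefix of both earlier objects.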
14Architecture of IMAGINE-P2P
- Semantic Overlay: Basic query operations
15Deployment of Distributed Trie Index
- Distributed Trie Index: Basic Structure
[Figure: a full trie index over the keys big, back,
dark, and create. The query "dark" follows the trie
path tp(dark). Labeled example object: SO1(d, S, a).]
$L = O(\log_m N)$, where $m$ is the size of the
attribute set and $N$ is the number of keys
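A local, single-machine sketch of a full trie may help fix the idea of a trie path. It assumes nested dictionaries for trie nodes and uses "/" as an end-of-key marker, which is an assumed reading of the "/" leaves in the figure. In the distributed version each step of the path would be a lookup on the overlay instead of a dictionary access.

```python
END = "/"  # assumed end-of-key marker


def insert(root, key):
    """Add key to the trie, one character per level."""
    node = root
    for ch in key:
        node = node.setdefault(ch, {})
    node[END] = key


def trie_path(root, key):
    """Return the prefixes visited while looking up key, or None if absent."""
    node, visited = root, []
    for i, ch in enumerate(key):
        if ch not in node:
            return None
        node = node[ch]
        visited.append(key[: i + 1])
    return visited if END in node else None


root = {}
for k in ["big", "back", "dark", "create"]:
    insert(root, k)
```

With these four keys, `trie_path(root, "dark")` visits one prefix per character, so the number of steps equals the key length in a full trie.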
16Deployment of Distributed Trie Index
- Trie Index: Two basic types
[Figure: a full trie index and a pruned trie index
over the keys back, big, computer, computing,
create, and dark.]
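One way to see the difference between the two types is to compute where each key would sit in a pruned trie. The sketch below assumes the usual reading of a pruned trie, in which a branch stops at the shortest prefix that no other key shares. The function name `pruned_leaves` is illustrative.

```python
def pruned_leaves(keys):
    """Map each key to the shortest prefix that no other key shares.

    A key that is itself a prefix of another key keeps its full length.
    """
    leaves = {}
    for key in keys:
        others = [k for k in keys if k != key]
        for n in range(1, len(key) + 1):
            prefix = key[:n]
            if not any(o.startswith(prefix) for o in others):
                break
        leaves[key] = prefix
    return leaves
```

For the six keys in the figure, "dark" is already unique after one character, while "computer" and "computing" only separate at their seventh character.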
17Deployment of Distributed Trie Index
- Trie Index: Compressed pruned trie index
To avoid splitting and moving existing indexing
nodes
[Figure: a pruned trie index and a compressed
pruned trie index over the keys back, big,
computer, computing, create, and dark.]
A key object is defined as $KO(a_1 a_2 \dots a_j, S, K)$,
where key $K = a_1 a_2 \dots a_j \dots a_n$ and $a_j$ is the
leaf trie node of the trie path of $K$
18Deployment of Distributed Trie Index
- Trie Index: Publishing a compressed pruned trie index
- If there is no $SO(a_1, S, e)$ or $SO(a_1, S, a_2)$,
$SO(a_1, S, e)$ is published and the key $K$ is
published by $KO(a_1, S, K)$.
- If there is $SO(a_1, S, e)$ but no $KO(a_1, S, K_1)$
where $K_1 = a_1 b_2 b_3 \dots b_n$ ($b_2 \neq a_2$), the key
$K$ is published by $KO(a_1, S, K)$.
- If there are already $SO(a_1, S, e)$ and a
$KO(a_1, S, K_1)$ that shares a prefix with $K$,
where $K_1 = a_1 a_2 \dots a_j b_{j+1} \dots b_m$, $j \geq 2$, and
$b_{j+1} \neq a_{j+1}$, $SO(a_1, S, e)$ is changed to
$SO(a_1, S, a_2)$ and two objects are published:
$SO(a_1 a_2, S, e)$ and $KO(a_1 a_2, S, K)$.
- If there is already an $SO(a_1, S, a_2)$, forward the
key $K$ along the trie path $tp(a_1 a_2 \dots a_m e)$ until
reaching $SO(a_1 a_2 \dots a_m, S, e)$ ($m < n$). If there is
no $KO(a_1 a_2 a_3 \dots a_m, S, K_2)$ such that
$K_2 = a_1 a_2 \dots a_m a_{m+1} b_{m+2} \dots b_p$, just publish
$KO(a_1 a_2 a_3 \dots a_m, S, K)$. Otherwise, change
$SO(a_1 a_2 \dots a_m, S, e)$ to $SO(a_1 a_2 \dots a_m, S, a_{m+1})$
and publish the objects $SO(a_1 a_2 a_3 \dots a_m a_{m+1}, S, e)$
and $KO(a_1 a_2 a_3 \dots a_m a_{m+1}, S, K)$.
Objects shown in the same color share the same
prefix and thus can be published in one message.
19Deployment of Distributed Trie Index
- Trie Index: Multi-access on physical nodes
[Figure: the query abcde visits the index entries
(a, b), (ab, c), (abc, d), (abcd, e), and (abcde, e),
which are spread over physical nodes A, B, and C.]
On a full trie index and a pruned trie index
20Deployment of Distributed Trie Index
- Trie Index: Avoiding multi-access
[Figure: the query abcde over the index entries
(a, b), (ab, c), (abc, d), (abcd, e), and (abcde, e)
on physical nodes A, B, and C.]
On a full trie index and a pruned trie index
21Deployment of Distributed Trie Index
[Figure: the query abcdef visits the index entries
(a, b), (ab, c), (abc, d), (abcd, e), and (abcde, e),
which are spread over physical nodes A, B, and C.]
On a compressed pruned trie index
22Deployment of Distributed Trie Index
- Trie Index: Avoiding multi-access
[Figure: the query abcdef over the index entries
(a, b), (ab, c), (abc, d), (abcd, e), and (abcde, e)
on physical nodes A, B, and C.]
On a compressed pruned trie index
23Performance Improvements
- Utilization Improvement: Decentralized load
balancing
Target: for each node $n_i$ ($i = 1, 2, \dots, N$)
Action: $n_i$ moves loads to neighbor nodes $n_j$
selected from its neighbor node set
according to
- Which object should be moved
- When the object should be moved
- Where the object should be moved
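The sketch below illustrates decentralized balancing with neighbors only. It does not reproduce this slide's own selection criteria. It assumes, purely for illustration, that each node on a ring shifts half of its surplus to its lighter immediate neighbor, and that load is a divisible number rather than a set of objects. The names `variance` and `balance_round` are hypothetical.

```python
def variance(loads):
    """Population variance of the node loads."""
    mean = sum(loads) / len(loads)
    return sum((x - mean) ** 2 for x in loads) / len(loads)


def balance_round(loads):
    """One round: every node in turn averages with its lighter ring neighbor."""
    n = len(loads)
    loads = list(loads)
    for i in range(n):
        # Pick the less loaded of the two ring neighbors (assumed rule).
        j = min(((i - 1) % n, (i + 1) % n), key=lambda k: loads[k])
        if loads[j] < loads[i]:
            shift = (loads[i] - loads[j]) / 2
            loads[i] -= shift
            loads[j] += shift
    return loads
```

Replacing two loads by their average never increases the variance and keeps the total load unchanged, which is why repeated rounds of this rule push the variance down.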
24Performance Improvements
- Availability Improvement: Using path key
replication to improve the availability of
semantic paths and distributed trie paths.
Duplicate a semantic object $SO(a, R, b)$ by
publishing it under both key $a$ and key $b$.
A path key of a semantic object contains the path
information of the objects published before it on
the same path, so a semantic object can be
recovered from any semantic object published later
on the same semantic path.
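The duplication step can be sketched with a toy in-memory store. This models only the publishing of one semantic object under both of its keys. Path keys and recovery from later objects are not modeled. The class `ToyDHT`, its byte-sum placement function, and the helper names are assumptions for illustration, not the platform's DHT function.

```python
class ToyDHT:
    """In-memory stand-in for a DHT with a fixed number of nodes."""

    def __init__(self, n_nodes):
        self.nodes = [dict() for _ in range(n_nodes)]
        self.failed = set()

    def home(self, key):
        # Toy placement rule (an assumption, not a real hash function).
        return sum(key.encode()) % len(self.nodes)

    def put(self, key, obj):
        self.nodes[self.home(key)].setdefault(key, []).append(obj)

    def get(self, key):
        n = self.home(key)
        return [] if n in self.failed else list(self.nodes[n].get(key, []))


def publish_replicated(dht, a, r, b):
    """Publish SO(a, R, b) once under key a and once under key b."""
    so = (a, r, b)
    dht.put(a, so)
    dht.put(b, so)


def find(dht, a, r, b):
    """Look the object up under key a, then fall back to key b."""
    so = (a, r, b)
    return so in dht.get(a) or so in dht.get(b)
```

When the two keys map to different nodes, the object survives the failure of either one, at the cost of storing it twice.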
25Experiment Results
- An event-driven simulation environment
- Simulation on a ring network with 200 and 2000
nodes.
- Different distributions of object loads and node
capacities are tested.
26Experiment Results
Trie index properties compared with the B-tree and
the B+-tree. The compressed trie index has a very
short average depth.
27Experiment Results
The size of a trie index is sensitive only to the
key string distribution. Its independence from the
network size and the number of keys makes it
scalable in large-scale and dynamic environments.
28Experiment Results
Average search hops of a broadcast query for all
the keys on the network, using distributed trie
indexes in networks of different sizes and with
different numbers of keys.
29Experiment Results
An optimized search on trie indexes with 2349 PDF
file names as keys
30Experiment Results
The load balancing process: the variance of the
system load decreases with the load balancing
iterations under different load distributions.
31Experiment Results
Chord uses virtual servers to improve the load
balance: each physical node holds more than one
virtual server, and data objects are mapped by the
DHT function to virtual servers instead of
physical nodes. Its authors proposed that log N
virtual servers per physical node can be optimal
with high probability when only the number of keys
is considered.
32Experiment Results
The load balancing process works effectively for
distributed trie indexes, which cause heavily
imbalanced load distributions
33Experiment Results
34Experiment Results
If each extra hop incurred by the load balancing
does not significantly delay a query, the average
query latency under load balancing can be reduced
when only considering storage consumption of
objects.
35Experiment Results
With replication, the availability of the full
trie is better than that of the pruned trie,
because the pruned trie has much shorter paths and
therefore fewer copies under path key replication.
Without replication, however, the pruned trie has
better availability, because its much shorter
search paths are less likely to be broken under
the same failure distribution.
36Conclusion
- Publishing distributed indexes using semantic
overlay methods can be a solution to support
complex queries with high-level semantics.
- Many conflicting factors must be balanced when
designing a P2P system to achieve a scalable
solution.
- The distributed trie index can be scalable in
large-scale and dynamic environments where the key
string distribution is relatively stable.
- Decentralized load balancing in large-scale and
dynamic distributed systems can work effectively.
- Future work still faces challenges in building
more efficient distributed indexes, relieving hot
spots on distributed indexes, and improving
availability while keeping the system decentralized
and scalable.
- Future theoretical work should show to what
extent the trade-offs can be made to achieve an
acceptable scalability.
This work has been published in IEEE Transactions
on Knowledge and Data Engineering
37Questions and Comments
Thanks!
Full paper is available at IEEE Transactions on
Knowledge and Data Engineering