Paraskevi Raftopoulou1,2 and Euripides G.M. Petrakis2 - PowerPoint PPT Presentation

1
A Measure for Cluster Cohesion in Semantic
Overlay Networks
  • Paraskevi Raftopoulou (1,2) and Euripides G.M.
    Petrakis (2)
  • (1) Max-Planck Institute for Informatics,
    Saarbruecken, Germany
  • http://www.mpi-inf.mpg.de/
  • (2) Technical University of Crete, Chania, Greece
  • http://www.intelligence.tuc.gr/

2
Outline
  • Motivation and related work
  • Distributed resource sharing
  • iCluster architecture
  • Measuring clustering quality
  • Experimental evaluation
  • Conclusion

3
Motivation and related work
4
Motivation
  • Resource sharing is at the core of today's
    computing (Web, P2P, Grid)
  • Information retrieval functionality is needed
  • Overlay networks are a good technology to build on
  • Measures are used for evaluating network
    organisation and retrieval efficiency

5
Related Work
  • Semantic Overlay Networks
  • Initial approaches include [KJ04], [SMZ03], [PMW07]
  • Based on the idea of small-world networks:
    [Smi04], [LLS04], [VSI06], DESENT
  • Concepts and measures quantifying network
    organisation
  • (Generalised) clustering coefficient: [WS98], [HAH07]
  • Extensions/modifications: [FHJS02], [BGW08],
    [RMJ07], [FH06]

6
Distributed resource sharing
7
Semantic overlay networks
  • Self-organising overlay networks
  • The idea: peers that are semantically,
    thematically, or socially close (i.e., sharing
    similar interests or resources) are organised
    into groups. Queries are routed to the
    appropriate group.
  • Peers hold routing indices with links to other
    peers
  • Peers connected to each other are called
    neighbours
  • Support rich data models and expressive query
    languages

8
Rewiring strategies
  • Techniques for self-organising peers
  • abandon old connections and create new ones
  • periodic process
  • Inspired by the small-world effect
  • any peer can be reached in a small number of
    routing hops

9
Small-world networks
  • Most peers are not neighbours of one another
  • Yet most peers can be reached from every other
    peer in a small number of hops
  • Main characteristics
  • small average shortest path length
  • high clustering coefficient
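
The first characteristic can be made concrete with a minimal sketch. It assumes an undirected overlay stored as a dictionary from peer to neighbour set (a representation chosen here for illustration, not taken from the slides) and averages hop counts over all connected pairs. The six-peer ring and the extra long-range link are hypothetical data.

```python
from collections import deque

def hop_distances(graph, source):
    """Breadth-first search: hop count from source to every reachable peer."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_shortest_path_length(graph):
    """Mean hop distance over all ordered pairs of distinct, connected peers."""
    total, pairs = 0, 0
    for p in graph:
        for q, d in hop_distances(graph, p).items():
            if q != p:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical 6-peer ring, then the same ring with one long-range link 0 <-> 3.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
shortcut = {p: set(n) for p, n in ring.items()}
shortcut[0].add(3)
shortcut[3].add(0)

ring_length = average_shortest_path_length(ring)
shortcut_length = average_shortest_path_length(shortcut)
```

In this toy case a single long-range link already lowers the average path length, which is the effect the rewiring strategies aim for.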

10
iCluster architecture
11
iCluster basics
  • iCluster: (i) intelligent, (Cluster) clustering;
    see also iClusterDL
  • Contributions
  • Architecture and protocols to support IR
    functionality
  • seamless and easy integration of peers, scalable
  • fast query processing
  • Self-organising peers based on SONs
  • support rich query models
  • benefits from loosely-connected peers

12
iCluster Protocols
  • Peer join/leave
  • Peer rewiring
  • Query processing
  • Document retrieval

13
Peer rewiring
  • A peer p
  • computes its intra-cluster similarity
  • (average similarity with its neighbours)
  • initiates rewiring if similarity < threshold θ
  • sends a message (msg) with its interest to m
    neighbours
  • All peers receiving msg append their interest and
    forward msg to m neighbours
  • The message is sent back to p when TTL t_R = 0
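
The rewiring steps can be sketched as follows. This is a rough illustration under stated assumptions, not the iCluster implementation: interests are sparse term-weight dictionaries compared by cosine similarity, the m recipients are picked at random, and a peer is recorded only once per message. The function names and the sample peers are hypothetical. The sketch stops where the slide stops, at the point where the collected interests return to p.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def intra_cluster_similarity(p, interests, neighbours):
    """Average similarity between p's interest and its neighbours' interests."""
    nbrs = neighbours[p]
    if not nbrs:
        return 0.0
    return sum(cosine(interests[p], interests[n]) for n in nbrs) / len(nbrs)

def collect_interests(p, interests, neighbours, m, ttl, rng):
    """Simulate the rewiring message: every receiver appends its interest and
    forwards to at most m neighbours until the TTL reaches 0.
    Returns the {peer: interest} pairs that would be sent back to p."""
    collected = {}
    frontier = [p]
    while ttl > 0 and frontier:
        next_frontier = []
        for sender in frontier:
            candidates = sorted(neighbours[sender])
            # Assumption: recipients are chosen uniformly at random.
            for receiver in rng.sample(candidates, min(m, len(candidates))):
                if receiver != p and receiver not in collected:
                    collected[receiver] = interests[receiver]
                    next_frontier.append(receiver)
        frontier = next_frontier
        ttl -= 1
    return collected

def maybe_rewire(p, interests, neighbours, theta, m, ttl, rng):
    """Start rewiring only if p's intra-cluster similarity is below theta."""
    if intra_cluster_similarity(p, interests, neighbours) >= theta:
        return None
    return collect_interests(p, interests, neighbours, m, ttl, rng)

# Hypothetical four-peer overlay.
interests = {
    "a": {"heart": 1.0},
    "b": {"heart": 1.0},
    "c": {"lung": 1.0},
    "d": {"heart": 1.0, "lung": 1.0},
}
neighbours = {"a": {"c"}, "b": {"d"}, "c": {"a", "d"}, "d": {"b", "c"}}
rng = random.Random(0)
gathered = maybe_rewire("a", interests, neighbours, theta=0.9, m=2, ttl=4, rng=rng)
```

Here peer "a" has nothing in common with its only neighbour, so it starts rewiring and learns about "b", "c" and "d". Which of these it then links to is outside the scope of the sketch.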

14
Query processing
  • A peer p
  • compares q against its interests
  • selects the interest int most similar to q
  • if similarity ≥ threshold θ, forwards a message
    (msg)
  • including q to all its neighbours with TTL t_b
  • if similarity < threshold θ, forwards msg to the m
    of its neighbours most similar to q
  • All peers receiving msg repeat the same process
  • The message is forwarded until TTL t_f = 0
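
The two forwarding modes can be sketched as below. Several points are assumptions made only for illustration: interests and queries are keyword sets compared with Jaccard similarity, each peer knows its neighbours' interests, a peer handles a given query at most once, and a matching peer restarts the hop budget at t_b. None of these details is stated on the slides, and all names and sample data are hypothetical.

```python
from collections import deque

def jaccard(a, b):
    """Jaccard similarity between two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_match(query, peer_interests):
    """Highest similarity between the query and any of a peer's interests."""
    return max((jaccard(query, i) for i in peer_interests), default=0.0)

def route_query(start, query, interests, neighbours, theta, m, t_f, t_b):
    """Return the peers whose best interest matches the query (>= theta)."""
    matched, seen = set(), {start}
    queue = deque([(start, t_f)])  # (peer, hops the message may still travel)
    while queue:
        peer, budget = queue.popleft()
        if best_match(query, interests[peer]) >= theta:
            # Similar peer: broadcast to all neighbours with TTL t_b.
            matched.add(peer)
            targets, budget = sorted(neighbours[peer]), t_b
        else:
            # Dissimilar peer: forward to the m neighbours most similar to q.
            ranked = sorted(sorted(neighbours[peer]),
                            key=lambda n: best_match(query, interests[n]),
                            reverse=True)
            targets = ranked[:m]
        if budget <= 0:
            continue
        for n in targets:
            if n not in seen:
                seen.add(n)
                queue.append((n, budget - 1))
    return matched

# Hypothetical overlay: "d" and "e" form the cluster relevant to the query.
interests = {
    "a": [{"lung"}], "b": [{"heart"}], "c": [{"skin"}],
    "d": [{"heart", "surgery"}], "e": [{"heart", "surgery"}], "f": [{"skin"}],
}
neighbours = {
    "a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "f"},
    "d": {"b", "e"}, "e": {"d"}, "f": {"c"},
}
hits = route_query("a", {"heart", "surgery"}, interests, neighbours,
                   theta=0.9, m=1, t_f=3, t_b=2)
```

In this example the query is steered from "a" through "b" towards the matching cluster and is then broadcast inside it, while "c" and "f" never receive the message.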

15
Measuring clustering quality
16
Clustering coefficient
  • The ratio of the number of links between the peers
    within p_i's neighborhood to the number of links
    that could possibly exist between them
  • Takes values in the interval [0, 1]
  • If c_i = 1, every peer connected to p_i is also
    connected to every other peer within the
    neighborhood
  • If c_i = 0, no peer that is connected to p_i
    connects to any other peer connected to p_i

[Figure: four example neighborhoods with c_i = 1/6, c_i = 1/2, c_i = 1, and c_i = 0]
  • Takes into account only the immediate
    neighbours of the peer
  • Takes high values when there are cliques
  • Loses the general view of the network
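
As a minimal sketch of the definition, assuming an undirected overlay stored as a dictionary from peer to neighbour set (an illustrative representation, with hypothetical peer names), the coefficient of a single peer can be computed like this:

```python
from itertools import combinations

def clustering_coefficient(graph, p):
    """Links among p's neighbours divided by the links that could exist."""
    nbrs = graph[p]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed here: no pair of neighbours, so 0
    links = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in graph[u])
    return links / (k * (k - 1) / 2)

# p has four neighbours, so six links are possible among them.
star = {"p": {"a", "b", "c", "d"},
        "a": {"p"}, "b": {"p"}, "c": {"p"}, "d": {"p"}}
one_link = {"p": {"a", "b", "c", "d"},
            "a": {"p", "b"}, "b": {"p", "a"}, "c": {"p"}, "d": {"p"}}
clique = {x: {"p", "a", "b", "c", "d"} - {x} for x in ["p", "a", "b", "c", "d"]}

c_star = clustering_coefficient(star, "p")
c_one = clustering_coefficient(one_link, "p")
c_clique = clustering_coefficient(clique, "p")
```

With four neighbours, a single link among them yields 1/6 and a full clique yields 1. The computation never looks beyond the immediate neighbours, which is the limitation noted above.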

17
Clustering efficiency
  • A new measure that
  • quantifies network organisation and
  • reflects retrieval effectiveness
  • Based on the network organisation and on the
    query processing protocols
  • Consider that the neighborhood of a peer p_i
    consists of all peers within radius t_b (in hops)
    of p_i

18
Clustering efficiency
  • The number of peers similar to p_i that can be
    reached from p_i within t_b hops divided by the
    total number of similar peers
  • Takes values in the interval [0, 1]
  • If the clustering efficiency of p_i is 1, the
    neighborhood of p_i contains all peers similar
    to p_i
  • If the clustering efficiency of p_i is 0, the
    neighborhood of p_i contains no peer similar
    to p_i

[Figure: example network with clustering coefficient c_i = 0 and clustering efficiency 1]
  • Gives information about the underlying
    network organisation involving more
    than just the immediate neighbors
  • Looks at how the network is organised at
    a larger scale
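
The measure can be sketched with a bounded breadth-first search. The sketch assumes an undirected overlay stored as a dictionary from peer to neighbour set and takes the set of peers considered similar to p_i as an input, since the slides do not say how similarity is decided. Returning 0 when there are no similar peers is likewise an assumption. All names and data are hypothetical.

```python
from collections import deque

def within_radius(graph, p, radius):
    """Peers reachable from p in at most `radius` hops (p itself excluded)."""
    dist = {p: 0}
    queue = deque([p])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the radius
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist) - {p}

def clustering_efficiency(graph, p, similar, t_b):
    """Fraction of the peers similar to p that lie within t_b hops of p."""
    if not similar:
        return 0.0  # assumption: undefined case mapped to 0
    return len(within_radius(graph, p, t_b) & similar) / len(similar)

# Hypothetical overlay without triangles: p - a - b, plus an unrelated peer x.
graph = {"p": {"a", "x"}, "a": {"p", "b"}, "b": {"a"}, "x": {"p"}}
similar_to_p = {"a", "b"}

eff_two_hops = clustering_efficiency(graph, "p", similar_to_p, t_b=2)
eff_one_hop = clustering_efficiency(graph, "p", similar_to_p, t_b=1)
```

In this toy overlay the neighbours of p are not linked to each other, so its clustering coefficient is 0, yet with t_b = 2 every similar peer is reachable and the clustering efficiency is 1.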

19
Experimental evaluation
20
Experimental Evaluation
  • Used different parameters
  • Data corpus
  • Similarity threshold
  • Query TTL
  • Forwarding strategies
  • the start of the rewiring is randomly
    chosen from the time interval [0, 4K]
  • the periodicity is randomly selected
    from a normal distribution of 2K

Parameter              Symbol  Value
peers                  N       2,000
short-range links      s       8
long-range links       l       4
similarity threshold   θ       0.9
rewiring TTL           t_R     4
fixed forwarding TTL   t_f     6
broadcast TTL          t_b     2
message fanout         m       2
  • OHSUMED TREC: 30,000 medical articles,
    10 categories
  • TREC-6: 556,000 documents, 100 categories

The better the network organisation is, the
better the performance of retrievals should be!
  • Looked into the
  • Network organisation
  • Recall
  • The experiments are intended to
  • associate the performance of retrievals with the
    quality of network organisation
  • recommend the clustering measure that better
    represents this association

21
Experimental Evaluation
Clustering coefficient ci for different
forwarding strategies
22
Experimental Evaluation
Clustering efficiency for different forwarding
strategies
23
Experimental Evaluation
Retrieval
24
Outlook
25
Conclusion
  • The idea
  • focus on IR on top of SON
  • look at how the network is organised at a large
    scale
  • Clustering efficiency
  • quantifies the underlying (dynamic) P2P structure
  • reflects retrieval effectiveness
  • The results indicate that the clustering
    efficiency measure models network clustering
    quality better than other existing measures