Taxonomy Caching: A Scalable Low-Cost Mechanism for Indexing Remote Contents in Peer-to-Peer Systems

Athens University of Economics and Business, Athens, Greece. June 28, 2006. ICPS'2006.


1
Taxonomy Caching: A Scalable Low-Cost Mechanism
for Indexing Remote Contents in Peer-to-Peer
Systems
  • Kjetil Nørvåg
  • Norwegian University of Science and Technology,
    Trondheim, Norway
  • Christos Doulkeridis and Michalis Vazirgiannis
  • Athens University of Economics and Business,
    Athens, Greece

2
Outline
  • Motivation and example application
  • Taxonomies and taxonomy-based querying
  • Taxonomy-based query routing
  • Taxonomy caching architecture and maintenance
  • Experimental results
  • Summary and further work

3
Motivation
  • Mobile devices: high storage capacity and wireless
    support
  • Contain multimedia documents that can be shared
  • Possibly other data/services
  • Temperature or other environmental data
  • Important challenge: find the files and services!
  • Problem:
  • Dynamic contents, location, and visibility
  • Limited bandwidth
  • → Centralized indexing/search engines not
    applicable
  • → P2P network search

4
Example application: MobiShare
  • Devices share resources by hosting web services
  • Each device is connected to a CAS
  • CASs are connected in a P2P network
  • More details in Valavanis et al., Web
    Intelligence 2003

5
Outline of basic idea
  • 1) Describe contents according to taxonomy
  • 2) Taxonomy info cached at remote peers
  • 3) Use cached knowledge to route queries to
    appropriate peers
  • Why?
  • 1) Should reduce latency
  • 2) Increase recall with same cost

6
Resource description
  • Taxonomy-based resource description
  • Also applicable for audio/video
  • More than one taxonomy might exist in system
  • Resource description: taxonomy ID and set of
    categories
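
A resource description of this kind could be represented as a small record. The following is a minimal sketch, not the authors' data model: the class name, the `resource_id` field, and the example category paths are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceDescription:
    """One shared resource, described by a taxonomy ID and a set of categories."""
    resource_id: str       # hypothetical identifier, not taken from the slides
    taxonomy_id: str       # which taxonomy the categories belong to
    categories: frozenset  # category names (here: paths) within that taxonomy


# Illustrative values only; the category paths are made up for this example.
photo = ResourceDescription(
    resource_id="img-001",
    taxonomy_id="dmoz",
    categories=frozenset({"Arts/Photography", "Recreation/Travel"}),
)
```

Because the description carries no text content, the same record works for audio and video resources as well.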

7
Taxonomy-based querying
  • Query
  • 1) Request for all resources belonging to
    category Cj
  • or
  • 2) Request for all resources belonging to
    category Cj and satisfying some additional
    property
  • Example properties: text contents, metadata
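
The two query forms can be sketched as a category plus an optional predicate. This is only an illustration under assumed names (`Resource`, `TaxonomyQuery`, `evaluate`); the slides do not specify a query format.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Resource:
    name: str
    categories: set
    metadata: dict


@dataclass
class TaxonomyQuery:
    category: str
    # Optional additional property (query form 2); None means query form 1.
    predicate: Optional[Callable[[Resource], bool]] = None

    def matches(self, resource: Resource) -> bool:
        if self.category not in resource.categories:
            return False
        return self.predicate is None or self.predicate(resource)


def evaluate(query: TaxonomyQuery, resources: list) -> list:
    """Local execution of a query against the resources held by one peer."""
    return [r for r in resources if query.matches(r)]


local = [
    Resource("a.jpg", {"Photography"}, {"year": 2005}),
    Resource("b.jpg", {"Photography"}, {"year": 2006}),
    Resource("c.mp3", {"Music"}, {"year": 2006}),
]
all_photos = evaluate(TaxonomyQuery("Photography"), local)
recent_photos = evaluate(
    TaxonomyQuery("Photography", lambda r: r.metadata["year"] == 2006), local
)
```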

8
Searching in unstructured P2P networks
  • Basic search technique: local execution of query,
    then forwarding if TTL > 0
  • Naïve flooding (all neighbors)
  • Normalized flooding (only K neighbors)
  • Random walks: only one random neighbor, but W
    walks initiated
  • Problem: only a limited number of peers can be
    searched (query horizon)
  • Possible improvements
  • Routing indices
  • Summary indexing (Bloom filters etc.)
  • Result caching
  • However: still limited scalability and coverage
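
The query horizon can be illustrated with a small simulation of naïve flooding. This is a sketch under simplifying assumptions: the network is an adjacency dictionary, a peer processes a query only once, and the function name `flood` is hypothetical.

```python
from collections import deque


def flood(graph, start, ttl):
    """Return the set of peers that execute the query under naive flooding.

    Each peer executes the query locally and forwards it to all of its
    neighbors only if the remaining TTL is greater than zero.
    """
    reached = {start}
    frontier = deque([(start, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        if remaining <= 0:
            continue  # TTL exhausted: execute locally, do not forward
        for neighbor in graph[peer]:
            if neighbor not in reached:
                reached.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return reached


# A chain a - b - c - d: with TTL = 2, peer d lies beyond the query horizon.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
horizon = flood(chain, "a", 2)
```

Peers outside the returned set are never searched, whatever they store, which is the limitation that taxonomy caching targets.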

9
Taxonomy caching
  • Basic idea
  • Maintain taxonomic information about remote
    contents in a taxonomy cache (TCache)
  • Mapping from taxonomic concept to set of peers
  • Advantages
  • Cheaper to maintain than full-text index
  • More applicable to multimedia data
  • More robust wrt. changes in contents
  • Used to improve query routing
  • → Higher recall and reduced latency

10
Query routing using taxonomy cache (TCache)
  • Basis: one of the traditional routing strategies
  • Query forward peers: PF
  • Starting point: PF = neighbors PN = {PN1, …, PNn}
  • Lookup in TCache: Lookup(category)
    → PC = {PC1, …, PCm}
  • PF = PN ∪ PC
  • Query forwarded to (subset of) PF
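
The construction of the candidate set PF can be sketched in a few lines. The TCache is assumed here to be a plain dictionary from category to peers, and the function name and `exclude` parameter are hypothetical.

```python
def forward_candidates(neighbors, tcache, category, exclude=()):
    """Compute PF = PN ∪ PC for one query at one peer.

    neighbors: direct neighbors of this peer (PN)
    tcache:    assumed mapping category -> iterable of peers (source of PC)
    exclude:   peers that must not receive the query, e.g. the previous hop
    """
    pn = set(neighbors) - set(exclude)
    pc = set(tcache.get(category, ())) - set(exclude)
    return pn | pc


tcache = {"Photography": ["p7", "p9"]}
pf = forward_candidates(["p1", "p2", "p3"], tcache, "Photography", exclude=["p1"])
```

A routing strategy then picks the actual forwarding targets from this set.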

11
Query forwarding alternatives (1)
  • Query forward peers: PF
  • Number of neighbors (excl. previous): Nn
  • Number of matches from lookup: Nc
  • Ranking of peers in PC
  • Based on number of resources within a category
  • Peers with a high number of resources are
    considered experts
  • TCB
  • Highest ranked in PC + the Nn neighbors in
    {PN1, …, PNn}
  • Forwarding to a peer in PC is called a jump
  • A jump can be to a peer beyond the query horizon!
  • TCA
  • If Nc ≥ Nn: forward to the Nn highest ranked peers
    in PC
  • If Nc < Nn: forward to all Nc peers in PC +
    (Nn-Nc) randomly selected neighbors

12
Query forwarding alternatives (2)
  • TCCN
  • If Nc ≥ Nn: forward to all Nc peers in PC
  • If Nc < Nn: forward to all Nc peers in PC +
    (Nn-Nc) neighbors
  • TCDN
  • If Nc ≥ Nn: forward to the Nn/2 highest ranked
    peers in PC + a random selection of Nn/2 other
    peers in PC
  • If Nc < Nn: forward to all Nc peers in PC +
    (Nn-Nc) neighbors
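
The TCA, TCCN, and TCDN alternatives might be sketched as follows. Several points are assumptions rather than statements from the slides: the first case is read as Nc ≥ Nn, the extra neighbors are chosen at random in all three variants, PC is passed as a dictionary from peer to resource count, and all function names are hypothetical. TCB is left out.

```python
import random


def rank(pc):
    """Order peers by number of resources in the category, highest first."""
    return sorted(pc, key=lambda peer: (-pc[peer], peer))


def fill_with_neighbors(chosen, neighbors, count, rng):
    """Append up to `count` randomly selected neighbors not already chosen."""
    pool = [n for n in neighbors if n not in chosen]
    return chosen + rng.sample(pool, min(count, len(pool)))


def select_forward_peers(strategy, neighbors, pc, rng=random):
    """Pick forwarding targets for strategy 'TCA', 'TCCN' or 'TCDN'.

    neighbors: list of neighbors excluding the previous hop (Nn = its length)
    pc:        dict peer -> resource count from the TCache lookup (Nc = its length)
    """
    if strategy not in ("TCA", "TCCN", "TCDN"):
        raise ValueError(f"unknown strategy: {strategy}")
    nn, nc = len(neighbors), len(pc)
    ranked = rank(pc)
    if nc < nn:
        # Same rule in all three variants: all Nc cached peers + (Nn - Nc) neighbors.
        return fill_with_neighbors(ranked, neighbors, nn - nc, rng)
    if strategy == "TCA":
        return ranked[:nn]  # the Nn highest ranked peers in PC
    if strategy == "TCCN":
        return ranked       # all Nc peers in PC
    # TCDN: Nn/2 highest ranked peers + random selection of Nn/2 other peers in PC.
    half = nn // 2
    others = ranked[half:]
    return ranked[:half] + rng.sample(others, min(half, len(others)))


neighbors = ["n1", "n2", "n3", "n4"]
experts = {"a": 10, "b": 5, "c": 7, "d": 1, "e": 3, "f": 2}
tca = select_forward_peers("TCA", neighbors, experts)
tccn = select_forward_peers("TCCN", neighbors, experts)
tcdn = select_forward_peers("TCDN", neighbors, experts, rng=random.Random(0))
sparse = select_forward_peers("TCA", neighbors, {"a": 1}, rng=random.Random(0))
```

In this reading, TCDN trades some of the top-ranked experts for randomly chosen cached peers, which spreads load away from the most popular peers.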

13
Distributing taxonomic information
  • Basic mechanism: piggyback matching category with
    query result
  • Result returned through original path, possibly
    involving jumps
  • Makes revalidation of contents in intermediate
    TCaches possible
  • Coverage will be gradually extended (beyond query
    horizon)
  • Lazy distribution by gossiping also possible
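
Piggybacking on the return path can be sketched as a walk back along the query path in which every peer records the matching category. The nested-dictionary layout and the function name are assumptions for illustration only.

```python
def return_result(path, tcaches, category, provider, n_resources):
    """Send a result back along `path` and update the TCache at each hop.

    path:    peers the query traversed, from the querying peer to the provider
    tcaches: assumed layout: peer -> {category -> {remote peer -> resource count}}
    """
    for peer in reversed(path):
        if peer == provider:
            continue  # the provider does not cache information about itself
        entry = tcaches.setdefault(peer, {}).setdefault(category, {})
        entry[provider] = n_resources  # insert a new entry or revalidate an old one


tcaches = {}
return_result(["q", "p1", "p2", "p5"], tcaches, "Photography", "p5", 12)
```

After one such result, the querying peer q knows about p5 even though p5 is three hops away, so a later query can jump to it directly.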

14
TCache architecture and maintenance
  • Aim: provide efficient mapping C → {PC1, …, PCm}
  • For each category: peers, number of resources,
    and TTL
  • TTL
  • Regularly decremented
  • Reset to start value at revalidation
  • Caching policy: aggressive vs. selective
  • Compacting techniques: peer upgrade and non-expert
    pruning
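
A TCache with TTL handling could look roughly like the sketch below. The slides only state that the TTL is regularly decremented and reset at revalidation; evicting an entry when its TTL reaches zero, the default start value, and all names are assumptions.

```python
class TCache:
    """Maps a category to the remote peers known to hold resources in it."""

    def __init__(self, start_ttl=3):
        self.start_ttl = start_ttl
        self.entries = {}  # category -> {peer: [number of resources, ttl]}

    def revalidate(self, category, peer, n_resources):
        """Insert or refresh an entry and reset its TTL to the start value."""
        self.entries.setdefault(category, {})[peer] = [n_resources, self.start_ttl]

    def tick(self):
        """Regular decrement; entries whose TTL reaches zero are dropped (assumed)."""
        for category in list(self.entries):
            peers = self.entries[category]
            for peer in list(peers):
                peers[peer][1] -= 1
                if peers[peer][1] <= 0:
                    del peers[peer]
            if not peers:
                del self.entries[category]

    def lookup(self, category):
        """Return the cached peers for a category, highest resource count first."""
        peers = self.entries.get(category, {})
        return sorted(peers, key=lambda p: (-peers[p][0], p))


cache = TCache(start_ttl=2)
cache.revalidate("Photography", "p1", 5)
cache.revalidate("Photography", "p2", 9)
ranked_before = cache.lookup("Photography")
cache.tick()
cache.revalidate("Photography", "p1", 5)  # p1 is seen again, p2 is not
cache.tick()
ranked_after = cache.lookup("Photography")
```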

15
Experimental setup
  • Simulations
  • Excerpts of DMOZ taxonomy
  • Synthetic network topologies
  • Resource allocation: 80/20 rule
  • Queries are taxonomic categories
  • A number of peers have role as querying peers
  • Measured: contacted peers, messages, recall, and
    latency
  • In this presentation: results using flooding and
    TCDN query routing

16
Improvements in recall
(NM: number of messages; F: flooding; TC: TCDN query routing with TCache)

          NM (F)   NM (TC)   Recall (F)   Recall (TC)
TTL=1        7.8       7.0       0.0022        0.0019
TTL=3      166.7     166.0       0.0117        0.0149
TTL=5      524.7     523.9       0.0282        0.0717
TTL=7     1058.6    1057.7       0.0506        0.1835
TTL=9     1721.0    1719.6       0.0773        0.2930
TTL=11    2566.3    2566.0       0.1104        0.4012
TTL=13    3536.5    3535.8       0.1477        0.4891
TTL=15    4560.2    4558.7       0.1864        0.5755

17
Primary reason for improvement: more intelligent
query forwarding
(NC: number of contacted peers; F: flooding; TC: TCDN query routing with TCache)

          NC (F)   NC (TC)   Recall (F)   Recall (TC)
TTL=1        7.8       6.7       0.0022        0.0019
TTL=3       45.3      53.4       0.0117        0.0149
TTL=5      110.6     158.0       0.0282        0.0717
TTL=7      199.9     346.8       0.0506        0.1835
TTL=9      305.6     583.1       0.0773        0.2930
TTL=11     437.7     840.3       0.1104        0.4012
TTL=13     586.7    1120.6       0.1477        0.4891
TTL=15     741.6    1372.4       0.1864        0.5755
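
The size of the improvement can be read off by dividing the TC columns by the F columns. The short script below does this arithmetic on values copied from the two result tables; the variable and function names are illustrative.

```python
# Values copied from the result tables, keyed by TTL.
recall_f = {1: 0.0022, 3: 0.0117, 5: 0.0282, 7: 0.0506,
            9: 0.0773, 11: 0.1104, 13: 0.1477, 15: 0.1864}
recall_tc = {1: 0.0019, 3: 0.0149, 5: 0.0717, 7: 0.1835,
             9: 0.2930, 11: 0.4012, 13: 0.4891, 15: 0.5755}
contacted_f = {1: 7.8, 3: 45.3, 5: 110.6, 7: 199.9,
               9: 305.6, 11: 437.7, 13: 586.7, 15: 741.6}
contacted_tc = {1: 6.7, 3: 53.4, 5: 158.0, 7: 346.8,
                9: 583.1, 11: 840.3, 13: 1120.6, 15: 1372.4}


def ratios(with_tc, baseline):
    """Ratio of the taxonomy-caching value to the flooding value, per TTL."""
    return {ttl: with_tc[ttl] / baseline[ttl] for ttl in baseline}


recall_gain = ratios(recall_tc, recall_f)
contact_gain = ratios(contacted_tc, contacted_f)

for ttl in sorted(recall_gain):
    print(f"TTL={ttl:2d}  recall x{recall_gain[ttl]:.2f}  "
          f"contacted peers x{contact_gain[ttl]:.2f}")
```

With these numbers, recall at TTL=15 is roughly three times that of flooding for an almost identical number of messages, while at TTL=1 recall is slightly lower than with flooding.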
18
Improvement and scalability
19
Latency reduction
  • TCache results in very fast retrieval of first
    results
  • Finding all results: approximately similar
    performance, because flooding is used in both
    techniques

20
Summary and further work
  • Presented motivation and context
  • Taxonomy-based querying and query routing
  • TCache architecture and maintenance
  • Experimental results proving our claims
  • Future/ongoing work
  • Employing the techniques for XML/XPath querying
    in P2P context (to appear at IEEE P2P 2006)
  • Integration of different taxonomies