Title: Taxonomy Caching: A Scalable Low-Cost Mechanism for Indexing Remote Contents in Peer-to-Peer Systems
1. Taxonomy Caching: A Scalable Low-Cost Mechanism for Indexing Remote Contents in Peer-to-Peer Systems
- Kjetil Nørvåg, Norwegian University of Science and Technology, Trondheim, Norway
- Christos Doulkeridis and Michalis Vazirgiannis, Athens University of Economics and Business, Athens, Greece
2. Outline
- Motivation and example application
- Taxonomies and taxonomy-based querying
- Taxonomy-based query routing
- Taxonomy caching architecture and maintenance
- Experimental results
- Summary and further work
3. Motivation
- Mobile devices: high storage capacity + wireless support
  - Contain multimedia documents that can be shared
  - Possibly other data/services, e.g., temperature or other environmental data
- Important challenge: finding the files/services!
- Problem:
  - Dynamic contents, location, and visibility
  - Limited bandwidth
  - → Centralized indexing/search engines are not applicable
  - → P2P network search
4. Example application: MobiShare
- Devices share resources by hosting web services
- Each device is connected to a CAS
- CASs are connected in a P2P network
- More details in Valavanis et al., Web Intelligence 2003
5. Outline of basic idea
- 1) Describe contents according to a taxonomy
- 2) Cache taxonomy information at remote peers
- 3) Use the cached knowledge to route queries to appropriate peers
- Why?
  - 1) Should reduce latency
  - 2) Should increase recall at the same cost
6. Resource description
- Taxonomy-based resource description
- Also applicable to audio/video
- More than one taxonomy might exist in the system
- Resource description: taxonomy ID + set of categories
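A resource description as defined on this slide can be sketched as a small immutable record. This is a minimal illustration, not the authors' data model: the class name `ResourceDescription`, the method `belongs_to`, the taxonomy ID "dmoz" and the category path strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceDescription:
    """Taxonomy-based description: taxonomy ID + set of categories (hypothetical names)."""
    resource_id: str
    taxonomy_id: str
    categories: frozenset

    def belongs_to(self, taxonomy_id, category):
        # Several taxonomies may coexist in the system, so a category label
        # is only compared within the taxonomy it comes from.
        return self.taxonomy_id == taxonomy_id and category in self.categories


song = ResourceDescription("song-42.mp3", "dmoz", frozenset({"Arts/Music/Jazz"}))
print(song.belongs_to("dmoz", "Arts/Music/Jazz"))    # True
print(song.belongs_to("other", "Arts/Music/Jazz"))   # False
```

Keeping the taxonomy ID next to the categories is what lets the same label in two different taxonomies be kept apart.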
7. Taxonomy-based querying
- Query:
  - 1) Request for all resources belonging to category $C_j$, or
  - 2) Request for all resources belonging to category $C_j$ and satisfying some additional property
    - Example properties: text contents, metadata
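The two query types can be sketched as a category plus an optional predicate, evaluated locally at a peer. Assumptions: "belonging to $C_j$" is read as exact category membership (subcategories are not expanded), and the `text` field, the `TaxonomyQuery`/`execute_locally` names and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Resource:
    name: str
    categories: set
    text: str = ""  # hypothetical extra property, used by query type 2


@dataclass
class TaxonomyQuery:
    category: str
    # None -> type 1 (all resources in the category);
    # otherwise -> type 2 (resource must also satisfy the predicate).
    predicate: Optional[Callable[[Resource], bool]] = None

    def matches(self, r: Resource) -> bool:
        if self.category not in r.categories:
            return False
        return self.predicate is None or self.predicate(r)


def execute_locally(query, resources):
    """Local execution of a taxonomy query at one peer."""
    return [r.name for r in resources if query.matches(r)]


local = [
    Resource("a.mp3", {"Music/Jazz"}, "live recording"),
    Resource("b.mp3", {"Music/Jazz"}, "studio"),
    Resource("c.jpg", {"Photos/Nature"}),
]
all_jazz = TaxonomyQuery("Music/Jazz")
live_jazz = TaxonomyQuery("Music/Jazz", lambda r: "live" in r.text)
print(execute_locally(all_jazz, local))   # ['a.mp3', 'b.mp3']
print(execute_locally(live_jazz, local))  # ['a.mp3']
```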
8. Searching in unstructured P2P networks
- Basic search technique: local execution of the query, then forwarding if TTL > 0
  - Naïve flooding (all neighbors)
  - Normalized flooding (only $K$ neighbors)
  - Random walks (only one random neighbor, but $W$ walks initiated)
- Problem: only a limited number of peers can be searched (query horizon)
- Possible improvements:
  - Routing indices
  - Summary indexing (Bloom filters etc.)
  - Result caching
- However: still limited scalability and coverage
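The three basic forwarding strategies and the query horizon can be sketched on a toy topology. This is an illustrative simulation under stated assumptions, not the simulator used in the experiments: peers suppress duplicate forwarding, the previous hop is never a target, and the function names and the ring topology are hypothetical.

```python
import random


def flood(graph, start, ttl, k=None, rng=None):
    """Naive flooding (k=None) or normalized flooding (at most k neighbors).

    Returns (peers reached, messages sent). The query moves one hop per TTL
    unit; the previous hop is never a target, and a peer that has already
    seen the query does not forward it again (an assumption).
    """
    rng = rng or random.Random(0)
    reached, messages = {start}, 0
    frontier = [(start, None)]  # (peer holding the query, previous hop)
    for _ in range(ttl):
        next_frontier = []
        for peer, prev in frontier:
            targets = [n for n in graph[peer] if n != prev]
            if k is not None and len(targets) > k:
                targets = rng.sample(targets, k)
            for n in targets:
                messages += 1
                if n not in reached:
                    reached.add(n)
                    next_frontier.append((n, peer))
        frontier = next_frontier
    return reached, messages


def random_walks(graph, start, ttl, walkers, rng):
    """W independent walkers, each forwarding to one random neighbor per hop."""
    reached, messages = {start}, 0
    for _ in range(walkers):
        peer, prev = start, None
        for _ in range(ttl):
            targets = [n for n in graph[peer] if n != prev] or list(graph[peer])
            if not targets:
                break
            prev, peer = peer, rng.choice(targets)
            messages += 1
            reached.add(peer)
    return reached, messages


# Ring of 10 peers: with TTL=2 no strategy gets past distance 2 (query horizon).
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
reached, msgs = flood(ring, 0, ttl=2)
print(sorted(reached), msgs)  # [0, 1, 2, 8, 9] 4
```

Whatever the strategy, peer 5 on this ring stays out of reach at TTL 2, which is the coverage limit that routing indices, summaries and result caching only partly relieve.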
9. Taxonomy caching
- Basic idea:
  - Maintain taxonomic information about remote contents in a taxonomy cache (TCache)
  - Mapping from a taxonomic concept to a set of peers
- Advantages:
  - Cheaper to maintain than a full-text index
  - More applicable to multimedia data
  - More robust w.r.t. changes in contents
- Used to improve query routing
  - → Higher recall and reduced latency
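At its simplest, the TCache described here is a map from category to a set of peers. A minimal sketch (class and method names, peer IDs and category paths are hypothetical) shows that it stores one entry per category and peer, rather than one per indexed term as a full-text index would.

```python
from collections import defaultdict


class TCache:
    """Minimal taxonomy cache: category -> set of remote peers known to hold
    resources in that category (no TTLs or ranking in this sketch)."""

    def __init__(self):
        self._peers = defaultdict(set)

    def insert(self, category, peer):
        self._peers[category].add(peer)

    def lookup(self, category):
        # .get() avoids creating empty entries for categories never seen
        return set(self._peers.get(category, ()))


tc = TCache()
tc.insert("Arts/Music/Jazz", "P17")
tc.insert("Arts/Music/Jazz", "P23")
print(sorted(tc.lookup("Arts/Music/Jazz")))  # ['P17', 'P23']
print(tc.lookup("Science/Physics"))          # set()
```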
10. Query routing using the taxonomy cache (TCache)
- Basis: one of the traditional routing strategies
- Query forward peers: $P_F$
- Starting point: $P_F = P_N = \{P_{N_1}, \ldots, P_{N_n}\}$ (the neighbors)
- Lookup in TCache: $\mathrm{Lookup}(\text{category}) \rightarrow P_C = \{P_{C_1}, \ldots, P_{C_m}\}$
- $P_F = P_N \cup P_C$
- Query forwarded to (a subset of) $P_F$
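The candidate set $P_F = P_N \cup P_C$ can be computed in a few lines. In this sketch the TCache is abstracted as a `lookup` callable, and the function name, peer IDs and category are hypothetical; following slide 11, only the neighbor set excludes the previous hop.

```python
def forward_peers(neighbors, previous, lookup, category):
    """Candidate forward peers P_F = P_N union P_C at the current peer."""
    p_n = {p for p in neighbors if p != previous}  # P_N: neighbors, excl. previous hop
    p_c = set(lookup(category))                    # Lookup(category) -> P_C
    return p_n | p_c


cache = {"Music/Jazz": {"P7", "P9"}}  # hypothetical TCache contents
p_f = forward_peers(["P1", "P2", "P3"], "P1", lambda c: cache.get(c, set()), "Music/Jazz")
print(sorted(p_f))  # ['P2', 'P3', 'P7', 'P9']
```

Which subset of $P_F$ actually receives the query is what the TCA/TCB/TCCN/TCDN variants decide.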
11. Query forwarding alternatives (1)
- Query forward peers: $P_F$
  - Number of neighbors (excl. the previous peer): $N_n$
  - Number of matches from the lookup: $N_c$
- Ranking of peers in $P_C$:
  - Based on the number of resources within a category
  - Peers with a high number of resources are considered experts
- TCB:
  - Highest-ranked peer in $P_C$ + the $N_n$ neighbors in $\{P_{N_1}, \ldots, P_{N_n}\}$
  - Forwarding to a peer in $P_C$ is called a jump
  - A jump can be to a peer beyond the query horizon!
- TCA:
  - If $N_c \geq N_n$: forward to the $N_n$ highest-ranked peers in $P_C$
  - If $N_c < N_n$: forward to all $N_c$ peers in $P_C$ + $(N_n - N_c)$ randomly selected neighbors
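TCB and TCA can be sketched as functions over the neighbor list and a ranked $P_C$. Assumptions beyond the slide: $P_C$ is given as a map from peer to its number of resources in the queried category, ties in the ranking are broken by peer ID, the random fill-up in TCA skips neighbors already chosen from $P_C$, and all names and peer IDs are hypothetical.

```python
import random


def rank(pc):
    """Peers in P_C ordered by number of resources in the category (experts first).
    Ties are broken by peer ID (an assumption)."""
    return sorted(pc, key=lambda p: (-pc[p], p))


def tcb(neighbors, pc):
    """TCB: the N_n neighbors plus the highest-ranked peer in P_C (a jump)."""
    targets = list(neighbors)
    ranked = rank(pc)
    if ranked and ranked[0] not in targets:
        targets.append(ranked[0])
    return targets


def tca(neighbors, pc, rng):
    """TCA: N_n experts from P_C, topped up with random neighbors if N_c < N_n."""
    n_n, n_c = len(neighbors), len(pc)
    ranked = rank(pc)
    if n_c >= n_n:
        return ranked[:n_n]
    # Assumption: the random fill-up skips neighbors already chosen from P_C.
    rest = [p for p in neighbors if p not in pc]
    return ranked + rng.sample(rest, min(n_n - n_c, len(rest)))


neighbors = ["N1", "N2", "N3"]  # previous hop already excluded
pc = {"P7": 12, "P9": 3}        # peer -> resources in the queried category
print(tcb(neighbors, pc))                     # ['N1', 'N2', 'N3', 'P7']
print(tca(neighbors, pc, random.Random(1)))   # P7, P9, then one random neighbor
```

TCB sends one message more than plain flooding per hop, while TCA keeps the fan-out at $N_n$ and trades neighbors for experts.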
12. Query forwarding alternatives (2)
- TCCN:
  - If $N_c \geq N_n$: forward to all $N_c$ peers in $P_C$
  - If $N_c < N_n$: forward to all $N_c$ peers in $P_C$ + $(N_n - N_c)$ neighbors
- TCDN:
  - If $N_c \geq N_n$: forward to the $N_n/2$ highest-ranked peers in $P_C$ + a random selection of $N_n/2$ other peers in $P_C$
  - If $N_c < N_n$: forward to all $N_c$ peers in $P_C$ + $(N_n - N_c)$ neighbors
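TCCN and TCDN (the variant used in the experiments) can be sketched the same way. Assumptions: $P_C$ is a map from peer to resource count, the $(N_n - N_c)$ filler neighbors are picked at random from neighbors not already in $P_C$, and for odd $N_n$ the experts get $\lfloor N_n/2 \rfloor$ slots and the random picks take the rest. Function names and peer IDs are hypothetical.

```python
import random


def rank(pc):
    """Peers in P_C ordered by number of resources in the category (ties by ID)."""
    return sorted(pc, key=lambda p: (-pc[p], p))


def fill_with_neighbors(targets, neighbors, pc, count, rng):
    """Add up to `count` neighbors not already in P_C (random choice: an assumption)."""
    rest = [p for p in neighbors if p not in pc]
    return targets + rng.sample(rest, min(count, len(rest)))


def tccn(neighbors, pc, rng):
    """TCCN: all N_c peers in P_C, plus (N_n - N_c) neighbors if N_c < N_n."""
    n_n, n_c = len(neighbors), len(pc)
    targets = rank(pc)
    if n_c < n_n:
        targets = fill_with_neighbors(targets, neighbors, pc, n_n - n_c, rng)
    return targets


def tcdn(neighbors, pc, rng):
    """TCDN: half experts, half random peers from P_C (or fill-up if N_c < N_n)."""
    n_n, n_c = len(neighbors), len(pc)
    ranked = rank(pc)
    if n_c >= n_n:
        half = n_n // 2  # assumption: floor(N_n/2) experts, random picks for the rest
        return ranked[:half] + rng.sample(ranked[half:], n_n - half)
    return fill_with_neighbors(ranked, neighbors, pc, n_n - n_c, rng)


neighbors = ["N1", "N2", "N3", "N4"]
pc = {"A": 5, "B": 9, "C": 1, "D": 7, "E": 2, "F": 4}
print(tcdn(neighbors, pc, random.Random(3)))  # starts with 'B', 'D', then 2 random peers from P_C
print(tccn(neighbors, pc, random.Random(3)))  # ['B', 'D', 'A', 'F', 'E', 'C']
```

The random half in TCDN spreads queries over non-expert peers in $P_C$ as well, so their cache entries also get exercised, while TCCN may exceed a fan-out of $N_n$ when many peers match.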
13. Distributing taxonomic information
- Basic mechanism: piggyback the matching category on the query result
  - The result is returned along the original path, possibly involving jumps
  - This makes revalidation of contents in intermediate TCaches possible
  - Coverage is gradually extended (beyond the query horizon)
- Lazy distribution by gossiping is also possible
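Piggybacking can be sketched as the result travelling back over the recorded query path, with every peer on the way caching the (category, answering peer) pair. Assumptions: the path is recorded in the query message, jumps appear in it as ordinary hops, and the function name, peer IDs and category are hypothetical.

```python
from collections import defaultdict


def return_result(path, category, answering_peer, tcaches):
    """Send a result back along the query's original path (reverse order).

    `path` lists the peers the query visited, from the querying peer up to the
    peer just before `answering_peer`. Each peer on the way caches
    category -> answering_peer (re-inserting an existing pair acts as revalidation).
    """
    for peer in reversed(path):
        tcaches[peer][category].add(answering_peer)


tcaches = defaultdict(lambda: defaultdict(set))  # peer -> its own TCache
return_result(["Q", "A", "B"], "Music/Jazz", "P42", tcaches)
print(sorted(tcaches["Q"]["Music/Jazz"]))  # ['P42']
```

If P42 was reached by a jump, Q's cache now points to a peer that plain flooding from Q might never reach, which is how coverage grows past the query horizon.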
14. TCache architecture and maintenance
- Aim: provide an efficient mapping $C \rightarrow \{P_{C_1}, \ldots, P_{C_m}\}$
- For each category: peers, number of resources, and TTL
- TTL:
  - Regularly decremented
  - Reset to its start value at revalidation
- Caching policy: aggressive vs. selective
- Compacting techniques: peer upgrade, non-expert pruning
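The per-category entries and TTL maintenance can be sketched as follows. Assumptions: entries are evicted when their TTL reaches 0, the start TTL value is arbitrary, and "non-expert pruning" is modelled as keeping only the top-ranked peers of a category; all names are hypothetical, and "peer upgrade" is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    resources: int  # number of resources the peer holds in the category
    ttl: int


@dataclass
class TCache:
    start_ttl: int = 5  # hypothetical start value
    entries: dict = field(default_factory=dict)  # category -> {peer: Entry}

    def revalidate(self, category, peer, resources):
        """Insert or refresh an entry; revalidation resets the TTL to its start value."""
        self.entries.setdefault(category, {})[peer] = Entry(resources, self.start_ttl)

    def tick(self):
        """Regular decrement of all TTLs; expired entries are evicted (assumption)."""
        for category in list(self.entries):
            peers = self.entries[category]
            for peer in list(peers):
                peers[peer].ttl -= 1
                if peers[peer].ttl <= 0:
                    del peers[peer]
            if not peers:
                del self.entries[category]

    def lookup(self, category):
        """Mapping C -> {P_C1, ..., P_Cm}, with resource counts for ranking."""
        return {p: e.resources for p, e in self.entries.get(category, {}).items()}

    def prune_non_experts(self, category, keep):
        """Keep only the `keep` peers with most resources (one reading of pruning)."""
        peers = self.entries.get(category, {})
        for p in sorted(peers, key=lambda p: -peers[p].resources)[keep:]:
            del peers[p]


tc = TCache(start_ttl=2)
tc.revalidate("Jazz", "P1", 10)
tc.revalidate("Jazz", "P2", 1)
tc.tick()
tc.revalidate("Jazz", "P1", 12)  # P1 seen again in a result: TTL reset
tc.tick()                        # P2 expires, P1 survives
print(tc.lookup("Jazz"))         # {'P1': 12}
```

Entries that keep showing up in query results stay alive, while peers whose contents are no longer confirmed age out without any explicit invalidation messages.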
15. Experimental setup
- Simulations:
  - Excerpts of the DMOZ taxonomy
  - Synthetic network topologies
  - Resource allocation: 80/20 rule
  - Queries are taxonomic categories
  - A number of peers act as querying peers
- Measured: contacted peers, messages, recall, and latency
- In this presentation: results using flooding and TCDN query routing
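The slide only names the 80/20 rule. One common reading is that about 20% of the peers hold about 80% of the resources; the sketch below generates such an allocation under that assumption. It is not the authors' generator, and the function name and parameters are hypothetical.

```python
import random


def allocate_80_20(num_peers, num_resources, rng):
    """Assign resources so that about 20% of peers hold about 80% of them."""
    peers = list(range(num_peers))
    rng.shuffle(peers)
    n_hot = max(1, num_peers // 5)  # the "20%" of peers
    hot, cold = peers[:n_hot], peers[n_hot:] or peers[:n_hot]
    counts = dict.fromkeys(range(num_peers), 0)
    for _ in range(num_resources):
        pool = hot if rng.random() < 0.8 else cold  # "80%" of resources go to hot peers
        counts[rng.choice(pool)] += 1
    return counts


counts = allocate_80_20(100, 10_000, random.Random(42))
top20 = sorted(counts.values(), reverse=True)[:20]
print(sum(top20) / 10_000)  # close to 0.8
```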
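16. Improvements in recall
NM = number of messages; F = flooding; TC = TCDN routing with TCache

| TTL | NM (F) | NM (TC) | Recall (F) | Recall (TC) |
|-----|--------|---------|------------|-------------|
| 1 | 7.8 | 7.0 | 0.0022 | 0.0019 |
| 3 | 166.7 | 166.0 | 0.0117 | 0.0149 |
| 5 | 524.7 | 523.9 | 0.0282 | 0.0717 |
| 7 | 1058.6 | 1057.7 | 0.0506 | 0.1835 |
| 9 | 1721.0 | 1719.6 | 0.0773 | 0.2930 |
| 11 | 2566.3 | 2566.0 | 0.1104 | 0.4012 |
| 13 | 3536.5 | 3535.8 | 0.1477 | 0.4891 |
| 15 | 4560.2 | 4558.7 | 0.1864 | 0.5755 |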
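17. Primary reason for improvement: more intelligent query forwarding
NC = number of contacted peers; F = flooding; TC = TCDN routing with TCache

| TTL | NC (F) | NC (TC) | Recall (F) | Recall (TC) |
|-----|--------|---------|------------|-------------|
| 1 | 7.8 | 6.7 | 0.0022 | 0.0019 |
| 3 | 45.3 | 53.4 | 0.0117 | 0.0149 |
| 5 | 110.6 | 158.0 | 0.0282 | 0.0717 |
| 7 | 199.9 | 346.8 | 0.0506 | 0.1835 |
| 9 | 305.6 | 583.1 | 0.0773 | 0.2930 |
| 11 | 437.7 | 840.3 | 0.1104 | 0.4012 |
| 13 | 586.7 | 1120.6 | 0.1477 | 0.4891 |
| 15 | 741.6 | 1372.4 | 0.1864 | 0.5755 |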
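The two tables can be combined into TC/F ratios per TTL, which makes the argument of this slide explicit: messages stay essentially equal, while contacted peers and recall grow. The script only recomputes the numbers shown in the tables; `ROWS` and `ratios` are names introduced for this illustration.

```python
# (TTL, NM flooding, NM TC, NC flooding, NC TC, recall flooding, recall TC)
ROWS = [
    (1, 7.8, 7.0, 7.8, 6.7, 0.0022, 0.0019),
    (3, 166.7, 166.0, 45.3, 53.4, 0.0117, 0.0149),
    (5, 524.7, 523.9, 110.6, 158.0, 0.0282, 0.0717),
    (7, 1058.6, 1057.7, 199.9, 346.8, 0.0506, 0.1835),
    (9, 1721.0, 1719.6, 305.6, 583.1, 0.0773, 0.2930),
    (11, 2566.3, 2566.0, 437.7, 840.3, 0.1104, 0.4012),
    (13, 3536.5, 3535.8, 586.7, 1120.6, 0.1477, 0.4891),
    (15, 4560.2, 4558.7, 741.6, 1372.4, 0.1864, 0.5755),
]


def ratios(rows):
    """TC/F ratio per TTL for (recall, contacted peers, messages)."""
    return {ttl: (r_tc / r_f, nc_tc / nc_f, nm_tc / nm_f)
            for ttl, nm_f, nm_tc, nc_f, nc_tc, r_f, r_tc in rows}


for ttl, (rec, nc, nm) in ratios(ROWS).items():
    print(f"TTL={ttl:2d}  recall x{rec:.2f}  contacted peers x{nc:.2f}  messages x{nm:.3f}")
```

At TTL 15, for example, recall is about 3.09 times higher and about 1.85 times as many peers are contacted, with slightly fewer messages. At TTL 1 the TCache variant is marginally lower, since there is little room to exploit cached knowledge within a single hop.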
18. Improvement and scalability
19. Latency reduction
- TCache results in very fast retrieval of the first results
- Finding all results: approximately similar performance, because both techniques are based on flooding
20. Summary and further work
- Presented motivation and context
- Taxonomy-based querying and query routing
- TCache architecture and maintenance
- Experimental results supporting our claims
- Future/ongoing work:
  - Employing the techniques for XML/XPath querying in a P2P context (to appear at IEEE P2P 2006)
  - Integration of different taxonomies