Title: CBIR in P2P Systems and Percolation Search
1CBIR in P2P Systems and Percolation Search
2References
- Comparison of Image Similarity Queries in P2P Systems. UCLA. P2P06.
- PRISM: Indexing Multi-Dimensional Data in P2P Networks Using Reference Vectors. UCSB. ACM MM05.
- Percolation Search in Power Law Networks: Making Unstructured Peer-To-Peer Networks Scalable. UCLA. P2P04. Best Paper!
3Architecture
Full Text Search
Image Retrieval
P2P infrastructure
Unstructured P2P
Structured P2P
K Random Walk
Bloom filter
Super Peer
Percolation
PRISM
4Image Retrieval
5Why apply (CB)IR to P2P?
- The recent success of image blogs (Web 2.0) and sharing sites like flickr.com and del.icio.us shows that people want to share and publish their images.
- In the absence of an efficient algorithm for sophisticated image-feature operations, P2P may be the best one can do.
6Annotation vs CBIR
- Annotation
- Subjective estimation
- No uniform schema
- Simple
- Reduces to text search
- CBIR: extract feature vectors from images and capture similarity in a multi-dimensional space
- Captures image statistics
- Computationally costly
- Semantic gap
7Adopted Scheme
- Integrate the two schemes above.
- Searching by annotation is no different from searching by text (the latter was covered in my last presentation).
- So the question we need to address is how to apply CBIR to P2P.
- Feature vector for CBIR
- 166-D HSV histogram
- Similarity and matching strategy
- Euclidean metric
- K-NN
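The matching strategy on this slide (Euclidean metric plus K-NN) can be sketched as follows. This is a minimal illustration, not the implementation from the referenced papers: the function names and the toy 3-bin histograms (standing in for 166-bin HSV histograms) are assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(query, database, k):
    """Return the k (distance, image_id) pairs closest to the query vector."""
    scored = [(euclidean(query, vec), image_id) for image_id, vec in database.items()]
    return sorted(scored)[:k]

# Toy 3-bin histograms; a real system would use 166 bins per image.
database = {
    "sunset.jpg": [0.8, 0.1, 0.1],
    "forest.jpg": [0.1, 0.8, 0.1],
    "beach.jpg": [0.6, 0.2, 0.2],
}
nearest = knn([0.75, 0.15, 0.1], database, k=2)
nearest_ids = [image_id for _, image_id in nearest]
```

A linear scan like this is only practical on a single peer; the rest of the talk is about how to distribute the same comparison over a P2P network.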
8CBIR vs Text Search
- CBIR
- Images are high-dimensional, so feature extraction is very costly.
- Indexed by statistical feature vector.
- Matched by similarity or distance.
- Full Text Search
- Indexed by inverted file.
- Matched by merging results (though the VSM is introduced to compare by similarity).
Mapping an inverted list to a Chord key is straightforward, because the term is a good representative of the whole list. For a CBIR feature vector, such a mapping is not so straightforward.
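The contrast can be made concrete with a small sketch. It assumes SHA-1 as the hash that produces Chord identifiers; the helper names, the lowercasing of terms, and the naive vector serialization are hypothetical choices for illustration only.

```python
import hashlib

M_BITS = 160  # size of a SHA-1 digest in bits (assumed identifier length)

def chord_key(data: bytes) -> int:
    """Map arbitrary bytes onto the identifier ring [0, 2**M_BITS)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

def term_key(term: str) -> int:
    """Key for an inverted list: the term itself names the whole list."""
    return chord_key(term.lower().encode("utf-8"))

def naive_vector_key(vec) -> int:
    """Naive key for a feature vector: hash its serialized components."""
    return chord_key(",".join(f"{x:.4f}" for x in vec).encode("utf-8"))

same_term = term_key("Sunset") == term_key("sunset")
a = naive_vector_key([0.80, 0.10, 0.10])
b = naive_vector_key([0.80, 0.10, 0.11])  # almost the same image
```

Equal terms always land on the same key, so exact-match lookup works for text. The two nearly identical vectors get unrelated keys, which is why a plain hash cannot support similarity search and a content-aware mapping is needed.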
9Adapt CBIR onto DHT
How can a key represent a feature vector with respect to its content?
- Requirements
- Index distribution should be based on the same distance function as retrieval.
- Indices of similar objects should be stored at the same peer.
- It should be possible to identify, by key, the peers that are likely to store the indices of objects similar to a given query.
- PRISM
10PRISM
11Vector → (PRISM) → Key
- Compute the distance to each reference vector, yielding a series of numbers.
- Sort them.
- Obtain the index of each element in the sorted distance list.
- With a threshold k, take the top k indices and generate several pairs.
- Compute a key from each pair.
12View from an example
- A vector $X$ and several preassigned reference vectors $r_1, r_2, \ldots, r_n$
- Compute the distance between $X$ and $r_i$ as $d_i = d(X, r_i)$
- Sort the array $d_1, d_2, \ldots, d_n$ in ascending order to get $d_{l_1}, d_{l_2}, \ldots, d_{l_n}$
- Obtain the index array with threshold $K$: $l_1, l_2, \ldots, l_k$
- Pairs: $(l_1, l_2), (l_1, l_3), \ldots, (l_2, l_3), \ldots$
- Compute a key for every pair $(l_i, l_j)$
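The steps of this example can be sketched in a few lines. The slide does not say how a pair is turned into a key, so `pair_key` below is an assumption (a SHA-1 hash of the ordered pair); the function names and the 2-D toy vectors are also hypothetical. Indices are 0-based here, whereas the slide counts from 1.

```python
import hashlib
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def prism_pairs(x, references, k):
    """Rank reference vectors by distance to x and pair up the k closest.

    Pairs keep rank order, so (i, j) means reference i is closer than j.
    """
    distances = [euclidean(x, r) for r in references]
    ranked = sorted(range(len(references)), key=lambda i: distances[i])
    return list(combinations(ranked[:k], 2))

def pair_key(pair):
    """ASSUMPTION: derive the DHT key by hashing the ordered index pair."""
    i, j = pair
    return int.from_bytes(hashlib.sha1(f"{i},{j}".encode("ascii")).digest(), "big")

references = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [0.9, 0.2]
pairs = prism_pairs(x, references, k=3)
keys = [pair_key(p) for p in pairs]
```

For this `x` the ranking is reference 1, then 3, then 0, so the pairs are (1, 3), (1, 0) and (3, 0), matching the $(l_1, l_2), (l_1, l_3), (l_2, l_3)$ pattern on the slide.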
13How to publish and search with PRISM?
- One key match offers a chance to compare the similarity.
- Semantically, one key indicates which of the two reference vectors is closer to the feature vector.
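One way to read this slide is that an image is published under each of its pair keys, and a query fetches the candidates stored under its own pair keys before comparing them by distance. The sketch below follows that reading; the `ToyIndex` class is a hypothetical in-memory stand-in for the DHT, and it uses the ordered pair itself as the bucket key rather than a hashed value.

```python
import math
from collections import defaultdict
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def prism_pairs(x, references, k):
    """Ordered pairs of the k reference indices closest to x."""
    ranked = sorted(range(len(references)), key=lambda i: euclidean(x, references[i]))
    return list(combinations(ranked[:k], 2))

class ToyIndex:
    """In-memory stand-in for a DHT: one bucket per ordered pair."""

    def __init__(self, references, k):
        self.references = references
        self.k = k
        self.buckets = defaultdict(dict)

    def publish(self, image_id, vec):
        # Store the vector under every pair key it generates.
        for pair in prism_pairs(vec, self.references, self.k):
            self.buckets[pair][image_id] = vec

    def search(self, query, top_n):
        # A key match only yields candidates; similarity is compared afterwards.
        candidates = {}
        for pair in prism_pairs(query, self.references, self.k):
            candidates.update(self.buckets.get(pair, {}))
        scored = sorted((euclidean(query, v), i) for i, v in candidates.items())
        return [image_id for _, image_id in scored[:top_n]]

references = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
index = ToyIndex(references, k=3)
index.publish("a", [0.9, 0.2])
index.publish("b", [0.8, 0.3])
index.publish("c", [0.1, 0.8])
hits = index.search([0.88, 0.22], top_n=2)
```

Image "c" shares no ordered pair with the query, so it is never fetched; "a" and "b" are fetched and then ranked by their actual distance.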
14The measurement of experiment
- $B_{ave} = R N E_{traversed} z / N$
- $B_{max} = R z N E_{out}$
- $D_{ave} = C z N_{copy} / N_{store}$
- $D_{max}$
- $P_{ave} = R N f D_{ave} / z$
- $P_{max} = R N f D_{max} / z$
15The measurement of PRISM without load balancing
recall
- $B_{ave} = 16.1$ bps
- $B_{max} = 100$ kbps
- $D_{ave} = 176$ KB
- $D_{max} = 840$ MB
- $P_{ave} = 670$ kFlOp/s
- $P_{max} = 55$ GFlOp/s
server/client fashion
16Apply CBIR onto Unstructured P2P
17Percolation Search in Power Law Networks
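- Based on the power-law graph
- $P(k) = A k^{-\tau}$
- Corollaries ($\tau = 2$)
- $\langle k \rangle = A \ln k_{max}$, $\langle k^2 \rangle = A k_{max}$
- 3 steps
- Content list implantation: short random walk
- Query implantation: short random walk
- Bond percolation: probabilistic broadcast scheme
- $q \approx q_c \approx \langle k \rangle / \langle k^2 \rangle = \ln k_{max} / k_{max}$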
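The third step, the probabilistic broadcast, can be sketched as a breadth-first traversal in which each edge forwards the query with probability q. This is an illustrative sketch, not the algorithm as specified in the percolation search paper: the function names and the small star-shaped graph are assumptions, and the two random-walk steps are omitted.

```python
import random
from collections import deque

def percolation_threshold(graph):
    """Estimate q_c as <k> / <k^2> from the degree sequence of the graph."""
    degrees = [len(neighbors) for neighbors in graph.values()]
    mean_k = sum(degrees) / len(degrees)
    mean_k2 = sum(k * k for k in degrees) / len(degrees)
    return mean_k / mean_k2

def percolation_broadcast(graph, source, q, rng):
    """Forward a query over each edge independently with probability q.

    Returns the set of nodes the query reaches, including the source.
    """
    reached = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in reached and rng.random() < q:
                reached.add(neighbor)
                queue.append(neighbor)
    return reached

# Toy graph: one high-degree hub with four leaves.
graph = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"],
}
rng = random.Random(42)
q_c = percolation_threshold(graph)
everyone = percolation_broadcast(graph, "a", 1.0, rng)
nobody_else = percolation_broadcast(graph, "a", 0.0, rng)
```

With q = 1 this degenerates to flooding and with q = 0 the query never leaves its origin; percolation search sits in between, choosing q near the threshold so that the high-degree nodes are still reached while most edges stay silent.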
18Percolation search
19Why $q \approx \langle k \rangle / \langle k^2 \rangle$? An experimental perspective
[Figure: square-lattice samples at p = 0.6 and p = 0.5]
- A square lattice in which each site is occupied independently with probability p and unoccupied with probability 1 - p.
- A cluster is a complete collection of interconnected occupied sites.
- Threshold concentration: $p_c = 0.5927$ (2D square lattice, site percolation)
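The lattice experiment described above is easy to reproduce. The sketch below occupies sites at random and measures the largest cluster of 4-connected occupied sites; the function names, lattice size, and seed are assumptions chosen for illustration.

```python
import random
from collections import deque

def occupy(size, p, rng):
    """Build a size x size lattice; each site is occupied with probability p."""
    return [[rng.random() < p for _ in range(size)] for _ in range(size)]

def largest_cluster(grid):
    """Size of the largest cluster of 4-connected occupied sites."""
    size = len(grid)
    seen = [[False] * size for _ in range(size)]
    best = 0
    for r in range(size):
        for c in range(size):
            if not grid[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one cluster.
            seen[r][c] = True
            queue = deque([(r, c)])
            count = 0
            while queue:
                y, x = queue.popleft()
                count += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < size and 0 <= nx < size and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, count)
    return best

rng = random.Random(0)
for p in (0.5, 0.6):
    grid = occupy(100, p, rng)
    print(p, largest_cluster(grid) / (100 * 100))
```

Running it for p just below and just above 0.5927 should show the largest cluster jumping from a small fraction of the lattice to a large one, which is the behavior the percolation broadcast relies on.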
20The measurement
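- $B_{ave} = R N E q z / N = R z N A \ln^2 k_{max} / (2 k_{max}) = 560$ bps
- $B_{max} = R z N q k_{max} = R z N \ln k_{max} = 2700$ kbps
- $D_{ave} = C z N \log_2 N / N = C z \log_2 N = 300$ KB
- $D_{max} = 52$ MB
- $P_{ave} = R N f D_{ave} / z = 7.9$ MFlOp/s
- $P_{max} = R N f D_{max} / z = 1.4$ GFlOp/s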
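The simplifications in the two bandwidth lines follow from the corollaries for $\tau = 2$ on the percolation search slide. The derivation below is a sketch that assumes the edge count of an undirected graph, $E = N \langle k \rangle / 2$, and a broadcast probability set to the threshold value $q = \ln k_{max} / k_{max}$; neither assumption is stated explicitly on the slides.

```latex
\begin{align*}
E &= \tfrac{1}{2} N \langle k \rangle = \tfrac{1}{2} N A \ln k_{max},
  \qquad q = \frac{\ln k_{max}}{k_{max}} \\
E q &= \frac{N A \ln^2 k_{max}}{2 k_{max}} \\
B_{ave} &= \frac{R N E q z}{N} = R z E q
  = \frac{R z N A \ln^2 k_{max}}{2 k_{max}} \\
B_{max} &= R z N q k_{max} = R z N \ln k_{max}
\end{align*}
```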
21Advantage of percolation search
- Scalable
- Searches all resources instead of only part of them
- Low time and bandwidth consumption
- When $\tau = 2$, time is strictly $O(\log N)$!
22Super Peer
23Super peer architecture
- Resources reside in leaf peers; the search index resides in super peers.
- Widely deployed
24Super node P2P measurement
- $B_{ave} = R N (sN + 1) z / N \approx R z s N \approx R z N^{1/2} = 560$ bps
- $B_{max} = R z N = 31.2$ Mbps
- DaveCzN1/N Cz/s 16KB
- $D_{max} = C z / s \approx C z N^{1/2} = 11$ MB
- $P_{ave} = R N f D_{ave} / z = 420$ kFlOp/s
- $P_{max} = R N f D_{max} / z \approx R C f N N^{1/2} = 300$ MFlOp/s
- We set $s = N^{-1/2}$
25Conclusion(1)
- The performance of structured and unstructured systems seems to be pretty close in our application domain.
- Unstructured systems have the advantage of being more flexible with respect to the queries they allow.
- I should mention that this conclusion is similar to the one in my recent presentation.
26Conclusion(2)
- CBIR over P2P
- Structured P2P
- Devise a mechanism to convert a vector to a content-based key (the performance is unsatisfactory).
- Unstructured P2P
- Feasible.
- The content-based index of an image is not structured!
27Q&A
28Thank you! Richard Tang, Nov. 2006