Integrating Semantics-Based Access Mechanisms with P2P File Systems

1
Integrating Semantics-Based Access Mechanisms
with P2P File Systems

Yingwu Zhu, Honghao Wang and Yiming Hu

2
Outline

Background
System Design
Related Work
Conclusions and Future Work

3
Background

Current P2P file systems (e.g., CFS and PAST)
Layer FS functionalities on a distributed hash
table (DHT), e.g., Chord, Pastry
Do not support semantics-based access
Because DHTs support only exact-match lookups

4
Background
Layer    Responsibility
FS       Stores/retrieves file objects into/from the DHT; presents a file system interface to applications/users
DHT      Supports a hash-table interface of get(fileID) and put(fileID, file)
Software layering in a P2P file system
5
Motivation

A problem of P2P file systems
Support only exact-match lookups given a file
object identifier fileID
get(fileID) retrieves the file corresponding to
the fileID
put(fileID, file) stores the file with the
fileID as a DHT key
Extending exact-match lookups to semantic access
is non-trivial
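
The exact-match limitation can be illustrated with a minimal sketch. The `ToyDHT` class and the `make_file_id` helper are hypothetical stand-ins, not the CFS or PAST interfaces: a single dictionary plays the role of all peers, and the fileID is assumed here to be a SHA-1 digest of the content. Two nearly identical files then receive unrelated keys, so get() on one says nothing about the other.

```python
import hashlib


class ToyDHT:
    """In-memory stand-in for a DHT: one dict plays the role of all peers."""

    def __init__(self):
        self._store = {}

    def put(self, file_id, data):
        # Store the file under its identifier, as in put(fileID, file).
        self._store[file_id] = data

    def get(self, file_id):
        # Exact-match only: returns None unless file_id equals a stored key.
        return self._store.get(file_id)


def make_file_id(content: bytes) -> str:
    """Hypothetical fileID: SHA-1 digest of the file content."""
    return hashlib.sha1(content).hexdigest()


dht = ToyDHT()
doc = b"peer-to-peer file systems layered on a distributed hash table"
near_copy = doc + b"s"  # differs from doc by a single byte

dht.put(make_file_id(doc), doc)
found = dht.get(make_file_id(doc))         # exact key: the file comes back
missed = dht.get(make_file_id(near_copy))  # similar content, different key
```

Because the hash scatters similar content across the key space, similarity search needs keys that are derived differently, which is what the semantic identifiers later in the deck are for.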

6
Motivation

A challenge to P2P file systems
Provide convenient access to vast amounts of
information
E.g., provide semantics-based search capabilities
to efficiently locate semantically close files
for browsing and purging, etc.

7
System Design

Targeted Application
System Architecture
Semantic Indexing and Locating
Evaluation

8
Targeted Application

Semantic search is expressed in natural language.
Query: "locate files similar to f1"
The query results are materialized via semantic
directories
Not a simple keyword match such as "locate files with k1,
k2 and k3", where k1, k2 and k3 are three distinct
keywords

9
System Architecture

Extends a P2P file system to support
semantics-based access
Major Components
Semantic Extractor Registry
Semantic Indexing and Locating Utility

10
System Architecture
Application/User
FS
Extractor Registry
Semantic Indexing and Locating Utility
DHT
Major components of the system architecture
11
Semantic Extractor Registry

A set of semantic extractors
Leverage IR algorithms such as VSM and LSI
Represent a file as a semantic vector (SV),
typically 200-300 keywords
Semantically close files have similar SVs
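
As a rough illustration of a semantic vector, the sketch below builds a plain term-frequency vector over a file's most frequent keywords and compares vectors by cosine similarity. This is a simplification under stated assumptions: the stopword list, the `semantic_vector` and `cosine` helpers, and the use of raw term frequency (rather than a full VSM weighting or LSI) are illustrative choices, not the extractors described on the slide.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = frozenset({"a", "an", "and", "in", "is", "of", "on", "the", "to"})


def semantic_vector(text, max_terms=300):
    """Return {keyword: weight} for the max_terms most frequent keywords."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    top = Counter(words).most_common(max_terms)
    # Normalize to unit length so that cosine similarity is a plain dot product.
    norm = math.sqrt(sum(count * count for _, count in top)) or 1.0
    return {word: count / norm for word, count in top}


def cosine(sv1, sv2):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(weight * sv2.get(word, 0.0) for word, weight in sv1.items())


f1 = semantic_vector("Peers store files in a distributed hash table")
f2 = semantic_vector("A distributed hash table maps files to peers")
f3 = semantic_vector("Bake the bread until the crust turns golden")

close = cosine(f1, f2)  # the two P2P sentences share most keywords
far = cosine(f1, f3)    # no keywords in common
```

Files on the same topic share keywords and therefore score higher than unrelated files, which is the property the indexing scheme relies on.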

12
Semantic Indexing and Locating Utility

Provides semantics-based indexing and retrieval
capabilities
Relies on the property of Locality Sensitive Hash
Functions (LSH)
Derives a small number of semantic identifiers
(semIDs) from a file's SV as the DHT keys for
indexing and locating

13
Semantic Indexing and Locating Utility

Goals
The indices of semantically close files are
clustered to the same peer nodes with high
probability (nearly 100%)
Efficiently locate semantically close files by
searching a small number of peer nodes (e.g., 20)

14
Locality Sensitive Hashing

A family of hash functions $F$ is locality
sensitive if, for $h \in F$ chosen uniformly at random and any two sets $A$ and $B$,
$\Pr_{h \in F}[h(A) = h(B)] = \mathrm{sim}(A, B)$
Min-wise independent permutations are LSH, with the similarity function
$\mathrm{sim}(A, B) = |A \cap B| / |A \cup B|$
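
The locality-sensitive property can be checked empirically with a small sketch. It assumes a salted SHA-1 hash as a practical stand-in for a min-wise independent permutation (each salt plays the role of one function h); the `min_hash` and `collision_rate` helpers are illustrative names. The fraction of functions for which h(A) equals h(B) should land close to the set similarity |A ∩ B| / |A ∪ B|.

```python
import hashlib


def jaccard(a, b):
    """sim(A, B) = |A intersect B| / |A union B|."""
    return len(a & b) / len(a | b)


def min_hash(salt, items):
    """Minimum salted SHA-1 value over the set; each salt acts as one function h."""
    return min(
        int.from_bytes(hashlib.sha1(f"{salt}:{x}".encode()).digest()[:8], "big")
        for x in items
    )


def collision_rate(a, b, trials=1000):
    """Fraction of hash functions h (one per salt) with h(A) == h(B)."""
    hits = sum(min_hash(t, a) == min_hash(t, b) for t in range(trials))
    return hits / trials


A = set(range(0, 80))
B = set(range(20, 100))
exact = jaccard(A, B)            # 60 shared elements out of 100 in the union
estimate = collision_rate(A, B)  # empirical Pr[h(A) == h(B)]
```

The two hash values agree exactly when the element with the smallest hash in the union lies in the intersection, which is why the collision rate tracks the similarity.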
15
Semantic Indexing

Given a file's SV

Step 1: Derive a small number of semIDs from the
SV using LSH

Step 2: Index the file by using these semIDs
as the DHT keys

16
Semantic Indexing

Using n groups of m hash functions
Results
The indices of semantically close files are hashed
to the same peers with probability $\geq 1-(1-p^m)^n$
$p$ is expected to be high for semantically close
files, and so is the probability
$p = \mathrm{sim}(f_1, f_2)$, the similarity between the two files'
SVs
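
To see how the bound behaves, note that one group of m hash functions yields the same semID for two files when all m functions agree, which happens with probability p^m if the functions are treated as independent. The chance that none of the n groups agrees is then (1 - p^m)^n. The sketch below simply evaluates 1 - (1 - p^m)^n; the function name and the sample values of p, m and n are illustrative assumptions.

```python
def match_probability(p, m, n):
    """Probability that at least one of n semIDs matches: 1 - (1 - p**m)**n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a similarity in [0, 1]")
    return 1.0 - (1.0 - p ** m) ** n


# Print the bound for a few similarity values with m = 5 and n = 20.
for p in (0.5, 0.7, 0.9):
    print(f"p={p:.1f}  m=5, n=20 -> {match_probability(p, 5, 20):.3f}")
```

Raising n pushes the probability up, while raising m makes each group more selective and pushes it down, so the two parameters trade recall against the number of unrelated files that share a key.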

17
Semantic Indexing

Given a file's SV A
proc sem_index (A)
    convert A into A'                  \\ A' is a set of integers, obtained by using SHA-1
    for each gj do                     \\ gj is one of n groups of hash functions
        semIDj = 0
        for each hi in gj do           \\ gj has m hash functions
            semIDj = semIDj ^ hi(A')   \\ ^ is the XOR operation
        endfor
    endfor
    for each semIDj do
        insert the tuple <semIDj, fileID, A> into the DHT
            by having semIDj as the DHT key   \\ semantic indexing
    endfor
endproc
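
A runnable sketch of the indexing procedure follows, under several assumptions: the SV is treated as a plain set of keywords (weights ignored), each hash function is approximated by the minimum of a salted SHA-1 value over the set, values are truncated to 64 bits, and a `defaultdict` stands in for the DHT. The helper names `to_int`, `min_hash` and `sem_ids` are hypothetical.

```python
import hashlib
from collections import defaultdict
from functools import reduce
from operator import xor


def to_int(text):
    """Map a string to a 64-bit integer via SHA-1 (truncated for readability)."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:8], "big")


def min_hash(group, index, items):
    """Hash function number `index` of group `group`: minimum salted hash over the set."""
    return min(to_int(f"{group}:{index}:{x}") for x in items)


def sem_ids(keywords, n=20, m=5):
    """Derive n semIDs; each is the XOR of the m hash values of one group."""
    a_prime = {to_int(k) for k in keywords}  # convert the keyword set into integers
    return [
        reduce(xor, (min_hash(j, i, a_prime) for i in range(m)), 0)
        for j in range(n)
    ]


def sem_index(dht, file_id, keywords, n=20, m=5):
    """Insert the tuple (semID, fileID, SV) under every semID of the file."""
    for sem_id in sem_ids(keywords, n, m):
        dht[sem_id].append((sem_id, file_id, frozenset(keywords)))


dht = defaultdict(list)  # hypothetical stand-in for the DHT: key -> index entries
keywords = {"p2p", "dht", "lookup", "semantic", "index"}
sem_index(dht, "file-1", keywords)
```

In a real deployment each semID would be routed to the peer responsible for that key; the dictionary here only shows which keys a file is published under.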

18
Semantic Locating

Given a query's SV

Step 1: Derive a small number of semIDs from the
SV using LSH

Step 2: Locate those semantically close files by
using these semIDs as the DHT keys

Goal: answer a query by consulting only a small
number of peer nodes
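
The two steps can be sketched end to end as follows. As before, this is an illustration under assumptions rather than the authors' implementation: SVs are plain keyword sets, the hash functions are salted SHA-1 minima truncated to 64 bits, a dictionary stands in for the DHT, and n = 20 and m = 5 are sample parameters. The file names and keyword sets are made up for the example.

```python
import hashlib
from functools import reduce
from operator import xor


def to_int(text):
    """Map a string to a 64-bit integer via SHA-1."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:8], "big")


def sem_ids(keywords, n=20, m=5):
    """n semIDs, each the XOR of m min-hash values over the integer set."""
    a_prime = {to_int(k) for k in keywords}
    return [
        reduce(xor, (min(to_int(f"{j}:{i}:{x}") for x in a_prime) for i in range(m)), 0)
        for j in range(n)
    ]


def sem_index(dht, file_id, keywords):
    """Publish the file under each of its semIDs."""
    for sem_id in sem_ids(keywords):
        dht.setdefault(sem_id, []).append((sem_id, file_id, frozenset(keywords)))


def sem_locate(dht, query_keywords):
    """Step 1: derive the query's semIDs. Step 2: look each one up in the DHT."""
    hits = set()
    for sem_id in sem_ids(query_keywords):
        for _key, file_id, _sv in dht.get(sem_id, []):
            hits.add(file_id)
    return hits


dht = {}  # hypothetical stand-in for the DHT
doc_d = {f"kw{i}" for i in range(20)}
doc_a = (doc_d - {"kw19"}) | {"extra"}        # shares 19 of 20 keywords with D
unrelated = {f"other{i}" for i in range(20)}  # shares nothing with D
for name, kws in (("D", doc_d), ("A", doc_a), ("X", unrelated)):
    sem_index(dht, name, kws)

found = sem_locate(dht, doc_d)  # query: locate files similar to D
```

The query touches at most n keys, which is what keeps the number of consulted peers small. A fuller version would also rank the returned entries by comparing their stored SVs with the query's SV.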

19
Demonstration of Semantic Indexing and Locating
[Figure: files A, B, C and D, peer nodes, User1 and User2]
A, B, C and D are semantically close files
Query: "locate files similar to D"
20
Evaluation

Load distribution of semantic indexing
Semantic indices per peer node
Performance of semantic locating
Percentage of semantically close files that can
be located (Recall)

21
Semantic Indexing
[Figure: number of file indexes per node vs. number of peer nodes]
Load distribution when the system indexes 10,000
files
22
Semantic Indexing
[Figure: number of file indexes per node vs. number of indexed files (x1000)]
Load distribution in a 1000-node system
23
Perf. of Semantic Locating
Recall (%)
         n=5    n=10   n=15   n=20
m=5      84     92     94     96
m=2      94     99     100    100
1 Apply n groups of m hash functions
2 Recall: percentage of files located (128-byte
fingerprint limit as a SV)
3 m and n determine the performance of semantic
locating
24
Related Work

P2P file systems like CFS and PAST
Exact-match lookups in DHTs
Traditional semantic file systems like SFS and
HAC
IR algorithms such as VSM and LSI
LSH and its related applications (e.g., the
nearest neighbor problem, cached data location in
databases)

25
Conclusions