Semantic Network Analysis 11.07.05

1
Semantic Network Analysis 11.07.05
  • Analyzing Semantic Interoperability in
    Bioinformatic Database Networks
  • Philippe Cudré-Mauroux, EPFL
  • Joint work with
  • Julien Gaugaz, Adriana Budura and Karl Aberer

2
Overview
  • Peer Data Management Systems (PDMS)
  • Semantic Interoperability in the Large
  • Generatingfunctionologic framework
  • The Sequence Retrieval System
  • Degree distribution
  • Analysis of giant component
  • Weighted analysis
  • Conclusions

3
Beyond Keyword Search
  • searching semantically richer objects in large
    scale heterogeneous networks

<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
date?
<esDofCreation> 05/08/2004 </esDofCreation>
<myRDFDate> Jan 1, 2005 </myRDFDate>

4
Decentralized Data Integration
  • Large Scale Information Systems (e.g., WWW)
      • Number of sources > 100
      • Unreliable data
      • Autonomy
      • Semi-structured data
          • E.g., XML/RDF
      • No integrity constraints
      • No transactions
      • Simple SP queries
          • E.g., triple patterns, ranking
      • Schemata created by end users
      • Network churn
  • Distributed Databases
      • Number of sources < 100
      • Consistent data
      • Coordination
      • Structured data
          • E.g., Relational data model
      • Integrity constraints
      • Transactions
      • Powerful queries
          • E.g., SQL, aggregation
      • Schemas created by administrators
      • Relatively fixed topology

5
Data Integration LAV/GAV
  • Traditional database techniques (e.g., LAV/GAV)
    rely on centralized schemas to integrate data
    sources
  • Not applicable to our context
  • Scale (upper ontologies?)
  • Churn
  • Autonomy

[Figure: a central Date attribute mapped to the local attributes myDate and yourDate via m(Date)]

6
Semantic Interoperability
Q1: <GUID>p/GUID</GUID> FOR p IN /Photoshop_Image WHERE p/Creator LIKE "Robi"

Photoshop (own schema):
<Photoshop_Image>
  <GUID>178A8CD8865</GUID>
  <Creator>Robinson</Creator>
  <Subject>
    <Bag>
      <Item>Tunbridge Wells</Item>
      <Item>Royal Council</Item>
    </Bag>
  </Subject>
</Photoshop_Image>

WinFS (known schema):
<WinFSImage>
  <GUID>178A8CD8866</GUID>
  <Author>
    <DisplayName>Henry Peach Robinson</DisplayName>
    <Role>Photographer</Role>
  </Author>
  <Keyword>Tunbridge</Keyword>
  <Keyword>Council</Keyword>
</WinFSImage>

T12: <Photoshop_Image> <GUID>fs/GUID</GUID> <Creator>fs/Author/DisplayName</Creator> </Photoshop_Image> FOR fs IN /WinFSImage

Q2: <GUID>p/GUID</GUID> FOR p IN T12 WHERE p/Creator LIKE "Robi"

  • ⇒ Extending semantic interoperability techniques
    to decentralized settings

7
1. Peer Data Management Systems
[Figure: pairwise mapping between escDate and xapCreateDate; labels: weather, article]
  • Pairwise mappings
  • Peer Data Management Systems (PDMS)
  • Local mappings overcome global heterogeneity
  • Iterative query rewriting

8
Semantic Mediation Layer
  • Semantic mediation layer
  • (correlated / uncorrelated)
  • Overlay layer
  • (correlated / uncorrelated)
  • Physical layer

9
Schema-to-Schema Graph
  • Inter-organization of the different schemas used
    by the peers
  • Logical model
  • Directed
  • Weighted
  • Redundant

10
The Semantic Connectivity Graph
  • Definition (Semantic Interoperability)
  • Two peers are said to be semantically
    interoperable if they can forward queries to each
    other in the Schema-to-Schema graph, potentially
    through series of semantic translation links
  • Idea
  • As for physical network analyses, create a
    connectivity layer to account for semantic
    interoperability
  • The semantic connectivity Graph S
  • Unweighted, irreflexive and non-redundant version
    of the Schema-to-Schema graph

11
Observations
  • Theorem
  • Peers in a set $P_s$ are semantically
    interoperable iff $S_s$ is strongly connected, with
    $S_s \equiv \{ s \mid \exists p \in P_s,\ p \leftrightarrow s \}$
  • Observation 1
  • A set of peers $P_s$ cannot be semantically
    interoperable if
  • $|E_s| < |V_s|$
  • Observation 2
  • A set of peers $P_s$ is semantically
    interoperable if
  • $|E_s| > |V_s| (|V_s| - 1) - (|V_s| - 1)$
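
The theorem and the two edge-count observations can be checked on small graphs. The sketch below is illustrative only: it assumes a simple directed graph given as a vertex set and a set of (source, target) pairs with at least two vertices, and the function names are hypothetical. Strong connectivity is tested by a forward and a backward reachability search from one vertex.

```python
from collections import defaultdict


def reachable(start, adjacency):
    """Return the set of nodes reachable from start by following adjacency."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_strongly_connected(vertices, edges):
    """True if every vertex reaches every other vertex along directed edges."""
    vertices = set(vertices)
    if len(vertices) <= 1:
        return True
    forward, backward = defaultdict(set), defaultdict(set)
    for a, b in edges:
        forward[a].add(b)
        backward[b].add(a)
    root = next(iter(vertices))
    return reachable(root, forward) >= vertices and reachable(root, backward) >= vertices


def edge_count_verdict(num_vertices, num_edges):
    """Apply the two edge-count observations (assumes num_vertices >= 2)."""
    if num_edges < num_vertices:
        return "not interoperable"   # too few edges to close a cycle through all vertices
    if num_edges > num_vertices * (num_vertices - 1) - (num_vertices - 1):
        return "interoperable"       # too many edges to leave any vertex cut off
    return "undecided"               # edge counts alone cannot tell


cycle = {("a", "b"), ("b", "c"), ("c", "a")}
print(is_strongly_connected("abc", cycle), edge_count_verdict(3, len(cycle)))
```

Between the two bounds the edge count is inconclusive, which is why the explicit strong-connectivity test is still needed there.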

12
2. Semantic Interoperability in the Large
  • Question
  • How can we analyze semantic interoperability in
    large-scale PDMS?
  • Idea: use percolation theory to detect the
    emergence of a strongly connected component in S
  • Necessary condition for vertex-strong
    connectivity
  • Necessary condition for semantic interoperability

13
The Model
  • Adaptation of a recent graph-theoretic framework
  • Newman, Strogatz, Watts 2001
  • Large-scale semantic graphs as random graphs with
    arbitrary degree distribution
  • Exponentially distributed, small-world,
    scale-free graphs
  • Specificities of our model
  • Strong clustering (clustering coefficient cc)
  • Bidirectionality (bidirectionality coefficient
    bc) (for directed networks)
  • Based on generatingfunctionology
  • Percolation: ci > 0
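
To give a feel for a percolation indicator of this kind, the sketch below computes the classic criterion for undirected random graphs with an arbitrary degree distribution: a giant component emerges when the sum over k of k(k - 2) p_k is positive. This is not the slide's ci, which also accounts for clustering and bidirectionality; it is the simpler undirected special case, and the function name is hypothetical.

```python
from collections import Counter


def undirected_percolation_indicator(degrees):
    """Return sum_k k * (k - 2) * p_k for an undirected degree sequence.

    A positive value indicates the super-critical regime (a giant component
    is expected); a negative value indicates the sub-critical regime.
    """
    counts = Counter(degrees)
    n = len(degrees)
    return sum(k * (k - 2) * count / n for k, count in counts.items())


print(undirected_percolation_indicator([1, 1, 1, 1]))  # isolated pairs: negative
print(undirected_percolation_indicator([3, 3, 3, 3]))  # 3-regular: positive
```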

14
Size of the giant component
  • With u the smallest non-negative solution of
  • And G1 the distribution of edges from first to
    second-order neighbors

15
3. The Sequence Retrieval System (SRS)
  • Commercial information indexing and retrieval
    system
  • Bioinformatic libraries
  • EMBL
  • SwissProt
  • Prosite
  • Etc.
  • Schemas described in a custom language (Icarus)
  • Mappings (links) from one database to others

16
Why is SRS interesting?
  • Applying our heuristics on a real large-scale
    corpus of interconnected databases
  • More than 380 databanks
  • More than 500 (undirected) links
  • Data used by professionals on a daily basis

17
Crawling the SRS schema-to-schema graph
  • Custom crawler
  • As of May 2005
  • (EBI repository)
  • 388 nodes
  • 518 edges
  • Giant connected component: 187 nodes
  • Power-law distribution of node degrees
  • Clustering coefficient: 0.32
  • Diameter: 9
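
The relative size of the giant component follows directly from the crawl figures: 187 / 388 ≈ 0.48. A minimal sketch of how such a fraction could be measured on a crawled edge list is shown below; it assumes undirected links and uses a toy graph, since the crawled data itself is not part of this transcript. The function name is hypothetical.

```python
from collections import defaultdict


def largest_component_fraction(nodes, undirected_edges):
    """Return the share of nodes that lie in the largest connected component."""
    adjacency = defaultdict(set)
    for a, b in undirected_edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    unvisited = set(nodes)
    total = len(unvisited)
    largest = 0
    while unvisited:
        stack = [unvisited.pop()]   # start a new component
        size = 1
        while stack:
            node = stack.pop()
            for nxt in adjacency[node]:
                if nxt in unvisited:
                    unvisited.remove(nxt)
                    stack.append(nxt)
                    size += 1
        largest = max(largest, size)
    return largest / total if total else 0.0


# Toy graph: nodes 0-2 form one component, 3 and 4 are isolated.
print(largest_component_fraction(range(5), [(0, 1), (1, 2)]))
print(round(187 / 388, 2))  # fraction implied by the crawl figures
```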

18
Results
  • Connectivity indicator: ci = 25.4
  • Super-critical state
  • Size of the giant component
  • 0.47 (derived)
  • 0.48 (observed)

19
Graphs with same power-law degree distr.
  • Varying number of edges

20
10x Bigger Graph
21
Analyzing weighted networks
  • Do we have a sufficient number of good mappings?
  • Introducing quality measures from the mappings
  • Weights
  • Attribute / schema level
  • Cf. Chatty Web (WWW03)
  • Semantic query forwarding
  • Per-hop forwarding behaviors
  • Only forward if $w_i > \tau$
  • $\tau = 0$: flooding
  • $\tau = 1$: exact answers
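
The per-hop forwarding rule above (forward a query over a mapping only when its weight exceeds a threshold) can be sketched as a filtered graph traversal. This is an illustration under stated assumptions, not the presented system: the adjacency format, the function name, the peer names, and the weights are all hypothetical.

```python
def peers_reached(start, weighted_links, threshold):
    """Return the peers a query reaches when forwarded only over links
    whose weight is strictly greater than threshold.

    weighted_links: dict mapping a peer to a list of (neighbor, weight) pairs.
    """
    reached = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for neighbor, weight in weighted_links.get(node, []):
            if weight > threshold and neighbor not in reached:
                reached.add(neighbor)
                frontier.append(neighbor)
    return reached


# Hypothetical mapping qualities between four peers.
links = {
    "A": [("B", 0.9), ("C", 0.3)],
    "B": [("D", 0.6)],
    "C": [("D", 1.0)],
}
for threshold in (0.0, 0.5, 1.0):
    print(threshold, sorted(peers_reached("A", links, threshold)))
```

With a threshold of 0 every positively weighted link is used (flooding); raising the threshold prunes low-quality mappings and shrinks the set of peers reached.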

22
Weighted Results
  • Same degree distribution (388 nodes)
  • Uniformly distributed weights between 0 and 1
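
With weights drawn uniformly between 0 and 1, a forwarding threshold t keeps each edge with probability 1 - t, so thresholding behaves like randomly removing a fraction t of the edges. The sketch below checks this by simulation; the function name, the seed, and the edge count are arbitrary choices of this illustration.

```python
import random


def kept_edge_fraction(num_edges, threshold, seed=0):
    """Draw uniform weights in [0, 1) and return the share strictly above threshold."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    weights = [rng.random() for _ in range(num_edges)]
    kept = sum(1 for w in weights if w > threshold)
    return kept / num_edges


print(kept_edge_fraction(20_000, 0.3))  # expected to be close to 0.7
```

For the 518 crawled edges, a threshold of 0.5 would therefore leave about 259 usable edges in expectation.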

23
4. Conclusions
  • Analyzing a real network of bioinformatic
    databases
  • Accurate results (even for relatively small
    networks)
  • Weighted / unweighted
  • Current work
  • Compositions of weights along a path
  • Semantic random walkers
  • Public domain simulator
  • Future work
  • Analyzing other forwarding behaviors
  • Implementation in a real PDMS (self-organizing
    mappings)
  • GridVine

24
References
  • A Necessary Condition for Semantic Interoperability in the Large.
    Philippe Cudré-Mauroux and Karl Aberer. ODBASE 2004.
  • GridVine: Building Internet-Scale Semantic Overlay Networks.
    Karl Aberer, Philippe Cudré-Mauroux and Tim van Pelt. ISWC 2004.
  • Semantic Overlay Networks (Tutorial).
    Karl Aberer and Philippe Cudré-Mauroux. VLDB 2005.
  • Complete reference list at http://lsirpeople.epfl.ch/pcudre/

25
Thank you for your attention
  • Questions?