Title: Semantic Network Analysis 11.07.05
1 Semantic Network Analysis 11.07.05
- Analyzing Semantic Interoperability in Bioinformatic Database Networks
- Philippe Cudré-Mauroux, EPFL
- Joint work with
- Julien Gaugaz, Adriana Budura and Karl Aberer
2 Overview
- Peer Data Management Systems (PDMS)
- Semantic Interoperability in the Large
- Generatingfunctionologic framework
- The Sequence Retrieval System
- Degree distribution
- Analysis of giant component
- Weighted analysis
- Conclusions
3 Beyond Keyword Search
- Searching semantically richer objects in large-scale heterogeneous networks
<xapCreateDate>2001-12-19T18:49:03Z</xapCreateDate> <xapModifyDate>2001-12-19T20:09:28Z</xapModifyDate>
date?
<esDofCreation> 05/08/2004 </esDofCreation>
<myRDFDate> Jan 1, 2005 </myRDFDate>
4 Decentralized Data Integration
- Large Scale Information Systems (e.g., WWW)
- Number of sources > 100
- Unreliable data
- Autonomy
- Semi-structured data
- E.g., XML/RDF
- No integrity constraints
- No transactions
- Simple SP queries
- E.g., triple patterns, ranking
- Schemata created by end users
- Network churn
- Distributed Databases
- Number of sources < 100
- Consistent data
- Coordination
- Structured data
- E.g., Relational data model
- Integrity constraints
- Transactions
- Powerful queries
- E.g., SQL, aggregation
- Schemas created by administrators
- Relatively fixed topology
5 Data Integration LAV/GAV
- Traditional database techniques (e.g., LAV/GAV, i.e. Local-as-View / Global-as-View) rely on centralized schemas to integrate data sources
- Not applicable to our context
- Scale (upper ontologies?)
- Churn
- Autonomy
[Figure: a central Date attribute linked by the mappings m(Date) to the local attributes myDate and yourDate]
6 Semantic Interoperability
Photoshop (own schema)
<Photoshop_Image>
  <GUID>178A8CD8865</GUID>
  <Creator>Robinson</Creator>
  <Subject> <Bag>
    <Item>Tunbridge Wells</Item>
    <Item>Royal Council</Item>
  </Bag> </Subject>
</Photoshop_Image>
Q1: <GUID>p/GUID</GUID> FOR p IN /Photoshop_Image WHERE p/Creator LIKE "Robi"
WinFS (known schema)
<WinFSImage>
  <GUID>178A8CD8866</GUID>
  <Author>
    <DisplayName>Henry Peach Robinson</DisplayName>
    <Role>Photographer</Role>
  </Author>
  <Keyword>Tunbridge</Keyword>
  <Keyword>Council</Keyword>
</WinFSImage>
T12: <Photoshop_Image> <GUID>fs/GUID</GUID> <Creator>fs/Author/DisplayName</Creator> </Photoshop_Image> FOR fs IN /WinFSImage
Q2: <GUID>p/GUID</GUID> FOR p IN T12 WHERE p/Creator LIKE "Robi"
- → Extending semantic interoperability techniques to decentralized settings
7 1. Peer Data Management Systems
[Figure: peers (labels: weather, article) linked by pairwise schema mappings, e.g. escDate to xapCreateDate]
- Pairwise mappings
- Peer Data Management Systems (PDMS)
- Local mappings overcome global heterogeneity
- Iterative query rewriting
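A minimal sketch of iterative query rewriting over pairwise mappings may help make the idea concrete. It assumes a query is just a set of attribute conditions and a mapping is a plain attribute renaming; the schema name "MyRDF" and the attributes "creatorName" and "id" are hypothetical, and real PDMS mappings are far richer than this.

```python
from collections import deque

# Hypothetical pairwise mappings: (source schema, target schema) -> attribute renaming.
MAPPINGS = {
    ("Photoshop", "WinFS"): {"Creator": "Author/DisplayName", "GUID": "GUID"},
    ("WinFS", "MyRDF"): {"Author/DisplayName": "creatorName", "GUID": "id"},
}


def rewrite(query, mapping):
    """Translate a query (attribute -> condition) through one mapping.

    Returns None when some attribute has no counterpart in the target schema.
    """
    translated = {}
    for attribute, condition in query.items():
        if attribute not in mapping:
            return None
        translated[mapping[attribute]] = condition
    return translated


def propagate(start_schema, query, mappings):
    """Rewrite the query hop by hop, breadth first; returns {schema: rewritten query}."""
    reached = {start_schema: query}
    frontier = deque([start_schema])
    while frontier:
        schema = frontier.popleft()
        for (source, target), mapping in mappings.items():
            if source != schema or target in reached:
                continue
            translated = rewrite(reached[schema], mapping)
            if translated is not None:
                reached[target] = translated
                frontier.append(target)
    return reached


result = propagate("Photoshop", {"Creator": "Robi"}, MAPPINGS)
```

Only local, pairwise mappings are consulted at each hop, yet the query reaches a schema that has no direct mapping from the originating one.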
8 Semantic Mediation Layer
[Figure: three stacked layers, from top to bottom: semantic mediation layer, overlay layer, physical layer; each pair of adjacent layers is marked correlated / uncorrelated]
9 Schema-to-Schema Graph
- Inter-organization of the different schemas used by the peers
- Logical model
- Directed
- Weighted
- Redundant
10 The Semantic Connectivity Graph
- Definition (Semantic Interoperability)
- Two peers are said to be semantically interoperable if they can forward queries to each other in the Schema-to-Schema graph, potentially through a series of semantic translation links
- Idea
- As for physical network analyses, create a connectivity layer to account for semantic interoperability
- The Semantic Connectivity Graph S
- Unweighted, irreflexive and non-redundant version of the Schema-to-Schema graph
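As an illustration, the semantic connectivity graph could be derived from a schema-to-schema graph as sketched below. The edge-list representation, the schema names and the weights are assumptions made for the example.

```python
# Schema-to-schema graph as a list of directed, weighted mappings. Parallel
# mappings between the same pair of schemas (redundancy) and self-mappings
# are allowed. Names and weights are made up for illustration.
schema_to_schema = [
    ("A", "B", 0.9),
    ("A", "B", 0.4),  # redundant mapping between the same two schemas
    ("B", "C", 0.7),
    ("C", "A", 0.8),
    ("C", "C", 1.0),  # self-mapping
]


def semantic_connectivity_graph(mappings):
    """Return adjacency sets of the unweighted, irreflexive, non-redundant graph."""
    graph = {}
    for source, target, _weight in mappings:
        graph.setdefault(source, set())
        graph.setdefault(target, set())
        if source != target:           # irreflexive: drop self-loops
            graph[source].add(target)  # sets collapse redundant edges; weights are ignored
    return graph


S = semantic_connectivity_graph(schema_to_schema)
```

Using adjacency sets keeps direction while discarding weights and duplicates, which is all a connectivity analysis needs.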
11 Observations
- Theorem
- Peers in a set P_s are semantically interoperable iff S_s is strongly connected, with S_s = {s | ∃ p ∈ P_s, p ↔ s}
- Observation 1
- A set of peers P_s cannot be semantically interoperable if |E_s| < |V_s|
- Observation 2
- A set of peers P_s is semantically interoperable if |E_s| > |V_s| (|V_s| - 1) - (|V_s| - 1)
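The theorem and the two edge-count observations can be checked on small graphs with a sketch like the following. It assumes the graph is given as adjacency sets in which every schema appears as a key; the function names are illustrative only.

```python
def reachable(graph, start):
    """Set of nodes reachable from start by following directed edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def is_strongly_connected(graph):
    """True if every node reaches every other node (forward and reverse search)."""
    nodes = list(graph)
    if not nodes:
        return True
    reverse = {node: set() for node in graph}
    for source, targets in graph.items():
        for target in targets:
            reverse[target].add(source)
    root = nodes[0]
    return (len(reachable(graph, root)) == len(nodes)
            and len(reachable(reverse, root)) == len(nodes))


def edge_count_verdict(graph):
    """Apply the two edge-count observations; 'undecided' when neither applies."""
    v = len(graph)
    e = sum(len(targets) for targets in graph.values())
    if v < 2:
        return "interoperable"  # a single schema is trivially connected
    if e < v:
        return "cannot be interoperable"
    if e > v * (v - 1) - (v - 1):
        return "interoperable"
    return "undecided"


cycle = {"A": {"B"}, "B": {"C"}, "C": {"A"}}
chain = {"A": {"B"}, "B": {"C"}, "C": set()}
dense = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A"}}
```

The edge counts are cheap to evaluate but leave a wide "undecided" band (the three-node cycle above falls in it), which is why the explicit strong-connectivity test is still needed in general.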
12 2. Semantic Interoperability in the Large
- Question
- How can we analyze semantic interoperability in large-scale PDMS?
- Idea: use percolation theory to detect the emergence of a strongly connected component in S
- Necessary condition for vertex-strong connectivity
- Necessary condition for semantic interoperability
13 The Model
- Adaptation of a recent graph-theoretic framework
- Newman, Strogatz, Watts 2001
- Large-scale semantic graphs as random graphs with arbitrary degree distribution
- Exponentially distributed, small-world, scale-free graphs
- Specificities of our model
- Strong clustering (clustering coefficient cc)
- Bidirectionality (bidirectionality coefficient bc) (for directed networks)
- Based on generatingfunctionology
- Connectivity indicator ci (equation not preserved)
- Percolation: ci > 0
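To give a feel for such an indicator, here is a sketch based on the classic Molloy-Reed quantity for undirected random graphs with a given degree sequence: the sum over k of k (k - 2) p_k, which is positive when a giant component emerges. The optional cc term is only an assumed way of discounting clustered edges; the deck's own indicator, which also covers bidirectionality in directed graphs, is not reproduced in the text and may differ.

```python
from collections import Counter


def connectivity_indicator(degrees, cc=0.0):
    """Sum over k of k * (k - 2 - cc) * p_k for an undirected degree sequence.

    With cc = 0 this is the Molloy-Reed quantity. The cc term is an assumed
    clustering correction, not necessarily the formula used on the slide.
    """
    if not degrees:
        raise ValueError("degree sequence must not be empty")
    n = len(degrees)
    counts = Counter(degrees)
    return sum(k * (k - 2 - cc) * count / n for k, count in counts.items())


# A ring (every node has degree 2) sits exactly at the threshold.
ring = connectivity_indicator([2] * 10)
# Isolated pairs (degree 1) are sub-critical, 3-regular graphs super-critical.
pairs = connectivity_indicator([1] * 10)
cubic = connectivity_indicator([3] * 10)
```

A positive value indicates the super-critical regime, a negative one the sub-critical regime.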
14 Size of the giant component
- With u the smallest non-negative solution of
- And G1 the distribution of edges from first to second-order neighbors
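For reference, in the basic undirected Newman-Strogatz-Watts setting the fraction S of nodes in the giant component is S = 1 - G0(u), where u is the smallest non-negative solution of u = G1(u), G0(x) is the sum of p_k x^k, and G1(x) = G0'(x) / G0'(1). The sketch below evaluates these relations numerically from a degree sequence. It ignores clustering and edge direction, which the model presented here takes into account, so it should be read as an approximation of the idea rather than the deck's exact computation.

```python
from collections import Counter


def giant_component_fraction(degrees, tolerance=1e-12, max_iterations=10_000):
    """Estimate S = 1 - G0(u), with u the smallest non-negative root of u = G1(u)."""
    if not degrees:
        return 0.0
    n = len(degrees)
    p = {k: count / n for k, count in Counter(degrees).items()}
    mean_degree = sum(k * pk for k, pk in p.items())
    if mean_degree == 0:
        return 0.0  # no edges at all, hence no giant component

    def g0(x):
        # Generating function of the degree distribution.
        return sum(pk * x ** k for k, pk in p.items())

    def g1(x):
        # Generating function of the degree (minus one) of a node reached via an edge.
        return sum(k * pk * x ** (k - 1) for k, pk in p.items() if k > 0) / mean_degree

    # Iterating from 0 climbs monotonically to the smallest non-negative fixed point.
    u = 0.0
    for _ in range(max_iterations):
        next_u = g1(u)
        converged = abs(next_u - u) < tolerance
        u = next_u
        if converged:
            break
    return 1.0 - g0(u)


mixed = giant_component_fraction([1, 3] * 5)
```

For the mixed sequence (half the nodes of degree 1, half of degree 3), G1(u) = 0.25 + 0.75 u^2, whose smallest root is u = 1/3, giving S = 1 - (1/6 + 1/54) = 44/54.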
15 3. The Sequence Retrieval System (SRS)
- Commercial information indexing and retrieval system
- Bioinformatic libraries
- EMBL
- SwissProt
- Prosite
- Etc.
- Schemas described in a custom language (Icarus)
- Mappings (links) from one database to others
16 Why is SRS interesting?
- Applying our heuristics on a real large-scale corpus of interconnected databases
- More than 380 databanks
- More than 500 (undirected) links
- Data used by professionals on a daily basis
17 Crawling the SRS schema-to-schema graph
- Custom crawler
- As of May 2005
- (EBI repository)
- 388 nodes
- 518 edges
- Giant connected component: 187 nodes
- Power-law distribution of node degrees
- Clustering coefficient: 0.32
- Diameter: 9
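A statistic such as the giant connected component can be obtained from a crawled edge list with a few lines of code. The sketch below treats links as undirected and uses a toy edge list, since the actual crawl data is not part of these slides.

```python
from collections import defaultdict


def largest_component_fraction(node_count, edges):
    """Fraction of nodes in the largest connected component (links taken as undirected)."""
    if node_count == 0:
        return 0.0
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    largest = 1  # an isolated node is a component of size one
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        largest = max(largest, size)
    return largest / node_count


# Toy example: six databanks, one of them isolated.
toy_fraction = largest_component_fraction(6, [("a", "b"), ("b", "c"), ("d", "e")])
```

With the crawl figures listed above, 187 of 388 nodes gives a fraction of about 0.48.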
18 Results
- Connectivity indicator ci = 25.4
- Super-critical state
- Size of the giant component
- 0.47 (derived)
- 0.48 (observed)
19 Graphs with same power-law degree distribution
20 10x Bigger Graph
21 Analyzing weighted networks
- Do we have a sufficient number of good mappings?
- Introducing quality measures from the mappings
- Weights
- Attribute / schema level
- Cf. Chatty Web (WWW03)
- Semantic query forwarding
- Per-hop forwarding behaviors
- Only forward if w_i > τ
- τ = 0: flooding
- τ = 1: exact answers
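The per-hop forwarding rule can be sketched as a thresholded graph traversal. The mapping weights below are hypothetical, and the strict comparison follows the rule as stated on the slide; a non-strict variant would keep mappings of weight 1 usable at a threshold of 1.

```python
from collections import deque

# Hypothetical weighted mappings: (source schema, target schema) -> quality in [0, 1].
WEIGHTS = {
    ("A", "B"): 0.9,
    ("B", "C"): 0.5,
    ("A", "D"): 0.2,
    ("D", "C"): 1.0,
}


def reached_schemas(start, weights, threshold):
    """Schemas a query reaches when each hop only uses mappings with weight > threshold."""
    reached = {start}
    frontier = deque([start])
    while frontier:
        schema = frontier.popleft()
        for (source, target), weight in weights.items():
            if source == schema and target not in reached and weight > threshold:
                reached.add(target)
                frontier.append(target)
    return reached


flooding = reached_schemas("A", WEIGHTS, 0.0)
selective = reached_schemas("A", WEIGHTS, 0.6)
```

Raising the threshold prunes low-quality mappings and shrinks the set of schemas a query can reach, which is the trade-off the weighted analysis quantifies.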
22 Weighted Results
- Same degree distribution (388 nodes)
- Uniformly distributed weights between 0 and 1
23 4. Conclusions
- Analyzing a real network of bioinformatic databases
- Accurate results (even for relatively small networks)
- Weighted / unweighted
- Current work
- Compositions of weights along a path
- Semantic random walkers
- Public domain simulator
- Future work
- Analyzing other forwarding behaviors
- Implementation in a real PDMS (self-organizing mappings)
- GridVine
24 References
- A Necessary Condition for Semantic Interoperability in the Large. Philippe Cudré-Mauroux and Karl Aberer. ODBASE 2004
- GridVine: Building Internet-Scale Semantic Overlay Networks. Karl Aberer, Philippe Cudré-Mauroux and Tim van Pelt. ISWC 2004
- Semantic Overlay Networks (Tutorial). Karl Aberer and Philippe Cudré-Mauroux. VLDB 2005
- Complete reference list at http://lsirpeople.epfl.ch/pcudre/
25 Thank you for your attention