Title: SemDis: Discovering and Ranking Complex Relationships
1SemDis: Discovering and Ranking Complex Relationships
2Semantic Analytics
- Using Semantic Metadata for search, browsing and analysis
- Uncover meaningful complex relationships
- Application areas [8]
- Terrorist threat assessment
- Anti-money laundering
- Financial compliance
3Semantic Associations [3]
- Complex relationships between entities
- Sequence of properties connecting intermediate
entities
[Figure: a path of properties leading from entity e1 through intermediate entities to entity e5]
4Semantic Associations Defined
- Semantic Connectivity: an alternating sequence of properties and entities (a semantic path) exists between two entities
- Semantic Similarity: a pair of matching property sequences exists in which the entities in question are the respective origins or the respective terminuses
- Semantic Association: two entities are semantically associated if they are either semantically connected or semantically similar
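The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the SemDis implementation: the triples are hypothetical, connectivity reads edges as undirected, and similarity is simplified to an exact match of property names along directed paths of bounded length (the `max_len` parameter is an assumption of this sketch).

```python
from collections import defaultdict, deque

# Hypothetical triples (subject, property, object), for illustration only.
TRIPLES = [
    ("e1", "p1", "e2"),
    ("e2", "p2", "e3"),
    ("e4", "p1", "e6"),
    ("e6", "p2", "e7"),
]

def connected(triples, a, b):
    """True if some semantic path links a and b (edges read as undirected)."""
    adj = defaultdict(set)
    for s, _, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False

def property_sequences(triples, origin, max_len=3):
    """Property sequences of all directed paths of up to max_len edges from origin."""
    out = defaultdict(list)
    for s, p, o in triples:
        out[s].append((p, o))
    found = set()
    stack = [(origin, ())]
    while stack:
        node, seq = stack.pop()
        if seq:
            found.add(seq)
        if len(seq) < max_len:
            for p, o in out[node]:
                stack.append((o, seq + (p,)))
    return found

def similar(triples, a, b, max_len=3):
    """True if a and b are origins of at least one matching property sequence."""
    seqs_a = property_sequences(triples, a, max_len)
    seqs_b = property_sequences(triples, b, max_len)
    return bool(seqs_a & seqs_b)

def associated(triples, a, b):
    """Semantic association: connected or similar."""
    return connected(triples, a, b) or similar(triples, a, b)
```

In this toy data, e1 and e3 are connected, while e1 and e4 are not connected but are similar, because both are origins of the property sequence p1, p2.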
5Why Undirected Edges?
- Consider 3 statements
- Actor → acts_in → Movie
- Studio → produces → Movie
- Studio → owned_by → Person
- Instances
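A small sketch of why edge direction matters for the three statements above. The instance names (`actor_1`, `movie_1`, `studio_1`, `person_1`) are hypothetical; the point is only that an actor and a studio owner are linked through a shared movie when edges are read in both directions, but not when direction is enforced.

```python
from collections import defaultdict, deque

# Hypothetical instances of the three schema statements.
TRIPLES = [
    ("actor_1", "acts_in", "movie_1"),
    ("studio_1", "produces", "movie_1"),
    ("studio_1", "owned_by", "person_1"),
]

def reachable(triples, start, goal, directed):
    """Breadth-first reachability, optionally ignoring edge direction."""
    adj = defaultdict(set)
    for s, _, o in triples:
        adj[s].add(o)
        if not directed:
            adj[o].add(s)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False

print(reachable(TRIPLES, "actor_1", "person_1", directed=True))   # False
print(reachable(TRIPLES, "actor_1", "person_1", directed=False))  # True
```

With directed edges the search stops at `movie_1`, which has no outgoing edge; with undirected edges it continues through `studio_1` to `person_1`.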
6Association Discovery
- Discovering anomalous patterns, rules, complex relationships
- No predefined patterns or rules
- Limitations
- Information overload: extremely large result sets
- Cannot determine significance/relevance
7ρ (Actor_6025, Capt_8979)
8Ranking
- User specified criteria
- User specifies what is considered significant
- Criteria can be statistical or semantic [1]
- Relevance model
- Predefined criteria
- Rank based on novelty or rarity [6]
- May not be of interest
9Semantics in Ranking
- Schematic context
- Specify classes and properties of interest
- Create multiple contexts for a single search
- Schematic structure
- Rank based on property and/or class subsumption
- Trust
- How well trusted is an explicit relationship?
- How well can a complex relationship be trusted?
- Refraction [3]
- How well does a path conform to a given schema?
10Ranking Complex Relationships
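As a rough illustration of how criteria such as schematic context and trust might be combined into one rank, the sketch below scores each path with a weighted sum. Everything here is an assumption made for the example: the `Edge` type, the rule that a path is only as trusted as its weakest edge, the scoring formulas and the default weights. It is not the ranking formula used by SemDis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    prop: str     # property name
    trust: float  # assumed trust in [0, 1] for this explicit relationship

def context_score(path, context_props):
    """Fraction of the path's properties that belong to the user's context."""
    if not path:
        return 0.0
    return sum(e.prop in context_props for e in path) / len(path)

def trust_score(path):
    """Assumption: a path is only as trustworthy as its weakest edge."""
    return min((e.trust for e in path), default=0.0)

def rank(paths, context_props, w_context=0.5, w_trust=0.5):
    """Sort paths by a weighted sum of the two scores, best first."""
    def score(path):
        return (w_context * context_score(path, context_props)
                + w_trust * trust_score(path))
    return sorted(paths, key=score, reverse=True)

# Hypothetical paths between the same pair of entities.
PATH_A = [Edge("funds", 0.9), Edge("member_of", 0.8)]
PATH_B = [Edge("lives_in", 0.95), Edge("located_in", 0.95)]

ranked = rank([PATH_B, PATH_A], context_props={"funds", "member_of"})
```

With the default weights, `PATH_A` ranks first because it lies entirely inside the chosen context; setting `w_context=0.0, w_trust=1.0` would put the more trusted `PATH_B` first, which mirrors the idea that the user decides what counts as significant.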
11 12BRAHMS
- BRAHMS: a workBench Rdf store And High-performance Memory System for Semantic Association Discovery (ISWC 2005)
- Main-memory RDF storage with a rich API for very fast access to all of the knowledge in an RDF ontology, including the information in the associated RDF Schema
- Written in C
- bindings for Java (new SemDis API standard)
- Optimized for maximum speed, with minimal and strictly controlled memory usage
- Created as a general framework for testing graph algorithms on RDF/S knowledge bases
- Suitable for representation of large RDF ontologies (on the order of 10 million triples)
13BRAHMS Design
- Indexing for speed in basic operations
- full indexing of statements allows linear-time merges of triples during search
- S→PO, S→OP, O→SP, O→PS, P→SO, P→OS
- Minimize memory usage
- storage designed for main memory (also available as a memory-mapped file on Unix)
- Read-only knowledge base
- precomputed and compacted indexes
- indirect addressing (by node ID, not pointer)
- Knowledge base as memory snapshot
- RDF parsing and indexing happens only once
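The indexing idea can be sketched as follows. This is an illustrative Python model, not the BRAHMS data layout: statements are hypothetical triples of integer node IDs, each of the six orderings is a sorted immutable tuple built once, a prefix lookup uses binary search, and two sorted result lists are intersected in a single linear pass.

```python
from bisect import bisect_left

# Hypothetical statements encoded as (subject, property, object) node IDs.
STATEMENTS = [(0, 0, 1), (0, 1, 2), (1, 0, 2), (3, 0, 1), (3, 1, 2)]

# Component positions (within an S, P, O triple) for each ordering.
ORDERS = {
    "SPO": (0, 1, 2), "SOP": (0, 2, 1),
    "OSP": (2, 0, 1), "OPS": (2, 1, 0),
    "PSO": (1, 0, 2), "POS": (1, 2, 0),
}

def build_indexes(statements):
    """One sorted, immutable tuple of reordered statements per ordering."""
    return {
        name: tuple(sorted(tuple(t[i] for i in order) for t in statements))
        for name, order in ORDERS.items()
    }

def lookup(index, first, second):
    """Third components of entries that start with (first, second), sorted."""
    i = bisect_left(index, (first, second))
    out = []
    while i < len(index) and index[i][:2] == (first, second):
        out.append(index[i][2])
        i += 1
    return out

def merge_intersect(a, b):
    """Linear-time intersection of two sorted ID lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

INDEXES = build_indexes(STATEMENTS)
# Subjects that have property 0 with object 1 AND property 1 with object 2.
both = merge_intersect(lookup(INDEXES["POS"], 0, 1), lookup(INDEXES["POS"], 1, 2))
```

Because every index is already sorted, each lookup returns a sorted list and no sorting is needed at query time, which is what makes the linear merge possible.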
14BRAHMS Design
- Separation of instances base and schema
- different types of classes for different resource types (instance, literal, schema class, property)
- specialized statements handle instance resources, literals and schema separately, so algorithms do not need to check the resource type during execution
- each resource is uniquely identified in its group by a numeric identifier
- identifiers are contiguous 0..n in each group, allowing straightforward sorting and indexing
- Taxonomy
- precomputed full taxonomy for classes and properties (including all ancestors and descendants)
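A minimal sketch of precomputing a full taxonomy over contiguous numeric identifiers. The class hierarchy is hypothetical and assumed to be acyclic; because IDs run from 0 to n, plain lists indexed by ID serve as lookup tables, so a subsumption check at search time is a single set membership test.

```python
# Hypothetical hierarchy: class ID -> IDs of its direct superclasses.
DIRECT_SUPER = [
    [],   # 0: Thing
    [0],  # 1: Person
    [1],  # 2: Actor
    [0],  # 3: Organization
    [3],  # 4: Studio
]

def precompute_taxonomy(direct_super):
    """Return (ancestors, descendants), each a list of frozensets indexed by class ID."""
    n = len(direct_super)
    ancestors = [None] * n

    def resolve(c):
        # Memoized upward closure; assumes the hierarchy has no cycles.
        if ancestors[c] is None:
            found = set()
            for parent in direct_super[c]:
                found.add(parent)
                found |= resolve(parent)
            ancestors[c] = frozenset(found)
        return ancestors[c]

    for c in range(n):
        resolve(c)

    descendants = [set() for _ in range(n)]
    for c in range(n):
        for a in ancestors[c]:
            descendants[a].add(c)
    return ancestors, [frozenset(d) for d in descendants]

ANCESTORS, DESCENDANTS = precompute_taxonomy(DIRECT_SUPER)

def is_subclass_of(c, d):
    """True if class c equals d or has d among its precomputed ancestors."""
    return c == d or d in ANCESTORS[c]
```

The same routine could be applied to a property hierarchy; the trade-off is extra memory for the closure in exchange for constant-time checks during search.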
15Future of Brahms
- SPARQL: most of the functionality is currently implemented over BRAHMS
- create a querying extension for regular expressions on graphs
- Distributed storage (current work)
- handle very large datasets (tens of GB) partitioned across a cluster of computers
- efficient distributed SPARQL query model and implementation
16BRAHMS Results
- Speed
- outperforms Sesame, Jena and Redland in k-hop limited semantic association searches using a main-memory RDF model
- big impact on large datasets, where other datastores either perform slowly or cannot execute the algorithm at all
- Handling datasets
- size limited by main memory (physical) and/or system (32- vs. 64-bit)
- able to efficiently run algorithms on large datasets that other RDF stores cannot handle using a memory model
- tested on SWETO (255 MB), Lehigh University Univ(50, 0) (556 MB) and a synthetic dataset (9 GB, on a 64-bit machine)
17Test dataset statistics
18Results - timing
Timing results of bi-directional breadth-first search for paths of length 5 to 11 on the Univ(10,0) dataset (105 MB)
19Results - scalability
Timing results of bi-directional breadth-first search for paths of length 4 to 8 on the Univ(700,0) dataset (7.8 GB)
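For readers unfamiliar with the search used in these measurements, here is a minimal sketch of a k-hop limited bi-directional breadth-first search. It is an illustration in Python over a hypothetical set of triples, not the benchmarked BRAHMS code: it reads edges as undirected, expands the smaller frontier one full level at a time, and reports only the length of a shortest connecting path rather than enumerating all paths.

```python
from collections import defaultdict

def build_adjacency(triples):
    """Undirected adjacency map: node -> set of neighbouring nodes."""
    adj = defaultdict(set)
    for s, _, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    return adj

def shortest_hops(adj, source, target, max_hops):
    """Length of a shortest path with at most max_hops edges, or None."""
    if source == target:
        return 0
    seen_s, seen_t = {source}, {target}
    frontier_s, frontier_t = {source}, {target}
    hops = 0
    while frontier_s and frontier_t and hops < max_hops:
        # Always expand the smaller frontier; swapping sides is safe
        # because the graph is treated as undirected.
        if len(frontier_s) > len(frontier_t):
            frontier_s, frontier_t = frontier_t, frontier_s
            seen_s, seen_t = seen_t, seen_s
        hops += 1
        next_frontier = set()
        for node in frontier_s:
            for nb in adj.get(node, ()):
                if nb in seen_t:
                    return hops  # the two searches have met
                if nb not in seen_s:
                    seen_s.add(nb)
                    next_frontier.add(nb)
        frontier_s = next_frontier
    return None

# Hypothetical data: a chain a-b-c-d-e plus a hub with three leaves.
TRIPLES = [
    ("a", "p", "b"), ("b", "p", "c"), ("c", "p", "d"), ("d", "p", "e"),
    ("hub", "q", "x1"), ("hub", "q", "x2"), ("hub", "q", "x3"), ("x1", "q", "t"),
]
ADJ = build_adjacency(TRIPLES)
```

Searching from both ends keeps each frontier shallow, which is why the approach tends to pay off as the hop limit grows on large graphs.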
20Spatiotemporal and Thematic Semantic Analytics
21Querying Theme, Space, and Time
- Thematic and Temporal Analysis: temporal query operators for RDF data, use of temporal relationships in thematic analysis
- Thematic and Spatial Analysis: spatial query operators for RDF data, use of spatial relationships in thematic analysis and vice versa
- Thematic Analysis: RDF query languages, Semantic Associations, thematic proximity
- Temporal Analysis: temporal logics, temporal proximity (time difference, topological Allen's interval relations)
- Spatial Analysis: GIS operations, spatial proximity (distance, topological relations)
- Ontologies and RDF are the glue between the three dimensions, and the relationship-centric nature of the data underpins new analytical operators
- Thematic, Spatial and Temporal Analysis: spatial and temporal query operators for RDF data, mutual influence of relationships
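Temporal proximity in terms of Allen's interval relations and time difference can be illustrated with a short sketch. The function names and the representation of an interval as a `(start, end)` pair with `start < end` are assumptions of this example; the thirteen relation names follow Allen's interval algebra.

```python
def allen_relation(a, b):
    """Allen relation of interval a relative to interval b.

    Intervals are (start, end) pairs with start < end.
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Only proper partial overlap remains.
    return "overlaps" if a1 < b1 else "overlapped-by"

def time_gap(a, b):
    """Time difference between two intervals; 0 if they touch or overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))

print(allen_relation((1, 5), (3, 8)))  # overlaps
print(time_gap((1, 3), (5, 6)))        # 2
```

A temporal query operator for RDF data could then filter or rank statements by such a relation between their valid-time intervals.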
22Proposed Model: 3 Dimensions (Thematic, Geospatial, Temporal)
23Thematic Context for Spatial Extent
Spatial extent of non-spatial entities is derived
from thematic context
[Figure: dynamic entities (Fred Smith, Bill Allen) connected by Works For and Lives At relationships to named places (Wal-Mart, 15 Spring Street, 150 Elm Street), which map to points (x1, y1), (x2, y2), (x3, y3) in a georeferenced coordinate space]
- Context: a path expression connecting a dynamic entity type to a static entity type / event
- Spatial extent in the context of employment and in the context of residency
- Example context, Residency of Co-Workers: works_for.works_for.lives_at
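The idea of deriving a spatial extent from a thematic context can be sketched by evaluating a dotted path expression over a small graph. The facts below are hypothetical: the entity names come from the figure labels, but who lives where and all coordinates are assumptions made for this example. Each step of the path is followed in either direction, which is what lets the second works_for lead from the employer back to its employees.

```python
from collections import defaultdict

# Hypothetical facts; the address assignments are assumed, not from the slide.
TRIPLES = [
    ("Fred Smith", "works_for", "Wal-Mart"),
    ("Bill Allen", "works_for", "Wal-Mart"),
    ("Fred Smith", "lives_at", "150 Elm Street"),
    ("Bill Allen", "lives_at", "15 Spring Street"),
]
# Hypothetical coordinates for the named places.
COORDINATES = {
    "150 Elm Street": (1.0, 1.0),
    "Wal-Mart": (2.0, 2.0),
    "15 Spring Street": (3.0, 3.0),
}

def follow(triples, start, context):
    """Entities reached from start by following each property of the
    dotted context path in turn, in either edge direction."""
    step = defaultdict(set)
    for s, p, o in triples:
        step[(s, p)].add(o)
        step[(o, p)].add(s)
    current = {start}
    for prop in context.split("."):
        current = {nxt for node in current for nxt in step.get((node, prop), ())}
    return current

def spatial_extent(triples, coordinates, entity, context):
    """Named places reached through the context, with their coordinates."""
    places = follow(triples, entity, context)
    return {place: coordinates[place] for place in places if place in coordinates}

residency = spatial_extent(TRIPLES, COORDINATES, "Fred Smith", "lives_at")
employment = spatial_extent(TRIPLES, COORDINATES, "Fred Smith", "works_for")
coworkers = spatial_extent(
    TRIPLES, COORDINATES, "Fred Smith", "works_for.works_for.lives_at"
)
```

Note that in this simple reading the co-worker context also returns Fred Smith's own residence, since he is reached again on the way back from his employer; a fuller model would probably exclude the starting entity.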
24Queries based on spatiotemporal contexts
- When was the 3rd Armored Division within Iraq?
- Where were bombing targets of the US Air Force in April 2003?
- How did the distribution of US airstrips in Iraq change during March 2003?
- Show the dates and locations of battles of the 101st Airborne Division
- How does the battle pattern of the 3rd Armored Division compare to the pattern of the 1st Armored Division?
- When and where were the 101st Airborne and the 82nd Airborne likely to have interacted?
25Spatiotemporal Semantic Associations
- Define setting as a region of space in combination with an interval of time
- How is entity X related to spatiotemporal (ST) setting S? ρ(entity, setting)
[Figure: graph with nodes Al-Qaeda, Account_1234, 125 Broad Street, Fred, Jim and Attack Site]
How is Al-Qaeda connected to the setting of the expected attack?
26Spatiotemporal Semantic Associations
- How are entity X and entity Y related w.r.t. ST setting S? ρ(entity, entity, setting)
[Figure: graph with nodes Hezbollah, Al-Qaeda, Account_1234, Fred, Jim and 125 Broad Street]
How are Al-Qaeda and Hezbollah connected with respect to the attack site?
27Spatiotemporal Semantic Associations
- Idea of Virtual Links between entities based on spatiotemporal information
- Possible definition of rules to define a virtual link type
- Collaboration: entities X and Y are in close ST proximity more often than a given threshold
- Knows: entities X and Y are in close ST proximity regularly
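The Collaboration rule above can be sketched as a simple count of close encounters. The observation format `(entity, x, y, t)`, the distance and time tolerances, and the sample data are all assumptions made for this illustration; the slide itself does not define how proximity is measured.

```python
from math import hypot

# Hypothetical observations: (entity, x, y, t) in arbitrary units.
OBSERVATIONS = [
    ("Fred", 0.0, 0.0, 1),
    ("Fred", 5.0, 5.0, 10),
    ("Fred", 9.0, 9.0, 20),
    ("Jim", 0.0, 1.0, 1),
    ("Jim", 5.0, 6.0, 11),
    ("Jim", 50.0, 50.0, 20),
]

def close_encounters(obs, x_name, y_name, max_dist, max_dt):
    """Count pairs of observations of X and Y that are close in space and time."""
    xs = [o for o in obs if o[0] == x_name]
    ys = [o for o in obs if o[0] == y_name]
    return sum(
        1
        for _, x1, y1, t1 in xs
        for _, x2, y2, t2 in ys
        if hypot(x1 - x2, y1 - y2) <= max_dist and abs(t1 - t2) <= max_dt
    )

def collaboration_link(obs, x_name, y_name, max_dist, max_dt, threshold):
    """Virtual link exists if close encounters occur more often than threshold."""
    return close_encounters(obs, x_name, y_name, max_dist, max_dt) > threshold

encounters = close_encounters(OBSERVATIONS, "Fred", "Jim", max_dist=2.0, max_dt=2)
linked = collaboration_link(OBSERVATIONS, "Fred", "Jim", 2.0, 2, threshold=1)
```

A Knows rule would need an additional notion of regularity, for example encounters recurring at similar times, which this sketch does not attempt to model.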
28Other Aspects
- How do temporal relationships affect association semantics?
- 2 works_for relationships (overlapping times, disjoint times, etc.)
- Complex queries based on all 3 dimensions
- Which location is the most likely storage facility?
- Thematic (correct capabilities, linked to correct people)
- Spatial (where was the material last seen?)
- Temporal (how long can the material stay out of storage?)