SemDis: Discovering and Ranking Complex Relationships

Provided by: chrishalas
1
SemDis: Discovering and Ranking Complex Relationships
2
Semantic Analytics
  • Using Semantic Metadata for search, browsing and
    analysis
  • Uncover meaningful complex relationships
  • Application areas [8]
      • Terrorist threat assessment
      • Anti-money laundering
      • Financial compliance

3
Semantic Associations [3]
  • Complex relationships between entities
  • Sequence of properties connecting intermediate
    entities

[Figure: entities e1 and e5 connected through intermediate entities]
4
Semantic Associations Defined
  • Semantic Connectivity
      • An alternating sequence of properties and
        entities (semantic path) exists between two
        entities
  • Semantic Similarity
      • A pair of matching property sequences exists
        where the entities in question are the
        respective origins or the respective termini
  • Semantic Association
      • Two entities are semantically associated if they
        are either semantically connected or semantically
        similar
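
As a rough illustration of these definitions, the sketch below checks connectivity and similarity on a tiny hand-made triple set. The entity names, helper functions, and depth limit are assumptions made for this example, and "matching" is simplified to identical property sequences; SemDis itself may match sequences more loosely.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, property, object); not SemDis data.
TRIPLES = [
    ("alice", "works_for", "acme"),
    ("acme", "located_in", "athens"),
    ("bob", "works_for", "globex"),
    ("globex", "located_in", "atlanta"),
]

def outgoing(triples):
    """Index triples by subject for directed traversal."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index

def property_sequences(start, index, max_len=3):
    """Map each property sequence leaving `start` to the entities it reaches."""
    found = defaultdict(set)
    stack = [(start, ())]
    while stack:
        node, seq = stack.pop()
        if len(seq) == max_len:
            continue  # depth limit keeps the search finite on cyclic data
        for p, o in index.get(node, []):
            found[seq + (p,)].add(o)
            stack.append((o, seq + (p,)))
    return found

def connected(a, b, index, max_len=3):
    """Semantic connectivity: some property sequence leads from a to b."""
    return any(b in ends for ends in property_sequences(a, index, max_len).values())

def similar(a, b, index, max_len=3):
    """Semantic similarity (origins case): a and b start identical property sequences."""
    seqs_a = set(property_sequences(a, index, max_len))
    seqs_b = set(property_sequences(b, index, max_len))
    return bool(seqs_a & seqs_b)

def associated(a, b, index, max_len=3):
    """Semantic association: connected or similar."""
    return connected(a, b, index, max_len) or similar(a, b, index, max_len)

INDEX = outgoing(TRIPLES)
```

Here alice and bob are not connected by any path, yet they count as associated because both originate the sequence works_for, located_in.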

5
Why Undirected Edges?
  • Consider 3 statements
      • Actor → acts_in → Movie
      • Studio → produces → Movie
      • Studio → owned_by → Person
  • Instances
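
A small sketch of why edge direction matters for these three statements: following edges only from subject to object, an actor never reaches the studio's owner, while an undirected traversal does. The instance names and the "^" prefix for a reversed edge are conventions invented for this example.

```python
from collections import defaultdict, deque

# Instance names are hypothetical; the properties come from the three statements.
TRIPLES = [
    ("actor_1", "acts_in", "movie_1"),
    ("studio_1", "produces", "movie_1"),
    ("studio_1", "owned_by", "person_1"),
]

def build_adjacency(triples, undirected):
    """Adjacency list; optionally add each edge in the reverse direction too."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
        if undirected:
            # "^" marks a property traversed against its direction.
            adj[o].append(("^" + p, s))
    return adj

def find_path(start, goal, adj):
    """Breadth-first search returning one shortest semantic path, or None."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for prop, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [prop, nxt]))
    return None

directed_path = find_path("actor_1", "person_1", build_adjacency(TRIPLES, undirected=False))
undirected_path = find_path("actor_1", "person_1", build_adjacency(TRIPLES, undirected=True))
```

With directed edges the search stops at movie_1, which has no outgoing statements; with undirected edges it continues through ^produces to the studio and on to its owner.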

6
Association Discovery
  • Discovering anomalous patterns, rules, complex
    relationships
  • No predefined patterns or rules
  • Limitations
      • Information overload: extremely large result sets
      • Cannot determine significance/relevance

7
ρ(Actor_6025, Capt_8979)
8
Ranking
  • User-specified criteria
      • User specifies what is considered significant
      • Criteria can be statistical or semantic [1]
      • Relevance model
  • Predefined criteria
      • Rank based on novelty or rarity [6]
      • May not be of interest

9
Semantics in Ranking
  • Schematic context
      • Specify classes and properties of interest
      • Create multiple contexts for a single search
  • Schematic structure
      • Rank based on property and/or class subsumption
  • Trust
      • How well trusted is an explicit relationship?
      • How well can a complex relationship be trusted?
  • Refraction [3]
      • How well does a path conform to a given schema?
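
To make two of these criteria concrete, here is a minimal ranking sketch that combines a context score with a trust score. The weights, the weakest-link trust rule, the default trust value, and all property names are assumptions for illustration only; the slides do not give the actual scoring formulas.

```python
def context_score(path_props, context_props):
    """Fraction of the path's properties that fall inside the user's context."""
    if not path_props:
        return 0.0
    return sum(p in context_props for p in path_props) / len(path_props)

def trust_score(path_props, property_trust, default=0.5):
    """Assumed weakest-link rule: a path is as trusted as its least trusted property."""
    if not path_props:
        return 0.0
    return min(property_trust.get(p, default) for p in path_props)

def rank(paths, context_props, property_trust, w_context=0.6, w_trust=0.4):
    """Sort paths by a weighted sum of the two scores, best first."""
    def score(props):
        return (w_context * context_score(props, context_props)
                + w_trust * trust_score(props, property_trust))
    return sorted(paths, key=score, reverse=True)

# Hypothetical paths, each given as its sequence of properties.
paths = [
    ("acts_in", "produces", "owned_by"),
    ("member_of", "funded_by"),
]
context = {"member_of", "funded_by"}  # properties the user marked as of interest
trust = {"acts_in": 0.9, "produces": 0.9, "owned_by": 0.8,
         "member_of": 0.6, "funded_by": 0.7}
ranked = rank(paths, context, trust)
```

The second path ranks first despite its lower trust because every property on it lies in the chosen context; changing the weights shifts that balance.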

10
Ranking Complex Relationships
11
  • kemafor

12
BRAHMS
  • BRAHMS - a workBench Rdf store And
    High-performance Memory System for Semantic
    Association Discovery (ISWC 2005)
  • Main-memory RDF storage with a rich API for very
    fast access to all of the knowledge in an RDF
    ontology, including the information in the
    associated RDF Schema
  • Written in C
      • bindings for Java (new SemDis API standard)
  • Optimized for maximum speed, with minimal and
    strictly controlled memory usage
  • Created as a general framework for testing graph
    algorithms on RDF/S knowledge bases
  • Suitable for representation of large RDF
    ontologies (on the order of 10 million triples)

13
BRAHMS Design
  • Indexing for speed in basic operations
      • full indexing of statements allows linear-time
        merges of triples during search
      • S→PO, S→OP, O→SP, O→PS, P→SO, P→OS
  • Minimize memory usage
      • storage designed for main memory (also available
        as a memory-mapped file on Unix)
  • Read-only knowledge base
      • precomputed and compacted indexes
      • indirect addressing (by node ID, not pointer)
  • Knowledge base as memory snapshot
      • RDF parsing and indexing happen only once
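
The six index orderings can be sketched as six sorted, read-only arrays built once from integer-ID triples. This is only an illustration of the idea; the storage layout, the function names, and the sample IDs are assumptions, and BRAHMS's real compacted structures are not described on the slides.

```python
from bisect import bisect_left
from itertools import permutations

# Triples as (subject, property, object) integer IDs; the values are made up.
TRIPLES = [(0, 10, 1), (0, 10, 2), (0, 11, 3), (4, 10, 1)]

POSITION = {"S": 0, "P": 1, "O": 2}

def build_indexes(triples):
    """Build one sorted array per ordering: SPO, SOP, PSO, POS, OSP, OPS."""
    indexes = {}
    for order in permutations("SPO"):
        cols = [POSITION[c] for c in order]
        indexes["".join(order)] = sorted(tuple(t[c] for c in cols) for t in triples)
    return indexes

def lookup(indexes, order, *prefix):
    """Return all rows of the `order` index that start with `prefix`."""
    rows = indexes[order]
    start = bisect_left(rows, prefix)  # binary search for the first candidate row
    matches = []
    for row in rows[start:]:
        if row[:len(prefix)] != prefix:
            break  # rows are sorted, so no later row can match
        matches.append(row)
    return matches

INDEXES = build_indexes(TRIPLES)
```

For instance, `lookup(INDEXES, "SPO", 0, 10)` plays the role of the S→PO index restricted to one property, and `lookup(INDEXES, "OPS", 1)` answers "which statements point at node 1". Because every array is sorted, two result lists can be merged in linear time.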

14
BRAHMS Design
  • Separation of instance base and schema
      • different types of classes for different resource
        types (instance, literals, schema class,
        property)
      • specialized statement types handle instance
        resources, literals, and schema separately, so
        there is no need to check the resource type
        during algorithm execution
      • each resource is uniquely identified in its group
        by a numeric identifier
      • identifiers are contiguous 0..n in each group,
        allowing straightforward sorting and indexing
  • Taxonomy
      • precomputed full taxonomy for classes and
        properties (including all ancestors and
        descendants)
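
A minimal sketch of a precomputed taxonomy over contiguous integer IDs: ancestors are resolved once, after which a subsumption check is a set lookup. The class hierarchy and function names below are hypothetical, and an acyclic hierarchy is assumed.

```python
def precompute_ancestors(parents):
    """parents[i] lists the direct superclass IDs of class i (IDs are 0..n-1)."""
    n = len(parents)
    ancestors = [None] * n

    def resolve(i):
        if ancestors[i] is None:
            ancestors[i] = set()  # placeholder stops infinite recursion on bad input
            for p in parents[i]:
                ancestors[i] |= {p} | resolve(p)
        return ancestors[i]

    for i in range(n):
        resolve(i)
    return [frozenset(a) for a in ancestors]

def invert(ancestors):
    """Derive the descendant sets from the ancestor sets."""
    descendants = [set() for _ in ancestors]
    for child, ups in enumerate(ancestors):
        for up in ups:
            descendants[up].add(child)
    return [frozenset(d) for d in descendants]

# Hypothetical hierarchy: 0=Thing, 1=Agent, 2=Person, 3=Actor, 4=Organization
PARENTS = [[], [0], [1], [2], [1]]
ANCESTORS = precompute_ancestors(PARENTS)
DESCENDANTS = invert(ANCESTORS)

def is_subclass_of(a, b):
    """Constant-time subsumption check against the precomputed closure."""
    return a == b or b in ANCESTORS[a]
```

Because the IDs are contiguous, plain lists indexed by ID are enough; no hash lookup by URI is needed during a search.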

15
Future of BRAHMS
  • SPARQL: most of the functionality is currently
    implemented over BRAHMS
      • create a querying extension for regular
        expressions on graphs
  • Distributed storage (current work)
      • handle very large datasets (tens of GB)
        partitioned across a cluster of computers
      • efficient distributed SPARQL query model and
        implementation

16
BRAHMS Results
  • Speed
      • outperforms Sesame, Jena and Redland in k-hop
        limited semantic association searches using
        a main-memory RDF model
      • big impact on large datasets, where other
        datastores either perform slowly or cannot
        execute the algorithm at all
  • Handling datasets
      • size limited by main memory (physical) and/or
        system (32-bit vs. 64-bit)
      • able to efficiently run algorithms on large
        datasets that other RDF stores cannot handle
        using a memory model
      • tested on SWETO (255 MB), Lehigh University
        Univ(50, 0) (556 MB), and a synthetic dataset
        (9 GB, on a 64-bit machine)

17
Test dataset statistics
18
Results - timing
Timing results of bi-directional Breadth-First
Search for paths of length 5 to 11 on the
Univ(10,0) dataset (105 MB)
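
For readers unfamiliar with the technique being timed, the following is a generic sketch of a hop-limited bi-directional breadth-first search over an undirected view of the triples. It is not the BRAHMS implementation; the function names and the choice to always expand the smaller frontier are assumptions.

```python
from collections import defaultdict

def build_graph(triples):
    """Undirected adjacency sets; property labels are ignored in this sketch."""
    adj = defaultdict(set)
    for s, _p, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    return adj

def bidirectional_bfs(adj, source, target, max_hops):
    """Return the length of a shortest path of at most max_hops edges, else None."""
    if source == target:
        return 0
    dist_a, dist_b = {source: 0}, {target: 0}
    frontier_a, frontier_b = {source}, {target}
    depth_a = depth_b = 0
    while frontier_a and frontier_b and depth_a + depth_b < max_hops:
        # Expand the smaller frontier; the two sides are symmetric, so swap names.
        if len(frontier_a) > len(frontier_b):
            frontier_a, frontier_b = frontier_b, frontier_a
            dist_a, dist_b = dist_b, dist_a
            depth_a, depth_b = depth_b, depth_a
        depth_a += 1
        next_frontier = set()
        best = None
        for node in frontier_a:
            for nxt in adj.get(node, ()):
                if nxt in dist_b:
                    # The two searches meet: combine both half-lengths.
                    total = depth_a + dist_b[nxt]
                    best = total if best is None else min(best, total)
                elif nxt not in dist_a:
                    dist_a[nxt] = depth_a
                    next_frontier.add(nxt)
        if best is not None:
            return best
        frontier_a = next_frontier
    return None

# Hypothetical chain a - b - c - d - e.
CHAIN = build_graph([("a", "p", "b"), ("b", "p", "c"), ("c", "p", "d"), ("d", "p", "e")])
```

Searching from both ends means each side only explores about half the path length, which is what keeps k-hop searches tractable as the hop limit grows.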
19
Results - scalability
Timing results of bi-directional Breadth-First
Search for paths of length 4 to 8 on the
Univ(700,0) dataset (7.8 GB)
20
Spatiotemporal and Thematic Semantic Analytics
21
Querying Theme, Space, and Time
  • Thematic and Temporal Analysis: temporal query
    operators for RDF data, use of temporal
    relationships in thematic analysis
  • Thematic and Spatial Analysis: spatial query
    operators for RDF data, use of spatial
    relationships in thematic analysis and vice versa
  • Thematic Analysis: RDF query languages, Semantic
    Associations, thematic proximity
  • Temporal Analysis: temporal logics, temporal
    proximity (time difference, topological: Allen's
    intervals)
  • Spatial Analysis: GIS operations, spatial
    proximity (distance, topological relations)
  • Thematic, Spatial, Temporal Analysis: spatial
    and temporal query operators for RDF data, mutual
    influence of relationships
  • Ontologies and RDF are the glue between the three
    dimensions, and the relationship-centric nature
    of the data underpins new analytical operators
22
Proposed Model: 3 Dimensions (Thematic,
Geospatial, Temporal)
23
Thematic Context for Spatial Extent
Spatial extent of non-spatial entities is derived
from thematic context
[Figure: dynamic entities (Fred Smith, Bill Allen)
and named places (Wal-Mart 25, 15 Spring Street,
150 Elm Street) joined by Works For and Lives At
edges, with points (x1, y1), (x2, y2), (x3, y3) in
a georeferenced coordinate space]
Context: path expression connecting dynamic
entity type to static entity type / event
Spatial extent in context of employment and in
context of residency
Example Context: Residency of Co-Workers
works_for.works_for.lives_at
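
One way to read the co-worker context is as a dotted path expression evaluated over the graph, with the second works_for step traversed in the inverse direction (from employer back to another employee). The sketch below follows that reading. Which person lives at which address, the coordinate values, and all function names are assumptions; the slide only supplies the labels.

```python
from collections import defaultdict

# Assumed facts loosely based on the slide's figure labels.
TRIPLES = [
    ("Fred Smith", "works_for", "Wal-Mart"),
    ("Bill Allen", "works_for", "Wal-Mart"),
    ("Fred Smith", "lives_at", "150 Elm Street"),
    ("Bill Allen", "lives_at", "15 Spring Street"),
]
# Hypothetical georeferenced coordinates for the named places.
COORDINATES = {"150 Elm Street": (1.0, 1.0), "15 Spring Street": (3.0, 3.0)}

def neighbors(triples):
    """Undirected index: property -> node -> nodes reachable in either direction."""
    index = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        index[p][s].add(o)
        index[p][o].add(s)
    return index

def follow(start, expression, index):
    """Evaluate a dotted context path expression without revisiting a node."""
    paths = [[start]]
    for prop in expression.split("."):
        paths = [
            path + [nxt]
            for path in paths
            for nxt in index[prop][path[-1]]
            if nxt not in path
        ]
    return {path[-1] for path in paths}

def spatial_extent(entity, expression, index):
    """Coordinates of the named places reached through the context."""
    places = follow(entity, expression, index)
    return {COORDINATES[p] for p in places if p in COORDINATES}

INDEX = neighbors(TRIPLES)
residency = spatial_extent("Fred Smith", "lives_at", INDEX)
coworker_residency = spatial_extent("Fred Smith", "works_for.works_for.lives_at", INDEX)
```

The same entity thus gets a different spatial extent depending on the context: his own address under residency, and his co-workers' addresses under the co-worker context.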
24
Queries based on spatiotemporal contexts
  • When was the 3rd Armored Division within Iraq?
  • Where were bombing targets of the US Air Force in
    April 2003?
  • How did the distribution of US airstrips in Iraq
    change during March 2003?
  • Show the dates and locations of battles of the
    101st Airborne Division
  • How does the battle pattern of the 3rd Armored
    Division compare to the pattern of the 1st
    Armored Division?
  • When and where were the 101st Airborne and the
    82nd Airborne likely to have interacted?

25
Spatiotemporal Semantic Associations
  • Define setting as a region of space in
    combination with an interval of time
  • How is entity X related to ST setting S?
    ρ(entity, setting)

[Figure: graph linking Al-Qaeda, Account_1234,
125 Broad Street, Fred, Jim, and the Attack Site]
How is Al-Qaeda connected to the setting of the
expected attack?
26
Spatiotemporal Semantic Associations
How are entity X and entity Y related w.r.t. ST
setting S? ρ(entity, entity, setting)
[Figure: graph linking Hezbollah, Al-Qaeda,
Account_1234, Fred, Jim, and 125 Broad Street]
How are Al-Qaeda and Hezbollah connected with
respect to the attack site?
27
Spatiotemporal Semantic Associations
  • Idea of Virtual Links between entities based on
    spatiotemporal information
  • Possible definition of rules to define a virtual
    link type
      • Collaboration: entities X and Y are in close ST
        proximity more often than a given threshold
      • Knows: entities X and Y are in close ST
        proximity regularly
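
The Collaboration rule can be sketched as counting how often two entities are observed close together in both space and time, then emitting a virtual link when the count exceeds a threshold. The observation format, the distance and time cut-offs, and the sample data are all assumptions made for this example.

```python
from itertools import combinations
from math import hypot

# Hypothetical sightings: (entity, x, y, time); units are arbitrary.
OBSERVATIONS = [
    ("Fred", 0.0, 0.0, 0.0), ("Jim", 0.5, 0.0, 0.0),
    ("Fred", 5.0, 5.0, 10.0), ("Jim", 5.0, 5.5, 10.0),
    ("Fred", 9.0, 9.0, 20.0), ("Jim", 9.0, 9.0, 20.5),
    ("Ann", 0.0, 0.2, 0.0),
]

def close(a, b, max_dist, max_dt):
    """True if two sightings are near each other in space and in time."""
    return hypot(a[1] - b[1], a[2] - b[2]) <= max_dist and abs(a[3] - b[3]) <= max_dt

def count_encounters(observations, max_dist, max_dt):
    """Count close sightings per unordered pair of distinct entities."""
    counts = {}
    for a, b in combinations(observations, 2):
        if a[0] != b[0] and close(a, b, max_dist, max_dt):
            pair = tuple(sorted((a[0], b[0])))
            counts[pair] = counts.get(pair, 0) + 1
    return counts

def virtual_links(observations, link_type, threshold, max_dist=1.0, max_dt=1.0):
    """Emit (x, link_type, y) for pairs seen together more often than threshold."""
    counts = count_encounters(observations, max_dist, max_dt)
    return {(x, link_type, y) for (x, y), n in counts.items() if n > threshold}

links = virtual_links(OBSERVATIONS, "collaboration", threshold=2)
```

Fred and Jim meet three times and get a collaboration link; Ann is near them only once, so no link is created for her. A "knows" rule would need some measure of regularity over time rather than a plain count.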

28
Other Aspects
  • How do temporal relationships affect association
    semantics?
      • 2 works_for relationships (overlapping times,
        disjoint times, etc.)
  • Complex queries based on all 3 dimensions
      • Which location is the most likely storage
        facility?
      • Thematic (correct capabilities, linked to correct
        people)
      • Spatial (where was the material last seen)
      • Temporal (how long can the material stay out of
        storage)
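
As a small illustration of the first point, the sketch below classifies how two works_for periods relate in time, using a coarse subset of Allen-style interval relations. The relation names chosen, the closed-interval convention, and the employment dates are assumptions for the example.

```python
from datetime import date

def relate(a, b):
    """Classify two closed intervals (start, end) with a few coarse relations."""
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a == b:
        return "equal"
    if a_start >= b_start and a_end <= b_end:
        return "during"
    if b_start >= a_start and b_end <= a_end:
        return "contains"
    return "overlaps"

def shared_period(a, b):
    """Return the interval both hold over, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None

# Hypothetical works_for periods at the same employer.
fred = (date(2001, 1, 1), date(2003, 12, 31))
bill = (date(2003, 6, 1), date(2005, 5, 31))
jim = (date(2004, 1, 1), date(2004, 12, 31))
```

Under these assumed dates, Fred and Bill could have been co-workers for the second half of 2003, whereas Fred and Jim never overlapped, so a co-worker association between them would be much weaker.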