Title: RDF languages and storages, part 2 - indexing semi-structured data


1
RDF languages and storages, part 2 - indexing
semi-structured data
  • Maciej Janik, Conrad Ibanez
  • CSCI 8350, Fall 2004

2
Outline
  • Jena storage
  • Indexing techniques

3
Jena
  • Implemented in Java
  • One of the most popularly used RDF storages and
    query engines
  • Supports RDF, RDFS and OWL
  • In memory and persistent storage (Oracle, MySQL,
    PostgreSQL)
  • RDQL
  • Reasoning/inference engine

4
Jena - storage schema
  • Previous version (Jena1) used normalized relational DB
    tables:
    • statements
    • literals
    • resources
  • Jena2 stores triples as (Subject, Predicate, Object)
    in denormalized tables
  • Optimization for common statement patterns -
    grouping of properties
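
The difference between the two layouts can be sketched with SQLite. This is only an illustration, not Jena's actual schema: the table and column names, the sample triples, and the helper functions are all assumptions made for the example. The normalized lookup needs three joins to turn ids back into text, while the denormalized lookup reads one table.

```python
import sqlite3

# Sample data (hypothetical): (subject, predicate, object, object_is_literal)
TRIPLES = [
    ("ex:doc1", "dc:title", "Jena2", True),
    ("ex:doc1", "dc:creator", "ex:alice", False),
    ("ex:doc2", "dc:title", "Index Fabric", True),
]

def build_normalized(conn):
    """Normalized layout: statements hold ids into resources/literals."""
    conn.executescript("""
        CREATE TABLE resources (id INTEGER PRIMARY KEY, uri TEXT UNIQUE);
        CREATE TABLE literals (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
        CREATE TABLE statements (subj INTEGER, pred INTEGER,
                                 obj_res INTEGER, obj_lit INTEGER);
    """)

    def intern(table, column, text):
        # Store each distinct string once and return its id.
        conn.execute(f"INSERT OR IGNORE INTO {table}({column}) VALUES (?)", (text,))
        row = conn.execute(
            f"SELECT id FROM {table} WHERE {column} = ?", (text,)).fetchone()
        return row[0]

    for s, p, o, is_literal in TRIPLES:
        obj_res = None if is_literal else intern("resources", "uri", o)
        obj_lit = intern("literals", "value", o) if is_literal else None
        conn.execute(
            "INSERT INTO statements VALUES (?, ?, ?, ?)",
            (intern("resources", "uri", s), intern("resources", "uri", p),
             obj_res, obj_lit))

def build_denormalized(conn):
    """Denormalized layout: the text of each term is stored in the triple row."""
    conn.execute("CREATE TABLE triples (subj TEXT, pred TEXT, obj TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)",
                     [(s, p, o) for s, p, o, _ in TRIPLES])

def titles_normalized(conn, subject):
    rows = conn.execute("""
        SELECT l.value FROM statements st
        JOIN resources s ON st.subj = s.id
        JOIN resources p ON st.pred = p.id
        JOIN literals l ON st.obj_lit = l.id
        WHERE s.uri = ? AND p.uri = 'dc:title'
    """, (subject,)).fetchall()
    return [value for (value,) in rows]

def titles_denormalized(conn, subject):
    rows = conn.execute(
        "SELECT obj FROM triples WHERE subj = ? AND pred = 'dc:title'",
        (subject,)).fetchall()
    return [value for (value,) in rows]
```

Both lookups return the same answer; the denormalized one repeats strings across rows in exchange for skipping the joins.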

5
Jena - storage
  • Normalized tables
  • Denormalized

Efficient RDF Storage and Retrieval in Jena2 -
Wilkinson et al.
6
Jena - storage
  • Trades extra storage space for faster search
  • Cluster properties that are likely to be accessed
    together - optimize for common patterns
  • Special treatment of reified statements
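
Clustering properties that are accessed together can be pictured as a property table: one row per subject, one column per clustered property. The sketch below is a simplified illustration, not Jena's implementation; the table name `dc_props`, the `CLUSTERED` mapping, and the assumption that each clustered property has a single value per subject are all hypothetical.

```python
import sqlite3

# Hypothetical clustering choice: predicate -> column in the property table.
CLUSTERED = {"dc:title": "title", "dc:creator": "creator"}

def load(conn, triples):
    """Route clustered predicates to dc_props, everything else to triples."""
    conn.execute("CREATE TABLE triples (subj TEXT, pred TEXT, obj TEXT)")
    conn.execute(
        "CREATE TABLE dc_props (subj TEXT PRIMARY KEY, title TEXT, creator TEXT)")
    for s, p, o in triples:
        if p in CLUSTERED:
            # Assumes one value per (subject, property); a later value overwrites.
            conn.execute("INSERT OR IGNORE INTO dc_props(subj) VALUES (?)", (s,))
            conn.execute(
                f"UPDATE dc_props SET {CLUSTERED[p]} = ? WHERE subj = ?", (o, s))
        else:
            conn.execute("INSERT INTO triples VALUES (?, ?, ?)", (s, p, o))

def title_and_creator(conn, subject):
    """Both properties come back from a single row, with no join."""
    return conn.execute(
        "SELECT title, creator FROM dc_props WHERE subj = ?",
        (subject,)).fetchone()
```

A query that asks for the title and the creator of the same document then touches one row instead of joining two triple rows.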

7
Jena - graph abstraction
  • Graph interface is separated from (persistent)
    triple storage layer
  • Special support for different types of graphs -
    optimized for performance
  • Support operations like add, delete, find.
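
The separation between the graph interface and the storage layer can be sketched as an abstract interface with one in-memory implementation. The class and method names below are assumptions for illustration and do not reproduce Jena's Java API; `None` in a `find` pattern is used here as a wildcard.

```python
from abc import ABC, abstractmethod

class Graph(ABC):
    """Interface seen by callers; storage details live in subclasses."""

    @abstractmethod
    def add(self, triple): ...

    @abstractmethod
    def delete(self, triple): ...

    @abstractmethod
    def find(self, s=None, p=None, o=None): ...

class MemoryGraph(Graph):
    """In-memory storage; a database-backed class could implement the same interface."""

    def __init__(self):
        self._triples = set()

    def add(self, triple):
        self._triples.add(tuple(triple))

    def delete(self, triple):
        self._triples.discard(tuple(triple))

    def find(self, s=None, p=None, o=None):
        # None matches anything; other values must match exactly.
        pattern = (s, p, o)
        return sorted(
            t for t in self._triples
            if all(want is None or want == got for want, got in zip(pattern, t)))
```

Code written against `Graph` does not need to change when the storage behind it does.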

8
Jena - query processing
  • Converts multiple patterns in a query into a single
    query to the DB
  • Uses the DB query optimizer instead of executing
    multiple queries from the Jena level (as was done in
    Jena1)
  • Associates a table with a pattern (best case) or spans
    a pattern across tables (requires a join operation)
  • A query may span different graphs, but it
    can be optimized only if they are in the same
    database
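
Turning several triple patterns into one database query can be sketched as a self-join over a triple table. This is an illustrative compiler, not Jena's query translator: the function names, the `triples(subj, pred, obj)` table, and the convention that variables start with `?` are assumptions. It also assumes at least one variable and that variable names are valid SQL identifiers.

```python
import sqlite3

def compile_patterns(patterns):
    """Compile (s, p, o) patterns into one SELECT; returns (sql, params)."""
    columns = ("subj", "pred", "obj")
    bindings = {}              # variable -> first column that binds it
    conditions, params = [], []
    for i, pattern in enumerate(patterns):
        for column, term in zip(columns, pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in bindings:
                    # A repeated variable becomes a join condition.
                    conditions.append(f"{ref} = {bindings[term]}")
                else:
                    bindings[term] = ref
            else:
                # A constant becomes a bound parameter.
                conditions.append(f"{ref} = ?")
                params.append(term)
    select = ", ".join(f"{ref} AS {name[1:]}" for name, ref in bindings.items())
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    where = " AND ".join(conditions) if conditions else "1"
    return f"SELECT {select} FROM {tables} WHERE {where}", params

def demo_db():
    """Small in-memory triple table with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subj TEXT, pred TEXT, obj TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
        ("ex:doc1", "dc:creator", "ex:alice"),
        ("ex:alice", "foaf:name", "Alice"),
        ("ex:doc2", "dc:creator", "ex:bob"),
    ])
    return conn
```

For the two patterns `(?doc, dc:creator, ?who)` and `(?who, foaf:name, ?name)`, the compiler emits a single statement joining `t0` and `t1` on `t1.subj = t0.obj`, so the database optimizer sees the whole query at once.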

9
What to index? How to index?
10
Indexing semistructured data
  • XML cannot be indexed directly the way a relational DB is
  • Indexing may take advantage of the tree structure:
    • depth of node
    • common path from the root
    • convert each path to a string expression
    • precalculate the path tree
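
A minimal sketch of these ideas, using Python's standard `xml.etree.ElementTree`: each text-bearing element is reported with its depth and its path from the root written as a string, and the paths are precalculated into a lookup table. The function names, the `/` separator, and the sample document in the usage note are assumptions for the example.

```python
import xml.etree.ElementTree as ET

def root_paths(xml_text):
    """Yield (depth, path_string, text) for every element that carries text."""
    root = ET.fromstring(xml_text)

    def walk(node, prefix, depth):
        path = prefix + [node.tag]
        text = (node.text or "").strip()
        if text:
            yield depth, "/".join(path), text
        for child in node:
            yield from walk(child, path, depth + 1)

    yield from walk(root, [], 0)

def build_path_index(xml_text):
    """Precalculate a mapping: path string -> list of values found there."""
    index = {}
    for _depth, path, text in root_paths(xml_text):
        index.setdefault(path, []).append(text)
    return index
```

For a document such as `<book><title>Jena</title><author><name>Ann</name></author></book>`, looking up `book/author/name` in the index is a dictionary access rather than a tree walk.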

11
Indexing semistructured data
  • Idea is based on the Patricia trie
  • Index should scale with the growth of data
  • Path together with leaf is encoded into a string
    -> the Index Fabric

A Fast Index for Semistructured Data - Brian F.
Cooper et al.
12
A Layered Index
A Fast Index for Semistructured Data - Brian F.
Cooper et al.
13
Index Fabric
  • Index is used to accelerate path expressions -
    mainly for queries that ask for root-to-leaf paths
  • Idea of prefix encoding:
    • xml: <A>alpha<B>beta<C>gamma</C></B></A>
    • paths: <A>alpha, <A><B>beta, <A><B><C>gamma
    • encoded: A alpha, A B beta, A B C gamma
    • infix (not common): A alpha B beta C gamma
  • Convert path to string for fast searches
  • Replace tags with non-terminal characters (like
    in automata)
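
The prefix encoding above can be sketched as follows. Each tag is replaced by a one-character designator, each root-to-leaf path plus its value becomes one string key, and a path query becomes a prefix search. This is an illustration only: a sorted list with binary search stands in for the layered Patricia trie of the actual Index Fabric, and the function names and key layout (designators, a space, then the value) are assumptions.

```python
import bisect
import xml.etree.ElementTree as ET

def encode_paths(xml_text, designators):
    """Return sorted keys of the form '<designators> <value>'."""
    root = ET.fromstring(xml_text)
    keys = []

    def walk(node, prefix):
        prefix = prefix + designators[node.tag]   # tag -> single character
        text = (node.text or "").strip()
        if text:
            keys.append(prefix + " " + text)
        for child in node:
            walk(child, prefix)

    walk(root, "")
    return sorted(keys)

def prefix_search(keys, prefix):
    """Return all keys starting with prefix; keys must be sorted."""
    start = bisect.bisect_left(keys, prefix)
    matches = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break                                  # matches are contiguous
        matches.append(key)
    return matches
```

With the slide's document and designators `A`, `B`, `C`, the keys are `A alpha`, `AB beta`, and `ABC gamma`. Searching for the prefix `AB` returns everything under the path A/B, while `AB ` (with the trailing space) returns only values stored directly at A/B.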

14
Index Fabric - raw paths
A Fast Index for Semistructured Data - Brian F.
Cooper et al.
15
Graphs - how to index?
Backbone
  • http://www.aisee.com/

16
Graphs - how to index?
Tree-type - prefixes - tries
  • http://www.aisee.com/

17
Graphs - how to index?
T-index: path templates
Index Structures for Path Expressions - Tova
Milo, Dan Suciu
18
Graphs - how to index?
Landmarks
  • http://www.aisee.com/

19
Indexing - summary
  • Indexing semistructured data:
    • index fabric - encoding, multilayered
    • common prefixes - trie structure
    • backbone - highways between points
    • landmarks - county division
    • path templates - precalculated expressions
    • clustering - grouping by theme access
  • Indexing such data is NOT easy; the solution depends
    on how you want to search the graph

20
References
  • Efficient RDF Storage and Retrieval in Jena2 -
    Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave
    Reynolds
  • A Fast Index for Semistructured Data - Brian F.
    Cooper, Neal Sample, Michael J. Franklin, Gisli
    Hjaltason, Moshe Shadmon
  • Index Structures for Path Expressions - Tova
    Milo, Dan Suciu