Title: Jena Persistent Storage Property Table Design
1 Jena Persistent Storage: Property Table Design
- Kevin Wilkinson
- HP Labs Palo Alto
2 Islands of Information
[Figure labels: Legacy Relational Continent; D2RQ Shipping Co.; Jena]
- Can ship from RDBMS to Jena (takes effort). Can't get back.
- Goal: a nearly seamless bridge.
3 Topics
- Motivation: Why Property Tables?
- Creating Property Tables
- Accessing Property Tables
- Accessing Legacy Database Tables
- Implementation
4 Background: RDF Persistence
- Task: persistent storage for an RDF graph
- Conventional solution: triple store
- Problems with the triple store approach
- Property tables: how they help
- Property tables in Jena today
5 RDF Triple Store Approach
- Statement table and Symbols table (RDBMS)
  - Pro: efficient in space
  - Con: retrieval requires a 3-way join
  - Con: data patterns ignored (everyone has a name, addr)
[Figure: Statement table and Symbols table]
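
The layout above can be sketched with SQLite. This is a minimal illustration, not Jena's actual schema: the table and column names, the `symbol_id` helper, and the sample data are all assumptions. It reads the slide's "3-way join" as one join to the Symbols table per statement column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Symbols table: each URI or literal is stored once and referenced by id.
    CREATE TABLE symbols (id INTEGER PRIMARY KEY, value TEXT UNIQUE NOT NULL);
    -- Statement table: three ids per RDF statement (hypothetical column names).
    CREATE TABLE statements (subj INTEGER NOT NULL,
                             prop INTEGER NOT NULL,
                             obj  INTEGER NOT NULL);
""")

def symbol_id(value):
    """Return the id for a URI or literal, interning it on first use."""
    conn.execute("INSERT OR IGNORE INTO symbols (value) VALUES (?)", (value,))
    row = conn.execute("SELECT id FROM symbols WHERE value = ?", (value,))
    return row.fetchone()[0]

def add(s, p, o):
    """Store one statement as three symbol ids."""
    conn.execute("INSERT INTO statements VALUES (?, ?, ?)",
                 (symbol_id(s), symbol_id(p), symbol_id(o)))

add("ex:p1", "ex:name", "Alice")
add("ex:p1", "ex:addr", "Palo Alto")
add("ex:p2", "ex:name", "Bob")

# Decoding statements back to text joins the Statement table to the
# Symbols table once per column: three joins for every retrieval.
rows = conn.execute("""
    SELECT s.value, p.value, o.value
    FROM statements st
    JOIN symbols s ON s.id = st.subj
    JOIN symbols p ON p.id = st.prop
    JOIN symbols o ON o.id = st.obj
    ORDER BY s.value, p.value
""").fetchall()
```

Space is saved because repeated values such as `ex:name` are stored once, but nothing in the schema records that every person has a name and an address.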
6 Problems with Triple Store
- Doesn't leverage patterns in data
- Can't leverage locality (spatial/temporal)
- Excessive load time (can't use db loader)
- Database optimizer useless: no statistics
  - e.g., (?var, ex:empId, 123) vs. (?var, ex:gender, M)
- Alternatives: native RDF store, object-relational store, property tables
7 What's a Property Table?
- A table that stores patterns of RDF statements
- n-column prop tbl stores n-1 statements per row (1 col per prop)
- Augments, doesn't replace, triple store
- Partitioned statements: a statement is stored in TS or a prop tbl, never both
- Partitioned properties: all values for a given property are in TS or a prop tbl
8 Triple Store plus Property Tables
[Figure: "Triple Store Only" compared with "Person Property Table" plus "Triple Store"]
9 Property Table Pro/Con
- Advantages
  - efficiencies in storage and access
  - transparent to application
  - enables access to legacy relational tables (which can be modeled as property tables)
  - bridges the RDF-relational divide
- Disadvantages
  - exhaustive search if property unknown (Tony Blair, -, -)
  - queries don't compile to a single SQL statement
  - loss of flexibility: fixed schema, typed property values
10 Property Tables in Jena Today
- Two tables created for each Jena2 graph
  - Stmt table: a triple store
  - Reif table: property table for reified stmts (the only property table currently supported)
- Our goal: generalize the existing framework
11 Creating Property Tables
- Types of property tables
- Column encoding
- Table specification
12 Types of Property Tables
- Single-valued property table: stores several single-valued properties for a subject
- Multi-valued property table: stores one multi-valued property for a subject
- Property-class table: stores class membership; rdf:type is the only property allowed in multiple tables
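
The three table types can be sketched as SQLite schemas. This is an illustration under assumptions, not the Jena layout: the table names, the `ex:` properties, and the reduction of the property-class table to a single subject column are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Single-valued: one row per subject, one column per property.
    CREATE TABLE person (subj TEXT PRIMARY KEY, name TEXT, addr TEXT);
    -- Multi-valued: one row per (subject, value) pair of a single property.
    CREATE TABLE person_email (subj TEXT NOT NULL, email TEXT NOT NULL);
    -- Property-class: each row stands for (subj, rdf:type, ex:Person).
    CREATE TABLE class_person (subj TEXT PRIMARY KEY);
""")
conn.execute("INSERT INTO person VALUES ('ex:p1', 'Alice', 'Palo Alto')")
conn.executemany("INSERT INTO person_email VALUES (?, ?)",
                 [("ex:p1", "a@example.org"), ("ex:p1", "alice@example.org")])
conn.execute("INSERT INTO class_person VALUES ('ex:p1')")

def statements():
    """Expand the rows of all three tables back into RDF statements."""
    out = []
    for subj, name, addr in conn.execute("SELECT subj, name, addr FROM person"):
        for prop, val in (("ex:name", name), ("ex:addr", addr)):
            if val is not None:          # NULL means "no statement"
                out.append((subj, prop, val))
    for subj, email in conn.execute(
            "SELECT subj, email FROM person_email ORDER BY email"):
        out.append((subj, "ex:email", email))
    for (subj,) in conn.execute("SELECT subj FROM class_person"):
        out.append((subj, "rdf:type", "ex:Person"))
    return out

all_statements = statements()
```

One row of `person` carries two statements, while the multi-valued table needs one row per value because a subject may have any number of them.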
13 Property Table Column Encoding
- Issue: how to encode values in columns?
- Option 1: Jena encoding or symbol ids
  - enhttp//www.hp.com/exfoo or 1234
- Option 2: native db encoding
  - foo
- Choice: support both
  - Option 2 needed to access legacy database tables
14 Property Table Creation
- Property tables are
  - user-defined
  - sharable across graphs
  - created when graph is created
  - specified in a meta-graph (RDF stmts)
    - table name, type, column descriptors, etc.
15 Accessing Property Tables
- Graph operations: add, delete, find, query
- Recall, properties are partitioned over tables
  - add, delete are applied to the table for the stmt's property
  - find is applied to each table, results merged
  - query requires special processing
16 Add Stmt on Property Tables
- Add 1 statement: create new row in table; use null for unknown property values
- Add n statements (bulk add): order stmts by subject and add one row for each subject
- Delete is similar
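
A minimal sketch of both add paths on a single-valued property table, using SQLite. The `person` table, the `COLUMNS` map, and the helper names are hypothetical. One assumption goes beyond the slide: if the subject already has a row, the supplied columns are updated instead of creating a second row.

```python
import sqlite3
from itertools import groupby

# Hypothetical map from property URI to column of the person table.
COLUMNS = {"ex:name": "name", "ex:addr": "addr"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (subj TEXT PRIMARY KEY, name TEXT, addr TEXT)")

def add_row(subj, values):
    """Store one subject's property values; columns not supplied stay NULL."""
    exists = conn.execute("SELECT 1 FROM person WHERE subj = ?", (subj,)).fetchone()
    if exists is None:
        conn.execute("INSERT INTO person (subj, name, addr) VALUES (?, ?, ?)",
                     (subj, values.get("name"), values.get("addr")))
    else:
        # Assumption: fill in the existing row rather than add a duplicate.
        for col, val in values.items():      # col comes from COLUMNS only
            conn.execute(f"UPDATE person SET {col} = ? WHERE subj = ?", (val, subj))

def add(s, p, o):
    """Add one statement: a row with one known column."""
    add_row(s, {COLUMNS[p]: o})

def bulk_add(stmts):
    """Order statements by subject, then write one row per subject."""
    for subj, group in groupby(sorted(stmts), key=lambda t: t[0]):
        add_row(subj, {COLUMNS[p]: o for _, p, o in group})

add("ex:p1", "ex:name", "Alice")
bulk_add([("ex:p2", "ex:addr", "Bristol"),
          ("ex:p3", "ex:name", "Carol"),
          ("ex:p2", "ex:name", "Bob")])

rows = conn.execute("SELECT subj, name, addr FROM person ORDER BY subj").fetchall()
```

Sorting first lets the bulk path touch each subject once, which is where the saving over statement-at-a-time adds comes from.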
17 Find Operation on Prop Tbls
- Find (s,p,o): each of s, p, o is a value or don't-care; returns all matching RDF statements
- Goal: process find with one SQL statement
- Triple store: 8 possible find patterns
  - (-,-,-) (s,-,-) (-,p,-) (s,p,-) (-,-,o) (s,-,o) (-,p,o) (s,p,o)
  - Predefine 8 SQL queries, one for each find pattern
- Property table of p props
  - 4(p+1) possible queries
  - Don't predefine queries; generate and cache them
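
The "generate and cache" idea can be sketched as follows. The `person` table, the `COLUMNS` map, and the choice of `functools.lru_cache` as the cache are assumptions for illustration. The cache key is (subject bound?, property or don't-care, object bound?), which gives 2 x (p+1) x 2 = 4(p+1) possible queries, matching the count above.

```python
import sqlite3
from functools import lru_cache

TABLE = "person"
# Hypothetical map from property URI to column; here p = 2 properties.
COLUMNS = {"ex:name": "name", "ex:addr": "addr"}

@lru_cache(maxsize=None)
def build_sql(s_bound, prop, o_bound):
    """Generate one SQL statement for a find pattern (prop=None: don't-care)."""
    props = [prop] if prop is not None else sorted(COLUMNS)
    selects = []
    for p in props:
        col = COLUMNS[p]
        where = [f"{col} IS NOT NULL"]       # NULL column means no statement
        if s_bound:
            where.append("subj = :s")
        if o_bound:
            where.append(f"{col} = :o")
        selects.append(f"SELECT subj, '{p}' AS prop, {col} AS obj "
                       f"FROM {TABLE} WHERE " + " AND ".join(where))
    # A don't-care property becomes one SELECT per column, merged in SQL.
    return " UNION ALL ".join(selects)

def find(conn, s=None, p=None, o=None):
    """Return matching (s, p, o) statements; None is the don't-care value."""
    if p is not None and p not in COLUMNS:
        return []                            # property lives in another table
    params = {}
    if s is not None:
        params["s"] = s
    if o is not None:
        params["o"] = o
    sql = build_sql(s is not None, p, o is not None)
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (subj TEXT PRIMARY KEY, name TEXT, addr TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [("ex:p1", "Alice", "Palo Alto"), ("ex:p2", "Bob", None)])

by_subject = find(conn, s="ex:p1")
by_prop_obj = find(conn, p="ex:name", o="Bob")
```

Only the pattern shape is cached; the bound values are passed as SQL parameters, so repeated finds with different subjects reuse the same generated statement.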
18 Query Processing on Prop Tbls
- Goal of 1 SQL stmt for query not achievable
- e.g., the query
  (Tony Blair, -, ?var)
  must search all tables and merge results. This can't be done with a union query
19 Query Proc on Prop Tbls (cont'd)
- But, some joins can be eliminated
- e.g., given a person name-address property table, the query
  (?var,name,-) (?var,addr,-)
  can be processed as an SQL select (no join)
- Over a triple store, this query requires a join.
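
The two plans can be compared side by side in SQLite. This is a sketch with hypothetical table names and sample data; the triple store here keeps text values directly instead of symbol ids to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statements (subj TEXT, prop TEXT, obj TEXT);
    CREATE TABLE person (subj TEXT PRIMARY KEY, name TEXT, addr TEXT);
    INSERT INTO statements VALUES
        ('ex:p1', 'ex:name', 'Alice'),
        ('ex:p1', 'ex:addr', 'Palo Alto'),
        ('ex:p2', 'ex:name', 'Bob');
    INSERT INTO person VALUES
        ('ex:p1', 'Alice', 'Palo Alto'),
        ('ex:p2', 'Bob', NULL);
""")

# Triple store: (?var,name,-) (?var,addr,-) needs a self-join on the subject.
via_triples = conn.execute("""
    SELECT DISTINCT n.subj
    FROM statements n
    JOIN statements a ON a.subj = n.subj
    WHERE n.prop = 'ex:name' AND a.prop = 'ex:addr'
""").fetchall()

# Property table: both properties sit in the same row, so a plain select works.
via_property_table = conn.execute("""
    SELECT subj FROM person
    WHERE name IS NOT NULL AND addr IS NOT NULL
""").fetchall()
```

Both queries return the subjects that have a name and an address; only the triple store version pays for a join.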
20 Accessing Legacy Database Tables
- Goal: access legacy relational db tables
- Note: D2RQ provides read-only access
- We want
  - support for updates
  - seamless querying across Jena and legacy db
- Challenge: extend Jena property tables to support legacy tables
21 Legacy Table Columns
- Legacy table has a key and n value columns
- The key identifies some object (i.e., resource)
- If key is only 1 column, looks like a property table
- If key is > 1 column (i.e., a compound key), need a work-around (called virtual bnodes)
[Figure: table columns key, val1, val2, ..., valn]
22 Support for Compound Keys
- Assume key components (key_i) identify objects
- So, a compound key represents a relationship among the key components
- RDF models compound relationships with bnodes
- Legacy table has no bnode, so we have to fake it
23 Virtual Bnodes for Compound Keys
- Virtual bnode: surrogate for a compound key
  - identifies a table row, i.e., a relationship instance
  - can be used in querying
  - generated dynamically upon retrieval, e.g., concatenate key components
- Compound key properties map from virtual bnode to key components
24 Example: inventory table
- Inventory relationship table
- Compound key (storeId, partId)
- RDF graphs for each row
- Compound key properties: ex:storeId, ex:partId
- Virtual bnode ids (e.g., _:i1) generated dynamically
[Figure: Inventory table; RDF graph for each row, with virtual bnodes _:i1 and _:i2]
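
The inventory example can be sketched as below. Only the compound key (storeId, partId) and the two key properties come from the slide; the `quantity` column, the sample rows, and the `_:inventory_...` label format are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Legacy table with a compound key; 'quantity' is a hypothetical value column.
    CREATE TABLE inventory (storeId TEXT, partId TEXT, quantity INTEGER,
                            PRIMARY KEY (storeId, partId));
    INSERT INTO inventory VALUES ('s1', 'p7', 40), ('s2', 'p7', 3);
""")

def virtual_bnode(store_id, part_id):
    """Build a bnode label on the fly by concatenating the key components."""
    return f"_:inventory_{store_id}_{part_id}"

def row_statements():
    """Yield the RDF statements for each row, rooted at its virtual bnode."""
    cursor = conn.execute(
        "SELECT storeId, partId, quantity FROM inventory ORDER BY storeId, partId")
    for store_id, part_id, quantity in cursor:
        node = virtual_bnode(store_id, part_id)
        # Compound key properties map the bnode back to the key components.
        yield (node, "ex:storeId", store_id)
        yield (node, "ex:partId", part_id)
        yield (node, "ex:quantity", quantity)

triples = list(row_statements())
```

No bnode is ever stored: the label is recomputed from the key on every retrieval, so the legacy table needs no extra column. Plain concatenation with `_` would be ambiguous if key values themselves contained underscores; a real encoding would need to escape or length-prefix the components.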
25 Implementation
- In progress, first testing in a month or so
- Legacy database table support to follow (another month)
- Initially, RDQL support only
- Performance evaluation: synthetic, scalable dataset and benchmark queries
26 Summary
- Property tables
  - leverage patterns in RDF datasets
  - performance benefit for some applications
  - better enable use of relational tools (loaders, optimizers)
- Legacy relational tables
  - can be updated
  - look like property tables
  - virtual bnodes to represent compound key