Indexing Data Relationships - PowerPoint PPT Presentation

About This Presentation
Title:

Indexing Data Relationships

Description:

Indexing Data Relationships. Michael J. Franklin, University of California, Berkeley & RightOrder Inc. Overview: Data relationships can be complex. Hierarchical views ...

Slides: 22

1
Indexing Data Relationships
  • Michael J. Franklin
  • University of California, Berkeley
  • RightOrder Inc.

2
Overview
  • Data relationships can be complex.
  • Hierarchical views: XML, LDAP, ...
  • Semistructured data: dynamic schema
  • Approach: encode paths as tagged strings
  • raw paths encode structure
  • refined paths accelerate lookups
  • Index strings in a highly-compact structure.
  • Live on top of, next to or inside DBMS.
  • Benefits
  • Performance, scalability, adaptivity
  • Leverages mature DBMS technology
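
As a rough illustration of "encode paths as tagged strings", the sketch below walks a small nested record and emits one string per root-to-leaf path. The designator letters, the element names, and the `raw_paths` helper are all assumptions made for this example; the slides do not specify an encoding.

```python
# Hypothetical designator table: one short tag per element name.
# The letters are invented for illustration only.
DESIGNATORS = {"invoice": "I", "buyer": "B", "seller": "S", "name": "N"}

def raw_paths(node, prefix=""):
    """Yield one tagged string per root-to-leaf path in a nested dict."""
    for tag, child in node.items():
        path = prefix + DESIGNATORS[tag]
        if isinstance(child, dict):
            # Interior element: keep extending the designator sequence.
            yield from raw_paths(child, path)
        else:
            # Leaf: append the text value after the designator sequence.
            yield path + " " + str(child)

doc = {"invoice": {"buyer": {"name": "ABC Corp"},
                   "seller": {"name": "Acme Inc"}}}
keys = sorted(raw_paths(doc))
print(keys)  # ['IBN ABC Corp', 'ISN Acme Inc']
```

Because the structure is carried by the string prefix, all keys for the same path sort next to each other, which is what lets a string index answer path queries without a predefined schema.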

3
Raw paths w/Designators
4
Refined paths
  • Optimize specific access paths

Find invoices where X sold to Y
Find invoices where X bought Y and Z
Find invoices where a buyer bought X, Y and Z
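
One way to picture a refined path for the first query ("X sold to Y") is a dedicated key that puts both values side by side, so the query becomes a single key lookup. This is a sketch under assumptions: the `Z` designator, the field names, and the sample invoices are hypothetical, and a plain dict stands in for the index.

```python
# Hypothetical invoices; field names are illustrative only.
invoices = [
    {"id": 1, "seller": "Acme Inc", "buyer": "ABC Corp"},
    {"id": 2, "seller": "Acme Inc", "buyer": "XYZ Ltd"},
    {"id": 3, "seller": "Widgets Co", "buyer": "ABC Corp"},
]

REFINED = "Z"  # assumed designator reserved for the "X sold to Y" access path

def sold_to_key(seller, buyer):
    # "\x00" separates the fields so ("ab", "c") and ("a", "bc") cannot collide.
    return REFINED + seller + "\x00" + buyer

# Build the refined-path entries once, at indexing time.
index = {}
for inv in invoices:
    index.setdefault(sold_to_key(inv["seller"], inv["buyer"]), []).append(inv["id"])

def invoices_sold(seller, buyer):
    """Answer 'find invoices where seller sold to buyer' with one lookup."""
    return index.get(sold_to_key(seller, buyer), [])

print(invoices_sold("Acme Inc", "ABC Corp"))  # [1]
```

The trade-off is extra index entries per invoice in exchange for avoiding a join between the seller and buyer paths at query time.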
5
Index Fabric
  • An index structure for long strings.
  • Provides fast lookups
  • Handles long strings
  • Ideal substrate for designated keys
  • Based on Patricia tries
  • Highly compressed string representation
  • Index cost is independent of string length
  • But the trie needs to be balanced.

6
Patricia tries
Indexes first point of difference between keys
greenbeans
greentea
D. R. Morrison. PATRICIA: Practical Algorithm To Retrieve Information Coded in Alphanumeric. J. ACM 15 (1968), pp. 514-534.
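
The idea of indexing only the first point of difference can be sketched as follows. This is a simplified, character-level variant (Morrison's original works on bits), and the names `Node`, `insert`, and `contains` are made up for this example. Each internal node records which character position it tests; skipped characters are verified only once, at the leaf.

```python
class Node:
    def __init__(self, pos):
        self.pos = pos        # character position this node tests
        self.children = {}    # character at `pos` -> Node or leaf key (str)

def _char(key, pos):
    # A key that ends before `pos` branches on the empty string.
    return key[pos] if pos < len(key) else ""

def _any_leaf(node):
    while isinstance(node, Node):
        node = next(iter(node.children.values()))
    return node

def _first_diff(a, b):
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    return i

def insert(root, key):
    """Insert `key` and return the (possibly new) root."""
    if root is None:
        return key
    # 1. Descend on tested positions only, to find a leaf to compare with.
    leaf = root
    while isinstance(leaf, Node):
        nxt = leaf.children.get(_char(key, leaf.pos))
        leaf = nxt if nxt is not None else _any_leaf(leaf)
    if leaf == key:
        return root
    d = _first_diff(key, leaf)   # first point of difference
    # 2. Walk down again to the place where a test on position d belongs.
    parent, cur = None, root
    while isinstance(cur, Node) and cur.pos < d:
        parent, cur = cur, cur.children[_char(key, cur.pos)]
    if isinstance(cur, Node) and cur.pos == d:
        cur.children[_char(key, d)] = key      # new branch on an existing test
        return root
    new = Node(d)                              # splice in a new test at d
    new.children[_char(key, d)] = key
    new.children[_char(leaf, d)] = cur
    if parent is None:
        return new
    parent.children[_char(key, parent.pos)] = new
    return root

def contains(root, key):
    node = root
    while isinstance(node, Node):
        node = node.children.get(_char(key, node.pos))
    return node == key    # final full comparison checks the skipped characters

root = None
for k in ["greenbeans", "greentea", "grass", "corn", "cow"]:
    root = insert(root, k)
```

With these five keys, "greenbeans" and "greentea" are separated by a single test at position 5, so the shared prefix "green" is never stored in the trie itself. A search for "cash" follows `c` at position 0, finds no `s` branch at position 2, and stops.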
7
Multiple Hierarchical Views
  • Can store multiple permutations of relationships
  • Find animals and the plants they eat
  • Find plants and the animals that eat them
  • Represent as a new set of keys
  • Store data once using permutation records
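
A minimal sketch of the permutation idea, assuming a made-up key layout: each (animal, plant) relationship is stored once, and the index holds two keys per record, one leading with the animal and one leading with the plant. The `A|` and `P|` prefixes, the sample data, and `prefix_scan` are hypothetical, and a sorted list stands in for the real index.

```python
import bisect

# Each relationship is stored exactly once; the index holds only record ids.
records = [("cow", "grass"), ("cow", "corn"), ("panda", "bamboo")]  # (animal, plant)

# Hypothetical designators: "A" keys lead with the animal, "P" keys with the plant.
index = []
for rid, (animal, plant) in enumerate(records):
    index.append((f"A|{animal}|{plant}", rid))
    index.append((f"P|{plant}|{animal}", rid))
index.sort()

def prefix_scan(prefix):
    """Return the records whose index key starts with `prefix`."""
    start = bisect.bisect_left(index, (prefix,))
    out = []
    for key, rid in index[start:]:
        if not key.startswith(prefix):
            break
        out.append(records[rid])
    return out

print(prefix_scan("A|cow|"))    # plants that cows eat
print(prefix_scan("P|grass|"))  # animals that eat grass
```

Both queries become prefix scans over the same index, and neither duplicates the underlying record.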

8
Example
[Figure: diagram lost in extraction; only scattered single-letter labels survive]
9
Example
[Figure: diagram lost in extraction; only scattered single-letter labels survive]
10
Balancing Patricia tries
11
Balancing Patricia tries
Step 1: divide the trie into blocks
12
Balancing Patricia tries
Step 2: build another layer
[Figure: trie blocks arranged in two layers, Layer 1 and Layer 0]
13
Balancing Patricia tries
Search for "cash"
[Figure: Layer 1 and Layer 0 of the balanced trie]
14
Balancing Patricia tries
Search for "cash"
[Figure: Layer 1 and Layer 0 of the balanced trie; keys shown: grass, corn, cow, greenbeans, greentea]
15
Balancing Patricia tries
Search for "cash"
[Figure: Layer 1 and Layer 0 of the balanced trie; keys shown: grass, corn, cow, greenbeans, greentea]
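
The two balancing steps (divide into blocks, then build another layer on top) can be approximated with a much simpler structure. The sketch below is a deliberate simplification and not the Index Fabric itself: it blocks a sorted key list rather than a Patricia trie, and `build_layers`, `search`, and the block size of 2 are assumptions chosen to keep the example small. It does show the property the slides rely on: a lookup reads one block per layer.

```python
import bisect

def build_layers(keys, block_size):
    """Layer 0 holds the sorted keys in blocks; each higher layer holds the
    first key of every block in the layer below, until one block remains."""
    layer = sorted(keys)
    layers = []
    while True:
        blocks = [layer[i:i + block_size] for i in range(0, len(layer), block_size)]
        layers.append(blocks)
        if len(blocks) <= 1:
            return layers
        layer = [b[0] for b in blocks]

def search(layers, block_size, key):
    """Descend from the top layer; return (found, blocks_read)."""
    if not layers[0]:
        return False, 0
    block_no, reads = 0, 0
    for depth in range(len(layers) - 1, 0, -1):
        block = layers[depth][block_no]
        reads += 1
        # Entry i of this block routes to block (block_no * block_size + i) below.
        i = max(bisect.bisect_right(block, key) - 1, 0)
        block_no = block_no * block_size + i
    reads += 1  # the single layer-0 block that could hold the key
    return key in layers[0][block_no], reads

layers = build_layers(["grass", "corn", "cow", "greenbeans", "greentea"], 2)
print(search(layers, 2, "cash"))      # (False, 3)
print(search(layers, 2, "greentea"))  # (True, 3)
```

With realistic block sizes the number of layers stays small, and only the final layer-0 read would normally go to disk.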
16
Balancing Patricia tries
17
Performance
  • Number of layers is small
  • Fixed (small) space per key
  • High branching factor per block
  • Bushy, shallow tree
  • Example
  • 8 KB blocks
  • 32-bit pointers, 2 bytes for keys/structure
  • 1000 pointers per block
  • 3 layers for 1 billion pointers to data (1000^3)
  • Upper layers are tiny (10 megabytes), in RAM
  • Only layer 0 on disk
  • Usually one index I/O per key lookup
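
The arithmetic behind this example can be checked with a few lines. The sketch assumes each entry costs exactly 6 bytes (a 4-byte pointer plus 2 bytes of key/structure) and uses the slide's round figure of 1000 pointers per block; both are readings of the bullets above, not figures confirmed elsewhere.

```python
BLOCK_BYTES = 8 * 1024   # 8 KB blocks (from the slide)
ENTRY_BYTES = 4 + 2      # assumed: 32-bit pointer plus 2 bytes of key/structure
FANOUT = 1000            # the slide's round figure for pointers per block
N_POINTERS = 10**9       # 1 billion pointers to data

# Upper bound on entries per block under the 6-byte assumption.
max_fanout = BLOCK_BYTES // ENTRY_BYTES
print(max_fanout)        # 1365, so 1000 per block is a conservative figure

# Smallest number of layers whose combined fan-out covers all pointers.
layers = 1
while FANOUT ** layers < N_POINTERS:
    layers += 1
print(layers)            # 3

# Blocks needed in each layer, from layer 0 upward.
blocks = []
n = N_POINTERS
for _ in range(layers):
    n = -(-n // FANOUT)  # ceiling division
    blocks.append(n)
print(blocks)            # [1000000, 1000, 1]

# Everything above layer 0 is small enough to keep in RAM.
upper_bytes = sum(blocks[1:]) * BLOCK_BYTES
print(upper_bytes)       # 8200192 bytes, about 8 MB
```

Under these assumptions the upper layers come to roughly 8 MB, the same order as the slide's 10 megabytes, while layer 0 occupies a million 8 KB blocks on disk. That is why a lookup usually costs one index I/O.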

18
Find publications by co-authors
10,000 queries
RDBMS Edge mapping
19
Find publications by co-authors
10,000 queries
20
Conclusion
  • Index arbitrary relationships
  • Encode as designated strings
  • Relationships and structures can be complex
  • Index many data access paths
  • No need for DTD or pre-defined schema
  • Index Fabric
  • Special data structure for long keys
  • High performance key lookups
  • Supports designator encoding

21
For more information
  • technology@rightorder.com
  • www.rightorder.com