Title: Semantic Optimization of OQL Queries
1 Semantic Optimization of OQL Queries
Greek State Scholarships Foundation
National Bank of Greece
- Agathoniki Trigoni
- Dr Ken Moody, Dr Gavin Bierman
- September 2002
2 Outline
- ODMG OQL (Object Query Language)
- Type Checking and Type Inference
- Calculus Representation and Normalization
- Algebraic Representation
- Association Rules - Semantic Optimization
- Complexity and Optimization
- Conclusion
3 Outline Part 1: ODMG OQL (Object Query Language)
4 ODMG's OQL (Object Query Language)
Features:
- Path expressions
- Object comparison
- Complex objects
- Multiple collection types
- Method invocation
- Polymorphic collections
- Late binding
- Casting mechanism
Examples:
- select s.age() from Students s
- select p.activities from Persons p
- select ((Student)p).grade from Persons p where "course of study" in p.activities
- first(Customer_Queue) in set(Jean, John, Jeff)
- Maria.project = John.project
- DBAdm.projects[0].department.address.street
5 Outline Part 2: Type Checking and Type Inference
6 Type Checking vs Type Inference
- Type checking: OQL query + Schema -> Type Checking Model -> Specific Query Type (or type error)
- Type inference: OQL query -> Type Inference Model -> General Schema + General Query Type (or type error)
7 Type Checking: Type System
Initial Type System
Subtyping Relation - Examples
8 Type Checking: Type Rules
Class information
Query definitions
Named objects
Type environment
example
9 Type Inference: Motivation
- Queries addressed against multiple schemata
- Schema reconciliation and interoperation
- Semi-structured data
- Generic queries and query definitions
[Figure: query applications issue queries against DB/Schema A, DB/Schema B, DB/Schema C, and semi-structured data]
10 Type Inference: Objectives
- Principal Type of an OQL Query
- Minimum Schema Type Requirements
define Dept_Managers(dept) as select e from Employees e where e.position = "manager" and e.department = dept
select d from Departments d where count(Dept_Managers(d)) > 5
Employee Department department String
position Department
Employee int department String
position Departments List (int)
Employee String department String
position Department int id
11 Outline Part 3: Calculus Representation and Normalization
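12 Monoid Comprehension Calculus
A monoid comprehension has the form accumulator{ head | qualifiers }:
- accumulator: the monoid that merges the results
- head: the expression evaluated for each binding
- qualifiers: the generators and filters that produce the bindings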
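The three parts named on this slide (accumulator, head, qualifiers) can be illustrated with a minimal Python sketch. It is not the calculus used in the talk: the `MONOIDS` table, the `comprehension` helper, and the sample data are assumptions made for illustration, and only a single generator with an optional filter is supported.

```python
from functools import reduce

# Each monoid is modelled as (zero, unit, merge). These names are illustrative.
MONOIDS = {
    "sum": (0, lambda x: x, lambda a, b: a + b),
    "list": ([], lambda x: [x], lambda a, b: a + b),
    "set": (frozenset(), lambda x: frozenset([x]), lambda a, b: a | b),
}

def comprehension(accumulator, head, generator, predicate=lambda v: True):
    """Evaluate accumulator{ head(v) | v <- generator, predicate(v) }."""
    zero, unit, merge = MONOIDS[accumulator]
    # The qualifiers are the generator plus the filter; the head is applied
    # to every surviving binding, and the accumulator merges the results.
    return reduce(merge, (unit(head(v)) for v in generator if predicate(v)), zero)

ages = [19, 23, 23, 31]
over_20_set = comprehension("set", lambda a: a, ages, lambda a: a > 20)
total = comprehension("sum", lambda a: a, ages)
```

Choosing a different accumulator over the same head and qualifiers changes only how results are merged, which is why one calculus can cover sets, bags, lists, and aggregates.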
13 Comprehensions and Well-formed Homomorphisms
(C1 union C2) x
bagof
X
X
14 Translating OQL to Monoid Calculus Expressions
- Nonhomomorphic functions
  - bagof(e)
  - listof(e)
  - arrayof(e)
  - orderof(e)
New Monoids and OQL Constructs
New translation rules from OQL to MC
15 Normalization of Calculus Expressions
A set of rewrite rules is defined to transform calculus expressions into canonical forms.
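The slide does not list the rewrite rules themselves. As an illustrative assumption, the sketch below shows one rewrite often used when normalizing comprehensions: a generator that ranges over an inner comprehension is unnested into a single comprehension. Python list comprehensions stand in for the calculus, and the `employees` data is hypothetical.

```python
# Hypothetical sample data.
employees = [
    {"name": "Ann", "salary": 40000},
    {"name": "Bob", "salary": 30000},
    {"name": "Cy", "salary": 52000},
]

def nested(emps):
    # list{ s * 2 | s <- list{ e.salary | e <- emps, e.salary > 35000 } }
    inner = [e["salary"] for e in emps if e["salary"] > 35000]
    return [s * 2 for s in inner]

def normalized(emps):
    # After unnesting: list{ e.salary * 2 | e <- emps, e.salary > 35000 }
    return [e["salary"] * 2 for e in emps if e["salary"] > 35000]
```

Both functions return the same list, but the normalized form makes a single pass and builds no intermediate collection.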
16 Outline Part 4: Algebraic Representation
17 Object Algebra
Examples of existing algebraic operators
Examples of new algebraic operators
18 Translation of Monoid Calculus to Algebraic Operators
19 Benefits of the Algebraic Representation
- Operators are annotated with monoid information
- New operators have been added to address
  - nonhomomorphic transformations
  - merge operations
- The proposed translation algorithm from calculus to algebra is defined so that
  - generators of nested comprehensions are processed locally as much as possible
  - predicates relating variables of inner and outer comprehensions are pushed down
20 Outline Part 5: Association Rules - Semantic Optimization
21 Background: Semantic Optimization
- Semantic optimization heuristics
  - H1: Removing constraints
  - H2: Adding constraints on indices
select x from Employees as x where x.salary > 35,000
select x from Employees as x where x.salary > 35,000 and x.year_of_birth < 1975 and x.year_of_employment < 1995
22 Semantic Optimization: 1st Issue
a) What if association rules have exceptions?
E = {oid2, oid5}
salary > 35,000 => year_of_birth < 1975
23 Semantic Optimization: 2nd Issue
b) What if none of the rules can be used to apply the heuristics?
select x from Employees x where x.salary > 35,000 and x.year_of_birth < 1975
- salary > 35,000 => position = "manager"
- position = "manager" => year_of_birth < 1970
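Neither rule above mentions both constraints of the query, but chaining them links the salary constraint to a birth-year constraint. The sketch below is one way to discover such an indirect link, using a plain breadth-first search over a rule graph. It is not the backtracking algorithm of the talk; the `RULES` table, the string encoding of constraints, and the explicit implication edge (born before 1970 implies born before 1975) are assumptions made for illustration.

```python
from collections import deque

# Edges: antecedent -> list of (consequent, label). Constraints are plain strings.
RULES = {
    "salary > 35000": [("position = manager", "rule 1")],
    "position = manager": [("year_of_birth < 1970", "rule 2")],
    # Not a mined rule: a logical implication between two range constraints.
    "year_of_birth < 1970": [("year_of_birth < 1975", "implication")],
}

def find_chain(source, target):
    """Return the labels of the edges linking source to target, or None."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, labels = queue.popleft()
        if node == target:
            return labels
        for nxt, label in RULES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, labels + [label]))
    return None

chain = find_chain("salary > 35000", "year_of_birth < 1975")
```

A real optimizer would also have to carry the exceptions of every rule along the chain, since each mined rule may be violated by some objects.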
24 Semantic Optimizer: Objectives
What if
- rules have exceptions?
- none of the rules is applicable?
The proposed algorithm applies the semantic optimization heuristics
- identifying direct or indirect rules useful for the optimization
- considering the exceptions of all rules involved in the optimization solution
25 Algorithm - Step 1: Backtracking
[Figure: graph of constraints with nodes C1-R11, C1-R13, C2-R21, C3-R31, C4-R41, C5-R51, C5-R52, C5-R53, C5-R54, Ccomp1, Ccomp2]
26 Step 2: Combining annotations
[Figure: graph of constraints with nodes C1-R11, C2-R21, C3-R31, C4-R41, C5-R52, C5-R54, Ccomp1, Ccomp2 and annotations E1, E2, E3, E4, E5]
Epath1 = E1
Epath2 = E2
Epath3 = E2
Epath3 = f(E2,E4)
Epath(12) = f(E1,E2)
Epath(12) = f(E1,E2,E3)
Epath(123) = f(E1,E2,E3,E4)
Epath(123) = f(E1,E2,E3,E4,E5)
27 Output of the algorithm
C1-R11
C4-R41
Epath(123) = f(E1,E2,E3,E4,E5)
[Figure: A -> B -> C, with edges annotated E(A => B) and E(B => C)]
E(A => C) = { e | e in E(A => B), not C(e) } union { e | e in E(B => C), A(e) }
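The combination of exceptions for two chained rules can be checked on a small in-memory example. The sketch below assumes the reading E(A => C) = { e in E(A => B) with not C(e) } union { e in E(B => C) with A(e) }; the employee records, object ids, and predicate names are hypothetical.

```python
# Hypothetical sample data; object ids and attribute values are made up.
EMPLOYEES = {
    "oid1": {"salary": 40000, "position": "manager", "year_of_birth": 1965},
    "oid2": {"salary": 50000, "position": "clerk", "year_of_birth": 1980},
    "oid3": {"salary": 45000, "position": "clerk", "year_of_birth": 1960},
    "oid4": {"salary": 60000, "position": "manager", "year_of_birth": 1985},
    "oid5": {"salary": 20000, "position": "manager", "year_of_birth": 1990},
}

def holds_a(oid):  # A: salary > 35000
    return EMPLOYEES[oid]["salary"] > 35000

def holds_b(oid):  # B: position = "manager"
    return EMPLOYEES[oid]["position"] == "manager"

def holds_c(oid):  # C: year_of_birth < 1970
    return EMPLOYEES[oid]["year_of_birth"] < 1970

def exceptions(antecedent, consequent):
    """Objects that satisfy the antecedent but violate the consequent."""
    return {oid for oid in EMPLOYEES if antecedent(oid) and not consequent(oid)}

def combine(exc_ab, exc_bc):
    """E(A => C) built only from the exception sets of A => B and B => C."""
    return ({e for e in exc_ab if not holds_c(e)}
            | {e for e in exc_bc if holds_a(e)})

combined = combine(exceptions(holds_a, holds_b), exceptions(holds_b, holds_c))
direct = exceptions(holds_a, holds_c)
```

The combined set is computed by testing only the stored exceptions against A and C, without scanning the whole extent, and it matches the set obtained by a direct scan.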
28 Transforming OQL Queries
H1: Constraint Elimination Heuristic
select x from Employees as x where C1-R11 and C4-R41
solution_i = <C1-R11, Epath(123)>
(select x from Employees as x where C1-R11) except Epath(123)
29 Transforming OQL Queries
H2: Constraint Introduction Heuristic
select x from Employees as x where C1-R11
solution_i = <C4-R41, Epath(123)>
(select x from Employees as x where C1-R11 and C4-R41) union Epath(123)
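Both transformations can be checked on a toy extent. In the sketch below, `c1` stands in for C1-R11, `c4` for C4-R41, and `EXCEPTIONS` for the path exceptions (objects that satisfy `c1` but violate `c4`). The data, the constraint definitions, and the `select` helper are assumptions made for illustration, not the system described in the talk.

```python
# Hypothetical extent of Employees.
EMPLOYEES = {
    "oid1": {"salary": 40000, "year_of_birth": 1965},
    "oid2": {"salary": 50000, "year_of_birth": 1980},
    "oid3": {"salary": 30000, "year_of_birth": 1960},
    "oid4": {"salary": 36000, "year_of_birth": 1970},
    "oid5": {"salary": 90000, "year_of_birth": 1990},
}

def c1(e):  # stands in for C1-R11
    return e["salary"] > 35000

def c4(e):  # stands in for C4-R41
    return e["year_of_birth"] < 1975

def select(*constraints):
    """Object ids whose records satisfy every given constraint."""
    return {oid for oid, e in EMPLOYEES.items() if all(c(e) for c in constraints)}

# Exceptions of the rule c1 => c4: objects satisfying c1 but not c4.
EXCEPTIONS = {"oid2", "oid5"}

# H1 (constraint elimination): drop c4, then remove the exceptions.
h1_original = select(c1, c4)
h1_rewritten = select(c1) - EXCEPTIONS

# H2 (constraint introduction): add c4, then add the exceptions back.
h2_original = select(c1)
h2_rewritten = select(c1, c4) | EXCEPTIONS
```

In both cases the rewritten query returns the same objects as the original, which is what makes rules with exceptions usable for optimization.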
30 Selecting the Optimal Solution
Constraint Elimination Heuristic
select ... from ... where c1 and c2 and c3
c1 and c2 => c3 (E1)
c1 and c3 => c2 (E2)
(select ... from ... where c1 and c2) except E1
(select ... from ... where c1 and c3) except E2
Solution with the Fewest Exceptions
31 Selecting the Optimal Solution
Constraint Introduction Heuristic
select ... from ... where c1 and c2
c1 => c31 (E1)
c2 => c32 (E2)
(select ... from ... where c1 and c2 and c31) union E1
(select ... from ... where c1 and c2 and c32) union E2
Solution with the Highest Index Selectivity and the Fewest Exceptions
32 Benefits of the heuristics
- Constraint Elimination Heuristic
  - CPU-related benefits
  - except operation (for exceptions)
  - queries optimized once, executed frequently
- Constraint Introduction Heuristic
  - data access time benefits
  - union operation (for exceptions)
  - ad-hoc queries
33 Semantic Optimization: Summary
- We provided an algorithm that applies the optimization heuristics
  - using association rules with exceptions instead of integrity rules,
  - taking advantage of indirect associations.
- The implementation of this framework showed that
  - benefits(H2) >> benefits(H1)
  - cost(H1 and H2) depends on the complexity of the graph of constraints and the number of exceptions per rule
34 Outline Part 6: Complexity and Optimization
35 Optimization of the semantic optimizer
- Initial version
  - Multiple traversal of edges
  - Backtracking beyond source constraints
  - Combining path annotations: eliminating duplicates
  - Redundant evaluation of exceptions for all paths
  - Redundant evaluation of exceptions against all constraints
- 2nd version
  - Expensive operations
    - Extending and propagating path annotations in step 1
    - Sorting path annotations in step 2
- 3rd version
36 Semantic Optimizer: Complexity
maximum conjuncts per composite constraint
source constraints
edges
composite constraints in the graph of association rules
37 Semantic Optimization: Discussion
- The cost of finding incomplete paths from sources to targets and combining them into complete paths is negligible.
- Dominant cost: fetching objects from the database.
  - fetching association rules (it can be ignored)
  - filtering exceptions
- Based on association rule i we add an index constraint.
  - We fetch Si instead of N objects from the extent.
  - We also retrieve Ei exceptions.
  - The optimal association minimizes Ei + Si.
  - The optimizer proceeds if N > Ei + Si.
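The selection criterion on this slide can be sketched as a small cost comparison. The sketch assumes the cost of using rule i is the number of objects fetched through the index (Si) plus the number of exceptions retrieved (Ei), and that the optimizer rewrites the query only when this sum is below the extent size N. The `choose_rule` helper and the candidate figures are hypothetical.

```python
def choose_rule(extent_size, candidates):
    """Pick the rule minimizing S_i + E_i, or None if no rule beats a full scan.

    candidates: list of (rule_name, s_i, e_i) tuples, where s_i is the number
    of objects fetched via the index and e_i the number of exceptions.
    """
    if not candidates:
        return None
    name, s_i, e_i = min(candidates, key=lambda c: c[1] + c[2])
    # Proceed only if the rewritten query touches fewer objects than the extent.
    return name if extent_size > s_i + e_i else None

best = choose_rule(10000, [("r1", 4000, 50), ("r2", 1200, 300), ("r3", 900, 2000)])
```

Here `r3` has the most selective index but loses to `r2` because of its many exceptions, which is why both terms enter the comparison.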
38 Query Engine
Physical Execution
Semantic Optimizer
Algebra
Mining Algorithm
Conventional OQL Processor
Normalization
Calculus
Typability
Birch Cluster Algorithm
Poet Java Binding / Storage Manager
Pentium III 450 MHz, 256 MB SDRAM
Census Data
39 Performance of Semantic Optimization Algorithm
40 Performance of Semantic Optimization Algorithm
41 Conclusion
Framework for the efficient execution of OQL queries
Type Inference
Calculus
Normalized Calculus
Algebra
Association Rules
Semantic Optimizer