Title: SBA (Stack-Based Approach) and SBQL (Stack-Based Query Language)
1. SBA (Stack-Based Approach) and SBQL (Stack-Based Query Language)
Presentation prepared for the OMG Object Database Technology Working Group, OMG Technical Meeting, Anaheim, CA, USA, September 25-29, 2006
by Prof. Kazimierz Subieta, Polish-Japanese Institute of Information Technology, Warsaw, Poland
subieta_at_pjwstk.edu.pl
http://www.ipipan.waw.pl/subieta
SBA/SBQL pages: http://www.sbql.pl
2. What is SBA and SBQL?
- SBA is a conceptual framework for developing O-O database query/programming languages.
- Query languages are programming languages.
- SBQL is a model query language according to SBA.
- It has the same role and meaning as object algebras, but it is formally sound and much more universal.
- SBA/SBQL deal with various data models and all imaginable and reasonable query constructs.
- Abstract implementation is the basic paradigm of formal specification of semantics.
3. General architecture of query processing
- Actually, we do not fix the architecture.
- It can be similar to SQL or ODMG architectures (server-side query processing, ODBC, ADO or JDBC style, queries embedded in popular programming languages).
- It can be similar to Oracle PL/SQL (programs integrated with queries, client-side query processing).
- Shifting query processing and optimization to the client side:
  - Lower workload for the server → better overall performance.
  - More flexible for query optimization.
4. Detailed client-server architecture
Software development environment (editor,
debugger, etc.)
Client
Parser of queries and programs
Syntactic tree of a query/program
Optimization by rewriting
Optimization by indices
Interpreter of queries and programs
Strong type checker
ENVS
Static ENVS
QRES
Static QRES
Volatile (non-shared) objects
Local metabase
Network
Register of indices
Register of views
Object manager
Server
Metabase of persistent objects
Processing persistent abstractions (views, stored
procedures, triggers)
Administration Transactions
Persistent (shared) objects
5. Object model and database schema
- They are inevitable parts of a query language.
- The application programmer must be aware of what the database contains and how it is organized.
- Usually, an object model and a database schema language are presented at the beginning of a given specification, cf. ODMG.
- The model involves such concepts as types, classes and interfaces, joined into a coherent whole as a schema language, cf. ODL.
- However, the concepts are difficult, especially types.
- Introducing them at the beginning usually results in inconsistencies.
- Hence, we must first understand the semantics of a query language on the basis of an abstract object store model.
- First establish what the semantics of a query language is, then define the corresponding type system.
6. SBA semantics of QLs: general point of view
- Query - all syntactically correct queries
- State - all states (not only database states)
- Result - all possible query results
- The semantics of any query is a function that maps State → Result.
- The closure property assumes that a state and a result are sets of objects.
- In SBA a state contains objects (but not only objects) and a result never contains objects.
- The closure property is conceptual nonsense.
7. What is State?
- State includes all data or programming features that can influence the result of some query, in particular:
  - Database state
  - Local objects used in queries on the client side
  - Computer and software environment (e.g. date, time)
  - Libraries, procedures, functions, classes, views, etc.
- State also includes structures that determine the run-time environment of computations.
- In SBA there is one such structure: the environment stack (ENVS), an extended and modified call stack.
- State = object store + ENVS
8. Is ENVS a purely implementation notion?
- No. The environment stack is a conceptual notion.
- ENVS makes it possible to specify precisely:
  - the semantics of query languages,
  - the mechanisms of classes, roles, static and dynamic inheritance, ...
  - (recursive) procedures, parameter passing, database views, ...
  - etc.
- In SBA we deal with ENVS on an abstract level. We are not interested in its physical implementation.
- Implementation can be different, introducing many optimizations.
- Usually ENVS is a client-side data structure stored in main memory.
- The main roles of ENVS: determining scopes for names and binding names occurring in queries.
9. What is Result?
- A query can return any stored or computed value.
  - For instance, the query 2+2 returns 4.
- A query can return references (OID, file name, memory address, etc.).
  - For instance, the query Person returns references to Person objects.
- Queries can return nested complex values consisting of atomic values, references, names, structure constructors and collection constructors.
- SBQL queries never return objects.
- Objects are stored within the object store only.
10. Query result stack, QRES
- Temporary and final query results are accumulated on the query result stack, QRES.
- QRES is a client-side structure stored in main memory.
- QRES must be prepared to store any complex query result in a single section.
- QRES is not a component of State, because the result of a new query does not depend on the previous QRES state.
- In SBA, precise specification of the QRES mechanism is fundamental.
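11. Example of QRES state
Figure (a QRES with four sections, listed in the order given on the slide):
  15
  i17
  struct{ x(i61), y(i93) }
  bag{ struct{ n("Doe"), s(i9) }, struct{ n("Poe"), s(i14) }, struct{ n("Lee"), s(i18) } }
Figure labels: top (the only visible stack section), invisible stack sections, bottom.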
12. Total internal identification
- Each database or program entity that could be separately retrieved, updated, inserted, deleted, authorized, indexed, protected or locked should possess a unique internal identifier.
- We are not interested in the form and meaning of internal identifiers.
- Unique internal identifiers should be assigned to all components of objects, including atomic ones.
- The principle makes it possible to make references and pointers to all possible entities, and thus to avoid conceptual problems with binding, scoping, updating, deleting, parameter passing and other functionalities that require references as query primitives.
- ODMG does not follow the idea.
  - ODMG literals (components of objects) have no identifiers.
  - I consider this a fundamental conceptual flaw.
13. Object relativism
- If some object O1 can be defined, then an object O2 having O1 as a component can also be defined.
- There are no limitations concerning the number of hierarchy levels of objects.
- Objects on any hierarchy level should be treated uniformly.
- An atomic object (having no attributes) should be allowed as a regular data structure.
- Object relativism implies the relativism of the corresponding query capabilities.
- There is no need for attributes, sub-attributes, etc.; all are objects too.
- The idea radically reduces a database model and cuts the size of the specification of query languages, the size of the implementation and the size of the documentation.
- It greatly supports query optimization and strong typing.
14. Abstract Object Store Models
- A component of State is an object store.
- To define the semantics of a query language we have to define an object store precisely, but on an abstract level.
- Because various object models introduce a lot of incompatible notions, SBA assumes a family of object store models, enumerated M0, M1, M2 and M3.
- M0 covers relational, nested-relational and XML-oriented databases. M0 assumes hierarchical objects and binary links between objects.
- Advanced store models introduce classes and static inheritance (M1), object roles and dynamic inheritance (M2), and encapsulation (M3).
- All the models are served by SBQL.
- These store models are pivots: they can be extended and modified, depending on the features that one would like to cover.
15. Notions common to store models
- Internal object identifier (OID)
  - Uniquely identifies an object in the store.
  - Assigned automatically, no external meaning.
  - Used as a reference or a pointer to an object.
- External object name
  - Usually bears some external semantics of an object, e.g. Person, Customer.
  - Explicitly assigned by a database designer, programmer, etc.
  - It is usually not unique, e.g. many objects named Person.
- Atomic object value
  - Cannot be subdivided into smaller parts.
  - E.g. 2, 3.14, "Doe", "Hello, World!".
  - The size is not constrained: from 1 bit to gigabytes.
  - So far we neglect types (we deal with types later).
16. M0: Complex Objects and Pointer Links
I - a set of internal identifiers
N - a set of external names
V - a set of atomic values
<i, n, v> - atomic object
<i1, n, i2> - pointer object
<i, n, T> - complex object, where T is a set of objects
R ⊆ I - start identifiers
General form of an object: <i, n, f>, where i is the object ID, n the object name and f the object value.
- No record, tuple, array, set, etc. constructors in the model: essentially all of them are collections of objects.
- External names are not unique: this models collections (bags).
- Uniform treatment of relational, nested relational, etc. databases.
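The three kinds of M0 objects can be illustrated with a small Python sketch. It is only one possible encoding, not the SBA specification: the `STORE` dictionary, the `Ref` wrapper for pointer values, and the helpers `kind` and `to_triple` are names assumed here for illustration. The data is an abbreviated copy of the Emp object with identifier i9 used in the following slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Value of a pointer object: the identifier of the target object."""
    oid: str


# Assumed encoding: oid -> (name, value). The value is an atomic value,
# a Ref (pointer object) or a list of sub-object oids (complex object).
STORE = {
    "i9": ("Emp", ["i10", "i11", "i12", "i16"]),
    "i10": ("name", "Lee"),
    "i11": ("sal", 900),
    "i12": ("address", ["i13", "i14", "i15"]),
    "i13": ("city", "Rome"),
    "i14": ("street", "Boogie"),
    "i15": ("house", 13),
    "i16": ("worksIn", Ref("i22")),
    "i22": ("Dept", ["i23"]),  # abbreviated: only one sub-object kept
    "i23": ("dname", "Ads"),
}
START_IDS = ["i9", "i22"]  # the set R of start identifiers


def kind(oid):
    """Classify an object as atomic, pointer or complex."""
    value = STORE[oid][1]
    if isinstance(value, Ref):
        return "pointer"
    if isinstance(value, list):
        return "complex"
    return "atomic"


def to_triple(oid):
    """Rebuild the nested <i, n, value> triple for an object."""
    name, value = STORE[oid]
    if isinstance(value, list):
        return (oid, name, [to_triple(child) for child in value])
    if isinstance(value, Ref):
        return (oid, name, value.oid)
    return (oid, name, value)


print(kind("i11"), kind("i16"), kind("i9"))
print(to_triple("i12"))
```

Keeping every sub-object under its own identifier mirrors the total internal identification principle: even the atomic `sal` value can be referenced on its own.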
17. M0 object store - example
Objects:
<i9, Emp, { <i10, name, "Lee">,
            <i11, sal, 900>,
            <i12, address, { <i13, city, "Rome">,
                             <i14, street, "Boogie">,
                             <i15, house, 13> }>,
            <i16, worksIn, i22> }>
18. M0 object store: graphical view
Figure (objects grouped by identifier; targets of the pointer objects worksIn and employs are drawn as arrows in the original figure):
i1 Emp: i2 name Doe, i3 sal 2500, i4 worksIn
i5 Emp: i6 name Poe, i7 sal 2000, i8 worksIn
i9 Emp: i10 name Lee, i11 sal 900, i12 address (i13 city Rome, i14 street Boogie, i15 house 13), i16 worksIn
i17 Dept: i18 dname Trade, i19 loc Paris, i20 loc Rome, i21 employs
i22 Dept: i23 dname Ads, i24 loc Rome, i25 employs, i26 employs
19. A relational database in M0
Relational schema: Emp( name, sal, worksIn )
Model M0, relation Emp:
Objects:
<i1, Emp, { <i2, name, "Doe">, <i3, sal, 2500>, <i4, worksIn, "Production"> }>,
<i5, Emp, { <i6, name, "Poe">, <i7, sal, 2000>, <i8, worksIn, "Sales"> }>,
<i9, Emp, { <i10, name, "Lee">, <i11, sal, 2000>, <i12, worksIn, "Sales"> }>
Start identifiers: i1, i5, i9
- A similar mapping can be applied to hierarchical DB, nested relational DB, XML, RDF, ...
20. Environment Stack, ENVS
- ENVS is also known as the call stack.
- For query processing we modified and generalized it.
- ENVS is used for binding objects that are stored at a server, hence ENVS contains references to objects rather than object values.
- The same object can be referenced from different stack sections.
- For collections the binding is macroscopic: for instance, if Emp is bound, the binding returns many references.
- In PLs the stack usually has two incarnations: static (compile time) and dynamic (run time).
- Because database objects are always dynamically bound, some properties of a static stack must be shifted to a dynamic stack.
- We deal with the static stack when we consider strong typing.
- Besides the classical roles of the stack, SBA gives it many new roles, in particular processing non-algebraic operators.
21. Naming, scoping, binding
- SBA is based on the naming, scoping and binding paradigm.
- Every name occurring in a query is bound to run-time program or database entities, according to the actual scope for the name.
- Binding is substituting a name occurring in a query by a run-time program entity (or entities).
- This concerns all names, in particular:
  - Names of persistent or volatile objects, subobjects (attributes), pointers, procedures, functions, methods, views, parameters.
  - Names of entities from the computer or software environment.
  - Any auxiliary names that are defined and used in queries.
- ENVS presents a universal scoping and binding mechanism.
- No name occurring in a query can be bound otherwise.
- ENVS stores binders, i.e. pairs n(r), where n ∈ N, r ∈ Result.
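Binding on ENVS can be sketched in a few lines of Python. The sketch assumes the usual stack discipline: sections are searched from the top, and the first section containing the name supplies all of its binders, which gives the macroscopic binding of collections. The list-of-lists encoding and the function name `bind` are assumptions made for illustration.

```python
def bind(envs, name):
    """Return the values of all binders `name(...)` found in the topmost
    ENVS section that contains that name; an empty list if none does."""
    for section in reversed(envs):          # the last list element is the top
        hits = [value for (n, value) in section if n == name]
        if hits:
            return hits                     # lower sections are not visited
    return []


# Base section with binders to database objects (identifiers from the slides).
envs = [[("Emp", "i1"), ("Emp", "i5"), ("Emp", "i9"),
         ("Dept", "i17"), ("Dept", "i22")]]
print(bind(envs, "Emp"))    # many references: macroscopic binding

# A section with the interior of the Emp object i9 is pushed on top.
envs.append([("name", "i10"), ("sal", "i11"),
             ("address", "i12"), ("worksIn", "i16")])
print(bind(envs, "name"))
print(bind(envs, "Emp"))    # still visible through the lower section
```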
22. Opening a new section of ENVS (1)
- In PLs opening a new scope on ENVS is caused by entering a new procedure (function, method) or entering a new block.
- Respectively, removing the scope is performed when control leaves the body of the procedure/block.
- To these classical situations we add a new one.
- It is the essence of SBA. The idea is that some query operators (called non-algebraic) behave on the stack similarly to program blocks.
- In the SBQL query
  Emp where ( name = "Poe" and sal > 1000 )
  the part ( name = "Poe" and sal > 1000 ) behaves as a program block executed in an environment consisting of the interior of an Emp object.
- Binding also concerns the names name and sal.
- Hence, we push onto ENVS a section with the interior of the currently processed Emp object (next slide).
23. Opening a new section of ENVS (2)
Query: Emp where (name = "Poe" and sal > 1000); the parenthesized part is the condition.
Initial ENVS state (one section):
  Emp(i1) Emp(i5) Emp(i9) Dept(i17) Dept(i22)
  bind( Emp ) = i1, i5, i9
ENVS during evaluation of the condition for the third object Emp (two sections, top first):
  name(i10) sal(i11) address(i12) worksIn(i16)        (interior of the 3rd object Emp)
  Emp(i1) Emp(i5) Emp(i9) Dept(i17) Dept(i22)
  bind( name ) = i10, bind( sal ) = i11
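The behaviour of the non-algebraic operator where can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SBQL evaluator: the store is an abbreviated copy of the three Emp objects from the M0 example, the condition is passed as a Python function instead of a parsed subquery, and the helper names (`interior`, `bind`, `atom`, `where`) are hypothetical.

```python
# Abbreviated M0 store: oid -> (name, atomic value or list of sub-object oids).
STORE = {
    "i1": ("Emp", ["i2", "i3"]), "i2": ("name", "Doe"), "i3": ("sal", 2500),
    "i5": ("Emp", ["i6", "i7"]), "i6": ("name", "Poe"), "i7": ("sal", 2000),
    "i9": ("Emp", ["i10", "i11"]), "i10": ("name", "Lee"), "i11": ("sal", 900),
}


def interior(oid):
    """Binders for the sub-objects of a complex object."""
    value = STORE[oid][1]
    return [(STORE[c][0], c) for c in value] if isinstance(value, list) else []


def bind(envs, name):
    """Search ENVS from the top; the first section with `name` wins."""
    for section in reversed(envs):
        hits = [v for (n, v) in section if n == name]
        if hits:
            return hits
    return []


def atom(envs, name):
    """Bind `name` and dereference it; assumes exactly one atomic object."""
    (oid,) = bind(envs, name)
    return STORE[oid][1]


def where(envs, refs, condition):
    """For each reference, open a section with the object's interior,
    evaluate the condition in that environment, then remove the section."""
    selected = []
    for oid in refs:
        envs.append(interior(oid))
        try:
            if condition(envs):
                selected.append(oid)
        finally:
            envs.pop()
    return selected


envs = [[("Emp", "i1"), ("Emp", "i5"), ("Emp", "i9")]]
# Emp where (name = "Poe" and sal > 1000)
result = where(envs, bind(envs, "Emp"),
               lambda e: atom(e, "name") == "Poe" and atom(e, "sal") > 1000)
print(result)
```

The names name and sal are resolved by the same `bind` routine as Emp; only the stack contents differ between the two bindings.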
24. Function nested: computing an object's interior
- Function nested acts on an object reference and returns its interior as a set of binders. For instance, for the object
  i9 Emp
    i10 name Lee
    i11 sal 900
    i12 address
      i13 city Rome
      i14 street Boogie
      i15 house 13
    i16 worksIn
  we have nested( i9 ) = { name( i10 ), sal( i11 ), address( i12 ), worksIn( i16 ) }
- The result of nested is then pushed onto ENVS.
25. Generalization of function nested
- In general, it can be applied to any element of Result.
- For a complex object <i, n, { <i1, n1, ...>, <i2, n2, ...>, ..., <ik, nk, ...> }> it holds that nested( i ) = { n1(i1), n2(i2), ..., nk(ik) }.
  - The case is illustrated on the previous slide.
- If i is an identifier of a pointer object <i, n, i1>, and the object store contains the object <i1, n1, ...>, then nested( i ) = { n1(i1) }.
  - This accomplishes navigation according to a pointer.
- For a binder n(x) it holds that nested( n(x) ) = { n(x) }.
  - This follows the understanding of auxiliary names introduced in queries.
- For a structure, nested returns the union of the results of the nested function applied to the elements of the structure:
  nested( struct{ x1, x2, ... } ) = nested(x1) ∪ nested(x2) ∪ ...
- For other arguments nested returns the empty set.
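The cases of nested listed above translate almost directly into code. The sketch below is one possible Python rendering; the classes `Ref`, `Binder` and `Struct` and the `STORE` encoding are assumptions made for illustration, and lists are used in place of sets so that the output order is predictable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    oid: str


@dataclass(frozen=True)
class Binder:
    name: str
    value: object


@dataclass(frozen=True)
class Struct:
    items: tuple


# oid -> (name, value); value is atomic, a Ref (pointer) or a list of oids.
STORE = {
    "i9": ("Emp", ["i10", "i11", "i12", "i16"]),
    "i10": ("name", "Lee"), "i11": ("sal", 900),
    "i12": ("address", ["i13", "i14", "i15"]),
    "i13": ("city", "Rome"), "i14": ("street", "Boogie"), "i15": ("house", 13),
    "i16": ("worksIn", Ref("i22")),
    "i22": ("Dept", []),  # interior of Dept omitted in this sketch
}


def nested(x):
    if isinstance(x, Ref):
        value = STORE[x.oid][1]
        if isinstance(value, list):      # complex object: binders to sub-objects
            return [Binder(STORE[c][0], Ref(c)) for c in value]
        if isinstance(value, Ref):       # pointer object: binder to the target
            return [Binder(STORE[value.oid][0], value)]
        return []                        # atomic object: empty interior
    if isinstance(x, Binder):            # a binder is its own interior
        return [x]
    if isinstance(x, Struct):            # union over the structure's elements
        out = []
        for item in x.items:
            out.extend(nested(item))
        return out
    return []                            # any other argument


print(nested(Ref("i9")))
print(nested(Ref("i16")))
print(nested(Struct((Ref("i12"), Binder("e", Ref("i9"))))))
```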
26. Definition of Result for SBQL
- Any atomic value belongs to Result.
- Any reference (OID) belongs to Result.
- If x belongs to Result, then any binder n(x) belongs to Result.
- If x1, x2, x3, ... belong to Result, then the structure struct{ x1, x2, x3, ... } belongs to Result.
  - In contrast to typical structures, we do not assume that all elements of a structure must be named.
  - Empty structures are not allowed.
- If x1, x2, x3, ... belong to Result, then the bag bag{ x1, x2, x3, ... } and the sequence sequence{ x1, x2, x3, ... } belong to Result.
  - bag and sequence are collection constructors.
  - Other collection constructors are possible.
27. Summing up: what have we defined so far?
- We know precisely what an object store, atomic object, complex object, pointer object and collection are.
- We know precisely how an environment stack ENVS is constructed, what it is for, what binding is, and how a new section on the stack is constructed (binders, function nested).
- We know precisely what a query result and a query result stack QRES are.
- Abstract implementation of a query language has the form of the recursive procedure eval (evaluation of a query).
- This is all the semantic equipment needed to define SBQL and its abstract implementation for the M0 store model.
- For details see http://www.sbql.pl
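The recursive procedure eval and its use of QRES can be sketched for a tiny query fragment. This is a hedged illustration only: the tuple-based syntax tree, the node tags "lit", "name" and "+", and the function names are assumptions, and the full SBQL eval covers far more operators.

```python
def bind(envs, name):
    """Search ENVS from the top; the first section with `name` wins."""
    for section in reversed(envs):
        hits = [v for (n, v) in section if n == name]
        if hits:
            return hits
    return []


def eval_query(query, envs, qres):
    """Evaluate a query tree; the result is left on top of QRES."""
    tag = query[0]
    if tag == "lit":                      # literal: push its value
        qres.append(query[1])
    elif tag == "name":                   # name: bind it on ENVS, push result
        qres.append(bind(envs, query[1]))
    elif tag == "+":                      # algebraic operator: ENVS untouched
        eval_query(query[1], envs, qres)
        eval_query(query[2], envs, qres)
        right = qres.pop()
        left = qres.pop()
        qres.append(left + right)
    else:
        raise ValueError(f"unsupported query node: {tag!r}")


qres = []
eval_query(("+", ("lit", 2), ("lit", 2)), [], qres)
print(qres)          # one section holding the result of 2+2

envs = [[("Emp", "i1"), ("Emp", "i5"), ("Emp", "i9")]]
qres2 = []
eval_query(("name", "Emp"), envs, qres2)
print(qres2)
```

Each call consumes the sections pushed by its subqueries and leaves exactly one new section, so intermediate results never leak into the final QRES state.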
28. Examples of SBQL queries for M0
- Get references of departments for the employee named Doe:
  (Emp where name = "Doe").worksIn.Dept
- Get names of departments together with their average salaries:
  (Dept join avg(employs.Emp.sal) as avgsal).(dname, avgsal)
- Names and cities for employees working in the department managed by Kim:
  (Dept where (boss.Emp.name) = "Kim").employs.Emp.(name, if exists(Address) then Address.city else "No address")
- Get departments employing a professional for any job in the company:
  Dept where ∀ distinct(Emp.job) as j (∃ employs.Emp (j = job))
- Names and salaries of employees earning more than their bosses:
  (Emp where sal > (worksIn.Dept.boss.Emp.sal)).(name, sal)
29. M1: Classes and static inheritance
- Classes, methods and inheritance require an extension of M0.
- Classes have two incarnations: as pieces of source code and as run-time database entities.
- Usually programming languages deal with classes as second-class citizens, i.e. in the source code only.
- In our model we are (so far) not interested in this point of view.
- We deal with it when we consider static binding and strong typing.
- In the M1 store model classes are first-class entities storing invariant properties of their objects, i.e. methods (but not only).
- Hence in our model classes are objects too, connected with their member objects by a special relationship.
- Classes are also connected with other classes by another relationship known as inheritance.
30. Classes as objects in M1
Figure (classes and their member objects):
i40 PersonClass
  i41 age (...code...)
  ...
i50 EmpClass (inherits from PersonClass)
  i51 changeSal (...code...)
  i52 netSal (...code...)
  ...
i1 Person (member of PersonClass)
  i2 name Doe
  ...
i5 Emp (member of EmpClass)
  i6 name Poe
  i7 sal 2000
  i8 worksIn
  ...
i9 Emp (member of EmpClass)
  i10 name Lee
  i11 sal 900
  i16 worksIn
  ...
Pointer targets shown in the figure: i33, i22.
31. SBQL semantics for M1
- Changes concern only ENVS and non-algebraic operators.
- When a non-algebraic operator processes an object <i, ...> that is a member of a class <iC1, ...>, which inherits from a class <iC2, ...>, etc., then ENVS is augmented (starting from the top) by nested(i), nested(iC1), nested(iC2), up to the most general class.
- When a non-algebraic operator finishes processing the object <i, ...>, all these sections are removed from ENVS.
ENVS states (top first):
  Before processing the object <i, ...>: previous ENVS state
  During processing the object <i, ...>: nested(i), nested(iC1), nested(iC2), ..., previous ENVS state
  After processing the object <i, ...>: previous ENVS state
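The M1 rule for augmenting ENVS can be sketched as follows. The sketch assumes single inheritance and uses hypothetical names (`MEMBER_OF`, `INHERITS_FROM`, `push_object`, `pop_object`); the objects and identifiers are taken from the class figure, with method bodies left as the placeholder string used there.

```python
# Abbreviated M1 store: oid -> (name, atomic value or list of sub-object oids).
STORE = {
    "i40": ("PersonClass", ["i41"]), "i41": ("age", "(...code...)"),
    "i50": ("EmpClass", ["i51", "i52"]),
    "i51": ("changeSal", "(...code...)"), "i52": ("netSal", "(...code...)"),
    "i5": ("Emp", ["i6", "i7"]), "i6": ("name", "Poe"), "i7": ("sal", 2000),
}
MEMBER_OF = {"i5": "i50"}        # object -> its class
INHERITS_FROM = {"i50": "i40"}   # class -> superclass (single inheritance only)


def interior(oid):
    value = STORE[oid][1]
    return [(STORE[c][0], c) for c in value] if isinstance(value, list) else []


def bind(envs, name):
    for section in reversed(envs):
        hits = [v for (n, v) in section if n == name]
        if hits:
            return hits
    return []


def class_chain(oid):
    """Classes of an object, from its own class up to the most general one."""
    chain = []
    cls = MEMBER_OF.get(oid)
    while cls is not None:
        chain.append(cls)
        cls = INHERITS_FROM.get(cls)
    return chain


def push_object(envs, oid):
    """Push nested(i), nested(iC1), nested(iC2), ... so that nested(i)
    ends up on top; return the number of sections pushed."""
    sections = [interior(oid)] + [interior(c) for c in class_chain(oid)]
    for section in reversed(sections):
        envs.append(section)
    return len(sections)


def pop_object(envs, count):
    """Remove the sections pushed for one object."""
    del envs[len(envs) - count:]


envs = [[("Emp", "i5")]]
pushed = push_object(envs, "i5")
print(bind(envs, "name"), bind(envs, "netSal"), bind(envs, "age"))
pop_object(envs, pushed)
print(len(envs))
```

Because the object's own interior is searched first and the most general class last, a name defined at several levels is resolved to the most specific definition.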
32. Example: processing an object in M1
- (Emp where name = "Poe") . (name, netSal, age)
- ENVS during processing of the subquery after the dot (top first):
  name(i6) sal(i7) worksIn(i8)                 nested(i5): internals of the currently processed Poe's object
  changeSal(i51) netSal(i52) ...               nested(i50): internals of EmpClass
  age(i41) ...                                 nested(i40): internals of PersonClass
  ... Person(i1) ... Emp(i5) Emp(i9) ...       binders to database objects
- The first three sections are pushed by the dot.
33. Some peculiarities of M1
- Binding and processing methods
  - Invocation of a method means that a new section (activation record) is additionally pushed on top of ENVS.
  - The section contains the parameters of the method (evaluated previously), its local environment and a return track.
  - Some peculiarities are connected with encapsulation.
- A problem: multiple inheritance
  - M1 allows for multiple inheritance, but in case of a name conflict there is no solution.
  - This is a general problem, not specific to M1.
- Another problem: collections
  - They violate object-oriented principles such as substitutability and open-close (reuse, conceptual continuation).
  - Possible solutions require specific extensions of M1.
34. Examples of SBQL queries for M1 - schema
- UML-like, but:
  - Cardinalities assigned to all database entities
  - Nested classes
  - Pointers rather than association roles
Figure (schema diagram; only fragments survive): class Dept with dname, loc, budget(); pointers employs, worksIn, manages [0..1], boss.
35. Examples of SBQL queries for M1
- Get names of departments and the average age of their employees (inheritance of the method age):
  Dept . (dname, avg(employs.Emp.age))
- Get employees that for sure live in the cities where their departments are located (inheritance of Address):
  Emp where ∀ Address as a (∃ (worksIn.Dept.loc) as l (a.city = l))
- For each employee get the name and the percent of the annual budget of his/her department that is consumed by his/her sal:
  Emp . (name, (((if exists(sal) then sal else 0) as s) . ((s * 12 * 100)/(worksIn.Dept.budget))))
- For each person having no salary give the minimal salary in his/her department:
  for each (Emp where not exists(sal)) as e do e.changeSal( min(e.worksIn.Dept.employs.Emp.sal) )
36. M2: Dynamic roles and dynamic inheritance
- The object model with dynamic object roles removes essential conceptual drawbacks of classical static inheritance.
- The idea is that an object during its life can acquire and lose roles without changing its identity.
- An object's business semantics depends on the currently considered role.
- SBQL is the first (and only) QL dealing with dynamic roles.
- Dynamic object roles and dynamic inheritance require an extension of M1 and an extension of the semantics of non-algebraic operators.
Figure (a Person object and its roles): Employee, Club-member, Patient, Student, Student, Dog-owner, Tax-payer.
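37. Example of the M2 store model
Figure (Person objects; role objects and classes are not recoverable from the transcript):
i1 Person: i2 name Doe, i3 born 1948
i4 Person: i5 name Poe, i6 born 1975
i7 Person: i8 name Lee, i9 born 1951
Legend of links: is member of; inherits from; dynamically inherits from.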
38. SBQL semantics for M2
- Changes concern only ENVS and non-algebraic operators.
- The order of sections of roles and classes on ENVS is determined by a simple rule (cf. the full description of SBA/SBQL).
- Some new operators dealing with roles (dynamic cast, has role).
- (Emp where name = "Lee") . (sal, born, age)
- ENVS during processing of the subquery after the dot (top first):
  sal(i17) worksIn(i18)                        properties of the currently processed Emp role
  changeSal(i51) netSal(i52) ...               properties of the EmpClass
  name(i8) born(i9)                            properties of the Person super-role of the Emp role
  age(i41) ...                                 properties of the PersonClass
  Person(i1) Person(i4) Person(i7) Emp(i13) Emp(i16) Student(i19) ...    database section
- The first four sections are pushed by the dot.
39. Examples of SBQL queries for M2 - schema
40. Examples of SBQL queries for M2
- Get employees older than 60 who live in Warsaw (dynamic inheritance of the attribute Address and static inheritance of the method age):
  Emp where age > 60 and ∃ Address (city = "Warsaw")
- For each person get the name and the sum of all incomes (salary and scholarships):
  (Person as p) . (p.name, sum(bag(0, ((Student)p).scholarship, ((Emp)p).sal)))
- Get students who live in the same city as the city of their school:
  Student where ∃ Address (city = (studiesAt.School.city))
- Get the name, faculty and school name for each person studying at two or more faculties:
  (((Person as p) join ((((Student)p) group as s))) where count(s) ≥ 2) . (p.name, s.(faculty, (studiesAt.School.name)))
41. Conclusions
- To make a high-quality standard for object-oriented databases, the specification of semantics is a must,
  - to avoid the fate of the SQL-99 and ODMG standards, perceived as loose recommendations rather than technical specifications.
- SBA offers a unique method of query language construction and semantic specification.
- SBA is a holistic database theory; it doesn't give up any (even the most advanced) feature of current practical O-O database QLs/PLs.
- Efficiency has been proven by several implementations.
- The new standardization activity should not trust the currently well-known concepts concerning O-O query languages.
  - IMO they are limited, imprecise, immature and inconsistent.
  - Following them → the standard's qualities will remain among nice wishes.
- So far SBA has no serious competitive approach.
42. 10 unique qualities of SBA/SBQL for a new O-O database standard
- Orthogonal syntax, full compositionality of queries.
- Universal formal semantics based on abstract implementation.
- Computational universality, advanced data structures, integration with PL constructs.
- Strong typing of advanced O-O queries and programs.
- Several advanced implementations, next are pending.
- Fully transparent O-O virtual updatable views.
- Strong potential for query optimization.
- All O-O notions treated formally and uniformly.
- Sound and manageable metamodel.
- The potential for distributed query processing.