Title: An Abstract Framework for Generating Maximal Answers to Queries
1An Abstract Framework for Generating Maximal
Answers to Queries
- Sara Cohen, Yehoshua Sagiv
2Motivation
Queries and Databases
Answers and Semantics
Graph Properties
3The Problem
- In many different domains, we are given the option to query some source of information
- Usually, the user only gets results if the query can be completely answered (satisfied)
- In many domains, this is not appropriate, e.g.,
  - The user is not familiar with the database
  - The database does not contain complete information
  - There is a mismatch between the ontology of the user and that of the database
  - The query is a search that is not expected to be correct
4Search for papers by Smith that appeared in
ICDT 2004
5Sorry, no matching record found
6Search for buses from Haifa-Technion to Ben
Gurion Airport
7There is no direct bus line between the required
destinations
8Search for buses to Ben Gurion Airport
9Must choose From and To
10What Do Users Need?
- Users need a way to get interesting partial answers to their queries, especially if a complete answer does not exist
- These partial answers should contain maximal information
- Main Problems
  - What should be the semantics of partial answers?
  - How can all partial answers be efficiently computed?
11Previous Work
- Many solutions have been given for the main problems
- Solutions differ according to the problem domain
- Examples
  - Full disjunctions: Galindo-Legaria (94); Rajaraman, Ullman (96); Kanza, Sagiv (03)
  - Queries with incomplete answers over semistructured data: Kanza, Nutt, Sagiv (99)
  - FleXPath: Amer-Yahia, Lakshmanan, Pandit (04)
  - Interconnections: Cohen, Kanza, Sagiv (03)
12Our Contribution
- In the past, for each semantics considered, the query evaluation problem had to be studied anew. In this paper, we
  - Present a general framework for defining semantics for partial answers
  - Show that the framework is general enough to cover most previously studied semantics
  - Solve the query evaluation problem once within this framework, so that the solution can be reused for new semantics
  - Obtain results that improve upon previous evaluation algorithms
  - Present the relationship between this problem and the maximal P-subgraph problem
13Motivation
Queries and Databases
Answers and Semantics
Graph Properties
14Databases
- Databases are modeled as data graphs (V, E, r, l_V, l_E)
  - r: can have a designated root
  - l_V: labels on the vertices
  - l_E: labels on the edges
- Note
  - Nodes correspond to data items
  - Even databases that do not have an inherent graph structure can be modeled as graphs, e.g., relational databases
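A data graph as described above (vertices, edges, an optional root, and labels on vertices and edges) can be sketched as a small Python class. This is only an illustration: the class and field names, and the choice to turn each relational tuple into one labelled node with an id such as c1 or a2, are assumptions made for the example and are not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


@dataclass
class DataGraph:
    """Illustrative container for a data graph; all names are hypothetical."""
    vertices: Set[Node] = field(default_factory=set)
    edges: Set[Edge] = field(default_factory=set)
    root: Optional[Node] = None  # the root is optional
    vertex_labels: Dict[Node, object] = field(default_factory=dict)
    edge_labels: Dict[Edge, str] = field(default_factory=dict)

    def add_vertex(self, v: Node, label: object) -> None:
        self.vertices.add(v)
        self.vertex_labels[v] = label

    def add_edge(self, u: Node, v: Node, label: str) -> None:
        if u not in self.vertices or v not in self.vertices:
            raise ValueError("both endpoints must already be vertices")
        self.edges.add((u, v))
        self.edge_labels[(u, v)] = label


def relations_to_data_graph(tables: Dict[str, list]) -> DataGraph:
    """Assumed encoding: one node per tuple, labelled (relation, row); no edges."""
    g = DataGraph()
    for relation, rows in tables.items():
        for i, row in enumerate(rows, start=1):
            node_id = f"{relation[0].lower()}{i}"  # e.g. "c1", "a2", "s1"
            g.add_vertex(node_id, (relation, row))
    return g


tables = {
    "Climates": [
        {"Country": "Canada", "Climate": "diverse"},
        {"Country": "UK", "Climate": "temperate"},
        {"Country": "USA", "Climate": "temperate"},
    ],
    "Sites": [
        {"Country": "UK", "City": "London", "Site": "Buckingham"},
        {"Country": "USA", "City": "NY", "Site": "Metropolitan"},
    ],
    "Accommodations": [
        {"Country": "UK", "City": "London", "Hotel": "Plaza"},
        {"Country": "Canada", "City": "Montreal", "Hotel": "Hilton"},
        {"Country": "Canada", "City": "Toronto", "Hotel": "Ramada"},
    ],
}
db = relations_to_data_graph(tables)
```

No edges are created here because, in this encoding, the relationships between tuples are left to the edge constraints of the query.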
15XML as a Data Graph
[Figure: a university XML document drawn as a data graph. Node labels: University, Name, Dept, Faculty, Professor, Lecturer, Teaches. Values: Technion, Computer Science, Biology, Avi Levy, Chana Israeli, Bioinformatics, Databases, Molecular Biology. Edges not transcribed.]
16Relational Database as a Data Graph
Climates
Country  Climate
Canada   diverse
UK       temperate
USA      temperate
Sites
Country  City    Site
UK       London  Buckingham
USA      NY      Metropolitan
Accommodations
Country  City      Hotel
UK       London    Plaza
Canada   Montreal  Hilton
Canada   Toronto   Ramada
21Queries
- Queries are modeled as query graphs (V, E, r, C_V, C_E, s)
  - r: can have a designated root
  - C_V: vertex constraints (basically, a boolean function on vertices)
  - C_E: edge constraints (basically, a boolean function on pairs of vertices)
  - s: a structural constraint, one of the letters C, R, N (defines the required structure of answers, i.e., connected, rooted or none)
- Note: nodes correspond to query variables
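Since vertex and edge constraints are described as boolean functions, a query graph can be sketched with plain Python callables. The class name, field names, the helper functions `belongs_to` and `same`, and the representation of a database node as a dict with a "rel" key are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple

Row = dict  # assumed shape of a database node: a dict with a "rel" key


@dataclass
class QueryGraph:
    """Illustrative query graph; names are hypothetical."""
    vertices: Set[str]
    edges: Set[Tuple[str, str]]
    vertex_constraints: Dict[str, Callable[[Row], bool]]
    edge_constraints: Dict[Tuple[str, str], Callable[[Row, Row], bool]]
    structural: str = "N"        # "C" (connected), "R" (rooted) or "N" (none)
    root: Optional[str] = None   # the root is optional

    def __post_init__(self) -> None:
        if self.structural not in {"C", "R", "N"}:
            raise ValueError("structural constraint must be 'C', 'R' or 'N'")


def belongs_to(rel: str) -> Callable[[Row], bool]:
    """Vertex constraint: the database node comes from relation `rel`."""
    return lambda d: d["rel"] == rel


def same(*cols: str) -> Callable[[Row, Row], bool]:
    """Edge constraint: both database nodes agree on every listed column."""
    return lambda d1, d2: all(d1[c] == d2[c] for c in cols)


# A three-way join over relations abbreviated C, A and S.
join_query = QueryGraph(
    vertices={"q1", "q2", "q3"},
    edges={("q1", "q2"), ("q1", "q3"), ("q2", "q3")},
    vertex_constraints={
        "q1": belongs_to("C"),
        "q2": belongs_to("A"),
        "q3": belongs_to("S"),
    },
    edge_constraints={
        ("q1", "q2"): same("Country"),
        ("q1", "q3"): same("Country"),
        ("q2", "q3"): same("Country", "City"),
    },
    structural="C",
)
```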
22XML Query as a Graph
- Returns faculty members from the Biology Department
[Figure: query graph. Vertex constraints: University; Dept and ContainsText(Biology); Faculty; Name. Edge constraints: Is Descendant; Is Child; Is GrandChild. Structural constraint: Rooted.]
23Join Query as a Graph
Structural Constraint: Connected
Vertex constraints
- q1: Belongs to C
- q2: Belongs to A
- q3: Belongs to S
Edge constraints
- (q1, q2): C.Country = A.Country
- (q1, q3): C.Country = S.Country
- (q2, q3): A.Country = S.Country and A.City = S.City
24Motivation
Queries and Databases
Answers and Semantics
Graph Properties
25Assignment Graphs
- Assignment graphs are used to compactly represent assignments of query nodes to database nodes
- Basically, the assignment graph for Q and D, written Q × D, has
  - A node (q,d) for each pair q ∈ Q and d ∈ D such that d satisfies the constraint on q
  - An edge ((q,d), (q',d')) if there is an edge (q,q') in Q and (d,d') satisfies the constraint on (q,q')
  - May also have a root (details omitted)
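The construction just described can be sketched directly: keep every pair of a query node and a database node that passes the vertex constraint, then connect two pairs when the query has an edge between their query nodes and the edge constraint holds. The function name, the tuple ids (c1, a1, s1, ...) and the dict-based database are assumptions for this example; the root is left out, as on the slide.

```python
def assignment_graph(query_vertices, query_edges, vertex_ok, edge_ok, db):
    """Return (nodes, edges) of the assignment graph.

    query_vertices: iterable of query node names
    query_edges:    set of (q, q2) pairs
    vertex_ok:      dict q -> function(row) -> bool
    edge_ok:        dict (q, q2) -> function(row, row2) -> bool
    db:             dict database node id -> row
    """
    nodes = {(q, d) for q in query_vertices for d in db if vertex_ok[q](db[d])}
    edges = set()
    for (q, q2) in query_edges:
        for d in db:
            for d2 in db:
                if ((q, d) in nodes and (q2, d2) in nodes
                        and edge_ok[(q, q2)](db[d], db[d2])):
                    edges.add(((q, d), (q2, d2)))
    return nodes, edges


# Example data: three relations C, A, S; the ids are illustrative.
db = {
    "c1": {"rel": "C", "Country": "Canada"},
    "c2": {"rel": "C", "Country": "UK"},
    "c3": {"rel": "C", "Country": "USA"},
    "a1": {"rel": "A", "Country": "UK", "City": "London"},
    "a2": {"rel": "A", "Country": "Canada", "City": "Montreal"},
    "a3": {"rel": "A", "Country": "Canada", "City": "Toronto"},
    "s1": {"rel": "S", "Country": "UK", "City": "London"},
    "s2": {"rel": "S", "Country": "USA", "City": "NY"},
}
vertex_ok = {
    "q1": lambda r: r["rel"] == "C",
    "q2": lambda r: r["rel"] == "A",
    "q3": lambda r: r["rel"] == "S",
}
edge_ok = {
    ("q1", "q2"): lambda r, r2: r["Country"] == r2["Country"],
    ("q1", "q3"): lambda r, r2: r["Country"] == r2["Country"],
    ("q2", "q3"): lambda r, r2: (r["Country"] == r2["Country"]
                                 and r["City"] == r2["City"]),
}
nodes, edges = assignment_graph(
    ["q1", "q2", "q3"],
    {("q1", "q2"), ("q1", "q3"), ("q2", "q3")},
    vertex_ok, edge_ok, db,
)
```

The triple loop is quadratic in the database size per query edge, which is fine for an illustration but is not meant as an efficient construction.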
31[Figure: assignment graph with nodes (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2); edges not transcribed]
32Partial Assignment
- A partial assignment is any subgraph of Q × D that does not contain two different nodes (q,d) and (q,d')
  - otherwise, it would map the node q to two different database nodes
- Can distinguish special types of partial assignments
  - vertex complete: every query node must appear in the partial assignment
  - edge complete: every edge constraint between query variables in the partial assignment holds
  - structurally consistent: the partial assignment satisfies the query's structural constraint
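The three special types of partial assignment can be written as simple checks. In this sketch a partial assignment is represented by its set of (query node, database node) pairs, with the edges it inherits from the assignment graph; that representation, the function names and the hard-coded example edges are assumptions for illustration. Only the structural constraints C (connected) and N (none) are handled, because the rooted case depends on details the slides omit.

```python
QUERY_VERTICES = {"q1", "q2", "q3"}
QUERY_EDGES = {("q1", "q2"), ("q1", "q3"), ("q2", "q3")}
# Edges of an example assignment graph (illustrative data).
AG_EDGES = {
    (("q1", "c1"), ("q2", "a2")),
    (("q1", "c1"), ("q2", "a3")),
    (("q1", "c2"), ("q2", "a1")),
    (("q1", "c2"), ("q3", "s1")),
    (("q1", "c3"), ("q3", "s2")),
    (("q2", "a1"), ("q3", "s1")),
}


def is_partial_assignment(nodes):
    """No query node may be mapped to two different database nodes."""
    query_nodes = [q for q, _ in nodes]
    return len(query_nodes) == len(set(query_nodes))


def is_vertex_complete(nodes):
    """Every query node appears in the partial assignment."""
    return {q for q, _ in nodes} == QUERY_VERTICES


def is_edge_complete(nodes):
    """Every query edge between two assigned query nodes is satisfied."""
    assigned = dict(nodes)
    return all(
        ((q, assigned[q]), (q2, assigned[q2])) in AG_EDGES
        for q, q2 in QUERY_EDGES
        if q in assigned and q2 in assigned
    )


def is_connected(nodes):
    """The chosen nodes form one connected piece (edge direction ignored)."""
    nodes = set(nodes)
    if not nodes:
        return True  # convention chosen for this sketch
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in AG_EDGES:
            for x, y in ((a, b), (b, a)):
                if x == u and y in nodes and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == nodes


def is_structurally_consistent(nodes, structural):
    if structural == "N":
        return True
    if structural == "C":
        return is_connected(nodes)
    raise NotImplementedError("rooted case omitted in this sketch")
```

For instance, {(q1, c2), (q2, a1), (q3, s1)} passes all three checks, while {(q1, c1), (q2, a2)} is edge complete and connected but not vertex complete.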
33Example
[Figure: assignment graph with nodes (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2); edges not transcribed]
34Semantics
- All partial assignments for Q over D that satisfy the vertex and edge constraints are encoded in Q × D
- A semantics defines which subgraphs of the assignment graph (i.e., which partial assignments) are in fact answers, e.g.,
  - S_ves allows all partial assignments that are vertex complete, edge complete and structurally consistent
  - S_es allows all partial assignments that are edge complete and structurally consistent
  - S_s allows all partial assignments that are structurally consistent
- Usually, we are only interested in maximal partial assignments
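To make the difference between the semantics concrete, the following brute-force sketch enumerates the maximal partial assignments of a small assignment graph under a vertex-, edge- and structure-based semantics and under an edge- and structure-based one. It is exponential and only meant as an illustration, not as the evaluation algorithm of the paper. The example data, the function names, and the use of "connected" as the structural constraint are assumptions.

```python
from itertools import combinations

QUERY_VERTICES = {"q1", "q2", "q3"}
QUERY_EDGES = {("q1", "q2"), ("q1", "q3"), ("q2", "q3")}
AG_NODES = [("q1", "c1"), ("q1", "c2"), ("q1", "c3"),
            ("q2", "a1"), ("q2", "a2"), ("q2", "a3"),
            ("q3", "s1"), ("q3", "s2")]
AG_EDGES = {
    (("q1", "c1"), ("q2", "a2")), (("q1", "c1"), ("q2", "a3")),
    (("q1", "c2"), ("q2", "a1")), (("q1", "c2"), ("q3", "s1")),
    (("q1", "c3"), ("q3", "s2")), (("q2", "a1"), ("q3", "s1")),
}


def one_value_per_query_node(s):
    return len({q for q, _ in s}) == len(s)


def vertex_complete(s):
    return {q for q, _ in s} == QUERY_VERTICES


def edge_complete(s):
    m = dict(s)
    return all(((q, m[q]), (q2, m[q2])) in AG_EDGES
               for q, q2 in QUERY_EDGES if q in m and q2 in m)


def connected(s):
    start = next(iter(s))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in AG_EDGES:
            for x, y in ((a, b), (b, a)):
                if x == u and y in s and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == set(s)


# Each semantics is the list of checks an answer must pass.
SEMANTICS = {
    "ves": [vertex_complete, edge_complete, connected],
    "es": [edge_complete, connected],
    "s": [connected],
}


def maximal_answers(name):
    """All maximal non-empty partial assignments allowed by the semantics."""
    checks = SEMANTICS[name]
    answers = []
    for k in range(1, len(AG_NODES) + 1):
        for combo in combinations(AG_NODES, k):
            s = frozenset(combo)
            if one_value_per_query_node(s) and all(c(s) for c in checks):
                answers.append(s)
    # Keep only answers not strictly contained in another answer.
    return [s for s in answers if not any(s < t for t in answers)]


join_result = maximal_answers("ves")
disjunction_result = maximal_answers("es")
```

On this data the first call yields a single answer that assigns all three query nodes, while the second additionally keeps the pairs that cannot be extended, which is the behaviour one expects from a join versus a full disjunction.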
35Example Join
Using semantics S_ves we get the natural join
[Figure: assignment graph with nodes (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2); edges not transcribed]
36Example Join becomes a Full Disjunction
Using semantics S_es we get the full disjunction
[Figure: assignment graph with nodes (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2); edges not transcribed]
37Other Examples
- Queries with incomplete answers over semistructured data: Kanza, Nutt, Sagiv (PODS 99)
  - Weak semantics modeled by S_es; Or-semantics modeled by S_s
- FleXPath: Amer-Yahia, Lakshmanan, Pandit (SIGMOD 04)
  - Modeled by S_es
- Interconnections: Cohen, Kanza, Sagiv (03)
  - Complete interconnection can be modeled by S_es; reachable interconnection can be modeled by S_s
38Motivation
Queries and Databases
Answers and Semantics
Graph Properties
39Semantics are a type of Graph Property
- A graph property P is a set of graphs, e.g.,
  - is a clique
  - is a bipartite graph
- A semantics defines a set of graphs, for every Q, D (these graphs are subgraphs of Q × D)
- Therefore, semantics are a type of graph property
40Hereditary Graph Properties and their Variants
- There are several interesting types of graph properties that have been studied in graph theory
- A graph property P is hereditary if every induced subgraph of a graph in P is also in P (e.g., is a clique, is a forest)
- A graph property P is connected-hereditary if every connected induced subgraph of a graph in P is also in P (e.g., is a tree)
- Rooted-hereditary can be defined similarly
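The difference between hereditary and connected-hereditary can be checked by brute force on a small graph. The sketch below tests, for one given graph only, whether every non-empty induced subgraph (or every connected one) still satisfies a property; it does not prove anything about the property in general. Function names and the convention that each undirected edge is stored once are assumptions.

```python
from itertools import combinations


def induced(edges, nodes):
    """Edges of the subgraph induced by `nodes`."""
    return {(u, v) for u, v in edges if u in nodes and v in nodes}


def is_connected(nodes, edges):
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == u and y in nodes and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == nodes


def is_clique(nodes, edges):
    return all((u, v) in edges or (v, u) in edges
               for u, v in combinations(nodes, 2))


def is_tree(nodes, edges):
    # Assumes each undirected edge appears exactly once in `edges`.
    return (len(nodes) > 0 and is_connected(nodes, edges)
            and len(edges) == len(nodes) - 1)


def holds_on_induced_subgraphs(prop, nodes, edges, connected_only=False):
    """Check `prop` on every non-empty induced subgraph of one graph."""
    node_list = list(nodes)
    for k in range(1, len(node_list) + 1):
        for combo in combinations(node_list, k):
            sub = set(combo)
            sub_edges = induced(edges, sub)
            if connected_only and not is_connected(sub, sub_edges):
                continue
            if not prop(sub, sub_edges):
                return False
    return True


k4_nodes = {0, 1, 2, 3}
k4_edges = set(combinations(sorted(k4_nodes), 2))
path_nodes = {0, 1, 2}
path_edges = {(0, 1), (1, 2)}
```

On the path 0-1-2, the induced subgraph on {0, 2} is disconnected and therefore not a tree, which is why "is a tree" fails the hereditary check but passes once only connected induced subgraphs are considered.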
41Semantics are usually Hereditary
- Most semantics for partial answers considered in the past are hereditary (in some sense), i.e., subgraphs of a partial answer are also partial answers
- Many semantics require connectivity of results (e.g., full disjunctions)
- Some require answers to be rooted (e.g., FleXPath)
42Maximal P-Subgraph Problem
- Given a graph property P and a graph G, the maximal P-subgraph problem is: find all maximal induced subgraphs of G that have property P
- Therefore, the problem of finding all maximal answers for a query over a database, under a given semantics, is a special case of the maximal P-subgraph problem
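As a reference point for the problem statement, here is an exponential brute-force baseline that enumerates every node subset, keeps those whose induced subgraph satisfies P, and discards the ones contained in a larger satisfying subset. It is a naive sketch for small inputs, not the efficient algorithm referred to in the talk; the function names and the clique example are assumptions.

```python
from itertools import combinations


def maximal_p_subgraphs(nodes, edges, prop):
    """Node sets of all maximal induced subgraphs of (nodes, edges) with `prop`.

    prop(node_set, edge_set) -> bool is evaluated on each induced subgraph.
    Runs in time exponential in the number of nodes.
    """
    node_list = list(nodes)
    satisfying = []
    for k in range(1, len(node_list) + 1):
        for combo in combinations(node_list, k):
            s = frozenset(combo)
            sub_edges = {(u, v) for u, v in edges if u in s and v in s}
            if prop(s, sub_edges):
                satisfying.append(s)
    # Maximal means: not strictly contained in another satisfying set.
    return [s for s in satisfying if not any(s < t for t in satisfying)]


def is_clique(nodes, edges):
    return all((u, v) in edges or (v, u) in edges
               for u, v in combinations(nodes, 2))


# A triangle 1-2-3 with a pendant edge 3-4.
graph_nodes = {1, 2, 3, 4}
graph_edges = {(1, 2), (1, 3), (2, 3), (3, 4)}
maximal_cliques = maximal_p_subgraphs(graph_nodes, graph_edges, is_clique)
```

With P chosen as "is a clique", the maximal P-subgraphs are the maximal cliques of the graph; plugging in a predicate that encodes a query semantics over an assignment graph gives maximal query answers in the same way.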
43Efficient Query Evaluation
- There are efficient algorithms that find all maximal P-subgraphs for hereditary, connected-hereditary and rooted-hereditary properties
  - Efficient in terms of the input and the output (i.e., incremental polynomial time)
- Use these algorithms to find maximal query answers, e.g., to find full disjunctions, weak answers, or-answers, etc.
  - Improves upon previous results
44Conclusion
- Presented an abstract framework
- Can model many different types of queries, databases and semantics in the framework
- Semantics in the framework are graph properties
- Solve the maximal P-subgraph problem once and reuse it to find maximal query answers
45Future Work
- It is convenient to define ranking functions and return answers in ranking order
  - How/when can this be done in our framework?
- Note: from the modeling it is immediately apparent that ranking cannot always be performed efficiently
  - The problem of finding a maximal P-subgraph of size k is NP-complete for hereditary and connected-hereditary graph properties (Yannakakis, STOC 78)
46Thank you! Questions?