An Abstract Framework for Generating Maximal Answers to Queries

1
An Abstract Framework for Generating Maximal
Answers to Queries
  • Sara Cohen, Yehoshua Sagiv

2
Motivation
Queries and Databases
Answers and Semantics
Graph Properties
3
The Problem
  • In many different domains, we are given the
    option to query some source of information
  • Usually, the user only gets results if the query
    can be completely answered (satisfied)
  • In many domains, this is not appropriate, e.g.,
      • The user is not familiar with the database
      • The database does not contain complete
        information
      • There is a mismatch between the ontology of the
        user and that of the database
      • The query is a search that is not expected to
        be correct

4
Search for papers by Smith that appeared in
ICDT 2004
5
Sorry, no matching record found
6
Search for buses from Haifa-Technion to Ben
Gurion Airport
7
There is no direct bus line between the required
destinations
8
Search for buses to Ben Gurion Airport
9
Must choose From and To
10
What Do Users Need?
  • Users need a way to get interesting partial
    answers to their queries, especially if a
    complete answer does not exist
  • These partial answers should contain maximal
    information
  • Main problems:
      • What should be the semantics of partial answers?
      • How can all partial answers be efficiently
        computed?

11
Previous Work
  • Many solutions have been given for the main
    problems
      • Solutions differ according to the problem domain
  • Examples:
      • Full disjunctions [Galindo-Legaria (94);
        Rajaraman, Ullman (96); Kanza, Sagiv (03)]
      • Queries with incomplete answers over
        semistructured data [Kanza, Nutt, Sagiv (99)]
      • FleXPath [Amer-Yahia, Lakshmanan, Pandit (04)]
      • Interconnections [Cohen, Kanza, Sagiv (03)]

12
Our Contribution
  • In the past, for each semantics considered, the
    query evaluation problem had to be studied anew.
    In this paper, we
  • Present a general framework for defining
    semantics for partial answers
  • Framework is general enough to cover most
    previously studied semantics
  • Query evaluation problem can be solved once
    within this framework and reused for new
    semantics
  • Results improve upon previous evaluation
    algorithms
  • Present the relationship between this problem and
    the maximal P-subgraph problem

13
Motivation
Queries and Databases
Answers and Semantics
Graph Properties
14
Databases
  • Databases are modeled as data graphs (V, E, r,
    l_V, l_E)
      • r: the graph can have a designated root
      • l_V: labels on the vertices
      • l_E: labels on the edges
  • Note:
      • Nodes correspond to data items
      • Even databases that do not have an inherent graph
        structure can be modeled as graphs, e.g.,
        relational databases
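
A minimal Python sketch of the data-graph tuple (V, E, r, l_V, l_E). The class name, the field names and the "one node per tuple" encoding of a relational database are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Optional, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


@dataclass
class DataGraph:
    """A data graph (V, E, r, l_V, l_E); field names are illustrative."""
    vertices: Set[Node]
    edges: Set[Edge]
    root: Optional[Node] = None                                    # r (optional)
    vertex_labels: Dict[Node, str] = field(default_factory=dict)   # l_V
    edge_labels: Dict[Edge, str] = field(default_factory=dict)     # l_E


def relational_data_graph(relations: Dict[str, List[tuple]]) -> DataGraph:
    """Assumed encoding: one node per tuple, labelled with its relation name."""
    vertices: Set[Node] = set()
    labels: Dict[Node, str] = {}
    for name, rows in relations.items():
        for row in rows:
            node = (name, row)
            vertices.add(node)
            labels[node] = name
    # No edges are created here; how tuples are linked is left open.
    return DataGraph(vertices=vertices, edges=set(), vertex_labels=labels)


db = relational_data_graph({
    "Climates": [("Canada", "diverse"), ("UK", "temperate"), ("USA", "temperate")],
    "Sites": [("UK", "London", "Buckingham"), ("USA", "NY", "Metropolitan")],
    "Accommodations": [("UK", "London", "Plaza"),
                       ("Canada", "Montreal", "Hilton"),
                       ("Canada", "Toronto", "Ramada")],
})
```

The root is left as None because a relational database has no natural root, whereas an XML document would set it to the document element.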

15
XML as a Data Graph
[Figure: an XML document drawn as a data graph rooted at University.
Element labels: University, Name, Dept, Faculty, Professor, Lecturer, Teaches.
Values: Technion, Computer Science, Biology, Avi Levy, Chana Israeli,
Bioinformatics, Databases, Molecular Biology.]
16
Relational Database as a Data Graph
Climates
  Country  Climate
  Canada   diverse
  UK       temperate
  USA      temperate

Sites
  Country  City    Site
  UK       London  Buckingham
  USA      NY      Metropolitan

Accommodations
  Country  City      Hotel
  UK       London    Plaza
  Canada   Montreal  Hilton
  Canada   Toronto   Ramada
21
Queries
  • Queries are modeled as query graphs (V, E, r,
    C_V, C_E, s)
      • r: the graph can have a designated root
      • C_V: vertex constraints (basically, a boolean
        function on vertices)
      • C_E: edge constraints (basically, a boolean
        function on pairs of vertices)
      • s: a structural constraint, one of the letters C,
        R, N (defines the required structure of answers,
        i.e., connected, rooted or none)
  • Note: nodes correspond to query variables
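
A possible Python rendering of the query-graph tuple (V, E, r, C_V, C_E, s), with constraints stored as boolean functions. The class, the helpers `same` and `in_relation`, the row shape (a dict with a "rel" key) and the rule that a rooted query must name its root are all assumptions made for illustration. The example instance is the three-relation join query over Climates (C), Accommodations (A) and Sites (S).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple

Row = Dict[str, str]   # assumed shape of a data item: column name -> value


@dataclass
class QueryGraph:
    vertices: Set[str]
    edges: Set[Tuple[str, str]]
    vertex_constraints: Dict[str, Callable[[Row], bool]]                 # C_V
    edge_constraints: Dict[Tuple[str, str], Callable[[Row, Row], bool]]  # C_E
    structural: str = "N"        # s: "C" (connected), "R" (rooted) or "N" (none)
    root: Optional[str] = None   # r

    def __post_init__(self) -> None:
        if self.structural not in {"C", "R", "N"}:
            raise ValueError("structural constraint must be C, R or N")
        # Assumption of this sketch: a rooted query must name its root.
        if self.structural == "R" and self.root is None:
            raise ValueError("a rooted query needs a designated root")


def same(*columns: str) -> Callable[[Row, Row], bool]:
    """Edge constraint: both rows agree on every listed column."""
    return lambda d1, d2: all(d1[c] == d2[c] for c in columns)


def in_relation(name: str) -> Callable[[Row], bool]:
    """Vertex constraint: the row belongs to the named relation."""
    return lambda d: d["rel"] == name


join_query = QueryGraph(
    vertices={"q1", "q2", "q3"},
    edges={("q1", "q2"), ("q1", "q3"), ("q2", "q3")},
    vertex_constraints={
        "q1": in_relation("C"),
        "q2": in_relation("A"),
        "q3": in_relation("S"),
    },
    edge_constraints={
        ("q1", "q2"): same("Country"),
        ("q1", "q3"): same("Country"),
        ("q2", "q3"): same("Country", "City"),
    },
    structural="C",
)
```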

22
XML Query as a Graph
  • Returns faculty members from the Biology
    Department

[Figure: query graph]
University --(Is Descendant)--> Dept and ContainsText(Biology)
  --(Is Child)--> Faculty --(Is GrandChild)--> Name
Structural constraint: Rooted
23
Join Query as a Graph
  • C ⋈ A ⋈ S

Structural constraint: Connected
Nodes:
  q1: belongs to C
  q2: belongs to A
  q3: belongs to S
Edges:
  (q1, q2): C.Country = A.Country
  (q1, q3): C.Country = S.Country
  (q2, q3): A.Country = S.Country and A.City = S.City
24
Motivation
Queries and Databases
Answers and Semantics
Graph Properties
25
Assignment Graphs
  • Assignment graphs are used to compactly represent
    assignments of query nodes to database nodes
  • Basically, the assignment graph for Q and D,
    written Q × D, has
      • Node (q, d) for each pair q ∈ Q and d ∈ D such
        that d satisfies the constraint on q
      • Edge ((q, d), (q', d')) if there is an edge
        (q, q') in Q and (d, d') satisfies the
        constraint on (q, q')
      • May also have a root (details omitted)
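
The node and edge rules above can be sketched directly in Python. This is a small illustration rather than the authors' implementation: the row identifiers c1..c3, a1..a3, s1..s2 are assigned to the tuples of the travel example in an assumed order, and the root of the assignment graph is ignored.

```python
from itertools import product

# Assumed numbering of the tuples of Climates (C), Accommodations (A), Sites (S).
ROWS = {
    "c1": {"rel": "C", "Country": "Canada"},
    "c2": {"rel": "C", "Country": "UK"},
    "c3": {"rel": "C", "Country": "USA"},
    "a1": {"rel": "A", "Country": "UK", "City": "London"},
    "a2": {"rel": "A", "Country": "Canada", "City": "Montreal"},
    "a3": {"rel": "A", "Country": "Canada", "City": "Toronto"},
    "s1": {"rel": "S", "Country": "UK", "City": "London"},
    "s2": {"rel": "S", "Country": "USA", "City": "NY"},
}


def same(*columns):
    """Edge constraint: the two rows agree on every listed column."""
    return lambda d1, d2: all(ROWS[d1][c] == ROWS[d2][c] for c in columns)


def in_relation(name):
    """Vertex constraint: the row belongs to the named relation."""
    return lambda d: ROWS[d]["rel"] == name


Q_NODES = ["q1", "q2", "q3"]
VERTEX_OK = {"q1": in_relation("C"), "q2": in_relation("A"), "q3": in_relation("S")}
EDGE_OK = {
    ("q1", "q2"): same("Country"),
    ("q1", "q3"): same("Country"),
    ("q2", "q3"): same("Country", "City"),
}


def assignment_graph(q_nodes, d_nodes, vertex_ok, edge_ok):
    """Build the nodes and edges of Q x D from the two rules on the slide."""
    # Node (q, d) whenever d satisfies the vertex constraint on q.
    nodes = {(q, d) for q, d in product(q_nodes, d_nodes) if vertex_ok[q](d)}
    # Edge ((q, d), (q2, d2)) whenever (q, q2) is a query edge whose
    # constraint is satisfied by (d, d2).
    edges = {
        ((q, d), (q2, d2))
        for (q, q2), ok in edge_ok.items()
        for (a, d) in nodes if a == q
        for (b, d2) in nodes if b == q2
        if ok(d, d2)
    }
    return nodes, edges


nodes, edges = assignment_graph(Q_NODES, ROWS, VERTEX_OK, EDGE_OK)
```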

31
[Figure: assignment graph for the join query; nodes: (q1, c1), (q1, c2),
(q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2)]
32
Partial Assignment
  • A partial assignment is any subgraph of Q × D that
    does not contain two different nodes (q, d) and
    (q, d')
      • Otherwise, it would map the node q to two
        different database nodes
  • Can distinguish special types of partial
    assignments
      • Vertex complete: every query node appears in the
        partial assignment
      • Edge complete: every edge constraint between
        query variables in the partial assignment holds
      • Structurally consistent: the partial assignment
        satisfies the query's structural constraint
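
The definitions above can be written as small Python predicates. This sketch makes two simplifying assumptions: a partial assignment is given as a set of assignment-graph nodes (its edges are the induced ones), and the structural constraint is "connected". The edge set `AG_EDGES` is a hand-written assignment graph for the three-node join query, with an assumed numbering of the tuples.

```python
Q_NODES = ["q1", "q2", "q3"]
Q_EDGES = [("q1", "q2"), ("q1", "q3"), ("q2", "q3")]

# Edges of the assignment graph for the travel example (assumed tuple numbering).
AG_EDGES = {
    (("q1", "c1"), ("q2", "a2")), (("q1", "c1"), ("q2", "a3")),
    (("q1", "c2"), ("q2", "a1")), (("q1", "c2"), ("q3", "s1")),
    (("q1", "c3"), ("q3", "s2")), (("q2", "a1"), ("q3", "s1")),
}


def is_partial_assignment(pa):
    """No query node is mapped to two different database nodes."""
    queries = [q for q, _ in pa]
    return len(queries) == len(set(queries))


def vertex_complete(pa, q_nodes):
    """Every query node appears in the partial assignment."""
    return {q for q, _ in pa} == set(q_nodes)


def edge_complete(pa, q_edges, ag_edges):
    """Every query edge between two assigned query nodes is satisfied.

    Assumes pa already passed is_partial_assignment.
    """
    image = dict(pa)
    return all(
        ((q, image[q]), (q2, image[q2])) in ag_edges
        for q, q2 in q_edges
        if q in image and q2 in image
    )


def connected(pa, ag_edges):
    """Structural constraint C: the induced subgraph is connected."""
    pa = set(pa)
    if not pa:
        return False
    adjacent = {n: set() for n in pa}
    for u, v in ag_edges:
        if u in pa and v in pa:
            adjacent[u].add(v)
            adjacent[v].add(u)
    seen, stack = set(), [next(iter(pa))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adjacent[n] - seen)
    return seen == pa


full = {("q1", "c2"), ("q2", "a1"), ("q3", "s1")}
partial = {("q1", "c1"), ("q2", "a2")}
mismatch = {("q1", "c3"), ("q2", "a1")}
```

Here `full` passes all three checks, `partial` is edge complete and connected but not vertex complete, and `mismatch` fails both the edge and the connectivity check.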
33
Example
[Figure: assignment graph for the join query; nodes: (q1, c1), (q1, c2),
(q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2)]
34
Semantics
  • All partial assignments for Q over D that satisfy
    the vertex and edge constraints are encoded in
    Q × D
  • A semantics defines which subgraphs of the
    assignment graph (i.e., which partial assignments)
    are in fact answers, e.g.,
      • S_ves allows all partial assignments that are
        vertex complete, edge complete and structurally
        consistent
      • S_es allows all partial assignments that are edge
        complete and structurally consistent
      • S_s allows all partial assignments that are
        structurally consistent
  • Usually, we are only interested in maximal
    partial assignments

35
Example: Join
Using semantics S_ves we get the natural join
[Figure: assignment graph for the join query; nodes: (q1, c1), (q1, c2),
(q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2)]
36
Example: Join becomes a Full Disjunction
Using semantics S_es we get the full disjunction
[Figure: assignment graph for the join query; nodes: (q1, c1), (q1, c2),
(q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2)]
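
The two join examples can be reproduced by brute force on the small travel database. The sketch below is exponential and only meant to make the semantics concrete; it is not the efficient algorithm the talk refers to. It assumes a particular numbering of the tuples, treats a partial assignment as a set of assignment-graph nodes with induced edges, and uses "connected" as the structural constraint.

```python
from itertools import combinations

Q_EDGES = [("q1", "q2"), ("q1", "q3"), ("q2", "q3")]
NODES = [("q1", "c1"), ("q1", "c2"), ("q1", "c3"),
         ("q2", "a1"), ("q2", "a2"), ("q2", "a3"),
         ("q3", "s1"), ("q3", "s2")]
# Assumed numbering: c1=Canada, c2=UK, c3=USA; a1=UK/London, a2=Canada/Montreal,
# a3=Canada/Toronto; s1=UK/London, s2=USA/NY.
EDGES = {
    (("q1", "c1"), ("q2", "a2")), (("q1", "c1"), ("q2", "a3")),
    (("q1", "c2"), ("q2", "a1")), (("q1", "c2"), ("q3", "s1")),
    (("q1", "c3"), ("q3", "s2")), (("q2", "a1"), ("q3", "s1")),
}


def partial_assignments():
    """Every non-empty node set that maps each query node at most once."""
    for k in range(1, len(NODES) + 1):
        for subset in combinations(NODES, k):
            if len({q for q, _ in subset}) == k:
                yield frozenset(subset)


def vertex_complete(pa):
    return {q for q, _ in pa} == {"q1", "q2", "q3"}


def edge_complete(pa):
    image = dict(pa)
    return all(((q, image[q]), (p, image[p])) in EDGES
               for q, p in Q_EDGES if q in image and p in image)


def connected(pa):
    adjacent = {n: set() for n in pa}
    for u, v in EDGES:
        if u in pa and v in pa:
            adjacent[u].add(v)
            adjacent[v].add(u)
    seen, stack = set(), [next(iter(pa))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adjacent[n] - seen)
    return seen == set(pa)


def s_ves(pa):
    return vertex_complete(pa) and edge_complete(pa) and connected(pa)


def s_es(pa):
    return edge_complete(pa) and connected(pa)


def maximal_answers(semantics):
    """Keep the answers that are not strictly contained in another answer."""
    answers = [pa for pa in partial_assignments() if semantics(pa)]
    return [a for a in answers if not any(a < b for b in answers)]


natural_join = maximal_answers(s_ves)
full_disjunction = maximal_answers(s_es)
```

With this data, `natural_join` holds only the UK/London combination, while `full_disjunction` additionally keeps the partial combinations for Canada and the USA that the natural join drops.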
37
Other Examples
  • Queries with incomplete answers over
    semistructured data [Kanza, Nutt, Sagiv (PODS 99)]
      • Weak semantics modeled by S_es
      • Or-semantics modeled by S_s
  • FleXPath [Amer-Yahia, Lakshmanan, Pandit (SIGMOD
    04)]
      • Modeled by S_es
  • Interconnections [Cohen, Kanza, Sagiv (03)]
      • Complete interconnection can be modeled by S_es
      • Reachable interconnection can be modeled by S_s

38
Motivation
Queries and Databases
Answers and Semantics
Graph Properties
39
Semantics are a type of Graph Property
  • A graph property P is a set of graphs, e.g.,
      • is a clique
      • is a bipartite graph
  • A semantics defines a set of graphs, for every Q,
    D (these graphs are subgraphs of Q × D)
  • Therefore, semantics are a type of graph property

40
Hereditary Graph Properties and their Variants
  • There are several interesting types of graph
    properties that have been studied in graph theory
  • A graph property P is hereditary if every induced
    subgraph of a graph in P is also in P (e.g.,
    is a clique, is a forest)
  • A graph property P is connected-hereditary if
    every connected induced subgraph of a graph in P
    is also in P (e.g., is a tree)
  • Rooted-hereditary properties can be defined similarly
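
The difference between hereditary and connected-hereditary can be checked on a single small graph. The helper below is a hypothetical brute-force check: it looks for induced subgraphs of one given graph that lose the property, so it can refute heredity but cannot prove it for all graphs. Graphs are undirected, with each edge stored as a frozenset of its two endpoints.

```python
from itertools import combinations


def induced(edges, keep):
    """Edges of the subgraph induced by the node set keep."""
    return {e for e in edges if e <= keep}


def is_connected(nodes, edges):
    nodes = set(nodes)
    if not nodes:
        return False
    adjacent = {n: set() for n in nodes}
    for e in edges:
        u, v = tuple(e)
        adjacent[u].add(v)
        adjacent[v].add(u)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adjacent[n] - seen)
    return seen == nodes


def is_tree(nodes, edges):
    return is_connected(nodes, edges) and len(edges) == len(nodes) - 1


def is_clique(nodes, edges):
    n = len(nodes)
    return len(edges) == n * (n - 1) // 2


def counterexamples(nodes, edges, prop, connected_only=False):
    """Non-empty induced subgraphs of the given graph that violate prop."""
    bad = []
    for k in range(1, len(nodes) + 1):
        for keep in combinations(sorted(nodes), k):
            keep = set(keep)
            sub_edges = induced(edges, keep)
            if connected_only and not is_connected(keep, sub_edges):
                continue
            if not prop(keep, sub_edges):
                bad.append(keep)
    return bad


# A path a - b - c is a tree, but dropping b leaves two isolated nodes.
path_nodes = {"a", "b", "c"}
path_edges = {frozenset("ab"), frozenset("bc")}

triangle_nodes = {"a", "b", "c"}
triangle_edges = {frozenset("ab"), frozenset("bc"), frozenset("ac")}
```

On the path, "is a tree" fails for the induced subgraph on {a, c}, yet no connected induced subgraph violates it, which matches the connected-hereditary definition.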

41
Semantics are usually Hereditary
  • Most semantics for partial answers considered in
    the past are hereditary (in some sense), i.e.,
    subgraphs of a partial answer are also partial
    answers
  • Many semantics require connectivity of results
    (e.g., full disjunctions)
  • Some require answers to be rooted (e.g., FleXPath)

42
Maximal P-Subgraph Problem
  • Given a graph property P and a graph G, the
    maximal P-subgraph problem is: find all maximal
    induced subgraphs of G that have property P
  • Therefore, the problem of finding all maximal
    answers for a query over a database, under a
    given semantics, is a special case of the maximal
    P-subgraph problem
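
To make the problem statement concrete, here is a brute-force enumeration of maximal P-subgraphs in Python. It tries node sets from largest to smallest and is exponential in the size of G, so it only illustrates the definition; it is not one of the incremental-polynomial algorithms the talk relies on. The property is passed in as a function, and "is a clique" is used as the example.

```python
from itertools import combinations


def maximal_p_subgraphs(nodes, edges, prop):
    """All maximal node sets whose induced subgraph satisfies prop.

    edges is a set of frozensets {u, v}; prop takes (node_set, edge_set).
    """
    ordered = sorted(nodes)
    found = []
    # Larger sets first, so every accepted set is already maximal and any
    # later set with a satisfying superset is contained in an accepted one.
    for k in range(len(ordered), 0, -1):
        for keep in combinations(ordered, k):
            keep = frozenset(keep)
            if any(keep < bigger for bigger in found):
                continue
            sub_edges = {e for e in edges if e <= keep}
            if prop(keep, sub_edges):
                found.append(keep)
    return found


def is_clique(nodes, edges):
    n = len(nodes)
    return len(edges) == n * (n - 1) // 2


# Triangle a, b, c with a pendant edge c - d.
graph_nodes = {"a", "b", "c", "d"}
graph_edges = {frozenset("ab"), frozenset("bc"), frozenset("ac"), frozenset("cd")}

maximal_cliques = maximal_p_subgraphs(graph_nodes, graph_edges, is_clique)
```

Passing a query semantics as `prop` and an assignment graph as the input graph gives the maximal answers of the query, which is the reduction the slide describes.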

43
Efficient Query Evaluation
  • There are efficient algorithms that find all
    maximal P-subgraphs for hereditary,
    connected-hereditary and rooted-hereditary
    properties
  • Efficient in terms of the input and the output
    (i.e., incremental polynomial time)
  • Use these algorithms to find maximal query
    answers, e.g., to find full disjunctions, weak
    answers, or-answers, etc.
  • Improves upon previous results

44
Conclusion
  • Presented abstract framework
  • Can model many different types of queries,
    databases and semantics in the framework
  • Semantics in the framework are graph properties
  • Solve the maximal P-subgraph problem once and
    reuse it to find maximal query answers

45
Future Work
  • It is convenient to define ranking functions and
    return answers in ranking order
  • How/when can this be done in our framework?
  • Note: from the modeling it is immediately
    apparent that ranking cannot always be performed
    efficiently
      • The problem of finding a maximal P-subgraph of
        size k is NP-complete for hereditary and
        connected-hereditary graph properties
        (Yannakakis, STOC 78)

46
Thank you! Questions?