MONADIC QUERIES over TREE-STRUCTURED DATA

1
MONADIC QUERIES over TREE-STRUCTURED DATA
  • Georg Gottlob
  • TU Wien & Oxford University
  • Joint work with Christoph Koch, Robert Baumgartner, Marcus Herzog,
    and Reinhard Pichler

2
Talk Outline
  • Semistructured data: HTML, XML
  • Monadic Queries
  • Monadic datalog over trees
  • XPath
  • Web information extraction (wrapping)
  • Lixto

3
Strings, Trees, Graphs, Logic
A few well-known results
  • Büchi: MSO = REG over strings
  • Rabin: decidability of S2S
  • Thatcher and Wright: MSO = REG over ranked trees
    (tree automata)
  • Brüggemann-Klein/Wood/Murata: MSO = REG over
    unranked trees
  • Fagin: ESO = NP
  • Note: over graphs, ESO is NP-hard and MSO is
    hard for the Polynomial Hierarchy.
  • Grädel/Immerman/Vardi: ESO(Horn) = Datalog = LFP = PTIME
    (on ordered structures)
  • Courcelle: MSO in LinTime on tree-like
    structures (treewidth < k)
  • Clarke, Emerson, Pnueli, et al.: CTL, LTL

4
Web documents are trees!
  • HTML: Hypertext Markup Language
  • XML: Extensible Markup Language
  • HTML, XML: context-free languages.
  • Represent a document by its parse tree.
  • Tags correspond to vertex labels
  • Result: labeled trees.

5
HTML Example
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<body>
<h1>People @ DBAI</h1>
<table border="1" cellpadding="3" cellspacing="1">
<tr>
<td>Georg Gottlob</td>
<td>gottlob@dbai.tuwien.ac.at</td>
<td>18420</td>
</tr>
<tr>
<td>Christoph Koch</td>
<td>koch@dbai.tuwien.ac.at</td>
<td>18449</td>
</tr>
</table>
</body>
</html>
Rendered page:
People @ DBAI
Georg Gottlob gottlob@ 18420
Christoph Koch koch@ 18449
8
XML Example
<paperDB>
<paper>
<author>
<chandra/>
<merlin/>
</author>
<title Conjunctive Queries/>
</paper>
</paperDB>
[Figure: parse tree with root paperDB, paper nodes below it, and author and title nodes below each paper]
9
Ordered Trees as finite structures
Child-relation is a priori unordered
fc = first child
ns = next sibling
[Figure: paper --fc--> author --ns--> title; author --fc--> chandra --ns--> merlin; title --fc--> Conj. Queries]
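The firstchild/nextsibling encoding can be sketched in a few lines of Python. This is only an illustration: the function name `encode`, the nested-tuple input format, and the preorder node ids are assumptions made for the example, not part of the talk.

```python
from itertools import count

def encode(tree):
    """Encode an ordered tree as firstchild / nextsibling relations.

    `tree` is a nested tuple (label, [subtrees]); this input format and the
    preorder integer node ids are assumptions made for this sketch.
    """
    label, firstchild, nextsibling = {}, {}, {}
    ids = count()

    def visit(node):
        name, kids = node
        me = next(ids)
        label[me] = name
        prev = None
        for kid in kids:
            k = visit(kid)
            if prev is None:
                firstchild[me] = k      # leftmost child
            else:
                nextsibling[prev] = k   # link to the sibling on the right
            prev = k
        return me

    visit(tree)
    return label, firstchild, nextsibling

# The paper subtree from the slide.
paper = ("paper", [("author", [("chandra", []), ("merlin", [])]),
                   ("title", [("Conj. Queries", [])])])
label, fc, ns = encode(paper)
```

Both relations are partial functions, so each holds at most one entry per node; this is the property the linear-time results later in the talk rely on.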
10
Core XPath
  • simple location steps: paper/title
  • loc. steps with explicit axes: paper/descendant::merlin
  • qualifiers: paper[...]
  • Boolean logic: [... chandra and merlin and (not harel)]

Full XPath
  • node set comparisons and operations
  • order functions (first, last, position), etc.
  • arithmetic and string operations

Implementations in the context of XSLT processors: Xalan,
XT, MS Internet Explorer (IE6)
11
XPath Examples
/descendant::a/child::b
/descendant::a/child::b[descendant::c and not(following-sibling::d)]
/descendant::a/child::b[following-sibling::d]
[Figure: example trees with nodes labeled a, b, c, d, illustrating which nodes each query selects]
12
Ordered Trees as finite structures
Child-relation is a priori unordered
fc = first child
ns = next sibling
τ_U = ⟨firstchild², nextsibling², lastchild², label_a¹, root¹, leaf¹⟩, a ∈ Σ
(superscripts denote arities)
13
Monadic Queries over Trees
Select some nodes of a tree: unary query
f: Trees → 2^dom
No joins or combinations of objects. Yardstick:
Monadic Second Order Logic (MSO)
Example: Select titles of articles authored by Chandra and
Merlin
Two important applications
  • Web Information Extraction (→ later)
  • Monadic XML Queries

14
Monadic Datalog over Trees
Select titles of articles authored by Chandra and
Merlin
15
Monadic Datalog over Trees
[Figure: data tree. paperDB --fc--> paper --ns--> paper --ns--> ...; paper --fc--> author --ns--> title; author --fc--> chandra --ns--> merlin; title --fc--> Conj. Queries]
paper(X) ← root(R), firstchild(R,X).
paper(X) ← paper(Y), nextsibling(Y,X).
output(X) ← paper(P), firstchild(P,A),
            firstchild(A,Z), label_Chandra(Z),
            nextsibling(Z,V), label_Merlin(V),
            nextsibling(A,T), firstchild(T,X).
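To see the program at work, here is a minimal sketch of a naive bottom-up datalog evaluator in Python, run on a small paperDB tree. The node names (`db`, `p1`, `x1`, ...), the second paper authored by `harel`, and the lowercase predicate spellings are all made up for the illustration. The evaluator is deliberately simple and is not the linear-time procedure discussed later in the talk.

```python
def match(atom, fact, env):
    """Try to extend the variable binding `env` so that `atom` equals `fact`."""
    pred, *args = atom
    if pred != fact[0] or len(args) != len(fact) - 1:
        return None
    env = dict(env)
    for var, val in zip(args, fact[1:]):
        if env.setdefault(var, val) != val:
            return None
    return env

def evaluate(rules, facts):
    """Naive bottom-up evaluation: apply all rules until nothing new appears."""
    facts = set(facts)
    while True:
        derived = set()
        for head, body in rules:
            envs = [{}]
            for atom in body:
                envs = [e2 for e in envs for f in facts
                        if (e2 := match(atom, f, e)) is not None]
            for env in envs:
                derived.add((head[0],) + tuple(env[v] for v in head[1:]))
        if derived <= facts:
            return facts
        facts |= derived

RULES = [
    (("paper", "X"), [("root", "R"), ("firstchild", "R", "X")]),
    (("paper", "X"), [("paper", "Y"), ("nextsibling", "Y", "X")]),
    (("output", "X"), [("paper", "P"), ("firstchild", "P", "A"),
                       ("firstchild", "A", "Z"), ("label_chandra", "Z"),
                       ("nextsibling", "Z", "V"), ("label_merlin", "V"),
                       ("nextsibling", "A", "T"), ("firstchild", "T", "X")]),
]

# Hypothetical data tree: p1 is authored by chandra and merlin, p2 by harel.
FACTS = {
    ("root", "db"), ("firstchild", "db", "p1"), ("nextsibling", "p1", "p2"),
    ("firstchild", "p1", "a1"), ("nextsibling", "a1", "t1"),
    ("firstchild", "a1", "c1"), ("nextsibling", "c1", "m1"),
    ("label_chandra", "c1"), ("label_merlin", "m1"), ("firstchild", "t1", "x1"),
    ("firstchild", "p2", "a2"), ("nextsibling", "a2", "t2"),
    ("firstchild", "a2", "h2"), ("label_harel", "h2"), ("firstchild", "t2", "x2"),
}

model = evaluate(RULES, FACTS)
selected = {f[1] for f in model if f[0] == "output"}
```

Only unary predicates (`paper`, `output`) are derived; the binary relations come from the tree itself, which is what makes the program monadic.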
16
How expressive is monadic Datalog?
  • It was known that
  • Monadic Datalog ? ?1-MSO
  • Full Datalog P

Theorem [G., Koch 2002]
Over τ_U, Monadic Datalog = MSO
A unary query is definable in MSO iff it is
definable via a monadic datalog program.
17
Proof idea: Simulate Unranked Query Automata
(UQA) by Neven and Schwentick in mon. Datalog
UQA ≡ Unary MSO Queries
[Neven, Schwentick 01]
20
Example: Even-query
Proof idea: Simulate Unranked Query Automata
(UQA) by Neven and Schwentick in mon. Datalog
Up transition
[Figure: tree fragment with node values 0, 0, 1, 0]
qodd(X) ← 0(Y), lastchild(X, Y).
21
How complex is Monadic Datalog?
  • Previously known facts on full Datalog over graphs:
  • Data complexity of Datalog: P-complete (impl. in
    Vardi 88)
  • Combined complexity: EXPTIME-complete (impl. in
    Vardi 88)
  • Combined complexity of sirups: EXPTIME-complete
    (G., Papadimitriou 99)

Theorem [G., Koch 2002]
Monadic Datalog over τ_U has combined complexity
O(|data| · |query|)
Data Complexity P-complete and linear-time.
22
Proof idea
1.) Transform datalog program and input tree in
linear time into a ground propositional logic program
  • Exploit functional dependencies:
    nextsibling(X,Y) has only a linear number
    of ground instances nextsibling(n_i, n_j), etc.
  • Decouple independent atoms of rule bodies:

p(X) ← q(X), r(Y), nextsibling(X,Z), s(Z).
becomes
p(X) ← q(X), r, nextsibling(X,Z), s(Z).
r ← r(Y).
2.) Execute ground program in linear time by
using well-known algorithms
[Dowling, Gallier; Minoux]
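Step 2 can be illustrated with a counter-based forward-chaining procedure for ground Horn programs, in the spirit of the linear-time algorithms cited above. The following Python sketch is an assumption-laden illustration rather than the authors' implementation: the function name `horn_least_model`, the rule encoding, and the tiny ground program (nodes `n1`, `n2`) are invented for the example.

```python
from collections import defaultdict, deque

def horn_least_model(rules):
    """Least model of a ground Horn program.

    `rules` is a list of (head, body) pairs, where body is a list of atoms;
    facts have an empty body. Each rule keeps a counter of body atoms not yet
    known to be true, so every (rule, body atom) pair is touched once.
    """
    remaining = [len(set(body)) for _, body in rules]
    watchers = defaultdict(list)            # atom -> rules with it in the body
    for i, (_, body) in enumerate(rules):
        for atom in set(body):
            watchers[atom].append(i)

    true = set()
    queue = deque(head for head, body in rules if not body)
    while queue:
        atom = queue.popleft()
        if atom in true:
            continue
        true.add(atom)
        for i in watchers[atom]:
            remaining[i] -= 1
            if remaining[i] == 0:           # whole body satisfied: fire rule
                queue.append(rules[i][0])
    return true

# Ground instances of the decoupled rules over a two-node tree (hypothetical).
ground = [
    ("q(n1)", []), ("r(n2)", []), ("s(n2)", []), ("nextsibling(n1,n2)", []),
    ("p(n1)", ["q(n1)", "r", "nextsibling(n1,n2)", "s(n2)"]),
    ("r", ["r(n1)"]),
    ("r", ["r(n2)"]),
]
least = horn_least_model(ground)
```

Because nextsibling is functional, the rule for p has one ground instance per node X rather than one per pair (X, Y); decoupling r(Y) into the propositional atom r is what removes the remaining quadratic blow-up.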
23
XPath
W3C standard; kernel of XSLT, XQuery, etc.
//paper[author[chandra and merlin]]/title
Unabbreviated syntax with explicit axes:
/descendant::paper[child::author[child::chandra and child::merlin]]/child::title
/descendant::chandra/following-sibling::merlin/ancestor::paper/child::title
24
Core XPath: A tree morphism problem
[Figure: query tree with location steps (root --desc.--> chandra --foll-s.--> merlin --anc.--> paper --child--> title) to be mapped into the data tree]
/descendant::chandra/following-sibling::merlin/ancestor::paper/child::title
25
Core XPath: A tree morphism problem
[Figure: query tree with location steps (root, desc., chandra, foll-s., merlin, anc., paper, child, title) and the data tree]
/descendant::chandra/nextsibling::merlin/ancestor::paper/child::title
27
Core XPath
  • simple location steps: paper/title
  • loc. steps with explicit axes: paper/descendant::merlin
  • qualifiers: paper[...]
  • Boolean logic: [... chandra and merlin and (not harel)]

Full XPath
  • node set comparisons and operations
  • order functions (first, last), etc.
  • arithmetic and string operations

Implementations: Xalan, XT, MS Internet Explorer 6 (IE6)
Complexity, efficiency? [G., Koch, Pichler, VLDB 02]
28
Core XPath on Xalan and XT: exponential!
Document: <a><b/><b/></a>
Queries: a/b/parent::a/b/parent::a/b
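One plausible way such a blow-up arises is an evaluator that processes each location step once per context node without merging duplicates. The Python sketch below is not the actual Xalan or XT code; `naive`, `set_based`, `steps_for`, and the node names are hypothetical. It contrasts that list-based strategy with a set-based one on the two-child document above, with label tests omitted since every child of a is a b.

```python
# Document <a><b/><b/></a>, with hypothetical node names.
children = {"a": ["b1", "b2"], "b1": [], "b2": []}
parent = {"b1": "a", "b2": "a"}

def step(node, axis):
    """Nodes reachable from `node` along one axis (child or parent)."""
    if axis == "child":
        return children[node]
    return [parent[node]] if node in parent else []

def naive(steps, contexts):
    """Evaluate step by step on a list of contexts, never merging duplicates."""
    work = 0
    for axis in steps:
        nxt = []
        for n in contexts:
            work += 1
            nxt.extend(step(n, axis))
        contexts = nxt
    return contexts, work

def set_based(steps, contexts):
    """Same evaluation, but each intermediate result is a node set."""
    work = 0
    ctx = set(contexts)
    for axis in steps:
        nxt = set()
        for n in ctx:
            work += 1
            nxt.update(step(n, axis))
        ctx = nxt
    return ctx, work

def steps_for(k):
    """Axis sequence of a/b followed by k repetitions of /parent::a/b."""
    return ["child"] + ["parent", "child"] * k

naive_nodes, naive_work = naive(steps_for(10), ["a"])
set_nodes, set_work = set_based(steps_for(10), ["a"])
```

In the list-based version the context list doubles with every `/parent::a/b` pair, while the set-based version never holds more than two nodes.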
29
Core XPath on Microsoft IE6: quadratic
Polynomial combined complexity,
quadratic data complexity
30
Full XPath on IE6: exponential combined complexity!
Exponential query complexity
31
Axes and regular expressions
Observation: All XPath axes can be expressed
as regular expressions over the τ_U-axes
firstchild and nextsibling
child = firstchild.nextsibling*
parent = (nextsibling⁻¹)*.firstchild⁻¹
descendant = firstchild.(firstchild ∪ nextsibling)*
etc.
General definition of axis:
Relation definable via a regular expression
(with inversion) from the primitive relations of
τ_U
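These definitions can be checked mechanically by treating each relation as a set of pairs and implementing concatenation, inversion, and star directly. The Python sketch below does this on the paper/author/title tree from the earlier slides; the helper names (`compose`, `inverse`, `star`) and the node name `cq` for the title text are assumptions of the example.

```python
def compose(r, s):
    """Relational composition r.s: x -> z whenever x -r-> y -s-> z."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}

def inverse(r):
    return {(y, x) for (x, y) in r}

def star(r, nodes):
    """Reflexive-transitive closure of r over the given node set."""
    closure = {(n, n) for n in nodes}
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:
            return closure
        closure = bigger

nodes = ["paper", "author", "chandra", "merlin", "title", "cq"]
fc = {("paper", "author"), ("author", "chandra"), ("title", "cq")}
ns = {("author", "title"), ("chandra", "merlin")}

child = compose(fc, star(ns, nodes))
parent = compose(star(inverse(ns), nodes), inverse(fc))
descendant = compose(fc, star(fc | ns, nodes))
```

This set-of-pairs representation is quadratic in the worst case and is meant only to make the regular expressions concrete, not to suggest an efficient evaluation strategy.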
33
Conjunctive queries with axes
CQ: conjunction of τ_U-atoms and of
atoms corresponding to derived axes
Example: nextsibling(X,Z), descendant(Z,U),
ancestor(U,V), label_a(V), child(V,X),
(firstchild.firstchild ∪ firstchild⁻¹)(U,X)
Theorem
Evaluating conjunctive queries with axes over
trees is NP-complete (query complexity)
However: XPath is more akin to acyclic conjunctive
queries!
34
Acyclic conjunctive queries with axes
Theorem
Evaluating acyclic conjunctive queries with axes
over trees is feasible in time O(|data| · |query|)
Proof idea: translate the acyclic query into a monadic
datalog program over τ_U
Query atoms: child(A,X), descendant(X,Y), descendant(Y,Z), label_b(Y), label_a(Z)
35
Acyclic conjunctive queries with axes
Ear: atom which contains an ear variable, i.e., a variable that
otherwise occurs in monadic atoms only. An ear is
definable as a (unary) MSO query and thus
expressible by a monadic datalog program.
Query atoms: child(A,X), descendant(X,Y), descendant(Y,Z), label_b(Y), label_a(Z)
Here descendant(Y,Z) is an ear with ear variable Z.
36
Acyclic conjunctive queries with axes
The ear descendant(Y,Z), label_a(Z) is replaced by d(Y), defined by:
d(Y) ← firstchild(Y,Z), aa(Z).
aa(Z) ← label_a(Z).
aa(Z) ← aa(V), nextsibling(Z,V).
aa(Z) ← aa(V), firstchild(Z,V).
Query atoms: child(A,X), descendant(X,Y), descendant(Y,Z), label_b(Y), label_a(Z)
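The rules for d and aa can be evaluated in a single pass over the nodes in reverse document order, because the first child and the next sibling of a node both come after it in preorder. The Python sketch below illustrates this under assumptions: the function name `has_a_descendant`, the dictionary encoding of the tree, and the five-node example tree are invented for the illustration.

```python
def has_a_descendant(label, firstchild, nextsibling, order):
    """Compute d(Y): Y has a descendant labeled 'a'.

    `order` lists node ids in document (preorder) order. Walking it backwards
    guarantees aa is known for a node's first child and next sibling before
    the node itself is processed, so each node is handled exactly once.
    """
    aa = {}
    for z in reversed(order):
        aa[z] = (label[z] == "a"
                 or aa.get(nextsibling.get(z), False)
                 or aa.get(firstchild.get(z), False))
    # d(Y) <- firstchild(Y,Z), aa(Z).
    return {y: aa.get(firstchild.get(y), False) for y in order}

# Hypothetical tree: x(b(c, a), c), with preorder ids 0..4.
label = {0: "x", 1: "b", 2: "c", 3: "a", 4: "c"}
firstchild = {0: 1, 1: 2}
nextsibling = {1: 4, 2: 3}
d = has_a_descendant(label, firstchild, nextsibling, [0, 1, 2, 3, 4])
```

Note that aa(Z) holds when Z, a right sibling of Z, or anything below them is labeled a, so d(Y) looks only at the subtrees below Y.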
37
Acyclic conjunctive queries with axes
Query atoms after eliminating the ear: child(A,X), descendant(X,Y), d(Y), label_b(Y)
38
Acyclic conjunctive queries with axes
descendant(X,Y) is now an ear atom. Continue eliminating ear atoms until
the query is entirely monadic.
Query atoms: child(A,X), descendant(X,Y), d(Y), label_b(Y)
39
Acyclic Monadic Datalog with Axes
AMX-Datalog: monadic datalog programs whose rule
bodies are acyclic and may contain arbitrary axes
Theorem
Evaluating AMX-Datalog programs over trees is
feasible in time O(|data| · |program|)
Remarks
  • Same bound for stratified AMX-Datalog
  • AMX-Datalog expresses MSO over τ_U
    (both without and with stratification)

40
Core XPath in Linear Time
Corollary
Evaluating Core XPath queries over trees is
feasible in time O(|data| · |query|)
Proof: Linear translation from Core XPath
to stratified Monadic Datalog with axes
41
Core XPath in Linear Time
Example translation:
//paper[author[chandra and not merlin]]/title
output(X) ← root(R), descendant(R,P), label_paper(P), qual1(P),
            child(P,X), label_title(X).
qual1(X) ← child(X,Y), label_author(Y), qual2(Y).
qual2(X) ← child(X,Y), label_chandra(Y), not qual3(X).
qual3(X) ← child(X,M), label_merlin(M).
42
Full XPath in Polynomial Time
Theorem [G., Koch, Pichler, VLDB 2002]
Evaluating full XPath queries over XML
documents is feasible in polynomial time
(combined complexity)
Proof: Extends the logic programming evaluation
paradigm to all nasty features of full XPath.
Implementation (main memory):
XML-Taskforce XPath
To our knowledge the only XPath system that
scales.
43
Combined Complexity of XPath
PODS03, JACM05
44
Data and Query Complexity
  • Theorem. XPath is in L (data complexity).
  • Theorem. PF is L-hard under NC1-reductions (data
    complexity).
  • Theorem. XPath w/o multiplication, concatenation
    is in L w.r.t. query complexity.

[Figure: data complexity of the fragments from PF up to XPath: in L, and L-complete under NC1-reductions]
45
Core XPath and CTL
Straightforward translation from Core XPath with
vertical axes to CTL with past modalities. (On
graphs with child relation; order-independent!)
//paper[author[chandra and merlin]]/title
first normalize to
//title[parent::paper[author[chandra and merlin]]]
title ∧ EX⁻¹(paper ∧ EX(author ∧ EX chandra ∧ EX merlin))
Core XPath requires multimodal CTL X? , X? ,
etc.
46
General conjunctive queries with axes
We know they are NP-complete, but ...
Research programme
  • Find interesting sets of axes for which
    CQs are tractable.
  • Trace the tractability frontier, i.e.,
    determine all maximal sets of axes for which
    CQs are tractable.
  • Extend tractability results to datalog.

PODS 2004 [G., Koch, Schulz]: Solved for all XPath
axes
47
Cyclic Query Example (from Computational Linguistics)
48
Complexity Results
(combined complexity)
(Partition of set of axes!)
49
Some simple tractability results
CQs with τ_U-atoms and additional axis sets
child or child,child can be
answered in time O(|data| · |query|).
  • Proof idea for child:
  • Cycles involving child are either
  • unsatisfiable (easy to check), or
  • rewritable in linear time into acyclic CQs

50
Proof idea for child,child
[Figure: cyclic query Q with variables X (label a), Y (label b), Z (label c), U (label c); data tree T with nodes labeled a, b, c]
51
Proof idea for child,child
[Figure: every node of the data tree T is initially annotated with the full candidate set {X, Y, Z, U}]
53
Proof idea for child,child
[Figure: candidate sets after label filtering: the a-node keeps {X}, the b-node keeps {Y}, each c-node keeps {Z, U}]
U must have an ancestor labeled b!
55
Proof idea for child,child
[Figure: candidate sets after pruning: a-node {X}, b-node {Y}, two c-nodes {Z}, one c-node {Z, U}]
Z must have U as descendant-or-self
58
Proof idea for child,child
[Figure: Reduct(Q,T) with the a-node keeping {X}, the b-node {Y}, and one c-node {Z, U}; labeled "morphism" from Q into T]
Reduct(Q,T): locally arc-consistent!

Lemma: T ⊨ Q iff Reduct(Q,T) is well-labeled
60
Web wrapping
Goal: Make web contents accessible to electronic
data processing
WEB: HTML pages, layout → WRAPPER → Corporate EDP apps: structured
data, databases, XML
Wrappers select, extract, annotate. Monadic
datalog is ideally suited, but who wants to do it?
LiXto: a graphical wrapper generator for ELOG
61

<?xml version="1.0" encoding="UTF-8"?>
<document>
<record>
<number>409449118</number>
<item>98 Degrees - Notebook - New</item>
<picture/>
<price>2.99</price>
<currency></currency>
<bids>-</bids>
</record>
<record>
<number>413171469</number>
<item>Notebook - Compaq Presario 1207</item>
<price>730.00</price>
<currency>AU </currency>
...
62
Lixto Architecture
Visual Wrapper Generator
Web

Example page(s)
63
Elog Program for eBay pages
64
Expressive power of LiXto
Elog-: Monadic kernel of Elog
Theorems [G., Koch, PODS 2002]
Elog- expresses monadic datalog
All of Elog- is graphically programmable via
LiXto
Corollary
LiXto expresses all MSO wrapping tasks.
65
Comparison to other Wrapper Generators
  • Lixto more powerful than
    regular path queries
  • Lixto more powerful than HEL
    (Sahuguet, Azavant)
  • → paper

66
The Lixto Suite
  • Automated navigation to target pages
  • Automated data extraction from target pages
  • Automated data analysis,
    transformation & integration
  • Automated data personalization
  • Automated data delivery

Visual Wrapper
Transformation Server
67
Product Architecture
Transformation Server
LiXto Extraction Engine
68
Marketing Business Intelligence
Marketing Department
Oracle 9
Business Objects report
BI Tool
69
Major Customers of LiXto