PXML: Probabilistic Semistructured Databases - PowerPoint PPT Presentation

1
PXML: Probabilistic Semistructured Databases
  • Edward Hung, Lise Getoor, V.S. Subrahmanian
  • University of Maryland, College Park

2
Outline
  • Motivating example
  • Semistructured data model
  • PXML data model
  • Semantics
  • Interpretation
  • Satisfaction
  • Algebra
  • Related work
  • Future work

3
Motivating Example
  • Surveillance applications monitoring a region of
    battlefield
  • Image processing system identifies vehicles in
    convoys appearing in the region at different times
  • Convoys
  • Timestamp
  • tanks, trucks, etc
  • Uncertainty
  • number of vehicles
  • Category and identity of a vehicle, e.g., a tank?
    T-72?

4
Motivating Example
  • Doppler speed system detects the speed and
    velocity of convoys and infers their possible
    destinations
  • Convoys
  • Timestamp
  • Possible destinations
  • Uncertainty
  • Number of places the convoy will go
  • The name of the places

5
Motivating Example
  • Semistructured data model
  • General hierarchical structure is known.
  • The schema is not fixed
  • Number of vehicles
  • Properties of vehicles
  • Our work stores uncertain information in
    probabilistic environments.

6
Semistructured Data Model
  • Instance S = (V, lch, t, val)
  • lch(o, l): the set of children of o with label l
  • G = (V, lch) is a rooted, directed, edge-labeled
    graph

7
Semistructured Data Model
Time 10
8
Semistructured Data Model
Time 15
9
Semistructured Data Model
  • Example

10
PXML Data Model
  • Uncertainty
  • Existence of sub-objects
  • Number of sub-objects
  • Identity of the sub-objects

11
PXML Data Model
  • Weak instance W = (V, lch, t, val, card)
  • The cardinality constraint card(o, l) gives the
    bounds on the number of sub-objects with edge
    label l connected to the same parent o.

12
PXML Data Model
  • Example
  • Convoy 2 surely has a timestamp
  • card(convoy2, ts) = [1, 1]
  • Convoy 2 may have one to two trucks
  • card(convoy2, truck) = [1, 2]

13
PXML Data Model (Cardinality)
  • Example of cardinality

Weak Instance W = Semistructured Instance + card
14
PXML Data Model
  • Compatible Instances
  • A semistructured instance S = (VS, lchS, tS,
    valS) is compatible with a weak instance W = (VW,
    lchW, tW, valW, card) if
  • (VS, lchS) is a rooted connected graph.
  • If o is a leaf in S, then
  • If o is also a leaf in W, tS(o) = tW(o) and
    valS(o) = valW(o); otherwise, the type and value
    are defined as unknown.
  • Otherwise, card(o, l).min ≤ k ≤ card(o, l).max,
    where k is the number of l-labeled children of o,
    i.e. k = |lchS(o, l)|
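
The cardinality condition above can be sketched in Python. This is only a minimal sketch: it assumes children are stored in a dict keyed by (object, label), and the names `satisfies_cardinality`, `lch_s`, and `card` are illustrative rather than taken from the slides. The rootedness and leaf type/value conditions are not checked here.

```python
def satisfies_cardinality(lch_s, card):
    """Check card(o, l).min <= |lchS(o, l)| <= card(o, l).max.

    lch_s: dict mapping (object, label) -> set of children in the instance S
    card:  dict mapping (object, label) -> (min, max) from the weak instance W
    Only non-leaf objects of S (objects with at least one child) are checked.
    """
    non_leaves = {o for (o, _), kids in lch_s.items() if kids}
    for (o, label), (lo, hi) in card.items():
        if o not in non_leaves:
            continue  # o is absent from S or is a leaf in S: no bound applies
        k = len(lch_s.get((o, label), set()))
        if not lo <= k <= hi:
            return False
    return True


# Cardinality constraints of the running convoy example.
card = {
    ("S1", "convoy"): (2, 2),
    ("convoy1", "ts"): (1, 1),
    ("convoy1", "truck"): (1, 1),
    ("convoy1", "tank"): (1, 1),
    ("convoy2", "ts"): (1, 1),
    ("convoy2", "truck"): (1, 2),
}

# One candidate instance: convoy2 carries a single truck.
candidate = {
    ("S1", "convoy"): {"convoy1", "convoy2"},
    ("convoy1", "ts"): {"ts1"},
    ("convoy1", "truck"): {"truck1"},
    ("convoy1", "tank"): {"tank1"},
    ("convoy2", "ts"): {"ts2"},
    ("convoy2", "truck"): {"truck3"},
}
ok = satisfies_cardinality(candidate, card)
```

Dropping the `("convoy2", "truck")` entry from `candidate` makes the check fail, because convoy2 would then have zero trucks while card(convoy2, truck) requires at least one.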

15
PXML Data Model
  • Example

16
PXML Data Model
  • Example
  • There are surely 2 convoys.
  • card(S, convoy) = [2, 2]
  • Convoy 1 surely has a timestamp, a truck and a
    tank.
  • card(convoy1, ts) = [1, 1]
  • card(convoy1, truck) = [1, 1]
  • card(convoy1, tank) = [1, 1]
  • Convoy 2 surely has a timestamp
  • card(convoy2, ts) = [1, 1]
  • Convoy 2 may have one to two trucks
  • card(convoy2, truck) = [1, 2]

17
PXML Data Model
  • D(W): the set of all semistructured instances
    compatible with a weak instance W

19
PXML Data Model (Weak Instance)
  • Example of a weak instance W

card(S1, convoy) = [2, 2]
card(convoy1, ts) = [1, 1]
card(convoy1, truck) = [1, 1]
card(convoy1, tank) = [1, 1]
card(convoy2, ts) = [1, 1]
card(convoy2, truck) = [1, 2]
20
PXML Data Model
  • Example of an instance compatible with W

card(S1, convoy) = [2, 2]
card(convoy1, ts) = [1, 1]
card(convoy1, truck) = [1, 1]
card(convoy1, tank) = [1, 1]
card(convoy2, ts) = [1, 1]
card(convoy2, truck) = [1, 2]
22
PXML Data Model
  • Potential child set
  • PC(o), the potential child set of a non-leaf
    object o in a weak instance W is
  • the set of all possible sets of children of o
    satisfying the constraint of cardinality

23
PXML Data Model
  • Example
  • Convoy 2 surely has one timestamp, which is
    surely 15. Convoy 2 may have a truck of type mac
    and/or a truck of type rover
  • card(convoy2, truck) = [1, 2]
  • card(convoy2, ts) = [1, 1]
  • PC(convoy2) = { {ts2, truck3}, {ts2, truck4},
    {ts2, truck3, truck4} }
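
A potential child set can be enumerated directly from the cardinality bounds. The following is a minimal sketch, assuming the candidate children of an object are grouped by edge label; the function name `potential_child_sets` and its argument layout are illustrative, not part of the slides.

```python
from itertools import combinations, product


def potential_child_sets(children_by_label, card):
    """Enumerate PC(o): every child set that respects card(o, l) per label.

    children_by_label: dict label -> list of candidate children of o in W
    card:              dict label -> (min, max) bounds for that label
    """
    per_label = []
    for label, kids in children_by_label.items():
        lo, hi = card[label]
        # all subsets of this label's children whose size lies in [lo, hi]
        choices = [frozenset(c)
                   for k in range(lo, min(hi, len(kids)) + 1)
                   for c in combinations(kids, k)]
        per_label.append(choices)
    # pick one subset per label and merge them into a single child set
    return {frozenset().union(*combo) for combo in product(*per_label)}


pc_convoy2 = potential_child_sets(
    {"ts": ["ts2"], "truck": ["truck3", "truck4"]},
    {"ts": (1, 1), "truck": (1, 2)},
)
```

For convoy2 this yields the three child sets listed on the slide.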

24
Potential child set of convoy2:
PC(convoy2) = { {ts2, truck3, truck4},
{ts2, truck3},
{ts2, truck4} }
25
PXML Data Model
  • Probabilistic instance I = (V, lch, t, val, card,
    ipf)
  • The interval probability function ipf(o, c) w.r.t.
    the set PC(o) associates, with each c in PC(o), a
    closed subinterval [lb(c), ub(c)] of [0, 1]

26
PXML Data Model
  • Example
  • PC(convoy2) = { {ts2, truck3}, {ts2, truck4},
    {ts2, truck3, truck4} }
  • ipf(convoy2, {ts2, truck3}) = [0.2, 0.3]
  • ipf(convoy2, {ts2, truck4}) = [0.3, 0.5]
  • ipf(convoy2, {ts2, truck3, truck4}) = [0.2, 0.4]

27
Probabilistic Instance I = Weak Instance W + ipf
ipf(convoy2, {ts2, truck3, truck4}) = [0.2, 0.3]
ipf(convoy2, {ts2, truck3}) = [0.3, 0.5]
ipf(convoy2, {ts2, truck4}) = [0.2, 0.4]
28
PXML Data Model
  • Here the ipf assigns the probability interval to
    each possible set of children.
  • More independence assumptions are possible to
    make the representation more compact
  • e.g. independence between trucks and tanks.
  • e.g. all trucks are indistinguishable.

29
Semantics (Global Interpretation)
  • Interpretation
  • Global interpretation, P
  • a mapping from D(W) (the set of semistructured
    instances compatible with W) to [0, 1] s.t.
    Σ_{S ∈ D(W)} P(S) = 1

30
Figure: the six compatible instances S1a to S1f
P(S1a) = 0.12, P(S1b) = 0.08, P(S1c) = 0.2
P(S1d) = 0.18, P(S1e) = 0.12, P(S1f) = 0.3
31
Semantics (Local Interpretation)
  • An object probability function (OPF) for an object
    o w.r.t. a weak instance W is a mapping
    w: PC(o) → [0, 1] s.t. Σ_{c ∈ PC(o)} w(c) = 1

32
Semantics
  • Example
  • ipf(convoy2, {ts2, truck3}) = [0.2, 0.3]
  • ipf(convoy2, {ts2, truck4}) = [0.3, 0.5]
  • ipf(convoy2, {ts2, truck3, truck4}) = [0.2, 0.4]
  • wconvoy2({ts2, truck3}) = 0.2
  • wconvoy2({ts2, truck4}) = 0.5
  • wconvoy2({ts2, truck3, truck4}) = 0.3

33
Semantics (Local Interpretation)
  • Previously, probabilities were assigned globally
    to each compatible instance.
  • Now we assign probabilities to the possible sets
    of actual children of each non-leaf object in a
    local manner.

34
Object probability function (OPF) for convoy2
w.r.t. W is a mapping w: PC(convoy2) → [0, 1] s.t.
wconvoy2({ts2, truck3, truck4}) = 0.2
wconvoy2({ts2, truck3}) = 0.5
wconvoy2({ts2, truck4}) = 0.3
35
Semantics (Local Interpretation)
  • Interpretation
  • Local interpretation, p
  • a mapping from the set of non-leaf objects to
    OPFs
  • Example
  • p(convoy2) = wconvoy2

36
Semantics (Local → Global)
  • Assume that the probability of any potential
    child of an object o is independent of the
    non-descendants of o.
  • W operator
  • The W operator returns the probabilities assigned
    to every semistructured instance compatible with a
    given weak instance, consistent with a given local
    interpretation.
  • Given a semistructured instance S compatible with
    a weak instance W and a local interpretation p
    for W
  • W(p)(S) = Π_{o ∈ S} p(o)(CS(o)), where CS(o) is
    the set of children of o in S
  • Theorem
  • W(p) is a global interpretation for W

37
Semantics
  • Example
  • ipf(S1, {convoy1, convoy2}) = [1, 1]
  • wS1({convoy1, convoy2}) = 1
  • ipf(convoy1, {ts1, truck1, tank1}) = [0.2, 0.6]
  • ipf(convoy1, {ts1, truck1, tank2}) = [0.4, 0.8]
  • wconvoy1({ts1, truck1, tank1}) = 0.4
  • wconvoy1({ts1, truck1, tank2}) = 0.6
  • ipf(convoy2, {ts2, truck3}) = [0.2, 0.3]
  • ipf(convoy2, {ts2, truck4}) = [0.3, 0.5]
  • ipf(convoy2, {ts2, truck3, truck4}) = [0.2, 0.4]
  • wconvoy2({ts2, truck3}) = 0.2
  • wconvoy2({ts2, truck4}) = 0.5
  • wconvoy2({ts2, truck3, truck4}) = 0.3

38
Semantics
  • Example
  • W(p)(S1a)
  • = p(S1)({convoy1, convoy2}) x p(convoy1)({ts1,
    truck1, tank1}) x p(convoy2)({ts2, truck3,
    truck4})
  • = wS1({convoy1, convoy2}) x wconvoy1({ts1,
    truck1, tank1}) x wconvoy2({ts2, truck3, truck4})
  • = 1 x 0.4 x 0.3
  • = 0.12

39
Semantics
wS1({convoy1, convoy2}) = 1
wconvoy1({ts1, truck1, tank1}) = 0.4
wconvoy2({ts2, truck3, truck4}) = 0.3
  • W(p)(S1a)
    = p(S1)({convoy1, convoy2}) x p(convoy1)({ts1,
    truck1, tank1}) x p(convoy2)({ts2, truck3,
    truck4})
    = wS1({convoy1, convoy2}) x wconvoy1({ts1,
    truck1, tank1}) x wconvoy2({ts2, truck3, truck4})
    = 1 x 0.4 x 0.3 = 0.12
40
Semantics
  • Example
  • Similarly, we can get
  • W(p)(S1a) = 0.12
  • W(p)(S1b) = 0.08
  • W(p)(S1c) = 0.2
  • W(p)(S1d) = 0.18
  • W(p)(S1e) = 0.12
  • W(p)(S1f) = 0.3
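
The six values can be reproduced with a short Python sketch of the W operator. This is an illustration under stated assumptions: the dictionary layout and the name `w_operator` are hypothetical, exact fractions are used to avoid floating-point noise, and the enumeration relies on the fact that in this example every non-leaf object occurs in every compatible instance.

```python
from fractions import Fraction
from itertools import product

F = Fraction  # exact arithmetic keeps 0.4 x 0.3 equal to 0.12

# Local interpretation p: non-leaf object -> OPF (child set -> probability)
p = {
    "S1": {frozenset({"convoy1", "convoy2"}): F(1)},
    "convoy1": {
        frozenset({"ts1", "truck1", "tank1"}): F("0.4"),
        frozenset({"ts1", "truck1", "tank2"}): F("0.6"),
    },
    "convoy2": {
        frozenset({"ts2", "truck3"}): F("0.2"),
        frozenset({"ts2", "truck4"}): F("0.5"),
        frozenset({"ts2", "truck3", "truck4"}): F("0.3"),
    },
}


def w_operator(p, instance):
    """W(p)(S): product of p(o)(C_S(o)) over the non-leaf objects o of S.

    instance: dict non-leaf object -> frozenset of its children in S
    """
    prob = F(1)
    for o, children in instance.items():
        prob *= p[o][children]
    return prob


# Here every non-leaf object is always present, so the compatible instances
# are exactly the combinations of one child set per non-leaf object.
objects = list(p)
global_interp = {
    combo: w_operator(p, dict(zip(objects, combo)))
    for combo in product(*(p[o] for o in objects))
}
```

The resulting six probabilities are 0.08, 0.12, 0.12, 0.18, 0.2 and 0.3, and they sum to 1, as a global interpretation requires.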

41
Semantics (Global → Local)
  • (Same assumption) The probability of any
    potential child of an object o is independent of
    the non-descendants of o.
  • Given a global interpretation P for a weak
    instance W
  • P satisfies W iff P(c | o, ndes(o)) = P(c | o)
    for every non-leaf object o and every c in PC(o)
  • ndes(o) is the set of non-descendants of o.

42
Semantics (Global → Local)
  • D operator
  • The D operator returns the probabilities assigned
    to each possible set of children of every non-leaf
    object, consistent with a given global
    interpretation.
  • Given a global interpretation P that satisfies a
    weak instance W, for any non-leaf object o and any
    c in PC(o),
    wP,o(c) = (Σ of P(S) over S in D(W) with o in S and CS(o) = c)
              / (Σ of P(S) over S in D(W) with o in S)
  • D(P) returns a function defined as follows: for
    any non-leaf object o, D(P)(o) = wP,o

43
Semantics (Global → Local)
  • Theorem
  • D(P) is a local interpretation for W
  • Example
  • Derive D(P)(convoy2)

44
Figure: the six compatible instances S1a to S1f
P(S1a) = 0.12, P(S1b) = 0.08, P(S1c) = 0.2
P(S1d) = 0.18, P(S1e) = 0.12, P(S1f) = 0.3
D(P)(convoy2) = wP,convoy2
  • wP,convoy2({ts2, truck3, truck4})
    = (0.12 + 0.18)/1 = 0.3

45
D(P)(convoy2) = wP,convoy2
  • wP,convoy2({ts2, truck3, truck4})
    = (0.12 + 0.18)/1 = 0.3
  • wP,convoy2({ts2, truck3}) = (0.08 + 0.12)/1 = 0.2
  • wP,convoy2({ts2, truck4}) = (0.2 + 0.3)/1 = 0.5
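
The derivation of D(P)(convoy2) amounts to marginalizing the global interpretation over the child set of convoy2. The sketch below is illustrative only: the name `d_operator_for` and the data layout are assumptions, and the child set attached to each instance is read off from which probabilities the slide adds together.

```python
from fractions import Fraction as F

# Global interpretation P, with the child set of convoy2 in each instance.
BOTH = frozenset({"ts2", "truck3", "truck4"})
T3 = frozenset({"ts2", "truck3"})
T4 = frozenset({"ts2", "truck4"})
instances = {
    "S1a": (F("0.12"), BOTH), "S1b": (F("0.08"), T3), "S1c": (F("0.2"), T4),
    "S1d": (F("0.18"), BOTH), "S1e": (F("0.12"), T3), "S1f": (F("0.3"), T4),
}


def d_operator_for(instances):
    """Compute w_{P,o}(c) = P(children of o are c) / P(o occurs) for one o.

    instances: name -> (P(S), child set of o in S, or None if o is not in S)
    """
    present = [(prob, kids) for prob, kids in instances.values()
               if kids is not None]
    total = sum(prob for prob, _ in present)  # equals 1 in this example
    opf = {}
    for prob, kids in present:
        opf[kids] = opf.get(kids, F(0)) + prob / total
    return opf


w_convoy2 = d_operator_for(instances)
```

The result assigns 0.3, 0.2 and 0.5 to the three child sets, matching the OPF used to build P in the first place.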

47
Semantics (Local ↔ Global)
  • Theorems
  • Suppose p is a local interpretation for a weak
    instance W, then D(W(p)) = p.
  • Suppose P is a global interpretation that
    satisfies a weak instance W, then W(D(P)) = P.

48
Semantics (Satisfaction)
  • Given a probabilistic instance I, a non-leaf
    object o,
  • OC(o), the object constraints, are
    lb(c) ≤ p(c) ≤ ub(c) for every c in PC(o), and
    Σ_{c ∈ PC(o)} p(c) = 1
  • p(c) is a real-valued variable denoting the
    probability that c is the actual set of children
    of o.

49
Semantics (Satisfaction)
  • Example
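  • ipf(convoy2, {ts2, truck3}) = [0.2, 0.3]
  • ipf(convoy2, {ts2, truck4}) = [0.3, 0.5]
  • ipf(convoy2, {ts2, truck3, truck4}) = [0.2, 0.4]
  • OC(convoy2):
    0.2 ≤ p({ts2, truck3}) ≤ 0.3
    0.3 ≤ p({ts2, truck4}) ≤ 0.5
    0.2 ≤ p({ts2, truck3, truck4}) ≤ 0.4
    p({ts2, truck3}) + p({ts2, truck4})
    + p({ts2, truck3, truck4}) = 1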

50
Semantics (Local Satisfaction)
  • An OPF w satisfies a non-leaf object o iff w is a
    probability distribution w.r.t. PC(o) over ipf.
  • A local interpretation p satisfies a non-leaf
    object o iff p(o) satisfies o.
  • A local interpretation p satisfies a
    probabilistic instance I iff p satisfies every
    non-leaf object of I.
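
Local satisfaction for a single object can be checked mechanically. The sketch below reads "a probability distribution w.r.t. PC(o) over ipf" as: the OPF is defined on exactly the sets in PC(o), sums to 1, and stays inside each interval. That reading, the name `opf_satisfies`, and the data layout are assumptions made for illustration.

```python
from fractions import Fraction as F


def opf_satisfies(opf, ipf):
    """True iff opf sums to 1 and lb(c) <= opf(c) <= ub(c) for every c."""
    if set(opf) != set(ipf):
        return False  # opf must be defined on exactly the sets in PC(o)
    if sum(opf.values()) != 1:
        return False
    return all(ipf[c][0] <= opf[c] <= ipf[c][1] for c in ipf)


T3 = frozenset({"ts2", "truck3"})
T4 = frozenset({"ts2", "truck4"})
BOTH = frozenset({"ts2", "truck3", "truck4"})

# Intervals and OPF for convoy2 from the running example.
ipf_convoy2 = {
    T3: (F("0.2"), F("0.3")),
    T4: (F("0.3"), F("0.5")),
    BOTH: (F("0.2"), F("0.4")),
}
w_convoy2 = {T3: F("0.2"), T4: F("0.5"), BOTH: F("0.3")}
satisfied = opf_satisfies(w_convoy2, ipf_convoy2)
```

An OPF such as 0.4, 0.3, 0.3 also sums to 1 but is rejected, because 0.4 lies outside the interval [0.2, 0.3] for {ts2, truck3}.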

51
Semantics (Global Satisfaction)
  • A global interpretation P satisfies a
    probabilistic instance I iff D(P) satisfies I.
  • Corollary
  • A local interpretation p satisfies a
    probabilistic instance I iff W(p) satisfies I.

52
Semantics (Consistency)
  • A probabilistic instance is locally consistent
    iff there is a local interpretation that
    satisfies it.
  • A probabilistic instance is globally consistent
    iff there is a global interpretation that
    satisfies it.
  • Theorem
  • Every probabilistic instance is locally and
    globally consistent.

53
Algebra
  • Operators
  • Projection
  • Selection
  • Cross-product
  • Path expression
  • o.l1.l2. … .ln
  • Example: S1.convoy.truck
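
On a plain semistructured instance, a path expression can be evaluated by following the edge labels one step at a time. This is a minimal sketch, assuming that a path expression denotes the set of objects reachable from the start object along the given labels; the name `eval_path` and the dict layout for lch are illustrative.

```python
def eval_path(lch, start, labels):
    """Return the objects reached from `start` by following `labels` in order.

    lch: dict mapping (object, label) -> set of children with that label
    """
    frontier = {start}
    for label in labels:
        frontier = {child
                    for o in frontier
                    for child in lch.get((o, label), set())}
    return frontier


# A fragment of the convoy example.
lch = {
    ("S1", "convoy"): {"convoy1", "convoy2"},
    ("convoy1", "truck"): {"truck1"},
    ("convoy2", "truck"): {"truck3", "truck4"},
}
trucks = eval_path(lch, "S1", ["convoy", "truck"])
```

With this fragment, S1.convoy.truck evaluates to truck1, truck3 and truck4.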
54
Algebra (Projection)
  • Ancestor projection
  • Descendant projection
  • Single projection

55
Algebra (Projection)
Semistructured Instance
  • Ancestor projection ( )

56
Weak Instance
  • Ancestor projection ( )

57
Probabilistic Instance
  • Ancestor projection ( )

Before projection:
card(I2, convoy) = [1, 1]
ipf(I2, {convoy1}) = [1, 1]
card(convoy1, ts) = [1, 1]
card(convoy1, truck) = [1, 1]
card(convoy1, tank) = [1, 1]
ipf(convoy1, {ts1, truck1, tank1}) = [0, 0.3]
ipf(convoy1, {ts1, truck1, tank2}) = [0.1, 0.4]
ipf(convoy1, {ts1, truck2, tank1}) = [0.3, 0.5]
ipf(convoy1, {ts1, truck2, tank2}) = [0.3, 0.6]
After projection:
card(I2, convoy) = [1, 1]
ipf(I2, {convoy1}) = [1, 1]
card(convoy1, truck) = [1, 1]
Children of convoy1 before: CI2(convoy1) = {ts1,
truck1, truck2, tank1, tank2}
Children of convoy1 after: C'I2(convoy1) = {truck1,
truck2}
Let Cd = CI2(convoy1) − C'I2(convoy1) = {ts1,
tank1, tank2}
PC'(convoy1) = { {truck1}, {truck2} }
58
Probabilistic Instance
  • Ancestor projection ( )

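For each c in the new PC(convoy1),
ipf(convoy1, c) = [a, min(1, b)], where a and b are
the sums of the lower and upper bounds of the
original child sets that project onto c
ipf(convoy1) ← tight(ipf(convoy1))
(Dekhtyar, Goldsmith 2002)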
59
Probabilistic Instance
  • Ancestor projection ( )

For {truck1},
a = 0.0 + 0.1 = 0.1
b = 0.3 + 0.4 = 0.7
ipf(convoy1, {truck1}) = [0.1, min(1, 0.7)]
= [0.1, 0.7]
60
Probabilistic Instance
  • Ancestor projection ( )

For {truck2},
a = 0.3 + 0.3 = 0.6
b = 0.5 + 0.6 = 1.1
ipf(convoy1, {truck2}) = [0.6, min(1, 1.1)]
= [0.6, 1]
61
Probabilistic Instance
  • Ancestor projection ( )

ipf(convoy1) ← tight(ipf(convoy1))
Before tight:
ipf(convoy1, {truck1}) = [0.1, 0.7]
ipf(convoy1, {truck2}) = [0.6, 1]
After tight:
ipf(convoy1, {truck1}) = [0.1, 0.4]
ipf(convoy1, {truck2}) = [0.6, 0.9]
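
The interval arithmetic of this ancestor projection example can be sketched in Python. The function `project_ipf` follows the [a, min(1, b)] rule shown on the slides. The body of `tight` is an assumption: it is the usual tightening of interval constraints under a sum-to-one requirement, not a formula stated in the slides, although it does reproduce the slide's numbers. All names are illustrative.

```python
from fractions import Fraction as F


def project_ipf(ipf, keep):
    """Merge child sets that agree on `keep`; sum bounds, cap the upper at 1."""
    out = {}
    for children, (lb, ub) in ipf.items():
        c = children & keep  # the part of the child set that survives
        a, b = out.get(c, (F(0), F(0)))
        out[c] = (a + lb, b + ub)
    return {c: (a, min(F(1), b)) for c, (a, b) in out.items()}


def tight(ipf):
    """Shrink each interval to values reachable by a distribution summing to 1.

    Assumed rule: lb' = max(lb, 1 - sum of the other upper bounds),
                  ub' = min(ub, 1 - sum of the other lower bounds).
    """
    sum_lb = sum(lb for lb, _ in ipf.values())
    sum_ub = sum(ub for _, ub in ipf.values())
    return {
        c: (max(lb, 1 - (sum_ub - ub)), min(ub, 1 - (sum_lb - lb)))
        for c, (lb, ub) in ipf.items()
    }


ipf_convoy1 = {
    frozenset({"ts1", "truck1", "tank1"}): (F("0"), F("0.3")),
    frozenset({"ts1", "truck1", "tank2"}): (F("0.1"), F("0.4")),
    frozenset({"ts1", "truck2", "tank1"}): (F("0.3"), F("0.5")),
    frozenset({"ts1", "truck2", "tank2"}): (F("0.3"), F("0.6")),
}
projected = project_ipf(ipf_convoy1, frozenset({"truck1", "truck2"}))
tightened = tight(projected)
```

Here `projected` holds [0.1, 0.7] for {truck1} and [0.6, 1] for {truck2}, and `tightened` narrows them to [0.1, 0.4] and [0.6, 0.9].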
62
  • Ancestor projection ( )

card(S1, convoy) = [2, 2]
wS1({convoy1, convoy2}) = 1
card(convoy1, ts) = [1, 1]
card(convoy1, truck) = [1, 1]
card(convoy1, tank) = [1, 1]
wconvoy1({ts1, truck1, tank1}) = 0.4
wconvoy1({ts1, truck1, tank2}) = 0.6
card(convoy2, ts) = [1, 1]
card(convoy2, truck) = [1, 2]
wconvoy2({ts2, truck3}) = 0.2
wconvoy2({ts2, truck4}) = 0.5
wconvoy2({ts2, truck3, truck4}) = 0.3
63
Algebra (Projection)
  • Descendant projection ( )

card(I3, truck) = [0, 3]; ipf(I3, c) = [0, 1]
  • One naive strategy
  • Our better strategy: similar to the one used in
    cross product
64
Algebra (Projection)
  • Single projection ( )

card(I3, truck) = [0, 3]; ipf(I3, c) = [0, 1]
65
Algebra (Projection)
  • Equivalence

Equivalent
66
Algebra (Projection)
  • Equivalence

Equivalent
67
Algebra (Projection)
  • Equivalence

Equivalent
e1 and e2 are each a sequence of zero or more
edges. Thus, I.e1.lm can include I.lm, I.l1.lm,
I.l2.l3.lm, etc.
68
In general non-equivalent
69
Algebra (Selection) ( )
  • Similar to ancestor projection
  • Path expression specifies leaf objects with a
    specified value.

70
Algebra (Selection)
Semistructured Instance
I1
71
Algebra (Selection) ( )
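card(I7, convoy) = [1, 2]; wI7({convoy1}) = 0.2,
wI7({convoy2}) = 0.5, wI7({convoy1, convoy2}) = 0.3
card(convoy1, tank) = [1, 1]; wconvoy1({tank1}) = 0.3,
wconvoy1({tank2}) = 0.7
card(convoy2, tank) = [1, 1]; wconvoy2({tank2}) = 0.4,
wconvoy2({tank3}) = 0.6
Figure: the eight instances in D(I7), with probabilities
0.06, 0.14, 0.2, 0.3, 0.036, 0.054, 0.084, 0.126
The five selected instances have total probability
0.14 + 0.3 + 0.054 + 0.036 + 0.084 = 0.614;
each of their probabilities is divided by 0.614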
72
Algebra (Selection) ( )
card(I7, convoy) = [1, 2]; ipf(I7, {convoy1}) = [0.1, 0.3],
ipf(I7, {convoy2}) = [0.4, 0.6],
ipf(I7, {convoy1, convoy2}) = [0.2, 0.4]
card(convoy1, tank) = [1, 1]; ipf(convoy1, {tank1}) = [0.2, 0.4],
ipf(convoy1, {tank2}) = [0.6, 0.8]
card(convoy2, tank) = [1, 1]; ipf(convoy2, {tank2}) = [0.3, 0.5],
ipf(convoy2, {tank3}) = [0.5, 0.7]
Figure: the instances in D(I7), with probability intervals
[0.012, 0.08], [0.02, 0.12], [0.02, 0.112], [0.06, 0.24],
[0.036, 0.16], [0.08, 0.24], [0.24, 0.48], [0.06, 0.224]
Conditionalization of interval probabilities
(Dekhtyar, Goldsmith 2002)
73
Algebra (Cross product (x))
  • Probabilistic conjunction strategies
  • Example
  • Ignorance
  • Positive correlation
  • Negative Correlation
  • Independence

74
Algebra (Cross product (x))
card(I4, truck) = [1, 1]; ipf(I4, {truck1}) = [0.2, 0.7],
ipf(I4, {truck2}) = [0.3, 0.8]
card(I5, tank) = [1, 1]; ipf(I5, {tank1}) = [0.1, 0.6],
ipf(I5, {tank2}) = [0.4, 0.9]
card(I6, truck) = [1, 1]; card(I6, tank) = [1, 1]
I4 x I5
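
The slides name four conjunction strategies without giving their formulas. The sketch below uses the interval formulas commonly given for these strategies in the probabilistic database literature; treating them as the ones intended here is an assumption, and the function names are illustrative. Each function combines two probability intervals (lower, upper) into the interval for the conjunction.

```python
from fractions import Fraction as F


def conj_ignorance(x, y):
    """Nothing is assumed about how the two events depend on each other."""
    (l1, u1), (l2, u2) = x, y
    return (max(F(0), l1 + l2 - 1), min(u1, u2))


def conj_positive(x, y):
    """Positive correlation: one event implies the other."""
    (l1, u1), (l2, u2) = x, y
    return (min(l1, l2), min(u1, u2))


def conj_negative(x, y):
    """Negative correlation: the events overlap as little as possible."""
    (l1, u1), (l2, u2) = x, y
    return (max(F(0), l1 + l2 - 1), max(F(0), u1 + u2 - 1))


def conj_independence(x, y):
    """Independent events: bounds multiply."""
    (l1, u1), (l2, u2) = x, y
    return (l1 * l2, u1 * u2)


# ipf(I4, {truck1}) and ipf(I5, {tank1}) from the example above.
truck1 = (F("0.2"), F("0.7"))
tank1 = (F("0.1"), F("0.6"))
by_strategy = {
    "ignorance": conj_ignorance(truck1, tank1),
    "positive": conj_positive(truck1, tank1),
    "negative": conj_negative(truck1, tank1),
    "independence": conj_independence(truck1, tank1),
}
```

Under independence, for instance, the interval for the child set {truck1, tank1} in the product would be [0.02, 0.42].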
78
Algebra (Cross product)
  • Equivalence
  • (I1 x I2) x I3
  • I1 x (I2 x I3)
  • (I1 x I3) x I2

Equivalent
79
Related Work
  • Semistructured Probabilistic Objects (SPOs)
    (Dekhtyar, Goldsmith, Hawkes, 2001)
  • SPOs express probabilistic information in a
    semistructured manner
  • PXML data model stores XML data AND probabilistic
    information.

80
Related Work
  • Algebras: TAX, SAL
  • TAX (Jagadish, Lakshmanan, Srivastava, 2001)
  • use pattern tree to extract subsets of nodes, one
    for each embedding of pattern tree.
  • fixed number of children
  • SAL (Beeri, Tzaban, 1999)
  • bind objects to variables
  • original structure is totally lost

81
Future Work
  • Implement the system
  • Query optimization

82
Summary
  • PXML data model
  • Semistructured instance
  • Weak instance (add cardinality)
  • Probabilistic instance (add ipf)
  • Semantics
  • Local and Global
  • Interpretation
  • Satisfaction
  • Algebra
  • Projections, selection, cross product

87
Related Work
  • Bayesian networks (Pearl, 1988)
  • random variables (probability of events)
  • ours: the existence of children requires the
    existence of their parents