Probabilistic RDF - PowerPoint PPT Presentation

1
Probabilistic RDF
  • Octavian Udrea1
  • V.S. Subrahmanian1
  • Zoran Majkic2
  • 1University of Maryland College Park
  • 2University La Sapienza, Rome, Italy

2
Motivation
  • Not all information on the Web is easily
    expressible in classic models (e.g.,
    relational)
  • RDF extraction from text
  • STORY is the first, very successful prototype
  • Need to extend RDF with temporal and uncertainty
    components
  • Goal: build a logical model of RDF with
    uncertainty and provide query algorithms

3
The Probabilistic RDF idea
  • An RDF theory is a set of triples (subject,
    property, value)
  • (USA hasCapital Washington DC),
  • (Washington DC hasPopulation 500,000)
  • Probabilistic RDF extends this model with
    uncertainty over the set of values.
  • (USA hasCapital (Washington DC, 0.95), (State of
    Washington, 0.05))

4
Probabilistic RDF example
Extracted based on www.wrongdiagnosis.com
5
Probabilistic RDF example
6
Probabilistic RDF example
7
Probabilistic RDF example
8
Probabilistic RDF syntax
  • Schema uncertainty
  • (c subClassOf (C, $\delta$))
  • $\sum_{d \in C} \delta(d) \le 1$
  • Class-instance uncertainty
  • (x rdf:type (C, $\delta$))
  • $\sum_{d \in C} \delta(d) \le 1$
  • Instance-based uncertainty
  • (x p (Y, $\delta$))
  • $\sum_{y \in Y} \delta(y) \le 1$
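
As a rough illustration of the syntax above, a probabilistic triple can be held as a subject, a property, and a map from candidate values to probabilities whose total is at most 1. The class name `PTriple` and its field names are assumptions made for this sketch and do not come from the slides; the example values are the hasCapital triple from slide 3.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PTriple:
    """Hypothetical container for one pRDF triple (s, p, (V, delta))."""
    subject: str
    prop: str
    values: Dict[str, float]  # candidate value -> probability (delta)

    def __post_init__(self) -> None:
        # Every probability must lie in [0, 1] ...
        if any(not 0.0 <= p <= 1.0 for p in self.values.values()):
            raise ValueError("probabilities must lie in [0, 1]")
        # ... and the probabilities must sum to at most 1 (small float tolerance).
        if sum(self.values.values()) > 1.0 + 1e-9:
            raise ValueError("probabilities must sum to at most 1")


# The hasCapital example from slide 3.
capital = PTriple("USA", "hasCapital",
                  {"Washington DC": 0.95, "State of Washington": 0.05})
```

The sum is allowed to fall below 1, which leaves room for values that are not listed.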

9
Probabilistic RDF syntax
  • Sanity requirements
  • (c subClassOf ($C_1$, $\delta_1$)), (c subClassOf ($C_2$, $\delta_2$))
    $\Rightarrow$ ($C_1 = C_2$ and $\delta_1 = \delta_2$) or $C_1 \cap C_2 = \emptyset$
  • Same applies for other types of uncertainty
  • Transitive properties
  • Simple inferential capability
  • Examples: associatedWith, controlledBy
  • P-path
  • A set of triples connected by transitive
    properties
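
The sanity requirement on this slide can be checked mechanically: two triples with the same subject and property must carry either the same value set and distribution, or disjoint value sets. The sketch below assumes triples are given as (subject, property, distribution) tuples; the function name `is_sane` and the sample class names are hypothetical.

```python
from typing import Dict, List, Tuple

Dist = Dict[str, float]          # value -> probability
Triple = Tuple[str, str, Dist]   # (subject, property, distribution)


def is_sane(triples: List[Triple]) -> bool:
    """Check the same-or-disjoint condition for each (subject, property)."""
    seen: Dict[Tuple[str, str], List[Dist]] = {}
    for s, p, dist in triples:
        earlier = seen.setdefault((s, p), [])
        for other in earlier:
            same = other == dist                      # same values and probabilities
            disjoint = not (set(other) & set(dist))   # no shared value
            if not (same or disjoint):
                return False
        earlier.append(dist)
    return True


# Hypothetical class names, used only to exercise the check.
ok = [("c", "subClassOf", {"A": 0.6}), ("c", "subClassOf", {"B": 0.3})]
bad = [("c", "subClassOf", {"A": 0.6}), ("c", "subClassOf", {"A": 0.2, "B": 0.3})]
print(is_sane(ok), is_sane(bad))
```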

10
Example p-path
11
P-path semantics and t-norms
  • We cannot generally assume independence between
    triples on a transitive path
  • Flu, AcuteBronchitis, Pneumonia
  • T-norms are used to express the user's knowledge
    of the relationship between triples
  • $\otimes$ is associative and commutative
  • $0 \otimes x = 0$, $1 \otimes x = x$
  • $x \le y,\ z \le w \Rightarrow x \otimes z \le y \otimes w$
  • P-path probability: the t-norm applied to the individual
    probabilities on the path
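
To make the p-path probability concrete, the sketch below folds a t-norm over the probabilities along a path. The three t-norms shown (product, minimum, Łukasiewicz) are standard textbook examples; the slides name only the product t-norm, so the other two are included here as an assumption about what a user might choose. The function names are hypothetical.

```python
from functools import reduce
from typing import Callable, Iterable

TNorm = Callable[[float, float], float]


def product(x: float, y: float) -> float:
    return x * y


def minimum(x: float, y: float) -> float:
    return min(x, y)


def lukasiewicz(x: float, y: float) -> float:
    return max(0.0, x + y - 1.0)


def path_probability(probs: Iterable[float], tnorm: TNorm) -> float:
    """Combine the probabilities on a path; 1 is the identity of every t-norm."""
    return reduce(tnorm, probs, 1.0)


# Probabilities 0.7 and 0.65 are the ones used in the Flu example in these slides.
for name, t in [("product", product), ("minimum", minimum),
                ("lukasiewicz", lukasiewicz)]:
    print(name, round(path_probability([0.7, 0.65], t), 3))
```

Under the product t-norm this reproduces the 0.455 quoted on the next slide; the other two t-norms give different values for the same path, which is why the choice of t-norm encodes an assumption about how the triples depend on each other.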

12
Example p-path
(Flu, associatedWith, (Pneumonia, 0.455)) w.r.t.
the product t-norm
13
pRDF semantics
  • A world W is a set of simple triples (with no
    probabilities)
  • An interpretation I associates a probability to
    each world
  • I satisfies a pRDF theory iff
  • For each (s, p, (V, $\delta$)) and each $v \in V$:
    $\delta(v) \le \sum_{W \ni (s, p, v)} I(W)$, summing over the worlds W
    that contain (s, p, v)
  • The same applies to paths w.r.t. a given t-norm
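
The satisfaction condition for a single triple can be checked directly on a small example. In this sketch a world is a frozenset of simple triples and an interpretation is a dict from worlds to probabilities; these representations and the name `satisfies` are assumptions for illustration, and the path condition is not covered.

```python
from typing import Dict, FrozenSet, Tuple

SimpleTriple = Tuple[str, str, str]
World = FrozenSet[SimpleTriple]


def satisfies(interp: Dict[World, float], s: str, p: str,
              dist: Dict[str, float], tol: float = 1e-9) -> bool:
    """True iff delta(v) <= total probability of worlds containing (s, p, v)."""
    for v, prob in dist.items():
        mass = sum(pr for world, pr in interp.items() if (s, p, v) in world)
        if prob > mass + tol:
            return False
    return True


# Two worlds built from the hasCapital example on slide 3.
w_dc = frozenset({("USA", "hasCapital", "Washington DC")})
w_state = frozenset({("USA", "hasCapital", "State of Washington")})
dist = {"Washington DC": 0.95, "State of Washington": 0.05}

print(satisfies({w_dc: 0.95, w_state: 0.05}, "USA", "hasCapital", dist))
print(satisfies({w_dc: 0.5, w_state: 0.5}, "USA", "hasCapital", dist))
```

The second interpretation fails because the worlds containing (USA hasCapital Washington DC) carry only 0.5 of the probability mass, below the required 0.95.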

14
pRDF semantics
  • A theory is consistent iff it has a satisfying
    interpretation
  • Every pRDF theory is consistent
  • Entailment: T entails T' iff every satisfying
    interpretation of T satisfies T'
  • Closure of a theory: the entire set of triples
    entailed by the theory
  • Maximal w.r.t. the probability values

15
pRDF fixpoint semantics
  • The closure operator adds exactly one entailed
    triple at each step
  • (Flu associatedWith (Acute Bronchitis, 0.7)) and
  • (Acute Bronchitis associatedWith (Pneumonia,
    0.65)) yield
  • (Flu associatedWith (Pneumonia, 0.455))
  • w.r.t. the product t-norm
  • The operator has a fixpoint, which is the theory closure.
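
A minimal sketch of the fixpoint idea for a single transitive property follows. It is a simplification under stated assumptions: triples are stored as a dict from (subject, value) pairs to probabilities, each pass may add several triples rather than exactly one, and only the highest probability found for each pair is kept. The function name `closure` is hypothetical, and this is not the operator as defined by the authors.

```python
from typing import Callable, Dict, Tuple

Edge = Tuple[str, str]  # (subject, value) for one fixed transitive property


def closure(edges: Dict[Edge, float],
            tnorm: Callable[[float, float], float]) -> Dict[Edge, float]:
    """Repeatedly chain triples until no probability can be raised."""
    closed = dict(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), p_ab in list(closed.items()):
            for (c, d), p_cd in list(closed.items()):
                if b != c or a == d:      # must chain, and skip self-loops
                    continue
                p = tnorm(p_ab, p_cd)
                if p > closed.get((a, d), 0.0):
                    closed[(a, d)] = p    # keep the maximal probability
                    changed = True
    return closed


base = {("Flu", "Acute Bronchitis"): 0.7,
        ("Acute Bronchitis", "Pneumonia"): 0.65}
result = closure(base, lambda x, y: x * y)   # product t-norm
print(round(result[("Flu", "Pneumonia")], 3))
```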

16
pRDF query processing
  • We will consider only simple queries: a triple
    with a variable term
  • Example: (? associatedWith Pneumonia, 0.4)
  • What is associated with Pneumonia with
    probability above .4?
  • Simple method
  • Compute the closure
  • Select any triple in the closure that matches the
    query
  • VERY expensive computationally
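
Once a closure is available, the simple method reduces to a filter. The sketch below assumes the closure has already been computed and is given as a dict from (subject, property, value) to probability; the name `naive_query` is hypothetical. The entries are the three associatedWith triples from the fixpoint slide.

```python
from typing import Dict, Tuple

Closure = Dict[Tuple[str, str, str], float]


def naive_query(closed: Closure, prop: str, value: str,
                threshold: float) -> Dict[str, float]:
    """Answer (? prop value, threshold) by scanning the whole closure."""
    return {s: pr for (s, p, v), pr in closed.items()
            if p == prop and v == value and pr > threshold}


closed = {
    ("Flu", "associatedWith", "Acute Bronchitis"): 0.7,
    ("Acute Bronchitis", "associatedWith", "Pneumonia"): 0.65,
    ("Flu", "associatedWith", "Pneumonia"): 0.455,
}
answers = naive_query(closed, "associatedWith", "Pneumonia", 0.4)
print(sorted(answers))
```

The scan itself is cheap; the cost noted on the slide comes from materializing the full closure before any query is asked.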

17
pRDF query processing
  • Set of algorithms for answering simple queries
    and conjunctions
  • pRDF_Subject, pRDF_Property, ..., pRDF_conjunction
  • Central idea
  • Apply the closure operator only in those directions that yield
    tuples relevant to the query
  • Cut off path computations when the threshold can
    no longer be reached.
  • min?(current_probability, threshold)
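
The central idea can be sketched as a backward, best-first search from the queried value that stops extending a path once its probability is no longer above the threshold. Pruning is safe because a t-norm never increases a value: $x \otimes y \le x \otimes 1 = x$. This is not the authors' pRDF_Subject algorithm, only an illustration of the cut-off; the name `subjects_above` and the "Cold" triple are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]  # (subject, value) for one fixed transitive property


def subjects_above(edges: Dict[Edge, float], target: str, threshold: float,
                   tnorm: Callable[[float, float], float]) -> Dict[str, float]:
    """Subjects linked to `target` by a p-path with probability > threshold."""
    incoming: Dict[str, List[Tuple[str, float]]] = {}
    for (s, v), p in edges.items():
        incoming.setdefault(v, []).append((s, p))

    best: Dict[str, float] = {}
    heap = [(-1.0, target)]              # max-heap via negated probabilities
    while heap:
        neg_p, node = heapq.heappop(heap)
        if node in best:                 # already settled with a higher value
            continue
        best[node] = -neg_p
        for s, p_edge in incoming.get(node, []):
            p = tnorm(p_edge, best[node])
            if p > threshold and s not in best:   # cut off hopeless paths
                heapq.heappush(heap, (-p, s))
    del best[target]
    return best


edges = {("Flu", "Acute Bronchitis"): 0.7,
         ("Acute Bronchitis", "Pneumonia"): 0.65,
         ("Cold", "Flu"): 0.5}           # hypothetical extra triple
found = subjects_above(edges, "Pneumonia", 0.4, lambda x, y: x * y)
print(sorted(found))
```

Here the path through the hypothetical Cold triple is dropped as soon as its probability falls to the threshold or below, so nothing beyond it is explored.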

18
Experimental results
  • Implementation
  • Java, 1700 LOC
  • Disk-based storage for pRDF theories
  • Synthetically generated datasets
  • According to varying underlying distributions
  • Datasets extracted from Web sources

19
Experimental questions
  • Does the underlying distribution affect query
    running time?
  • From a practical point of view, which are the
    fastest types of queries?
  • How does running time vary with the number of
    atoms in a conjunction?
  • What other theory-dependent factors affect
    running time?
  • Theory width
  • Number of properties

20
Query running time (Poisson)
21
Query running time (Zipf)
22
Conjunctive queries running time
23
Dependence on property width
24
Number of properties
25
Take away points
  • RDF syntax with uncertainty
  • Model-theoretic and fixpoint semantics for pRDF
  • Efficient query algorithms for pRDF

26
The end
  • http://om.umiacs.umd.edu/
  • Thank you!
  • Questions and comments