Learn more at: https://www.deg.byu.edu

1
Efficiently Querying Contradictory and Uncertain
Genealogical Data
  • Lars E. Olson and David W. Embley
  • DEG Lab
  • BYU Computer Science Dept.

Supported by National Science Foundation Grant 0083127
2
Introduction
  • Integrating data from multiple sources
  • Some data just doesn't fit the data model:
  • Multiple data sources: conflicting data
  • Uncertain or imprecise data
  • Data that violates constraints
  • Sometimes it's not possible to resolve the data
  • PAF / Gedcom

3
Disjunctive Databases
OR-tables (Imielinski and Vadaparty, 1989)
4
Shortcomings of OR-tables
  • Can't correlate between possible values
  • Answering queries in general is CoNP-complete
    (Imielinski and Vadaparty)
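
To make the first shortcoming concrete, here is a minimal Python sketch of an OR-table row, assuming each cell simply holds a set of alternative values and a possible world picks one value per cell. The row reuses the name and dates from the John Doe example in this talk; the dict layout and the `possible_worlds` helper are illustrative assumptions, not the representation used by Imielinski and Vadaparty.

```python
from itertools import product

# Hypothetical OR-table row: every cell is a set of alternative values.
or_row = {
    "name": {"John Doe"},
    "birth_date": {"12 Mar 1840", "12 Mar 1841"},
    "marriage_date": {"15 Jun 1869", "16 Jun 1869"},
}

def possible_worlds(row):
    """Enumerate every way of choosing one value per cell."""
    columns = sorted(row)
    choices = [sorted(row[c]) for c in columns]
    return [dict(zip(columns, combo)) for combo in product(*choices)]

worlds = possible_worlds(or_row)
# 1 * 2 * 2 = 4 worlds: the cells vary independently, so the table has no
# way to say that one birth date belongs with one particular marriage date.
print(len(worlds))  # 4
```

The number of worlds is the product of the cell sizes, which is why naive enumeration grows exponentially with the number of uncertain cells.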

5
Sub-relation Data Construct
  • Solution: store the correlated data in its own
    relation
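
One way to picture this is the rough Python sketch below, which assumes a sub-relation is a small table of whole alternative tuples referenced from the parent row. The `events` reference column, the `expand` helper, and the particular pairing of birth and marriage dates are invented for illustration; the slides do not say which dates belong together.

```python
# Hypothetical parent row that points at a sub-relation by name.
person = {"id": 1, "name": "John Doe", "events": "events_1"}

# Hypothetical sub-relation: each alternative is a whole tuple, so values
# that belong together stay together (the pairing here is made up).
sub_relations = {
    "events_1": [
        {"birth_date": "12 Mar 1840", "marriage_date": "15 Jun 1869"},
        {"birth_date": "12 Mar 1841", "marriage_date": "16 Jun 1869"},
    ],
}

def expand(row, subs):
    """Return one flat possible row per alternative tuple in the sub-relation."""
    base = {k: v for k, v in row.items() if k != "events"}
    return [{**base, **alt} for alt in subs[row["events"]]]

rows = expand(person, sub_relations)
print(len(rows))  # 2
```

Under these assumptions the row has two possible readings rather than the four an OR-table with independent cells would allow.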

6
Disjunctive Database Problems
  • How do we avoid the CoNP-completeness problem and
    answer queries efficiently?
  • If more than one value is possible, which one is
    the most likely?
  • Other questions to be solved:
  • Where are the constraint violations?
  • How do we map sub-relations to physical storage?
  • How do we efficiently update the database?

7
Transitive Closure of Disjunctive Graphs
Solving the CoNP-completeness problem [LYY95]
[Figure: two panels, "Disjunctive graph" and "Possible interpretation", each drawn on nodes a, b, c, d, e, f]
Transitive closure of a: a, d, e
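
The idea of a transitive closure over a disjunctive graph can be illustrated with a brute-force Python sketch. It assumes a disjunctive edge is a source node plus a set of alternative targets, exactly one of which is real in any interpretation. The edges below are invented (the slide's actual edges are not recoverable from the transcript) and were chosen so that the certain closure of a comes out as a, d, e. This enumerates every interpretation and is not the algorithm of [LYY95].

```python
from itertools import product

# Hypothetical disjunctive graph. Each edge is (source, alternatives):
# in any one interpretation exactly one alternative is the real target.
edges = [
    ("a", ("b", "c")),  # a points to b OR c
    ("a", ("d",)),      # ordinary edge
    ("b", ("e",)),
    ("c", ("e",)),
    ("f", ("d",)),
]

def reachable(start, simple_edges):
    """Nodes reachable from start (start included) by depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for src, dst in simple_edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def closures(start, disjunctive_edges):
    """Return (certain, possible) reachability over all interpretations."""
    per_world = []
    for picks in product(*(alts for _, alts in disjunctive_edges)):
        world = [(src, dst) for (src, _), dst in zip(disjunctive_edges, picks)]
        per_world.append(reachable(start, world))
    return set.intersection(*per_world), set.union(*per_world)

certain, possible = closures("a", edges)
print(sorted(certain))   # ['a', 'd', 'e']
print(sorted(possible))  # ['a', 'b', 'c', 'd', 'e']
```

Note that e is certainly reachable even though neither b nor c is: whichever branch the disjunction takes, the path ends at e.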
8
Using Disjunctive Graphs to Answer Queries
Table Person
Table Place
9
Using Disjunctive Graphs to Answer Queries
[Figure: disjunctive graph for Person ID 1. Labels: Name (John Doe); Birth Date (12 Mar 1840, 12 Mar 1841); Marriage Date (16 Jun 1869, 15 Jun 1869); Birth Place; Place ID 1 and ID 2 with City (Nauvoo, Commerce, Quincy) and State (Illinois)]
10
Using Disjunctive Graphs to Answer Queries
[Figure: the Person and Place portion of the graph. Labels: Person ID 1; Place ID 1 and ID 2; City (Nauvoo, Commerce, Quincy); State (Illinois)]
11
Using Disjunctive Graphs to Answer Queries
meaning what?
  • Definitely known?
  • All possible values?
  • Most likely value?

[Figure: Person and Place graph with edge weights 1.0, 0.2, and 0.8. Labels: Person ID 1; Birth Place; Place ID 1 and ID 2; City (Nauvoo, Commerce, Quincy); State (Illinois)]
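
The three readings of a query listed on this slide can be sketched as three small Python functions over weighted alternatives. The weights 0.8, 0.2, and 1.0 appear on the slide, but how they attach to particular values is not recoverable from the transcript, so the assignment below is an assumption, as are the function names.

```python
# Hypothetical weighted alternatives for one person's birth place.
birth_place = {
    "city": {"Nauvoo": 0.8, "Quincy": 0.2},
    "state": {"Illinois": 1.0},
}

def definitely_known(alternatives):
    """Return the value if it is the only alternative, else None."""
    return next(iter(alternatives)) if len(alternatives) == 1 else None

def all_possible(alternatives):
    """Return every alternative value, sorted for stable output."""
    return sorted(alternatives)

def most_likely(alternatives):
    """Return the alternative with the highest weight."""
    return max(alternatives, key=alternatives.get)

print(definitely_known(birth_place["city"]))   # None
print(definitely_known(birth_place["state"]))  # Illinois
print(all_possible(birth_place["city"]))       # ['Nauvoo', 'Quincy']
print(most_likely(birth_place["city"]))        # Nauvoo
```

In this made-up data the state is definitely known even though the city is not, because every alternative agrees on it.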
12
Using Disjunctive Graphs to Answer Queries
[Figure: graph relating Person P1 and Person P2. Labels: ID 1, ID 2; John Doe, James Doe; 12 Mar 1840, 12 Mar 1841, 13 Mar 1840]
13
Limiting the Search Space
  • In genealogy, most disjunctions are mutually
    independent
  • Disjunctions that aren't independent are limited
    to immediate family relations
  • Build a relation containing all immediate family
    members

(Person P1 ⋈[P1.parent = P2.ID] Person P2
⋈[P2.ID = P3.parent] Person P3)
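
The family relation can be sketched in Python as a three-way self-join, reading the expression above as joining on P1.parent = P2.ID and P2.ID = P3.parent. The sample rows, the single `parent` column, and the `family_join` helper are assumptions made for illustration.

```python
# Hypothetical Person relation: (ID, parent). A parent of None means unknown.
people = [
    {"id": 1, "parent": None},
    {"id": 2, "parent": 1},
    {"id": 3, "parent": 1},
    {"id": 4, "parent": 2},
]

def family_join(person):
    """Self-join Person three times on P1.parent = P2.ID and P2.ID = P3.parent.

    Each result row is (child P1, parent P2, child P3 of the same parent).
    """
    by_id = {p["id"]: p for p in person}
    children = {}
    for p in person:
        if p["parent"] is not None:
            children.setdefault(p["parent"], []).append(p)
    rows = []
    for p1 in person:
        p2 = by_id.get(p1["parent"])
        if p2 is None:
            continue  # P1 has no known parent, so it joins with nothing
        for p3 in children[p2["id"]]:
            rows.append((p1["id"], p2["id"], p3["id"]))
    return rows

rows = family_join(people)
print(rows)  # [(2, 1, 2), (2, 1, 3), (3, 1, 2), (3, 1, 3), (4, 2, 4)]
```

As written, the join also pairs each child with itself (P1 = P3); a real query might filter those rows out.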
14
Limiting the Search Space
  • Example constraints:
  • Each parent should be born before their children
  • Siblings should be born at least 9 months apart
    (except multiple births)
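
The two example constraints can be checked mechanically, and doing so can prune alternatives. The Python sketch below is only illustrative: it approximates "9 months" as 270 days, treats births at most one day apart as a multiple birth, and uses a made-up sibling born on 20 Dec 1840 alongside the two candidate birth dates from the John Doe example.

```python
from datetime import date

NINE_MONTHS_DAYS = 270  # assumption: "9 months" approximated as 270 days

def parent_before_child(parent_birth, child_birth):
    """A parent must be born strictly before the child."""
    return parent_birth < child_birth

def siblings_ok(birth_a, birth_b, multiple_birth_days=1):
    """Siblings are either a multiple birth or at least ~9 months apart."""
    gap = abs((birth_a - birth_b).days)
    return gap <= multiple_birth_days or gap >= NINE_MONTHS_DAYS

def consistent_choices(alts_a, alts_b):
    """Keep only pairs of alternative birth dates that satisfy the sibling rule."""
    return [(a, b) for a in alts_a for b in alts_b if siblings_ok(a, b)]

john = [date(1840, 3, 12), date(1841, 3, 12)]  # two candidate birth dates
sibling = [date(1840, 12, 20)]                 # hypothetical sibling

pairs = consistent_choices(john, sibling)
print(pairs)  # [(datetime.date(1840, 3, 12), datetime.date(1840, 12, 20))]
```

Here the 1841 candidate is only 82 days after the sibling's birth, so the constraint leaves 1840 as the one consistent choice.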

[Figure: Person P1, Person P2, and Person P3, each listing ID 1 through ID 4, linked by edges labeled parent and child (parent^-1); two edges are weighted 1.0]
15
Conclusions
  • Genealogical data can be stored in a disjunctive
    database format.
  • Many common queries can be computed in polynomial
    time.
  • We can detect intractable queries and limit the
    search space required, usually enough to get
    polynomial time.