Transcript and Presenter's Notes

Title: Slide 1
Author: Suchanek

1
YAGO: Yet Another Great Ontology
PhD Defense, Fabian M. Suchanek (Max Planck Institute for Informatics, Saarbrücken)
2
Overview
  • Motivation: Why would anybody need ontologies?
  • Building a Core Ontology: YAGO
  • Extending the Core Ontology: SOFIE

3
Santa Claus in Need
World population
4
The Search for a Second Santa Claus
strong, tall guy , australian
Seeking strong, tall Australian man I'm 27, blue
eyes, looking for a tall strong Australian man.
girls-seek-guys.com/london/42 Cached
Similar pages
5
The Search for a Second Santa Claus
strong person, > 1.90, Australian
Seeking strong, tall Australian man I'm 27, blue
eyes, looking for a tall strong Australian man.
... I'm 190 kg girls-seek-guys.com/london/42
Cached Similar pages
6
The Search for a Second Santa Claus
Hi Larry, it's me, Santa Claus. I think you
misunderstood wh
Seeking strong, tall Australian man I'm 27, blue
eyes, looking for a tall strong Australian man.
girls-seek-guys.com/london/42 Cached
Similar pages
7
Solution: An Ontology
[Diagram with the nodes physical entity, person, continent, Australia and 1.90m, connected by the edges "is a", "isFrom" and "height"]
8
Solution: An Ontology
[Diagram with the nodes physical entity, person, continent and Australia, connected by "is a" and "isFrom" edges, annotated with the labels Classes, Relations and Individuals]
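A minimal Python sketch of what the ontology buys us: once facts are stored as (subject, relation, object) triples, "a person from Australia taller than 1.90m" becomes a structured query instead of a keyword search. The relation names follow the slides; the storage format, the individuals and their heights are assumptions made up for this illustration.

```python
# A toy ontology stored as (subject, relation, object) triples.
# The individuals and their heights are hypothetical, invented for illustration.
FACTS = {
    ("person", "subclassOf", "physical_entity"),
    ("continent", "subclassOf", "physical_entity"),
    ("Australia", "is_a", "continent"),
    ("ClausKent", "is_a", "person"),
    ("ClausKent", "isFrom", "Australia"),
    ("ClausKent", "height", 1.95),
    ("BenBrown", "is_a", "person"),
    ("BenBrown", "isFrom", "Australia"),
    ("BenBrown", "height", 1.75),
    ("HansMeier", "is_a", "person"),
    ("HansMeier", "isFrom", "Germany"),
    ("HansMeier", "height", 1.98),
}


def objects(subject, relation):
    """Return every o such that (subject, relation, o) is a fact."""
    return [o for (s, r, o) in FACTS if s == subject and r == relation]


def tall_people_from(place, min_height):
    """Persons who come from `place` and are taller than `min_height` metres."""
    persons = {s for (s, r, o) in FACTS if r == "is_a" and o == "person"}
    return sorted(
        p for p in persons
        if place in objects(p, "isFrom")
        and any(h > min_height for h in objects(p, "height"))
    )


print(tall_people_from("Australia", 1.90))  # ['ClausKent']
```

Unlike the keyword search, the query cannot confuse a height of 1.90m with a weight of 190 kg, because "height" is an explicit relation.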
9
Vision
Gathering the knowledge of this world in a structured ontology.
  • Semantic search
  • Question answering
  • Machine translation
  • Document classification
The world, I'd like to say, even though some may contradict, is not as it seems. It rather seems as if the world seems not what it seems.
10
Plan of Attack
  • Motivation ✓
  • Building a Core Ontology: YAGO
  • Extending the Core Ontology: SOFIE

11
YAGO Goal
Goal: Build a large ontology
Previous approaches:
  • Assemble the ontology manually (WordNet, SUMO, Cyc, GeneOntology)
    Problem: usually low coverage (the MPI is in none of these)
  • Use community work (Semantic Wikipedia, Freebase)
    Problem: we don't know yet whether it will take off
12
YAGO Goal
Goal: Build a large ontology
Our approach:
  • Extract knowledge from Wikipedia and WordNet (securing high coverage)
  • Use extensive quality control techniques (securing high consistency)
13
YAGO Infoboxes
Claus K
bornIn
Sydney
[Placeholder article text: "blah blah blub (don't read this! Better listen to the talk!)", followed by mock German filler that mentions Elvis]
Exploit infoboxes
Born in Sydney ...
14
YAGO Categories
Claus K
bornIn
born
Sydney
1980
Exploit infoboxes
Exploit relational categories
Categories
1980_births
15
YAGO Categories
Australian Boxer
Claus K
isA
bornIn
born
Sydney
1980
Exploit infoboxes
Exploit relational categories
Categories
Exploit conceptual categories
Australian Boxers
16
YAGO Categories
Australian Boxer
Kick boxing
Claus K
isA
isA
bornIn
born
Sydney
1980
Exploit infoboxes
Exploit relational categories
Categories
Exploit conceptual categories
Kick boxing
Avoid thematic categories
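The category steps above can be sketched in Python as follows. This is an illustration, not YAGO's extractor: the "<year>_births" naming follows the slide, while the "_deaths" rule, the relation names and the hand-made table of conceptual categories are assumptions. Thematic categories such as Kick_boxing simply yield no fact.

```python
import re

# Hypothetical rules that turn a relational category name into a fact.
# The category naming scheme ("<year>_births") follows the slide; the
# relation names are assumptions for this sketch.
RELATIONAL_CATEGORIES = [
    (re.compile(r"^(\d{4})_births$"), "bornInYear"),
    (re.compile(r"^(\d{4})_deaths$"), "diedInYear"),
]


def facts_from_categories(entity, categories, conceptual):
    """Derive facts for `entity` from its Wikipedia categories.

    `conceptual` maps conceptual category names to class names; deciding
    which categories are conceptual is assumed to happen elsewhere.
    Categories that are neither relational nor conceptual (thematic ones,
    such as Kick_boxing) produce no fact.
    """
    facts = []
    for category in categories:
        for pattern, relation in RELATIONAL_CATEGORIES:
            match = pattern.match(category)
            if match:
                facts.append((entity, relation, int(match.group(1))))
                break
        else:  # no relational pattern matched
            if category in conceptual:
                facts.append((entity, "isA", conceptual[category]))
    return facts


print(facts_from_categories(
    "Claus_K",
    ["1980_births", "Australian_boxers", "Kick_boxing"],
    {"Australian_boxers": "Australian_boxer"},
))
# [('Claus_K', 'bornInYear', 1980), ('Claus_K', 'isA', 'Australian_boxer')]
```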
17
YAGO Upper Model
entity
?
person
Australian boxer
is a
born
1980
18
YAGO Upper Model
Business
Social_group
?
People_by_occupation
Australian boxer
is a
born
1980
19
YAGO Upper Model
Person
subclass
WordNet
Boxer
subclass
Australian boxer
is a
Wikipedia
born
1980
Suchanek et al. WWW 2007
20
YAGO Quality Control
1. Canonicalization
   1. ... of entities
Santa Klaus
Santa Clause
Santa Claus
Santa
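Entity canonicalization can be pictured as mapping every name variant to one identifier and then dropping facts that have become identical. The Python sketch below uses a hand-written alias table, which is an assumption; a real system would need a data source for it (Wikipedia redirects are one plausible candidate) and would also have to handle ambiguous names such as "Santa", which the toy table glosses over. The livesIn facts are invented for the example.

```python
# Hypothetical alias table mapping surface names to one canonical entity.
ALIASES = {
    "Santa Klaus": "Santa_Claus",
    "Santa Clause": "Santa_Claus",
    "Santa Claus": "Santa_Claus",
    "Santa": "Santa_Claus",
}


def canonicalize(facts):
    """Rewrite names to canonical entities and drop facts that become duplicates."""
    result = []
    for s, r, o in facts:
        fact = (ALIASES.get(s, s), r, ALIASES.get(o, o))
        if fact not in result:  # keep first occurrence, preserve order
            result.append(fact)
    return result


raw = [
    ("Santa Klaus", "livesIn", "North_Pole"),
    ("Santa", "livesIn", "North_Pole"),
    ("Santa Clause", "is_a", "person"),
]
print(canonicalize(raw))
# [('Santa_Claus', 'livesIn', 'North_Pole'), ('Santa_Claus', 'is_a', 'person')]
```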
21
YAGO Quality Control
1. Canonicalization
   1. ... of entities
22
YAGO Quality Control
1. Canonicalization
   1. ... of entities
   2. ... of facts
born
1980
born
1980-12-19
23
YAGO Quality Control
1. Canonicalization
   1. ... of entities
   2. ... of facts
2. Type checks
   1. Reductive type checking
range(bornOnDate, timepoint)
bornOnDate(Claus_Kent, Sydney)
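Reductive type checking keeps a fact only if its argument can be shown to have the type the relation demands, so bornOnDate(Claus_Kent, Sydney) is dropped because Sydney is not a timepoint. A Python sketch under stated assumptions: the RANGE and TYPES tables are hypothetical, dates are recognized by a simple regular expression, and relations without a declared range pass unchecked (a choice made for this sketch only).

```python
import re

# Hypothetical schema and type information for the sketch.
RANGE = {"bornOnDate": "timepoint", "bornIn": "city"}
TYPES = {"Sydney": {"city"}}
DATE = re.compile(r"^\d{4}(-\d{2}-\d{2})?$")  # "1980" or "1980-12-19"


def has_type(value, cls):
    """True if `value` can be shown to belong to class `cls`."""
    if cls == "timepoint":
        return bool(DATE.match(value))
    return cls in TYPES.get(value, set())


def reductive_type_check(facts):
    """Keep only facts whose object provably matches the relation's range."""
    kept, rejected = [], []
    for s, r, o in facts:
        expected = RANGE.get(r)
        if expected is None or has_type(o, expected):
            kept.append((s, r, o))
        else:
            rejected.append((s, r, o))
    return kept, rejected


kept, rejected = reductive_type_check([
    ("Claus_Kent", "bornOnDate", "Sydney"),
    ("Claus_Kent", "bornOnDate", "1980-12-19"),
    ("Claus_Kent", "bornIn", "Sydney"),
])
print(rejected)  # [('Claus_Kent', 'bornOnDate', 'Sydney')]
```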
24
YAGO Quality Control
Entity
1. Canonicalization
   1. ... of entities
   2. ... of facts
2. Type checks
   1. Reductive type checking
   2. Type coherence checking
Person
Artifact
Boxer, Swimmer, Flight instructor, Airplane
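One way to picture type coherence checking: an entity classified as Boxer, Swimmer and Flight instructor (all below Person) and also as Airplane (below Artifact) has one class that does not fit. The Python sketch below resolves this with a majority vote over top-level branches; the vote is an assumption for illustration, not necessarily the criterion YAGO uses, and the TOP_LEVEL table is hypothetical.

```python
from collections import Counter

# Hypothetical mapping from each class to its top-level branch below "Entity".
TOP_LEVEL = {
    "Boxer": "Person",
    "Swimmer": "Person",
    "Flight instructor": "Person",
    "Airplane": "Artifact",
}


def coherent_classes(classes):
    """Keep only the classes in the branch that most of the entity's classes share."""
    votes = Counter(TOP_LEVEL[c] for c in classes)
    winner, _ = votes.most_common(1)[0]
    return [c for c in classes if TOP_LEVEL[c] == winner]


print(coherent_classes(["Boxer", "Swimmer", "Flight instructor", "Airplane"]))
# ['Boxer', 'Swimmer', 'Flight instructor']
```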
25
YAGO Quality Control
1. Canonicalization
   1. ... of entities
   2. ... of facts
2. Type checks
   1. Reductive type checking
   2. Type coherence checking
Every fact and every entity occurs exactly once
Every fact fulfills its type constraints
Suchanek et al. JWS 2008
26
YAGO Numbers
Relations: 100 (e.g. bornIn, actedIn, hasInflation, ...)
Entities: 2 million
Facts: 19 million
Accuracy: 95%
One of the largest freely available ontologies
Unprecedented quality among automatically constructed ontologies
27
YAGO Model
boxer
1: (ClausKent, is_a, boxer)
2: (1, since, 1990)
3: (1, source, Wikipedia)
since
1990
is a
source
Wikipedia
28
YAGO Model
A YAGO ontology over
  • a set of relations R,
  • a set of common entities C,
  • a set of fact identifiers I
is a function $y: I \to (R \cup C \cup I) \times R \times (R \cup I \cup C)$.

1: (ClausKent, is_a, boxer)
2: (1, since, 1990)
3: (1, source, Wikipedia)

We can talk about
  • facts: (1, source, Wikipedia)
  • additional arguments: (1, since, 1990)
  • relations: (since, hasRange, time_interval)

Consistency is still decidable.
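The definition can be made concrete with a small Python sketch: a dictionary from fact identifiers to triples, where an argument may itself be an identifier. The "#" prefix for identifiers and the RELATIONS/ENTITIES sets are assumptions of the sketch (they keep identifiers apart from literals like "1990"); well_formed checks the signature of the function y, and annotations collects the facts about a fact.

```python
# Facts keyed by identifier; an argument may itself be a fact identifier,
# which is how facts about facts are expressed. Identifiers are strings
# starting with "#" so they cannot be confused with literals like "1990".
RELATIONS = {"is_a", "since", "source", "hasRange"}
ENTITIES = {"ClausKent", "boxer", "1990", "Wikipedia", "time_interval"}
ONTOLOGY = {
    "#1": ("ClausKent", "is_a", "boxer"),
    "#2": ("#1", "since", "1990"),
    "#3": ("#1", "source", "Wikipedia"),
    "#4": ("since", "hasRange", "time_interval"),
}


def well_formed(ontology):
    """Check y: I -> (R ∪ C ∪ I) × R × (R ∪ I ∪ C) for every fact."""
    arguments = RELATIONS | ENTITIES | set(ontology)
    return all(
        s in arguments and r in RELATIONS and o in arguments
        for (s, r, o) in ontology.values()
    )


def annotations(fact_id):
    """(relation, value) pairs of all facts whose first argument is `fact_id`."""
    return sorted((r, o) for (s, r, o) in ONTOLOGY.values() if s == fact_id)


print(well_formed(ONTOLOGY))  # True
print(annotations("#1"))      # [('since', '1990'), ('source', 'Wikipedia')]
```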
29
YAGO Summary
YAGO is an ontology that is
  • large (combining Wikipedia and WordNet)
  • accurate (using extensive quality control)
  • computationally tractable (with decidable consistency)
30
Plan of Attack
  • Motivation ✓
  • Building a Core Ontology: YAGO ✓
  • Extending the Core Ontology: SOFIE

YAGO
31
SOFIE Goal Statement
bornIn
Patara
Saint Nicholas
Goal: Extending the ontology
Saint Nicholas was born in Patara.
32
SOFIE Goal Statement
bornIn
Patara
Saint Nicholas
Goal: Extending the ontology
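Saint Nicholas was born in Patara. (On the slide, this sentence appears in a foreign language; the original text is garbled in this transcript.)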
33
SOFIE Goal Statement
bornIn
Patara
Saint Nicholas
Goal: Extending the ontology
recoverWithout(most_people, medication)
areUnder(0, the_age_of_18)
support(these_findings, the_notion)
Saint Nicholas was born in Patara.
Previous approaches:
  • Extract knowledge from corpora, e.g. the Web (Text2Onto, Espresso, Snowball, TextRunner)
    Problems: low accuracy, non-canonicity
34
SOFIE Goal Statement
bornIn
Patara
Saint Nicholas
Goal: Extending the ontology
Saint Nicholas was born in Patara.
Our approach (1):
  • LEILA: Combining Linguistic and Statistical Analysis (Suchanek et al. KDD 2006)
    Has high accuracy, but does not deliver canonicity
35
SOFIE Goal Statement
bornIn
Patara
Saint Nicholas
Goal: Extending the ontology
Saint Nicholas was born in Patara.
Our approach (2):
  • SOFIE: Use logical reasoning to guarantee canonicity
36
SOFIE Example
YAGO
Worshipped People
bornInYear
1935
Saint Nicholas was born in the year 1417.
Elvis Presley was born in the year 1935.
"was born in the year" expresses bornInYear
Pattern occurrence => pattern meaning
37
SOFIE Example
YAGO
Worshipped People
bornInYear
1935
Saint Nicholas was born in the year 1417.
Elvis Presley was born in the year 1935.
"was born in the year" expresses bornInYear
Pattern occurrence => pattern meaning
bornInYear
Pattern occurrence => sentence meaning
1417
38
SOFIE Example
YAGO
Worshipped People
bornInYear
1935
Saint Nicholas was born in the year 1417.
diedInYear
Elvis Presley was born in the year 1935.
347
"was born in the year" expresses bornInYear
Pattern occurrence => pattern meaning
bornInYear
Pattern occurrence => sentence meaning
1417
People should be born before they die.
40
SOFIE Example
YAGO
Task 1 Find Patterns
bornInYear
1935
Saint Nicholas was born in the year 1417.
diedInYear
Elvis Presley was born in the year 1935.
347
Task 2 Use semantic reasoning
Task 3 Disambiguate entities
Pattern occurrence => pattern meaning
Pattern occurrence => sentence meaning
bornInYear
1417
People should be born before they die.
41
SOFIE: It's all logical formulae!
YAGO
Task 1 Find Patterns
bornInYear(ElvisPresley, 1935)
diedInYear(NicholasOfMyra, 347)
occurs("was born in the year", SaintNicholas, 1417)
occurs("was born in the year", ElvisPresley, 1935)
Task 2 Use semantic reasoning
Task 3 Disambiguate entities
occurs(P,X,Y) /\ expresses(P,R) => R(X,Y)
means(SaintNicholas, NicholasOfMyra) 0.8
means(SaintNicholas, NicholasOfFlüe) 0.2
refersTo(SaintNicholas, NicholasOfFlüe)
bornOnDate(NicholasOfFlüe, 1417)
bornInYear(X,B) /\ diedInYear(X,D) => B < D
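To see how the three tasks interact on this example, here is a greedy Python sketch. It is not SOFIE's method: SOFIE solves all formulae jointly as a weighted MAX SAT problem, whereas this sketch applies the rules one after another and uses the means() weights only to order the candidates. The data mirrors the slide; the function names are hypothetical.

```python
# Hypothetical inputs mirroring the slide.
KNOWN = {("bornInYear", "ElvisPresley", 1935), ("diedInYear", "NicholasOfMyra", 347)}
OCCURRENCES = [
    ("was born in the year", "SaintNicholas", 1417),
    ("was born in the year", "ElvisPresley", 1935),
]
# Candidate meanings of each word with prior weights, most likely first.
MEANS = {
    "SaintNicholas": [("NicholasOfMyra", 0.8), ("NicholasOfFlüe", 0.2)],
    "ElvisPresley": [("ElvisPresley", 1.0)],
}


def learn_patterns():
    """A pattern expresses R if one of its occurrences matches a known R-fact."""
    expresses = set()
    for pattern, word, year in OCCURRENCES:
        for entity, _ in MEANS[word]:
            for relation, subject, obj in KNOWN:
                if subject == entity and obj == year:
                    expresses.add((pattern, relation))
    return expresses


def born_before_death(fact, facts):
    """Rule: bornInYear(X,B) and diedInYear(X,D) imply B < D."""
    entity = fact[1]
    everything = facts | {fact}
    born = [o for (r, s, o) in everything if r == "bornInYear" and s == entity]
    died = [o for (r, s, o) in everything if r == "diedInYear" and s == entity]
    return all(b < d for b in born for d in died)


def extract():
    """Greedy sketch: apply learned patterns, choosing the most likely consistent meaning."""
    expresses = learn_patterns()
    new_facts = set()
    for pattern, word, year in OCCURRENCES:
        for relation in sorted(r for p, r in expresses if p == pattern):
            for entity, _ in MEANS[word]:  # candidates ordered by prior
                fact = (relation, entity, year)
                if born_before_death(fact, KNOWN):
                    if fact not in KNOWN:
                        new_facts.add(fact)
                    break  # first consistent meaning wins
    return new_facts


print(extract())  # {('bornInYear', 'NicholasOfFlüe', 1417)}
```

Saint Nicholas of Myra is rejected because he would be born (1417) after his death (347), so the sentence is attributed to Nicholas of Flüe.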
42
SOFIE Information Extraction as MAX SAT
We have a weighted MAX SAT problem:
r(x,y) /\ s(x,z) => t(x,z)   w
...
Problems:
  • The weighted MAX SAT problem is NP-hard.
  • Our instance contains YAGO (19 million facts) and textual facts (e.g. 10,000 facts).
  • The best-known approximation algorithm cannot deal well with our specific instance.
43
SOFIE A Unifying Framework
r(a,b) => s(x,y)
Task 1 Find Patterns
Polynomial time
Algorithm: Functional MAX SAT
  FOR i=1 TO 42 ... NEXT i
Task 2 Use semantic reasoning
Approximation guarantee
Task 3 Disambiguate entities
1417
NicholasOfFlüe
Suchanek et al. TR 2009
44
SOFIE Experiments
Corpus | Type | Docs | Relations | Time | Precision (%)
Wikipedia toy corpus | structured | 100 | 3 | 8 min | 100
Wikipedia subcorpus | semi-structured | 2000 | 15 | 15 h | 94
News article toy corpus | unstructured | 150 | 1 | 24 min | 91
Biographies from Web | unstructured | 3440 | 5 | 15 h | 90
45
SOFIE Summary
SOFIE unifies 3 tasks in a single framework.
SOFIE delivers
  • canonicalized facts
  • of high precision
Task 1 Find Patterns
Task 2 Use semantic reasoning
Task 3 Disambiguate entities
46
But back to the original question...
Is there any Australian guy taller than 1.90m who
could help me out?
47
Conclusion: Good News
  • We made a great step towards gathering the knowledge of this world in a structured ontology
YAGO
SOFIE
  • Christmas is safe!
48
References
Suchanek et al. KDD 2006: Fabian M. Suchanek, Georgiana Ifrim and Gerhard Weikum. "Combining Linguistic and Statistical Analysis to Extract Relations from Web Documents". Conference on Knowledge Discovery and Data Mining (KDD 2006).
Suchanek et al. WWW 2007: Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. "YAGO - A Core of Semantic Knowledge". International World Wide Web Conference (WWW 2007).
Suchanek et al. JWS 2008: Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. "YAGO - A Large Ontology from Wikipedia and WordNet". Journal of Web Semantics (JWS 2008).
Suchanek et al. TR 2009: Fabian M. Suchanek, Mauro Sozio and Gerhard Weikum. "SOFIE: A Self-Organizing Framework for Information Extraction". Submitted to the International World Wide Web Conference (WWW 2009).
See the technical report or my PhD thesis at http://mpii.de/suchanek
49
Acronyms
LEILA: Learning to Extract Information by Linguistic Analysis
YAGO: Yet Another Great Ontology
SOFIE: Self-Organizing Framework for Information Extraction
NAGA: Not Another Google Answer
50
YAGO Thematic vs Conceptual Categories
Australian boxers of German origin → conceptual
Kick boxing in Australia → thematic
Shallow linguistic noun phrase parsing: Premodifier, Head, Postmodifier
Heuristic: If the head is a plural word, the category is conceptual.
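The plural-head heuristic can be sketched in a few lines of Python. The noun-phrase "parser" here is a crude stand-in (everything before the first preposition is premodifiers plus head), and both the preposition list and the plural test (ends in "s" but not "ss") are simplifying assumptions; a real implementation would use a proper noun-group parser and morphological analysis.

```python
# Crude stand-in for shallow noun-phrase parsing: the words before the
# first preposition are "premodifiers + head", and the head is the last
# of them. The preposition list and plural test are simplifying assumptions.
PREPOSITIONS = {"of", "in", "from", "by", "for", "at", "on", "with"}


def head_noun(category):
    """Return the head word of a category name such as 'Australian boxers of German origin'."""
    words = category.replace("_", " ").split()
    before_postmodifier = []
    for word in words:
        if word.lower() in PREPOSITIONS:
            break  # the postmodifier starts here
        before_postmodifier.append(word)
    return before_postmodifier[-1] if before_postmodifier else ""


def is_conceptual(category):
    """Heuristic from the slide: a plural head means a conceptual category."""
    head = head_noun(category).lower()
    # Naive plural test; real systems use a morphological analyzer.
    return head.endswith("s") and not head.endswith("ss")


for name in ["Australian boxers of German origin", "Kick boxing in Australia"]:
    print(name, "->", "conceptual" if is_conceptual(name) else "thematic")
# Australian boxers of German origin -> conceptual
# Kick boxing in Australia -> thematic
```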
51
YAGO Upper Model
Person
subclass
WordNet
Boxer 42
Boxer 1
....
Australian boxer
is a
Wikipedia
born
1980
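A Python sketch of how a conceptual category could be attached to the WordNet hierarchy through its head noun. The diagram shows two senses of "Boxer"; picking the first-listed (most frequent) sense is an assumption of this sketch, and the sense identifiers and hypernym links in the toy dictionaries are hypothetical, not real WordNet data.

```python
# Toy stand-in for WordNet. Sense identifiers and hypernym links are
# hypothetical; senses of each noun are listed most frequent first.
SENSES = {"boxer": ["boxer#1", "boxer#2"], "person": ["person#1"]}
HYPERNYM = {"boxer#1": "person#1", "person#1": "entity#1"}


def link_category(category):
    """Attach a singular conceptual category to the first sense of its head noun."""
    head = category.split()[-1].lower()
    senses = SENSES.get(head)
    return senses[0] if senses else None


def superclasses(sense):
    """Follow hypernym links from `sense` up to the root."""
    chain = [sense]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


sense = link_category("Australian boxer")
print(["Australian boxer"] + superclasses(sense))
# ['Australian boxer', 'boxer#1', 'person#1', 'entity#1']
```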
52
A Hitchhiker's Guide to Ontology
DBpedia (HU Berlin)
SUMO (research project)
YAGO forms taxonomic backbone
YAGO and SUMO have been merged
YAGO
YAGO is part of the project by its Web service
YAGO will be included
Linking Open Data (HU Berlin, U Leipzig, OLS Inc.)
Freebase (community)
Planned
YAGO contributes the entities
YAGO is used for bootstrapping
Cyc (commercial)
KOG (U Washington)
UMBEL (commercial)
Suchanek et al. JWS 2008
53
YAGO Applications
NAGA (Semantic Search, Ranking), Kasneci et al. ICDE 2008
TagBooster (User Study on Social Tagging), Suchanek et al. CIKM 2008
YAGO
ESTER (Semantic Search, Full Text Search), Bast et al. SIGIR 2007
Projects by other people
Projects by other people
54
YAGO Relations
establishedOnDate, isMarriedTo, hasPopulation, hasHeight, hasWeight, hasInflation, actedIn, ...
is a, familyName, givenName, bornOnDate, diedOnDate, bornIn, diedIn, locatedIn
100 relations
55
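YAGO Size
[Bar chart comparing the sizes of KnowItAll, SUMO, WordNet, OpenCyc, Cyc and YAGO; values shown: 30,000, 60,000, 200,000, 300,000 and 3,000,000 for the other ontologies, 19,000,000 for YAGO]
Publicly available ontologies with a quality guarantee. Size is not correlated with usefulness.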
56
YAGO Model
Axioms: (x, is_a, y) /\ (y, subclass, z) => (x, is_a, z), ...
person
subclass
saint
is a
is a
57
YAGO Model
finite, unique
f1, f2, f3, f4, f5, f6, f7, f8, f9, f10
Axioms: (x, is_a, y) /\ (y, subclass, z) => (x, is_a, z), ...
derive facts
f1, f2, f3, f4, f5
Eliminate facts
f1, f2, f3
finite, unique
Suchanek et al. WWW 2007
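The "derive facts, then eliminate facts" idea can be sketched in Python: close the facts under the axioms, then drop every fact that the remaining ones already imply, which leaves a finite, unique base. The slide elides all axioms except the is_a/subclass one; transitivity of subclass is added here as an assumption, and the sketch assumes an acyclic subclass hierarchy, under which this elimination yields a unique result.

```python
def derive(facts):
    """Close `facts` under two axioms until nothing new appears:
    (x, is_a, y) and (y, subclass, z) give (x, is_a, z);
    (x, subclass, y) and (y, subclass, z) give (x, subclass, z), an assumed axiom.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (x, r1, y) in list(derived):
            if r1 not in ("is_a", "subclass"):
                continue
            for (y2, r2, z) in list(derived):
                if r2 == "subclass" and y2 == y and (x, r1, z) not in derived:
                    derived.add((x, r1, z))
                    changed = True
    return derived


def canonical_base(facts):
    """Drop every fact that the remaining facts already imply."""
    full = derive(facts)
    return {f for f in full if f not in derive(full - {f})}


facts = {
    ("saint", "subclass", "person"),
    ("NicholasOfMyra", "is_a", "saint"),
    ("NicholasOfMyra", "is_a", "person"),  # derivable, hence redundant
}
base = canonical_base(facts)
print(sorted(base))  # [('NicholasOfMyra', 'is_a', 'saint'), ('saint', 'subclass', 'person')]
print(derive(base) == derive(facts))  # True
```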
58
YAGO Knowledge Representation
[Diagram comparing the expressiveness of YAGO with RDFS, OWL DL and OWL Full; features shown: ADTs, acyclicity, datatypes, reification, subClassOf, transitivity, property restrictions]
59
SOFIE rules!
occurs(P,WX,WY) /\ refersTo(WX,X) /\ refersTo(WY,Y) /\ R(X,Y) => expresses(P,R)
occurs(P,WX,WY) /\ expresses(P,R) /\ refersTo(WX,X) /\ refersTo(WY,Y) /\ domain(R,D1) /\ range(R,D2) /\ type(X,D1) /\ type(Y,D2) => R(X,Y)
R(X,Y) /\ R(X,Z) /\ type(R,functionalRelation) => Y = Z
disambiguationPrior(W,X) => refersTo(W,X)
? R(X,Y)
relation-dependent rules
bornInYear(X,B) /\ diedInYear(X,D) => B < D
60
SOFIE Clause transformation
Rules
r(X,Y) /\ s(X,Y) => t(X,X)
u(a)
Entities: a, b
Grounded rules
r(a,a) /\ s(a,a) => t(a,a)
r(a,b) /\ s(a,b) => t(a,a)
r(b,a) /\ s(b,a) => t(b,b)
r(b,b) /\ s(b,b) => t(b,b)
u(a)
Clauses
¬r(a,a) \/ ¬s(a,a) \/ t(a,a)
¬r(a,b) \/ ¬s(a,b) \/ t(a,a)
¬r(b,a) \/ ¬s(b,a) \/ t(b,b)
¬r(b,b) \/ ¬s(b,b) \/ t(b,b)
u(a)
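The grounding and clause transformation above are mechanical, as this Python sketch shows: every assignment of entities to variables yields one grounded rule, and each implication b1 /\ ... /\ bn => h becomes the clause ¬b1 \/ ... \/ ¬bn \/ h. The atom representation and function names are assumptions of the sketch; note that naive grounding produces |entities|^|variables| rules, which is what makes the problem large.

```python
from itertools import product


def substitute(atom, binding):
    """Render an atom such as ("r", ("X", "Y")) with variables replaced."""
    predicate, args = atom
    return f"{predicate}({','.join(binding.get(a, a) for a in args)})"


def ground_rule(body, head, variables, entities):
    """Yield (grounded body, grounded head) for every assignment of entities to variables."""
    for values in product(entities, repeat=len(variables)):
        binding = dict(zip(variables, values))
        yield [substitute(a, binding) for a in body], substitute(head, binding)


def to_clause(body, head):
    r"""b1 /\ ... /\ bn => h  is equivalent to  ¬b1 \/ ... \/ ¬bn \/ h."""
    return ["¬" + b for b in body] + [head]


BODY = [("r", ("X", "Y")), ("s", ("X", "Y"))]
HEAD = ("t", ("X", "X"))
clauses = [to_clause(b, h) for b, h in ground_rule(BODY, HEAD, ["X", "Y"], ["a", "b"])]
clauses.append(["u(a)"])  # the ground fact u(a) becomes a unit clause
for clause in clauses:
    print(" \\/ ".join(clause))
```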
61
SOFIE Clause transformation
Clauses
Textual Facts
1
¬r(a,a) \/ ¬s(a,a) \/ t(a,a)
¬r(a,b) \/ ¬s(a,b) \/ t(a,a)
¬r(b,a) \/ ¬s(b,a) \/ t(b,b)
¬r(b,b) \/ ¬s(b,b) \/ t(b,b)
u(a)
r(a,a) w1, r(a,b) w2, r(b,a) w3, r(b,b) w4
YAGO
s(a,a)
62
SOFIE Clause weighting
Clauses
Textual Facts
¬1 \/ ¬1 \/ t(a,a)   w1
¬1 \/ ¬s(a,b) \/ t(a,a)   w2
¬1 \/ ¬s(b,a) \/ t(b,b)   w3
¬1 \/ ¬s(b,b) \/ t(b,b)   w4
u(a)   W
r(a,a) w1, r(a,b) w2, r(b,a) w3, r(b,b) w4
YAGO
s(a,a)
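A Python sketch of one reading of the weighting step: textual and YAGO facts are replaced by the constant 1 (true), and a clause inherits the weight of the textual fact it contains, so "the text says r(a,a) with weight w1" turns into "t(a,a) should hold with weight w1". The default weight W for clauses without a textual fact matches the slide; what happens when a clause contains several textual facts is not shown there, so the last-one-wins rule below is an arbitrary choice.

```python
# Clauses from the grounding step, written out literally.
CLAUSES = [
    ["¬r(a,a)", "¬s(a,a)", "t(a,a)"],
    ["¬r(a,b)", "¬s(a,b)", "t(a,a)"],
    ["¬r(b,a)", "¬s(b,a)", "t(b,b)"],
    ["¬r(b,b)", "¬s(b,b)", "t(b,b)"],
    ["u(a)"],
]
TEXTUAL = {"r(a,a)": "w1", "r(a,b)": "w2", "r(b,a)": "w3", "r(b,b)": "w4"}
YAGO = {"s(a,a)"}
DEFAULT_WEIGHT = "W"  # weight of clauses that contain no textual fact


def weight_clause(clause):
    """Replace known facts by 1 and give the clause the weight of its textual fact."""
    weight = DEFAULT_WEIGHT
    literals = []
    for literal in clause:
        negated = literal.startswith("¬")
        atom = literal[1:] if negated else literal
        if atom in TEXTUAL:
            weight = TEXTUAL[atom]  # if several occur, the last one wins (arbitrary)
            atom = "1"
        elif atom in YAGO:
            atom = "1"
        literals.append(("¬" if negated else "") + atom)
    return literals, weight


for clause in CLAUSES:
    literals, weight = weight_clause(clause)
    print(" \\/ ".join(literals), weight)
```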
63
SOFIE Hypothesis generation
Textual Facts
Rules
r(a,b) w1
r(X,Y) /\ s(X,Y) => t(X,X)
Hypotheses
t(a,a), t(b,b)
64
SOFIE Hypothesis generation
Grounded Rules
Rules
r(a,a) /\ s(a,a) => t(a,a)
r(a,b) /\ s(a,b) => t(a,a)
r(X,Y) /\ s(X,Y) => t(X,X)
Hypotheses
t(a,a)
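One reading of the two hypothesis-generation slides, stated as an assumption: instead of grounding a rule for every combination of entities, instantiate it only where some body atom matches a known fact. With the textual fact r(a,b) and the YAGO fact s(a,a), this reproduces exactly the two grounded rules and the single hypothesis t(a,a). The Python sketch below implements that idea; the names and data layout are hypothetical.

```python
# Known facts: the textual fact r(a,b) and the YAGO fact s(a,a), from the slides.
FACTS = {("r", ("a", "b")), ("s", ("a", "a"))}
BODY = [("r", ("X", "Y")), ("s", ("X", "Y"))]
HEAD = ("t", ("X", "X"))


def is_variable(term):
    return term[:1].isupper()


def match(atom, fact):
    """Binding that turns `atom` into `fact`, or None if they do not unify."""
    (pred, args), (fact_pred, fact_args) = atom, fact
    if pred != fact_pred or len(args) != len(fact_args):
        return None
    binding = {}
    for term, value in zip(args, fact_args):
        if is_variable(term):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def hypotheses(body, head, facts):
    """Ground the rule only where a body atom matches a known fact."""
    variables = {t for _, args in body for t in args if is_variable(t)}
    bindings = set()
    for atom in body:
        for fact in facts:
            binding = match(atom, fact)
            # Assumption: skip partial bindings; here every body atom
            # mentions all variables, so each match binds them all.
            if binding is not None and set(binding) == variables:
                bindings.add(tuple(sorted(binding.items())))
    head_pred, head_args = head
    heads = {(head_pred, tuple(dict(b).get(t, t) for t in head_args)) for b in bindings}
    return bindings, heads


bindings, heads = hypotheses(BODY, HEAD, FACTS)
print(sorted(bindings))  # [(('X', 'a'), ('Y', 'a')), (('X', 'a'), ('Y', 'b'))]
print(heads)             # {('t', ('a', 'a'))}
```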
65
SOFIE Functional MAX SAT Algorithm
The functional MAX SAT algorithm considers only unit clauses.
Variables: X, Y, Z
Clauses:
¬X \/ ¬Z   w1
¬X \/ ¬Y   w1
¬Y \/ ¬Z   w1
Z   w1
Assignment shown: X = 0, Y = 0, Z = 1
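For an instance this small, weighted MAX SAT can be solved by brute force, which makes the objective concrete: find the truth assignment that maximizes the total weight of satisfied clauses. The Python sketch below is exhaustive search, not the functional MAX SAT algorithm; its 2^n running time is exactly why it does not scale to millions of facts. The symbolic weight w1 is set to 1.0 as an assumption.

```python
from itertools import product

# The clauses from the slide; the symbolic weight w1 is set to 1.0 here.
# A literal is (variable, required truth value).
CLAUSES = [
    ([("X", False), ("Z", False)], 1.0),   # ¬X \/ ¬Z
    ([("X", False), ("Y", False)], 1.0),   # ¬X \/ ¬Y
    ([("Y", False), ("Z", False)], 1.0),   # ¬Y \/ ¬Z
    ([("Z", True)], 1.0),                  # Z
]


def satisfied_weight(assignment, clauses):
    """Sum of the weights of all clauses with at least one true literal."""
    return sum(weight for literals, weight in clauses
               if any(assignment[var] == value for var, value in literals))


def brute_force_max_sat(variables, clauses):
    """Try all 2^n assignments; only feasible for tiny instances."""
    best, best_weight = None, float("-inf")
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = satisfied_weight(assignment, clauses)
        if weight > best_weight:
            best, best_weight = assignment, weight
    return best, best_weight


print(brute_force_max_sat(["X", "Y", "Z"], CLAUSES))
# ({'X': False, 'Y': False, 'Z': True}, 4.0)
```

The result agrees with the assignment on the slide: the unit clause forces Z = 1, and the two clauses containing ¬Z then force X = 0 and Y = 0.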
66
SOFIE Experiments
Corpus | Type | Docs | Rel | Time | Facts | Precision (%) | Recall (%)
Wikipedia toy corpus | structured | 100 | 3 | 8 min | 165 | 100 | 98
Wikipedia toy corpus | semi-structured (50 infoboxes removed) | 100 | 3 | 8 min | 165 | 100 | 57
Wikipedia subcorpus | semi-structured | 2000 | 15 | 15 h | 505 | 94 | ?
News article toy corpus | unstructured | 150 | 1 | 24 min | 35, 46 | 91 | 24, 31
Snowball | | | | | 65 | 56 | 31
Biographies from Web | unstructured | 3440 | 5 | 15 h | 744 | 90 | ?
67
SOFIE Large-Scale Experiment
Goal: Extract bornIn, bornOnDate, diedIn, diedOnDate, politicianOf
Corpus: 3700 biography documents downloaded from the Web
Results (precision in %): [bar chart for bornIn, bornOnDate, diedIn, diedOnDate and politicianOf; values shown: 87, 87, 13, 98, 95; a further value, with a garbled label: 90]
Runtime (summed over 5 batches): Parsing 705 h, Hypothesis generation 615 h, Solving 230 h, Total 1550 h
68
SOFIE Relation to Markov Logic
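$$P(X) \propto e^{\sum_i \mathrm{sat}(i,X)\, w_i}$$
where $\mathrm{sat}(i,X)$ is the number of satisfied instances of the $i$-th formula and $w_i$ is the weight of the $i$-th formula (e.g. r(x,y) /\ s(x,z) => t(x,z) with weight w).
$$\arg\max_X P(X) = \arg\max_X e^{\sum_i \mathrm{sat}(i,X)\, w_i} = \arg\max_X \log\left(e^{\sum_i \mathrm{sat}(i,X)\, w_i}\right) = \arg\max_X \sum_i \mathrm{sat}(i,X)\, w_i$$
Each possible world X assigns false or true to every ground atom, e.g. bornIn(Nicholas, Patras).
=> Weighted MAX SAT problem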
69
LEILA Workflow
Fix one relation, e.g. foundedInYear
The UDS was founded in 1948. The UDS has 1974
employees. The MPII has 1'000 employees. The
MPI-SWS was founded in 2004. The MPI-SWS has
2'003 employees.
10 2
X was founded in Y
Examples: UDS 1948, UDS 1949, UDS 1950, ..., MPII 1988, MPII 1989, MPII 1990, ...
X has Y employees
3 20
foundedIn(MPI-SWS, 2004)
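A much simplified Python sketch of the statistical idea on the slide's sentences. LEILA itself uses linguistic parsing and learned classifiers; here patterns are plain surface strings and a pattern is accepted if its known matches are mostly examples. The sketch also assumes that, because foundedInYear is functional, a known subject paired with a different year counts as a counterexample; the KNOWN table (UDS 1948, MPII 1988) follows the slide's example list.

```python
import re
from collections import defaultdict

SENTENCES = [
    "The UDS was founded in 1948.",
    "The UDS has 1974 employees.",
    "The MPII has 1'000 employees.",
    "The MPI-SWS was founded in 2004.",
    "The MPI-SWS has 2'003 employees.",
]
ENTITIES = ["UDS", "MPII", "MPI-SWS"]
# Known foundedInYear facts (examples). Because the relation is functional,
# any other year for the same subject counts as a counterexample.
KNOWN = {"UDS": 1948, "MPII": 1988}
NUMBER = re.compile(r"\d[\d']*")


def analyze(sentence):
    """Turn 'The UDS was founded in 1948.' into ('The X was founded in Y.', 'UDS', 1948)."""
    for entity in ENTITIES:
        if sentence.startswith(f"The {entity} "):
            m = NUMBER.search(sentence)
            if m:
                value = int(m.group().replace("'", ""))
                pattern = sentence[:m.start()] + "Y" + sentence[m.end():]
                return pattern.replace(entity, "X", 1), entity, value
    return None


def extract(sentences):
    observations = [o for o in map(analyze, sentences) if o is not None]
    counts = defaultdict(lambda: [0, 0])  # pattern -> [examples, counterexamples]
    for pattern, x, y in observations:
        if x in KNOWN:
            counts[pattern][0 if KNOWN[x] == y else 1] += 1
    good = {p for p, (pos, neg) in counts.items() if pos > neg}
    return {(x, y) for pattern, x, y in observations if pattern in good and x not in KNOWN}


print(extract(SENTENCES))  # {('MPI-SWS', 2004)}
```

"X has Y employees" matches only counterexamples (UDS 1974, MPII 1000), so the employee count 2003 is not mistaken for a founding year.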
70
LEILA Theoretical considerations
THEOREM (Goodnaturedness): As the number of parsed sentences increases, the probability of false extractions decreases.
Intuition: one of two cases applies:
1. A bad pattern occurs very frequently. Then it is unlikely to be mistaken for a good pattern.
2. A bad pattern occurs very infrequently. Then it does not matter if it is mistaken for a good pattern, because it produces few extractions.
Suchanek et al. KDD 2006
71
LEILA The Linguistic Part
X was founded in Y
The MPI-SWS was founded in 2004.
foundedIn(MPI-SWS, 2004)
72
LEILA The Linguistic Part
X was founded in Y
The MPI-SWS, the great institution, was founded
in 2004.
foundedIn(MPI-SWS, 2004)
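Why the linguistic part matters: a purely surface-level pattern misses the sentence as soon as an apposition is inserted. LEILA relies on a linguistic parser so that the pattern still matches structurally; the Python sketch below only imitates that effect with a crude stand-in (deleting one comma-delimited insertion before matching). The regular expressions are hypothetical and not LEILA's method.

```python
import re

# Surface pattern for "X was founded in Y" (hypothetical regex).
FOUNDED = re.compile(r"^The (?P<x>[\w-]+) was founded in (?P<y>\d{4})\.$")
# Crude approximation of an apposition: a comma-delimited insertion.
APPOSITION = re.compile(r", [^,]*,")


def extract_founded(sentence, drop_appositions=True):
    if drop_appositions:
        sentence = APPOSITION.sub("", sentence)
    m = FOUNDED.match(sentence)
    return ("foundedIn", m["x"], int(m["y"])) if m else None


s = "The MPI-SWS, the great institution, was founded in 2004."
print(extract_founded(s, drop_appositions=False))  # None
print(extract_founded(s))                          # ('foundedIn', 'MPI-SWS', 2004)
```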
74
Future Work With YAGO
  • personalize (Shady, Maya)
  • use social networks to extend YAGO (Maya, Sharat, Ashwin)
  • make YAGO multilingual (Gerard)
  • add Web services (Nicoleta)
  • make querying efficient (Gjergji)
  • store YAGO efficiently (Thomas)
  • make reasoning efficient (Mauro, Martin)
  • provide good visualization (Shady)
  • add a temporal component to SOFIE
  • add biomedical knowledge (Alessandro Fiori)
  • add multimodal support (Martin Schreiber)
  • add natural language support (help from workshop on Monday)
(slide by Prof. Gerhard Weikum)
75
Future Work Beyond YAGO
  • join forces with other ontology projects
  • learn not just facts, but also relations
  • apply the SOFIE approach in related settings (information extraction with music or pictures?)
76
Acknowledgements
The following people have worked with me:
  • LEILA: Georgiana Ifrim and Gerhard Weikum
  • YAGO: Gjergji Kasneci and Gerhard Weikum
  • SOFIE: Mauro Sozio and Gerhard Weikum
  • TagBooster: Milan Vojnovic and Dinan Gunawardena
  • NAGA: Gjergji Kasneci, Georgiana Ifrim, Shady Elbassuoni and Gerhard Weikum
  • ESTER: Holger Bast, Ingmar Weber and Alex Chitea
  • YAGO-SUMO: Gerard de Melo and Adam Pease
  • STAR: Gjergji, Mauro, Maya Ramanath and Gerhard
Thank you for making these projects possible!