Title: EntityAuthority: Semantically Enriched Graph-Based Authority Propagation
1EntityAuthority: Semantically Enriched Graph-Based Authority Propagation
- Julia Stoyanovich, Columbia University
- Srikanta Bedathur
- Klaus Berberich
- Gerhard Weikum
Max-Planck Institute for Informatics
2Query: NBA Team
Date: June 5, 2007
5What is the Problem?
- No NBA team homepages appear at high ranks (with one exception)
- User cannot tell what the NBA teams are by looking at titles and snippets at high ranks
- Portals like ESPN.com, NBA.com dominate high ranks
6The Knowledge Soup
7The Knowledge Soup
YAGO [Suchanek et al., WWW'07], GATE [Cunningham et al., ACL'02]
9Central Idea
- Authority is based on hyperlinks, but reflects quality of information. Pages are collections of semantic entities!
- To improve ranking, exploit mutual reinforcement between pages and entities.
- Can construct a graph that includes pages, entities, and ontological concepts: a richer substrate for authority propagation.
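As a rough illustration of mutual reinforcement between pages and entities, the sketch below runs a HITS-style iteration on a small weighted bipartite page-entity graph. It is a generic illustration, not the EVA model; the function name `mutual_reinforcement` and the pairing of pages, entities, and weights in the example are assumptions made for the demo.

```python
import math

def mutual_reinforcement(mentions, iterations=50):
    """HITS-style iteration on a weighted bipartite page-entity graph.

    mentions maps (page, entity) -> weight.
    Returns (page_scores, entity_scores), each L2-normalized.
    """
    pages = sorted({p for p, _ in mentions})
    entities = sorted({e for _, e in mentions})
    page_score = {p: 1.0 for p in pages}
    entity_score = {e: 1.0 for e in entities}

    def normalize(scores):
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {k: v / norm for k, v in scores.items()}

    for _ in range(iterations):
        # An entity gains authority from the pages that mention it.
        new_entity = {e: 0.0 for e in entities}
        for (p, e), w in mentions.items():
            new_entity[e] += w * page_score[p]
        entity_score = normalize(new_entity)
        # A page gains authority from the entities it mentions.
        new_page = {p: 0.0 for p in pages}
        for (p, e), w in mentions.items():
            new_page[p] += w * entity_score[e]
        page_score = normalize(new_page)
    return page_score, entity_score

# Illustrative data only: the page/entity pairing and weights are made up.
mentions = {
    ("DBLP page x", "Alon Halevy"): 1.0,
    ("Person 1 homepage", "Alon Halevy"): 0.5,
    ("Person 1 homepage", "Stanford University"): 0.5,
    ("Stanford.edu", "Stanford University"): 1.0,
    ("Project page", "Google"): 0.33,
}
page_score, entity_score = mutual_reinforcement(mentions)
```

As in plain HITS, the scores converge toward the dominant connected component, so weakly connected nodes end up with scores near zero.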
10Our Contributions
- Generalized Data Graph (GDG) data model
- Several models of authority propagation on GDG, most notably EVA (Entity deriVed Authority), inspired by HITS but with richer semantics
- Prototype implementation that combines information extraction with a rich ontology
- Experimental results on Wikipedia
11Related Work
- PageRank, HITS: the classics, simple directed graph models of the Web
- ObjectRank: also HITS-inspired, but developed with DB data in mind
- PopRank: combines the OR graph with PageRank values (similar to PIA, one of the options we consider)
- EntityRank: focuses on frequency-based content strength, not on the graph structure of the Web embedded entities
12PAGES
ACM DL page
Google Scholar page
Project page
DBLP page x
Stanford.edu
Person 1 homepage
Person 2 homepage
15PAGES
ACM DL page, Google Scholar page, Project page, DBLP page x, Stanford.edu, Person 1 homepage, Person 2 homepage
EnrichedWebGraph
InfoUnits: UW, Alon Levy, Stanford, Stanford University, Alon Halevy
OntoGraph
ENTITIES: Alon Halevy, Leland Stanford, Stanford University, University of Wisconsin, University of Washington, Google
CONCEPTS: computer scientist, scientist, company, university, person, organization, entity
Relation labels: founded, spin-off
21PAGES
ACM DL page, Google Scholar page, Project page, DBLP page x, Stanford.edu, Person 1 homepage, Person 2 homepage
Generalized Data Graph
Edge weights shown: 1.0, 0.5, 0.5, 0.33, 0.5, 0.5, 0.5, 0.33, 1.0, 1.0, 0.33
ENTITIES: Alon Halevy, Leland Stanford, Stanford University, University of Wisconsin, University of Washington, Google
CONCEPTS: computer scientist, scientist, company, university, person, organization, entity
Relation labels: founded, spin-off
22Authority Propagation: How?
- PIA (Page-Inherited Authority)
- Compute authority on pages, propagate to entities
- $AP(x) = \sum_{p \in P(x)} AP(p) \cdot w(p \to x)$
- UTA (UnTyped Authority)
- Compute authority on the entire data graph
- Can we do any better?
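The PIA idea can be sketched in two steps: compute an authority score for each page, then give each entity the weighted sum of the scores of the pages that mention it. The sketch below assumes PageRank as the page-level authority measure and reads the formula as a sum over pages p in P(x) of AP(p) times w(p → x); the function names, the damping factor, and the toy pages `A`, `B`, `C` and entities `x1`, `x2` are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Plain power-iteration PageRank; links maps page -> list of linked pages."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank

def page_inherited_authority(page_auth, page_entity_weights):
    """Entity authority = sum over mentioning pages of AP(p) * w(p -> x)."""
    entity_auth = {}
    for (p, x), w in page_entity_weights.items():
        entity_auth[x] = entity_auth.get(x, 0.0) + page_auth[p] * w
    return entity_auth

# Toy data (hypothetical): three pages and two entities.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
page_entity_weights = {
    ("A", "x1"): 1.0,
    ("B", "x1"): 0.5,
    ("B", "x2"): 0.5,
    ("C", "x2"): 0.5,
}
rank = pagerank(links)
entity_auth = page_inherited_authority(rank, page_entity_weights)
```

UTA would instead run the same kind of iteration over all nodes of the data graph at once, ignoring node types.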
23Entity Derived Authority (EVA)
28Authority Propagation: Where?
- On the entire Generalized Data Graph
- vs.
- On the query-relevant sub-graph of GDG: the Query Result Graph
- select a sub-set of relevant nodes (with scores)
- expand by successors and predecessors, possibly several levels
- re-scale edge weights
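The three construction steps can be sketched as follows. The slide does not say how edge weights are re-scaled, so this sketch assumes that the outgoing weights of each node are renormalized to sum to 1 inside the sub-graph, and that nodes added by expansion start with score 0. The function name `query_result_graph` and the toy graph are hypothetical.

```python
def query_result_graph(edges, seeds, levels=1):
    """Build a query-relevant sub-graph.

    edges: dict (u, v) -> weight for the full graph.
    seeds: dict node -> relevance score for the query-relevant nodes.
    Returns (node_scores, rescaled_edges).
    """
    succ, pred = {}, {}
    for (u, v) in edges:
        succ.setdefault(u, set()).add(v)
        pred.setdefault(v, set()).add(u)

    # Expand the seed set by successors and predecessors, level by level.
    nodes = set(seeds)
    frontier = set(seeds)
    for _ in range(levels):
        reached = set()
        for n in frontier:
            reached |= succ.get(n, set()) | pred.get(n, set())
        frontier = reached - nodes
        nodes |= frontier

    # Keep only edges whose endpoints both survive.
    sub = {(u, v): w for (u, v), w in edges.items() if u in nodes and v in nodes}

    # Assumed re-scaling: outgoing weights of each node sum to 1 again.
    out_total = {}
    for (u, _), w in sub.items():
        out_total[u] = out_total.get(u, 0.0) + w
    rescaled = {(u, v): w / out_total[u] for (u, v), w in sub.items() if out_total[u] > 0}

    node_scores = {n: seeds.get(n, 0.0) for n in nodes}
    return node_scores, rescaled

# Toy graph (hypothetical node names).
edges = {("a", "b"): 0.5, ("a", "c"): 0.5, ("b", "d"): 1.0, ("e", "a"): 1.0, ("d", "f"): 1.0}
node_scores, rescaled = query_result_graph(edges, {"b": 1.0}, levels=1)
```

In the example, node `c` is not reached from the seed `b` in one level, so the edge from `a` to `b` absorbs all of the outgoing weight of `a`.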
29Query: Stanford
PAGES: ACM DL page, Google Scholar page, Project page, DBLP page x, Stanford.edu, Person 1 homepage, Person 2 homepage
ENTITIES: Alon Halevy, Leland Stanford, Stanford University, University of Wisconsin, University of Washington, Google
CONCEPTS: computer scientist, scientist, company, university, person, organization, entity
Relation labels: founded, spin-off
34Query Processing
- Keyword queries, evaluated as conjunctions
- Pages: TF/IDF score of page content
- Entities: TF/IDF of YAGO thematic neighborhood
- Example entity: Vlade Divac, with neighborhood Serbian basketball players, Olympic competitors for Serbia, LA Lakers players
- Queries: Olympic basketball competitors Serbia; LA Lakers players
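A minimal sketch of conjunctive keyword matching with TF/IDF scoring over entity neighborhoods is given below. It assumes that an entity's thematic neighborhood is the concatenated text of its concept names, and it uses one common TF/IDF variant (raw term count times log(1 + N/df)); the slides do not specify the exact formula. The neighborhood for Vlade Divac comes from the slide, while the one for Michael Jordan is made up for contrast.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_index(docs):
    """docs maps id -> text. Returns (term counts per doc, idf per term)."""
    counts = {d: Counter(tokenize(t)) for d, t in docs.items()}
    n = len(docs)
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    idf = {t: math.log(1.0 + n / df_t) for t, df_t in df.items()}
    return counts, idf

def conjunctive_tfidf(query, counts, idf):
    """Score only the docs that contain every query term (conjunction)."""
    terms = tokenize(query)
    scores = {}
    for d, c in counts.items():
        if all(t in c for t in terms):
            scores[d] = sum(c[t] * idf[t] for t in terms)
    return scores

neighborhoods = {
    # From the slide.
    "Vlade Divac": "Serbian basketball players Olympic competitors for Serbia LA Lakers players",
    # Illustrative only, not from the slides.
    "Michael Jordan": "American basketball players Chicago Bulls players",
}
counts, idf = build_index(neighborhoods)
lakers = conjunctive_tfidf("LA Lakers players", counts, idf)
olympic = conjunctive_tfidf("Olympic basketball competitors Serbia", counts, idf)
both = conjunctive_tfidf("basketball players", counts, idf)
```

Pages can be scored the same way by indexing page content instead of neighborhood text.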
35Experimental Evaluation
- Test bed
- two thematic slices of Wikipedia: Serbia and basketball, comparable in size
- 7800 articles, 1.2M InfoUnits, 240K entities
- Queries
- 20 total, 10 per slice, 1-6 words
- e.g. lake, living writer prize winner, NBA venue, African American basketball player Olympic competitor
36Experimental Evaluation: Metrics
- Simple goodness metric (between 0 and 2)
- Results from different methods pooled, each evaluated by 2 people, scores averaged
- Evaluation based on
- Precision (avg(goodness) > 0.5)
- Recall (w.r.t. pooled ideal)
- Discounted Cumulative Gain (DCG)
- Normalized Discounted Cumulative Gain (NDCG)
- (Järvelin, Kekäläinen, TOIS'02)
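The metrics above can be sketched as follows, taking the averaged goodness score (0 to 2) of each returned result as its gain. The sketch assumes the widely used log2(rank + 1) discount, which may differ from the exact variant used in the evaluation, and it builds the ideal ranking from the pooled judgments. Function names and the example scores are hypothetical.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg(gains, pool):
    """DCG normalized by the DCG of the best possible ranking drawn from the pool."""
    ideal = sorted(pool, reverse=True)[:len(gains)]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0

def precision(gains, threshold=0.5):
    """Fraction of results whose averaged goodness exceeds the threshold."""
    if not gains:
        return 0.0
    return sum(1 for g in gains if g > threshold) / len(gains)

# Made-up scores: one ranked result list and the pooled judgments.
ranked = [2.0, 0.5, 1.5]
pool = [2.0, 1.5, 1.5, 0.5, 0.0]
score = ndcg(ranked, pool)
```

A ranking that returns the pooled results in descending order of goodness gets an NDCG of 1.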
37Results: Ranking on Pages
38Results: Ranking on Entities
39Observations
- Highly-ranked entities consistently and significantly outperform highly-ranked pages, w.r.t. all metrics.
- EVA significantly outperforms other methods w.r.t. highly ranked pages, for all metrics.
- No conclusion can be drawn about the relative performance of ranking methods w.r.t. entities.
- Likely because of relatively small slices, no inter-entity edges in the GDG. Both are ongoing work.
40Relative Performance of Ranking: Example
- Slice: Serbia, query: basketball
- Top-20 pages
- PageRank: 1977, Greece, Belgrade
- UTA: August 2004 in Sports
- EVA: Basketball in Yugoslavia, Vlade Divac
- Top-20 entities (various methods)
- Michael Jordan, LA Lakers, Vlade Divac, Predrag Danilovic etc.
41Conclusions and Future Work
- Conclusions
- Mutual reinforcement between pages and entities improves ranking!
- A rich ontology (e.g. YAGO) can be used for query processing in this setting
- Future work
- Incorporate inter-entity edges into the GDG
- Extensive experimental evaluation
- Scaling up the framework
- Evaluating on a corpus other than Wikipedia
43PAGES
ACM DL page, Google Scholar page, Lab page, Project page, University page, DBLP page x, DBLP page y, DBLP page z, Person 1 homepage, Person 2 homepage
EnrichedWebGraph
InfoUnits: UW, VLDB, Stanford, Berkeley, DeWitt, Alon Levy, Stanford University, Google, Alon Halevy, UC Berkeley
OntoGraph
ENTITIES: George Berkeley, Leland Stanford, Alon Halevy, Joseph M. Hellerstein, David J. DeWitt, Stanford University, University of Wisconsin, University of Washington, UC Berkeley, Berkeley, Stanford, Google, Bay Area, Wisconsin, Seattle
CONCEPTS: computer scientist, philosopher, scientist, company, university, organization, person, location, entity
Relation labels: founded, advisor, student, spin off, located in
44Data Model: Generalized Data Graph
- Typed nodes: pages, InfoUnits, onto entities and concepts
- Typed and weighted edges
- Hyperlinks (normalized by out-degree)
- Extraction edges (pages -> InfoUnits), weighted by confidence of extraction
- Mapping edges (InfoUnits -> onto entities/concepts), weighted by confidence in mapping
- Direct page <-> onto entity/concept