Title: AnHai Doan
1. Schema & Ontology Matching: Current Research Directions
- AnHai Doan
- Database and Information System Group
- University of Illinois, Urbana-Champaign
- Spring 2004
2. Road Map
- Schema Matching
  - motivation & problem definition
  - representative current solutions: LSD, iMAP, Clio
  - broader picture
- Ontology Matching
  - motivation & problem definition
  - representative current solution: GLUE
  - broader picture
- Conclusions & Emerging Directions
3. Motivation: Data Integration
- A new faculty member asks: "Find houses with 2 bedrooms priced under 200K"
- The answers are spread across homes.com, realestate.com, homeseekers.com
4. Architecture of Data Integration System
- The query "Find houses with 2 bedrooms priced under 200K" is posed over a mediated schema
- The mediated schema is related to source schemas 1, 2 and 3 of homes.com, realestate.com and homeseekers.com
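5. Semantic Matches between Schemas
- Mediated schema: price, agent-name, address
- homes.com: listed-price, contact-name, city, state
  listed-price  contact-name  city     state
  320K          Jane Brown    Seattle  WA
  240K          Mike Smith    Miami    FL
- 1-1 match: price = listed-price
- complex match: address = concat(city, state)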
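The difference between the two kinds of matches is easiest to see when they are applied to data. Below is a minimal sketch under three assumptions: the homes.com rows are plain dictionaries, agent-name = contact-name is a further 1-1 match, and the complex match joins city and state with ", " (the separator is a guess):

```python
# Two homes.com rows from this slide, as plain dictionaries.
homes_rows = [
    {"listed-price": "320K", "contact-name": "Jane Brown", "city": "Seattle", "state": "WA"},
    {"listed-price": "240K", "contact-name": "Mike Smith", "city": "Miami", "state": "FL"},
]

# 1-1 matches: each mediated-schema element corresponds to exactly one source element.
# (agent-name = contact-name is assumed; the slide labels only the price match.)
ONE_TO_ONE = {"price": "listed-price", "agent-name": "contact-name"}


def to_mediated(row):
    """Translate one homes.com row into the mediated schema."""
    out = {target: row[source] for target, source in ONE_TO_ONE.items()}
    # Complex match: address combines two source elements (", " format is assumed).
    out["address"] = f"{row['city']}, {row['state']}"
    return out


mediated = [to_mediated(r) for r in homes_rows]
print(mediated[0])
# {'price': '320K', 'agent-name': 'Jane Brown', 'address': 'Seattle, WA'}
```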
6. Schema Matching is Ubiquitous!
- Fundamental problem in numerous applications
- Databases
  - data integration
  - data translation
  - schema/view integration
  - data warehousing
  - semantic query processing
  - model management
  - peer data management
- AI
  - knowledge bases, ontology merging, information gathering agents, ...
- Web
  - e-commerce
  - marking up data using ontologies (e.g., on Semantic Web)
7. Why Schema Matching is Difficult
- Schema & data never fully capture semantics!
  - not adequately documented
  - schema creator has retired to Florida!
- Must rely on clues in schema & data
  - using names, structures, types, data values, etc.
- Such clues can be unreliable
  - same names => different entities: area => location or square-feet
  - different names => same entity: area & address => location
- Intended semantics can be subjective
  - house-style = house-description?
  - military applications require committees to decide!
- Cannot be fully automated, needs user feedback!
8. Current State of Affairs
- Finding semantic mappings is now a key bottleneck!
  - largely done by hand
  - labor intensive & error prone
  - data integration at GTE [Li & Clifton, 2000]
    - 40 databases, 27,000 elements, estimated time: 12 years
- Will only be exacerbated
  - data sharing becomes pervasive
  - translation of legacy data
- Need semi-automatic approaches to scale up!
- Many research projects in the past few years
  - Databases: IBM Almaden, Microsoft Research, BYU, George Mason, U of Leipzig, U Wisconsin, NCSU, UIUC, Washington, ...
  - AI: Stanford, Karlsruhe University, NEC Japan, ...
10. LSD
- Learning Source Description
- Developed at Univ of Washington, 2000-2001
  - with Pedro Domingos and Alon Halevy
- Designed for data integration settings
  - has been adapted to several other contexts
- Desirable characteristics
  - learns from previous matching activities
  - exploits multiple types of information in schema & data
  - incorporates domain integrity constraints
  - handles user feedback
  - achieves high matching accuracy (66-97%) on real-world data
11. Schema Matching for Data Integration: the LSD Approach
- Suppose user wants to integrate 100 data sources
- 1. User
  - manually creates matches for a few sources, say 3
  - shows LSD these matches
- 2. LSD learns from the matches
- 3. LSD predicts matches for remaining 97 sources
12. Learning from the Manual Matches
- Mediated schema: price, agent-name, agent-phone, office-phone, description
- Schema of realestate.com (manually matched): listed-price, contact-name, contact-phone, office, comments
  listed-price  contact-name  contact-phone   office          comments
  250K          James Smith   (305) 729 0831  (305) 616 1822  Fantastic house
  320K          Mike Doan     (617) 253 1429  (617) 112 2315  Great location
- Learned from names: if "office" occurs in name => office-phone
- Learned from data: if "fantastic" & "great" occur frequently in data instances => description
- homes.com (to be matched):
  sold-at  contact-agent   extra-info
  350K     (206) 634 9435  Beautiful yard
  230K     (617) 335 4243  Close to Seattle
13. Must Exploit Multiple Types of Information!
14. Multi-Strategy Learning
- Use a set of base learners
  - each exploits certain types of information well
- To match a schema element of a new source
  - apply base learners
  - combine their predictions using a meta-learner
- Meta-learner
  - uses training sources to measure base learner accuracy
  - weighs each learner based on its accuracy
15. Base Learners
- Training: learn from (instance, label) examples
- Matching: given an instance X, output labels weighted by confidence score
- Name Learner
  - training: (location, address), (contact name, name)
  - matching: agent-name => (name,0.7), (phone,0.3)
- Naive Bayes Learner
  - training: ("Seattle, WA", address), ("250K", price)
  - matching: "Kent, WA" => (address,0.8), (name,0.2)
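Here is a small multinomial Naive Bayes learner over instance tokens, trained on the two examples above. It is a minimal sketch: the tokenizer and add-one smoothing are assumptions, not LSD's exact implementation. It outputs labels weighted by confidence scores, as on the slide:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase alphanumeric tokens; punctuation such as "," is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesLearner:
    """Multinomial Naive Bayes over instance tokens, with add-one smoothing."""

    def fit(self, examples):
        self.label_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        """Return (label, confidence) pairs, highest confidence first."""
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, count in self.label_counts.items():
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)
            for tok in tokenize(text):
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            log_scores[label] = score
        # Turn log scores into confidences that sum to 1.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(weights.values())
        return sorted(((label, w / z) for label, w in weights.items()),
                      key=lambda pair: -pair[1])


learner = NaiveBayesLearner().fit([("Seattle, WA", "address"), ("250K", "price")])
print(learner.predict("Kent, WA"))
```

On this two-example training set, "Kent, WA" ranks address first at roughly 0.56 versus 0.44 for price. The shared token "wa" drives the result. More training instances would sharpen the scores.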
16. The LSD Architecture
- Training Phase
  - Mediated schema + Source schemas => Training data for base learners
  - Base-Learner1 .... Base-Learnerk are trained on it
  - Meta-Learner => Weights for Base Learners
- Matching Phase
  - Base-Learner1 .... Base-Learnerk produce Hypothesis1 ... Hypothesisk
  - Meta-Learner combines them => Predictions for instances
  - Prediction Combiner => Predictions for elements
  - Constraint Handler + Domain constraints => Mappings
17. Training the Base Learners
- Mediated schema: address, price, agent-name, agent-phone, office-phone, description
- realestate.com:
  location    price  contact-name  contact-phone   office          comments
  Miami, FL   250K   James Smith   (305) 729 0831  (305) 616 1822  Fantastic house
  Boston, MA  320K   Mike Doan     (617) 253 1429  (617) 112 2315  Great location
18. Meta-Learner: Stacking [Wolpert-92, Ting & Witten-99]
- Training
  - uses training data to learn weights
  - one for each (base-learner, mediated-schema element) pair
  - weight(Name-Learner, address) = 0.2
  - weight(Naive-Bayes, address) = 0.8
- Matching: combine predictions of base learners
  - computes weighted average of base-learner confidence scores
- Example: source element area, with instances Seattle, WA; Kent, WA; Bend, OR
  - Name Learner: (address,0.4)
  - Naive Bayes: (address,0.9)
  - Meta-Learner: (address, 0.4*0.2 + 0.9*0.8 = 0.8)
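The matching-time combination above can be written in a few lines. This is a minimal sketch. The weight table layout and the learner keys ("name-learner", "naive-bayes") are hypothetical names; the numbers are the slide's:

```python
# Hypothetical weight table: one entry per (base learner, mediated-schema element),
# filled with the values shown on this slide.
WEIGHTS = {
    ("name-learner", "address"): 0.2,
    ("naive-bayes", "address"): 0.8,
}


def meta_combine(base_scores, element, weights=WEIGHTS):
    """Weighted combination of base-learner confidence scores for one element."""
    return sum(weights[(learner, element)] * score
               for learner, score in base_scores.items())


# Confidence that source element "area" matches address, per base learner.
area_scores = {"name-learner": 0.4, "naive-bayes": 0.9}
print(round(meta_combine(area_scores, "address"), 2))  # 0.8
```

The weights for address sum to 1, so this weighted sum is also the weighted average the slide describes.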
20. Applying the Learners
- homes.com schema: area, sold-at, contact-agent, extra-info
- For area, each instance (Seattle, WA; Kent, WA; Bend, OR) goes through the Name Learner and Naive Bayes, and the Meta-Learner combines their predictions:
  - (address,0.8), (description,0.2)
  - (address,0.6), (description,0.4)
  - (address,0.7), (description,0.3)
- Prediction-Combiner merges the instance predictions: area => (address,0.7), (description,0.3)
- Predictions for the other homes.com elements
  - sold-at => (price,0.9), (agent-phone,0.1)
  - contact-agent => (agent-phone,0.9), (description,0.1)
  - extra-info => (address,0.6), (description,0.4)
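The numbers above agree with a Prediction Combiner that simply averages the instance-level predictions for a column. The sketch below assumes plain averaging; LSD may use a different combination rule:

```python
from collections import defaultdict


def combine_instances(instance_predictions):
    """Average each label's confidence over all instances of one source column."""
    totals = defaultdict(float)
    for predictions in instance_predictions:
        for label, confidence in predictions:
            totals[label] += confidence
    n = len(instance_predictions)
    return sorted(((label, total / n) for label, total in totals.items()),
                  key=lambda pair: -pair[1])


# Meta-learner outputs for the three instances of "area" (values from this slide).
area_instances = [
    [("address", 0.8), ("description", 0.2)],
    [("address", 0.6), ("description", 0.4)],
    [("address", 0.7), ("description", 0.3)],
]
combined = combine_instances(area_instances)
print([(label, round(c, 2)) for label, c in combined])
# [('address', 0.7), ('description', 0.3)]
```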
21. Domain Constraints
- Encode user knowledge about domain
- Specified only once, by examining mediated schema
- Examples
  - at most one source-schema element can match address
  - if a source-schema element matches house-id then it is a key
  - avg-value(price) > avg-value(num-baths)
- Given a mapping combination
  - can verify if it satisfies a given constraint
  - e.g., area => address, sold-at => price, contact-agent => agent-phone, extra-info => address violates the first constraint
22. The Constraint Handler
- Predictions from Prediction Combiner
  - area: (address,0.7), (description,0.3)
  - sold-at: (price,0.9), (agent-phone,0.1)
  - contact-agent: (agent-phone,0.9), (description,0.1)
  - extra-info: (address,0.6), (description,0.4)
- Domain Constraints: at most one element matches address
- Scoring mapping combinations (product of confidence scores)
  - area => description, sold-at => agent-phone, contact-agent => description, extra-info => description: 0.3 * 0.1 * 0.1 * 0.4 = 0.0012
  - area => address, sold-at => price, contact-agent => agent-phone, extra-info => description: 0.7 * 0.9 * 0.9 * 0.4 = 0.2268
  - area => address, sold-at => price, contact-agent => agent-phone, extra-info => address: 0.7 * 0.9 * 0.9 * 0.6 = 0.3402, but violates the constraint
- Searches space of mapping combinations efficiently
- Can handle arbitrary constraints
- Also used to incorporate user feedback
  - e.g., sold-at does not match price
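The sketch below reproduces the example: it scores each combination by the product of its confidences and keeps the best one that satisfies every constraint. It uses brute-force enumeration, which only works for tiny cases like this one; the slide notes that LSD searches this space efficiently, and the search strategy here is not LSD's:

```python
from itertools import product
from math import prod

# Predictions from the Prediction Combiner (values from this slide).
PREDICTIONS = {
    "area": [("address", 0.7), ("description", 0.3)],
    "sold-at": [("price", 0.9), ("agent-phone", 0.1)],
    "contact-agent": [("agent-phone", 0.9), ("description", 0.1)],
    "extra-info": [("address", 0.6), ("description", 0.4)],
}


def at_most_one_address(mapping):
    """Domain constraint: at most one source element matches address."""
    return list(mapping.values()).count("address") <= 1


def best_mapping(predictions, constraints):
    """Brute-force search: highest product of confidences among valid combinations."""
    elements = list(predictions)
    best, best_score = None, 0.0
    for choice in product(*(predictions[e] for e in elements)):
        mapping = {e: label for e, (label, _) in zip(elements, choice)}
        score = prod(conf for _, conf in choice)
        if score > best_score and all(check(mapping) for check in constraints):
            best, best_score = mapping, score
    return best, best_score


mapping, score = best_mapping(PREDICTIONS, [at_most_one_address])
print(mapping, round(score, 4))
# {'area': 'address', 'sold-at': 'price', 'contact-agent': 'agent-phone', 'extra-info': 'description'} 0.2268
```

User feedback fits the same interface as one more constraint function. For example, `lambda m: m["sold-at"] != "price"` encodes "sold-at does not match price".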
23. The Current LSD System
- Can also handle data in XML format
  - matches XML DTDs
- Base learners
  - Naive Bayes [Duda & Hart-93, Domingos & Pazzani-97]
    - exploits frequencies of words & symbols
  - WHIRL Nearest-Neighbor Classifier [Cohen & Hirsh, KDD-98]
    - employs information-retrieval similarity metric
  - Name Learner [SIGMOD-01]
    - matches elements based on their names
  - County-Name Recognizer [SIGMOD-01]
    - stores all U.S. county names
  - XML Learner [SIGMOD-01]
    - exploits hierarchical structure of XML data
24. Empirical Evaluation
- Four domains
  - Real Estate I & II, Course Offerings, Faculty Listings
- For each domain
  - created mediated schema & domain constraints
  - chose five sources
  - extracted & converted data into XML
  - mediated schemas: 14-66 elements; source schemas: 13-48 elements
- Ten runs for each domain; in each run:
  - manually provided 1-1 matches for 3 sources
  - asked LSD to propose matches for remaining 2 sources
  - accuracy = % of 1-1 matches correctly identified
25. High Matching Accuracy
- Average matching accuracy (%)
  - LSD's accuracy: 71-92%
  - Best single base learner: 42-72%
  - + Meta-learner: +5-22%
  - + Constraint handler: +7-13%
  - + XML learner: +0.8-6%
26. Contribution of Schema vs. Data
- Chart: average matching accuracy (%) of
  - LSD with only schema info.
  - LSD with only data info.
  - Complete LSD
- More experiments in [Doan et al., SIGMOD-01]
27. LSD Summary
- LSD
  - learns from previous matching activities
  - exploits multiple types of information
    - by employing multi-strategy learning
  - incorporates domain constraints & user feedback
  - achieves high matching accuracy
- LSD focuses on 1-1 matches
- Next challenge: discover more complex matches!
  - iMAP (Illinois Mapping) system [SIGMOD-04]
  - developed at Washington and Illinois, 2002-2004
  - with Robin Dhamanka, Yoonkyong Lee, Alon Halevy, Pedro Domingos
28. The iMAP Approach
- Mediated schema: price, num-baths, address
- homes.com: listed-price, agent-id, full-baths, half-baths, city, zipcode
- For each mediated-schema element
  - searches space of all matches
  - finds a small set of likely match candidates
  - uses LSD to evaluate them
- To search efficiently
  - employs a specialized searcher for each element type
  - Text Searcher, Numeric Searcher, Category Searcher, ...
29. The iMAP Architecture [SIGMOD-04]
- Inputs: mediated schema; source schema & data; domain knowledge & data
- Searcher1, Searcher2, ..., Searcherk => match candidates
- Base-Learner1 .... Base-Learnerk + Meta-Learner => similarity matrix
- Match selector => 1-1 and complex matches
- Explanation module: interacts with the user
30. An Example Text Searcher
- Beam search in space of all concatenation matches
- Example: find match candidates for address
  - Mediated schema: price, num-baths, address
  - homes.com:
    listed-price  agent-id  full-baths  half-baths  city     zipcode
    320K          532a      2           1           Seattle  98105
    240K          115c      1           1           Miami    23591
  - concat(agent-id,zipcode): 532a 98105; 115c 23591
  - concat(city,zipcode): Seattle 98105; Miami 23591
  - concat(agent-id,city): 532a Seattle; 115c Miami
- Best match candidates for address
  - (agent-id,0.7), (concat(agent-id,city),0.75), (concat(city,zipcode),0.9)
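A minimal beam search over ordered concatenations of the homes.com columns. In iMAP, candidates are scored by learned evaluators. Here `address_likeness` is a hypothetical toy scorer standing in for them: it rewards a word plus a 5-digit zip and penalizes other tokens. Its scores are not the slide's numbers:

```python
import re

# homes.com columns from this slide (two rows of data).
COLUMNS = {
    "listed-price": ["320K", "240K"],
    "agent-id": ["532a", "115c"],
    "full-baths": ["2", "1"],
    "half-baths": ["1", "1"],
    "city": ["Seattle", "Miami"],
    "zipcode": ["98105", "23591"],
}


def concat_values(candidate):
    """Row-wise concatenation of the candidate's columns, separated by spaces."""
    return [" ".join(parts) for parts in zip(*(COLUMNS[c] for c in candidate))]


def address_likeness(value):
    """Hypothetical toy scorer: rewards a word plus a 5-digit zip, penalizes other tokens."""
    tokens = value.split()
    words = [t for t in tokens if t.isalpha()]
    zips = [t for t in tokens if re.fullmatch(r"\d{5}", t)]
    other = len(tokens) - len(words) - len(zips)
    return max(0.0, 0.5 * bool(words) + 0.5 * bool(zips) - 0.25 * other)


def score(candidate):
    values = concat_values(candidate)
    return sum(address_likeness(v) for v in values) / len(values)


def rank(candidate):
    # Best score first; ties broken by column names so results are deterministic.
    return (-score(candidate), candidate)


def beam_search(names, beam_width=3, max_parts=2):
    """Keep the beam_width best candidates, extending each by one column per round."""
    beam = sorted(((n,) for n in names), key=rank)[:beam_width]
    seen = set(beam)
    for _ in range(max_parts - 1):
        expanded = {c + (n,) for c in beam for n in names if n not in c}
        beam = sorted(expanded, key=rank)[:beam_width]
        seen.update(beam)
    return sorted(seen, key=rank)[:beam_width]


for cand in beam_search(list(COLUMNS)):
    label = cand[0] if len(cand) == 1 else "concat(" + ",".join(cand) + ")"
    print(label, score(cand))
# concat(city,zipcode) 1.0
# concat(zipcode,city) 1.0
# city 0.5
```

The beam bounds the number of candidates kept at each round. Without it, the number of concatenations grows exponentially with `max_parts`.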
31. Empirical Evaluation
- Current iMAP system
  - 12 searchers
- Four real-world domains
  - real estate, product inventory, cricket, financial wizard
  - target schema: 19-42 elements; source schema: 32-44 elements
- Accuracy: 43-92%
- Sample discovered matches
  - agent-name = concat(first-name, last-name)
  - area = building-area / 43560
  - discount-cost = (unit-price * quantity) * (1 - discount)
- More detail in [Dhamanka et al., SIGMOD-04]
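The discovered complex matches are ordinary functions of the source elements. This is a minimal sketch. It assumes building-area is in square feet, so dividing by 43560 (square feet per acre) gives acres, and that concat inserts a single space:

```python
SQFT_PER_ACRE = 43560  # the constant in the discovered match


def area(building_area):
    """area = building-area / 43560 (assumed: square feet to acres)."""
    return building_area / SQFT_PER_ACRE


def discount_cost(unit_price, quantity, discount):
    """discount-cost = (unit-price * quantity) * (1 - discount)."""
    return (unit_price * quantity) * (1 - discount)


def agent_name(first_name, last_name):
    """agent-name = concat(first-name, last-name); the space separator is assumed."""
    return f"{first_name} {last_name}"


print(area(87120), discount_cost(10.0, 3, 0.25), agent_name("Jane", "Brown"))
# 2.0 22.5 Jane Brown
```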
32. Observations
- Finding complex matches is much harder than finding 1-1 matches!
  - requires gluing together many components
  - e.g., num-rooms = bath-rooms + bed-rooms + dining-rooms + living-rooms
  - if one component is missing => incorrect match
- However, even partial matches are already very useful!
  - so are top-k matches => need methods to handle partial/top-k matches
- Huge/infinite search spaces
  - domain knowledge plays a crucial role!
- Matches are fairly complex, hard to know if they are correct
  - must be able to explain matches
- Human must be fairly active in the loop
  - need strong user interaction facilities
- Break matching architecture into multiple "atomic" boxes!
34. Finding Matches is only Half of the Job!
- To translate data/queries, need mappings, not matches
- Schema S
  HOUSES
  location     price ($)  agent-id
  Atlanta, GA  360,000    32
  Raleigh, NC  430,000    15
  AGENTS
  id  name        city     state  fee-rate
  32  Mike Brown  Athens   GA     0.03
  15  Jean Laup   Raleigh  NC     0.04
- Schema T
  LISTINGS
  area         list-price  agent-address  agent-name
  Denver, CO   550,000     Boulder, CO    Laura Smith
  Atlanta, GA  370,800     Athens, GA     Mike Brown
- Mappings
  - area = SELECT location FROM HOUSES
  - agent-address = SELECT concat(city,state) FROM AGENTS
  - list-price = SELECT price * (1 + fee-rate)
      FROM HOUSES, AGENTS
      WHERE agent-id = id
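The mappings can be run against the example data with Python's built-in `sqlite3`. This is a minimal sketch, with four adjustments. Hyphenated names are renamed with underscores (`agent_id`, `fee_rate`). concat(city,state) is written with a ", " separator to match the LISTINGS data. The agent-name = name mapping is an assumption. The three mappings are combined into one inner-join query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE houses (location TEXT, price REAL, agent_id INTEGER);
    CREATE TABLE agents (id INTEGER, name TEXT, city TEXT, state TEXT, fee_rate REAL);
    INSERT INTO houses VALUES ('Atlanta, GA', 360000, 32), ('Raleigh, NC', 430000, 15);
    INSERT INTO agents VALUES (32, 'Mike Brown', 'Athens', 'GA', 0.03),
                              (15, 'Jean Laup', 'Raleigh', 'NC', 0.04);
""")

# area, list-price and agent-address from the slide; agent_name is an assumed extra mapping.
rows = conn.execute("""
    SELECT h.location                           AS area,
           ROUND(h.price * (1 + a.fee_rate), 2) AS list_price,
           a.city || ', ' || a.state            AS agent_address,
           a.name                               AS agent_name
    FROM houses AS h
    JOIN agents AS a ON h.agent_id = a.id
    ORDER BY area
""").fetchall()

for row in rows:
    print(row)
# ('Atlanta, GA', 370800.0, 'Athens, GA', 'Mike Brown')
# ('Raleigh, NC', 447200.0, 'Raleigh, NC', 'Jean Laup')
```

The first row reproduces the Atlanta listing in schema T (360,000 * 1.03 = 370,800). That check is how the "*" and "+" in the list-price mapping can be confirmed.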
35. Clio: Elaborating Matches into Mappings
- Developed at Univ of Toronto & IBM Almaden, 2000-2003
  - by Renee Miller, Laura Haas, Mauricio Hernandez, Lucian Popa, Howard Ho, Ling Yan, Ron Fagin
- Given a match
  - list-price = price * (1 + fee-rate)
- Refine it into a mapping
  - list-price = SELECT price * (1 + fee-rate)
      FROM HOUSES (FULL OUTER JOIN) AGENTS
      WHERE agent-id = id
- Need to discover
  - the correct join path among tables, e.g., agent-id = id
  - the correct join, e.g., full outer join? inner join?
- Use heuristics to decide
  - when in doubt, ask users
  - employ sophisticated user interaction methods [VLDB-00, SIGMOD-01]
36. Clio: Illustrating Examples
38. Broader Picture: Find Matches
- Hand-crafted rules; exploit schema; 1-1 matches
  - TRANSCM [Milo & Zohar-98], ARTEMIS [Castano & Antonellis-99], [Palopoli et al.-98], CUPID [Madhavan et al.-01]
- Single learner; exploit data; 1-1 matches
  - SEMINT [Li & Clifton-94], ILA [Perkowitz & Etzioni-95], DELTA [Clifton et al.-97], AutoMatch, Autoplex [Berlin & Motro, 01-03]
- Learners + rules, use multi-strategy learning; exploit schema + data; 1-1 + complex matches; exploit domain constraints
  - LSD [Doan et al., SIGMOD-01], iMAP [Dhamanka et al., SIGMOD-04]
- Other important works
  - COMA by Erhard Rahm group, David Embley group at BYU, Jaewoo Kang group at NCSU, Kevin Chang group at UIUC, Clement Yu group at UIC
- More about some of these works soon ....
39. Broader Picture: From Matches to Mappings
- Rules; exploit data; powerful user interaction
  - CLIO [Miller et al., 00], [Yan et al., 01]
- Learners + rules; exploit schema + data; 1-1 + complex matches; automate as much as possible
  - iMAP [Dhamanka et al., SIGMOD-04]
- ?
41. Ontology Matching
- Increasingly critical for
  - knowledge bases, Semantic Web
- An ontology
  - concepts organized into a taxonomy tree
  - each concept has
    - a set of attributes
    - a set of instances
  - relations among concepts
- Matching
  - concepts
  - attributes
  - relations
- Example taxonomy: CS Dept. US
  - Entity
    - Undergrad Courses
    - Grad Courses
    - People
      - Staff
      - Faculty
        - Assistant Professor
        - Associate Professor
        - Professor
  - Example instance: name = Mike Burns, degree = Ph.D.
42. Matching Taxonomies of Concepts
- CS Dept. Australia
  - Entity
    - Courses
    - Staff
      - Technical Staff
      - Academic Staff
        - Senior Lecturer
        - Lecturer
        - Professor
43. GLUE
- Solution
  - Use data instances extensively
  - Learn classifiers using information within taxonomies
  - Use a rich constraint satisfaction scheme
- [Doan, Madhavan, Domingos, Halevy, WWW-2002]
44. Concept Similarity
- Concepts A and S are viewed as sets of examples in a hypothetical universe of all examples
- Joint Probability Distribution (JPD): P(A,S), P(¬A,S), P(A,¬S), P(¬A,¬S)
- Multiple similarity measures in terms of the JPD
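One such measure is the Jaccard coefficient, which the GLUE paper uses. Written in terms of the JPD cells above:

```latex
\mathrm{Jaccard\text{-}sim}(A,S)
  = \frac{P(A \cap S)}{P(A \cup S)}
  = \frac{P(A,S)}{P(A,S) + P(A,\neg S) + P(\neg A,S)}
```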
45. Machine Learning for Computing Similarities
- Taxonomy 1, Taxonomy 2
- JPD estimated by counting the sizes of the partitions
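A minimal sketch of estimating the JPD by counting partition sizes, followed by the Jaccard coefficient. Here the (in A, in S) membership flags are given directly as hypothetical data. In GLUE, roughly speaking, memberships not known from a taxonomy are predicted by classifiers learned from the other taxonomy:

```python
from collections import Counter

CELLS = [(True, True), (False, True), (True, False), (False, False)]


def estimate_jpd(memberships):
    """memberships: one (in_A, in_S) pair per instance; cell sizes / total = JPD."""
    counts = Counter(memberships)
    n = len(memberships)
    return {cell: counts[cell] / n for cell in CELLS}


def jaccard_sim(jpd):
    """P(A, S) / (P(A, S) + P(A, not S) + P(not A, S))."""
    both = jpd[(True, True)]
    either = both + jpd[(True, False)] + jpd[(False, True)]
    return both / either if either else 0.0


# Hypothetical memberships for 10 instances: 4 in both, 1 only in A, 1 only in S, 4 in neither.
memberships = [(True, True)] * 4 + [(True, False)] + [(False, True)] + [(False, False)] * 4
jpd = estimate_jpd(memberships)
print(jpd[(True, True)], round(jaccard_sim(jpd), 3))  # 0.4 0.667
```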
46. The GLUE System
- Inputs: Taxonomy O1 and Taxonomy O2 (tree structure + data instances)
- Distribution Estimator: Base Learners + Meta Learner => Joint Probability Distribution P(A,B), P(A,¬B), ...
- Similarity Estimator: Similarity Function => Similarity Matrix
- Relaxation Labeling, using Common Knowledge & Domain Constraints => Matches for O1, Matches for O2
47. Constraints in Taxonomy Matching
- Domain-dependent
  - at most one node matches department-chair
  - a node that matches professor cannot be a child of a node that matches assistant-professor
- Domain-independent
  - two nodes match if parents & children match
  - if all children of X match Y, then X also matches Y
- Variations have been exploited in many restricted settings [Melnik & Garcia-Molina, ICDE-02; Milo & Zohar, VLDB-98; Noy et al., IJCAI-01; Madhavan et al., VLDB-01]
- Challenge: find a general & efficient approach
48. Solution: Relaxation Labeling
- Relaxation labeling [Hummel & Zucker, 83]
  - applied to graph labeling in vision, NLP, hypertext classification
  - finds best label assignment, given a set of constraints
  - starts with initial label assignment
  - iteratively improves labels, using constraints
- Standard relaxation labeling not applicable
  - extended it in many ways [Doan et al., WWW-02]
49. Real World Experiments
- Taxonomies on the web
  - University organization (UW and Cornell)
    - Colleges, departments and sub-fields
  - Companies (Yahoo and The Standard)
    - Industries and Sectors
- For each taxonomy
  - Extract data instances: course descriptions, company profiles
  - Trivial data cleaning
  - 100-300 concepts per taxonomy
  - 3-4 depth of taxonomies
  - 10-90 data instances per concept
- Evaluation against manual mappings as the gold standard
50. GLUE's Performance
- Chart: matching accuracy on University Depts 1, Company Profiles, University Depts 2
51. Broader Picture
- Ontology matching parallels the development of schema matching
  - rule-based & learning-based approaches
  - PROMPT family, OntoMorph, OntoMerge, Chimaera, Onion, OBSERVER, FCA-Merge, ...
  - extensive work by Ed Hovy's group
  - ontology versioning (e.g., by Noy et al.)
- More powerful user interaction methods
  - e.g., iPROMPT, Chimaera
- Much more theoretical work in this area
53. Develop the Theoretical Foundation
- Not much is going on, however ...
  - see works by Alon Halevy (AAAI-02) and Phil Bernstein (in model management contexts)
  - some preliminary work in AnHai Doan's Ph.D. dissertation
  - work by Stuart Russell and other AI people on identity uncertainty is potentially relevant
- Most likely foundation
  - probability framework
54. Need Much More Domain Knowledge
- Where to get it?
  - past matches (e.g., LSD, iMAP)
  - other schemas in the domain
    - holistic matching approach by Kevin Chang group [SIGMOD-02]
    - corpus-based matching by Alon Halevy group [IJCAI-03]
    - clustering to achieve bridging effects by Clement Yu group [SIGMOD-04]
  - external data (e.g., iMAP at SIGMOD-04)
  - mass of users (e.g., MOBS at WebDB-03)
- How to get it and how to use it?
  - no clear answer yet
55. Employ Multi-Module Architecture
- Many "black boxes", each good at doing a single thing
- Combine them and tailor them to each application
- Examples
  - LSD, iMAP, COMA, David Embley's systems
- Open issues
  - what are these black boxes?
  - how to build them?
  - how to combine them?
56. Powerful User Interaction
- Minimize user effort, maximize its impact
- Make it very easy for users to
  - supply domain knowledge
  - provide feedback on matches/mappings
- Develop powerful explanation facilities
57. Other Issues
- What to do with partial/top-k matches?
- Meaning negotiation
- Fortifying schemas for interoperability
- Very-large-scale matching scenarios (e.g., the Web)
- What can we do without the mappings?
- Interaction between schema matching and tuple matching?
- Benchmarks, tools?
58. Summary
- Schema/ontology matching: key to numerous data management problems
  - much attention in the database, AI, Semantic Web communities
- Simple problem definition, yet very difficult to do
  - no satisfactory solution yet
  - AI complete?
- We now understand the problems much better
  - still at the beginning of the journey
  - will need techniques from multiple fields
59. Backup Slides
61. Training the Meta-Learner
- Extracted XML instances, with Name Learner and Naive Bayes predictions for address and the true predictions:
  Instance                         Name Learner  Naive Bayes  True
  <location> Miami, FL </>         0.5           0.8          1
  <listed-price> 250,000 </>       0.4           0.3          0
  <area> Seattle, WA </>           0.3           0.9          1
  <house-addr> Kent, WA </>        0.6           0.8          1
  <num-baths> 3 </>                0.3           0.3          0
  ...                              ...           ...          ...
- Least-Squares Linear Regression
  - Weight(Name-Learner, address) = 0.1
  - Weight(Naive-Bayes, address) = 0.9
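The regression step can be sketched as a least-squares fit without an intercept, solved through the normal equations. Pure Python keeps it dependency-free. This is a minimal sketch, not LSD's code:

```python
def least_squares(X, y):
    """Solve min_w ||X w - y||^2 through the normal equations (X^T X) w = X^T y."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y], one row per weight.
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Rows visible on the slide: (Name Learner, Naive Bayes) scores for address, and the truth.
X = [[0.5, 0.8], [0.4, 0.3], [0.3, 0.9], [0.6, 0.8], [0.3, 0.3]]
y = [1, 0, 1, 1, 0]
w_name, w_nb = least_squares(X, y)
print(f"Weight(Name-Learner,address)={w_name:.2f}  Weight(Naive-Bayes,address)={w_nb:.2f}")
```

On just the five visible rows, this fit does not reproduce the slide's weights (0.1, 0.9). It even gives the Name Learner a negative weight. The slide's weights come from the full training set, which is elided ("...").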
62. Sensitivity to Amount of Available Data
- Chart: average matching accuracy (%) vs. number of data listings per source (Real Estate I)
63. Contribution of Each Component
- Chart: average matching accuracy (%) for
  - Without Name Learner
  - Without Naive Bayes
  - Without Whirl Learner
  - Without Constraint Handler
  - The complete LSD system
64. Exploiting Hierarchical Structure
- Existing learners flatten out all structures
- Developed XML learner
  - similar to the Naive Bayes learner
    - input instance = bag of tokens
  - differs in one crucial aspect
    - considers not only text tokens, but also structure tokens
- Example instances
  <contact> <name> Gail Murphy </name> <firm> MAX Realtors </firm> </contact>
  <description> Victorian house with a view. Name your price! To see it, contact Gail Murphy at MAX Realtors. </description>
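A minimal tokenizer producing text tokens plus structure tokens, using the standard `xml.etree.ElementTree`. The token scheme is an assumption, not LSD's exact encoding. Each descendant element adds a node token `<tag>`, each nested parent/child pair below the root adds an edge token `parent/child`, and the root tag is excluded:

```python
import re
import xml.etree.ElementTree as ET


def words(text):
    # Lowercase alphanumeric text tokens; None (no text) yields no tokens.
    return re.findall(r"[a-z0-9]+", (text or "").lower())


def xml_tokens(xml_text):
    """Text tokens plus structure tokens for one XML instance (root tag excluded)."""
    root = ET.fromstring(xml_text)
    tokens = words(root.text)

    def walk(elem, path):
        for child in elem:
            tokens.append("<" + child.tag + ">")           # node token
            if path:
                tokens.append(path[-1] + "/" + child.tag)  # edge token
            tokens.extend(words(child.text))
            walk(child, path + [child.tag])
            tokens.extend(words(child.tail))

    walk(root, [])
    return tokens


contact = "<contact><name>Gail Murphy</name><firm>MAX Realtors</firm></contact>"
description = ("<description>Victorian house with a view. Name your price! "
               "To see it, contact Gail Murphy at MAX Realtors.</description>")
print(xml_tokens(contact))
# ['<name>', 'gail', 'murphy', '<firm>', 'max', 'realtors']
```

Both instances contain the text "Gail Murphy" and "MAX Realtors". Only the contact instance also yields the structure tokens `<name>` and `<firm>`, which is what lets a Naive Bayes learner over these tokens tell the two apart.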
65. Reasons for Incorrect Matchings
- Unfamiliarity
  - suburb
  - solution: add a suburb-name recognizer
- Insufficient information
  - correctly identified general type, failed to pinpoint exact type
    agent-name     phone
    Richard Smith  (206) 234 5412
  - solution: add a proximity learner
- Subjectivity
  - house-style = description?
    house-style  description
    Victorian    Beautiful neo-gothic house
    Mexican      Great location
66. Evaluate Mapping Candidates
- For address, Text Searcher returns
  - (agent-id,0.7)
  - (concat(agent-id,city),0.8)
  - (concat(city,zipcode),0.75)
- Employ multi-strategy learning to evaluate mappings
- Example: (concat(agent-id,city),0.8)
  - Naive Bayes Learner: 0.8
  - Name Learner: "address" vs. "agent id city": 0.3
  - Meta-Learner: 0.8 * 0.7 + 0.3 * 0.3 = 0.65
- Meta-Learner returns
  - (agent-id,0.59)
  - (concat(agent-id,city),0.65)
  - (concat(city,zipcode),0.70)
67. Relaxation Labeling
- Applied to similar problems in
  - vision, NLP, hypertext classification
- Figure: labeling the nodes of the Dept U.S. taxonomy (People, Faculty, Staff, Courses) with nodes of the Dept Australia taxonomy (Staff, Acad. Staff, Tech. Staff, Courses)
68. Relaxation Labeling for Taxonomy Matching
- Must define
  - neighborhood of a node
  - k features of neighborhood
  - how to combine influence of features
- Algorithm
  - init: for each pair <N,L>, compute
  - loop: for each pair <N,L>, re-compute
- Neighborhood configuration (example): Faculty => Acad. Staff, Staff => Tech. Staff, People => Staff
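The init/loop structure can be illustrated with a generic relaxation-labeling update in the Hummel & Zucker style, not GLUE's extended version. Each P(N = L) is multiplied by (1 + support from the neighbors' current labels) and then renormalized. Everything specific here is a hypothetical stand-in: the taxonomies follow the example on these slides, the initial probabilities are made up, and the single constraint is "two nodes match if parents & children match":

```python
# Example taxonomies: nodes of Dept U.S. (T1) are labeled with nodes of
# Dept Australia (T2). Both maps are child -> parent.
T1_PARENT = {"Faculty": "People", "Staff": "People"}
T2_PARENT = {"Acad. Staff": "Staff (AU)", "Tech. Staff": "Staff (AU)"}


def neighbors(node):
    """Yield (neighbor, relation) pairs: node's parent and children in T1."""
    if node in T1_PARENT:
        yield T1_PARENT[node], "parent"
    for child, parent in T1_PARENT.items():
        if parent == node:
            yield child, "child"


def compat(label, relation, neighbor_label):
    """1.0 if the two labels stand in T2 in the same relation as the two nodes in T1."""
    if relation == "parent":
        return 1.0 if T2_PARENT.get(label) == neighbor_label else 0.0
    return 1.0 if T2_PARENT.get(neighbor_label) == label else 0.0


def relax(probs, iterations=10):
    """Synchronous updates: P(N=L) *= 1 + support, then renormalize per node."""
    for _ in range(iterations):
        updated = {}
        for node, dist in probs.items():
            scores = {}
            for label, p in dist.items():
                support = sum(compat(label, rel, nl) * q
                              for nbr, rel in neighbors(node)
                              for nl, q in probs[nbr].items())
                scores[label] = p * (1.0 + support)
            z = sum(scores.values())
            updated[node] = {label: s / z for label, s in scores.items()}
        probs = updated
    return probs


# Hypothetical initial probabilities (e.g., from a similarity matrix);
# name similarity initially favors Staff => Staff (AU).
initial = {
    "People": {"Staff (AU)": 0.6, "Acad. Staff": 0.2, "Tech. Staff": 0.2},
    "Faculty": {"Staff (AU)": 0.2, "Acad. Staff": 0.6, "Tech. Staff": 0.2},
    "Staff": {"Staff (AU)": 0.5, "Acad. Staff": 0.1, "Tech. Staff": 0.4},
}
final = relax(initial)
for node, dist in final.items():
    print(node, "->", max(dist, key=dist.get))
# People -> Staff (AU)
# Faculty -> Acad. Staff
# Staff -> Tech. Staff
```

Staff starts out labeled Staff (AU), but its parent People is confidently labeled Staff (AU). The parent/child constraint therefore moves Staff to Tech. Staff, a child of Staff (AU).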
69. Relaxation Labeling for Taxonomy Matching
- Huge number of neighborhood configurations!
  - typically neighborhood = immediate nodes
  - here neighborhood can be entire graph: 100 nodes, 10 labels => 10^100 configurations
- Solution
  - label abstraction + dynamic programming
  - guarantee quadratic time for a broad range of domain constraints
- Empirical evaluation
  - GLUE system [Doan et al., WWW-02]
  - three real-world domains
  - 30-300 nodes / taxonomy
  - high accuracy 66-97% vs. 52-83% of best base learner
  - relaxation labeling very fast, finished in several seconds