Title: Article by: Feiyu Xu, Daniela Kurz, Jakub Piskorski, Sven Schmeier
A Domain Adaptive Approach to Automatic Acquisition of Domain Relevant Terms and their Relations with Bootstrapping
- Article by Feiyu Xu, Daniela Kurz, Jakub Piskorski, Sven Schmeier
- Article summary by Mark Vickers
Presentation Layout
- Introduction to Research
- Methods used
- Overview of GermaNet
- Overview of SPPC
- Details of their Approach
- Results
- Conclusion
Goal
- Automatic acquisition of domain-relevant terms and their relations
- How?
- Single-word terms: TFIDF-based classification
- Domain-relevant relations:
- Use lexico-syntactic patterns
- Existing ontologies
- Collocation methods
Input
- No seed words
- No syntactic patterns
- Just a collection of classified documents
Methods Used
- Builds on other systems
- GermaNet
- For accessing semantic relations
- The authors built an Ontology Inference Machine to search GermaNet
- SPPC (Shallow Processing Production Center)
- For linguistic annotation
Accessing Semantic Relations: GermaNet
- Developed within the LSD Project at the Division of Computational Linguistics of the Linguistics Department at the University of Tübingen, Germany
- A lexical-semantic net
- German nouns, verbs, and adjectives are grouped into sets of words that express the same underlying lexical concept (as in a thesaurus); these sets are called synsets
- Synsets are connected by semantic relations
- Lexical relations (between individual words) include synonymy, antonymy, and pertains-to
- Conceptual relations (between synsets) include hyponymy (is-a), meronymy (has-a), entailment, and cause
- Based on the technology of WordNet (Princeton)
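To make the distinction between synsets, lexical relations, and conceptual relations concrete, here is a minimal Python sketch of a GermaNet-style net. The `Synset` class, the helper functions, and the English toy entries are all hypothetical illustrations, not GermaNet's actual data format or contents.

```python
from dataclasses import dataclass, field

@dataclass
class Synset:
    """One lexical concept: the set of words (lexical units) that express it."""
    sid: str
    words: frozenset
    # Conceptual relations link whole synsets, e.g. {"hypernym": {"drug"}}.
    relations: dict = field(default_factory=dict)

# Hypothetical toy data (not taken from GermaNet).
SYNSETS = {
    "drug": Synset("drug", frozenset({"drug", "narcotic"})),
    "cocaine": Synset("cocaine", frozenset({"cocaine"}), {"hypernym": {"drug"}}),
    "hashish": Synset("hashish", frozenset({"hashish"}), {"hypernym": {"drug"}}),
}

# Lexical relations link individual words rather than synsets, e.g. antonymy.
LEXICAL_RELATIONS = {("hot", "antonym"): {"cold"}, ("cold", "antonym"): {"hot"}}

def synsets_of(word):
    """All synsets that contain `word`."""
    return [s for s in SYNSETS.values() if word in s.words]

def synonyms(word):
    """Synonymy is implicit: the other words in the same synset."""
    return {w for s in synsets_of(word) for w in s.words} - {word}

def hypernyms(word):
    """Follow the conceptual is-a relation from every synset containing `word`."""
    return {w for s in synsets_of(word)
            for target in s.relations.get("hypernym", set())
            for w in SYNSETS[target].words}

print(sorted(synonyms("drug")))               # ['narcotic']
print(sorted(hypernyms("cocaine")))           # ['drug', 'narcotic']
print(LEXICAL_RELATIONS[("hot", "antonym")])  # {'cold'}
```

Storing conceptual relations on synsets rather than on words means a single hypernym link serves every synonym in the set.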
Accessing Semantic Relations: WordNet [figures not reproduced]
Accessing Semantic Relations: Inference Machine
- Allows GermaNet's relations to be searched by other applications
- Provides three functions:
- Retrieval of the relations assigned to a word
- Example: find all synonyms of the word "bar" → rod, saloon, ...
- Retrieval of the relations between two words
- Example: find the relation between "Internet-Service-Provider" and "Company" → hyponym (i.e., an ISP is a company)
- Navigation in the GermaNet graph
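The three functions of the inference machine can be pictured as queries over a relation graph. The sketch below is a hypothetical Python stand-in, not the authors' actual interface: `relations_of`, `relation_between`, and `hypernym_chain` are illustrative names, and the intermediate node "service provider" is invented toy data.

```python
from collections import deque

# Hypothetical toy graph: word -> list of (relation, target word) edges.
GRAPH = {
    "bar": [("synonym", "rod"), ("synonym", "saloon")],
    "internet service provider": [("hypernym", "service provider")],
    "service provider": [("hypernym", "company")],
    "company": [("hypernym", "organization")],
}

def relations_of(word, relation):
    """Function 1: all words linked to `word` by `relation`."""
    return [target for rel, target in GRAPH.get(word, []) if rel == relation]

def _reachable(start, goal, relation="hypernym"):
    """Breadth-first search along `relation` edges from `start` to `goal`."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in relations_of(node, relation):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def relation_between(a, b):
    """Function 2: name the relation between two different words, if any."""
    if a == b:
        return None
    if b in relations_of(a, "synonym") or a in relations_of(b, "synonym"):
        return "synonym"
    if _reachable(a, b):
        return "hyponym"   # a is-a b, possibly via intermediate concepts
    if _reachable(b, a):
        return "hypernym"  # b is-a a
    return None

def hypernym_chain(word):
    """Function 3 (navigation): walk upward along hypernym edges."""
    chain, seen = [word], {word}
    while True:
        parents = [p for p in relations_of(chain[-1], "hypernym") if p not in seen]
        if not parents:
            return chain
        chain.append(parents[0])  # this sketch follows only the first parent
        seen.add(parents[0])

print(relations_of("bar", "synonym"))                            # ['rod', 'saloon']
print(relation_between("internet service provider", "company"))  # hyponym
print(hypernym_chain("internet service provider"))
# ['internet service provider', 'service provider', 'company', 'organization']
```

Searching transitively is what lets the ISP/company query succeed even when the two words are not directly linked.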
Linguistic Annotation: SPPC
- SPPC (Shallow Processing Production Center)
- A robust German NLP system built from cascaded, optimized, weighted finite-state devices
- SPPC components:
- Tokenizer
- Lexical processor
- Part-of-speech filtering
- Named-entity finder
- Chunk recognizer
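The idea of a cascade, where each stage consumes the previous stage's output, can be sketched in Python. This is only an illustration under strong simplifying assumptions: SPPC itself uses weighted finite-state devices and large lexica, whereas the stage functions below only borrow the slide's stage names, and the tiny lexicon, the single disambiguation rule, and the company-suffix heuristic are hypothetical.

```python
import re

# Hypothetical mini-lexicon: word -> candidate part-of-speech tags.
LEXICON = {
    "die": {"DET", "PRON"}, "eine": {"DET"}, "kleine": {"ADJ"},
    "firma": {"NOUN"}, "kauft": {"VERB"}, ".": {"PUNCT"},
}
COMPANY_SUFFIXES = {"AG", "GmbH"}  # hypothetical cue words for company names

def tokenize(text):
    """Stage 1: split text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def lexical_processor(tokens):
    """Stage 2: attach all candidate tags; unknown words get UNK."""
    return [(tok, LEXICON.get(tok.lower(), {"UNK"})) for tok in tokens]

def pos_filter(tagged):
    """Stage 3: keep exactly one tag per token."""
    result = []
    for i, (tok, tags) in enumerate(tagged):
        following = tagged[i + 1][1] if i + 1 < len(tagged) else set()
        if len(tags) == 1:
            tag = next(iter(tags))
        elif {"DET", "PRON"} <= tags:
            # Toy rule: read "die" as a determiner only before a noun-like word.
            tag = "DET" if following & {"ADJ", "NOUN", "UNK"} else "PRON"
        else:
            tag = sorted(tags)[0]  # arbitrary but deterministic fallback
        result.append((tok, tag))
    return result

def named_entity_finder(tagged):
    """Stage 4: merge a capitalized unknown word plus a company suffix into ORG."""
    result = []
    for tok, tag in tagged:
        prev = result[-1] if result else None
        if tok in COMPANY_SUFFIXES and prev and prev[1] == "UNK" and prev[0][:1].isupper():
            result[-1] = (prev[0] + " " + tok, "ORG")
        else:
            result.append((tok, tag))
    return result

def chunk_recognizer(tagged):
    """Stage 5: build noun-phrase chunks from DET? ADJ* NOUN, or a single ORG."""
    chunks, pending = [], []
    for tok, tag in tagged:
        if tag == "ORG":
            chunks.append(("NP", tok))
            pending = []
        elif tag in {"DET", "ADJ"}:
            pending.append(tok)
        elif tag == "NOUN":
            chunks.append(("NP", " ".join(pending + [tok])))
            pending = []
        else:
            pending = []
    return chunks

def run_cascade(text):
    """Feed each stage's output into the next, as in a cascaded architecture."""
    result = text
    for stage in (tokenize, lexical_processor, pos_filter,
                  named_entity_finder, chunk_recognizer):
        result = stage(result)
    return result

print(run_cascade("Siemens AG kauft eine kleine Firma."))
# [('NP', 'Siemens AG'), ('NP', 'eine kleine Firma')]
```

Keeping every stage as a plain function with a fixed input and output shape is what makes the stages easy to reorder or replace, which is the main appeal of a cascaded design.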
Their Extraction Engine
- Three main components
- TFIDF-based single-word term classifier
- Lexico-syntactic pattern finder
- Learns patterns based on known relations
- Learns patterns based on term collocation methods
- Relation Extractor
Their Extraction Engine
1. Extract single-word terms (KFIDF)
2. Learn multi-word terms and identify syntactic patterns
3. Learn patterns from known relations
4. Extract related terms using the lexico-syntactic patterns found
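Steps 3 and 4 form a bootstrapping loop: known relations yield patterns, patterns yield new relations, and those relations yield more patterns. A minimal Python sketch of such a loop follows, assuming that a pattern is simply the text between the two terms of a known (hyponym, hypernym) pair. The seed pair stands in for relations known from GermaNet, the sentences are invented, and the function names are hypothetical; this is a generic illustration of bootstrapping, not the authors' pattern learner.

```python
import re

def learn_patterns(sentences, known_pairs, max_gap=20):
    """Step 3: take the text between the two terms of each known pair as a pattern."""
    patterns = set()
    for sentence in sentences:
        for hyponym, hypernym in known_pairs:
            regex = re.escape(hyponym) + ("(.{1,%d}?)" % max_gap) + re.escape(hypernym)
            m = re.search(regex, sentence)
            if m:
                patterns.add(m.group(1))
    return patterns

def apply_patterns(sentences, patterns):
    """Step 4: use each learned pattern to extract new (hyponym, hypernym) pairs."""
    pairs = set()
    for sentence in sentences:
        for middle in patterns:
            for m in re.finditer(r"(\w+)" + re.escape(middle) + r"(\w+)", sentence):
                pairs.add((m.group(1), m.group(2)))
    return pairs

def bootstrap(sentences, seeds, rounds=2):
    """Alternate pattern learning and relation extraction for a few rounds."""
    known = set(seeds)
    for _ in range(rounds):
        known |= apply_patterns(sentences, learn_patterns(sentences, known))
    return known

# Invented example sentences; the seed pair stands in for a relation from GermaNet.
SENTENCES = [
    "heroin and other drugs were seized",
    "ecstasy and other drugs are sold",
    "ecstasy, like all drugs, is illegal",
    "nicotine, like all drugs, is addictive",
]
print(sorted(bootstrap(SENTENCES, {("heroin", "drugs")}, rounds=2)))
# [('ecstasy', 'drugs'), ('heroin', 'drugs'), ('nicotine', 'drugs')]
```

With `rounds=1` the result lacks ('nicotine', 'drugs'): the ", like all " pattern only becomes available after "ecstasy" has been learned in the first round. The same loop also shows the usual risk of bootstrapping, since a noisy pattern learned in one round feeds noisy pairs into the next.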
Discovering Domain Relevant Terms
- Apply KFIDF, a variant of the TFIDF measure
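A minimal Python sketch of KFIDF, assuming the formulation KFIDF(w, cat) = docs(w, cat) · log(n · |cats| / cats(w) + 1), where docs(w, cat) is the number of documents in category cat that contain w, |cats| is the number of categories, cats(w) is the number of categories in which w occurs, and n is a smoothing factor. Under this assumption a word is scored per category rather than per document, which fits an input that is just a collection of classified documents. The exact definition and the value of n should be checked against the paper; the function names and the tiny German corpus are invented.

```python
import math
from collections import defaultdict

def kfidf_scores(corpus, n=1.0):
    """Score every (word, category) pair in `corpus`.

    corpus: dict mapping a category name to a list of documents,
    each document being a list of word tokens. `n` is the smoothing factor.
    Assumed formula: docs(w, cat) * log(n * |cats| / cats(w) + 1).
    """
    docs = defaultdict(int)     # (word, cat) -> number of docs in cat containing word
    cats_of = defaultdict(set)  # word -> set of categories the word occurs in
    for cat, documents in corpus.items():
        for doc in documents:
            for word in set(doc):  # count each word once per document
                docs[(word, cat)] += 1
                cats_of[word].add(cat)
    num_cats = len(corpus)
    return {
        (word, cat): count * math.log(n * num_cats / len(cats_of[word]) + 1)
        for (word, cat), count in docs.items()
    }

def top_terms(scores, cat, k=3):
    """The k highest-scoring words for one category."""
    ranked = sorted(((s, w) for (w, c), s in scores.items() if c == cat), reverse=True)
    return [w for _, w in ranked[:k]]

# Invented toy corpus: "jahr" (year) occurs in every category.
corpus = {
    "drugs":   [["jahr", "droge", "kokain"], ["jahr", "droge", "polizei"]],
    "finance": [["jahr", "bank", "aktie"], ["jahr", "bank", "polizei"]],
    "sports":  [["jahr", "tor", "spiel"], ["jahr", "tor", "trainer"]],
}
scores = kfidf_scores(corpus)
print(top_terms(scores, "drugs", k=1))  # ['droge']
```

Because "jahr" occurs in all three categories, its log factor is log 2 instead of log 4, so it falls below "droge" even though both words appear in every drug document.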
Their Extraction Engine
Collocation learner
Learning Term Collocations
- Examples: man-eating shark, dead serious, depend on, blue-collar
- Measures:
- Mutual information (probabilities)
- How well the occurrence of one word predicts the occurrence of the other
- Not practical for sparse data
- Log-likelihood ratio (contingency tables)
- How much more likely the observed co-occurrence counts are if the two words are dependent than if they are independent
- T-test
- Accept or reject the null hypothesis that the two terms occur independently
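All three measures can be computed from the same four counts. The Python sketch below follows the standard textbook definitions (pointwise mutual information, the t-score with the sample variance approximated by the observed bigram probability, and the log-likelihood ratio G² over a 2×2 contingency table); it is not taken from the paper, and the function name and example counts are hypothetical.

```python
import math

def collocation_scores(c12, c1, c2, N):
    """Association scores for a word pair (w1, w2).

    c12: how often w1 is directly followed by w2
    c1, c2: how often w1 and w2 occur; N: total number of bigrams
    """
    if c12 == 0:
        raise ValueError("the pair must occur at least once")
    expected = c1 * c2 / N  # expected pair count if w1 and w2 were independent

    # Pointwise mutual information: log2(observed / expected).
    pmi = math.log2(c12 / expected)

    # t-score, approximating the sample variance by the observed probability.
    t = (c12 - expected) / math.sqrt(c12)

    # Log-likelihood ratio G^2 over the 2x2 contingency table:
    # cells are (w1, w2), (w1, not w2), (not w1, w2), (not w1, not w2).
    observed = [c12, c1 - c12, c2 - c12, N - c1 - c2 + c12]
    row_totals = [c1, c1, N - c1, N - c1]
    col_totals = [c2, N - c2, c2, N - c2]
    g2 = 2 * sum(o * math.log(o / (r * c / N))
                 for o, r, c in zip(observed, row_totals, col_totals) if o > 0)
    return {"pmi": pmi, "t": t, "loglike": g2}

# Independent pair: the observed count equals the expected count, so all scores are 0.
print(collocation_scores(2, 10, 20, 100))
# Strongly associated pair: w1 and w2 only ever occur together.
print(collocation_scores(10, 10, 10, 1000))
```

PMI only compares observed and expected counts, so a pair seen once or twice can still receive a very high score; this is the sparse-data problem noted above. The log-likelihood ratio also weighs how much evidence the counts provide and is generally considered more robust for low counts.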
Their Extraction Engine
Relation Extractor
Learning Relations with Lexico-syntactic Patterns
Example of lexico-syntactic patterns finding relations
- Pattern: "... or other ..."
- Sentence: "Bruises, wounds, or other injuries are common."
- Hyponym relations: (bruises, injuries), (wounds, injuries)
- Pattern: "... as well as ..."
- Sentence: "Cocaine as well as Hashish, and LSD"
- Near synonyms? Now we can match LSD to the drug domain
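The two patterns above can be approximated with regular expressions. In this Python sketch, the regular expressions, the function name `extract`, and the simplification that every term is a single word are assumptions for illustration; a real system would more likely match noun chunks from a linguistic annotation stage such as SPPC than raw strings.

```python
import re

# "X, Y, or other Z"  ->  X and Y are hyponyms of Z
OR_OTHER = re.compile(r"\b((?:\w+,\s*)*\w+),?\s+or other\s+(\w+)", re.IGNORECASE)
# "X as well as Y, and Z"  ->  X, Y, Z are candidate near synonyms
AS_WELL_AS = re.compile(r"\b(\w+)\s+as well as\s+(\w+(?:,?\s+and\s+\w+|,\s*\w+)*)",
                        re.IGNORECASE)

def extract(sentence):
    """Return hyponym pairs and near-synonym groups found in one sentence."""
    found = {"hyponym": [], "near_synonyms": []}
    for m in OR_OTHER.finditer(sentence):
        hypernym = m.group(2).lower()
        for term in re.split(r",\s*", m.group(1)):
            found["hyponym"].append((term.lower(), hypernym))
    for m in AS_WELL_AS.finditer(sentence):
        rest = re.split(r",?\s+and\s+|,\s*", m.group(2))
        found["near_synonyms"].append(tuple(t.lower() for t in [m.group(1), *rest]))
    return found

print(extract("Bruises, wounds, or other injuries are common."))
# {'hyponym': [('bruises', 'injuries'), ('wounds', 'injuries')], 'near_synonyms': []}
print(extract("Cocaine as well as Hashish, and LSD"))
# {'hyponym': [], 'near_synonyms': [('cocaine', 'hashish', 'lsd')]}
```

The "as well as" pattern yields a group rather than a directed relation, which is why a second step is needed to place an unknown member such as LSD under a concept like drug.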
Learning Relations with Lexico-syntactic Patterns
- Term relation extractor: applies the newly extracted lexico-syntactic patterns
- Extracted terms
- List of related terms with possible hyponymous relations
- GermaNet (semantic relationships)
- Domain-independent patterns
- Domain-specific patterns
- With near synonyms: search GermaNet for a common hypernym, then assign the newly found hyponymous relation to the term not encoded in GermaNet
- Terms with semantic relations (synonymy, hyponymy, meronymy)
- Semantically similar fragments are put into Finkelstein-Landau and Morin's algorithm to cluster patterns
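The near-synonym step can be sketched as follows: look up the known members of a near-synonym group in the ontology, intersect their hypernyms (their shared superordinate concepts), and attach each shared hypernym to the members the ontology does not contain. In this Python sketch, a plain dictionary stands in for GermaNet, the entries are invented, and the `min_known` threshold is a hypothetical safeguard rather than a documented parameter of the approach.

```python
def propagate_hypernyms(group, hypernyms, min_known=2):
    """Propose (unknown term, hypernym) pairs for one near-synonym group.

    group: near-synonym terms, e.g. found with an "as well as" pattern
    hypernyms: ontology lookup, term -> set of hypernyms (stand-in for GermaNet)
    min_known: hypothetical threshold; how many members must be in the ontology
    """
    known = [t for t in group if t in hypernyms]
    unknown = [t for t in group if t not in hypernyms]
    if len(known) < max(min_known, 1):
        return []
    shared = set.intersection(*(hypernyms[t] for t in known))
    return sorted((term, h) for term in unknown for h in shared)

# Invented toy ontology entries.
TOY_ONTOLOGY = {"cocaine": {"drug", "stimulant"}, "hashish": {"drug"}}

print(propagate_hypernyms(("cocaine", "hashish", "lsd"), TOY_ONTOLOGY))  # [('lsd', 'drug')]
print(propagate_hypernyms(("cocaine", "lsd"), TOY_ONTOLOGY))             # []
```

Requiring at least two known members is a cautious design choice: with a single known member, every one of its hypernyms (here including "stimulant") would be passed on to the unknown term.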
Results
- There is a correlation between corpus size and precision
- The log-likelihood measure delivers the best results compared to mutual information and the t-test
- Noun-verb collocations were the most prominent and had the best results
- In the drug domain: noun-verb 56% precision, noun-noun 41% precision
Conclusion
- KFIDF proves promising for single-word term extraction
- Statistical measures are suitable for free-word-order languages like German
- Extracting term relations is useful for real-world IE (information extraction)
My Evaluation
- Uses well-known existing systems
- Seemingly no human interaction
- Domain adaptive (robust)
- Precision does not seem too impressive, and what about recall? I'd like to see more results
- We see from the past few papers that automatic ontology generation approaches:
- Combine multiple strategies (statistics, existing ontologies)
- Have a cyclic, machine-learning nature