Article by: Feiyu Xu, Daniela Kurz, Jakub Piskorski, Sven Schmeier - PowerPoint PPT Presentation

Provided by: thegood6

1
A Domain Adaptive Approach to Automatic Acquisition
of Domain Relevant Terms and their Relations
with Bootstrapping
  • Article by Feiyu Xu, Daniela Kurz, Jakub
    Piskorski, Sven Schmeier
  • Article Summary by Mark Vickers

2
Presentation Layout
  • Introduction to Research
  • Methods used
  • Overview of GermaNet
  • Overview of SPPC
  • Details of their Approach
  • Results
  • Conclusion

3
Goal
  • Automatic acquisition of domain-relevant terms
    and their relations
  • How?
    • Single-word terms: TFIDF classification
    • Domain-relevant relations:
      • Lexico-syntactic patterns
      • Existing ontologies
      • Collocation methods

4
Input
  • No seed words
  • No syntactic patterns
  • Just a collection of classified documents

5
Methods Used
  • Builds on other systems
    • GermaNet, for accessing semantic relations
      (the authors built an Ontology Inference
      Machine to search GermaNet)
    • SPPC (Shallow Processing Production Center),
      for linguistic annotation

6
Accessing Semantic Relations: GermaNet
  • Developed within the LSD Project at the Division
    of Computational Linguistics of the Linguistics
    Department at the University of Tübingen, Germany
  • A lexical-semantic net
  • German nouns, verbs, and adjectives are grouped
    into sets of synonyms called synsets, each
    representing an underlying lexical concept (much
    like a thesaurus)
  • Synsets are connected by semantic relations
  • Lexical relations (between words) include
    synonymy, antonymy, and pertainymy ("pertains to")
  • Conceptual relations (between synsets) include
    hyponymy (is-a), meronymy (part-of/has-a),
    entailment, and cause
  • Modeled on Princeton WordNet
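To make the synset idea concrete, here is a minimal Python sketch of a lexical-semantic net in which words are grouped into synsets and conceptual relations link whole synsets rather than individual words. The `Synset` class, the helper functions, and the toy German entries are hypothetical illustrations; they are not GermaNet's data format or API.

```python
from dataclasses import dataclass, field

@dataclass
class Synset:
    """One lexical concept: a set of synonymous words plus links to other synsets."""
    words: frozenset
    hypernyms: list = field(default_factory=list)  # conceptual relation: is-a
    meronyms: list = field(default_factory=list)   # conceptual relation: has-a (parts)

# Hypothetical toy entries, not real GermaNet data.
vehicle = Synset(frozenset({"Fahrzeug"}))
car = Synset(frozenset({"Auto", "Wagen", "PKW"}), hypernyms=[vehicle])
wheel = Synset(frozenset({"Rad"}))
car.meronyms.append(wheel)
NET = [vehicle, car, wheel]

def synsets_of(word, net):
    """All synsets containing the word (a word may have several senses)."""
    return [s for s in net if word in s.words]

def synonyms(word, net):
    """Synonymy falls out of synset membership: the other words in the same synsets."""
    return sorted({w for s in synsets_of(word, net) for w in s.words} - {word})

def related_words(word, relation, net):
    """Follow a conceptual relation ('hypernyms' or 'meronyms') from each synset of the word."""
    return sorted({w for s in synsets_of(word, net)
                   for target in getattr(s, relation) for w in target.words})

print(synonyms("Auto", NET))                      # ['PKW', 'Wagen']
print(related_words("Wagen", "hypernyms", NET))   # ['Fahrzeug']
print(related_words("PKW", "meronyms", NET))      # ['Rad']
```

Because the conceptual relations are stored on the synset, every member word inherits them: Wagen and PKW get the same hypernym as Auto without any per-word entry.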

7
Accessing Semantic Relations: WordNet
(Slides 7 to 9 share this title; their content is not preserved in this transcript.)
10
Accessing Semantic Relations: Inference Machine
  • Allows GermaNet's relations to be searched by
    other applications
  • Provides three functions:
    • Retrieval of relations assigned to words
      Example: find all synonyms of the word "bar"
      → rod, saloon, ...
    • Retrieval of relations between words
      Example: find the relations between
      Internet-Service-Provider and Company → hyponym
      (so an ISP is a company)
    • Navigation in the GermaNet graph
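The three query functions can be illustrated with a minimal Python sketch over a toy graph of (word, relation, word) triples. This is an assumed stand-in: the triples, the function names, and the `organization` entry are hypothetical, and the slide does not describe the Ontology Inference Machine's actual interface.

```python
from collections import deque

# Hypothetical toy graph: (source word, relation, target word).
# ("internet-service-provider", "hyponym", "company") reads "an ISP is a hyponym of company".
EDGES = [
    ("bar", "synonym", "rod"),
    ("bar", "synonym", "saloon"),
    ("internet-service-provider", "hyponym", "company"),
    ("company", "hyponym", "organization"),
]

def relations_of(word, relation):
    """Function 1: words linked to `word` by `relation`."""
    return sorted(t for s, r, t in EDGES if s == word and r == relation)

def relations_between(a, b):
    """Function 2: names of the relations that link word a directly to word b."""
    return sorted(r for s, r, t in EDGES if s == a and t == b)

def navigate(word, relation):
    """Function 3: breadth-first walk along `relation` edges, e.g. up the is-a hierarchy."""
    seen, order, queue = {word}, [], deque([word])
    while queue:
        for nxt in relations_of(queue.popleft(), relation):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

print(relations_of("bar", "synonym"))                             # ['rod', 'saloon']
print(relations_between("internet-service-provider", "company"))  # ['hyponym']
print(navigate("internet-service-provider", "hyponym"))           # ['company', 'organization']
```

The `seen` set keeps the walk from looping if the graph ever contains a cycle.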

11
Linguistic Annotation: SPPC
  • SPPC (Shallow Processing Production Center)
  • Robust German NLP system built from cascaded,
    optimized, weighted finite-state devices
  • SPPC components:
    • Tokenizer
    • Lexical processor
    • Part-of-speech filtering
    • Named-entity finder
    • Chunk recognizer
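The cascade architecture, in which each component consumes the previous component's output, can be sketched in Python as below. This only illustrates the data flow under stated assumptions: plain functions, a hypothetical mini-lexicon, a hypothetical gazetteer, and toy rules stand in for SPPC's weighted finite-state devices, and none of the names correspond to SPPC's real interface.

```python
import re

# Hypothetical mini-lexicon: word -> set of candidate POS tags (ambiguity is resolved later).
LEXICON = {
    "die": {"DET", "PRON"}, "firma": {"NOUN"}, "kauft": {"VERB"},
    "neue": {"ADJ"}, "server": {"NOUN"}, ".": {"PUNCT"},
}
GAZETTEER = {"Siemens": "ORG"}  # hypothetical named-entity list
NP_TAGS = {"DET", "ADJ", "NOUN", "ORG"}

def tokenize(text):
    """Stage 1: split into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def lexical_lookup(tokens):
    """Stage 2: attach every candidate tag; unknown words get {'UNK'}."""
    return [(tok, LEXICON.get(tok.lower(), {"UNK"})) for tok in tokens]

def pos_filter(analyses):
    """Stage 3: keep one tag per token. Toy rule: DET only when followed by ADJ/NOUN."""
    tagged = []
    for i, (tok, tags) in enumerate(analyses):
        nxt = analyses[i + 1][1] if i + 1 < len(analyses) else set()
        others = sorted(tags - {"DET"})
        if "DET" in tags and (nxt & {"ADJ", "NOUN"} or not others):
            tagged.append((tok, "DET"))
        else:
            tagged.append((tok, others[0]))  # toy tie-break among the remaining readings
    return tagged

def find_named_entities(tagged):
    """Stage 4: retag gazetteer matches with their entity type."""
    return [(tok, GAZETTEER.get(tok, tag)) for tok, tag in tagged]

def chunk(tagged):
    """Stage 5: each maximal run of DET/ADJ/NOUN/ORG tokens becomes a noun-phrase chunk."""
    chunks, current = [], []
    for tok, tag in tagged:
        if tag in NP_TAGS:
            current.append(tok)
        elif current:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks

def cascade(text):
    """Run the stages in order; each stage consumes the previous stage's output."""
    return chunk(find_named_entities(pos_filter(lexical_lookup(tokenize(text)))))

print(cascade("Die Firma Siemens kauft neue Server."))
# [['Die', 'Firma', 'Siemens'], ['neue', 'Server']]
```

Keeping every stage a pure function from one annotation layer to the next is what makes the components swappable, which is the point of a cascade.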

12
Their Extraction Engine
  • Three main components:
    • TFIDF-based single-word term classifier
    • Lexico-syntactic pattern finder
      • Learns patterns based on known relations
      • Learns patterns based on term collocation methods
    • Relation extractor

13
Their Extraction Engine
1. Extract single-word terms (single-word term
   extraction with KFIDF)
2. Learn multi-word terms and identify syntactic
   patterns
3. Learn patterns from known relations
4. Extract related terms using the lexico-syntactic
   patterns found
14
Discovering Domain Relevant Terms
  • Apply KFIDF, a TFIDF-based measure, to rank
    single-word terms by domain relevance
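The slide does not spell out the formula, so the Python sketch below assumes one form of a category-level TFIDF: a word w is scored for category c as docs(w, c) · log(n · |C| / cats(w) + 1), where docs(w, c) counts the documents of c that contain w, |C| is the number of categories, cats(w) counts the categories in which w occurs, and n is a smoothing factor. Treat this formula, the default n = 10, and the toy corpus as assumptions to check against the paper.

```python
import math
from collections import defaultdict

def kfidf(corpus, n=10):
    """Score (word, category) pairs with the assumed KFIDF-style formula
    docs(w, c) * log(n * |C| / cats(w) + 1).
    corpus: category name -> list of documents, each an iterable of words."""
    num_categories = len(corpus)
    docs = defaultdict(int)   # (word, category) -> documents in the category containing the word
    cats = defaultdict(set)   # word -> categories in which the word occurs
    for category, documents in corpus.items():
        for doc in documents:
            for word in set(doc):
                docs[(word, category)] += 1
                cats[word].add(category)
    return {(w, c): k * math.log(n * num_categories / len(cats[w]) + 1)
            for (w, c), k in docs.items()}

def top_terms(scores, category, k=3):
    """The k highest-scoring single-word terms for one category."""
    ranked = sorted(((s, w) for (w, c), s in scores.items() if c == category), reverse=True)
    return [w for _, w in ranked[:k]]

# Hypothetical toy corpus of classified documents.
corpus = {
    "drugs": [["cocaine", "police"], ["cocaine", "hashish"]],
    "finance": [["shares", "police"], ["shares", "bank"]],
}
scores = kfidf(corpus)
print(top_terms(scores, "drugs"))  # ['cocaine', 'hashish', 'police']
```

"police" appears in as many drug documents as "hashish" but also occurs in the finance category, so the cats(w) term pushes it below the domain-specific word.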

15
Their Extraction Engine
Collocation learner
16
Learning Term Collocations
  • Examples: man-eating shark, dead serious, depend
    on, blue-collar
  • Measures
    • Mutual information (probabilities)
      • How strongly the occurrence of one word predicts
        the occurrence of another
      • Not practical for sparse data: rare pairs get
        inflated scores
    • Log-likelihood ratio (contingency tables)
      • How much more likely the observed counts are if
        the two words are dependent than if they are
        independent
    • T-test
      • Accept or reject the null hypothesis (the terms
        are independent)
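The three measures can be compared on raw bigram counts with the Python sketch below. It uses textbook formulations (pointwise mutual information, a t-test that approximates the variance by the sample mean, and Dunning's log-likelihood ratio G² over a 2×2 contingency table); the slides do not say which exact variants the authors used, and the counts in the usage lines are made up.

```python
import math

def contingency(c12, c1, c2, n):
    """2x2 observed counts for the bigram (w1, w2) among n bigrams.
    Rows: first word is / is not w1. Columns: second word is / is not w2."""
    return [[c12, c1 - c12],
            [c2 - c12, n - c1 - c2 + c12]]

def pmi(c12, c1, c2, n):
    """Pointwise mutual information in bits: log2 P(w1 w2) / (P(w1) P(w2)). Needs c12 > 0."""
    return math.log2(c12 * n / (c1 * c2))

def t_score(c12, c1, c2, n):
    """t statistic against H0 'w1 and w2 are independent'.
    Sample mean x = c12/n; the variance is approximated by x (Bernoulli with small p)."""
    x = c12 / n
    mu = (c1 / n) * (c2 / n)
    return (x - mu) / math.sqrt(x / n)

def log_likelihood(c12, c1, c2, n):
    """Dunning's G^2 = 2 * sum(O * ln(O / E)) over the four contingency cells."""
    obs = contingency(c12, c1, c2, n)
    rows = [sum(r) for r in obs]
    cols = [obs[0][j] + obs[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = obs[i][j]
            if o > 0:  # 0 * ln(0) is taken as 0
                g2 += o * math.log(o / (rows[i] * cols[j] / n))
    return 2 * g2

# Independent pair: the bigram occurs exactly as often as chance predicts, so both scores are 0.
print(pmi(1, 10, 10, 100), log_likelihood(1, 10, 10, 100))
# Sparse pair: three single occurrences give a large PMI but a t value below 1.
print(pmi(1, 1, 1, 1000), t_score(1, 1, 1, 1000))
```

The second usage line shows the sparse-data problem noted on the slide: one co-occurrence of two otherwise unseen words yields a PMI near 10 bits, while the t-test does not reject independence.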

17
Their Extraction Engine
Relation Extractor
18
Learning Relations with Lexico-syntactic Patterns
Example of a lexico-syntactic pattern finding relations
  • Pattern: "or other"
    Sentence: "Bruises, wounds, or other injuries are common."
    Hyponym relations: (Bruises, Injuries), (Wounds, Injuries)
  • Pattern: "as well as"
    Sentence: "Cocaine as well as hashish, and LSD"
    Near synonyms? Now we can match LSD to the drug domain
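The two patterns on this slide can be matched with regular expressions over plain sentences, as in the minimal Python sketch below. It assumes English surface strings and single-word terms; the actual system works on German text annotated by SPPC, so a real implementation would more likely match over chunks than over raw words. The pattern and function names are hypothetical.

```python
import re

# Hearst-style pattern "X, Y, or other Z": X and Y are hyponyms of Z.
OR_OTHER = re.compile(r"((?:\w+,\s*)*\w+),?\s+or other\s+(\w+)", re.IGNORECASE)
# Coordination pattern "X as well as Y(, and Z)": the conjuncts are near synonyms.
AS_WELL_AS = re.compile(r"(\w+)\s+as well as\s+(\w+)(?:,?\s+and\s+(\w+))?", re.IGNORECASE)

def hyponym_pairs(sentence):
    """(hyponym, hypernym) pairs found by the 'or other' pattern."""
    pairs = []
    for m in OR_OTHER.finditer(sentence):
        hypernym = m.group(2).lower()
        for term in m.group(1).split(","):
            pairs.append((term.strip().lower(), hypernym))
    return pairs

def near_synonym_groups(sentence):
    """Groups of conjuncts found by the 'as well as' pattern (unmatched optional slot dropped)."""
    return [[g.lower() for g in m.groups() if g] for m in AS_WELL_AS.finditer(sentence)]

print(hyponym_pairs("Bruises, wounds, or other injuries are common."))
# [('bruises', 'injuries'), ('wounds', 'injuries')]
print(near_synonym_groups("Cocaine as well as hashish, and LSD"))
# [['cocaine', 'hashish', 'lsd']]
```

The "or other" pattern yields directed hyponym pairs, whereas "as well as" only says the terms belong together; deciding what they are hyponyms of needs an external resource such as GermaNet.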
19
Learning Relations with Lexico-syntactic Patterns
  • The term relation extractor applies the newly
    extracted lexico-syntactic patterns to the
    extracted terms
  • Inputs: extracted terms, GermaNet (semantic
    relationships), domain-independent patterns, and
    domain-specific patterns
  • Output: a list of related terms with possible
    hyponymous relations
  • For near synonyms, search GermaNet for a common
    hypernym, then assign the newly found hyponymous
    relation to the term not encoded in GermaNet
  • Result: terms with semantic relations (synonymy,
    hyponymy, meronymy)
  • Semantically similar fragments are fed into
    Finkelstein-Landau and Morin's algorithm to
    cluster patterns
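The near-synonym step can be sketched in a few lines of Python, assuming a hypothetical dictionary stands in for GermaNet lookups: hypernyms shared by every known member of a near-synonym group are proposed for the members the lexicon does not encode. This illustrates the idea on the slide rather than reproducing the authors' code; requiring that all known members share the hypernym is an assumption.

```python
# Hypothetical stand-in for GermaNet lookups: term -> set of hypernyms.
TOY_HYPERNYMS = {
    "cocaine": {"drug", "stimulant"},
    "hashish": {"drug", "narcotic"},
}

def infer_hyponymy(group, hypernyms):
    """For a near-synonym group, find the hypernyms shared by all known members
    and propose (term, hypernym) pairs for the members the lexicon does not encode."""
    known = [t for t in group if t in hypernyms]
    unknown = [t for t in group if t not in hypernyms]
    if not known:
        return []  # nothing to anchor the group to
    common = set.intersection(*(hypernyms[t] for t in known))
    return [(term, h) for term in unknown for h in sorted(common)]

print(infer_hyponymy(["cocaine", "hashish", "lsd"], TOY_HYPERNYMS))
# [('lsd', 'drug')]
```

Intersecting over all known members is deliberately conservative: "stimulant" and "narcotic" each fit only one known drug, so neither is passed on to LSD.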
20
Results
  • There is a correlation between corpus size and
    precision
  • Log-likelihood delivers the best results compared
    with mutual information and the t-test
  • Noun-verb collocations were the most prominent and
    had the best results
  • In the drug domain, N-V collocations reached 56%
    precision and N-N collocations 41% precision

21
Conclusion
  • KFIDF proves promising for single-word term
    extraction
  • Statistical measures are suitable for free word
    order languages like German
  • Extracting term relations is useful for real-world
    information extraction (IE)

22
My Evaluation
  • Uses well-known existing systems
  • Seemingly no human interaction
  • Domain adaptive (robust)
  • - Precision does not seem very impressive, and
    what about recall? I'd like to see more results
  • The past few papers show that automatic ontology
    generation approaches:
    • Combine multiple strategies (statistics,
      existing ontologies)
    • Have a cyclic, machine-learning nature