RIDLUR: Relationship Identification Leveraging User-defined Rules

1
RIDLUR: Relationship Identification Leveraging
User-defined Rules
  • Meta-data Extraction Group
  • CSCI6350
  • Zixin Wu
  • Vincent Howard
  • Michael Moore

2
Background Reading
  • How to Create Topic Maps
      • Identifies all associations between two items in an index (topic map)
      • Uses GATE/JAPE
  • Classifying Semantic Relations in Bioscience Texts
      • Identifies relations between two fixed entities
  • S-CREAM - Semi-automatic Creation of Metadata
      • Annotation and creation of metadata-enriched documents
      • Uses machine learning techniques to create rules for extraction
  • Semantic Enhancement Engine (SEE)
      • Does not use natural language processing techniques
      • Automatically determines target ontology
  • Keyword Extraction from the Web for FOAF Metadata
      • Uses context to identify relationships (two entities occur in multiple web pages)
  • Instance-Based Learning and Information Extraction for the Generation of Metadata
      • Uses simple (Perl-based) regular expressions to identify entities and relationships
      • Doesn't work well for dirty data or a large variety of knowledge types

3
Research Problems
  • Information Extraction of structured,
    semi-structured and unstructured data
  • Entity Identification
  • Entity Disambiguation
  • Relationship Identification
  • Mapping to ontology
  • Classification
  • Identifying importance to target domain (one and
    many)

4
Problems to be addressed
  • Relationship Identification
      • Identify relationships in unstructured text
  • Applicability to target domain
      • Only concerned with relationships defined in the target ontology
      • Allows relationship extraction rules to be defined in the context of the relationship

5
System Architecture

6
System Architecture
  • GATE (natural language processing tool)
      • English Tokeniser
          • Separates words and punctuation into tokens.
      • Sentence Splitter
          • Separates the text into sentences.
      • POS Tagger
          • Annotates every token with its part of speech.
      • Morphological Analyser
          • Annotates every verb with its root.
      • Gazetteer
          • Annotates the type of every token by looking it up in lists.
      • OrthoMatcher
          • Links different names for the same entity. For example, "IBM" and "Big Blue" are the same thing.
      • Verb Group Chunker
          • For example, "is going to investigate" is a verb chunk.
      • NE Transducer
          • Finds the named entities. For example, "Mr. Smith" is a person.
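
The first stages of such a pipeline can be illustrated with a toy sketch. This is not GATE and does not use its API; the function names, the abbreviation list and the gazetteer entries below are all assumptions made for illustration. It only shows the idea of tokenising, splitting sentences and looking tokens up in lists.

```python
import re

# Hypothetical lists; a real gazetteer would load many such lists from files.
ABBREVIATIONS = {"Mr", "Mrs", "Dr"}
GAZETTEER = {"Athens": "Location", "IBM": "Organization"}


def tokenise(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def split_sentences(tokens):
    """Group tokens into sentences, ignoring the full stop after a known abbreviation."""
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        abbrev_dot = tok == "." and i > 0 and tokens[i - 1] in ABBREVIATIONS
        if tok in {".", "!", "?"} and not abbrev_dot:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def lookup(tokens):
    """Pair every token with its gazetteer type, or None if it is in no list."""
    return [(tok, GAZETTEER.get(tok)) for tok in tokens]


text = "Mr. Smith lived in Athens. He worked for IBM."
sentences = split_sentences(tokenise(text))
for sentence in sentences:
    print(lookup(sentence))
```

Each stage only adds information to the output of the previous one, which is the same layering the component list describes.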

7
System Architecture
  • User perspective
      • Define regular expressions based on relationships in the target ontology
  • Regular expressions are translated into JAPE rules (complex regular expressions)
      • Automata are generated on the fly
  • Matching text is returned to the user as
      • Simple text
      • RDF
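
The user-facing flow can be sketched in plain Python, assuming one regular expression per ontology relationship. This is an illustration only: it uses Python's `re` module rather than JAPE, and the rule table, the `livedIn` relationship name and the `http://example.org/` namespace are hypothetical.

```python
import re

# Hypothetical rule table: each relationship in the target ontology
# is associated with exactly one user-defined regular expression.
RULES = {
    "livedIn": re.compile(
        r"(?P<subj>(?:Mr\. |Ms\. )?[A-Z][a-z]+) lived in (?P<obj>[A-Z][a-z]+)"
    ),
}
EX = "http://example.org/"  # placeholder namespace, not from the slides


def find_relationships(text):
    """Return (subject, relationship, object, matched text) for every rule match."""
    found = []
    for relation, pattern in RULES.items():
        for m in pattern.finditer(text):
            found.append((m.group("subj"), relation, m.group("obj"), m.group(0)))
    return found


def to_ntriples(subj, relation, obj):
    """Serialise one match as an N-Triples line with IRIs in the placeholder namespace."""
    def iri(name):
        return "<" + EX + re.sub(r"\W+", "_", name).strip("_") + ">"
    return f"{iri(subj)} {iri(relation)} {iri(obj)} ."


found = find_relationships("Mr. Smith lived in Athens.")
for subj, relation, obj, matched in found:
    print(matched)                           # simple text output
    print(to_ntriples(subj, relation, obj))  # RDF output
```

Keeping the rule table keyed by relationship name mirrors the one-expression-per-relationship association, so every match already knows which ontology relationship it instantiates.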

8
Abilities and Limitations
  • Abilities
      • Presents the user with text that matched the defined regular expressions
      • Identifies instances of relationships defined in the target ontology
      • Associates each regular expression with one relationship in the target ontology
  • Limitations
      • Does no machine learning
      • Does not handle coreference or subordinate clauses
      • Does not create instance data
      • Simple user interface
      • Only allows specifying one schema/knowledge base
9
Example
  • "Mr. Smith lived in Athens."
      • Phrase: lived in Athens
      • subj: Mr. Smith (Person)
      • in: Athens (Location)
      • Pattern: Verb2Subject subj, Verb2Object in
  • "Mr. Smith and Bob lived in Athens."
      • Phrase: lived in Athens
      • subj: set, with set_member Mr. Smith (Person) and set_member Bob (Person)
      • in: Athens (Location)
      • Pattern: Verb2Subject subj(set_member), Verb2Object in
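
One way to read the two examples is that a "set" node fans out into one relationship per set_member. The sketch below works under that assumption; the data layout and function names are hypothetical and are not taken from the RIDLUR implementation.

```python
def members(node):
    """A list stands for a "set" node; its items are the set_member entities."""
    return node if isinstance(node, list) else [node]


def relationships(verb, links, subject_role="subj", object_role="in"):
    """Follow the subject link and the object link from the verb,
    producing one relationship for every subject/object combination."""
    return [
        (subj, f"{verb} {object_role}", obj)
        for subj in members(links[subject_role])
        for obj in members(links[object_role])
    ]


# "Mr. Smith lived in Athens."
single = {"subj": ("Mr. Smith", "Person"), "in": ("Athens", "Location")}

# "Mr. Smith and Bob lived in Athens."
plural = {
    "subj": [("Mr. Smith", "Person"), ("Bob", "Person")],
    "in": ("Athens", "Location"),
}

for links in (single, plural):
    for rel in relationships("lived", links):
        print(rel)
```

Under this reading the second sentence yields two relationships, one for each person, both pointing at the same location.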
10
References
  • How to Create Topic Maps, Kerk, R. (University
    Halle-Wittenberg), Groschupf, S. (Media Style
    Labs).
  • Classifying Semantic Relations in Bioscience
    Texts, Rosario, B., Hearst, M. UC Berkeley.
    Proceedings of ACL, 2004.
  • S-CREAM - Semi-automatic Creation of Metadata,
    Handschuh, S., Staab, S., Ciravegna, F. Institute
    AIFB, Department of CS, University of Sheffield.
  • Semantic Enhancement Engine: A Modular Document
    Enhancement Platform for Semantic Applications
    over Heterogeneous Content, Hammond, B.,
    Sheth, A., Kochut, K. Semagix and Department of
    Computer Science, University of Georgia. Real
    World Semantic Web Applications, 2002.
  • Keyword Extraction from the Web for FOAF
    Metadata, Mori, J., et al. University of Tokyo.
  • Instance-Based Learning and Information
    Extraction for the Generation of Metadata,
    Lattner, A., Herzog, O. Center for Computing
    Technologies - TZI, University of Bremen,
    Germany. 3rd International Conference on
    Knowledge Management, 2003.

11
References
  • Extracting Personal Names from Emails: Applying
    Named Entity Recognition to Informal Text,
    Minkov, E., Cohen, W., Wang, R. Carnegie Mellon
    University. Computational Linguistics, 2004.
  • Computational Linguistics Meets Metadata, or the
    Automatic Extraction of Keywords From Full Text
    Content, Deegan, M., et al. 2004.
  • Automatic Document Metadata Extraction using
    Support Vector Machines, Han, H., et al.
    JCDL, 2003.
  • Exploiting Dictionaries in Named Entity
    Extraction: Combining Semi-Markov Extraction
    Processes and Data Integration Methods, Cohen,
    W. (Carnegie Mellon University), Sarawagi, S.
    (IIT Bombay). KDD, 2004.
  • A Comparison of String Metrics for Matching
    Names and Records, Cohen, W., Ravikumar, P.,
    Fienberg, S. Carnegie Mellon University. 2003.
  • From Unstructured Data to Actionable
    Intelligence, Rao, R. IEEE IT Pro, 2003.
  • SemTag and Seeker: Bootstrapping the Semantic
    Web via Automated Semantic Annotation, Dill, S.,
    et al. IBM Almaden Research Center. 2003.
  • Web Directories as Training Data for Automated
    Metadata Extraction, Kavalec, M., Svatek, V.,
    Strossa, P. University of Economics, Prague,
    Czech Republic. 2004.
  • Adaptive Name Matching in Information
    Integration, Bilenko, M., Mooney, R.
    (University of Texas at Austin), Cohen, W.,
    Ravikumar, P., Fienberg, S. (Carnegie Mellon
    University). IEEE Intelligent Systems, 2003.
  • A Generalized Framework for an Ontology-Based
    Data-Extraction System, Wessman, A., Liddle, S.,
    Embley, D. ISTA. 2005.
  • Automatic Discovery of WordNet Relations,
    Hearst, M. WordNet: An Electronic Lexical
    Database. MIT Press. 1998.