Semiautomatic Generation of Resilient Data-Extraction Ontologies - PowerPoint presentation

Slides: 12
Provided by: deg7
Learn more at: https://www.deg.byu.edu

1
Semiautomatic Generation of Resilient
Data-Extraction Ontologies
  • Yihong Ding
  • Data Extraction Group
  • Brigham Young University
  • Sponsored by NSF

2
Introduction
  • Wrapper-driven data extraction
      • Pros: data-source-specific, high performance
      • Cons: lacks resiliency and scalability
  • Ontology-driven data extraction
      • Pros: application-domain-specific, resilient, and scalable
      • Cons: hard to create
  • Objective
      • Generate data-extraction ontologies semiautomatically

3
Generation Architecture
[Figure: generation architecture diagram. Components: Knowledge Sources, pre-processing, Integrated Knowledge Base, Concept Selection, Relation Retrieval, Constraint Discovery, Extraction Processing, Results Storage, Result Evaluation; the user interacts if necessary.]

4
Knowledge Base Construction
  • Knowledge Sources
      • Mikrokosmos (μK) Ontology
      • Data-Frame Library
      • Additional Lexicons
      • WordNet
  • Integration of the sources into one knowledge base
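
The slide names the sources but not how they are merged. A minimal sketch, assuming each source can be flattened into a map from concept name to indicative terms (the real Mikrokosmos ontology, data frames, and WordNet are far richer). Every concept and term below is hypothetical, and the inverted index is just one possible way to let later steps look up candidate concepts from matched text.

```python
from collections import defaultdict

# Hypothetical miniature knowledge sources: concept name -> indicative terms.
ONTOLOGY_TERMS = {"MAKE": {"buick", "pontiac"}, "COLOR": {"red", "green"}}
DATA_FRAME_TERMS = {"PRICE": {"nada retail", "only"}, "MAKE": {"chevy"}}
WORDNET_SYNONYMS = {"COLOR": {"colour", "hue"}, "MILEAGE": {"mi", "miles"}}


def integrate(*sources: dict[str, set[str]]) -> dict[str, set[str]]:
    """Merge concept-to-term maps; a concept found in several sources pools its terms."""
    kb: dict[str, set[str]] = defaultdict(set)
    for source in sources:
        for concept, terms in source.items():
            kb[concept.upper()] |= {t.lower() for t in terms}
    return dict(kb)


def index_terms(kb: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the knowledge base so a term maps to every concept it may denote."""
    index: dict[str, set[str]] = defaultdict(set)
    for concept, terms in kb.items():
        for term in terms:
            index[term].add(concept)
    return dict(index)


kb = integrate(ONTOLOGY_TERMS, DATA_FRAME_TERMS, WORDNET_SYNONYMS)
print(sorted(kb["MAKE"]))        # ['buick', 'chevy', 'pontiac']
print(index_terms(kb)["green"])  # {'COLOR'}
```

Pooling terms per concept is what makes the integrated base more useful than any single source: MAKE ends up with terms contributed by two different sources.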

5
Application Specification
Record 1: 00 GrandAM SE, Sunfire Red, CD, AC, PW, PL Great Condition, 10,800, Call 798-3446
Record 2: 02 Buick Century Custom, Pwr Seat, Nada Retail 13,695 Only 12,695. 221-1250
Record 3: 02 Buick Century, lo mi, mint cond, 11,999. 373-4445 dlr 2755
Record 4: 00 Buick Century Stk HU7159 Green 9,319, 714-2200 To Apply By Phone, 1-877-228-9486, OREM Utah

6
Domain Allocation: concept selection
  • Select concepts by string-matching object values
  • Resolve conflicts using context or semantic meaning
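
The two bullets can be sketched as: match each value in a record against per-concept recognizers, and when one value fits several concepts, break the tie with nearby keywords. This is only an illustration under stated assumptions: the regex recognizers, the keyword sets, and the fixed character window are all hypothetical stand-ins for the data frames and semantic knowledge the slide refers to.

```python
import re

# Hypothetical value recognizers, one regex per concept (illustration only).
RECOGNIZERS = {
    "MAKE": re.compile(r"\b(?:Buick|Pontiac)\b", re.IGNORECASE),
    "PHONE": re.compile(r"\b\d{3}-\d{4}\b"),
    "PRICE": re.compile(r"\b\d{1,3},\d{3}\b"),
    "MILEAGE": re.compile(r"\b\d{1,3},\d{3}\b"),  # same shape as PRICE: a conflict
}
# Hypothetical context keywords, consulted only to break such conflicts.
CONTEXT = {"PRICE": {"only", "retail", "price"}, "MILEAGE": {"miles", "mileage"}}


def select_concepts(record: str, window: int = 20) -> dict[str, str]:
    """Map each recognized value in `record` to exactly one concept."""
    # Collect every concept whose recognizer matches a given character span.
    candidates: dict[tuple[int, int], set[str]] = {}
    for concept, pattern in RECOGNIZERS.items():
        for m in pattern.finditer(record):
            candidates.setdefault(m.span(), set()).add(concept)

    selected: dict[str, str] = {}
    for (start, end), concepts in candidates.items():
        value = record[start:end]
        if len(concepts) == 1:
            selected[value] = next(iter(concepts))
            continue
        # Conflict: count context keywords within `window` characters of the value.
        nearby = record[max(0, start - window):end + window].lower()
        words = set(re.findall(r"[a-z]+", nearby))
        scores = {c: len(CONTEXT.get(c, set()) & words) for c in concepts}
        best = max(scores.values())
        winners = [c for c, s in scores.items() if s == best]
        if len(winners) == 1:  # unresolved ties are left out rather than guessed
            selected[value] = winners[0]
    return selected


record = "02 Buick Century Custom, Pwr Seat, Nada Retail 13,695 Only 12,695. 221-1250"
print(select_concepts(record))
```

On Record 2 both prices are resolved by the nearby words "Retail" and "Only". On Record 3 no keyword fires near 11,999, so the value stays unassigned; a fixed character window is the crudest notion of context, and "semantic meaning" on the slide suggests something richer.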

7
Domain Allocation: relationship retrieval
  • Find paths among the selected concept nodes
  • Retrieve the cluster that represents the application domain
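
The slide does not say which path-finding method is used. The sketch below assumes breadth-first shortest paths between every pair of selected concepts and takes the union of the nodes on those paths as the domain cluster. The edge list is a hypothetical fragment, not the actual Mikrokosmos structure.

```python
from collections import deque
from itertools import combinations
from typing import Optional

# Hypothetical fragment of the integrated knowledge base, as undirected edges.
EDGES = [
    ("AUTOMOBILE", "MAKE"), ("AUTOMOBILE", "PRICE"), ("AUTOMOBILE", "FEATURE"),
    ("FEATURE", "COLOR"), ("AUTOMOBILE", "SELLER"), ("SELLER", "PHONE"),
    ("SELLER", "ADDRESS"), ("MAKE", "COUNTRY"), ("ARTIFACT", "AUTOMOBILE"),
]


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph: dict[str, set[str]], src: str, dst: str) -> Optional[list[str]]:
    """Breadth-first search; returns the nodes from src to dst, or None if unreachable."""
    parent: dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            step: Optional[str] = node
            while step is not None:  # walk parent links back to src
                path.append(step)
                step = parent[step]
            return path[::-1]
        for neighbor in graph.get(node, set()):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


def retrieve_cluster(graph: dict[str, set[str]], selected: set[str]) -> set[str]:
    """Union of the nodes on shortest paths between every pair of selected concepts."""
    cluster = set(selected)
    for a, b in combinations(sorted(selected), 2):
        path = shortest_path(graph, a, b)
        if path is not None:
            cluster.update(path)
    return cluster


graph = build_graph(EDGES)
print(sorted(retrieve_cluster(graph, {"MAKE", "PRICE", "COLOR", "PHONE"})))
# ['AUTOMOBILE', 'COLOR', 'FEATURE', 'MAKE', 'PHONE', 'PRICE', 'SELLER']
```

The connecting concepts AUTOMOBILE, FEATURE, and SELLER are pulled into the cluster, while ADDRESS, COUNTRY, and ARTIFACT, which lie on no connecting path, stay out.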

8
Domain Allocation: constraint discovery
  • Discover the participation count of each object value
  • Specify the discovered counts as participation constraints

02 Buick Century, lo mi, mint cond, green, pwr seat, 11,999. 373-4445 dlr 2755
00 Buick Century Stk HU7159 Green 9,319, 714-2200 To Apply By Phone, 1-877-228-9486, OREM Utah

AUTOMOBILE 0:1 has MAKE 1
AUTOMOBILE 0 has FEATURE 1
AUTOMOBILE 0:1 has PRICE 1:1

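
A minimal sketch of the counting step, assuming each sample record has already been labeled with the concepts found in it (for example by concept selection). The per-record counts give a minimum and maximum, which become the participation constraint on the AUTOMOBILE side. The labels are hypothetical, and generalizing any maximum above 1 to "*" is an assumption of this sketch, not something the slide states.

```python
from collections import Counter


def discover_constraints(labeled_records: list[list[str]],
                         concepts: list[str]) -> dict[str, str]:
    """Derive a 'min:max' participation constraint per concept from record counts."""
    tallies = [Counter(labels) for labels in labeled_records]
    constraints = {}
    for concept in concepts:
        counts = [tally[concept] for tally in tallies]  # Counter yields 0 if absent
        low, high = min(counts), max(counts)
        # Assumption: any observed maximum above 1 is generalized to '*' (many).
        constraints[concept] = f"{low}:{'*' if high > 1 else high}"
    return constraints


labeled = [  # hypothetical concept labels for three sample records
    ["MAKE", "FEATURE", "FEATURE", "FEATURE", "FEATURE", "PRICE"],
    ["MAKE", "FEATURE", "PRICE"],
    ["FEATURE", "PRICE"],
]
for concept, constraint in discover_constraints(labeled, ["MAKE", "FEATURE", "PRICE"]).items():
    print(f"AUTOMOBILE {constraint} has {concept}")
```

With only a handful of sample records, the observed minimum and maximum are rough guesses, which is one reason the generated ontology is meant to be tuned by the user afterwards.
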
9
Ontology Generation
  • Initial ontology: generated automatically
  • Updated ontology: tuned by the user
  • Expectation
      • Rejecting an existing element is much easier than adding a new one
      • Keep modifications to a minimum
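
The generate-then-tune idea can be sketched with a tiny ontology structure: the automatic step over-generates relationships from the retrieved cluster and the discovered constraints, and user tuning is a single rejection call. This is a hypothetical illustration. The Ontology class, the rule of relating the primary concept to every other cluster concept, and the default "0:*" constraint are all assumptions, not the method from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical minimal ontology: object sets plus binary relationship sets."""
    object_sets: set[str]
    # (primary, other) -> participation constraint of the primary object set
    relationships: dict[tuple[str, str], str] = field(default_factory=dict)

    def reject(self, *concepts: str) -> None:
        """User tuning: drop object sets and every relationship that touches them."""
        self.object_sets -= set(concepts)
        self.relationships = {
            pair: c for pair, c in self.relationships.items()
            if pair[0] not in concepts and pair[1] not in concepts
        }

    def render(self) -> list[str]:
        return [f"{a} {c} has {b}" for (a, b), c in sorted(self.relationships.items())]


def generate_initial(primary: str, cluster: set[str],
                     constraints: dict[str, str]) -> Ontology:
    """Automatic step: relate the primary concept to every other cluster concept."""
    relationships = {(primary, other): constraints.get(other, "0:*")
                     for other in cluster if other != primary}
    return Ontology(set(cluster), relationships)


onto = generate_initial(
    "AUTOMOBILE",
    {"AUTOMOBILE", "MAKE", "PRICE", "FEATURE", "SELLER"},
    {"MAKE": "0:1", "PRICE": "1:1", "FEATURE": "1:*"},
)
onto.reject("SELLER")  # the user removes one over-generated element
print("\n".join(onto.render()))
```

Rejection takes one call, while adding a missing element would require the user to supply its name, relationships, and constraints, which is the asymmetry the Expectation bullets describe.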

10
Evaluation and Results
  • Evaluation
      • Compare generated ontologies against expert-created ones
      • POG (Precision of Ontology Generation)
      • PROG (Pseudo-Recall of Ontology Generation)
      • EPROG (Effective PROG)
  • Results
      • Three test domains: Apt-Rental, Used-Auto-Ads, Nation-Essence
      • Average POG below 0.23
      • EPROG ranges from about 0.70 (lowest) to almost 1.0 (highest)
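
The slides name the metrics but do not define them. As a rough sketch only, assuming POG and PROG behave like ordinary precision and recall over sets of ontology elements (the "pseudo" and "effective" qualifiers suggest the real definitions differ, and EPROG is not sketched here), the comparison against an expert-created ontology looks like this. The element sets are hypothetical.

```python
def precision_recall(generated: set[str], expert: set[str]) -> tuple[float, float]:
    """Set-based precision and recall of generated elements against an expert ontology."""
    hits = len(generated & expert)
    precision = hits / len(generated) if generated else 0.0
    recall = hits / len(expert) if expert else 0.0
    return precision, recall


# Hypothetical element sets: an over-generated ontology keeps most expert
# elements (high recall) but also many extras (low precision).
generated = {"MAKE", "MODEL", "PRICE", "COLOR", "SELLER", "COUNTRY", "ARTIFACT", "VEHICLE"}
expert = {"MAKE", "MODEL", "PRICE", "COLOR", "MILEAGE"}
print(precision_recall(generated, expert))  # (0.5, 0.8)
```

Under this reading, a low precision paired with a high recall is the expected profile of a generator that deliberately over-generates, since rejecting an existing element is easier for the user than adding a new one.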

11
Conclusion
  • Exploits existing knowledge
  • Specifies application domain
  • Allocates domain inside the knowledge base
  • Generates a data-extraction ontology
  • Shows an average effective recall of more than 70%