Data-Extraction Ontology Generation by Example

Provided by: yuch3
Learn more at: https://www.deg.byu.edu

Transcript and Presenter's Notes

1
Data-Extraction Ontology Generation by Example
Yuanqiu (Joe) Zhou
Data Extraction Group
Brigham Young University
Sponsored by NSF
2
Motivation
  • Semi-structured Web data must be extracted before it
    can be manipulated further.
  • In contrast to other wrapper-generation techniques,
    the BYU ontology-based data-extraction technique is
    resilient.
  • A by-example approach lets ordinary users generate
    ontologies easily.

3
Web-based System GUI
4
Architecture
  • Data Frame Library
  • Sample Pages
  • Ontology Generator
  • User Defined Form
  • System GUI
  • Extraction Engine
  • Test Pages
  • Populated Database
5
Extraction Ontology
  • Object and Relationship Sets and Constraints
  • Extraction Patterns
  • Keywords
  • Context Expressions
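
As a rough illustration of how the four components listed on this slide fit together, the sketch below models an extraction ontology as plain Python data classes. The class and field names are hypothetical and are not taken from the BYU system; the patterns are the OpticalZoom patterns from the ontology slide later in the deck, and the cardinality strings are copied verbatim from the slide text.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectSet:
    """One concept to extract, with its recognizers (hypothetical layout)."""
    name: str
    extract_patterns: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    context_expressions: list[str] = field(default_factory=list)


@dataclass
class RelationshipSet:
    """A link between two object sets plus its participation constraints."""
    source: str
    target: str
    source_cardinality: str  # copied verbatim from the slides, e.g. "01"
    target_cardinality: str  # copied verbatim from the slides, e.g. "1"


@dataclass
class ExtractionOntology:
    primary_object: str
    object_sets: dict[str, ObjectSet] = field(default_factory=dict)
    relationship_sets: list[RelationshipSet] = field(default_factory=list)

    def add_object_set(self, object_set: ObjectSet) -> None:
        self.object_sets[object_set.name] = object_set


ontology = ExtractionOntology(primary_object="DigitalCamera")
ontology.add_object_set(
    ObjectSet(
        name="OpticalZoom",
        extract_patterns=[r"\b\d(\.\d)"],
        keywords=[r"\boptical\b"],
        context_expressions=[r"\b\d(\.\d)?(x)\b"],
    )
)
ontology.relationship_sets.append(
    RelationshipSet("Zoom", "OpticalZoom", "01", "1")
)
```

Keeping keywords and context expressions next to the extraction patterns of each object set mirrors the way the deck's ontology listing groups them under one name.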

6
Ontology Generation: Object and Relationship Sets and Constraints
7
Ontology Generation: Object and Relationship Sets and Constraints
(Diagram labels: A, B, B1, B2)
8
User Created Form
Object and Relationship Sets and Constraints
DigitalCamera - object
DigitalCamera 01 Brand 1
DigitalCamera 01 Model 1
DigitalCamera 01 CCDResolution 1
DigitalCamera 01 ImageResolution 1
DigitalCamera 01 Zoom 1
Zoom 01 DigitalZoom 1
Zoom 01 OpticalZoom 1
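
The declarations on this slide follow the nesting of the user's form: each field yields one relationship set with its parent. A minimal sketch of that walk, assuming the form is available as a nested dictionary (the function name and the dictionary layout are hypothetical, and the `01` and `1` tokens are reproduced exactly as they appear in the slide text):

```python
def form_to_declarations(root: str, form: dict) -> list[str]:
    """Turn a nested form description into declaration lines.

    `form` maps each field name to a dict of its sub-fields (empty if none).
    """
    lines = [f"{root} - object"]

    def walk(parent: str, fields: dict) -> None:
        # First declare every field of this form level...
        for name in fields:
            lines.append(f"{parent} 01 {name} 1")
        # ...then descend into the fields that have sub-fields.
        for name, children in fields.items():
            if children:
                walk(name, children)

    walk(root, form)
    return lines


camera_form = {
    "Brand": {},
    "Model": {},
    "CCDResolution": {},
    "ImageResolution": {},
    "Zoom": {"DigitalZoom": {}, "OpticalZoom": {}},
}

for line in form_to_declarations("DigitalCamera", camera_form):
    print(line)
```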
9
Ontology Generation: Extraction Patterns
  • Data Frame Library
      • Lexicons
      • Synonym dictionaries or thesauri
      • Regular expressions
  • Matching extraction patterns
      • Only one (bingo!)
      • More than one (use extraction pattern filters)
      • No matching extraction pattern (create one)
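
The three matching outcomes can be sketched as a lookup against a small pattern library. This is only an illustration under stated assumptions: the library entries, their names, and the returned action labels are hypothetical, and a library pattern is treated as matching only when it covers every sample value in full.

```python
import re

# Hypothetical stand-in for a data frame library: name -> regular expression.
DATA_FRAME_LIBRARY = {
    "Brand": r"\b(Nikon|Canon|Olympus|Minolta|Sony)\b",
    "Decimal": r"\b\d(\.\d{1,2})?\b",
    "Integer": r"\b\d+\b",
    "PixelDimensions": r"\b\d{3,4} x \d{3,4}\b",
}


def matching_patterns(samples: list[str], library: dict[str, str]) -> list[str]:
    """Return the names of library patterns that fully match every sample."""
    return [
        name
        for name, pattern in library.items()
        if all(re.fullmatch(pattern, sample) for sample in samples)
    ]


def choose_pattern(samples: list[str], library: dict[str, str]) -> tuple[str, list[str]]:
    """Map the number of matching patterns to one of the three cases."""
    matches = matching_patterns(samples, library)
    if len(matches) == 1:
        return "use", matches      # only one: take it
    if len(matches) > 1:
        return "filter", matches   # more than one: needs a filter step
    return "create", []            # none: a new pattern must be created


print(choose_pattern(["4.0", "3.34"], DATA_FRAME_LIBRARY))
print(choose_pattern(["3"], DATA_FRAME_LIBRARY))
print(choose_pattern(["f/2.8"], DATA_FRAME_LIBRARY))
```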

10
Ontology Generation: Keywords
  • Features a high-quality 4.0 Megapixel Resolution
    CCD
  • The new Nikon Coolpix 995 boasts of a 3.34
    Megapixel CCD
  • 3 effective megapixel
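
One simple way to propose keywords from snippets like these is to keep the words that appear around the sample value in every snippet. The sketch below assumes the user marked `4.0`, `3.34`, and `3` as the values; both that marking and the intersection heuristic are assumptions for illustration, not the method described in the deck.

```python
import re


def keyword_candidates(samples: list[tuple[str, str]]) -> set[str]:
    """Return lowercase words that occur in every snippet, value excluded.

    Each sample is a (snippet, marked_value) pair.
    """
    word_sets = []
    for text, value in samples:
        context = text.replace(value, " ")  # drop the marked value itself
        word_sets.append(set(re.findall(r"[a-z]+", context.lower())))
    return set.intersection(*word_sets) if word_sets else set()


snippets = [
    ("Features a high-quality 4.0 Megapixel Resolution CCD", "4.0"),
    ("The new Nikon Coolpix 995 boasts of a 3.34 Megapixel CCD", "3.34"),
    ("3 effective megapixel", "3"),
]

print(keyword_candidates(snippets))
```

With only the first two snippets the intersection would also contain "a" and "ccd", so a stop-word list would be a natural refinement.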

11
Ontology Generation: Context Expressions
  • 3.5x optical zoom (2.5x digital)
  • a superior 4x Optical Zoom Nikkor lens, plus 4x
    stepless digital zoom
  • optical 3X /digital 6X zoom
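
These snippets show why a value pattern alone is not enough: each contains two zoom factors. The sketch below uses the context expression and keyword that appear for OpticalZoom in the ontology listing later in the deck, and picks the context match closest to the keyword. Case-insensitive matching, the nearest-match rule, and the function name are assumptions; the nearest-match rule is deliberately naive and would need refinement for other fields.

```python
import re
from typing import Optional

CONTEXT = re.compile(r"\b\d(\.\d)?(x)\b", re.IGNORECASE)  # from the ontology slide
KEYWORD = re.compile(r"\boptical\b", re.IGNORECASE)       # from the ontology slide
VALUE = re.compile(r"\d(\.\d)?")                          # assumed value pattern


def extract_optical_zoom(text: str) -> Optional[str]:
    """Return the zoom factor whose context match lies nearest to 'optical'."""
    keyword = KEYWORD.search(text)
    if keyword is None:
        return None
    candidates = list(CONTEXT.finditer(text))
    if not candidates:
        return None
    nearest = min(candidates, key=lambda m: abs(m.start() - keyword.start()))
    # Strip the trailing "x" by re-matching only the numeric part.
    return VALUE.match(nearest.group()).group()


for snippet in [
    "3.5x optical zoom (2.5x digital)",
    "a superior 4x Optical Zoom Nikkor lens, plus 4x stepless digital zoom",
    "optical 3X /digital 6X zoom",
]:
    print(extract_optical_zoom(snippet))
```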

12
Extraction Ontology
DigitalCamera - object
DigitalCamera 01 Brand 1
DigitalCamera 01 ImageResolution 1
DigitalCamera 01 Zoom 1
DigitalCamera 01 CCDResolution 1
Zoom 01 OpticalZoom 1

Brand matches 10
  constant
    extract "\bNikon\b",
    extract "\bCanon\b",
    extract "\bOlympus\b",
    extract "\bMinolta\b",
    extract "\bSony\b"
end

CCDResolution matches 20
  constant extract "\b\d(\.\d{1,2})?\b"
  keyword "\bMegapixel\b", "\bCCD\b", "\bCCD Resolution\b"
end

OpticalZoom matches 10
  constant extract "\b\d(\.\d)"
    context "\b\d(\.\d)?(x)\b"
  keyword "\boptical\b"
end
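
To make the Brand and CCD resolution rules concrete, here is a minimal sketch of how they might be applied to one sentence. It is not the BYU extraction engine: the function names are hypothetical, the keyword check is a simple "some keyword occurs in the text" gate, and the CCD pattern reads the slide's `\d1,2` as the quantifier `\d{1,2}`.

```python
import re
from typing import Optional

BRAND_PATTERNS = [r"\bNikon\b", r"\bCanon\b", r"\bOlympus\b", r"\bMinolta\b", r"\bSony\b"]
CCD_EXTRACT = r"\b\d(\.\d{1,2})?\b"  # assumed reading of the slide's pattern
CCD_KEYWORDS = [r"\bMegapixel\b", r"\bCCD\b", r"\bCCD Resolution\b"]


def extract_brand(text: str) -> Optional[str]:
    """Return the first brand constant found in the text, if any."""
    for pattern in BRAND_PATTERNS:
        match = re.search(pattern, text)
        if match:
            return match.group()
    return None


def extract_ccd_resolution(text: str) -> Optional[str]:
    """Return a resolution value only when a CCD keyword is also present."""
    if not any(re.search(k, text, re.IGNORECASE) for k in CCD_KEYWORDS):
        return None
    match = re.search(CCD_EXTRACT, text)
    return match.group() if match else None


sentence = "The new Nikon Coolpix 995 boasts of a 3.34 Megapixel CCD"
print(extract_brand(sentence), extract_ccd_resolution(sentence))
```

The word boundaries keep the pattern from matching inside the model number 995, which is why the keyword gate plus a short numeric pattern is enough for this sentence.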
13
Measurements
  • How much of the ontology was generated with
    respect to how much could have been generated?
  • How many components generated should not have
    been generated?
  • How do the precision and recall of data extracted
    with a system-generated ontology compare with those
    of an expert-generated ontology?
  • How many sample pages are necessary for
    acceptable system performance?
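
The first two questions amount to recall and precision over ontology components, and the third applies the same two ratios to extracted data. A minimal sketch of the computation follows; the component names in the example are made up for illustration and are not measured results.

```python
def precision_recall(generated: set[str], expected: set[str]) -> tuple[float, float]:
    """Compute precision and recall of `generated` against `expected`.

    precision = correct / generated (how much should not have been generated)
    recall    = correct / expected  (how much of what was possible was generated)
    """
    correct = generated & expected
    precision = len(correct) / len(generated) if generated else 0.0
    recall = len(correct) / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical example: components a tool generated vs. those an expert wrote.
generated = {"Brand", "Model", "Zoom", "Price"}
expected = {"Brand", "Model", "Zoom", "CCDResolution", "ImageResolution"}

precision, recall = precision_recall(generated, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```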

14
Contributions
  • Proposes a by-example approach to
    semi-automatically generate data-extraction
    ontologies
  • Constructs a Web-based tool to generate
    data-extraction ontologies