Automatic Creation and Simplified Querying of Semantic Web Content

About This Presentation
Title:

Automatic Creation and Simplified Querying of Semantic Web Content

Description:

Automatic Creation and Simplified Querying of Semantic Web Content ... Ambiguities: 'Are there any Ford Mustangs, 2002, that are red?' ...

Slides: 27
Provided by: davidw8
Learn more at: https://www.deg.byu.edu

Transcript and Presenter's Notes

Title: Automatic Creation and Simplified Querying of Semantic Web Content


1
Automatic Creation and Simplified Querying of
Semantic Web Content
  • An Approach Based on Information-Extraction
    Ontologies

Yihong Ding, David W. Embley, and Stephen W. Liddle
Brigham Young University
2
Fundamental Problems
  • Lack of semantic web content
  • Difficulty of content creation
  • Inability to use semantic web content easily

3
Proposed Solutions
  • Automatically annotate data-rich web pages
    (turning them into semantic web pages)
  • Provide for free-form, textual queries of
    semantic web content

4
A Show-Case Vision
Find me the price and mileage of red Nissans I
want a 1990 or newer.
5
Demo I: Data Extraction
6
Demo II: Semantic Annotation
7
Demo III: Free-Form Query
8
Explanation: How It Works
  • Extraction Ontologies
  • Semantic Annotation
  • Free-Form Query Interpretation

9
Extraction Ontologies
(Figure: an extraction-ontology diagram labeled with object sets, relationship sets, participation constraints, lexical and non-lexical object sets, the primary object set, aggregation, and generalization/specialization)
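The constructs labeled on this slide can be pictured as a small data model. The sketch below is a hypothetical Python encoding, not the authors' representation: the class and field names and the `Car`/`Year` object sets are assumptions. Lexical object sets hold printable values, non-lexical ones stand for objects, one object set is marked primary, and each relationship set carries participation constraints as (low, high) pairs. Aggregation is omitted to keep it short.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ObjectSet:
    name: str
    lexical: bool          # True: printable values such as "1996"; False: abstract objects
    primary: bool = False  # the object set a record is about (e.g., one car ad)


@dataclass(frozen=True)
class Participation:
    low: int
    high: Optional[int]    # None stands for "*" (no upper bound)


@dataclass
class RelationshipSet:
    name: str
    members: tuple         # the object sets it connects (n members -> n-ary)
    participation: dict    # object-set name -> Participation


@dataclass
class Ontology:
    object_sets: dict = field(default_factory=dict)
    relationship_sets: list = field(default_factory=list)
    # generalization name -> list of specialization names
    is_a: dict = field(default_factory=dict)

    def add(self, obj_set: ObjectSet) -> ObjectSet:
        self.object_sets[obj_set.name] = obj_set
        return obj_set

    def primary(self) -> ObjectSet:
        return next(o for o in self.object_sets.values() if o.primary)


# Hypothetical fragment of a car-ads ontology
car_ads = Ontology()
car = car_ads.add(ObjectSet("Car", lexical=False, primary=True))
year = car_ads.add(ObjectSet("Year", lexical=True))
car_ads.relationship_sets.append(
    RelationshipSet(
        "Car has Year",
        (car, year),
        # a car has at most one year; a year value may describe many cars
        {"Car": Participation(0, 1), "Year": Participation(1, None)},
    )
)
```

Keeping participation constraints as plain data on each relationship set makes them easy to check against extracted records later.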
10
Formalism of Extraction Ontologies
(a quick side note)
  • Fully formalized in predicate calculus
      • Object set: 1-place predicate
      • N-ary relationship set: n-place predicate
      • Constraint: closed predicate-calculus formula
  • As a description logic: ALCN (Attributive Language with Complement and Number restrictions)
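As an illustration of this mapping (the predicate names Car, Year, and CarHasYear and the role hasYear are hypothetical), an object set becomes a 1-place predicate, a binary relationship set a 2-place predicate, and the constraints closed formulas: a referential constraint, then "each car has at most one year". The last line is a corresponding ALCN concept inclusion using an unqualified number restriction.

```latex
\begin{aligned}
&\mathit{Car}(x), \quad \mathit{Year}(y), \quad \mathit{CarHasYear}(x, y) \\
&\forall x\,\forall y\,\big(\mathit{CarHasYear}(x, y) \Rightarrow \mathit{Car}(x) \land \mathit{Year}(y)\big) \\
&\forall x\,\big(\mathit{Car}(x) \Rightarrow \forall y\,\forall z\,(\mathit{CarHasYear}(x, y) \land \mathit{CarHasYear}(x, z) \Rightarrow y = z)\big) \\
&\mathit{Car} \sqsubseteq (\leq 1\ \mathit{hasYear})
\end{aligned}
```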

11
Extraction Ontologies
Data Frame
  Internal Representation: float
  Values
    External Rep.: \s\s(\d{1,3})(\.\d{2})?
    Left Context
    Key Word Phrase
      Key Words: ([Pp]rice)|([Cc]ost)
  Operators
    Operator: >
      Key Words: (more\sthan)|(more\scostly)
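A data frame bundles, for one object set, the pattern for its values, the key words that signal it, and the key words for its operators. The sketch below is a hypothetical Python rendering of the price frame; the value pattern is a simplified stand-in (a dollar sign, digits with optional thousands separators, optional cents), not the slide's exact expression, and the class and field names are assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class DataFrame:
    """One object set's recognizers (hypothetical layout, not the authors' format)."""
    name: str
    internal_type: type
    value_pattern: re.Pattern  # external representation
    keywords: re.Pattern       # key words signalling the object set
    operators: dict            # operator symbol -> keyword pattern

    def extract_values(self, text):
        """Convert each external-representation match to the internal type."""
        return [self.internal_type(m.group(1).replace(",", ""))
                for m in self.value_pattern.finditer(text)]

    def mentions(self, text):
        return self.keywords.search(text) is not None

    def operators_in(self, text):
        return [op for op, pat in self.operators.items() if pat.search(text)]


price = DataFrame(
    name="Price",
    internal_type=float,
    # simplified stand-in: "$", 1-3 digits, optional ",ddd" groups, optional cents
    value_pattern=re.compile(r"\$\s?(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)"),
    keywords=re.compile(r"[Pp]rice|[Cc]ost"),
    operators={">": re.compile(r"more\sthan|more\scostly")},
)

ad = "1998 Nissan Maxima, red, asking $4,500. Call 555-1234."
print(price.extract_values(ad))                      # [4500.0]
print(price.mentions("price under $5,000"))          # True
print(price.operators_in("costs more than $3,000"))  # ['>']
```

Because the frame is declarative data rather than page-specific code, the same object works on any ad text, which is the property the later "resiliency" slide relies on.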
12
Data-Extraction Results: Car Ads
Salt Lake Tribune

  Attribute   Recall (%)   Precision (%)
  Year           100          100
  Make            97          100
  Model           82          100
  Mileage         90          100
  Price          100          100
  PhoneNr         94          100
  Feature         91           99

Training set for tuning ontology: 100
Test set: 116
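Recall and precision follow the usual definitions: recall is the share of values actually present in the ads that were extracted correctly, and precision is the share of extracted values that were correct. A minimal sketch; the counts in the usage line are made up for illustration and are not from this experiment.

```python
def recall_precision(correct, relevant, extracted):
    """Return (recall, precision) as percentages.

    correct   -- extracted values that are right
    relevant  -- values actually present in the documents
    extracted -- all values the extractor reported
    """
    recall = 100.0 * correct / relevant if relevant else 0.0
    precision = 100.0 * correct / extracted if extracted else 0.0
    return recall, precision


# hypothetical counts: 45 of 50 values found, 48 values reported in total
print(recall_precision(correct=45, relevant=50, extracted=48))  # (90.0, 93.75)
```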
13
Car Ads Comments
  • Dynamic sets
      • Missed MERC, Town Car, 98 Royale
      • Could use lexicon of makes and models
  • Unspecified variation in lexical patterns
      • Missed 5 speed (instead of 5 spd), p.l (instead of p.l.)
      • Could adjust lexical patterns
  • Misidentification of attributes
      • Classified AUTO in AUTO SALES as automatic transmission
      • Could adjust exceptions in lexical patterns
  • Typographical errors
      • Chrystler, DODG ENeon, I-15566-2441
      • Could look for spelling variations and common typos
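Several of the suggested fixes translate directly into regular-expression changes. The patterns below are hypothetical illustrations, not the ontology's actual recognizers: two accept unlisted variants ("5 speed" as well as "5 spd", "p.l" as well as "p.l."), one uses a negative lookahead as an exception so that AUTO followed by SALES is not read as an automatic transmission, and one tolerates the "Chrystler" typo.

```python
import re

# Variation: accept "5 spd", "5-spd", "5 speed", "5speed"
SPEED = re.compile(r"\b(\d)\s?-?\s?(?:spd|speed)\b", re.IGNORECASE)
# Variation: accept "p.l" as well as "p.l."
P_L = re.compile(r"\bp\.l\b\.?", re.IGNORECASE)
# Exception: "auto"/"automatic" is a transmission unless followed by "sales"
AUTOMATIC = re.compile(r"\bauto(?:matic)?\b(?!\s+sales\b)", re.IGNORECASE)
# Common typo: "Chrystler" for "Chrysler"
CHRYSLER = re.compile(r"\bchryst?ler\b", re.IGNORECASE)

print(bool(SPEED.search("Red, 5 speed, p.l")))          # True
print(bool(P_L.search("Red, 5 speed, p.l")))            # True
print(bool(AUTOMATIC.search("Call AUTO SALES today")))  # False
print(bool(AUTOMATIC.search("V6, auto, A/C")))          # True
print(bool(CHRYSLER.search("1995 Chrystler")))          # True
```

Each fix widens or narrows a single pattern, so it can be tuned on the training ads without touching the rest of the ontology.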

14
General Extraction Results
  • 20 domains (cars, obituaries, cameras, jobs, games, prescription drugs, ...)
  • Simple, unified domains: nearly 100% recall and precision
  • Complex, loosely defined domains (e.g., obituaries: 82% recall and 74% precision)
  • Typical: 80% recall and precision

15
Generality and Resiliency of Extraction Ontologies
(another quick side note)
  • Assumptions about web pages (generality)
      • Data rich
      • Narrow domain
      • Document types
          • Simple multiple-record documents (easiest)
          • Single-record documents (harder)
          • Records with scattered components (even harder)
  • Declarative (resiliency)
      • Still works when web pages change
      • Works for new, unseen pages in the same domain
      • Scalable, but takes work to declare the extraction ontology

16
Semantic Annotation
17
Free-Form Query Interpretation
  • Parse Free-Form Query
    (with data extraction ontology)
  • Select Ontology
  • Formulate Query Expression
  • Run Query Over Semantically Annotated Data

18
Parse Free-Form Query
Find me the price and mileage of all red Nissans I want a 1996 or newer
  • Recognized: price, mileage, red, Nissan, 1996
  • "or newer": > operator
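Parsing amounts to running the ontology's recognizers over the query and recording which object sets are named by key words, which values appear, and which operator key words fire. A minimal sketch, assuming a small hand-written recognizer table for a few car-ad object sets; the real extraction ontology's patterns and lexicons are far richer, and the "attach the operator to the nearest preceding value" rule is a simplification chosen here for illustration.

```python
import re

# Hypothetical recognizers: object set -> (keyword pattern or None, value pattern with one group)
RECOGNIZERS = {
    "Price":   (r"\bprices?\b|\bcosts?\b", r"\$(\d[\d,]*)"),
    "Mileage": (r"\bmileage\b|\bmiles\b", r"\b(\d{1,3},?\d{3})\s?miles\b"),
    "Color":   (None, r"\b(red|blue|white|black|silver)\b"),
    "Make":    (None, r"\b(Nissan|Ford|Honda|Toyota)s?\b"),
    "Year":    (None, r"\b((?:19|20)\d{2})\b"),
}
# Operator key words -> comparison symbol
OPERATORS = {r"\bor\s+newer\b": ">", r"\bmore\s+than\b": ">", r"\bless\s+than\b": "<"}


def parse_query(query):
    """Return (object sets named by key words, {object set: value}, [(object set, operator)])."""
    mentioned, values, value_ends = [], {}, {}
    for obj_set, (keyword, value) in RECOGNIZERS.items():
        if keyword and re.search(keyword, query, re.IGNORECASE):
            mentioned.append(obj_set)
        m = re.search(value, query, re.IGNORECASE)
        if m:
            values[obj_set] = m.group(1)
            value_ends[obj_set] = m.end()
    operators = []
    for pattern, symbol in OPERATORS.items():
        for m in re.finditer(pattern, query, re.IGNORECASE):
            # attach the operator to the nearest value that ends before it
            before = [(end, s) for s, end in value_ends.items() if end <= m.start()]
            if before:
                operators.append((max(before)[1], symbol))
    return mentioned, values, operators


query = "Find me the price and mileage of all red Nissans I want a 1996 or newer"
print(parse_query(query))
# (['Price', 'Mileage'], {'Color': 'red', 'Make': 'Nissan', 'Year': '1996'}, [('Year', '>')])
```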
19
Select Ontology
Find me the price and mileage of all red Nissans I want a 1996 or newer
(Figure: candidate ontologies compared against the query, with similarity values 5 and 2)
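The slide does not spell out how the similarity value is computed. One simple reading, assumed here purely for illustration, is to count how many of an ontology's recognizers match the query and pick the highest-scoring ontology. The two toy ontologies and their patterns are hypothetical, so their scores are not the slide's values.

```python
import re

# Toy ontologies reduced to recognizer patterns (hypothetical, for illustration only)
ONTOLOGIES = {
    "car ads": [r"\bprice\b", r"\bmileage\b", r"\b(?:red|blue|black)\b",
                r"\b(?:Nissan|Ford)s?\b", r"\b(?:19|20)\d{2}\b"],
    "real estate": [r"\bprice\b", r"\bbedrooms?\b", r"\bacres?\b"],
}


def similarity(query, patterns):
    """Assumed score: number of an ontology's recognizers that match the query."""
    return sum(1 for p in patterns if re.search(p, query, re.IGNORECASE))


def select_ontology(query, ontologies=ONTOLOGIES):
    scores = {name: similarity(query, pats) for name, pats in ontologies.items()}
    best = max(scores, key=scores.get)
    return best, scores


query = "Find me the price and mileage of all red Nissans I want a 1996 or newer"
print(select_ontology(query))
# ('car ads', {'car ads': 5, 'real estate': 1})
```

A shared key word such as "price" gives every ontology that knows it some credit, so selection depends on the domain-specific recognizers that only one ontology has.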
20
Formulate Query Expression
  • Conjunctive queries and aggregate queries
  • Mentioned object sets are all of interest in the result.
  • Values and operator keywords determine conditions:
      • Color = red
      • Make = Nissan
      • Year > 1996 ("or newer" recognized as the > operator)
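Under the conjunctive reading described here, every mentioned object set becomes an output column and every value/operator pair becomes a condition that a record must satisfy. The sketch below assumes the annotated data has already been flattened into one dictionary per car ad; that layout, the field names, and the sample records are invented for illustration.

```python
import operator

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

# Conditions from the slide: (object set, operator, value)
conditions = [("Color", "=", "red"), ("Make", "=", "Nissan"), ("Year", ">", 1996)]
projection = ["Price", "Mileage"]


def run_conjunctive_query(records, conditions, projection):
    """Keep records that satisfy every condition (logical AND); return the requested fields."""
    results = []
    for rec in records:
        if all(f in rec and OPS[op](rec[f], v) for f, op, v in conditions):
            results.append({f: rec.get(f) for f in projection})
    return results


# Invented sample records standing in for semantically annotated ads
ads = [
    {"Make": "Nissan", "Color": "red", "Year": 1998, "Price": 4500, "Mileage": 90000},
    {"Make": "Nissan", "Color": "blue", "Year": 1999, "Price": 6200, "Mileage": 70000},
    {"Make": "Nissan", "Color": "red", "Year": 1994, "Price": 2100, "Mileage": 140000},
    {"Make": "Ford", "Color": "red", "Year": 2000, "Price": 7800, "Mileage": 50000},
]

print(run_conjunctive_query(ads, conditions, projection))
# [{'Price': 4500, 'Mileage': 90000}]
```

With an empty condition list the function returns every record, which matches the conjunctive semantics (an empty AND is true).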
21
Formulate Query Expression
(Figure: the generated query expression, with For, Let, Where, and Return clauses)
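The For/Let/Where/Return clause names match the shape of an XQuery FLWOR expression. Below is a hypothetical generator that turns the requested object sets and parsed conditions into such an expression; the document name `annotations.xml` and the element names `CarAd`, `Price`, and so on are assumptions about how annotated data might be stored, not the system's actual schema.

```python
def xquery_literal(value):
    """Numbers stay bare; strings are double-quoted (no escaping, kept simple)."""
    return str(value) if isinstance(value, (int, float)) else f'"{value}"'


def to_flwor(projection, conditions, doc="annotations.xml", record="CarAd"):
    """Build a FLWOR string: one let binding per requested object set and the
    conditions joined with 'and', i.e. a conjunctive query."""
    lets = ", ".join(f"${name.lower()} := $r/{name}" for name in projection)
    where = " and ".join(f"$r/{name} {op} {xquery_literal(v)}" for name, op, v in conditions)
    returned = "".join(f"{{${name.lower()}}}" for name in projection)
    return (f'for $r in doc("{doc}")//{record}\n'
            f"let {lets}\n"
            f"where {where}\n"
            f"return <result>{returned}</result>")


print(to_flwor(["Price", "Mileage"],
               [("Color", "=", "red"), ("Make", "=", "Nissan"), ("Year", ">", 1996)]))
# for $r in doc("annotations.xml")//CarAd
# let $price := $r/Price, $mileage := $r/Mileage
# where $r/Color = "red" and $r/Make = "Nissan" and $r/Year > 1996
# return <result>{$price}{$mileage}</result>
```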
22
Run Query Over Semantically Annotated Data
23
Query Interpretation Results: Pilot Experiment with Car Ads
  • 15 free-form car-ad queries from 3 volunteer CS students
  • Results
      • Recognizing object sets of interest
          • Recall: 85%
          • Precision: 90%
      • Recognizing constraints
          • Recall: 61%
          • Precision: 79%
  • Problems
      • Regular expressions not tuned and lexicons incomplete
      • Ambiguities: "Are there any Ford Mustangs, 2002, that are red?" (Is 2002 a year, mileage, or price?)
  • Caveats
      • No disjunction
      • No negation
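The Mustang example shows why bare numbers are hard: one token can satisfy several value recognizers at once. With the illustrative patterns below (hypothetical ranges, not the ontology's), "2002" is simultaneously a plausible year, mileage, and price, so the interpreter must choose among the readings or ask the user.

```python
import re

# Hypothetical value recognizers for bare numbers in car-ad queries
VALUE_PATTERNS = {
    "Year": r"(?:19|20)\d{2}",     # four digits starting with 19 or 20
    "Mileage": r"\d{1,3},?\d{3}",  # four to six digits, optional thousands comma
    "Price": r"\d{3,6}",           # bare dollar amount without "$"
}


def candidate_object_sets(token):
    """List every object set whose value pattern matches the whole token."""
    return [name for name, pat in VALUE_PATTERNS.items() if re.fullmatch(pat, token)]


print(candidate_object_sets("2002"))     # ['Year', 'Mileage', 'Price']
print(candidate_object_sets("150,000"))  # ['Mileage']
```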

24
General Query Interpretation Results
  • AskOntos
  • (Pilot experiment on 5 domains: cars, real estate, countries, movies, diamonds)
  • Object sets of interest recognized
      • Recall: 90%
      • Precision: 90%
  • Conditions recognized
      • Recall: 71%
      • Precision: 88%

25
Pragmatics
All is not rosy
  • Technical problems
      • Extraction and query-interpretation accuracy
      • Execution speed
      • Harvesting
          • Crawling?!
          • Information behind forms on the hidden web
  • Social problems
      • Cooperation from web site developers
      • End-user concerns
          • Motivation
          • Trust

26
Conclusions
  • Automatically create semantic-web content
      • Do data extraction over an ordinary web page
      • Create semantic-web page
          • Cache page
          • Store external semantic annotation with respect to an ontology
  • Query semantic web pages
      • Free-form queries
      • Return results
          • Table
          • Link to original web page (scrolled and highlighted)
  • Pragmatic considerations

www.deg.byu.edu