Meaningful Labeling of Integrated Query Interfaces

1
Meaningful Labeling of Integrated Query Interfaces
Eduard C. Dragut (speaker), University of Illinois at Chicago
Clement Yu, University of Illinois at Chicago
Weiyi Meng, SUNY at Binghamton
VLDB 2006, Seoul, Korea
2
A Motivating Scenario
  • Looking for a ticket
  • Chicago to Seoul, September 10th to September 17th

delta.com
orbitz.com
expedia.com
  • A user looking for the best price for a ticket
  • Has to explore multiple sources
  • It is tedious, frustrating and time-consuming

3
The goal
  • Provide a unified way to query multiple sources
    in the same domain

[Figure: a unified query interface over Web sources Airfare.com, priceline.com, united.com, delta.com, nwa.com]
4
Overview: Integrating Query Interfaces
(Deep) Web
5
Overview: Integrating Query Interfaces
  • Integration Steps
  • Structural merging of query interfaces [He03 et
    al., Dragut06 et al.]
  • Grouping constraints
  • Ancestor-Descendant relationships
  • Determining the domain of each global field in
    the integrated interface [He03 et al.]
  • Meaningful labeling of the integrated interface
  • The topic of this presentation

6
Motivation of Naming
  • A query interface needs to be easily understood
    by any user, irrespective of his/her background
  • The study of query interfaces in the seven
    domains used in our experiment revealed that the
    designers of query interfaces follow some
    hidden norms
  • There are certain relationships between the
    labels of the fields in the same group
  • E.g., all labels are plurals
  • The labels of the (super) groups semantically
    characterize the set of fields underneath them
  • The semantic ambiguity problem
  • Synonyms and homonyms are the two sources of
    naming conflicts [Batini86 et al., Bright94 et al.]

7
The objectives
  • The main goal is to provide a systematic way to
    label fields in the integrated query interface so
    that the concepts on the integrated query
    interface are easily understood by ordinary
    users.
  • Validated through a survey
  • Provide a set of desirable properties required in
    order to have consistent labels for the
    attributes within an integrated interface so that
    users have no difficulty in understanding it.
  • Not covered in detail

8
Naming Algorithm
  • The input
  • A set of query interfaces in the same domain
  • E.g., Airline domain: Delta, AA, NWA, Orbitz,
    Travelocity
  • Each query interface is represented
    hierarchically [Wu04]
  • The mapping between the fields of the query
    interfaces
  • Organized in clusters (e.g., [Wu04 et al., B.He03
    et al.])
  • The set of groups of fields given by the merge
    algorithm [Dragut06 et al.]
  • The integrated query interface given by the merge
    algorithm as a schema tree [Dragut06 et al.]
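
As a rough illustration of the input described above, the following Python sketch models a schema tree and one cluster. The names `Node` and `Cluster`, and the sample labels, are hypothetical and not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A schema-tree node: leaves are fields, internal nodes are (super)groups."""
    label: Optional[str]                       # a node may carry no label
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def leaves(self) -> List["Node"]:
        """Return the descendant leaves (fields) in left-to-right order."""
        if self.is_leaf():
            return [self]
        result: List["Node"] = []
        for child in self.children:
            result.extend(child.leaves())
        return result


# A cluster maps an interface name to the label that interface uses for
# one global field; all entries are assumed to denote the same concept.
Cluster = Dict[str, str]

# Hypothetical fragment of one airline query interface.
delta = Node("Passengers", [Node("Adults"), Node("Children")])

# Hypothetical cluster for the "adult" field across three interfaces.
c_adult: Cluster = {"delta": "Adults", "aa": "Adult", "orbitz": "Adults (18-64)"}

leaf_labels = [leaf.label for leaf in delta.leaves()]
```

Here `leaf_labels` lists the fields under the `Passengers` group, and `c_adult` records how each source interface names the same field.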

9
An Example of Input
  • Three fragments of query interfaces represented
    hierarchically
  • The mapping between them, i.e. the set of clusters

10
Naming Algorithm - Sketch
  • Step 1: Consistent labeling of the fields
  • Fields in the same group: use the intersect-and-union
    strategy
  • Isolated fields: no consistency required
  • Root fields: treated as a group
  • Output: each group of fields (or field) has a set
    of candidate labels, possibly empty
  • Step 2: Consistent labeling of the internal nodes
  • For each internal node, starting from the lowest
    level to the root, apply a set of inference
    rules on labels
  • Output: each internal node has a set of candidate
    labels, possibly empty
  • Step 3: Enforce consistency within the entire
    integrated interface
  • Not covered
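
Step 2 processes internal nodes from the bottom of the tree up to the root. One way to obtain such an order is a post-order traversal, sketched below. The dictionary-based tree encoding and the node names are assumptions made for this example; post-order is used here only because it guarantees that a node is visited after all of its descendants.

```python
from typing import Dict, List

Tree = Dict[str, List[str]]  # node -> list of children; leaves have no entry


def internal_nodes_bottom_up(tree: Tree, root: str) -> List[str]:
    """List internal nodes so that each one appears after its descendants."""
    order: List[str] = []

    def visit(node: str) -> None:
        for child in tree.get(node, []):
            visit(child)
        if tree.get(node):          # only internal nodes need a label here
            order.append(node)

    visit(root)
    return order


# Hypothetical integrated interface: two groups under the root.
tree: Tree = {
    "root": ["when", "who"],
    "when": ["depart", "return"],
    "who": ["adult", "child"],
}
order = internal_nodes_bottom_up(tree, "root")
```

With this order, any inference rule applied to a node can rely on candidate labels already computed for the nodes below it.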

11
Preliminaries
  • Normalization [e.g., He03 et al., Madhavan01 et al.,
    Rahm01 et al.]
  • E.g., Adults (18-64) becomes adult
  • Semantic relationships among complex labels need
    to be established
  • E.g., synonymy, hypernymy/ hyponymy
  • Main issues
  • Thesauruses provide semantic relationships only
    for individual content words (e.g., WordNet
    [Fellbaum98])
  • How to show that Area of Study is a synonym of
    Field of Work in the Job domain?
  • How to show that Class is a hypernym of Class of
    Tickets in the Airline domain?
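
A minimal sketch of label normalization, assuming it consists of dropping parenthesized comments, lowercasing, removing stop words, and stripping plurals. The stop-word list and the naive `singular` helper are illustrative assumptions; the presentation does not specify the exact normalization steps.

```python
import re

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"of", "the", "a", "an", "for", "in", "to"}


def singular(word: str) -> str:
    """Very naive plural stripping, for illustration only."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def normalize(label: str) -> frozenset:
    """Map a raw label to its set of normalized content words."""
    text = re.sub(r"\([^)]*\)", " ", label)       # drop parenthesized comments
    words = re.findall(r"[a-z]+", text.lower())   # keep alphabetic tokens only
    return frozenset(singular(w) for w in words if w not in STOP_WORDS)


adults = normalize("Adults (18-64)")
ticket_class = normalize("Class of Tickets")
```

Under these assumptions `adults` contains the single word `adult`, matching the example on the slide.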

12
Preliminaries
  • Manipulation of labels
  • A label is seen as a set of normalized content
    words
  • E.g., {area, study} corresponds to Area of Study
  • E.g., {field, work} corresponds to Field of Work
  • Area of Study is a synonym of Field of Work because
  • Area is a synonym of Field (by WordNet)
  • Study is a synonym of Work (by WordNet)
  • Most descriptive vs. most general labels
  • E.g., Category, Job Category, Area of Work,
    Function
  • Category and Function are too general
  • Job Category and Area of Work are descriptive
    and avoid confusion
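
The word-by-word comparison of complex labels can be sketched as follows. The `SYNONYMS` table is a toy stand-in for a thesaurus lookup, and both the one-to-one pairing rule for synonymy and the "extra qualifying word" rule for hypernymy are assumptions made for this example, not definitions quoted from the presentation.

```python
from itertools import permutations

# Toy stand-in for a thesaurus such as WordNet (hypothetical data).
SYNONYMS = {frozenset({"area", "field"}), frozenset({"study", "work"})}


def word_syn(a: str, b: str) -> bool:
    """Two words match if they are identical or listed as synonyms."""
    return a == b or frozenset({a, b}) in SYNONYMS


def label_syn(x: set, y: set) -> bool:
    """Labels are synonyms if their words can be paired one-to-one."""
    if len(x) != len(y):
        return False
    xs = sorted(x)
    return any(all(word_syn(a, b) for a, b in zip(xs, perm))
               for perm in permutations(sorted(y)))


def label_hypernym(x: set, y: set) -> bool:
    """Treat x as a hypernym of y if every word of x matches a word of y
    and y has at least one additional (qualifying) word."""
    return len(x) < len(y) and all(any(word_syn(a, b) for b in y) for a in x)


syn_result = label_syn({"area", "study"}, {"field", "work"})
hyper_result = label_hypernym({"class"}, {"class", "ticket"})
```

Trying all pairings is exponential in the number of words, which is acceptable here only because labels contain very few content words.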

13
Consistent Labeling of Groups of Fields
  • Assumption
  • The labels given by a query interface for the
    fields in the same group are consistent
  • Organize the labels of a group in a relation-like
    form, called group relation
  • General idea to build a consistent solution
  • Combine multiple rows of consistent labels until
    a label is assigned to each field in the group
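
The idea of combining rows of a group relation can be sketched as below. Each row holds the labels one interface gives to the clusters of a group. The greedy order, the `consistent` check (identical label on a shared cluster), and the sample labels are assumptions for illustration; this is not presented as the authors' exact procedure.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]  # cluster name -> label (None if field absent)


def consistent(r1: Row, r2: Row) -> bool:
    """The rows share an identical label on at least one cluster."""
    return any(r1[c] is not None and r1[c] == r2.get(c) for c in r1)


def combine(rows: List[Row], clusters: List[str]) -> Row:
    """Greedily merge consistent rows until every cluster has a label."""
    # Start from the row that labels the most clusters.
    ordered = sorted(rows, key=lambda r: -sum(v is not None for v in r.values()))
    solution: Row = dict(ordered[0])
    for row in ordered[1:]:
        if all(solution.get(c) is not None for c in clusters):
            break
        if consistent(solution, row):
            for c in clusters:
                if solution.get(c) is None:
                    solution[c] = row.get(c)
    return solution


clusters = ["adult", "child", "senior"]
rows: List[Row] = [
    {"adult": "Adults", "child": "Children", "senior": None},
    {"adult": "Adults", "child": None, "senior": "Seniors"},
    {"adult": "Number of adults", "child": None, "senior": None},
]
solution = combine(rows, clusters)
```

The first two rows agree on `Adults`, so the second row's `Seniors` is borrowed to complete the first; the third row is never needed.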

14
Consistent Labeling of Groups of Fields
  • Levels of Consistency
  • String Level
  • Two distinct tuples belong to this level of
    consistency if they have the same label for a
    cluster in the group relation
  • Equality Level
  • Two distinct tuples belong to this level of
    consistency if they have equal labels for a
    cluster in the group relation
  • Synonymy Level
  • Two distinct tuples belong to this level of
    consistency if they have synonym labels for a
    cluster in the group relation
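
The slide does not spell out how the three levels differ, so the sketch below assumes one plausible reading: string level means identical label strings, equality level means identical sets of content words, and synonymy level means the content words match up to synonyms. The stop-word list and the `SYNONYMS` entry are hypothetical.

```python
from enum import Enum
from typing import Optional

STOP_WORDS = {"of", "the", "a", "an"}
SYNONYMS = {frozenset({"depart", "leave"})}   # hypothetical thesaurus entry


class Level(Enum):
    STRING = "string"
    EQUALITY = "equality"
    SYNONYMY = "synonymy"


def words(label: str) -> frozenset:
    """Content words of a label (lowercased, stop words removed)."""
    return frozenset(w for w in label.lower().split() if w not in STOP_WORDS)


def syn(a: str, b: str) -> bool:
    return a == b or frozenset({a, b}) in SYNONYMS


def level(l1: str, l2: str) -> Optional[Level]:
    """Return the strictest level at which two labels of one cluster agree."""
    if l1 == l2:
        return Level.STRING
    w1, w2 = words(l1), words(l2)
    if w1 == w2:
        return Level.EQUALITY
    if (len(w1) == len(w2)
            and all(any(syn(a, b) for b in w2) for a in w1)
            and all(any(syn(a, b) for a in w1) for b in w2)):
        return Level.SYNONYMY
    return None
```

For example, under these assumptions `level("Type of Job", "Job Type")` falls on the equality level, while `level("Depart date", "Leave date")` falls on the synonymy level.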

15
Consistent Labeling of Internal Nodes
  • The problem
  • Given an internal node in the integrated
    interface, determine a label that is semantically
    suitable for it, i.e., its semantics are rich enough
    to cover the semantics of all its descendant leaf
    nodes
  • An example
  • A fragment of the integrated interface of the Real
    Estate domain

16
Consistent Labeling of Internal Nodes
  • In assigning labels to internal nodes we mainly
    exploit two types of knowledge
  • The semantic relationship among the labels of the
    internal nodes in the individual schema trees
  • The relationship between internal nodes of source
    schema trees with overlapping sets of descendant
    leaves
  • The two types of knowledge are employed to derive
    a set of logical inference rules among the
    textual labels
  • Some of them will be exemplified next

17
Consistent Labeling of Internal Nodes
  • First logical inference
  • Informally, consider two internal nodes v1 and v2
    of two distinct source schema trees with the
    property that
  • v1's set of descendant leaves is a subset of
    v2's set of descendant leaf nodes,
  • and v1's label is a hypernym of v2's label
  • Then the labels of the two nodes are semantically
    equivalent within the given domain of discourse
  • An example
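
The first inference can be written as a simple predicate. In this sketch a node is reduced to its label (a set of content words) and the clusters of its descendant leaves; `InternalNode`, the subset-based `is_hypernym` test, and the sample nodes are assumptions for illustration.

```python
from typing import FrozenSet, NamedTuple


class InternalNode(NamedTuple):
    label: FrozenSet[str]    # normalized content words of the label
    leaves: FrozenSet[str]   # clusters of the descendant leaves


def is_hypernym(general: FrozenSet[str], specific: FrozenSet[str]) -> bool:
    """Assumption: a label with a proper subset of the words is more general,
    as with Class versus Class of Tickets."""
    return general < specific


def equivalent_by_first_rule(v1: InternalNode, v2: InternalNode) -> bool:
    """v1 covers no more leaves than v2, yet carries the more general label."""
    return v1.leaves <= v2.leaves and is_hypernym(v1.label, v2.label)


# Hypothetical nodes from two source interfaces.
v1 = InternalNode(frozenset({"location"}), frozenset({"city", "state"}))
v2 = InternalNode(frozenset({"property", "location"}),
                  frozenset({"city", "state", "zip"}))
same_meaning = equivalent_by_first_rule(v1, v2)
```

When the predicate holds, both labels can be treated as candidates for the same integrated node within the domain.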

18
Consistent Labeling of Internal Nodes
  • Second logical inference (the idea)
  • The same label is assigned to internal nodes in
    multiple source query interfaces and the
    descendant leaves of each such internal node are
    among those of the internal node in the
    integrated interface for which a label is sought.
  • An example
  • Fragment integrated query interface
  • Within source query interfaces
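
A sketch of the second inference under stated assumptions: a label qualifies when at least two source interfaces assign it to internal nodes whose leaves all lie under the target node of the integrated interface. The tuple encoding, the threshold of two interfaces, and the sample data are hypothetical; the function also reports which target leaves each label accounts for.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

SourceNode = Tuple[str, str, FrozenSet[str]]  # (interface, label, leaf clusters)


def shared_label_candidates(
    target_leaves: FrozenSet[str], source_nodes: List[SourceNode]
) -> Dict[str, FrozenSet[str]]:
    """Map each label shared by two or more interfaces to the leaves it covers."""
    by_label = defaultdict(list)
    for interface, label, leaves in source_nodes:
        if leaves <= target_leaves:           # node lies within the target
            by_label[label].append((interface, leaves))
    result: Dict[str, FrozenSet[str]] = {}
    for label, hits in by_label.items():
        if len({interface for interface, _ in hits}) >= 2:
            result[label] = frozenset().union(*(lv for _, lv in hits))
    return result


target = frozenset({"adult", "child", "senior", "infant"})
sources: List[SourceNode] = [
    ("delta", "Passengers", frozenset({"adult", "child"})),
    ("aa", "Passengers", frozenset({"adult", "senior", "infant"})),
    ("orbitz", "Travelers", frozenset({"adult", "child"})),
    ("nwa", "Passengers", frozenset({"adult", "pet"})),   # not within target
]
candidates = shared_label_candidates(target, sources)
```

In this example only `Passengers` is shared by two qualifying interfaces, and together those nodes account for every leaf of the target.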

19
Consistent Labeling of Internal Nodes
  • Third logical inference (hypernymy scenario)
  • Informally, consider two internal nodes v1 and v2
    of two distinct source schema trees with the
    property that
  • v1's label is a hypernym of v2's label
  • Then v1's label semantically covers the union of
    the descendant nodes of the two nodes.
  • An example
  • Fragment integrated query interface
  • Within source query interfaces
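
The hypernymy scenario can be sketched as a function that, given a hypernymy test, pairs the more general label with the union of the two leaf sets. The `HYPERNYMS` fact, the node encoding, and the sample labels are hypothetical.

```python
from typing import Callable, FrozenSet, Optional, Tuple

LabeledNode = Tuple[str, FrozenSet[str]]  # (label, clusters of descendant leaves)

# Hypothetical hypernymy facts: (more general label, more specific label).
HYPERNYMS = {("Location", "Neighborhood")}


def is_hypernym(general: str, specific: str) -> bool:
    return (general, specific) in HYPERNYMS


def third_rule(
    v1: LabeledNode, v2: LabeledNode, hyper: Callable[[str, str], bool]
) -> Optional[LabeledNode]:
    """If v1's label is a hypernym of v2's, it covers the leaves of both."""
    (l1, leaves1), (l2, leaves2) = v1, v2
    if hyper(l1, l2):
        return (l1, leaves1 | leaves2)
    return None


v1: LabeledNode = ("Location", frozenset({"city", "state"}))
v2: LabeledNode = ("Neighborhood", frozenset({"zip", "area"}))
covered = third_rule(v1, v2, is_hypernym)
```

Unlike the first inference, no subset relationship between the leaf sets is required here.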

20
Where can the instances help?
  • Discard labels as values
  • The problem is known as schema element name as
    value [Xu03, Dhamankar04]
  • Example: in the Book domain, labels like Hardcover
    or Paperback are data instances of fields with
    labels like Format or Binding
  • Reconcile most general vs. most descriptive
  • The idea is to bound the meaning of the most
    general label to a more descriptive one
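
Discarding labels that are really values can be sketched as a filter against the known instances of the fields. The function name, the case-insensitive comparison, and the sample data are assumptions for illustration.

```python
from typing import Dict, List


def discard_value_labels(
    candidates: List[str], instances_by_field: Dict[str, List[str]]
) -> List[str]:
    """Drop candidate labels that occur as a data instance of some field."""
    values = {v.lower() for vals in instances_by_field.values() for v in vals}
    return [c for c in candidates if c.lower() not in values]


# Hypothetical Book-domain data.
candidates = ["Format", "Hardcover", "Binding", "Paperback"]
instances = {"Format": ["Hardcover", "Paperback", "Audio CD"]}
kept = discard_value_labels(candidates, instances)
```

Here `Hardcover` and `Paperback` are removed because they appear among the instances of `Format`.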

21
Experiment
  • Setup
  • Seven real-world domains
  • Also used in [Wu04 et al., Madhavan05 et al.,
    Dragut06 et al.]

22
Experiment
  • Human Acceptance
  • Questions asked
  • Do you have any difficulty in filling in an entry
    for each field?
  • If you do, identify the fields you had difficulty
    filling in.
  • Are the fields understandable on the source
    interfaces?
  • 11 survey respondents reported the following

23
Example Integrated Interfaces
  • Airfare domain integrated interface

Four people found the group confusing
24
Example Integrated Interfaces
  • Auto domain integrated interface
  • No surveyed person has identified any problem
    for this integrated query interface

25
End
  • Please visit the project web site
  • http://www.cs.uic.edu/edragut/QIProject.html

Thank you for your time and patience!