Querying Web Data: The WebQA Approach, Sunny K.S. Lam and M. Tamer Ozsu

1
Querying Web Data: The WebQA Approach
Sunny K.S. Lam and M. Tamer Ozsu
  • CS6411
  • Bin Liu
  • April, 2003
  • bliu_at_ece.gatech.edu

2
Abstract
  • Building a system: WebQA
  • Provides a declarative query-based approach to
    retrieve Web data using question-answer
    technology
  • Steps
  • First use meta-search techniques to gather
    candidate responses
  • Then use information extraction technologies to
    find out answers to the specific question

3
Difficulties in searching Web
  • 1. Wide Distribution of data
  • 2. High percentage of volatile data
  • 3. Large Volume
  • 4. Unstructuredness
  • 5. Redundancy
  • 6. Inconsistency of redundant copies
  • 7. Heterogeneity of data representation
  • 8. Data dynamism

4
Challenges against DBMS
  • No schema describing the logical organization of
    the data, so the query model is not clear.
  • Questionable to search the entire Web
  • Lack of good Web Query Language
  • Existing approaches devised SQL-like languages
    based on graph models of the data
  • The models cannot be generated for most Web
    data

5
Contributions of WebQA
  • Query Web data without depending on a schema
  • Query the entire Web, but constrained by the
    capabilities of search engines and metasearchers
  • Does not depend on a schema to exploit existing
    categorization of Web data (e.g., Yahoo
    categories)
  • Return actual answers, not URLs (needs complex
    processing such as join and aggregate)
  • Easily scale with new data sources
  • Tolerates fuzziness in user queries and in the results produced

6
Assumptions
  • Does not consider Continuous Queries
  • Persistent queries that allow systems to return
    answers when the answers become available
  • Answers user queries directly; does not analyze or
    predict (as Web mining does)
  • Focuses only on factual queries, not procedural
    queries (e.g., How do I make pancakes?)

7
WebQA System Architecture
8
Procedures
  • An incoming query is parsed
  • The best set of sources that can provide answers
    to the query is identified
  • Query is decomposed into multiple subqueries that
    are submitted to each source
  • Answer collector interacts with these sources and
    obtains results from each of them
  • Complex Query Evaluator performs more complex
    operations (e.g., join, aggregate) on the retrieved
    data
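
The decomposition, collection, and evaluation steps above can be sketched roughly as follows. This is only an illustration: `run_query`, `aggregate_counts`, the idea of sending the same subquery to every source, and the counting aggregate are all assumptions made for the example, not the WebQA implementation.

```python
def run_query(query, sources):
    """Fan a query out to several sources and collect their results.

    `sources` maps a source name to a callable that takes a subquery
    string and returns a list of answers (hypothetical interface).
    """
    # Decomposition (assumed): every source receives the same subquery.
    subqueries = {name: query for name in sources}
    # Answer collector: obtain results from each source.
    return {name: sources[name](sub) for name, sub in subqueries.items()}


def aggregate_counts(collected):
    """Toy 'complex operation': count how many sources returned each answer."""
    counts = {}
    for answers in collected.values():
        for answer in set(answers):
            counts[answer] = counts.get(answer, 0) + 1
    return counts
```

In a real system the decomposition would presumably tailor each subquery to the capabilities of its source; the sketch keeps them identical for brevity.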

9
Client/Server Communication Architecture
  • QA server generates a thread for each query
    request
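
A thread-per-request server of this kind might look like the minimal sketch below. The names `handle_query` and `serve` and the placeholder answer string are hypothetical; the slides do not describe how the QA server is actually implemented.

```python
import threading


def handle_query(query, results, lock):
    """Process one query; the body is a placeholder for the real pipeline."""
    answer = "answer to: " + query
    # Protect the shared result table while this thread writes to it.
    with lock:
        results[query] = answer


def serve(queries):
    """Start one thread per incoming query and wait for all of them."""
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=handle_query, args=(q, results, lock))
        for q in queries
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```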

10
WebQA Prototype System Architecture
11
General Steps
  • Query Parser (QP) converts an input natural-language
    question into a valid WebQAL expression.
  • WebQA engine checks whether the WebQAL expression is
    in the cache.
  • If the query is in the cache, the answer is retrieved
    from the cache and returned to the user
  • Otherwise, processing continues
  • Summary Retriever (SR) takes the WebQAL
    expression and produces a list of ranked records
    (ranking is done globally).
  • A record consists of
  • a text passage from a source,
  • the data source name where the passage comes
    from, and
  • a local rank (rank within the source).
  • Answer Extractor (AE) takes a list of ranked
    records and a WebQAL expression, and extracts a
    list of answers as output.
  • Engine returns the list of answers to the specified
    user interface.
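
The record structure and the cache check described above could be sketched as follows. The class and field names (`Record`, `WebQAEngine`, `passage`, `source`, `local_rank`) are assumptions chosen to mirror the slide, and the Summary Retriever and Answer Extractor are passed in as plain callables rather than modeled in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    passage: str      # text passage from a source
    source: str       # name of the data source the passage comes from
    local_rank: int   # rank of the passage within that source


class WebQAEngine:
    def __init__(self, summary_retriever, answer_extractor):
        self._retrieve = summary_retriever   # WebQAL string -> list of Record
        self._extract = answer_extractor     # (records, WebQAL string) -> answers
        self._cache = {}

    def answer(self, webqal):
        # Return the cached answers if this expression was seen before.
        if webqal in self._cache:
            return self._cache[webqal]
        # Otherwise run retrieval and extraction, then remember the result.
        records = self._retrieve(webqal)
        answers = self._extract(records, webqal)
        self._cache[webqal] = answers
        return answers
```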

12
Query Parser
13
Query Parser
  • If the query is in natural language (NL)
  • QP analyzes and categorizes it, then translates it into
    WebQAL
  • If the query is in WebQAL
  • QP analyzes it and verifies its correctness

14
Categorize queries
  • Seven categories
  • Name
  • Place
  • Time
  • Quantity
  • Abbreviation
  • Weather
  • Other
  • Assumption: one question belongs to exactly one
    category
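
As a rough illustration of assigning exactly one of the seven categories, the sketch below uses a small set of keyword rules. The rules themselves, their order, and the function name `categorize` are invented for the example; the slides do not list the rules WebQA uses.

```python
import re
from enum import Enum


class Category(Enum):
    NAME = "name"
    PLACE = "place"
    TIME = "time"
    QUANTITY = "quantity"
    ABBREVIATION = "abbreviation"
    WEATHER = "weather"
    OTHER = "other"


# Hypothetical keyword rules, checked in order; the first match wins,
# so every question ends up in exactly one category.
RULES = [
    (("weather", "temperature", "forecast"), Category.WEATHER),
    (("stand for", "abbreviation"), Category.ABBREVIATION),
    (("who",), Category.NAME),
    (("where", "which country", "which city"), Category.PLACE),
    (("when", "what year"), Category.TIME),
    (("how many", "how much"), Category.QUANTITY),
]


def categorize(question):
    text = question.lower()
    for keywords, category in RULES:
        # Match whole words or phrases only, not substrings of longer words.
        if any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords):
            return category
    return Category.OTHER
```

Falling back to `Category.OTHER` when no rule fires is one simple way to honor the one-category assumption.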

15
Categorize syntax
  • Categorization is based on keyword rules
  • General syntax of WebQAL expressions
  • <Category> -output <Output Option> -keywords <Keyword List>
  • Stopword elimination (remove a, the)
  • Verb-to-noun conversion
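
A minimal sketch of building and parsing an expression in this syntax is shown below. The stopword set, the verb-to-noun table, the function names, and the assumption that the output option is a single token are all illustrative choices, not part of WebQAL as documented here.

```python
import re

STOPWORDS = {"a", "an", "the", "in", "of"}   # hypothetical stopword list
VERB_TO_NOUN = {"produced": "producer"}      # hypothetical conversion table


def build_webqal(category, output_option, words):
    """Drop stopwords, convert known verbs to nouns, and format the expression."""
    keywords = [
        VERB_TO_NOUN.get(w.lower(), w.lower())
        for w in words
        if w.lower() not in STOPWORDS
    ]
    return "{} -output {} -keywords {}".format(
        category, output_option, " ".join(keywords)
    )


# Assumes the category and the output option are each a single token.
_PATTERN = re.compile(r"^(\S+)\s+-output\s+(\S+)\s+-keywords\s+(.+)$")


def parse_webqal(expression):
    """Split an expression into (category, output option, keyword list)."""
    match = _PATTERN.match(expression.strip())
    if match is None:
        raise ValueError("not a valid WebQAL expression: %r" % expression)
    category, output_option, keywords = match.groups()
    return category, output_option, keywords.split()
```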

16
Categorization Example
  • Which country produced the most computers in the
    world?
  • Produced most computers
  • Producer most computers
  • Place keywords producer most computers
  • place -output country -keywords most population
    world.

17
Retriever Module
18
Summary Retriever Module
  • Certain search engines and data sources are
    better at answering certain types of questions
    than others
  • Source Ranker uses such knowledge and category of
    query to produce a ranked list of sources to be
    used in search.
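
One way to picture the Source Ranker is as a lookup of per-category quality scores followed by a sort. The source names, the scores, and the function `rank_sources` below are placeholders made up for the sketch; the slides do not say how this knowledge is represented.

```python
# Hypothetical knowledge: how well each source answers each query category.
SOURCE_SCORES = {
    "source_a": {"place": 0.9, "time": 0.4},
    "source_b": {"place": 0.6, "weather": 0.8},
    "source_c": {"time": 0.7},
}


def rank_sources(category, scores=SOURCE_SCORES, default=0.0):
    """Return source names ordered from best to worst for the given category."""
    return sorted(
        scores,
        key=lambda name: scores[name].get(category, default),
        reverse=True,
    )
```

Sources with no score for a category fall back to `default`, so they sort last instead of being dropped.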

19
Answer Extractor
  • Answer Extractor (AE) analyzes each record
    produced by the Record Consolidator/Ranker and
    extracts answers from these records.
  • The extraction algorithm is based on word
    frequency.
  • The records are analyzed and a candidate list of
    answers is identified.
  • Each candidate is verified by checking it against
    the category and output parameters that are
    captured in the WebQAL expression.
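
A word-frequency extractor with a verification step might be sketched as follows. The slides only state that extraction is based on word frequency, so the tokenization, the stopword list, the `is_valid` predicate standing in for the category/output check, and the `top_n` cutoff are all assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "is", "in", "of", "has"}  # hypothetical list


def extract_answers(passages, keywords, is_valid, top_n=3):
    """Rank words by frequency across passages and keep the verified ones.

    `is_valid` is a caller-supplied predicate that stands in for checking a
    candidate against the category and output option of the query.
    """
    # Query keywords and stopwords cannot be answers themselves.
    skip = STOPWORDS | {k.lower() for k in keywords}
    counts = Counter()
    for passage in passages:
        for word in re.findall(r"[A-Za-z]+", passage):
            if word.lower() not in skip:
                counts[word] += 1
    # Most frequent candidates first, then verification.
    candidates = [word for word, _ in counts.most_common() if is_valid(word)]
    return candidates[:top_n]
```

For a query whose output option is a country, `is_valid` could for instance test membership in a list of country names.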

20
Questions ?