Title: Querying Web Data


1
Querying Web Data
The WebQA Approach
  • Authors: Sunny K.S. Lam and M. Tamer Özsu
  • CSI5311 Presentation
  • Dongmei Jiang and Zhiping Duan

2
Agenda
  • Properties of Web Data
  • Approaches of Web Data Searching
  • WebQA Introduction
  • WebQA System Architecture
  • WebQA Implementation
  • WebQA System Evaluation
  • Conclusion

3
Web Data Searching
  • Is a search engine enough?
  • Is Web data querying necessary?

4
Characteristics of Web Data
  • Properties of Web data
    • Wide distribution, large volume
    • High degree of volatility
    • Lack of structure, redundancy, inconsistency among redundant copies
    • Representation heterogeneity
    • Dynamism
  • Difficulties of querying Web data from a DB perspective
    • No schema
    • Lack of scalability when searching the whole Web
    • No exact Web query language

5
Web Data Searching Approaches
  • Information Retrieval approach
    • Search engines and metasearchers
  • Database-oriented Web querying
    • Information integration
    • Semistructured data querying
    • Special Web query languages
  • Question-answer

6
Question-Answer Approach
  • Basic principle
    • Web pages that could contain the answer to the user query are retrieved.
    • The answer is extracted from these pages.
    • Uses NLP and Information Retrieval (IR) technologies.
    • The answer is extracted by Information Extraction (IE) techniques.
  • Example systems
    • Mulder [Kwok et al., 2001]
    • WebQA [Lam and Özsu, 2002]

7
WebQA
  • Question-answer approach
    • Accepts short factual queries
    • Returns exact answers
  • Aims to
    • Accept fuzziness in user queries
    • Return actual answers, not URLs
    • Query the entire Web and scale easily with new data sources

8
WebQA System Architecture

[Figure: WebQA system architecture. Components: User, Query Parser, Answer Formatter, Semantic Cache Manager, Cache, Resource Locator/Decomposer, Complex Query Evaluation, Answer Collector. External sources: Search Engines, Web Data Source, Web Site.]
Reference 1

9
WebQA Prototype Architecture

[Figure: WebQA prototype architecture. Components: User, Query Parser (QP), Summary Retriever (SR), Answer Extractor (AE). Labels on the connections: Valid WebQAL Query, Keywords / Category, Keywords / Description, List of Ranked Records. External sources: Search Engines, Web Data Source, Web Site.]
Reference 1

10
Query Parser
[Figure: Query Parser. Input: user query (NL question or WebQAL). Components: Categorizer, WebQAL Generator, WebQAL Checker. Labels on the connections: NL question, category. Output: Valid WebQAL.]
  • Categories: <Name, Place, Time, Quantity, Abbreviation, Weather, Other>
  • Query example: Which country produced the most computers in the world?
  • WebQAL syntax: <category> -output <output option> -keywords <keyword list>
  • WebQAL: place -output country -keywords producer most computers
Reference 1
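The WebQAL syntax on this slide can be checked mechanically. The following is a minimal sketch of such a check, not the WebQA implementation: the names `WebQALQuery` and `parse_webqal` are hypothetical, and the only rules enforced are the ones visible in the syntax line (a known category, then `-output`, then `-keywords`).

```python
from dataclasses import dataclass

# Categories listed on the slide (lower-cased for comparison).
CATEGORIES = {"name", "place", "time", "quantity",
              "abbreviation", "weather", "other"}


@dataclass
class WebQALQuery:  # hypothetical container, not from the source
    category: str
    output: str
    keywords: list


def parse_webqal(text):
    """Parse '<category> -output <output option> -keywords <keyword list>'."""
    tokens = text.split()
    if not tokens or tokens[0].lower() not in CATEGORIES:
        raise ValueError("query must start with a known category")
    if "-output" not in tokens or "-keywords" not in tokens:
        raise ValueError("query needs both -output and -keywords")
    out_at = tokens.index("-output")
    key_at = tokens.index("-keywords")
    # -output must directly follow the category and carry a value.
    if out_at != 1 or key_at <= out_at + 1:
        raise ValueError("expected: <category> -output <option> -keywords <list>")
    keywords = tokens[key_at + 1:]
    if not keywords:
        raise ValueError("keyword list is empty")
    output = " ".join(tokens[out_at + 1:key_at])
    return WebQALQuery(tokens[0].lower(), output, keywords)


query = parse_webqal("place -output country -keywords producer most computers")
print(query)
```

A query that does not start with one of the seven categories, or that lacks either option, raises `ValueError` in this sketch.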
11
Summary Retriever
  • The Source Ranker identifies the better data sources for answering certain types of questions.
  • Records are ranked by source ranking first and local ranking second.

[Figure: Summary Retriever. Input: WebQAL. Components: Keyword Generator, Source Ranker, Record Retrievers, Wrapper 1, Wrapper 2, Record Consolidator/Ranker. External sources: Web Site, Remote Database. Output: List of Ranked Records.]
Reference 1
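The two-level ordering (source ranking first, local ranking second) amounts to a sort on a composite key. The sketch below assumes that both rankings are integers with 1 as the best; the `Record` type, the source names, and the rank values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:  # hypothetical record type
    source: str
    local_rank: int  # rank within its own source, 1 = best
    summary: str


def consolidate(records, source_rank):
    """Order records by source rank first, then by local rank."""
    return sorted(records, key=lambda r: (source_rank[r.source], r.local_rank))


# Assumed source ranking for one question category (1 = best).
source_rank = {"factbook": 1, "search_engine": 2}
records = [
    Record("search_engine", 1, "snippet A"),
    Record("factbook", 2, "entry B"),
    Record("factbook", 1, "entry C"),
    Record("search_engine", 2, "snippet D"),
]
ranked = consolidate(records, source_rank)
print([r.summary for r in ranked])
# → ['entry C', 'entry B', 'snippet A', 'snippet D']
```

Every record from the better-ranked source precedes every record from the other source, regardless of local rank.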
12
Answer Extractor
  • Candidates are selected based on
    • the frequency of occurrence of the answer and the score of the rule that adds it to the candidate list.
  • The higher the score, the more likely the candidate is the answer to the user's query.
  • The shorter the answer, the higher the score.

Reference 1
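The slide names three signals (frequency of occurrence, rule score, answer length) but not how they are combined. The sketch below is one possible combination, chosen only for illustration: frequency times the best rule score, divided by the number of words. The function name, the formula, and the sample rule scores are all assumptions.

```python
from collections import defaultdict


def rank_candidates(matches):
    """Rank candidates given (candidate_text, rule_score) pairs.

    Returns candidate strings sorted from most to least likely.
    """
    freq = defaultdict(int)
    best_rule = defaultdict(float)
    for text, rule_score in matches:
        key = text.strip().lower()  # treat case variants as one candidate
        freq[key] += 1
        best_rule[key] = max(best_rule[key], rule_score)

    def score(key):
        words = len(key.split())  # shorter answers score higher
        return freq[key] * best_rule[key] / words  # assumed formula

    return sorted(freq, key=score, reverse=True)


matches = [
    ("United States", 0.9),
    ("united states", 0.6),
    ("Japan", 0.6),
    ("the United States of America", 0.9),
]
ranked = rank_candidates(matches)
print(ranked)
```

With these made-up inputs the repeated two-word candidate ranks first and the five-word candidate last.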
13
WebQA Implementation Architecture
[Figure: WebQA implementation architecture. Components: Client 1, Client 2, ..., Client N; Web Server (JSPs, HTMLs); QA Server with QA Server Threads; QA Engine. Labels on the connections: Question/Answer (HTTP), Question/answer (string), Q/A.]
Reference 3
14
System Evaluation
Evaluation uses TREC-9 and measures two aspects: accuracy and efficiency.
Reference 3
15
Conclusion
  • WebQA follows the question-answer approach.
    • Query input, exact answer output
    • Uses NLP, IR and IE technologies
  • Independent of data schema.
  • Queries multiple Web sources
    • Search engines
    • Data sources (CIA World Factbook)
    • Web sites

16
Future work
  • Develop a full-fledged Web query system
    • Execution algorithms for more complex queries
    • Common aggregation functions on retrieved answers
  • Consider other query types
    • Continuous queries
      • Ex.: Notify me whenever Ottawa's temperature drops below zero.
    • Procedural queries
      • Ex.: How do I make pancakes?
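A continuous query differs from a one-shot question in that it keeps running and fires whenever its condition becomes true. The sketch below illustrates that idea for the temperature example only; it is not part of WebQA, and the function name, the list of readings, and the choice to fire once per drop (rather than on every reading below zero) are assumptions.

```python
def crossings_below(readings, threshold=0.0):
    """Yield (index, value) each time the readings drop below threshold."""
    was_below = False
    for i, value in enumerate(readings):
        is_below = value < threshold
        if is_below and not was_below:
            yield i, value  # fire only on the transition, not while it stays below
        was_below = is_below


# Made-up temperature readings in degrees Celsius.
temps = [3.0, 1.5, -0.5, -2.0, 0.5, -1.0]
alerts = list(crossings_below(temps))
print(alerts)
# → [(2, -0.5), (5, -1.0)]
```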

17
References
  1. S. Lam and M.T. Özsu. "Querying Web Data - The WebQA Approach." WISE 2002.
  2. D. Florescu, A. Levy and A. Mendelzon. "Database techniques for the World Wide Web: A survey." SIGMOD Record, 27(3):59-74, 1998.
  3. M.T. Özsu. "Web Data Management - Some Issues." Course slides.