Title: Querying Web Data
1 Querying Web Data
The WebQA Approach
- Authors: Sunny K.S. Lam and M. Tamer Özsu
- CSI5311 Presentation
- Dongmei Jiang and Zhiping Duan
2 Agenda
- Properties of Web Data
- Approaches to Web Data Searching
- WebQA Introduction
- WebQA System Architecture
- WebQA Implementation
- WebQA System Evaluation
- Conclusion
3 Web Data Searching
- Is a Search Engine Enough?
- Is Web Data Querying Necessary?
4 Characteristics of Web Data
- Properties of Web Data
- Wide distribution, large volume
- High percentage of volatility
- Unstructuredness, redundancy, inconsistency of redundant copies
- Representation heterogeneity
- Dynamism
- DB Perspective: Difficulties of Querying Web Data
- No schema
- Lack of scalability when searching the whole Web
- No exact web query language
5 Web Data Searching Approaches
- Information Retrieval Approach
- Search engines and metasearchers
- Database-Oriented Web Querying
- Information Integration
- Semistructured Data Querying
- Special Web Query Languages
- Question-Answer
6 Question-Answer Approach
- Basic principle
- Web pages that could contain the answer to the user query are retrieved.
- The answer is extracted from these pages.
- NLP and Information Retrieval (IR) technologies
- The answer is extracted by Information Extraction (IE) techniques.
- Example Systems
- Mulder [Kwok et al., 2001]
- WebQA [Lam and Özsu, 2002]
7 WebQA
- Question-answer approach
- Accepts short factual queries
- Returns the exact answers
- Aims to
- Accept fuzziness in user queries
- Return actual answers, not URLs
- Query the entire Web and scale easily with new data sources
8 WebQA System Architecture
[Figure: WebQA system architecture. Components shown: User, Query Parser, Answer Formatter, Semantic Cache Manager, Cache, Resource Locator/Decomposer, Complex Query Evaluation, Answer Collector. Data sources: Search Engine, Web Data Source, Web Site. Source: Reference 1]
9 WebQA Prototype Architecture
[Figure: WebQA prototype architecture. Components shown: User, Query Parser (QP), Summary Retriever (SR), Answer Extractor (AE). Data passed between components: valid WebQAL query, keywords and category, keywords/description, list of ranked records. Data sources: Search Engine, Web Data Source, Web Site. Source: Reference 1]
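10 Query Parser
[Figure: Query Parser. Input: a user query, either a natural-language (NL) question or a WebQAL query. Components shown: Categorizer, WebQAL Generator, WebQAL Checker. The Categorizer assigns an NL question one of the categories <Name, Place, Time, Quantity, Abbreviation, Weather, Other>; the NL question and its category go to the WebQAL Generator. Output: valid WebQAL. Source: Reference 1]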
Query example: Which country produced the most computers in the world?
WebQAL syntax: <category> -output <output option> -keywords <keyword list>
WebQAL query: place -output country -keywords producer most computers
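The WebQAL syntax above can be illustrated with a small parser. This is a minimal sketch in Python, not the WebQA implementation: the names `WebQALQuery` and `parse_webqal` are hypothetical, the category list is taken from the Query Parser slide, and treating `-output` as optional is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Categories listed on the Query Parser slide (lower-cased for matching).
CATEGORIES = {"name", "place", "time", "quantity", "abbreviation", "weather", "other"}


@dataclass
class WebQALQuery:
    """Hypothetical container for a parsed WebQAL query."""
    category: str
    output: str
    keywords: List[str]


def parse_webqal(text: str) -> WebQALQuery:
    """Parse '<category> -output <output option> -keywords <keyword list>'."""
    tokens = text.split()
    if not tokens:
        raise ValueError("empty query")
    category = tokens[0].lower()
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {tokens[0]}")

    # Collect the tokens that follow each '-option' flag.
    options: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for token in tokens[1:]:
        if token.startswith("-"):
            current = token[1:].lower()
            options[current] = []
        elif current is None:
            raise ValueError(f"unexpected token before any option: {token}")
        else:
            options[current].append(token)

    if not options.get("keywords"):
        raise ValueError("missing -keywords list")
    # Assumption: '-output' may be omitted, giving an empty output option.
    output = " ".join(options.get("output", []))
    return WebQALQuery(category, output, options["keywords"])
```

Calling `parse_webqal("place -output country -keywords producer most computers")` yields the category `place`, the output option `country`, and the three keywords as a list.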
11 Summary Retriever
- The Source Ranker identifies the data sources best suited to answering certain types of questions.
- Records are ranked by source ranking first and local ranking second.
[Figure: Summary Retriever. Input: WebQAL. Components shown: Keyword Generator, Source Ranker, Record Retrievers, Wrapper 1, Wrapper 2, Record Consolidator/Ranker. Data sources: Web Site, Remote Database. Output: list of ranked records. Source: Reference 1]
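The two-level ordering (source ranking first, local ranking second) can be sketched as a sort on a composite key. This is an illustrative sketch only: `Record`, `consolidate`, and the source names in the usage note are hypothetical, and placing records from unranked sources last is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    """Hypothetical record returned by one data source."""
    source: str      # name of the source that returned the record
    local_rank: int  # position within that source's own result list (1 = best)
    text: str


def consolidate(records: List[Record], source_rank: Dict[str, int]) -> List[Record]:
    """Order records by source rank first, then by local rank within a source."""
    # Assumption: a source without a rank sorts after every ranked source.
    worst = max(source_rank.values(), default=0) + 1
    return sorted(
        records,
        key=lambda r: (source_rank.get(r.source, worst), r.local_rank),
    )
```

For example, with `source_rank = {"factbook": 1, "search_engine": 2}`, every `factbook` record precedes every `search_engine` record, and records from the same source keep their local order.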
12 Answer Extractor
- Candidate answers are scored based on
- the frequency of occurrence of the answer and the score of the rule that adds it to the candidate list.
- The higher the score, the more likely the candidate is the answer to the user's query.
- The shorter the answer, the higher the score.
Reference 1
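The slide names three ingredients of a candidate's score: frequency of occurrence, the score of the extracting rule, and answer length. The exact formula is not given here, so the sketch below assumes one possible combination: rule scores are summed over all occurrences of a candidate, then divided by the candidate's word count. The function name `rank_candidates` and the formula are assumptions, not the scoring used by WebQA.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_candidates(matches: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Rank candidate answers from (candidate text, rule score) pairs.

    Each pair is one occurrence of a candidate, together with the score of
    the rule that added it to the candidate list.
    """
    totals: Dict[str, float] = defaultdict(float)
    for text, rule_score in matches:
        key = text.strip().lower()
        if not key:
            continue  # ignore empty candidates
        # Summing per occurrence makes frequent candidates score higher.
        totals[key] += rule_score

    # Assumed length penalty: divide by word count so shorter answers win.
    scored = [(cand, total / len(cand.split())) for cand, total in totals.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With made-up input such as `[("China", 1.0), ("china", 1.0), ("the United States", 2.0)]`, the one-word candidate seen twice outranks the three-word candidate seen once, even though both accumulate the same total rule score.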
13 WebQA Implementation Architecture
[Figure: WebQA implementation architecture. Clients 1 to N exchange questions and answers over HTTP with a Web Server (JSPs, HTML pages). The Web Server passes each question and answer as a string to the QA Server, which runs QA Server Threads on top of the QA Engine. Source: Reference 3]
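The QA Server with its per-request threads can be sketched as a threaded TCP server that reads a question string and writes back an answer string. This is a Python stand-in for illustration only (the slide mentions JSPs on the Web Server side, and nothing here reflects the actual WebQA code). The line-based protocol, the names `QAHandler`, `QAServer`, `ask`, and the placeholder `answer_question` are all assumptions.

```python
import socket
import socketserver
import threading


def answer_question(question: str) -> str:
    """Placeholder for the QA Engine; a real engine would parse the query,
    retrieve records and extract an answer."""
    return f"(no answer found for: {question})"


class QAHandler(socketserver.StreamRequestHandler):
    """Handles one connection: one question line in, one answer line out."""

    def handle(self) -> None:
        question = self.rfile.readline().decode("utf-8").strip()
        answer = answer_question(question)
        self.wfile.write((answer + "\n").encode("utf-8"))


class QAServer(socketserver.ThreadingTCPServer):
    """Spawns a new thread per connection, like the QA Server Threads."""
    allow_reuse_address = True
    daemon_threads = True


def ask(host: str, port: int, question: str) -> str:
    """Client side, playing the role of the Web Server forwarding a question."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall((question + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return reader.readline().strip()


if __name__ == "__main__":
    # Port 0 asks the operating system for any free port.
    with QAServer(("127.0.0.1", 0), QAHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        host, port = server.server_address
        print(ask(host, port, "How do I make pancakes?"))
        server.shutdown()
```

Because each connection is handled in its own thread, a slow question from one client does not block the others; the QA Engine call would need to be thread-safe for this to hold.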
14 System Evaluation
The evaluation uses TREC-9 and measures two aspects: accuracy and efficiency.
Reference 3
15 Conclusion
- WebQA follows the question-answer approach.
- Query input, exact answer
- NLP, IR and IE technologies
- Data schema-independent.
- Query multiple Web sources
- Search engines
- Data sources (CIA World Factbook)
- Web Sites.
16 Future work
- To develop a full-fledged Web query system
- Execution algorithms for more complex queries
- Common aggregation functions when retrieving answers
- To think about other query types
- Continuous query
- Ex: notify me whenever Ottawa's temperature drops below zero
- Procedural query
- Ex: How do I make pancakes?
17 References
- S. Lam and M.T. Özsu. "Querying Web Data - The WebQA Approach." WISE 2002.
- D. Florescu, A. Levy, and A. Mendelzon. "Database techniques for the World Wide Web: A survey." SIGMOD Record, 27(3):59-74, 1998.
- M.T. Özsu. "Web Data Management - Some Issues." Course slides.