FACT: A Learning Based Web Query Processing System

Hongjun Lu, Yanlei Diao (Hong Kong U. of Science & Technology); Songting Chen, Zengping Tian (Fudan University)

1
FACT: A Learning Based Web Query Processing System
  • Hongjun Lu, Yanlei Diao
  • Hong Kong U. of Science & Technology
  • Songting Chen, Zengping Tian
  • Fudan University

2
Outline
  • Introduction
  • Learning Based Web Query Processing
  • FACT: A Prototype System
  • Preliminary System Evaluation
  • Conclusions

3
How Do We Query the Web?
  • Use a search engine
  • Form query keywords
  • An example: find room rates of hotels in Hong
    Kong
      • Search engine used: www.yahoo.com
      • Keywords: Hong Kong hotel

4
[Screenshot: search results listing Hotel 1, Hotel 2]
Look at the Number!
5
Query the Web -- Current Situation
  • Search engines return a long list of URLs. The user
    has to browse the web pages to find the
    information.
  • The required information is often not on the
    returned page; navigation through hyperlinks is
    often required (and those links may or may not be
    obvious).
  • The target information comes in different forms
    (paragraphs, lists, tables).
  • A lot of web pages have to be browsed.

Are we happy with this?
6
Efforts to Improve the Situation
  • Search engines
      • better indexes, improved precision/recall,
        metasearch engines, better presentation of
        results, ...
  • IR techniques applied to the Web
      • document clustering/indexing, better models,
        similarity functions, document ranking, ...
  • Intelligent agents
      • user profiling, hyperlink recommendation, ...
  • Database approach
      • wrappers, query languages, ...

7
Our Dream
  • Querying the Web as easily as querying a relational
    database
  • An SQL query returns a table of hotel prices:
        SELECT room_rates
        FROM web.hotel
        WHERE city = 'Hong Kong'
  • May remain a dream for a while :-(

8
A Practical Goal
  • Use keywords to express query requirements
      • simple, no need to know the schema of the data
      • but inaccurate
  • Relieve users from tedious browsing as much as
    possible
      • Not URLs, not Web sites, not even Web pages
  • Present query results to users as accurately and
    concisely as possible
      • Tables, lists, paragraphs containing the
        information the user requires

9
Query Results -- Queried Segments
  • Return query results as accurately and concisely as
    possible.
  • Basic idea
      • Break a Web page into segments: a row in a
        table, a table, an item in a list, a list, a
        paragraph, ...
      • Return only queried segments to users
      • Queried segments: segments that contain the
        information the user is interested in.
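
The segmentation idea can be sketched in Python. This is a minimal sketch, not FACT's implementation: it assumes well-formed HTML with explicitly closed tags, treats tr, table, li, ul, ol and p as the candidate segments listed on the slide, and uses plain keyword matching as a stand-in for the knowledge FACT learns from the user. The names SegmentCollector and queried_segments are hypothetical.

```python
from html.parser import HTMLParser

# Candidate segment types from the slide: a table row, a table,
# a list item, a list, and a paragraph.
SEGMENT_TAGS = {"tr", "table", "li", "ul", "ol", "p"}


class SegmentCollector(HTMLParser):
    """Collects the text of every candidate segment in a (well-formed) page."""

    def __init__(self):
        super().__init__()
        self.open_segments = []  # stack of (tag, text chunks) still open
        self.segments = []       # (tag, text) in the order they close: inner before outer

    def handle_starttag(self, tag, attrs):
        if tag in SEGMENT_TAGS:
            self.open_segments.append((tag, []))

    def handle_endtag(self, tag):
        # Assumes properly nested, explicitly closed segment tags.
        if self.open_segments and self.open_segments[-1][0] == tag:
            seg_tag, chunks = self.open_segments.pop()
            text = " ".join(" ".join(chunks).split())  # normalize whitespace
            self.segments.append((seg_tag, text))

    def handle_data(self, data):
        # Text belongs to every segment enclosing it (e.g. a row and its table).
        for _, chunks in self.open_segments:
            chunks.append(data)


def queried_segments(html, keywords):
    """Return the segments whose text contains all keywords (case-insensitive)."""
    collector = SegmentCollector()
    collector.feed(html)
    collector.close()
    wanted = [k.lower() for k in keywords]
    return [(tag, text) for tag, text in collector.segments
            if all(w in text.lower() for w in wanted)]


page = ("<p>Welcome</p><table>"
        "<tr><td>Single room</td><td>HK$800</td></tr>"
        "<tr><td>Parking</td><td>free</td></tr></table>")
print(queried_segments(page, ["room"]))
# -> [('tr', 'Single room HK$800'), ('table', 'Single room HK$800 Parking free')]
```

Both the row and its enclosing table match; since the goal is the most concise answer, a system built this way would prefer the innermost match (the row).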

11
Learning Based Query Processing
  • The fundamental difficulties in Web query
    processing
      • The Web is a huge, ever-growing, heterogeneous,
        semi-structured data source
      • Most Web users are naïve users issuing ad hoc
        queries
  • Learn the knowledge for query processing from the
    user!

12
A Learning Based Technique
  • Learn from the user's browsing of the first few
    URLs
      • how to navigate through the web pages
      • how to identify the required information in a web
        page
  • Process the remaining URLs automatically and
    retrieve queried segments
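
A rough sketch of the navigation half of this idea, under heavy simplifying assumptions: a site is a made-up in-memory dict of pages and links, the learned "knowledge" is just the words of the link labels the user clicked on a training site, and the locator runs a best-first crawl that expands the link whose label best matches those words. The is_target predicate stands in for the learned knowledge that recognizes the queried segment. All names are hypothetical; FACT's actual learner is not described on these slides.

```python
import heapq
from collections import Counter


def learn_link_words(click_trail):
    """Learner stand-in: count the words in the link labels the user clicked."""
    words = Counter()
    for anchor in click_trail:
        words.update(anchor.lower().split())
    return words


def link_score(anchor, link_words):
    """How much a link label resembles the labels the user followed."""
    return sum(link_words[w] for w in anchor.lower().split())


def locate(site, start, link_words, is_target, max_pages=10):
    """Locator stand-in: best-first crawl from start, always expanding the
    highest-scoring unvisited link. Returns (target url or None, pages crawled)."""
    frontier = [(0, 0, start)]  # entries: (negated score, insertion order, url)
    seen = {start}
    order = 1
    crawled = 0
    while frontier and crawled < max_pages:
        _, _, url = heapq.heappop(frontier)
        page = site[url]
        crawled += 1
        if is_target(page):
            return url, crawled
        for anchor, target in page["links"].items():
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-link_score(anchor, link_words), order, target))
                order += 1
    return None, crawled


# Learned on a training site where the user clicked "Rooms", then "Room rates".
learned = learn_link_words(["Rooms", "Room rates"])

# A different, made-up hotel site: url -> link labels and page text.
site_b = {
    "/": {"links": {"About us": "/about", "Our rooms": "/rooms",
                    "Contact": "/contact"}, "text": "Welcome"},
    "/about": {"links": {}, "text": "History of the hotel"},
    "/contact": {"links": {}, "text": "Telephone and e-mail"},
    "/rooms": {"links": {"Photo gallery": "/photos", "Rates": "/rates"},
               "text": "Our rooms"},
    "/photos": {"links": {}, "text": "Pictures"},
    "/rates": {"links": {}, "text": "Single room HK$800 per night"},
}

# Stand-in for learned segment knowledge: the target page shows a price.
print(locate(site_b, "/", learned, lambda page: "HK$" in page["text"]))
# -> ('/rates', 3)
```

Links with no matching words are still queued, only at lower priority, so the crawl degrades to an exhaustive search (bounded by max_pages) when the learned labels do not fit a new site.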

13
[Screenshot: search results listing Hotel 1, Hotel 2]
User browses it!
14
[Screenshot]
User clicks here!
15
[Screenshot: Room information]
User marks it!
16
[Screenshot]
FACT starts here!
17
[Screenshot: "roomrates"]
FACT chooses it!
18
[Screenshot]
FACT finds it!
20
A Query Processing System
  • A learning based query processing system
      • User Interface: accepts user queries, presents
        query results; a browser capable of capturing
        user actions
      • Query Analyzer: analyzes and transforms user
        queries
      • Session Controller: coordinates learning and
        locating
      • Learner: generates knowledge from captured user
        actions
      • Locator: applies knowledge and locates query
        results
      • Crawler & Parser: retrieves pages and parses them
        into trees
      • Knowledge Base: stores learned knowledge
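
One way to picture how the Query Analyzer, Session Controller, Learner, Locator and Knowledge Base fit together is the skeleton below. It is a hypothetical sketch: the slides name the components but not their interfaces, so the keyword normalization in analyze_query, the callable signatures for the learner and locator, and the plain dict used as the knowledge base are all assumptions. The User Interface and Crawler & Parser are left out, and the toy learner and locator exist only to make it runnable.

```python
from typing import Callable, Dict, FrozenSet, Optional


def analyze_query(text: str) -> FrozenSet[str]:
    """Query Analyzer stand-in: lower-case the keywords and ignore their order."""
    return frozenset(text.lower().split())


class SessionController:
    """Coordinates the Learner and the Locator around a shared Knowledge Base."""

    def __init__(self, learner: Callable, locator: Callable):
        self.learner = learner  # (captured user actions, old knowledge) -> new knowledge
        self.locator = locator  # (site, knowledge) -> query result or None
        self.knowledge_base: Dict[FrozenSet[str], object] = {}

    def learn(self, query: str, user_actions: list) -> None:
        """A site the user browses: turn the captured actions into knowledge."""
        key = analyze_query(query)
        self.knowledge_base[key] = self.learner(user_actions, self.knowledge_base.get(key))

    def locate(self, query: str, site) -> Optional[object]:
        """A site the system processes on its own, using stored knowledge."""
        knowledge = self.knowledge_base.get(analyze_query(query))
        if knowledge is None:
            return None  # nothing learned for this query yet
        return self.locator(site, knowledge)


# Toy Learner: accumulate the words of the user's actions.
def toy_learner(actions, previous):
    words = set(previous or ())
    for action in actions:
        words.update(action.lower().split())
    return words


# Toy Locator: return the first page text sharing a word with the knowledge.
def toy_locator(site_pages, words):
    for text in site_pages:
        if words & set(text.lower().split()):
            return text
    return None


controller = SessionController(toy_learner, toy_locator)
controller.learn("Hong Kong hotel", ["Room rates"])
print(controller.locate("hotel hong kong", ["About us", "Room rates: HK$800"]))
# -> Room rates: HK$800
```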

21
Reference Architecture
22
A Query Session
23
Training Strategies
  • Sequential
      • First n sites: user browses and system learns
      • Next N - n sites: system processes
  • Random
      • n randomly chosen sites: user browses and system
        learns
      • The system processes the rest
  • Interleaved
      • First n0 sites: user browses and system learns
      • Next n - n0 sites: system makes decisions; for
        incorrect ones, user browses and system re-learns
      • Next N - n sites: system processes
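
The three strategies differ only in which sites the user browses. A minimal sketch, assuming sites is a list of N candidate sites (for example, from a search engine's result list); each function returns (sites the user browses for training, sites the system processes on its own). The function names and the system_is_correct callback, which stands in for the user's judgement of the system's decision on a site, are hypothetical.

```python
import random


def sequential(sites, n):
    """Sequential: the user browses the first n sites; the system processes the other N - n."""
    return list(sites[:n]), list(sites[n:])


def random_split(sites, n, seed=None):
    """Random: the user browses n randomly chosen sites; the system processes the rest."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(sites)), n))
    training = [s for i, s in enumerate(sites) if i in chosen]
    rest = [s for i, s in enumerate(sites) if i not in chosen]
    return training, rest


def interleaved(sites, n0, n, system_is_correct):
    """Interleaved: learn from the first n0 sites, let the system decide on
    the next n - n0 and have the user browse only where it was wrong,
    then process the remaining N - n sites automatically."""
    training = list(sites[:n0])
    for site in sites[n0:n]:
        if not system_is_correct(site, training):
            training.append(site)  # user browses this site; system re-learns
    return training, list(sites[n:])


sites = ["A", "B", "C", "D", "E", "F", "G", "H"]  # N = 8
print(sequential(sites, 3))
# -> (['A', 'B', 'C'], ['D', 'E', 'F', 'G', 'H'])
print(interleaved(sites, 2, 5, lambda site, training: site != "E"))
# -> (['A', 'B', 'E'], ['F', 'G', 'H'])
```

In the interleaved run the system handled C and D correctly, so only E cost the user a browsing session; that is the saving the strategy aims for compared with browsing all n sites.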

25
System Evaluation
  • Functionality
  • Performance
      • precision, recall, correctness
      • efficiency within a site: how many pages the
        system visits to find a result
      • training efficiency: how many training samples
        are needed
  • User interface

26
(No Transcript)
27
System Evaluation - Effectiveness
  • Given a set of keywords, the system makes N
    decisions
  • N = N1 + N2 + N3 + N4
  • Precision = N1 / (N1 + N3)
  • Recall = N1 / (number of relevant sites)
  • Correctness = (N1 + N2) / N
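
Taking the measures as N = N1 + N2 + N3 + N4, Precision = N1 / (N1 + N3), Recall = N1 / (number of relevant sites) and Correctness = (N1 + N2) / N, the arithmetic is straightforward. N1..N4 are not defined in these slides; the comments below give one reading consistent with the formulas (N1 and N3 are the decisions that return a segment, N1 and N2 are the correct decisions) and should be treated as an assumption. The class name and the counts in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Decisions:
    """Counts of the system's N decisions for one keyword query.
    The meanings below are inferred from the formulas (assumption)."""
    n1: int  # returned a segment, and it is a correct answer
    n2: int  # correct decision that returns no segment
    n3: int  # returned a segment, but it is wrong
    n4: int  # wrong decision that returns no segment

    @property
    def n(self) -> int:
        return self.n1 + self.n2 + self.n3 + self.n4

    def precision(self) -> float:
        return self.n1 / (self.n1 + self.n3)

    def recall(self, relevant_sites: int) -> float:
        return self.n1 / relevant_sites

    def correctness(self) -> float:
        return (self.n1 + self.n2) / self.n


# Made-up counts, only to show the arithmetic.
d = Decisions(n1=8, n2=4, n3=2, n4=6)
print(d.n, d.precision(), d.recall(relevant_sites=10), d.correctness())
# -> 20 0.8 0.8 0.6
```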

28
System Evaluation - Efficiency
  • How efficiently does the system find a queried
    segment in a site?
  • Level of a queried segment: the length of the
    shortest path to find it
  • Absolute path length: number of crawled pages
  • Relative path length: crawled pages / level of
    the queried segment
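
The level of a queried segment is a shortest-path length over the site's link graph, which breadth-first search gives directly. A sketch, assuming the link graph is known (here a made-up dict) and counting pages on the path rather than links, so the start page has level 1 and the relative path length never divides by zero; if the slides mean links, the level is one less. The function names are hypothetical.

```python
from collections import deque


def segment_level(links, start, target):
    """Pages on the shortest link path from the site's start page to the
    page holding the queried segment (start page = 1); None if unreachable."""
    depth = {start: 1}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url == target:
            return depth[url]
        for nxt in links.get(url, []):
            if nxt not in depth:
                depth[nxt] = depth[url] + 1
                queue.append(nxt)
    return None


def relative_path_length(crawled_pages, level):
    """Crawled pages / level; 1.0 means the system crawled no extra pages."""
    return crawled_pages / level


# Made-up site: url -> outgoing links.
links = {"/": ["/about", "/rooms", "/contact"], "/rooms": ["/photos", "/rates"]}
lvl = segment_level(links, "/", "/rates")
print(lvl, relative_path_length(3, lvl), relative_path_length(6, lvl))
# -> 3 1.0 2.0
```

Absolute path length (crawled pages) is whatever the locator reports; dividing by the level makes sites with deep and shallow answers comparable.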

29
Basic Performance
[Chart: sequential training; Q11 = Hong Kong Hotel Room Rate, Q12 = Hong Kong Hotel]
30
Query Q12: Effects of Training Strategies
31
Improved Performance
Interleaved training
33
Conclusions
  • Proposed and implemented learning based Web query
    processing with the following features:
      • returns succinct results: segments of pages
      • needs no a priori knowledge or preprocessing, so it
        suits ad hoc queries
      • exploits page formatting and linkage
        information simultaneously
  • The preliminary results are promising

34
Future Work
  • Better knowledge
      • the key factor affecting system performance
  • Dynamic web pages?
  • Integrating results from another project
  • System evaluation
  • Prototype → product → dot com company
  • ???