1
Paper Presentation
Agglomerative clustering of a search engine query log
Doug Beeferman, Lycos Inc.; Adam Berger, CMU CS
  • Shui-Lung Chuang
  • Dec. 21, 2000

2
Introduction
  • To find similar queries and URLs by mining a
    collection of user transactions
  • Term suggestion
  • Ontology generation / web pages organization
  • Observation

Example queries: Gameboy, Nintendo, Game Emulator
3
Related Work
  • Query log analysis [3]
  • Clustering URLs
    • content-based
    • collaborative
  • Clustering queries
    • query-session-based
      • According to [2], 76% of 40,000 web users try
        rephrasing the query on the same search engine
        after a failed search.
    • collaborative
    • content-based

4
Log Click-through data
  • Through HTTP and sophisticated web page design, a
    search engine can log each query-click transaction

Example query: American Airline
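
To make the idea of a query-click transaction concrete, here is a minimal sketch of turning raw log lines into (query, URL) pairs. The tab-separated record layout, the lowercasing of queries, the function name `parse_click_log`, and the `www.example.com` URL are all assumptions for illustration; the slides do not show the actual Lycos log format.

```python
from typing import Iterable, List, Tuple


def parse_click_log(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse 'query<TAB>url' lines into (query, url) pairs.

    The tab-separated layout is an assumed format, not the real log schema.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        query, sep, url = line.partition("\t")
        if not sep or not url.strip():
            continue  # skip malformed records that have no URL field
        # Lowercasing is an assumed normalization step.
        pairs.append((query.strip().lower(), url.strip()))
    return pairs


# Hypothetical sample records; only the query text comes from the slide.
sample_log = [
    "American Airline\twww.example.com/airline\n",
    "\n",
    "malformed line without a tab\n",
]
transactions = parse_click_log(sample_log)
```

Each surviving pair records one click: the query the user typed and the URL selected from the results.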
5
Bipartite-Graph Representation

Query
  • Bell Atlantic Mobile
  • endangered humpback whales
  • cellular phones
  • wireless device
  • Shreveport Times
  • Sprint PCS
URL
  • members.tripod.com/acor/Humpback.htm
  • www.bam.com
  • members.tripod.com/perfect/phones.html
  • www.spritpcs.com
  • www.mismicellular.com
  • www.nwlouisiana.com/times.html
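
A bipartite graph of this kind can be held as two adjacency maps, one from each query to the URLs clicked for it and one from each URL back to its queries. The sketch below uses the node names from this slide, but the edges in `ASSUMED_EDGES` are guesses for illustration, since the transcript does not preserve which query links to which URL. The helper name `build_bipartite` is also hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_bipartite(
    pairs: Iterable[Tuple[str, str]]
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build query->URLs and URL->queries adjacency maps from click pairs."""
    q2u = defaultdict(set)
    u2q = defaultdict(set)
    for query, url in pairs:
        q2u[query].add(url)
        u2q[url].add(query)
    return dict(q2u), dict(u2q)


# ASSUMED edges: the slide lists the nodes but not the links between them.
ASSUMED_EDGES = [
    ("endangered humpback whales", "members.tripod.com/acor/Humpback.htm"),
    ("Bell Atlantic Mobile", "www.bam.com"),
    ("cellular phones", "www.bam.com"),
    ("cellular phones", "members.tripod.com/perfect/phones.html"),
    ("wireless device", "members.tripod.com/perfect/phones.html"),
    ("wireless device", "www.spritpcs.com"),
    ("Sprint PCS", "www.spritpcs.com"),
    ("Shreveport Times", "www.nwlouisiana.com/times.html"),
]

q2u, u2q = build_bipartite(ASSUMED_EDGES)
```

Under these assumed edges, "Bell Atlantic Mobile" and "cellular phones" share the neighbor www.bam.com, which is the kind of overlap the clustering step exploits.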
6
Agglomerative Iterative Clustering

Definition
  • Q: the set of queries
  • U: the set of URLs
Clustering Algorithm
  • merge qi, qj for which σ(qi, qj) is largest
  • merge ui, uj for which σ(ui, uj) is largest
  • repeat unless

Complexity
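
The alternating merge loop on this slide can be sketched as follows. Two things are assumptions here because the transcript does not preserve them: the similarity is taken to be the overlap of two nodes' neighbor sets (size of the intersection divided by size of the union), and the loop stops once no pair on either side has positive similarity. The names `jaccard`, `best_pair`, `merge` and `cluster` are hypothetical. This naive version rescans all pairs in every round, so it is meant for small examples only.

```python
from itertools import combinations


def jaccard(a, b):
    """Assumed similarity: shared neighbors divided by all neighbors."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_pair(adj):
    """Return (similarity, x, y) for the most similar pair, or None if all are 0."""
    best = None
    for x, y in combinations(sorted(adj), 2):
        s = jaccard(adj[x], adj[y])
        if s > 0 and (best is None or s > best[0]):
            best = (s, x, y)
    return best


def merge(adj, other, x, y):
    """Merge clusters x and y on one side and relink the opposite side."""
    merged = x + y  # clusters are tuples of the original labels
    neighbors = adj.pop(x) | adj.pop(y)
    adj[merged] = neighbors
    for n in neighbors:
        other[n] -= {x, y}
        other[n].add(merged)
    return merged


def cluster(pairs, max_rounds=1000):
    """Alternate one query merge and one URL merge per round."""
    q_adj, u_adj = {}, {}
    for q, u in pairs:
        q_adj.setdefault((q,), set()).add((u,))
        u_adj.setdefault((u,), set()).add((q,))
    for _ in range(max_rounds):
        merged_any = False
        for adj, other in ((q_adj, u_adj), (u_adj, q_adj)):
            best = best_pair(adj)
            if best is not None:
                merge(adj, other, best[1], best[2])
                merged_any = True
        if not merged_any:
            break  # assumed stopping rule: nothing left with similarity > 0
    return sorted(q_adj), sorted(u_adj)


# Toy data with made-up labels, not taken from the slides.
toy_pairs = [("a", "u1"), ("b", "u1"), ("b", "u2"), ("c", "u3")]
query_clusters, url_clusters = cluster(toy_pairs)
```

Merging on one side changes the neighbor sets seen from the other side, which is why the query and URL merges are interleaved rather than run one after the other.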
7
Example

Figure: the same queries and URLs as in the bipartite graph on slide 5.
8
Experiment
  • Experiment data
  • Measurement approach
  • Applied to real-world term suggestion on Lycos
    • baseline approach (based on query sessions)
    • full replacement
    • hybrid
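
The three suggestion modes listed above might be contrasted as in the sketch below. The slides name the modes but do not define them, so the reading used here is an assumption: "baseline" uses only session-based suggestions, "full replacement" uses only cluster members, and "hybrid" lists baseline suggestions first and then fills in from the cluster. The function names, the `limit` parameter, and the sample data are all hypothetical.

```python
from typing import Dict, List, Sequence


def cluster_suggestions(query: str, clusters: Sequence[Sequence[str]]) -> List[str]:
    """Return the other members of the cluster containing the query, if any."""
    for members in clusters:
        if query in members:
            return [m for m in members if m != query]
    return []


def suggest(
    query: str,
    clusters: Sequence[Sequence[str]],
    baseline: Dict[str, List[str]],
    mode: str = "hybrid",
    limit: int = 3,
) -> List[str]:
    """Pick suggestions under one of three ASSUMED mode definitions."""
    from_clusters = cluster_suggestions(query, clusters)
    from_baseline = list(baseline.get(query, []))
    if mode == "baseline":
        ranked = from_baseline
    elif mode == "full_replacement":
        ranked = from_clusters
    elif mode == "hybrid":
        # Baseline first, then cluster members not already listed.
        ranked = from_baseline + [s for s in from_clusters if s not in from_baseline]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ranked[:limit]


# Hypothetical data: the cluster grouping and baseline entry are illustrative.
demo_clusters = [("gameboy", "nintendo", "game emulator")]
demo_baseline = {"gameboy": ["pokemon"]}
hybrid_result = suggest("gameboy", demo_clusters, demo_baseline, mode="hybrid")
```

Keeping the mode as a parameter makes it easy to run the same queries through all three variants when comparing them.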

9
Experiment (cont.)
Term suggestion interface

Result
10
Conclusion
  • Remarks
    • Logs are a fertile source for web data mining work
    • The kind of log available determines the
      approach
  • Reference
    • [1] Doug Beeferman and Adam Berger, Agglomerative
      clustering of a search engine query log, KDD 2000,
      pp. 407-416, 2000.
    • [2] NPD Search and Portal Site Survey, published
      by NPD New Media Services, reported at
      www.searchenginewatch.com
    • [3] C. Silverstein, M. Henzinger, H. Marais and M.
      Moricz, Analysis of a very large AltaVista query
      log, DEC SRC Technical Note, 1998.