Title: Paper Presentation:
1 Paper Presentation
Agglomerative clustering of a search engine query log
Doug Beeferman, Lycos Inc.
Adam Berger, CMU CS
- Shui-Lung Chuang
- Dec. 21, 2000
2 Introduction
- To find similar queries and URLs by mining a collection of user transactions
- Term suggestion
- Ontology generation / web page organization
- Observation (figure labels: Gameboy, Nintendo, Game Emulator)
3 Related Work
- Query log analysis [3]
- Clustering URLs
- content-based
- collaborative
- Clustering queries
- query-session-based
- According to [2], 76% of 40,000 web users try rephrasing the query on the same search engine after a failed search.
- collaborative
- content-based
4 Log Click-through data
- Through HTTP and sophisticated web page design, a search engine can log query-click transactions
Query: American Airline
5 Bipartite-Graph Representation
[Figure: bipartite graph linking queries to URLs]
Query nodes: Bell Atlantic Mobile; endangered humpback whales; cellular phones; wireless device; Shreveport Times; Sprint PCS
URL nodes: members.tripod.com/acor/Humpback.htm; www.bam.com; members.tripod.com/perfect/phones.html; www.spritpcs.com; www.mismicellular.com; www.nwlouisiana.com/times.html
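As a minimal sketch of how a query/URL graph like the one on this slide could be held in memory, the Python below builds two adjacency maps from (query, URL) click pairs. The function name `build_bipartite_graph` and the sample pairs are hypothetical: the edges of the slide's figure were not preserved, so the pairs are illustrative only and do not reproduce the figure.

```python
from collections import defaultdict


def build_bipartite_graph(click_pairs):
    """Return (query -> set of URLs, URL -> set of queries) adjacency maps."""
    query_to_urls = defaultdict(set)
    url_to_queries = defaultdict(set)
    for query, url in click_pairs:
        # Each click-through adds one undirected edge between a query and a URL.
        query_to_urls[query].add(url)
        url_to_queries[url].add(query)
    return dict(query_to_urls), dict(url_to_queries)


# Illustrative pairs only (assumed, not taken from the figure's edges).
sample_pairs = [
    ("Bell Atlantic Mobile", "www.bam.com"),
    ("cellular phones", "www.bam.com"),
    ("cellular phones", "www.spritpcs.com"),
    ("Sprint PCS", "www.spritpcs.com"),
    ("cellular phones", "www.bam.com"),  # repeated clicks collapse into one edge
]
query_to_urls, url_to_queries = build_bipartite_graph(sample_pairs)
```

Sets are used so that repeated clicks on the same query/URL pair count as a single edge; a weighted variant would keep counts instead.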
6 Agglomerative Iterative Clustering
Definition
- Q: the set of query vertices
- U: the set of URL vertices
Clustering Algorithm
- merge qi, qj for which σ(qi, qj) is largest
- merge ui, uj for which σ(ui, uj) is largest
- repeat unless
Complexity
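The alternating merge steps on this slide can be sketched in Python as follows. Several pieces are assumptions, because the slide's definitions did not survive: the similarity score is taken to be the Jaccard overlap of two vertices' neighbor sets, the loop is assumed to stop once the best similarity on both sides is no greater than a `threshold` parameter, and the names `similarity`, `best_pair`, `merge` and `cluster` are hypothetical. The pair search here is a naive scan over all pairs, chosen for readability rather than speed.

```python
from itertools import combinations


def similarity(neighbors_a, neighbors_b):
    """Assumed score: Jaccard overlap of neighbor sets (0.0 if both are empty)."""
    union = neighbors_a | neighbors_b
    return len(neighbors_a & neighbors_b) / len(union) if union else 0.0


def best_pair(adjacency):
    """Return (score, a, b) for the most similar vertex pair, or None."""
    best = None
    for a, b in combinations(sorted(adjacency), 2):
        score = similarity(adjacency[a], adjacency[b])
        if best is None or score > best[0]:
            best = (score, a, b)
    return best


def merge(a, b, own_side, other_side):
    """Merge vertices a and b on one side and relink the opposite side."""
    merged = a + b  # a cluster is a tuple of original labels
    neighbors = own_side.pop(a) | own_side.pop(b)
    own_side[merged] = neighbors
    for n in neighbors:
        other_side[n] -= {a, b}
        other_side[n].add(merged)


def cluster(query_to_urls, url_to_queries, threshold=0.0):
    """Alternate query merges and URL merges until no pair beats threshold."""
    queries = {(q,): {(u,) for u in us} for q, us in query_to_urls.items()}
    urls = {(u,): {(q,) for q in qs} for u, qs in url_to_queries.items()}
    while True:
        merged_any = False
        for own, other in ((queries, urls), (urls, queries)):
            best = best_pair(own)
            if best and best[0] > threshold:
                merge(best[1], best[2], own, other)
                merged_any = True
        if not merged_any:
            return sorted(queries), sorted(urls)


# Toy graph with made-up labels: queries a and b share URL x.
query_clusters, url_clusters = cluster(
    {"a": {"x"}, "b": {"x", "y"}, "c": {"z"}},
    {"x": {"a", "b"}, "y": {"b"}, "z": {"c"}},
)
```

Merging the queries first makes the URLs x and y look identical (both now point only at the merged query cluster), which is why alternating between the two sides can surface similarities that a single pass over one side would miss.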
7 Example
[Figure: clustering example on the query/URL graph]
Query nodes: Bell Atlantic Mobile; cellular phones; wireless device; endangered humpback whales; Sprint PCS; Shreveport Times
URL nodes: members.tripod.com/acor/Humpback.htm; www.bam.com; members.tripod.com/perfect/phones.html; www.spritpcs.com; www.mismicellular.com; www.nwlouisiana.com/times.html
8 Experiment
- Experiment data
- Measurement approach
- Applied to real-world term suggestion on Lycos
- baseline approach (based on query sessions)
- full replacement
- hybrid
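One plausible reading of cluster-based term suggestion is sketched below: given query clusters, suggest the other members of the cluster that contains the user's query. The function `suggest_terms`, its `limit` parameter and the sample clusters are hypothetical and are not results from the experiment. The sketch does not model the session-based baseline or the hybrid variant, whose details are not given on the slide.

```python
def suggest_terms(query, query_clusters, limit=5):
    """Suggest up to `limit` other queries that share a cluster with `query`."""
    for members in query_clusters:
        if query in members:
            return [q for q in members if q != query][:limit]
    return []  # unseen query: nothing to suggest from the clusters


# Made-up clusters for illustration only.
clusters = [
    ("cellular phones", "wireless device"),
    ("Shreveport Times",),
]
suggestions = suggest_terms("cellular phones", clusters)
no_suggestions = suggest_terms("unseen query", clusters)
```

Returning an empty list for unseen queries leaves room for a fallback to another suggestion source.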
9 Experiment (cont.)
Term suggestion interface
Result
10 Conclusion
- Remarks
- The log is a fertile source for web data mining work
- The kind of log available determines the approach
- Reference
- [1] Doug Beeferman and Adam Berger, Agglomerative clustering of a search engine query log, KDD 2000, pp. 407-416, 2000.
- [2] NPD Search and Portal Site Survey, published by NPD New Media Services, reported in www.searchenginewatch.com.
- [3] C. Silverstein, M. Henzinger, H. Marais and M. Moricz, Analysis of a very large AltaVista query log, DEC SRC Technical Note, 1998.