Title: Event Detection with Common User Interests
1Event Detection with Common User Interests
- Hu Meishan, Sun Aixin, Lim Ee-Peng
- School of Computer Engineering, Nanyang
Technological University
- School of Information Systems, Singapore
Management University
- WIDM'08
2Outline
- Introduction
- Background events and user interests
- Motivation
- Event detection with common user interests
- Problem definition
- Proposed solution
- Query profile and its properties
- Online event detection
- Experimental evaluation
- Future work
- Conclusions
3Event detection: the traditional approach
- Task
- Given a stream of news articles, group them
according to the events they describe.
- Drawbacks
- Many detected events may not be interesting to
users.
- Events that interest users but are not
heavily reported in the news are not detected.
4User interests and events
- User-created content reflects user interests
- Many bloggers discuss topics that interest them in
their posts.
- Many users search for documents about topics that
interest them by submitting keyword queries.
- When an event is happening, we often observe
- a large number of published blog posts discussing
the event
- a surge in the number of event-related queries
- Popular queries are often event-related
5An example of popular queries and the related
event
Mentions by Day: number of posts mentioning
"benazir bhutto" per day in the past 30 days.
- Event
- Benazir Bhutto assassinated on 27 Dec 07
Source: http://www.technorati.com/pop (top
searches captured on 28 Dec 07)
6Motivations
- Events that interest users can be detected by
utilizing user-created content, e.g.,
queries, blog posts, etc.
- Challenges
- Not all popular queries are event-related.
- Multiple queries might be related to the same
event.
- Not all documents in the stream are worth
processing for event detection.
8A new framework on top of existing systems
- A 4-step approach
- Popular query identification
- Query profile construction
- Event-related profile identification
- Online event detection
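The query profile construction step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Document` and `QueryProfile` types, the `build_profile` function, the rule that every query term must appear in a document, and the `top_k` cut-off are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Document:
    doc_id: str
    published: datetime
    text: str


@dataclass(frozen=True)
class QueryProfile:
    query: str
    search_time: datetime
    documents: tuple  # matching documents, most recent first


def build_profile(query, search_time, stream, top_k=10):
    """Build a profile for one popular query at one search time.

    Assumed matching rule: a document matches when it contains every
    query term and was published no later than search_time.
    """
    terms = query.lower().split()
    matches = []
    for doc in stream:
        words = set(doc.text.lower().split())
        if doc.published <= search_time and all(t in words for t in terms):
            matches.append(doc)
    # Keep only the top_k most recently published matches.
    matches.sort(key=lambda d: d.published, reverse=True)
    return QueryProfile(query, search_time, tuple(matches[:top_k]))
```

Each profile keeps the publication times of its documents, which is what the later steps (profile filtering and online detection) would operate on.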
9Temporal query profile and properties
10Characteristics of event-related query profiles
- If a query q issued at time t is related to some
event, there is likely a large number of
documents describing the event and matching q,
published within a short period before t.
- An event-related query often demonstrates
- a large number of documents matching it at the
search time.
- a short time-span among documents in the
constructed query profile.
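The two characteristics above suggest a simple filter, sketched below. The function name `is_event_related` and the thresholds `min_docs` and `max_span` are hypothetical; the slides do not give the actual statistic or parameter values used for profile filtering.

```python
from datetime import datetime, timedelta


def is_event_related(publish_times, min_docs=5, max_span=timedelta(days=3)):
    """Return True when a profile has enough documents published close together.

    publish_times: publication times of the documents in one query profile.
    min_docs and max_span are hypothetical thresholds, not values from the slides.
    """
    if len(publish_times) < min_docs:
        return False
    span = max(publish_times) - min(publish_times)
    return span <= max_span


# Example: six posts published within a five-hour window.
burst = [datetime(2007, 12, 27, hour) for hour in range(6)]
print(is_event_related(burst))
```

A query whose matching documents are spread over many months would fail the time-span check even if many documents match it.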
11Characteristics of profiles related to the same event
- If queries q and q' are related to the same event
at time t, there is likely a large number of
documents describing the event, matching both q
and q', published close to t.
- Two query profiles can be grouped into the same
event if they are similar in content.
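One common way to measure whether two profiles are "similar in content" is cosine similarity over term-frequency vectors. The slides do not say which measure is used, so the sketch below is an assumption; `term_vector` and `cosine_similarity` are illustrative names.

```python
import math
from collections import Counter


def term_vector(texts):
    """Bag-of-words term frequencies over all documents in a profile."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two profiles whose similarity exceeds a chosen threshold would then be candidates for grouping into the same event.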
12Characteristics of profiles describing event
evolution
- If a query q is related to an event that lasts
for some time, the documents matching q for two
searches at times t1 and t2, both within the
period of the event, are likely to describe the
evolution of the event.
- Not only similarity but also novelty between
query profiles determines whether a query profile
should be included in an event.
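A minimal sketch of how similarity and novelty might be combined in an online, incremental clustering step is given below. It is an illustration under stated assumptions, not the method from the paper: similarity is taken as Jaccard overlap of term sets, novelty as the fraction of a profile's documents not yet in the event, and `sim_threshold` and `nov_threshold` are hypothetical parameters.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def novelty(profile_docs, event_docs):
    """Fraction of the profile's documents not yet in the event."""
    if not profile_docs:
        return 0.0
    return len(profile_docs - event_docs) / len(profile_docs)


def add_profile(events, terms, doc_ids, sim_threshold=0.3, nov_threshold=0.2):
    """Process one incoming profile; return 'merged', 'skipped', or 'new'."""
    # Find the existing event whose terms are most similar to the profile.
    best, best_sim = None, 0.0
    for event in events:
        sim = jaccard(terms, event["terms"])
        if sim > best_sim:
            best, best_sim = event, sim
    if best is not None and best_sim >= sim_threshold:
        if novelty(doc_ids, best["doc_ids"]) < nov_threshold:
            return "skipped"  # similar, but adds nothing new to the event
        best["terms"] |= terms
        best["doc_ids"] |= doc_ids
        return "merged"
    # No sufficiently similar event exists: start a new one.
    events.append({"terms": set(terms), "doc_ids": set(doc_ids)})
    return "new"
```

Profiles are processed one at a time in arrival order, so an event can grow as later searches for the same query return new documents.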
13Online incremental clustering illustrated
14Data and statistics
- 1 query stream
- the 15 most popular queries published by a blog
search engine, requested every 3 hours from
2006-11-08 1AM to 2008-03-31 10PM
- 2 independent document streams
- TR: blog posts tracked by Technorati,
http://www.technorati.com
- GN: news articles tracked by Google News,
http://news.google.com
15Accuracy of the event detections
- True event: if the event is recorded in a
Wikipedia article and the time of the recorded
event is within a short period of the detected
duration.
- Segment event: if the detected event is wrongly
split from a true event to which it should
belong.
- Mixed event: if the detected event contains
queries and documents from two or more events
recorded in Wikipedia.
- Unknown event: if we cannot locate an entry
recording the event in Wikipedia.
17Future work
- In the current framework, events are detected
from one query stream and one document stream;
however, it is possible to detect events from
multiple query streams and multiple document
streams.
- E.g., to associate the query profile constructed
in the blog data stream with that constructed in
the news data stream.
- A novel interface is needed for browsing and
searching the detected events.
19Conclusions
- Motivated by the close relation between
user-created content and real-world events, we
defined the problem of detecting events of common
user interests.
- To address the problem, we proposed
- a framework that extends the traditional event
detection approach by seamlessly integrating the
stream of documents and the stream of popular
queries to form a stream of query profiles.
- a notion of query profile and its properties that
can facilitate the process of event detection.
- We used real-world data in experiments and
achieved high detection accuracy.
21Appendix: statistics for profile filtering
22Appendix: parameters used in the detection