Title: Novelty Detection in ATLAS
1. Novelty Detection in ATLAS
- Topic Detection and Tracking (TDT)
- New Event Detection Task in TDT
- Existing Problem
- Topic-Conditioned Novelty Detection
- Future Work
- References
2. TDT (Topic Detection and Tracking)
- Started in 1997
- Five tasks:
  - Topic Tracking
  - Topic Detection
  - New Event Detection (Novelty Detection, a.k.a. First Story Detection)
  - Story Link Detection
  - Segmentation
- Benchmark Evaluation
3. Topic Tracking
4. Topic Detection
5. New Event Detection (a.k.a. First Story Detection)
6. New Event Detection
- Characteristics (difficulties):
  - Online learning (vs. retrospective learning)
  - Unsupervised learning (vs. supervised learning)
  - Large number of targets (events)
  - Harder to correctly predict the first one
7. NED Approach I: Cosine Similarity as Similarity Metric
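8. NED Approach I: Cosine Similarity Formula
$$\cos(\vec{d}_i, \vec{d}_j) = \frac{\sum_k w_{ik} w_{jk}}{\sqrt{\sum_k w_{ik}^2} \, \sqrt{\sum_k w_{jk}^2}}$$
- Where $\vec{d}_i$ represents the ith document as a vector
- $w_{ik}$ represents the term weight of the kth term in the ith document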
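As a rough illustration of the cosine similarity metric, the sketch below compares two documents stored as sparse term-weight dictionaries. The helper names `term_weights` and `cosine_similarity` are hypothetical, and raw term frequency is assumed as the term weight; the slides do not say which weighting scheme was actually used.

```python
import math
from collections import Counter


def term_weights(text):
    """Map each term to a weight. Assumption: raw term frequency."""
    return dict(Counter(text.lower().split()))


def cosine_similarity(doc_i, doc_j):
    """Cosine of the angle between two sparse term-weight vectors."""
    # Dot product over the terms the two documents share.
    dot = sum(w * doc_j.get(term, 0.0) for term, w in doc_i.items())
    norm_i = math.sqrt(sum(w * w for w in doc_i.values()))
    norm_j = math.sqrt(sum(w * w for w in doc_j.values()))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_i * norm_j)


doc_a = term_weights("airplane crash investigate dead")
doc_b = term_weights("airplane crash election vote")
similarity = cosine_similarity(doc_a, doc_b)
```

Here the two documents share two of their four terms, so the similarity falls halfway between 0 (no shared terms) and 1 (identical direction).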
9. NED Approach I: Use Centroid to Represent Cluster
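One way to picture the centroid idea is a single-pass procedure: each incoming story is compared with the centroid of every existing cluster, and it is flagged as a new event when no centroid is similar enough. This is a minimal sketch under stated assumptions, not the system described in the talk: the names `Cluster` and `detect_new_events`, the raw term-frequency weights, and the threshold of 0.5 are all illustrative choices.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class Cluster:
    """A cluster summarized by the mean of its member vectors."""

    def __init__(self, vector):
        self.total = Counter(vector)  # running sum of member vectors
        self.size = 1

    def centroid(self):
        return {t: w / self.size for t, w in self.total.items()}

    def add(self, vector):
        self.total.update(vector)  # Counter.update adds counts
        self.size += 1


def detect_new_events(stories, threshold=0.5):
    """Return one flag per story: True if it starts a new cluster."""
    clusters, flags = [], []
    for text in stories:
        vec = Counter(text.lower().split())  # assumption: raw term frequency
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster.centroid())
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is None or best_sim < threshold:
            clusters.append(Cluster(vec))  # nothing close enough: new event
            flags.append(True)
        else:
            best.add(vec)  # old event: fold the story into its cluster
            flags.append(False)
    return flags


flags = detect_new_events([
    "airplane crash investigate",
    "airplane crash investigate dead",
    "election vote president",
])
```

Comparing against one centroid per cluster, rather than against every past story, keeps the per-story cost proportional to the number of clusters.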
10. NED Approach II: More Information
[Diagram: a learning machine outputs a score, which is compared against a threshold (Score > Threshold) to label the story Old or New]
11. 2002 NED Benchmark Evaluation Results
12. Problems in Current Approaches
- Story 1: airplane, crash, Korean, 747, injured, dead, January 10, investigate, reason
- Story 2: TWA-800, airplane, crash, investigate, dead, December, unknown, people
13. Events and Topics
- Event: an action happening during a certain time period and at a certain location.
- Topic: a recurring and broader class of events.
14. Topic-Conditioned Novelty Detection
15. Topic-Conditioned Novelty Detection
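The two crash stories on the "Problems in Current Approaches" slide suggest why conditioning on topic can help: words shared by every story in a topic make distinct events look alike. The sketch below is one possible reading of the idea, not the authors' implementation. It assumes each story arrives with a topic label (in practice a classifier would supply it), compares a story only with earlier stories of the same topic, and drops a hand-picked, hypothetical set of topic-specific stopwords first. The function name, the stopword sets, and the threshold are all illustrative.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def topic_conditioned_ned(stories, topic_stopwords, threshold=0.5):
    """stories: iterable of (topic, text). Returns one is-new flag per story."""
    history = defaultdict(list)  # topic -> vectors of earlier stories
    flags = []
    for topic, text in stories:
        stop = topic_stopwords.get(topic, set())
        # Drop words that are common to the whole topic (assumed given).
        vec = Counter(w for w in text.lower().split() if w not in stop)
        # Compare only with earlier stories on the same topic.
        best = max((cosine(vec, old) for old in history[topic]), default=0.0)
        flags.append(best < threshold)
        history[topic].append(vec)
    return flags


stories = [
    ("crash", "airplane crash korean 747 injured dead investigate"),
    ("crash", "twa-800 airplane crash investigate dead unknown people"),
]
crash_words = {"crash": {"airplane", "crash", "investigate", "dead"}}

with_conditioning = topic_conditioned_ned(stories, crash_words)
without_conditioning = topic_conditioned_ned(stories, {})
```

With the topic-level words removed, the second story no longer resembles the first and is flagged as a new event; without that step, the shared crash vocabulary pushes the similarity above the threshold and the second event is missed.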
16. Future Work
- Explore new similarity metrics
  - Metrics are the most fundamental item in clustering
- Explore more learning machines
  - The learning machine can be any efficient online regression/classification algorithm, such as SVM, logistic regression, etc.
- Use clustering instead of classification at the topic level
  - Try to reduce human effort as much as possible
- Explore the role of Named Entities (NEs)
  - Named Entities such as person names, locations, organizations, dates, etc., would be informative for novelty detection; many of them can be useful features.
17. References
- Topic-Conditioned Novelty Detection. Yiming Yang, Jian Zhang, Jaime Carbonell and Chun Jin. SIGKDD 2002.
- A Study on Retrospective and Online Event Detection. Yiming Yang, Tom Pierce and Jaime Carbonell. SIGIR 98.
- The 2001 topic detection and tracking task definition and evaluation plan. http://www.nist.gov/speech/tests/tdt/tdt2001/evalplan.htm
- NIST's 1998 topic detection and tracking evaluation. J. Fiscus, G. Doddington, J. Garofolo, and A. Martin. In Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop.
- New Event and Link Detection at CMU for TDT 2002. J. Carbonell, Y. Yang, R. Brown, J. Zhang, and J. Ma.