Title: Index Processing for Complex Events Detection
1Index Processing for Complex Events Detection
- By Zhang Yelei
- Supervisors: Feng Ling, Pavel Serdyukov
2Outline
- Introduction
- System Setup
- Complex Event Detection
- Summary
4Introduction (1)
- Complex Event Processing
- Extensively researched as a way to detect situation changes (events) in a timely manner
- Relies on event specifications
- Detects complex events based on primitive events
- Related area: data stream processing
5Introduction (2)
- Complex Event Processing
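- A popular model (used in some active database systems like Snoop and ODE, and in software like Amit, Esper, etc.)
[Figure: a complex event E is defined over the primitive events a, b, c; when c arrives, the detector checks whether a and b exist]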
6Introduction (3)
- Complex Event Processing
- Drawbacks of the model
- Starts event detection from the end, so it is not proactive
- Event history searching is frequently conducted
- Ignores the uncertainty of events
- How to deal with monitors?
- Loading monitors for all complex event expressions into memory is unnecessary, and sometimes unrealistic
- Searching event expressions whenever a new primitive event instance comes in is inefficient
8System Setup (1)
- Conditions/Requirements
- Complex event expressions and primitive event definitions are stored in a database
- Event detection should be proactive
- Event detection should be continuous
- Report complex events based on their possibility and importance
- Each new primitive event instance adds to the probability of a current or forthcoming occurrence of a complex event.
- A low possibility of a specific (e.g. malicious) activity can still be of high interest to us.
9System Setup (2)
- Data Source: RFID System
- Includes RFID readers and tags
- The reader continuously sends out tag information if it detects a tag nearby
- Currently, RFID systems have been applied to logistics and transportation systems, and to health care systems
10System Setup (3)
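[Figure: three primitive events with probabilities 0.1, 0.15 and 0.05 lead to the complex event "Ken is drinking tea" with probability 0.3]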
11System Setup (4)
- Top-k algorithm as a solution
- Define a time window
- Build inverted index lists
- Thus, a time window containing primitive event
instances is similar to a query containing
query terms in web IR.
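The analogy with web IR can be sketched as follows. This is a minimal illustration, assuming each complex event (CE) expression is stored as a set of primitive event (PE) types with one score per type. The names `build_inverted_index`, `lists_for_window`, `ce_definitions` and all example data are hypothetical and are not taken from the prototype.

```python
from collections import defaultdict


def build_inverted_index(ce_definitions):
    """Map each PE type to a list of (CE name, score), sorted by descending score.

    ce_definitions: {ce_name: {pe_type: score}}, an assumed illustrative layout.
    """
    index = defaultdict(list)
    for ce_name, pe_scores in ce_definitions.items():
        for pe_type, score in pe_scores.items():
            index[pe_type].append((ce_name, score))
    for postings in index.values():
        postings.sort(key=lambda entry: entry[1], reverse=True)
    return dict(index)


def lists_for_window(index, window):
    """Treat the PE types seen in a time window like query terms and pick their lists."""
    return {pe_type: index[pe_type] for pe_type in set(window) if pe_type in index}


# Hypothetical CE definitions and a window of observed PE instances.
ce_definitions = {
    "drinking_tea": {"cup": 0.15, "kettle": 0.10, "teabag": 0.05},
    "making_coffee": {"cup": 0.10, "kettle": 0.20},
}
index = build_inverted_index(ce_definitions)
window = ["kettle", "cup", "cup"]
selected = lists_for_window(index, window)
```

Repeated instances of the same type in the window select the same list only once, just as a repeated query term would.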
12System Setup (5)
- Key elements for top-k algorithm
- Aggregation function must be monotone, for example MIN, MAX, SUM
- Minimize database access costs by
- Utilizing sequential access efficiently
- Stopping database accesses as soon as possible
13System Setup (6)
- System Architecture
- The event database is populated with probabilities derived from Google counts
15Complex Event Detection (1)
- Original Threshold Algorithm
- Calculate threshold for top k results
- Calculate the possible best score for the following results
- Compare these 2 values: if the best score is less than the threshold value, then stop searching
- Generate the candidate list
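The steps above can be sketched in a few lines. This is a simplified version of the classic threshold algorithm under SUM aggregation, using the slide's terminology: "threshold" is the score of the current k-th result and "best score" is the upper bound for anything not yet seen. The function name `threshold_topk`, the in-memory dictionaries standing in for database probes, and the example data are assumptions made for illustration.

```python
import heapq


def threshold_topk(lists, k):
    """Return the top-k (item, total) pairs under SUM aggregation.

    lists: {list_name: [(item, score), ...]}, each sorted by descending score.
    """
    # Random-access view of every list (a dict lookup stands in for a DB probe).
    lookup = {name: dict(postings) for name, postings in lists.items()}
    totals = {}  # candidate list: item -> aggregated score
    max_depth = max((len(p) for p in lists.values()), default=0)
    for depth in range(max_depth):
        best_score = 0.0  # upper bound for any item not yet seen
        for name, postings in lists.items():
            if depth >= len(postings):
                continue  # an exhausted list cannot contribute any more
            item, score = postings[depth]
            best_score += score
            if item not in totals:
                totals[item] = sum(lookup[n].get(item, 0.0) for n in lists)
        top = heapq.nlargest(k, totals.values())
        if len(top) == k and best_score < top[-1]:
            break  # best score is below the threshold: stop searching
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])


# Hypothetical inverted lists for three PE types.
example_lists = {
    "cup": [("tea", 0.5), ("coffee", 0.25)],
    "kettle": [("coffee", 0.5), ("tea", 0.25)],
    "teabag": [("tea", 0.125)],
}
top1 = threshold_topk(example_lists, 1)
```

Because SUM is monotone, an unseen item can never score more than the sum of the entries at the current depth, which is what makes the early stop safe.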
16Complex Event Detection (2)
- Challenges
- Multi-dimensional top-k processing
- Produce only the required data for event processing dynamically and efficiently
- Research problem
- How to minimize database access costs
17Complex Event Detection (3)
- Deal with challenges: avoid unnecessary database access (1)
- Add another dimension: the number of PEs that a CE contains
- The best score equals [formula not recoverable] if m > n, or [formula not recoverable] if m < n
- m: the number of inverted index lists to be used
- t: the number of discovered primitive events
- n: the number of PEs that a CE contains
- HSj: the possible highest score for an inverted index list
- Thus, the best score becomes small enough to allow better pruning
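One plausible reading of why knowing n tightens the bound is sketched here. It is an assumption, not the formula from the slide. If a CE that contains n PEs can collect score from at most n lists, then only the largest HSj values need to be summed when there are more lists than PEs. The function name `best_score` and the example values are hypothetical, and t is left out of this simplified version.

```python
def best_score(highest_scores, n):
    """Upper bound on the score of a CE that contains n PEs.

    highest_scores: the HSj values of the m inverted index lists in use.
    Assumption (not from the slides): a CE draws score from at most n lists.
    """
    m = len(highest_scores)
    usable = min(m, n)  # no more lists than the CE has PEs
    return sum(sorted(highest_scores, reverse=True)[:usable])


hs = [0.5, 0.25, 0.125, 0.125]  # hypothetical HSj values, m = 4
bound_small_ce = best_score(hs, 2)  # m > n: only the two largest HSj count
bound_large_ce = best_score(hs, 6)  # m < n: every list may contribute
```

Under this reading, a CE with few PEs gets a lower bound than the plain sum of all HSj, so it can be pruned earlier.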
18Complex Event Detection (4)
- Deal with challenges: avoid unnecessary database access (2)
- Share inverted index lists among different instances of the same type
[Figure: instances of PE type i arriving at 13:30:05, 13:30:08 and 13:30:10 on May 5th, 2006 share the cached inverted index lists for PE type i, which are loaded from the DB]
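A minimal sketch of this sharing, assuming the lists are keyed by PE type and loaded on first use. The class `InvertedListCache`, the `fake_db` stand-in for a MySQL query, and the example data are hypothetical.

```python
class InvertedListCache:
    """Caches inverted index lists per PE type so instances of one type share them."""

    def __init__(self, fetch_from_db):
        self._fetch_from_db = fetch_from_db  # hypothetical callable: pe_type -> list
        self._lists = {}
        self.db_hits = 0  # counts real database accesses

    def get(self, pe_type):
        if pe_type not in self._lists:
            self._lists[pe_type] = self._fetch_from_db(pe_type)
            self.db_hits += 1
        return self._lists[pe_type]


def fake_db(pe_type):
    # Stand-in for a database query; returns (CE name, score) pairs.
    return [("drinking_tea", 0.15)] if pe_type == "cup" else []


cache = InvertedListCache(fake_db)
for timestamp in ("13:30:05", "13:30:08", "13:30:10"):
    cache.get("cup")  # three instances of the same PE type
```

Only the first instance triggers a database access; the two later instances are served from memory.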
19Complex Event Detection (5)
- Deal with challenges: avoid unnecessary database access (3)
- Reuse the part of the time window that is not outdated.
[Figure: inverted index list cache; only the links for outdated and newly arrived instances are updated]
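The reuse of the non-outdated part of the window can be sketched as a sliding window that reports only what changed. This is an illustrative sketch: the class `SlidingWindow`, its `advance` method and the integer timestamps are assumptions, and arrivals are assumed to come in timestamp order.

```python
from collections import Counter, deque


class SlidingWindow:
    """Time window over PE instances that reports only the PE types that changed."""

    def __init__(self, span):
        self.span = span  # window length, in the same unit as the timestamps
        self.instances = deque()  # (timestamp, pe_type), oldest first
        self.type_counts = Counter()  # live instances per PE type

    def advance(self, now, arrivals):
        """Drop outdated instances, append new ones, return (expired, added) types."""
        before = set(self.type_counts)
        while self.instances and self.instances[0][0] <= now - self.span:
            _, pe_type = self.instances.popleft()
            self.type_counts[pe_type] -= 1
            if self.type_counts[pe_type] == 0:
                del self.type_counts[pe_type]
        for timestamp, pe_type in arrivals:
            self.instances.append((timestamp, pe_type))
            self.type_counts[pe_type] += 1
        after = set(self.type_counts)
        return before - after, after - before


window = SlidingWindow(span=10)
first = window.advance(5, [(3, "cup"), (5, "kettle")])
second = window.advance(14, [(14, "teabag")])
```

In the second step only "cup" leaves and "teabag" enters, so the links for "kettle" can be kept as they are.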
20Complex Event Detection (6)
- Deal with challenges: produce the lists (1)
- Receiving data
[Figure: receiving data from the DB]
21Complex Event Detection (7)
- Deal with challenges: produce the lists (2)
- Preparing data for top-k processing
[Figure: inverted index lists for importance]
22Complex Event Detection (8)
- Deal with challenges: multi-dimensionality (1)
The problem:
[Figure: score pairs (0.8, 0.6) and (0.7, 0.66)]
In what order should data be fetched from the cache and the database? What is the fetch depth?
23Complex Event Detection (9)
- Deal with challenges: multi-dimensionality (2)
- Maintain a matrix of indicators globally
- It stores the possible highest score for each pair of entity and stage count.
- Fetch is conducted on the pair with the highest score.
[Figure: matrix of highest scores indexed by entity (E1 to E5) and stage count (2 to 6)]
Repeat the fetch operation until the highest score is lower than the threshold value (the highest score decreases while the threshold increases).
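The fetch loop driven by the matrix can be sketched as follows. This is a simplified illustration, assuming each (entity, stage count) pair owns one list of already aggregated (CE, score) entries sorted by descending score, and that each CE appears under a single pair. The function name `fetch_by_highest_score` and the example data are hypothetical.

```python
import heapq


def fetch_by_highest_score(groups, k):
    """Return the k best (ce, score) results and the number of fetches performed.

    groups: {(entity, stage_count): [(ce, score), ...]}, sorted by descending score.
    """
    # Matrix of indicators, kept as a max-heap of (-highest score, pair, position).
    indicators = [(-postings[0][1], pair, 0)
                  for pair, postings in groups.items() if postings]
    heapq.heapify(indicators)
    results = []  # min-heap of (score, ce) holding the current top k
    fetches = 0
    while indicators:
        highest = -indicators[0][0]
        if len(results) == k and highest < results[0][0]:
            break  # no remaining pair can beat the k-th result
        _, pair, pos = heapq.heappop(indicators)
        ce, score = groups[pair][pos]
        fetches += 1
        if len(results) < k:
            heapq.heappush(results, (score, ce))
        else:
            heapq.heappushpop(results, (score, ce))
        if pos + 1 < len(groups[pair]):
            # The pair's indicator drops to the score of its next entry.
            heapq.heappush(indicators, (-groups[pair][pos + 1][1], pair, pos + 1))
    ranked = sorted(((ce, s) for s, ce in results), key=lambda r: -r[1])
    return ranked, fetches


# Hypothetical groups for three (entity, stage count) pairs.
groups = {
    ("E1", 2): [("ce_a", 0.8), ("ce_b", 0.3)],
    ("E2", 3): [("ce_c", 0.7), ("ce_d", 0.66)],
    ("E3", 2): [("ce_e", 0.2)],
}
top, fetches = fetch_by_highest_score(groups, 2)
```

In this example the loop stops after two fetches, because the highest remaining indicator (0.66) is already below the score of the second-best result (0.7).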
24Complex Event Detection (10)
- Summarizing complex event processing
- Continuously build/maintain lists based on time windows
- Initialize the cache of inverted lists
- Use the highest score matrix to decide which group of inverted lists to search
- Keep searching until the stop condition is satisfied
- During the search, inverted lists are cached in memory gradually if necessary
26Summary (1)
- Present a new model for complex event detection
- Adapt threshold algorithm to event detection
- Minimize database costs by
- Only fetching inverted index lists and caching them into memory when instances of new types appear.
- Reusing the part of the time window that is not outdated.
- Continuously comparing the best score and the threshold value to stop database access as early as possible.
27Summary (2)
- What's been done
- Prototype system implemented in JDK 1.4.2 and MySQL
- Simple tests
- To do (evaluation part)
- Compare the method with the method without using stage count, and with the traditional method
- Study the effects of different parameters on the efficiency of this method (window span, basic window length, frequency of data stream, size of data set, fetch depth on index lists).
- Future work
- Incorporate primitive event duration to add more prediction power to the system
28Questions