Title: Scalable%20Trigger%20Processing
1Scalable Trigger Processing
- Discussion of publication by
- Eric N. Hanson et al
- Int Conf Data Engineering 1999
- CS561
2Motivation
- Triggers popular for
- Integrity constraint checking
- Alerting, logging, etc.
- Commercial database systems
- Limited triggering capabilities
- 1 trigger/update-type on table or at best 100.
- But Current technology doesnt scale well
- And, internet and web-based applications may need
millions of triggers.
3An Example Trigger
- Example stock ticker notification
- Stock holding 100IBM
- Query Inform an agent whenever the price of the
stock holding crosses 10,000 - Create Trigger stock-watch
- from quotes q
- on update(q.price)
- when q.nameIBM and 100q.price gt 10,000
- do raise event ThresholdCrossed(100q.price).
- Note We may need 1,000 or millions of such
triggers - Web interface may allow users to create such
triggers
4What Next?
- Problem description
- TriggerMan system architecture
- Predicate index
- Trigger processing
5Problem Definition
- Given Relational DB, Trigger statements, Data
Stream - Find Triggers corresponding to each stream item
- Objective Scalable trigger processing system
- Assumptions
- Number of distinct structures of trigger
expressions is relatively small - All trigger expression structures small enough to
fit in main memory
6The Problem, once more.
- Requires millions of triggers (on huge data).
- Steps for trigger processing
- Event monitoring
- Condition evaluation
- Executing triggered action
- Response time for database operations critical !
7Related Work
- Parallel Processing
- Gupt89,Hell98
Indexing
ECA Model (not scalable)
AI Forg82,Mira87 (smaller rule set)
Range Predicates, Marking-Based Hans96b,
Ston90 (large memory, complicated storage)
8Overall Driving Idea
- If large number of triggers are created, then
many have the same format. - Triggers share same expression signature except
that parameters substituted. - Group predicates from trigger conditions based on
expression signatures into equivalence classes - Store them in efficient main memory data
structures
9TriggerMan System
10Components
- TriggerMan Datablade (lives inside Informix)
- Data Sources
- Local/remote tables/streams must capture
updates and transmit to TriggerMan (place in a
queue) - TriggerMan Client applications
- Create /drop triggers, etc.
- TriggerMan Driver
- Periodically involve TmanTest() fn to perform
condition testing and action execution. - TriggerMan console
- Direct user interaction interface for trigger
creation, system shutdown, etc.
11TriggerMan Syntax
- Trigger syntax
- create trigger lttriggerNamegt in setName
- optionalFlags
- from fromList
- on eventSpec
- when condition
- group by attributeList
- having groupCondition
- do action
12Example Salary Increases
- Update Freds salary when Bobs salary is updated
- create trigger updateFred
- from emp
- on update (emp.salary)
- when emp.name Bob
- do execSQL update emp set salaryNEW.emp.salary
where emp.nameFred
13Example Real Estate Database
- If new house added which is in neighborhood that
salesperson Iris reprensents then notify her - House (hno,address,price,nno,spno)
- Salesperson (spno,name,phone)
- Represents (spno,nno)
- Neighborhood (nno,name,location)
- create trigger IrisHouseAlert
- on insert to house
- from salesperson s, house h, represents r
- when s.name Iris and s.spnor.spno and
r.nnoh.nno - do raise event NewHouseInIrisNeighborhood(h.hno,
h.address)
14Trigger Condition Structure
- Expression signature
- Expression signature consists of
- Data source ID
- Operation code, e.g. insert, delete, etc.
- Generalized Expression (parameterized)
FROM Data src emp ON Event update WHEN
boolean exp.
Emp.name
CONSTANT
15Condition structure (contd)
- Steps to obtain canonical representation of WHEN
clause - Translate expression to CNF
- Group each conjunct by data source they refer to
- Selection Predicate will be of form
- (C11 OR C12 OR ..) AND ... AND (Ck1 OR ),
- where each Cij refers to same tuple
variable. - Each conjunct refers to zero, one, or more data
sources - Group conjuncts by set of sources they refer to
- If one data source, then selection predicate
- If two data sources, then JOIN predicate
16Triggers for stock ticker notification
- Create trigger T1 from stock
- when stock.ticker GOOG and stock.value lt
500 - do notify_person(P1)
- Create trigger T2 from stock
- when stock.ticker MSFT and stock.value lt
30 - do notify_person(P2)
- Create trigger T3 from stock
- when stock.ticker ORCL and stock.value lt
20 - do notify_person(P3)
- Create trigger T4 from stock
- when stock.ticker GOOG
- do notify_person(P4)
17Expression Signature
- Idea Common structures in condition of triggers
-
- Expression Signature
- E1 stock.ticker const1 and stock.value lt
const2 -
- Expression Signature
- E2 stock.ticker const3
- Expression signature defines equivalence class of
all instantiations of expression with different
constants
T1 stock.ticker GOOG and stock.value lt
500 T2 stock.ticker MSFT and stock.value lt
30 T3 stock.ticker ORCL and stock.value lt 20
T4 stock.ticker GOOG
18What to do now
- Only a few distinct expression signatures, build
data structures to represent them explicitly (in
memory) - Create constant tables that store all different
constants, and link them to their expression
signature
19Main Structures
- A-treat Network
- Network for trigger condition testing
- For a trigger to fire, all conditions must be
true - Expression Signature
- Common structure in a trigger
- E1 stock.ticker const1 and stock.value lt
const2 - Constant Tables
- Constants for each expression signature
20A-Treat Network to represent a trigger
- For each trigger condition
- stock.ticker const1 and stock.value lt const2
21Condition Testing
- A-Treat network is a discrimination network for
trigger condition testing. - For a predicate to be satisfied, all its
conjuncts should be true. - This is checked using A-Treat network.
22A-Treat network (Hanson 1992)
- Define rule SalesClerk
- If emp.salgt30,000
- And emp.dnodept.dno
- And dept.namesales
- And emp.jnojob.jno
- And job.titleclerk
- Then Action
23Expression Signature Table
Ex. ID Data Source Signature Description Constant Table Number of Constants Constant Organization
E1 stock const_e1 2 Main Memory
E2 stock const_e2 1 Main memory
E1 stock.ticker const1 and stock.value lt
const2 E2 stock.ticker const3
24Constant Tables
- Tables of constants in trigger conditions
Const_e1
Ex. ID Trigger ID Constant 1 Constant 2 Next Node Rest
E1 T1 GOOG 500 Node 2
E1 T2 MSFT 30 Node 2
E1 T3 ORCL 20 Node 2
Const_e2
Ex. ID Trigger ID Constant 1 Next Node Rest
E2 T4 GOOG Null
T1 stock.ticker GOOG and stock.value lt
500 T2 stock.ticker MSFT and stock.value lt
30 T3 stock.ticker ORCL and stock.value lt 20
T4 stock.ticker GOOG
25Tables
- Primary tables
- trigger_set (tsID, name, comments, creation_date,
isEnabled) - Trigger (triggerID, tsID, name, comments,
trigger_text, creation_date, isEnabled, ) - Trigger cache in main memory for recently
accessed triggers.
26Predicate Index
- Tables
- expression_signature(sigID, dataSrcID,
signatureDesc, constTableName, constantSetSize,
constantSetOrganization) - const_tableN(exprID, triggerID, nextNetworkNode,
const1, constK, restOfPredicate) - Root of predicate index linked to data source
predicate indices - Each data source contains an expression signature
list - Each expression signature links to its constant
table. - Index expressions on most selective conjunct
(rest on fly).
27Predicate Index
hash(src-ID)
Goal Given an update, identify all predicates
that match it.
28Processing Trigger Definition
- Parse the trigger and validate it
- Convert the when clause to conjunctive normal
form - Group the conjuncts by the distinct sets of tuple
variables they refer to - Form a trigger condition graph, that is,
undirected graph with node for each tuple
variable and edge for join predicates. - Build A-Treat network
29Processing trigger definition (2)
- For each selection predicate
- If predicate with same signature not seen before
- Add signature of predicate to list
- And, add signature to expression_signature table
- If signature has a constant placeholder in it,
create a constant table for the signature. - Add constants
- Else if predicate has constants, add a row to the
constant table for the expression
30Alternate Organizations
- Storage for the expression signatures
equivalence class - Main memory lists
- Main memory index
- Non-indexed database table
- Indexed database table
- For each expression signature, choose a
structure depending on number of triggers.
efficiency
Scalability
31Processing update descriptors
- On getting an update descriptor (token)
- (data src ID, operator code, old/new tuple)
- Locate data source predicate index from root of
predicate index. - For each expression signature, find constant
matching the token using index. - Check additional predicate clauses against the
token. - When all predicate clauses of a trigger have
matched, pin the trigger in main memory - Bring in A-treat network representing that
trigger to process aremaining part of trigger,
like join, etc. - If trigger condition is satisfied, execute action.
32Processing an Update
Update Stock (tickerGOOG, value495)
Root
Index of stock.tickerconst1
Other source Predicate index
E1
E2
E1 stock.ticker const1 and stock.value lt const2
Trigger ID Constant 1 Constant 2 Next Node
T1 GOOG 500 Node 2
T2 MSFT 30 Node 2
T3 ORCL 20 Node 2
const_e1
const_e2
const_e1
33Some optimizations
- For I 1 to N
- Create trigger Ti from R when R.a100 do
- List of trigger ids for R.a100
- Constant sets and trigger Ids for equality
conditions stored as - Lists
- Clustered constant index
- All entries of const1,..constk stored together
- Enables fast retrieval of triggers ids together
34Concurrency
- Better scalability even on single processor
35Opportunities of Concurrency
- Tasks can be
- Process a token to check matching rule
- Run a rule action
- Process a token against a set of conditions
- Process a token to run a set of rule actions
triggered by token.
36Concurrency
- Identified elements that can be parallelized
- Token-level
- Multiple tokens processed in parallel
- Condition-level
- Multiple selection conditions tested concurrently
- Rule-action-level
- Multiple rule actions fired at the same time
- Data-level
- Set of data values in the network processed in
parallel
37Concurrency
- N NUM_CPUS TMAN_CONCURRENCY_LEVEL
- N driver processes
- Each driver process calls TmanTest() after T time
- Balance switching overhead and avoid long
executions
38TmanTest()
- while(total execution time of this invocation of
TmanTest lt THRESHOLD and work is left in the task
queue) -
- Get a task from the task queue and execute it.
- Yield the processor so other Informix tasks can
use it -
- if task queue is empty
- return TASK_QUEUE_EMPTY
- return TASKS_REMAINING
39Concurrency
- For k1 to M
- Create trigger T_K
- from R
- when R.company "IBM"
- do raise event
- notify_user("userK", . )
- Partition trigger sets into N sets.
40Conclusion Overall Key Points
- If a large number of triggers are created, many
of them have almost the same format - Group triggers with same structure together into
expression signature equivalence classes - Number of distinct signatures is small enough to
fit into main memory (index) - Develop a selection predicate index structures
- Architecture to build a scalable trigger system.