Title: NiagaraCQ
1. NiagaraCQ
- A Scalable Continuous Query System for Internet Databases

2. Outline
- Problem
- NiagaraCQ
- Selection Placement Strategies
- Dynamic Regrouping Algorithm

3. Problem
There is no scalable and efficient system that supports persistent queries, which let users receive new results as soon as they become available. Example: "Notify me whenever the price of Dell stock drops by more than 5% and the price of Intel stock remains unchanged over the next three months."

4. NiagaraCQ
- Support for continuous queries
  - Change-based queries
  - Timer-based queries
- Scalability
- Performance
- Suited to the Internet
- User interface: a high-level query language

5. Command Language
- Create a continuous query:
  CREATE CQ_name
    XML-QL query
    DO action
    START start_time EVERY time_interval
    EXPIRE expiration_time
- Delete a continuous query:
  DELETE CQ_name

6. Expression Signature
An expression signature represents the same syntactic structure, with possibly different constant values, across different queries. For example, these two queries share one signature:

Where <Quotes><Quote><Symbol>INTC</></></> element_as $g
in "http://www.cs.wisc.edu/db/quotes.xml"
construct $g

Where <Quotes><Quote><Symbol>MSFT</></></> element_as $g
in "http://www.cs.wisc.edu/db/quotes.xml"
construct $g

7. Expression Signature (2)
Signature: Quotes.Quote.Symbol = constant (in quotes.xml)

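To make the idea concrete, here is a minimal Python sketch that computes a signature by dropping the constant from a selection predicate. The `Selection` tuple is a hypothetical, simplified stand-in for a parsed XML-QL query, not NiagaraCQ's internal representation.

```python
from typing import NamedTuple

class Selection(NamedTuple):
    """Hypothetical, simplified selection predicate (not NiagaraCQ's actual format)."""
    path: str      # e.g. "Quotes.Quote.Symbol"
    op: str        # comparison operator, e.g. "="
    constant: str  # the query-specific constant, e.g. "INTC"
    source: str    # data source, e.g. "quotes.xml"

def signature(sel: Selection) -> tuple:
    # Replace the constant with a placeholder: queries that differ only
    # in their constants end up with the same signature.
    return (sel.path, sel.op, "constant", sel.source)

q_intc = Selection("Quotes.Quote.Symbol", "=", "INTC", "quotes.xml")
q_msft = Selection("Quotes.Quote.Symbol", "=", "MSFT", "quotes.xml")
print(signature(q_intc) == signature(q_msft))  # True
```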
8. Query Plan
Figure: two separate query plans over the same source.
- Query I: quotes.xml → File Scan → Select Symbol=INTC → Trigger Action I
- Query J: quotes.xml → File Scan → Select Symbol=MSFT → Trigger Action J

9. Group Signature
The common expression signature of all queries in the group:
  Quotes.Quote.Symbol = constant (in quotes.xml)

10. Group Constant Table
Constant_value | Destination_buffer
INTC           | Dest. I
MSFT           | Dest. J

11. Group Plan
Figure: one plan shared by all queries in the group.
- quotes.xml → File Scan → Join (Symbol = Constant_value) with the Constant Table (stored as a file) → Split → Trigger Action I, Trigger Action J, ...

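A minimal sketch of the group plan, assuming tuples are Python dicts and the constant table is an in-memory dict, so a hash lookup stands in for the Join operator. Each tuple whose Symbol matches a constant is routed by the split step into that query's destination buffer; the buffer names are illustrative only.

```python
from collections import defaultdict

# Group constant table: constant value -> destination buffer (illustrative names).
constant_table = {"INTC": "Dest. I", "MSFT": "Dest. J"}

def run_group_plan(tuples, constant_table):
    """Join scanned tuples with the constant table on Symbol, then split
    each joined tuple into the destination buffer of the matching query."""
    buffers = defaultdict(list)
    for t in tuples:                              # File Scan output
        dest = constant_table.get(t["Symbol"])    # Join: Symbol = Constant_value
        if dest is not None:
            buffers[dest].append(t)               # Split by destination buffer
    return dict(buffers)

quotes = [
    {"Symbol": "INTC", "Price": 30.0},
    {"Symbol": "MSFT", "Price": 90.0},
    {"Symbol": "DELL", "Price": 20.0},   # no query asks for DELL: dropped by the join
]
print(run_group_plan(quotes, constant_table))
```

One scan and one join serve every query in the group; adding a query only adds a row to the constant table.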
12. Incremental Grouping Algorithm
- The group optimizer traverses the new query's plan bottom up.
- It matches the query's expression signature against the signatures of existing groups.
Figure: new query plan: quotes.xml → File Scan → Select Symbol=AOL → Trigger Action

13. Incremental Grouping Algorithm (2)
- The group optimizer breaks the query plan into two parts:
  - Lower part (File Scan and Select): removed.
  - Upper part (Trigger Action): added onto the group plan.
- It adds the constant (AOL) to the constant table.
Figure: the same AOL query plan (quotes.xml → File Scan → Select Symbol=AOL → Trigger Action).

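The matching step can be sketched as a dictionary keyed by signature. This is a simplified illustration under stated assumptions: `Group`, `GroupOptimizer` and `add_query` are hypothetical names, a query is reduced to one selection predicate, and the group's constant table maps each constant to the trigger actions that need it.

```python
class Group:
    """Hypothetical group: one shared plan plus its constant table."""
    def __init__(self, signature):
        self.signature = signature
        self.constant_table = {}   # constant value -> list of trigger actions

class GroupOptimizer:
    def __init__(self):
        self.groups = {}           # expression signature -> Group

    def add_query(self, path, op, constant, source, trigger_action):
        sig = (path, op, source)   # the signature leaves out the constant
        group = self.groups.get(sig)
        if group is None:          # no matching group: start a new one
            group = Group(sig)
            self.groups[sig] = group
        # The lower part (scan + select) is dropped; only the constant and the
        # upper part (the trigger action) are attached to the group plan.
        group.constant_table.setdefault(constant, []).append(trigger_action)
        return group

opt = GroupOptimizer()
for sym, action in [("INTC", "Trigger Action I"), ("MSFT", "Trigger Action J"),
                    ("AOL", "Trigger Action K")]:   # "K" is a made-up label
    opt.add_query("Quotes.Quote.Symbol", "=", sym, "quotes.xml", action)
print(len(opt.groups))  # 1
```

Keeping a list per constant lets two users who ask for the same symbol share one row of the constant table.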
14. Pipeline Approach
- Tuples are pipelined from the output of one operator into the input of the next operator.
- Disadvantages:
  - Does not work for grouping timer-based queries.
  - The split operator may become a bottleneck.
  - Not all parts of the group plan need to be executed each time.

15. Intermediate Files
16. Intermediate Files (2)
- Advantages:
  - Intermediate files and data sources are monitored uniformly.
  - Each query is scheduled independently.
  - The potential bottleneck of the pipelined approach is avoided.
- Disadvantages:
  - Extra disk I/Os.
  - The split operator becomes a blocking operator.

17. Virtual Intermediate Files
Where <Quotes><Quote><Change_Ratio>$c</></></> element_as $g
in "quotes.xml", $c > 0.05
construct $g

Where <Quotes><Quote><Change_Ratio>$c</></></> element_as $g
in "quotes.xml", $c > 0.15
construct $g

Shared signature: Quotes.Quote.Change_Ratio > constant (in quotes.xml)
Overlap: every tuple with $c > 0.15 also satisfies $c > 0.05, so the two results overlap.

18. Virtual Intermediate Files (2)
- All outputs from the split operator are stored in one real intermediate file.
- This file has an index on the range attribute.
- Virtual intermediate files store a value range.
- Modification of a virtual intermediate file can trigger upper-level queries.
- The value range is used to retrieve data from the real intermediate file.
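A minimal sketch of the idea, assuming the index on the range attribute can be modeled by a sorted list searched with `bisect`. The class names are hypothetical; each virtual file holds only a value range and reads its tuples from the single real file.

```python
import bisect

class RealIntermediateFile:
    """Holds all split outputs, kept sorted on the range attribute
    (the sorted key list stands in for the index)."""
    def __init__(self):
        self.keys = []   # sorted Change_Ratio values
        self.rows = []   # rows aligned with self.keys

    def insert(self, key, row):
        i = bisect.bisect_right(self.keys, key)
        self.keys.insert(i, key)
        self.rows.insert(i, row)

    def range_scan(self, low, high):
        """Rows with low < key <= high (open lower bound, as in $c > 0.05)."""
        lo = bisect.bisect_right(self.keys, low)
        hi = bisect.bisect_right(self.keys, high)
        return self.rows[lo:hi]

class VirtualIntermediateFile:
    """Stores only a value range; the data lives in the real file."""
    def __init__(self, real_file, low, high=float("inf")):
        self.real_file, self.low, self.high = real_file, low, high

    def read(self):
        return self.real_file.range_scan(self.low, self.high)

real = RealIntermediateFile()
for sym, ratio in [("INTC", 0.02), ("MSFT", 0.10), ("DELL", 0.20)]:
    real.insert(ratio, sym)

v1 = VirtualIntermediateFile(real, 0.05)   # $c > 0.05
v2 = VirtualIntermediateFile(real, 0.15)   # $c > 0.15
print(v1.read(), v2.read())  # ['MSFT', 'DELL'] ['DELL']
```

DELL is stored once even though both virtual files include it, which is how overlapping ranges avoid duplicated storage.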
19. Event Detection
- Types of events:
  - Data-source change
  - Timer
- Types of data sources:
  - Push-based
  - Pull-based

20. Timer-based
- Timer events are stored in an event list, sorted in time order.
- Each entry stores query IDs.
- A query is fired only if its data source has been modified since its last firing time.
- After a timer event, the next firing times are calculated and the queries are added to the corresponding entries.
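The event list can be sketched with a min-heap of firing times, each mapped to its query IDs. This is an illustration only: the `EventList` class, the per-query `interval` table and the last-modified bookkeeping are assumptions, not NiagaraCQ's data layout.

```python
import heapq

class EventList:
    """Timer events in time order; each entry holds the ids of the queries due then."""
    def __init__(self):
        self.times = []     # min-heap of distinct firing times
        self.entries = {}   # firing time -> list of query ids

    def schedule(self, fire_time, query_id):
        if fire_time not in self.entries:
            self.entries[fire_time] = []
            heapq.heappush(self.times, fire_time)
        self.entries[fire_time].append(query_id)

    def process_next(self, interval, last_fired, last_modified, source_of):
        """Handle the earliest timer event: fire a query only if its data source
        changed since its last firing, then reschedule it at its next firing time."""
        now = heapq.heappop(self.times)
        fired = []
        for qid in self.entries.pop(now):
            if last_modified[source_of[qid]] > last_fired[qid]:
                fired.append(qid)
                last_fired[qid] = now
            self.schedule(now + interval[qid], qid)
        return now, fired

events = EventList()
interval = {"q1": 10, "q2": 10}                       # hypothetical EVERY intervals
source_of = {"q1": "quotes.xml", "q2": "profiles.xml"}
last_fired = {"q1": 0, "q2": 0}
last_modified = {"quotes.xml": 5, "profiles.xml": 0}  # only quotes.xml changed
events.schedule(10, "q1")
events.schedule(10, "q2")
print(events.process_next(interval, last_fired, last_modified, source_of))  # (10, ['q1'])
```

q2 is skipped because profiles.xml has not changed, but it is still rescheduled for time 20.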
21. Incremental Evaluation
- Queries are invoked only on changed data.
- For each file, NiagaraCQ keeps a delta file.
- Queries are run over delta files.
- Incremental evaluation of join operators requires the complete data files.
- A timestamp is added to each tuple in order to support timer-based queries.
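A minimal sketch of a data file with a timestamped delta file, assuming an append-only source and integer timestamps. The `DataFile` name and its methods are hypothetical; the point is that a query reads only the tuples added between its last fire time and the current fire time, while a join can still consult the complete file.

```python
from dataclasses import dataclass, field

@dataclass
class DataFile:
    """A source file plus its delta file; each delta tuple carries a timestamp."""
    tuples: list = field(default_factory=list)   # complete data (needed by joins)
    delta: list = field(default_factory=list)    # (timestamp, tuple) pairs

    def append(self, ts, tup):
        self.tuples.append(tup)
        self.delta.append((ts, tup))

    def changes_between(self, last_fire, current_fire):
        """Tuples added in (last_fire, current_fire]: what an incremental query sees."""
        return [t for ts, t in self.delta if last_fire < ts <= current_fire]

quotes = DataFile()
quotes.append(1, ("INTC", 30.0))
quotes.append(5, ("MSFT", 90.0))
quotes.append(12, ("DELL", 20.0))
print(quotes.changes_between(0, 10))   # [('INTC', 30.0), ('MSFT', 90.0)]
print(quotes.changes_between(10, 20))  # [('DELL', 20.0)]
```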
22. Memory Caching
- Query plans: an LRU policy that favors frequently fired queries.
- Data files: the policy favors delta files.
- Event list: only a time window of the list is kept in memory.

23. System Architecture
24. Continuous Query Processing
Figure: the Continuous Query Manager (CQM), Event Detector (ED), Query Engine (QE) and Data Manager (DM), with numbered interactions:
1. CQM adds continuous queries with file and timer information to enable ED to monitor the events.
2. ED asks DM to monitor changes to files.
3. When a timer event happens, ED asks DM for the last modified time of files.
4. DM informs ED of changes to push-based data sources.
5. If file changes and timer events are satisfied, ED provides CQM with a list of firing CQs.
6. CQM invokes QE to execute the firing CQs.
7. The file scan operator calls DM to retrieve the selected documents.
8. DM returns only the changes between the last fire time and the current fire time.

25. Selection Placement Strategies
Where <Quotes><Quote><Symbol>$s</>
      <Price>$p</></> element_as $g </> in "quotes.xml", $p > 90,
      <Companies><Company><Symbol>$s</></> element_as $t </> in "profiles.xml"
construct $g, $t

Where <Quotes><Quote><Symbol>$s</>
      <Price>$p</></> element_as $g </> in "quotes.xml", $p > 100,
      <Companies><Company><Symbol>$s</></> element_as $t </> in "profiles.xml"
construct $g, $t

26. Expression Signatures
- Selection: Quotes.Quote.Price > constant (in quotes.xml)
- Join: Symbol = Symbol (quotes.xml, profiles.xml)

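27. Where to place the selection operator?
Here R is quotes.xml, S is profiles.xml, and $\sigma_i$ is the price selection of the i-th query.
- Below the join: PushDown
  - $(\sigma_1 R \bowtie S) \cup (\sigma_2 R \bowtie S) \cup \dots \cup (\sigma_n R \bowtie S)$
- Above the join: PullUp
  - $\sigma_1(R \bowtie S) \cup \sigma_2(R \bowtie S) \cup \dots \cup \sigma_n(R \bowtie S)$
- PullUp achieves an average 10-fold performance improvement over PushDown.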
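The two placements compute the same per-query results; they differ only in how much work is shared. A minimal sketch, assuming dict tuples and a nested-loop join on Symbol (both simplifications), with `push_down` and `pull_up` as hypothetical helper names:

```python
def join(R, S):
    """Join quotes R with profiles S on Symbol (nested loops for clarity)."""
    return [(r, s) for r in R for s in S if r["Symbol"] == s["Symbol"]]

def push_down(R, S, thresholds):
    # One selection per query below the join, then one join per selection.
    return {c: join([r for r in R if r["Price"] > c], S) for c in thresholds}

def pull_up(R, S, thresholds):
    # One shared join, then one selection per query above it.
    joined = join(R, S)
    return {c: [(r, s) for r, s in joined if r["Price"] > c] for c in thresholds}

R = [{"Symbol": "INTC", "Price": 95}, {"Symbol": "MSFT", "Price": 105},
     {"Symbol": "DELL", "Price": 50}]
S = [{"Symbol": "INTC", "Name": "Intel"}, {"Symbol": "MSFT", "Name": "Microsoft"}]

results = pull_up(R, S, [90, 100])
assert push_down(R, S, [90, 100]) == results   # same answers, different sharing
```

PushDown runs n joins for n queries, whereas PullUp runs one join and n cheap selections, which is the intuition behind its advantage.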
28. PushDown - Query Plan
Figure: quotes.xml → Select Price>90 → Join ← profiles.xml

29. PushDown - Group Plans
30. PullUp - Group Plans
31. PullUp vs. PushDown
- Only one join group and one selection group.
- Maintains a single intermediate file.
- Irrelevant tuples are joined.
- Very large intermediate file.
- Changes in profiles.xml affect the intermediate file (file_k), which adds maintenance overhead.

32. Filtered PullUp
Figure (Grouped Join Plan): quotes.xml → Selection Price>90 → Join ← profiles.xml

33. Filtered PullUp vs. PullUp
- Only relevant tuples are joined.
- Reduces the size of the intermediate file.
- Reduces the cost of PullUp by 75%.
- Complexity: the selection predicate may need to be modified dynamically (e.g., when a query with price > 70 is added).
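A minimal sketch of Filtered PullUp, assuming every grouped query has the form "Price > c" so that the loosest predicate (the smallest c) is a safe filter below the shared join. `FilteredPullUp` is a hypothetical class; `add_query` shows why the filter must change when a query with price > 70 arrives.

```python
def join(R, S):
    """Join quotes R with profiles S on Symbol (nested loops for clarity)."""
    return [(r, s) for r in R for s in S if r["Symbol"] == s["Symbol"]]

class FilteredPullUp:
    def __init__(self, thresholds):
        self.thresholds = sorted(thresholds)   # one "Price > c" query per threshold

    @property
    def filter_threshold(self):
        # The loosest predicate (smallest c) keeps every tuple any query needs.
        return self.thresholds[0]

    def add_query(self, c):
        # A query with a looser predicate (e.g. price > 70) lowers the filter.
        self.thresholds = sorted(self.thresholds + [c])

    def run(self, R, S):
        filtered = [r for r in R if r["Price"] > self.filter_threshold]
        joined = join(filtered, S)             # only potentially relevant tuples are joined
        return {c: [(r, s) for r, s in joined if r["Price"] > c] for c in self.thresholds}

R = [{"Symbol": "INTC", "Price": 95}, {"Symbol": "MSFT", "Price": 105},
     {"Symbol": "DELL", "Price": 75}]
S = [{"Symbol": "INTC", "Name": "Intel"}, {"Symbol": "MSFT", "Name": "Microsoft"},
     {"Symbol": "DELL", "Name": "Dell"}]

plan = FilteredPullUp([90, 100])
before = plan.run(R, S)   # DELL (75) never reaches the join
plan.add_query(70)
after = plan.run(R, S)    # the filter is now Price > 70, so DELL is joined
```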
34. Dynamic Re-grouping
- Let Q1 = A ⋈ B ⋈ C and Q2 = B ⋈ C be two continuous queries submitted sequentially.
- The incremental grouping algorithm chooses the plan ((A ⋈ B) ⋈ C) for Q1.
- Neither of its groups (A ⋈ B and A ⋈ B ⋈ C) can be used for Q2.
Figure: query graphs with nodes ABC, AB and BC, before and after regrouping.

35. Dynamic Re-grouping (2)
- Existing queries are not regrouped with new grouping opportunities introduced by subsequent queries.
- This reduces overall performance, since queries are continuously being added and removed.
- Naive regrouping algorithm: periodically perform a global query optimization.
  - Expensive.
  - Redundant work (already done by the incremental optimizer).

36. Data Structures
- A query graph: a directed acyclic graph, with each node representing an existing join expression in the group plan.
- Node:
  - char* query  // ASCII query plan
  - SIG_TYPE sig  // signature of the query string
  - int final_node_count  // number of users that require this query; 0: non-final node, >0: final node
  - list<Child> children  // children of this node, where Child = <Node, weight>
  - list<Node> parents  // parents of this node
  - float updateFreq  // update frequency of this node
  - float cost  // cost of computing this node
  - The following fields are used only for dynamic regrouping:
  - int reference_count  // reference count
  - bool visited  // whether purgeSiblings has been performed on this node

37. Data Structures (2)
- A group table: an array of hash tables.
  - The i-th hash table holds queries of length i (number of joins).
  - Hash table entry: a mapping from a query string to the corresponding node in the graph.
Figure: Array → Hash table → Node

38. Data Structures (3)
- A query log: an array of vectors.
  - Stores new nodes that have been added since the last regrouping.
  - Cleared after regrouping.
Figure: Array → Vector → Node
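The three structures can be sketched in Python as follows. This is a simplified rendering under assumptions: query strings are sorted relation names such as "ABC" (so the number of joins is the length minus one), Python's dict hashing plays the role of the `sig` field, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)               # nodes compare and hash by identity
class Node:
    query: str                     # query as a string, e.g. "ABC" for A⋈B⋈C
    final_node_count: int = 0      # users requiring this query; 0 = non-final
    children: list = field(default_factory=list)   # (child Node, weight) pairs
    parents: list = field(default_factory=list)
    update_freq: float = 0.0
    cost: float = 0.0
    reference_count: int = 0       # used only by dynamic regrouping
    visited: bool = False          # used only by dynamic regrouping

class GroupTable:
    """Array of hash tables: slot i maps query strings with i joins to nodes."""
    def __init__(self, max_joins):
        self.levels = [dict() for _ in range(max_joins + 1)]

    def lookup(self, query: str, joins: int) -> Optional[Node]:
        return self.levels[joins].get(query)

    def insert(self, node: Node, joins: int) -> None:
        self.levels[joins][node.query] = node

class QueryLog:
    """Array of vectors: nodes added at each level since the last regrouping."""
    def __init__(self, max_joins):
        self.levels = [[] for _ in range(max_joins + 1)]

    def record(self, node: Node, joins: int) -> None:
        self.levels[joins].append(node)

    def clear(self) -> None:       # called after each regrouping
        for level in self.levels:
            level.clear()

table, log = GroupTable(max_joins=10), QueryLog(max_joins=10)
ab, abc = Node("AB"), Node("ABC", final_node_count=1)
for node in (ab, abc):
    joins = len(node.query) - 1    # "ABC" has 2 joins
    table.insert(node, joins)
    log.record(node, joins)
```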
39. Incremental Grouping Algorithm
- Top-down local exhaustive search:
  - If the query already exists, increase its final node count by 1.
  - Otherwise:
    - Enumerate all possible sub-queries in a top-down manner and probe the group table to check whether each sub-query node exists.
    - Compute the minimal cost of using existing sub-query nodes.
    - Compute the minimal cost without using existing sub-query nodes.
    - Choose the least costly plan.
 
40. Dynamic Regrouping Algorithm
- Phase 1: construct links among existing nodes and new nodes.
- Phase 2: find a minimal-weight solution from the current solution by removing redundant nodes.
Figure: query graph with nodes ABC, BC and AB.

41. Phase 1: Constructing Links Among Existing Nodes and New Nodes
- Main idea: for any pair of nodes in the graph, if one node is a sub-query of the other, create a link between them if it did not exist before.
- Relationships are evaluated only between existing nodes and nodes added since the last regrouping.
- The difference in levels between a parent and a child is always 1.

42. Phase 1 - Algorithm
- Bottom-up:
  - for each node in level i of the query log:
    - if the node has parents in level i+1 of the group table, connect the node to each parent
    - if the node has children in level i-1 of the group table, connect the node to each child
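A minimal sketch of Phase 1 on the Dynamic Re-grouping example, assuming node names are sorted relation strings, levels count joins, and links are kept in a set of (child, parent) pairs so that re-adding an existing link does nothing. `is_subquery` and `phase1_link` are hypothetical names.

```python
def is_subquery(child, parent):
    """True if child (e.g. "AB") is a sub-query of parent (e.g. "ABC") one level up."""
    return len(parent) == len(child) + 1 and set(child) <= set(parent)

def phase1_link(group_table, query_log, edges):
    """Bottom-up over levels: link each new node to its parents at level i+1
    and its children at level i-1 found in the group table."""
    for i in sorted(query_log):
        for node in query_log[i]:
            for parent in group_table.get(i + 1, ()):
                if is_subquery(node, parent):
                    edges.add((node, parent))   # existing links are not duplicated
            for child in group_table.get(i - 1, ()):
                if is_subquery(child, node):
                    edges.add((child, node))
    return edges

# Q1 = ((A⋈B)⋈C) was grouped first; Q2 = B⋈C arrived since the last regrouping.
group_table = {1: {"AB", "BC"}, 2: {"ABC"}}   # level = number of joins
query_log = {1: ["BC"]}
edges = phase1_link(group_table, query_log, {("AB", "ABC")})
print(sorted(edges))  # [('AB', 'ABC'), ('BC', 'ABC')]
```

Only the new node BC is compared against its neighbouring levels, which is what keeps Phase 1 cheaper than a global re-optimization.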
 
43. Phase 2: A Greedy Algorithm for Level-wise Graph Minimization
- Main idea: traverse the query graph level by level and attempt to remove redundant nodes one level at a time.
- Starts from the second level from the top.
- A subset of the level-i nodes is retained such that:
  - every node at level i+1 has at least one child in this set, and
  - these nodes have a minimum total cost.
- Nodes that are not selected are removed permanently.

44. Phase 2 - Algorithm
MinimizeGraph() {
  for each level L in group-table {   // L ranges from (maximum number of joins - 1) down to 1
    for each node N in the level-L group table
      InitializeSet(N);
    for each node N in final_set
      PurgeSiblings(N);
    while (remain_set is not empty) {
      Current_minimum = infinity;     // reset before every scan
      scan each node R in remain_set {
        if (R.reference_count == 0) {
          remove R from remain_set;
          deleteNode(R);
        } else if (R.cost / R.reference_count < Current_minimum) {
          M = R;
          Current_minimum = R.cost / R.reference_count;
        }
      } // scan
      if (no M was selected in this scan) break;
      remove M from remain_set;
      PurgeSiblings(M);
    } // while
  } // for each level
} // MinimizeGraph

InitializeSet(Node N) {
  if (N is a final node)
    add N into final_set;
  else
    add N into remain_set;
  N.reference_count = number of parents of N;
  N.visited = false;
}

PurgeSiblings(Node N) {
  for each parent P of N {
    if (!P.visited) {
      decrease the reference count of N's siblings under the same parent P by 1;
      P.visited = true;
    }
  }
}
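A runnable Python rendering of one level of the greedy pass, under simplifying assumptions: nodes are dicts keyed by name, the per-parent `visited` flag becomes a set local to the call, and `minimize_level` is a hypothetical name. It picks the node with the lowest cost per still-uncovered parent, as the pseudocode does, and resets the running minimum on every pass.

```python
import math

def minimize_level(nodes):
    """Greedy minimization of one level of the query graph.

    nodes maps a name to {"cost": float, "parents": set of level-(L+1) names,
    "final": bool}. Returns the names kept; every other node is deleted.
    """
    ref = {n: len(d["parents"]) for n, d in nodes.items()}   # InitializeSet
    final_set = {n for n, d in nodes.items() if d["final"]}
    remain = set(nodes) - final_set
    kept = set(final_set)
    visited = set()                    # parents whose siblings were already purged

    def purge_siblings(n):
        for p in nodes[n]["parents"]:
            if p not in visited:
                for sib, d in nodes.items():
                    if sib != n and p in d["parents"]:
                        ref[sib] -= 1  # parent p is now covered by n
                visited.add(p)

    for n in final_set:
        purge_siblings(n)
    while remain:
        best, best_ratio = None, math.inf          # reset the minimum on each pass
        for r in sorted(remain):                   # iterate over a snapshot
            if ref[r] == 0:
                remain.discard(r)                  # covers no uncovered parent
            elif nodes[r]["cost"] / ref[r] < best_ratio:
                best, best_ratio = r, nodes[r]["cost"] / ref[r]
        if best is None:                           # everything left was redundant
            break
        remain.discard(best)
        kept.add(best)
        purge_siblings(best)
    return kept

# Level 1 of the Dynamic Re-grouping example after Phase 1 (costs are illustrative):
# BC is final (it is Q2), AB is not, and both are children of ABC.
level1 = {"AB": {"cost": 5.0, "parents": {"ABC"}, "final": False},
          "BC": {"cost": 4.0, "parents": {"ABC"}, "final": True}}
print(minimize_level(level1))  # {'BC'}
```

AB is removed because the final node BC already covers ABC, so Q1 can be recomputed as A ⋈ (B ⋈ C) and share work with Q2.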
45. Cost Analysis
- N: number of queries.
- The number of nodes is proportional to the number of queries: CN.
- Each query contains no more than 10 joins.
  - Each level contains about CN/10 nodes.

46. Cost Analysis - Phase 1
- R or KR: regrouping frequencies (regrouping after every R or KR new queries).
- With frequency R:
  - $N/R$: number of regroupings.
  - $CR$: number of new nodes that will be linked with existing nodes.
  - $mCR$: number of nodes at the m-th regrouping (after m-1 regroupings).
  - $m(CR)^2$: number of comparisons for the m-th regrouping (ignoring a constant reduction).

47. Cost Analysis - Phase 1 (2)
- Total number of comparisons, frequency R:
  - $(CR)^2 + 2(CR)^2 + \dots + \frac{N}{R}(CR)^2 = \frac{N(N+R)C^2}{2} = O(N^2)$
- Total number of comparisons, frequency KR:
  - $(CKR)^2 + 2(CKR)^2 + \dots + \frac{N}{KR}(CKR)^2 = \frac{N(N+KR)C^2}{2}$
- The ratio:
  - $\frac{N(N+KR)C^2/2}{N(N+R)C^2/2} = \frac{N+KR}{N+R}$
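The closed forms can be checked numerically by summing $m(CR)^2$ over the $N/R$ regroupings. The sample values of N, R, C and K below are arbitrary, and N is assumed to be a multiple of both R and KR so that the number of regroupings is an integer.

```python
from fractions import Fraction

def total_comparisons(N, R, C):
    """Sum of m * (C*R)^2 over the m = 1 .. N/R regroupings."""
    return sum(m * (C * R) ** 2 for m in range(1, N // R + 1))

def closed_form(N, R, C):
    return Fraction(N * (N + R) * C ** 2, 2)

N, R, C, K = 1000, 10, 3, 5
assert total_comparisons(N, R, C) == closed_form(N, R, C)
assert total_comparisons(N, K * R, C) == closed_form(N, K * R, C)
ratio = Fraction(total_comparisons(N, K * R, C), total_comparisons(N, R, C))
assert ratio == Fraction(N + K * R, N + R)
print(ratio)  # 105/101
```

Regrouping K times less often only changes the Phase 1 cost by the factor $(N+KR)/(N+R)$, which approaches 1 as N grows.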
 
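48. Cost Analysis - Phase 2
- Worst case: each pass removes one node.
- Cost for one level:
  - $\frac{CN}{10} + \left(\frac{CN}{10} - 1\right) + \dots + 1 = \frac{CN(CN+10)}{200} = O(N^2)$
- Purge siblings:
  - $\frac{CN}{10} \times \frac{CN}{10} = \frac{(CN)^2}{100} = O(N^2)$
- All 9 levels: $O(N^2)$
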
49. References
- NiagaraCQ: A Scalable Continuous Query System for Internet Databases
  - http://www.cs.wisc.edu/niagara/papers/NiagaraCQ.pdf
- Design and Evaluation of Alternative Selection Placement Strategies in Optimizing Continuous Queries
  - http://www.cs.wisc.edu/niagara/papers/Icde02.pdf
- Dynamic Re-grouping of Continuous Queries
  - http://www.cs.wisc.edu/niagara/papers/507.pdf