Title: Applying Control Theory to Data Stream Processing Systems
Wei Xu (xuw_at_cs.berkeley.edu), Bill Kramer, Peter Bodik
Problem: TelegraphCQ (TCQ) drops tuples when the result queue is full.
Goal of control: By controlling the data rate to the TCQ node:
- Regulate the queue length on the TCQ node
- Prevent dropping tuples
- Maximize throughput (and adapt when a disturbance happens)
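The control goal above can be illustrated with a minimal sketch. It assumes a proportional-integral (PI) controller and a simple discrete-time queue model; the poster does not state which controller or model was used, and every name and number here (`simulate_queue_control`, the gains, the rates, the capacity) is hypothetical.

```python
def simulate_queue_control(steps=300, target=50.0, capacity=100.0,
                           kp=0.5, ki=0.1):
    """Regulate queue length by adjusting the input data rate (PI control).

    Assumed queue model: queue[k+1] = queue[k] + input_rate - service_rate.
    Tuples that do not fit into the queue are counted as dropped.
    """
    queue = 0.0
    integral = 0.0
    dropped = 0.0
    history = []
    for k in range(steps):
        # Disturbance: the node becomes slower halfway through the run.
        service_rate = 20.0 if k < 150 else 10.0

        error = target - queue
        integral += error
        # PI control law; a data rate cannot be negative.
        input_rate = max(0.0, kp * error + ki * integral)

        queue = queue + input_rate - service_rate
        if queue > capacity:
            dropped += queue - capacity  # overflow would be dropped tuples
            queue = capacity
        queue = max(0.0, queue)
        history.append(queue)
    return history, dropped


if __name__ == "__main__":
    history, dropped = simulate_queue_control()
    print("queue before disturbance:", round(history[149], 2))
    print("queue at end of run:", round(history[-1], 2))
    print("tuples dropped:", dropped)
```

In this sketch the integral term is what lets the input rate settle at whatever the current service rate is, so the queue returns to the target after the disturbance without overflowing.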
- Preprocessing data
  - Logs are in different formats
  - Information we need may be implicit
  - Merge information from various sources
  - Sampling
  - Sanitize the data
- Data stream processing
  - Continuous queries
  - Using TelegraphCQ
  - Preprocessing expressed as SQL queries
  - Queries over a sliding time window
  - Run multiple instances for scalability
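As an illustration of a query over a sliding time window, the sketch below keeps a per-key count of the events seen in the last N seconds. It is written in Python rather than as a TelegraphCQ SQL query, and the class name, the `(timestamp, key)` tuple layout, and the window length are assumptions made for the example. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count events per key over the most recent `window_seconds`."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()    # (timestamp, key), oldest first
        self.counts = Counter()  # key -> count inside the window

    def add(self, timestamp, key):
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Remove events that have fallen out of the window (now - window, now].
        while self.events and self.events[0][0] <= now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def snapshot(self):
        return dict(self.counts)


if __name__ == "__main__":
    w = SlidingWindowCounter(window_seconds=10)
    for ts, key in [(0, "a"), (5, "b"), (9, "a"), (12, "a"), (20, "c")]:
        w.add(ts, key)
        print(ts, w.snapshot())
```

Each new tuple updates the result incrementally, which is the behavior a continuous query provides: the answer is always current for the latest window rather than recomputed from stored logs.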
Feature Selection, Clustering, Visualization
Problem: The actual output rate is not the same as the desired rate, for various reasons.
Goal: Provide an accurate data source using feedback control, by controlling the desired data rate setting on the output thread.
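A minimal sketch of this idea, assuming an integral-style controller: the rate setting is nudged on every measurement until the measured output rate matches the desired rate. The plant model `lossy_output_thread` (which delivers only 80% of its setting) and the gain are hypothetical and stand in for whatever causes the real mismatch.

```python
def tune_rate_setting(desired_rate, plant, gain=0.5, steps=50):
    """Adjust the rate setting until the measured rate matches desired_rate.

    `plant` maps a rate setting to the rate actually achieved.
    """
    setting = desired_rate  # start by asking for exactly the desired rate
    measured = []
    for _ in range(steps):
        actual = plant(setting)
        measured.append(actual)
        # Integral action: accumulate a correction proportional to the error.
        setting += gain * (desired_rate - actual)
    return setting, measured


def lossy_output_thread(setting):
    # Assumed behavior: overhead makes the thread deliver 80% of its setting.
    return 0.8 * setting


if __name__ == "__main__":
    setting, measured = tune_rate_setting(100.0, lossy_output_thread)
    print("first measured rate:", measured[0])
    print("final measured rate:", round(measured[-1], 3))
    print("final setting:", round(setting, 3))
```

The controller does not need to know why the output falls short; it only needs the measured rate, which is why feedback suits a mismatch that arises "for various reasons".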
See poster: Clustering DNS Problems
Figure: Scalable Software Architecture for Data Stream Processing (components: load splitter, SLT 1, SLT 2, combiner)
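The splitter and combiner roles in the figure can be sketched as follows. The poster does not say how load is split or how results are recombined, so round-robin partitioning and a timestamp-ordered merge are assumptions, as are the function names and the `(timestamp, payload)` tuple layout.

```python
import heapq


def split_round_robin(tuples, n_instances):
    """Distribute incoming tuples across n_instances processing nodes."""
    partitions = [[] for _ in range(n_instances)]
    for i, item in enumerate(tuples):
        partitions[i % n_instances].append(item)
    return partitions


def combine_by_timestamp(partitions):
    """Merge per-instance outputs (each sorted by timestamp) into one stream."""
    return list(heapq.merge(*partitions, key=lambda item: item[0]))


if __name__ == "__main__":
    stream = [(t, f"event{t}") for t in range(6)]
    parts = split_round_robin(stream, n_instances=2)
    print(parts[0])
    print(parts[1])
    print(combine_by_timestamp(parts))
```

Round-robin keeps the instances evenly loaded but only suits queries that can be evaluated per tuple; a query that groups by key would need key-based partitioning instead.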
If feedback control is not designed carefully, the system can become unstable even under normal load. Control theory analysis helps produce a correct design.
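A small worked example of the stability point, under an assumed linear queue model with a proportional controller (not necessarily the design analyzed on the poster). With queue[k+1] = queue[k] + input_rate - service_rate and input_rate = service_rate + gain * (target - queue[k]), the error obeys error[k+1] = (1 - gain) * error[k], so the loop is stable only when |1 - gain| < 1. All names and values below are hypothetical.

```python
def is_stable(gain):
    """Stability condition for the assumed model: |1 - gain| < 1."""
    return abs(1.0 - gain) < 1.0


def simulate_proportional(gain, steps=20, target=50.0, start=40.0,
                          service_rate=20.0):
    """Simulate the linear queue model under constant (normal) load.

    The model is deliberately left unclamped so that divergence is visible.
    """
    queue = start
    trace = [queue]
    for _ in range(steps):
        input_rate = service_rate + gain * (target - queue)
        queue = queue + input_rate - service_rate
        trace.append(queue)
    return trace


if __name__ == "__main__":
    for gain in (0.5, 2.5):
        trace = simulate_proportional(gain)
        print("gain", gain, "stable:", is_stable(gain),
              "final error:", abs(trace[-1] - 50.0))
```

The load is identical in both runs; only the controller gain differs. An overly aggressive gain makes the queue oscillate with growing amplitude, which is the kind of failure the analysis is meant to rule out before deployment.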