Title: Applying Control Theory to Stream Processing Systems
1Applying Control Theory to Stream Processing Systems
- Wei Xu (xuw_at_cs.berkeley.edu)
- Bill Kramer (kramer_at_lbl.gov)
- Peter Bodik (bodikp_at_cs.berkeley.edu)
2Just for fun
Jan. 7, 2005
VOD system on an Airbus 330: we need 20 minutes to reboot the system
3Outline
- System log as data streams
- Applying control theory
- Accurate data source
- Controlling queue length
- Lessons learned
4Introduction
- Goal of our project
- A flexible and scalable architecture for system log processing
- Explore general techniques of applying control theory to systems
- Problem: data rate up to 1 TB a day
- data are very complex
5Example of system log data
- request data
- Apache log, etc.
- performance data
- CPU, memory, etc.
- failure data
- Detected problems / error messages
- reports from operators
6Preprocessing
- Sanitize the data
- Remove/encrypt sensitive information before the data get into permanent storage
- Sanitize at different levels
- Put logs into common format
- Merge information from various sources
- Sampling
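The sanitizing and sampling steps above can be illustrated with a minimal Python sketch. It is not the project's actual preprocessing code: the key `SECRET_KEY`, the choice of a keyed hash for IP addresses, and the helper names `sanitize` and `sample` are all assumptions made for illustration.

```python
import hashlib
import hmac
import re

# Hypothetical secret key; a real deployment would keep it outside the logs.
SECRET_KEY = b"example-key"

# Simple IPv4 pattern (does not validate that each octet is <= 255).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def pseudonymize(match):
    """Replace an IP address with a keyed hash so equal inputs stay linkable."""
    digest = hmac.new(SECRET_KEY, match.group(0).encode(), hashlib.sha256)
    return "ip-" + digest.hexdigest()[:12]


def sanitize(line):
    """Hide every IPv4 address in a log line before it reaches storage."""
    return IPV4.sub(pseudonymize, line)


def sample(lines, every_n):
    """Keep every n-th line (simple systematic sampling)."""
    return [line for i, line in enumerate(lines) if i % every_n == 0]
```

A keyed hash is used here so that requests from the same address can still be correlated after sanitizing; other levels of sanitizing could drop the field entirely.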
7The big picture
8Early Experiences
- Ad-hoc Scripts
- Tedious
- Hard to change
- Relational databases
- Static schema
- Hard to support temporal queries
- Have to store all the data
9Stream processing?
- system log data are data streams
- preprocessing is a continuous query
- Telegraph Continuous Query (TCQ)
- data stream processing engine
- SQL queries
- sliding time window
- adaptive execution optimized on-the-fly
- performance doesn't depend on queries
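To make the sliding time window idea concrete, here is a small Python sketch of a windowed count. It does not use TCQ or its query syntax; the class `SlidingWindowCount` is a hypothetical stand-in, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowCount:
    """Count events whose timestamp lies within the last `width` seconds."""

    def __init__(self, width):
        self.width = width
        self.events = deque()

    def add(self, timestamp):
        """Record one event and return the count over the current window."""
        self.events.append(timestamp)
        self._expire(timestamp)
        return len(self.events)

    def _expire(self, now):
        # Drop events that have fallen out of the window (now - width, now].
        while self.events and self.events[0] <= now - self.width:
            self.events.popleft()
```

Each call to `add` plays the role of a continuous query being re-evaluated as a new tuple arrives: old tuples leave the window and the aggregate is updated without storing the whole stream.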
10Data preprocessing architecture
[Figure: architecture diagram with a load splitter, SLT 1, SLT 2, and a combiner]
12Why do we need control?
- Data source does not provide an accurate data rate
13Control Problems
- Not accurate for various reasons
- Scheduling
- Time spent on I/O
- Etc.
- Providing an accurate data source using feedback control
- By controlling the desired-rate input
14The Control Architecture
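[Figure: control architecture block diagram; values shown: 1500, 1900, 1600]
P Controller (with precompensation): $u(k) = K_p\, e(k)$
PI Controller: $u(k) = u(k-1) + (K_p + K_I)\, e(k) - K_p\, e(k-1)$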
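The two control laws on this slide, u(k) = Kp e(k) and u(k) = u(k-1) + (Kp + KI) e(k) - Kp e(k-1), can be sketched in Python as follows. The toy plant in `simulate` (measured rate equals a fixed fraction of the commanded rate) and the gain values used in the example are assumptions for illustration, not the model or gains from the talk.

```python
def p_control(kp, error):
    """Proportional law: u(k) = Kp * e(k)."""
    return kp * error


class PIController:
    """Incremental PI law: u(k) = u(k-1) + (Kp + Ki) e(k) - Kp e(k-1)."""

    def __init__(self, kp, ki):
        self.kp = kp
        self.ki = ki
        self.prev_u = 0.0
        self.prev_e = 0.0

    def update(self, reference, measured):
        e = reference - measured
        u = self.prev_u + (self.kp + self.ki) * e - self.kp * self.prev_e
        self.prev_u, self.prev_e = u, e
        return u


def simulate(controller, reference, gain, steps):
    """Toy plant (assumed): measured rate = gain * commanded rate.

    A gain below 1 stands in for time lost to scheduling and I/O.
    """
    measured = 0.0
    for _ in range(steps):
        u = controller.update(reference, measured)
        measured = gain * u
    return measured


if __name__ == "__main__":
    # Hypothetical gains and target rate.
    print(simulate(PIController(kp=0.2, ki=0.5), 1000.0, 0.8, 100))
```

In this toy setting the integral term drives the measured rate to the reference even though the plant delivers only 80% of what is commanded. A pure P controller would settle with a steady-state error, which is one reason to pair it with precompensation.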
15Result: An accurate data source
[Figure panels: P Controller with Pre-compensation; PI Controller]
16Zoom In
A lot of small disturbances in a Java program: incremental garbage collection
[Figure panels: P Controller; PI Controller]
18Problem: performance disturbance
- Significant network traffic
- Memory Leak
- System Process Interference
- Packets dropped while transferring the stream
- Other failures
Also, the performance of a node depends on the SELECTIVITY of the relational operators, which in turn depends on the input data
19Description of the system
[Figure: Controlled Data Source, Input Buffer, and TCQ (complex internal structure)]
20Why do we need control?
- TCQ node drops tuples when the result queue fills up
[Figure: pipeline Source -> Buffer -> TCQ -> Result Q]
21Control Problems
- Regulate queue length on TCQ node
- By controlling buffer output rate
- Prevent dropping tuples
- Maximize throughput
- Tolerate disturbance
22System with Control
23Controller
$u(k) = u(k-1) + (K_p + K_I)\, e(k) - K_p\, e(k-1)$
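As a rough illustration of regulating queue length with this PI law, the Python sketch below simulates a fluid-style queue whose input rate is set by the controller. The queue model, the gains `kp` and `ki`, the setpoint, and the service rates are all hypothetical values chosen for the example; the real TCQ node is far more complex.

```python
def run(setpoint, capacity, service_rates, kp=0.5, ki=0.1):
    """Simulate PI control of queue length.

    Assumed toy model: queue(k+1) = queue(k) + u(k) - service(k),
    where u(k) is the buffer output rate chosen by the controller.
    Returns the queue-length history and the number of dropped tuples.
    """
    queue, prev_u, prev_e = 0.0, 0.0, 0.0
    dropped = 0.0
    history = []
    for service in service_rates:
        e = setpoint - queue
        # PI law: u(k) = u(k-1) + (Kp + Ki) e(k) - Kp e(k-1); rate cannot be negative.
        u = max(0.0, prev_u + (kp + ki) * e - kp * prev_e)
        prev_u, prev_e = u, e
        queue = max(0.0, queue + u - service)
        if queue > capacity:
            # Anything beyond capacity would be dropped by the node.
            dropped += queue - capacity
            queue = capacity
        history.append(queue)
    return history, dropped


if __name__ == "__main__":
    # Service rate halves after 100 steps, standing in for CPU contention.
    rates = [20.0] * 100 + [10.0] * 100
    history, dropped = run(setpoint=50.0, capacity=100.0, service_rates=rates)
    print(history[99], history[-1], dropped)
```

In this sketch the queue settles at the setpoint, rises briefly when the service rate drops, and then returns to the setpoint as the controller lowers the input rate, without reaching the capacity at which tuples would be dropped.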
24Result: regulating queue length
[Figure; labels: Source, Buffer, TCQ, Result Q]
25Result: Under CPU Contention
[Figure; labels: Source, Buffer, TCQ, Result Q]
27Why is theory useful?
- One of my implementations .. What happened?
[Figure; labels: Source, Buffer, TCQ, Result Q]
28What is going on?
[Figure: control loop diagram; labels: Desired Queue Length, Queue Length Controller, Controlled Output Thread (Code Reuse), Data Rate to TCQ, Actual Queue Length]
29Theory meets reality
[Figure: queue length vs. time]
30Conclusion
- Advantages of feedback control
- Make system more robust under disturbance
- Allows more time for failure detection
- Treat complex systems as black boxes
- Cope with the system characteristics instead of having to change them
- Theoretical analysis
- Implementation is easy
- System statistics can also be used for SLT
31Future Work
- Load balancer
- Load control across multiple tiers
- Scheduling of multiple streams
32Backup Slides
33Tricky part of parameter estimation
- Model evaluation: making the system operate in the desired range
[Figure: data rate vs. free space; labels: Free Space, Non-Linear range]
- Easy for data source, but queue length ..
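For readers unfamiliar with parameter estimation, the sketch below shows one common approach: a least-squares fit of a first-order linear model y(k+1) = a y(k) + b u(k) to measured input/output data. The slides do not state which model form was used, so this model and the function name `fit_first_order` are assumptions. Such a fit is only meaningful when the data come from the range where the system behaves roughly linearly.

```python
def fit_first_order(u, y):
    """Least-squares estimate of (a, b) in y(k+1) = a*y(k) + b*u(k).

    `u` and `y` are equal-length sequences of inputs and measured outputs.
    Solves the 2x2 normal equations directly.
    """
    y_now, u_now, y_next = y[:-1], u[:-1], y[1:]
    syy = sum(v * v for v in y_now)
    suu = sum(v * v for v in u_now)
    syu = sum(a * b for a, b in zip(y_now, u_now))
    sy1y = sum(a * b for a, b in zip(y_next, y_now))
    sy1u = sum(a * b for a, b in zip(y_next, u_now))
    det = syy * suu - syu * syu
    if det == 0:
        raise ValueError("input data are not informative enough to fit the model")
    a = (sy1y * suu - sy1u * syu) / det
    b = (sy1u * syy - sy1y * syu) / det
    return a, b


if __name__ == "__main__":
    # Synthetic data generated from hypothetical parameters a=0.8, b=0.5.
    u = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0, 1.0, 2.0]
    y = [0.0]
    for k in range(len(u) - 1):
        y.append(0.8 * y[-1] + 0.5 * u[k])
    print(fit_first_order(u, y))
```

If the experiment drives the system into the non-linear range, the fitted `a` and `b` no longer describe its behavior around the operating point, which is one way to read the warning on this slide.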