Title: Tuning Database Configuration Parameters with iTuned
Slide 1: Tuning Database Configuration Parameters with iTuned
- Vamsidhar Thummala
- Collaborators: Songyun Duan, Shivnath Babu
- Duke University
Slide 2: Performance Tuning of Database Systems
- Physical design tuning
  - Indexes [SIGMOD'98, VLDB'04]
  - Materialized views [SIGMOD'00, VLDB'04]
  - Partitioning [SIGMOD'82, SIGMOD'88, SIGMOD'89]
- Statistics tuning [ICDE'00, ICDE'07]
- SQL query tuning [VLDB'04]
- Configuration parameter (server parameter) tuning (this talk)
Slide 3: Database Configuration Parameters
- Parameters that control (illustrated below):
  - Memory distribution
    - shared_buffers, work_mem
  - I/O optimization
    - wal_buffers, checkpoint_segments, checkpoint_timeout, fsync
  - Parallelism
    - max_connections
  - Optimizer's cost model
    - effective_cache_size, random_page_cost, default_statistics_target, enable_indexscan
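
As a concrete illustration of what one configuration setting looks like, here is a minimal Python sketch of a parameter vector over the PostgreSQL parameters named above; the specific values are hypothetical examples for this sketch, not tuning recommendations.

```python
# One configuration setting represented as a parameter vector.
# The parameter names are PostgreSQL's; the values are hypothetical
# illustrations, not tuning advice.
example_setting = {
    # Memory distribution
    "shared_buffers": "256MB",
    "work_mem": "16MB",
    # I/O optimization
    "wal_buffers": "1MB",
    "checkpoint_segments": 16,
    "checkpoint_timeout": "5min",
    "fsync": "on",
    # Parallelism
    "max_connections": 100,
    # Optimizer's cost model
    "effective_cache_size": "768MB",
    "random_page_cost": 4.0,
    "default_statistics_target": 100,
    "enable_indexscan": "on",
}
```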
Slide 4: Need for Automated Configuration Parameter Tuning (1/2)
Slide 5: Need for Automated Configuration Parameter Tuning (2/2)
- Number of threads related to configuration parameter tuning vs. other topics on the PostgreSQL performance mailing list
- Recently, there has been some effort from the community to summarize the important parameters [PgCon'08]
Slide 6: Typical Approach: Trial and Error
Slide 7: Doing Experiments to Understand the Underlying Response Surface
- TPC-H: 4 GB database, 1 GB memory, Query 18
Slide 8: Challenges
- Large number of configuration parameters
  - Around 100 in total
  - 10-15 are important, depending on OLTP vs. OLAP
  - Brute force will not work: it results in an exponential number of experiments
- Parameters can have complex interactions
  - Sometimes non-monotonic and counterintuitive
  - This limits the one-parameter-at-a-time approach
- No holistic configuration tuning tools
  - Existing techniques focus on specific memory-related parameters or recommend default settings
Slide 9: Our Solution: iTuned
- A practical tool that uses planned experiments to tune configuration parameters
- An adaptive sampling algorithm to plan the sequence of experiments (Planner)
- A novel workbench for conducting experiments in enterprise systems (Executor)
- Features for scalability, such as sensitivity analysis and parallel experiments, to reduce:
  - The total number of experiments
  - The per-experiment running time/cost
Slide 10: Outline of the talk
- iTuned Planner
- iTuned Executor
- Evaluation
- Conclusion
Slide 11: Problem Abstraction
- Given:
  - A database D and workload W
  - A configuration parameter vector X = <x1, x2, ..., xm>
  - A cost budget R
- Goal: find a high-performance setting X* subject to the budget constraint
- Problem: the response surface y = f(X) is unknown
- Solution approach:
  - Conduct experiments to learn about the response surface
  - Each experiment has some cost and gives a sample <Xi, yi>
Slide 12: iTuned Planner
- Uses an adaptive sampling algorithm (sketched below)
- Stopping criterion: based on the cost budget R
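
A minimal sketch of the planner's adaptive sampling loop, assuming hypothetical helpers run_experiment, fit_gp, and argmax_eip passed in by the caller (these names are illustrative, not iTuned's actual API); the Gaussian Process and EIP internals are sketched on the slides that follow.

```python
# Sketch of the planner's adaptive sampling loop under the assumptions above.
def plan_experiments(initial_settings, run_experiment, fit_gp, argmax_eip,
                     budget_R, cost_per_experiment):
    # Bootstrap with a few initial experiments, then sample adaptively.
    samples = [(X, run_experiment(X)) for X in initial_settings]
    spent = cost_per_experiment * len(samples)
    # Stopping criterion: the cost budget R.
    while spent + cost_per_experiment <= budget_R:
        model = fit_gp(samples)               # model the response surface
        X_next = argmax_eip(model, samples)   # setting with highest EIP
        samples.append((X_next, run_experiment(X_next)))
        spent += cost_per_experiment
    # Best setting found: the one with the lowest measured running time.
    return min(samples, key=lambda s: s[1])
```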
Slide 13: Improvement of an Experiment
- The improvement IP(X) is defined as:
  - IP(X) = y(X*) - y(X) if y(X) < y(X*), where X* is the best setting found so far
  - IP(X) = 0 otherwise
- Issue: IP(X) is known only after y(X) is known, i.e., an experiment has to be conducted at X to measure y(X)
- We estimate IP(X) by calculating the Expected Improvement, EIP(X)
- To calculate EIP(X), we need to approximate the improvement at each configuration setting
  - This requires a probability density function of y(X) (an uncertainty estimate)
Slide 14: Conducting an Experiment at X_next using Expected Improvement
[Figure: 1D projection of the response surface and its EIP curve]
Slide 15: Generating the pdf through a Gaussian Process
- We estimate the performance as y(X) = g(X) + Z(X)
  - where g(X) is a regression model and Z(X) is the residual of the model, captured through a Gaussian Process
- The Gaussian Process Z(X) captures the uncertainty of the surface
  - It is specified by mean and covariance functions
  - We use a zero-mean Gaussian Process
- The covariance is a kernel function that inversely depends on the distance between two samples Xi and Xj
  - Residuals at nearby points exhibit higher correlation
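
A minimal numpy sketch of such a zero-mean Gaussian Process with a squared-exponential kernel (one common kernel that inversely depends on distance; iTuned's exact kernel and the regression term g(X) may differ).

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Covariance that inversely depends on the distance between settings,
    so residuals at nearby points exhibit higher correlation."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(X_obs, y_obs, X_new, noise=1e-6, length_scale=1.0):
    """Zero-mean GP: posterior mean and variance of y at each row of X_new."""
    K = rbf_kernel(X_obs, X_obs, length_scale) + noise * np.eye(len(X_obs))
    K_star = rbf_kernel(X_new, X_obs, length_scale)
    K_inv = np.linalg.inv(K)
    mu = K_star @ K_inv @ y_obs                              # posterior mean
    var = 1.0 - np.sum((K_star @ K_inv) * K_star, axis=1)    # prior variance is 1
    return mu, np.maximum(var, 0.0)
```

For example, `mu, var = gp_posterior(X_obs, y_obs, X_grid)` yields the per-setting mean and uncertainty estimate that the EIP computation on the next slide consumes.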
Slide 16: Calculating Expected Improvement using the Gaussian Process
- Lemma: the Gaussian Process models y(X) as a univariate Gaussian with mean μ(X) and variance σ²(X)
- Theorem: there exists a closed form for EIP(X)
  - See the paper for the proof and details
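
The paper's exact derivation is not reproduced here, but the standard closed form for expected improvement under a univariate Gaussian, which matches the lemma's setup, looks like this (a sketch, minimizing running time y):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_best):
    """EIP(X) = E[max(y_best - y(X), 0)] when y(X) ~ N(mu, sigma^2).

    Standard closed form: (y_best - mu) * Phi(u) + sigma * phi(u),
    where u = (y_best - mu) / sigma, Phi/phi are the standard normal
    cdf/pdf, and lower y (running time) is better.
    """
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    safe_sigma = np.where(sigma > 0, sigma, 1.0)   # guard against division by zero
    u = (y_best - mu) / safe_sigma
    eip = (y_best - mu) * norm.cdf(u) + safe_sigma * norm.pdf(u)
    # With zero uncertainty, the improvement is the deterministic gain, if any.
    return np.where(sigma > 0, eip, np.maximum(y_best - mu, 0.0))
```

The planner's argmax_eip step would evaluate this over candidate settings, using μ(X) and σ(X) from the Gaussian Process, and experiment next at the maximizer.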
Slide 17: Tradeoff between Exploration and Exploitation
- Settings X with high EIP(X) are either:
  - Close to known good settings
    - Assists exploitation
  - In highly uncertain regions
    - Assists exploration
- The Gaussian Process tries to achieve a balance between exploration and exploitation
Slide 18: Outline of the talk
- iTuned Planner
- iTuned Executor
- Evaluation
- Conclusion
Slide 19: Goal of the Executor
- To conduct experiments:
  - Without impacting the production system
  - As close to real production runs as possible
- Traditional choices:
  - The production system itself
    - May impact running applications
  - A test system
    - Hard to replicate exact production settings
    - Requires manual set-up
Slide 20: iTuned Executor
- Exploits underutilized resources to conduct experiments
  - Production systems, standby systems, test systems, the cloud
- Design:
  - Mechanisms: home/garage containers, efficient snapshots of data
  - Policies: specified by admins (see the sketch below)
    - Example: if CPU, memory, and disk utilization have been below 20% for the past 10 minutes, then 70% of resources can be taken for experiments
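
A hypothetical sketch of how such a policy check might look; the function name and the sampling interface are assumptions for illustration, not iTuned's actual policy engine.

```python
def experiment_resource_share(utilization_samples, threshold=0.20, share=0.70):
    """Hypothetical policy check: `utilization_samples` is a list of
    (cpu, mem, disk) utilization fractions collected over the past
    10 minutes. If every sample stayed below the threshold, grant the
    configured share of resources to experiments; otherwise grant none."""
    if all(max(sample) < threshold for sample in utilization_samples):
        return share
    return 0.0
```

For example, `experiment_resource_share([(0.05, 0.10, 0.08), (0.04, 0.11, 0.07)])` returns 0.70 only when the machine stayed idle for the whole window.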
Slide 21: Example Mechanism: Set-up on a Standby System using ZFS, Solaris, and PITR
[Architecture diagram: the production environment (middle tier and database) ships write-ahead logs to a standby machine; copy-on-write ZFS snapshots carve out a standby-side experiment environment driven by a policy manager, an interface, an experiment planner/scheduler, and an engine]
Slide 22: Outline of the talk
- iTuned Planner
- iTuned Executor
- Evaluation
- Conclusion
Slide 23: Empirical Evaluation (1)
- Two database systems: PostgreSQL v8.2 and MySQL v5.0
- Cluster of machines with 2 GHz processors and 3 GB RAM
- Mixture of workloads:
  - OLAP: mixes of TPC-H queries
    - Varying queries, query types, and MPL (multiprogramming level)
    - Varying scale factors (SF = 1 to SF = 10)
  - OLTP: TPC-W and RUBiS
- Number of parameters varied: up to 30
Slide 24: Empirical Evaluation (2)
- Techniques compared:
  - Default parameter settings as shipped (D)
  - Manual rule-based tuning (M)
  - Smart Hill Climbing (S)
    - State-of-the-art technique
  - Brute-force search (B)
    - Runs many experiments to find an approximation to the optimal setting
  - iTuned (I)
- Evaluation metrics:
  - Quality: workload running time after tuning
  - Efficiency: time needed for tuning
Slide 25: Comparison of Tuning Quality
[Results figures: a simple workload with one TPC-H query (Q1) and a complex workload with a mix of TPC-H queries (Q1 + Q18)]
Slide 26: iTuned's Efficiency and Scalability
- Run experiments in parallel
- Abort low-utility experiments early (sketched below)
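
One plausible early-abort rule, sketched under assumptions: once an experiment's elapsed time exceeds the best completed running time, it can no longer yield an improvement, so it can be stopped. The process-handle interface (poll()/kill(), as in Python's subprocess module) and the exact rule are assumptions, not the paper's.

```python
import time

def run_with_early_abort(start_workload, best_time_so_far, poll_interval=1.0):
    """Abort an experiment once it can no longer beat the best-known setting.

    start_workload is a hypothetical helper that launches the workload and
    returns a process-like handle with poll()/kill().
    """
    start = time.time()
    proc = start_workload()
    while proc.poll() is None:                    # workload still running
        if time.time() - start > best_time_so_far:
            proc.kill()                           # low-utility: abort early
            return None
        time.sleep(poll_interval)
    return time.time() - start                    # measured running time
```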
Slide 27: iTuned's Sensitivity Analysis
- Identify important parameters quickly
- Use sensitivity analysis to reduce the number of experiments (sketched below)
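
A generic main-effect estimate of parameter sensitivity, sketched under assumptions (the paper's exact sensitivity-analysis formulation may differ): sweep one parameter across its range while averaging the fitted model's predictions over the remaining parameters, and rank parameters by the spread of the resulting curve. `predict_mean` is assumed to be a closure over the fitted GP, e.g., built from the Slide 15 sketch.

```python
import numpy as np

def sensitivity_scores(predict_mean, bounds, n_grid=20, n_background=100, seed=0):
    """Rank parameters by how much the predicted surface moves along each axis."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    background = rng.uniform(lo, hi, size=(n_background, len(lo)))
    scores = []
    for j in range(len(lo)):
        curve = []
        for v in np.linspace(lo[j], hi[j], n_grid):
            pts = background.copy()
            pts[:, j] = v                        # pin parameter j, vary the rest
            curve.append(predict_mean(pts).mean())
        scores.append(max(curve) - min(curve))   # spread = sensitivity of j
    return np.array(scores)                      # larger score = more important
```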
Slide 28: Related Work
- Parameter tuning
  - Focus on specific classes of parameters (mainly memory-related, such as buffer pools) [ACM TOS'08, VLDB'06]
  - Statistical approach for ranking parameters [SMDB'08]
  - Brute-force approach to experiment design
  - Tools like the DB2 Configuration Advisor and pg_tune recommend default settings
  - Adaptive approaches to sampling [SIGMETRICS'06]
- Work related to iTuned's executor
  - Oracle SQL Performance Analyzer [SIGMOD'09, ICDE'09]
  - Virtualization, snapshots, suspend-resume
Slide 29: Conclusion
- iTuned automates the tuning process by adaptively conducting experiments
- Our initial results are promising
- Future work:
  - Apply database-specific knowledge to keep the optimizer in the loop for end-to-end tuning
    - Query plan information
    - Workload compression
  - Experiments in the cloud
Slide 30: Questions?