Title: Glenn Paulley
1. Self-management features in SQL Anywhere server
- Glenn Paulley
- Director, Engineering
- http://iablog.sybase.com/paulley
2. Sybase iAnywhere products
- SQL Anywhere
- Full-function, small-footprint relational DBMS with support for triggers, stored procedures, materialized views, intra-query parallelism, hot failover, ANSI SQL:2003 support including OLAP queries, and multidatabase capability
- Mobilink/SQL Remote
- Two-way data replication/synchronization technologies for replicating data through different mechanisms to support occasionally-connected devices
- Ultralite
- Small-footprint database that supports ad-hoc SQL on very small devices
- UltraliteJ
- 100% Java small-footprint database for BlackBerry and iPhone
3. Design goals of SQL Anywhere
- Ease of administration
- Comprehensive yet comprehensible tools
- Good out-of-the-box performance
- Embeddability features → self-tuning
- Many environments have no DBAs
- Cross-platform support
- 32- and 64-bit Windows (XP, Server 2003, 2000, 9x), Windows CE/Pocket PC, 32- and 64-bit Linux, HP-UX, AIX, Solaris (SPARC and Intel), Mac OS X, Compaq Tru64
- Interoperability
4. Why is self-management important?
- In a word: complexity
- Application development is becoming more complex: new development paradigms such as ORM toolkits, distributed computation with synchronization amongst database replicas, and so on
- Databases are now ubiquitous in IT because they solve a variety of difficult problems
- Yet most companies continue to own and manage a variety of different DBMS products, which increases administrative costs
- Ubiquity brings scale, in several ways
- To keep TCO constant, one must improve the productivity of each developer
5. Embeddability
6. Physical database design
7. Physical database design
- Logical and physical database design tradeoffs have a significant impact on performance
- The fundamental problem is that the potential benefits of design changes must be amortized over the execution characteristics of a workload
- Representative workloads are difficult to create
- Issues include data skew, correlation, and mitigating the artificial contention that occurs as a result of simulating the actual system
- Most DBMS vendors now offer tools to assist in index or materialized view creation (more on this later)
- However, not all implications are covered by any of these tools
8. Physical database design with SQL Anywhere
- A SQL Anywhere database is composed of up to 15 dbspaces
- Each dbspace is an OS file
- One each used for the temporary file and the transaction log
- Others are for user data
- UNIX raw partitions are not supported
- Image copy is done via simple OS file copy
- No image copy utility is necessary
- Easy to deploy a database to a separate machine or CE device
- Files are interoperable on all supported platforms without user intervention
- On-the-fly data conversion is done when necessary
9. Physical database design
- By default, databases are created with
- Case-insensitive searching enabled
- Not blank-padded: no difference in the storage of CHAR vs. VARCHAR attributes on disk
- SQL Anywhere supports different character sets and collations for international contexts
- Includes support for the NCHAR data type using the multi-byte UTF-8 charset
- Supported page sizes are 2K, 4K, 8K, 16K, 32K
- All dbspaces must use the same page size
- No hard limits on row length or number of BLOB columns
- BLOB values (up to 2GB each) are stored in a separate page arena
10. Physical database design
- Database files grow automatically as necessary to accommodate any increase in data volume
- Server can execute a user call-out procedure when a disk-full panic occurs
- Server offers a temporary file governor to ensure that connections cannot be starved when they require space for intermediate results
- Indexes on primary and foreign keys are created automatically
- Server will automatically detect redundant (duplicate or subsumed) indexes, as sketched below
- Two or more logical indexes share the same physical structure
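A minimal sketch of the redundant-index check described above: logical indexes with identical column sequences can share one physical structure, and an index whose columns are a strict prefix of another's is subsumed by it. The matching rule here ignores ordering direction and uniqueness, which a real implementation would also consider; all names are illustrative.

```python
def share_physical_structures(indexes):
    """indexes: dict of index name -> tuple of column names.
    Group exact duplicates onto one shared physical structure."""
    physical = {}        # column tuple -> shared physical structure id
    mapping = {}
    for name, cols in indexes.items():
        mapping[name] = physical.setdefault(cols, f"phys_{len(physical)}")
    return mapping

def subsumed_pairs(indexes):
    """Yield (narrow, wide) where narrow's columns are a strict prefix of wide's."""
    for a, ca in indexes.items():
        for b, cb in indexes.items():
            if a != b and len(ca) < len(cb) and cb[:len(ca)] == ca:
                yield a, b

if __name__ == "__main__":
    idx = {"pk_order":      ("order_id",),
           "fk_order":      ("order_id",),                  # duplicate of the PK index
           "idx_cust":      ("customer_id",),
           "idx_cust_date": ("customer_id", "order_date")}  # subsumes idx_cust
    print(share_physical_structures(idx))   # pk_order and fk_order share phys_0
    print(list(subsumed_pairs(idx)))        # [('idx_cust', 'idx_cust_date')]
```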
11. Physical database design
- Transaction log
- SQL Anywhere uses a logical logging and recovery scheme at the row level (not physiological, or page-based)
- Transaction log can be translated directly into INSERT, UPDATE, DELETE SQL statements
- DBTRAN utility ships with the software
- Assists in recovery from catastrophic failures
- Provides the backbone for two-way synchronization with Mobilink, SQL Remote
12. Self-management features
13. Memory management
- SQL Anywhere uses a single heterogeneous buffer pool with few predefined limits
- Buffer pool can automatically grow and shrink depending on database workload and OS requirements for other applications
- Buffer pool comprises
- Table data pages
- Index pages
- Checkpoint log pages
- Bitmap pages
- Heap pages (data structures for connections, stored procedures, triggers)
- Free (available) pages
- All page frames are the same size
14. Page replacement scheme
- Buffer pool has no preset bounds for sort space, heap space, or table pages
- Buffer pool's page replacement scheme must be able to manage various categories of pages and requests on a demand basis
- Workloads change over time
- Static configurations must trade off service levels for administrative effort
- One basic requirement: avoid polluting the cache with pages read by sequential scans
- Identify pages that are frequently used
15. Page replacement scheme
- Pages flow through the buffer pool over time
- The window (the size of the cache) is stratified into eight segments based on time of reference
- A page's score is incremented as it moves from segment to segment
- Need to differentiate reference locality (i.e. multiple references to a single page based on row fetch) from other forms of reuse
- Replacement is based on an individual page's score
- A modified Clock algorithm goes through each page in turn: if a page's score is less than a threshold value, the page is reused (sketched below)
- If a page is retained, its score is decayed
- LRU-k and 2Q algorithms are frequently referenced in the literature; however, we have found their overheads to be too high
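A minimal sketch of a score-based Clock sweep in the spirit of the scheme above. The eight-segment stratification is not modelled; re-references simply bump the score, and the threshold and decay values are illustrative assumptions, not SQL Anywhere's actual parameters.

```python
class Frame:
    def __init__(self, page_id):
        self.page_id = page_id
        self.score = 1.0            # starts low; grows with reuse over time

class ClockBufferPool:
    def __init__(self, capacity, threshold=1.0, decay=0.5):
        self.capacity = capacity
        self.threshold = threshold  # frames scoring below this are reusable
        self.decay = decay          # retained frames lose score on each sweep
        self.frames = []            # the clock "ring"
        self.hand = 0
        self.page_map = {}          # page_id -> Frame

    def access(self, page_id):
        frame = self.page_map.get(page_id)
        if frame is not None:
            frame.score += 1.0      # stand-in for moving up a time segment
            return frame
        return self._fault(page_id)

    def _fault(self, page_id):
        if len(self.frames) < self.capacity:
            frame = Frame(page_id)
            self.frames.append(frame)
        else:
            frame = self._find_victim()
            del self.page_map[frame.page_id]
            frame.page_id = page_id
            frame.score = 1.0
        self.page_map[page_id] = frame
        return frame

    def _find_victim(self):
        # Sweep the clock: reuse the first frame whose score has fallen below
        # the threshold; decay the score of every frame passed over, so a page
        # touched once by a sequential scan ages out quickly.
        while True:
            frame = self.frames[self.hand]
            self.hand = (self.hand + 1) % len(self.frames)
            if frame.score < self.threshold:
                return frame
            frame.score *= self.decay

if __name__ == "__main__":
    pool = ClockBufferPool(capacity=4)
    for p in [1, 2, 1, 3, 1, 4, 5, 1]:      # page 1 is "hot"; 5 forces an eviction
        pool.access(p)
    print(sorted(pool.page_map))            # page 1 survives; the cold page is gone
```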
16. Task scheduling model
- Basic idea: reduce server resource requirements by establishing a limited pool of "worker threads" to execute requests
- Reduces address space, virtual and real memory requirements
- Introduces possibility of thread deadlock
- Task implementation differs depending on the platform: fibers are used on Windows and Linux
- Scheduler assigns work across a pool of workers
- Worker thread pool is typically small; the default is 20
- Size of pool establishes the multiprogramming level
- Workers are assigned dynamically to connections on demand
17. Task scheduling model
- Unassigned requests wait for an available thread
- Server uses cooperative multitasking to switch between requests
- Server supports intra-query parallelism
- For queries, the optimizer determines the degree of parallelism at optimization time
18. Task scheduling model
- Waiting requests are candidates for swapping
- Try to maintain a constant memory footprint by using a virtual memory technique
- Heaps for user connections can be swapped to disk if server load demands it
- Originally implemented in the first SQL Anywhere release in 1989
- Pointers within the heap are swizzled to permit reloading of heaps on a demand basis (sketched below)
- Diagnostic connections can use a special, reserved thread to execute their requests
- Enables diagnosis of a hung engine
19. Dynamic memory management
- A SQL Anywhere server will grow and shrink the buffer pool as necessary to accommodate both
- Database server load
- Physical memory requirements of other applications
- Enabled by default on all supported platforms
- User can set lower and upper bounds and a default initial size
- OS provides an API to determine memory size (requires virtual memory support)
- Implementation varies slightly by platform
20. Dynamic memory management
- Basic idea: match buffer pool size to the operating system's working set size
- Feedback control loop (a minimal sketch follows)
21. Memory allocation to server threads
- Query optimizer assumes the availability of a certain amount of memory for each computation
- Some execution operators (notably hash-based operators) have the ability to free memory without changing the computation (just making it less efficient)
- Release of memory is done partition-at-a-time so the effect is not abrupt (see the sketch below)
- Other operators can be constructed with alternative plans that offer a cheaper execution technique if either
- the optimizer's choices are found to be suboptimal at run-time, or
- the operator requires a low-memory strategy at the behest of the memory governor
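A sketch of the partition-at-a-time idea, assuming a toy hash build side: when the memory governor asks, one resident partition is spilled (here to an in-memory list standing in for a temp file), and probes against spilled partitions fall back to scanning, so the result is unchanged but the operator runs more slowly. Partition count and the spill mechanism are illustrative assumptions.

```python
class PartitionedHashBuild:
    def __init__(self, rows, key, n_partitions=8):
        self.key = key
        self.in_memory = [dict() for _ in range(n_partitions)]  # per-partition hash tables
        self.spilled = [None] * n_partitions                    # "temp file" rows, if spilled
        for row in rows:
            k = key(row)
            self.in_memory[hash(k) % n_partitions].setdefault(k, []).append(row)

    def release_one_partition(self):
        """Called by the memory governor: spill the largest resident partition."""
        idx = max(range(len(self.in_memory)),
                  key=lambda i: sum(len(v) for v in self.in_memory[i].values()))
        if not self.in_memory[idx]:
            return False                       # nothing left to give back
        self.spilled[idx] = [r for rows in self.in_memory[idx].values() for r in rows]
        self.in_memory[idx] = {}
        return True

    def probe(self, row):
        """Yield matching build rows; spilled partitions are scanned (slower)."""
        k = self.key(row)
        idx = hash(k) % len(self.in_memory)
        if self.spilled[idx] is not None:
            yield from (r for r in self.spilled[idx] if self.key(r) == k)
        else:
            yield from self.in_memory[idx].get(k, [])

if __name__ == "__main__":
    build = PartitionedHashBuild([("x", 1), ("y", 2), ("x", 3)], key=lambda r: r[0])
    build.release_one_partition()            # governor reclaims some memory
    print(list(build.probe(("x", 99))))      # still finds ("x", 1) and ("x", 3)
```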
22. Other self-management features
- On startup, the buffer pool is automatically loaded with those database pages that were first requested on the last startup
- SQL Anywhere supports events
- Fired on a schedule, or when specific server events occur, such as
- Idle time greater than a threshold
- Backup
- Connect/disconnect
- File size
- Disk space
23. Self-tuning query optimization
- ... and adaptive query execution strategies
24. Automatic statistics management
- A feature of SQL Anywhere since 1992
- Early implementations used a hash-based structure to manage column density and frequent-value statistics
- See Lynch, 14th VLDB (1988), for a similar approach
- Today
- Self-tuning column histograms
- On both base and temporary tables
- Statistics are updated on-the-fly automatically
- Join histograms built for intermediate result analysis
- Server maintains index statistics in real time
- Index sampling during optimization
25. Column histograms
- Self-tuning implementation (sketched below)
- Incorporates both standard range buckets and frequent-value statistics
- Updated in real time with the results of predicate evaluation and update DML statements
- By default, statistics are computed during the execution of every DML request
- Novel technique used to capture statistics on strings for use in optimizing LIKE predicates
- Histograms are computed automatically on LOAD TABLE or CREATE INDEX statements
- Can be created/dropped explicitly if necessary
- But retained by default across unload/reload
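A minimal sketch of a self-tuning histogram combining range buckets with frequent-value statistics, fed back from the selectivities actually observed when predicates run. The bucket layout, the distinct-values-per-bucket assumption, the blend factor, and the promotion threshold are all illustrative.

```python
class SelfTuningHistogram:
    def __init__(self, lo, hi, n_buckets=10, blend=0.3, hot=0.05):
        self.lo = lo
        self.width = (hi - lo) / n_buckets
        self.buckets = [1.0 / n_buckets] * n_buckets  # fraction of rows per range bucket
        self.distinct_per_bucket = 100                # crude uniformity assumption
        self.frequent = {}                            # value -> observed fraction of rows
        self.blend = blend                            # how quickly feedback is absorbed
        self.hot = hot                                # promotion threshold for frequent values

    def _bucket(self, value):
        return max(0, min(int((value - self.lo) / self.width), len(self.buckets) - 1))

    def estimate_eq(self, value):
        """Estimated selectivity of the predicate  col = value."""
        if value in self.frequent:
            return self.frequent[value]
        return self.buckets[self._bucket(value)] / self.distinct_per_bucket

    def feedback_eq(self, value, observed):
        """Fold the selectivity observed at execution time back into the stats."""
        if observed >= self.hot or value in self.frequent:
            old = self.frequent.get(value, self.estimate_eq(value))
            self.frequent[value] = (1 - self.blend) * old + self.blend * observed
        else:
            b = self._bucket(value)
            implied = observed * self.distinct_per_bucket
            self.buckets[b] = (1 - self.blend) * self.buckets[b] + self.blend * implied

if __name__ == "__main__":
    h = SelfTuningHistogram(lo=0, hi=1000)
    print(round(h.estimate_eq(42), 5))   # prior guess from uniform buckets
    h.feedback_eq(42, observed=0.20)     # the predicate actually matched 20% of rows
    print(round(h.estimate_eq(42), 5))   # the skewed value is now tracked explicitly
```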
26. Query optimizer
- SQL Anywhere optimizes requests each time they are executed
- Optimizer takes into account server context
- Working set
- Available cache
- Values of procedure, host variables
- Assumption: optimization is cheap
- SQL Anywhere uses a proprietary, cost-based join
enumeration algorithm that primarily constructs
left-deep trees
- Optimization process includes both heuristic and cost-based complex rewritings
- Advantages: plans are responsive to the server environment, buffer pool contents/size, and data skew; no need to administer packages
27. Join enumeration
- Optimization space is over left-deep trees
- Optimization process includes both heuristic and cost-based complex rewritings
- Exceptions for complex, right-deep nested LEFT OUTER JOINs
- Space pruning based on a "governor" that allocates quota to different portions of the search space (see the sketch below)
- Join method, degree of parallelism, physical scan method (index selection), and expensive predicate evaluation are also part of the search space
- Each choice is re-evaluated at each step
- Superb optimization times even for complex queries
- No hard limits; tested with
- 500 quantifiers in a single block
- 100-way join on a CE device with a 3MB buffer pool
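A toy sketch of quota-limited, cost-based enumeration of left-deep join orders, in the spirit of the governor described above. The cost model (cross-product cardinalities) and the single global quota are gross simplifications; a real enumerator distributes quota across sub-spaces and uses much richer costing.

```python
def enumerate_left_deep(card, quota=50):
    """card: dict table -> cardinality. Returns (best_cost, best_order)."""
    tables = list(card)
    best = [float("inf"), None]   # best cost and best left-deep order found so far
    budget = [quota]              # remaining plan-expansion quota

    def extend(order, rows, cost):
        if budget[0] <= 0 or cost >= best[0]:
            return                # pruned: quota exhausted or dominated by best plan
        budget[0] -= 1
        if len(order) == len(tables):
            best[0], best[1] = cost, order
            return
        # Re-evaluate every remaining table as the next right-hand input.
        for t in tables:
            if t not in order:
                joined = rows * card[t]
                extend(order + (t,), joined, cost + joined)

    for t in tables:              # choose the leftmost table
        extend((t,), card[t], 0)
    return best[0], best[1]

if __name__ == "__main__":
    cost, order = enumerate_left_deep({"orders": 10_000, "lineitem": 60_000,
                                       "nation": 25, "region": 5})
    print(order, cost)   # best left-deep order found within the quota, with its toy cost
```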
28. Bypassing the query optimizer
- Simple, single-table queries are optimized heuristically
- Access plans for queries in stored procedures/triggers/events are cached (see the sketch below)
- Plans undergo a training period where plan variance is determined
- If there is no variance (even without variable values), the plan is cached and reused
- Query is periodically re-optimized on a logarithmic scale to ensure the plan does not become sub-optimal
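A sketch of this caching policy, assuming a three-execution training period and a power-of-two re-optimization schedule; both values are illustrative, as is the retraining rule applied when a re-optimized plan differs from the cached one.

```python
class PlanCache:
    def __init__(self, optimize, training_runs=3):
        self.optimize = optimize          # callable: sql -> plan (any comparable object)
        self.training_runs = training_runs
        self.state = {}                   # sql -> {"count", "plans", "cached"}

    def get_plan(self, sql):
        s = self.state.setdefault(sql, {"count": 0, "plans": [], "cached": None})
        s["count"] += 1
        n = s["count"]

        if s["cached"] is None:                       # still in the training period
            plan = self.optimize(sql)
            s["plans"].append(plan)
            if n >= self.training_runs and len(set(s["plans"])) == 1:
                s["cached"] = plan                    # no variance observed: cache it
            return plan

        if (n & (n - 1)) == 0:                        # re-optimize at powers of two
            plan = self.optimize(sql)
            if plan != s["cached"]:                   # plan drifted: retrain
                s["cached"], s["plans"] = None, [plan]
            return plan
        return s["cached"]

if __name__ == "__main__":
    calls = []
    def optimizer(sql):
        calls.append(sql)
        return "index scan on T(x)"                   # a stable plan in this demo

    cache = PlanCache(optimizer)
    for _ in range(20):
        cache.get_plan("SELECT * FROM T WHERE x = ?")
    print(len(calls))   # far fewer than 20 optimizer calls once the plan is cached
```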
29. Adaptive query processing
- In some cases the optimizer will generate alternative access plans that can be executed if actual intermediate result sizes are poorly estimated
- Server switches to the alternative plan automatically at run time (see the sketch below)
- Memory-intensive operators, such as hash join, have low-memory strategies that are used when buffer pool utilization is high
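A minimal sketch of switching plans mid-execution when a cardinality estimate proves badly wrong. The 10x tolerance and the buffering approach are illustrative assumptions, and both "plans" here are trivial stand-ins that produce the same rows.

```python
def execute_adaptive(primary_plan, alternative_plan, estimated_rows, tolerance=10):
    """Both plans are callables returning an iterator over the same result."""
    buffered = []
    for row in primary_plan():
        buffered.append(row)
        if len(buffered) > estimated_rows * tolerance:
            # Cardinality misestimate detected: abandon the primary plan and
            # recompute the result with the alternative strategy instead.
            return list(alternative_plan())
    return buffered

if __name__ == "__main__":
    data = list(range(100_000))
    plan_a = lambda: (x for x in data if x % 2 == 0)   # preferred for small results
    plan_b = lambda: [x for x in data if x % 2 == 0]   # fallback when many rows qualify
    rows = execute_adaptive(plan_a, plan_b, estimated_rows=10)
    print(len(rows))    # 50000 either way; only the execution strategy changed mid-flight
```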
30. Adaptive query processing
- Some operations, such as database backup, contain sampling processes to determine the characteristics of the I/O device used for storage (see the sketch below)
- Primary goal is to determine the number of disk heads available
- Processes can utilize the "right" number of CPUs to maximize throughput
- Algorithms are sensitive to the CPU requirements of the rest of the system, and automatically scale down CPU usage as necessary
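A crude sketch of the sampling idea: time reads of a file at increasing degrees of concurrency and keep raising the degree only while throughput improves noticeably. The file size, block size, and improvement cut-off are illustrative, and a micro-benchmark like this is only indicative on real hardware.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK = 64 * 1024

def timed_read(path, workers, blocks_per_worker=64):
    """Read disjoint block ranges concurrently and return bytes/second."""
    def task(worker_id):
        with open(path, "rb") as f:
            for i in range(blocks_per_worker):
                f.seek((worker_id * blocks_per_worker + i) * BLOCK)
                f.read(BLOCK)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, range(workers)))
    return (workers * blocks_per_worker * BLOCK) / (time.perf_counter() - start)

def probe_parallelism(path, max_workers=8, min_gain=1.10):
    """Stop raising concurrency once extra workers stop paying off."""
    best_rate, best_workers = 0.0, 1
    for w in range(1, max_workers + 1):
        rate = timed_read(path, w)
        if rate < best_rate * min_gain:
            break
        best_rate, best_workers = rate, w
    return best_workers

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(8 * 64 * BLOCK))   # small stand-in for a database file
        path = f.name
    print("suggested I/O parallelism:", probe_parallelism(path))
    os.unlink(path)
```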
31. Intra-query parallelism
- SQL Anywhere's approach is to parallelize a left-deep plan when doing so is advantageous
- There is an upper bound on the number of EXCHANGE operators in a data-flow tree; it can be set by the user to override the default
- Work is partitioned independently of the availability of worker threads at run-time
- Plans are largely self-tuning with respect to degree of parallelism
- Prevents starvation of any specific subplan when the number of available workers is less than optimal for some period
32. Management tools
33. Management tools
- Framework
- DBISQL interactive SQL query tool
- Sybase Central administration/development tool
- Tools
- Graphical query plans
- Stored procedure debugger
- Stored procedure profiler
- Index consultant
- Request-level logging
- Optimization logging
34. Graphical query plans
35. Stored procedure debugger
- Enables debugging of a PSM or Java stored
procedure - Features
- set/unset breakpoints and watch settings
- determine value of variables
- traverse the call stack
- execute queries with the values of variables at
the time of the break
36. Application Profiling scenarios
37. Application Profiling architecture
38. Stored procedure profiler
- Offers the ability to analyze the performance of a stored procedure/trigger/event
- Computes the cumulative invocations and elapsed time for each statement in a procedure
39. Index consultant
- Recommends indexes to improve query performance
- Main idea is to improve application performance
- Particularly useful when the DBA has limited experience
- Permits "what-if" analysis on an existing query load and database instance (see the sketch below)
- Allows the DBA to find statements that are the most sensitive to the presence of an index
- Can be used to find indexes that are unnecessary, i.e. those that are not utilized by the optimizer
- Can be used to estimate the disk storage overhead of additional indexes
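A toy what-if analysis in the spirit of the index consultant: cost each workload statement with and without a candidate index and rank candidates by net benefit after a storage penalty. The cost formulas, the penalty, and the workload shape are invented for illustration; the real consultant obtains its costs from the optimizer itself.

```python
import math

def query_cost(table_rows, selectivity, has_useful_index):
    if has_useful_index:
        return math.log2(table_rows) + selectivity * table_rows   # index probe + row fetches
    return table_rows                                             # full table scan

def evaluate_candidates(workload, candidates, table_rows, storage_penalty=500):
    """workload: list of (filtered_column, selectivity, executions)."""
    report = []
    for column in candidates:
        benefit = 0.0
        for col, sel, runs in workload:
            before = query_cost(table_rows, sel, has_useful_index=False)
            after = query_cost(table_rows, sel, has_useful_index=(col == column))
            benefit += runs * (before - after)
        report.append((column, benefit - storage_penalty))
    return sorted(report, key=lambda r: -r[1])

if __name__ == "__main__":
    workload = [("customer_id", 0.001, 5_000), ("region", 0.25, 40)]
    ranking = evaluate_candidates(workload, ["customer_id", "region"], table_rows=1_000_000)
    for column, net in ranking:
        print(column, round(net))   # the selective, frequently probed column wins
```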
40. Future plans
- Increasingly difficult to keep optimization costs low in the face of increasingly challenging workloads
- Trade off time for space (cached access plans)
- Advantages of plan caching are workload-specific: caching expense pays off only if optimization costs can be amortized over several executions
- Improve statistics and cost model management to permit self-repair, which will improve access plan quality
- Numerous other opportunities for self-tuning using feedback control mechanisms
- Memory allocation, multiprogramming level, intra-query parallelism controls, additional adaptive query optimization techniques
- Integrate server internals management with VM hypervisors to improve operation within guest VMs