Title: Glenn Paulley
1. Self-management features in SQL Anywhere server
- Glenn Paulley
- Director, Engineering
- http://iablog.sybase.com/paulley
2. Sybase iAnywhere products
- SQL Anywhere
- Full-function, small-footprint relational DBMS with support for triggers, stored procedures, materialized views, intra-query parallelism, hot failover, ANSI SQL:2003 support including OLAP queries, and multidatabase capability
- Mobilink/SQL Remote
- Two-way data replication/synchronization technologies for replicating data through different mechanisms to support occasionally-connected devices
- Ultralite
- Small-footprint database that supports ad-hoc SQL on very small devices
- UltraliteJ
- 100% Java small-footprint database for BlackBerry and iPhone
3. Design goals of SQL Anywhere
- Ease of administration
- Comprehensive yet comprehensible tools
- Good out-of-the-box performance
- Embeddability features → self-tuning
- Many environments have no DBAs
- Cross-platform support
- 32- and 64-bit Windows (XP, Server 2003, 2000, 9x), Windows CE/Pocket PC, 32- and 64-bit Linux, HP-UX, AIX, Solaris (SPARC and Intel), Mac OS X, Compaq Tru64
- Interoperability
4. Why is self-management important?
- In a word: complexity
- Application development is becoming more complex: new development paradigms such as ORM toolkits, distributed computation with synchronization amongst database replicas, and so on
- Databases are now ubiquitous in IT because they solve a variety of difficult problems
- Yet most companies continue to own and manage a variety of different DBMS products, which increases administrative costs
- Ubiquity brings scale, in several ways
- To keep TCO constant, one must improve the productivity of each developer
5. Embeddability
6. Physical database design
7. Physical database design
- Logical and physical database design tradeoffs have a significant impact on performance
- The fundamental problem is that the potential benefits of design changes must be amortized over the execution characteristics of a workload
- Representative workloads are difficult to create
- Issues include data skew, correlation, and mitigating the artificial contention that occurs as a result of simulating the actual system
- Most DBMS vendors now offer tools to assist in index or materialized view creation (more on this later)
- However, not all implications are covered by any of these tools
8. Physical database design with SQL Anywhere
- A SQL Anywhere database is composed of up to 15 dbspaces
- Each dbspace is an OS file
- One each used for the temporary file and the transaction log
- Others are for user data
- UNIX raw partitions are not supported
- Image copy is done via simple OS file copy
- No image copy utility is necessary
- Easy to deploy a database to a separate machine or CE device
- Files are interoperable on all supported platforms without user intervention
- On-the-fly data conversion is done when necessary
9. Physical database design
- By default, databases are created with
- Case-insensitive searching enabled
- Not blank-padded: no difference in the storage of CHAR vs. VARCHAR attributes on disk
- SQL Anywhere supports different character sets and collations for international contexts
- Includes support for the NCHAR data type using the multi-byte UTF-8 charset
- Supported page sizes are 2K, 4K, 8K, 16K, 32K
- All dbspaces must use the same page size
- No hard limits on row length or number of BLOB columns
- BLOB values (up to 2GB each) are stored in a separate page arena
10. Physical database design
- Database files grow automatically as necessary to accommodate any increase in data volume
- Server can execute a user call-out procedure when a disk-full panic occurs
- Server offers a temporary file governor to ensure that connections cannot be starved when they require space for intermediate results
- Indexes on primary and foreign keys are created automatically
- Server will automatically detect redundant (duplicate or subsumed) indexes, as sketched below
- Two or more logical indexes share the same physical structure
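A minimal sketch of the redundant-index check described above: logical indexes with identical column sequences can share one physical structure, and an index whose columns are a strict prefix of another's is subsumed by it. The matching rule here ignores ordering direction and uniqueness, which a real implementation would also consider; all names are illustrative.

```python
def share_physical_structures(indexes):
    """indexes: dict of index name -> tuple of column names.
    Group exact duplicates onto one shared physical structure."""
    physical = {}        # column tuple -> shared physical structure id
    mapping = {}
    for name, cols in indexes.items():
        mapping[name] = physical.setdefault(cols, f"phys_{len(physical)}")
    return mapping

def subsumed_pairs(indexes):
    """Yield (narrow, wide) where narrow's columns are a strict prefix of wide's."""
    for a, ca in indexes.items():
        for b, cb in indexes.items():
            if a != b and len(ca) < len(cb) and cb[:len(ca)] == ca:
                yield a, b

if __name__ == "__main__":
    idx = {"pk_order":      ("order_id",),
           "fk_order":      ("order_id",),                  # duplicate of the PK index
           "idx_cust":      ("customer_id",),
           "idx_cust_date": ("customer_id", "order_date")}  # subsumes idx_cust
    print(share_physical_structures(idx))   # pk_order and fk_order share phys_0
    print(list(subsumed_pairs(idx)))        # [('idx_cust', 'idx_cust_date')]
```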
11. Physical database design
- Transaction log
- SQL Anywhere uses a logical logging and recovery scheme at the row level (not physiological, or page-based)
- Transaction log can be translated directly into INSERT, UPDATE, DELETE SQL statements
- DBTRAN utility ships with the software
- Assists in recovery from catastrophic failures
- Provides the backbone for two-way synchronization with Mobilink, SQL Remote
12. Self-management features
13. Memory management
- SQL Anywhere uses a single heterogeneous buffer pool with few predefined limits
- Buffer pool can automatically grow and shrink depending on database workload and OS requirements for other applications
- Buffer pool comprises
- Table data pages
- Index pages
- Checkpoint log pages
- Bitmap pages
- Heap pages (data structures for connections, stored procedures, triggers)
- Free (available) pages
- All page frames are the same size
14. Page replacement scheme
- Buffer pool has no preset bounds for sort space, heap space, or table pages
- Buffer pool's page replacement scheme must be able to manage various categories of pages and requests on a demand basis
- Workloads change over time
- Static configurations must trade off service levels for administrative effort
- One basic requirement: avoid polluting the cache with pages read by sequential scans
- Identify pages that are frequently used
15. Page replacement scheme
- Pages flow through the buffer pool over time
- The window (the size of the cache) is stratified into eight segments based on time of reference
- A page's score is incremented as it moves from segment to segment
- Need to differentiate reference locality (i.e. multiple references to a single page based on row fetch) from other forms of reuse
- Replacement is based on an individual page's score
- A modified Clock algorithm goes through each page in turn: if a page's score is less than a threshold value, the page is reused (sketched below)
- If a page is retained, its score is decayed
- LRU-k and 2Q algorithms are frequently referenced in the literature; however, we have found their overheads to be too high
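A minimal sketch of a score-based Clock sweep in the spirit of the scheme above. The eight-segment stratification is not modelled; re-references simply bump the score, and the threshold and decay values are illustrative assumptions, not SQL Anywhere's actual parameters.

```python
class Frame:
    def __init__(self, page_id):
        self.page_id = page_id
        self.score = 1.0            # starts low; grows with reuse over time

class ClockBufferPool:
    def __init__(self, capacity, threshold=1.0, decay=0.5):
        self.capacity = capacity
        self.threshold = threshold  # frames scoring below this are reusable
        self.decay = decay          # retained frames lose score on each sweep
        self.frames = []            # the clock "ring"
        self.hand = 0
        self.page_map = {}          # page_id -> Frame

    def access(self, page_id):
        frame = self.page_map.get(page_id)
        if frame is not None:
            frame.score += 1.0      # stand-in for moving up a time segment
            return frame
        return self._fault(page_id)

    def _fault(self, page_id):
        if len(self.frames) < self.capacity:
            frame = Frame(page_id)
            self.frames.append(frame)
        else:
            frame = self._find_victim()
            del self.page_map[frame.page_id]
            frame.page_id = page_id
            frame.score = 1.0
        self.page_map[page_id] = frame
        return frame

    def _find_victim(self):
        # Sweep the clock: reuse the first frame whose score has fallen below
        # the threshold; decay the score of every frame passed over, so a page
        # touched once by a sequential scan ages out quickly.
        while True:
            frame = self.frames[self.hand]
            self.hand = (self.hand + 1) % len(self.frames)
            if frame.score < self.threshold:
                return frame
            frame.score *= self.decay

if __name__ == "__main__":
    pool = ClockBufferPool(capacity=4)
    for p in [1, 2, 1, 3, 1, 4, 5, 1]:      # page 1 is "hot"; 5 forces an eviction
        pool.access(p)
    print(sorted(pool.page_map))            # page 1 survives; the cold page is gone
```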
16. Task scheduling model
- Basic idea: reduce server resource requirements by establishing a limited pool of "worker threads" to execute requests
- Reduces address space, virtual and real memory requirements
- Introduces possibility of thread deadlock
- Task implementation differs depending on the platform: fibers are used on Windows and Linux
- Scheduler assigns work across a pool of workers
- Worker thread pool is typically small; the default is 20
- Size of pool establishes the multiprogramming level
- Workers are assigned dynamically to connections on demand
17. Task scheduling model
- Unassigned requests wait for an available thread
- Server uses cooperative multitasking to switch between requests
- Server supports intra-query parallelism
- For queries, the optimizer determines the degree of parallelism at optimization time
18. Task scheduling model
- Waiting requests are candidates for swapping
- Try to maintain a constant memory footprint by using a virtual memory technique
- Heaps for user connections can be swapped to disk if server load demands it
- Originally implemented in the first SQL Anywhere release in 1989
- Pointers within the heap are swizzled to permit reloading of heaps on a demand basis (sketched below)
- Diagnostic connections can use a special, reserved thread to execute their requests
- Enables diagnosis of a hung engine
19. Dynamic memory management
- A SQL Anywhere server will grow and shrink the buffer pool as necessary to accommodate both
- Database server load
- Physical memory requirements of other applications
- Enabled by default on all supported platforms
- User can set lower and upper bounds and a default initial size
- OS provides an API to determine memory size (requires virtual memory support)
- Implementation varies slightly by platform
20. Dynamic memory management
- Basic idea: match buffer pool size to the operating system's working set size
- Feedback control loop (a minimal sketch follows)
21. Memory allocation to server threads
- Query optimizer assumes the availability of a certain amount of memory for each computation
- Some execution operators (notably hash-based operators) have the ability to free memory without changing the computation (just making it less efficient)
- Release of memory is done partition-at-a-time so the effect is not abrupt (see the sketch below)
- Other operators can be constructed with alternative plans that offer a cheaper execution technique if either
- the optimizer's choices are found to be suboptimal at run-time, or
- the operator requires a low-memory strategy at the behest of the memory governor
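A sketch of the partition-at-a-time idea, assuming a toy hash build side: when the memory governor asks, one resident partition is spilled (here to an in-memory list standing in for a temp file), and probes against spilled partitions fall back to scanning, so the result is unchanged but the operator runs more slowly. Partition count and the spill mechanism are illustrative assumptions.

```python
class PartitionedHashBuild:
    def __init__(self, rows, key, n_partitions=8):
        self.key = key
        self.in_memory = [dict() for _ in range(n_partitions)]  # per-partition hash tables
        self.spilled = [None] * n_partitions                    # "temp file" rows, if spilled
        for row in rows:
            k = key(row)
            self.in_memory[hash(k) % n_partitions].setdefault(k, []).append(row)

    def release_one_partition(self):
        """Called by the memory governor: spill the largest resident partition."""
        idx = max(range(len(self.in_memory)),
                  key=lambda i: sum(len(v) for v in self.in_memory[i].values()))
        if not self.in_memory[idx]:
            return False                       # nothing left to give back
        self.spilled[idx] = [r for rows in self.in_memory[idx].values() for r in rows]
        self.in_memory[idx] = {}
        return True

    def probe(self, row):
        """Yield matching build rows; spilled partitions are scanned (slower)."""
        k = self.key(row)
        idx = hash(k) % len(self.in_memory)
        if self.spilled[idx] is not None:
            yield from (r for r in self.spilled[idx] if self.key(r) == k)
        else:
            yield from self.in_memory[idx].get(k, [])

if __name__ == "__main__":
    build = PartitionedHashBuild([("x", 1), ("y", 2), ("x", 3)], key=lambda r: r[0])
    build.release_one_partition()            # governor reclaims some memory
    print(list(build.probe(("x", 99))))      # still finds ("x", 1) and ("x", 3)
```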
22. Other self-management features
- On startup, the buffer pool is automatically loaded with those database pages that were first requested on the last startup
- SQL Anywhere supports events
- Fired on a schedule, or when specific server events occur, such as
- Idle time greater than a threshold
- Backup
- Connect/disconnect
- File size
- Disk space
23. Self-tuning query optimization
- ... and adaptive query execution strategies
24. Automatic statistics management
- A feature of SQL Anywhere since 1992
- Early implementations used a hash-based structure to manage column density and frequent-value statistics
- See Lynch, 14th VLDB (1988), for a similar approach
- Today
- Self-tuning column histograms
- On both base and temporary tables
- Statistics are updated on-the-fly automatically
- Join histograms built for intermediate result analysis
- Server maintains index statistics in real time
- Index sampling during optimization
25. Column histograms
- Self-tuning implementation (sketched below)
- Incorporates both standard range buckets and frequent-value statistics
- Updated in real time with the results of predicate evaluation and update DML statements
- By default, statistics are computed during the execution of every DML request
- Novel technique used to capture statistics on strings for use in optimizing LIKE predicates
- Histograms are computed automatically on LOAD TABLE or CREATE INDEX statements
- Can be created/dropped explicitly if necessary
- But retained by default across unload/reload
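A minimal sketch of a self-tuning histogram combining range buckets with frequent-value statistics, fed back from the selectivities actually observed when predicates run. The bucket layout, the distinct-values-per-bucket assumption, the blend factor, and the promotion threshold are all illustrative.

```python
class SelfTuningHistogram:
    def __init__(self, lo, hi, n_buckets=10, blend=0.3, hot=0.05):
        self.lo = lo
        self.width = (hi - lo) / n_buckets
        self.buckets = [1.0 / n_buckets] * n_buckets  # fraction of rows per range bucket
        self.distinct_per_bucket = 100                # crude uniformity assumption
        self.frequent = {}                            # value -> observed fraction of rows
        self.blend = blend                            # how quickly feedback is absorbed
        self.hot = hot                                # promotion threshold for frequent values

    def _bucket(self, value):
        return max(0, min(int((value - self.lo) / self.width), len(self.buckets) - 1))

    def estimate_eq(self, value):
        """Estimated selectivity of the predicate  col = value."""
        if value in self.frequent:
            return self.frequent[value]
        return self.buckets[self._bucket(value)] / self.distinct_per_bucket

    def feedback_eq(self, value, observed):
        """Fold the selectivity observed at execution time back into the stats."""
        if observed >= self.hot or value in self.frequent:
            old = self.frequent.get(value, self.estimate_eq(value))
            self.frequent[value] = (1 - self.blend) * old + self.blend * observed
        else:
            b = self._bucket(value)
            implied = observed * self.distinct_per_bucket
            self.buckets[b] = (1 - self.blend) * self.buckets[b] + self.blend * implied

if __name__ == "__main__":
    h = SelfTuningHistogram(lo=0, hi=1000)
    print(round(h.estimate_eq(42), 5))   # prior guess from uniform buckets
    h.feedback_eq(42, observed=0.20)     # the predicate actually matched 20% of rows
    print(round(h.estimate_eq(42), 5))   # the skewed value is now tracked explicitly
```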
26. Query optimizer
- SQL Anywhere optimizes requests each time they are executed
- Optimizer takes into account server context
- Working set
- Available cache
- Values of procedure, host variables
- Assumption: optimization is cheap
- SQL Anywhere uses a proprietary, cost-based join
enumeration algorithm that primarily constructs
left-deep trees
- Optimization process includes both heuristic and cost-based complex rewritings
- Advantages: plans are responsive to the server environment, buffer pool contents/size, and data skew; no need to administer packages
27. Join enumeration
- Optimization space is over left-deep trees
- Optimization process includes both heuristic and cost-based complex rewritings
- Exceptions for complex, right-deep nested LEFT OUTER JOINs
- Space pruning based on a "governor" that allocates quota to different portions of the search space (see the sketch below)
- Join method, degree of parallelism, physical scan method (index selection), and expensive predicate evaluation are also part of the search space
- Each choice is re-evaluated at each step
- Superb optimization times even for complex queries
- No hard limits; tested with
- 500 quantifiers in a single block
- 100-way join on a CE device with a 3MB buffer pool
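A toy sketch of quota-limited, cost-based enumeration of left-deep join orders, in the spirit of the governor described above. The cost model (cross-product cardinalities) and the single global quota are gross simplifications; a real enumerator distributes quota across sub-spaces and uses much richer costing.

```python
def enumerate_left_deep(card, quota=50):
    """card: dict table -> cardinality. Returns (best_cost, best_order)."""
    tables = list(card)
    best = [float("inf"), None]   # best cost and best left-deep order found so far
    budget = [quota]              # remaining plan-expansion quota

    def extend(order, rows, cost):
        if budget[0] <= 0 or cost >= best[0]:
            return                # pruned: quota exhausted or dominated by best plan
        budget[0] -= 1
        if len(order) == len(tables):
            best[0], best[1] = cost, order
            return
        # Re-evaluate every remaining table as the next right-hand input.
        for t in tables:
            if t not in order:
                joined = rows * card[t]
                extend(order + (t,), joined, cost + joined)

    for t in tables:              # choose the leftmost table
        extend((t,), card[t], 0)
    return best[0], best[1]

if __name__ == "__main__":
    cost, order = enumerate_left_deep({"orders": 10_000, "lineitem": 60_000,
                                       "nation": 25, "region": 5})
    print(order, cost)   # best left-deep order found within the quota, with its toy cost
```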
28. Bypassing the query optimizer
- Simple, single-table queries are optimized heuristically
- Access plans for queries in stored procedures/triggers/events are cached (see the sketch below)
- Plans undergo a training period where plan variance is determined
- If there is no variance (even without variable values), the plan is cached and reused
- Query is periodically re-optimized on a logarithmic scale to ensure the plan does not become sub-optimal
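A sketch of this caching policy, assuming a three-execution training period and a power-of-two re-optimization schedule; both values are illustrative, as is the retraining rule applied when a re-optimized plan differs from the cached one.

```python
class PlanCache:
    def __init__(self, optimize, training_runs=3):
        self.optimize = optimize          # callable: sql -> plan (any comparable object)
        self.training_runs = training_runs
        self.state = {}                   # sql -> {"count", "plans", "cached"}

    def get_plan(self, sql):
        s = self.state.setdefault(sql, {"count": 0, "plans": [], "cached": None})
        s["count"] += 1
        n = s["count"]

        if s["cached"] is None:                       # still in the training period
            plan = self.optimize(sql)
            s["plans"].append(plan)
            if n >= self.training_runs and len(set(s["plans"])) == 1:
                s["cached"] = plan                    # no variance observed: cache it
            return plan

        if (n & (n - 1)) == 0:                        # re-optimize at powers of two
            plan = self.optimize(sql)
            if plan != s["cached"]:                   # plan drifted: retrain
                s["cached"], s["plans"] = None, [plan]
            return plan
        return s["cached"]

if __name__ == "__main__":
    calls = []
    def optimizer(sql):
        calls.append(sql)
        return "index scan on T(x)"                   # a stable plan in this demo

    cache = PlanCache(optimizer)
    for _ in range(20):
        cache.get_plan("SELECT * FROM T WHERE x = ?")
    print(len(calls))   # far fewer than 20 optimizer calls once the plan is cached
```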
29. Adaptive query processing
- In some cases the optimizer will generate alternative access plans that can be executed if actual intermediate result sizes are poorly estimated
- Server switches to the alternative plan automatically at run time (see the sketch below)
- Memory-intensive operators, such as hash join, have low-memory strategies that are used when buffer pool utilization is high
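A minimal sketch of switching plans mid-execution when a cardinality estimate proves badly wrong. The 10x tolerance and the buffering approach are illustrative assumptions, and both "plans" here are trivial stand-ins that produce the same rows.

```python
def execute_adaptive(primary_plan, alternative_plan, estimated_rows, tolerance=10):
    """Both plans are callables returning an iterator over the same result."""
    buffered = []
    for row in primary_plan():
        buffered.append(row)
        if len(buffered) > estimated_rows * tolerance:
            # Cardinality misestimate detected: abandon the primary plan and
            # recompute the result with the alternative strategy instead.
            return list(alternative_plan())
    return buffered

if __name__ == "__main__":
    data = list(range(100_000))
    plan_a = lambda: (x for x in data if x % 2 == 0)   # preferred for small results
    plan_b = lambda: [x for x in data if x % 2 == 0]   # fallback when many rows qualify
    rows = execute_adaptive(plan_a, plan_b, estimated_rows=10)
    print(len(rows))    # 50000 either way; only the execution strategy changed mid-flight
```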
30. Adaptive query processing
- Some operations, such as database backup, contain sampling processes to determine the characteristics of the I/O device used for storage (see the sketch below)
- Primary goal is to determine the number of disk heads available
- Processes can utilize the "right" number of CPUs to maximize throughput
- Algorithms are sensitive to the CPU requirements of the rest of the system, and automatically scale down CPU usage as necessary
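A crude sketch of the sampling idea: time reads of a file at increasing degrees of concurrency and keep raising the degree only while throughput improves noticeably. The file size, block size, and improvement cut-off are illustrative, and a micro-benchmark like this is only indicative on real hardware.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK = 64 * 1024

def timed_read(path, workers, blocks_per_worker=64):
    """Read disjoint block ranges concurrently and return bytes/second."""
    def task(worker_id):
        with open(path, "rb") as f:
            for i in range(blocks_per_worker):
                f.seek((worker_id * blocks_per_worker + i) * BLOCK)
                f.read(BLOCK)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, range(workers)))
    return (workers * blocks_per_worker * BLOCK) / (time.perf_counter() - start)

def probe_parallelism(path, max_workers=8, min_gain=1.10):
    """Stop raising concurrency once extra workers stop paying off."""
    best_rate, best_workers = 0.0, 1
    for w in range(1, max_workers + 1):
        rate = timed_read(path, w)
        if rate < best_rate * min_gain:
            break
        best_rate, best_workers = rate, w
    return best_workers

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(8 * 64 * BLOCK))   # small stand-in for a database file
        path = f.name
    print("suggested I/O parallelism:", probe_parallelism(path))
    os.unlink(path)
```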
31. Intra-query parallelism
- SQL Anywhere's approach is to parallelize a left-deep plan when doing so is advantageous
- There is an upper bound on the number of EXCHANGE operators in a data-flow tree; it can be set by the user to override the default
- Work is partitioned independently of the availability of worker threads at run-time
- Plans are largely self-tuning with respect to degree of parallelism
- Prevents starvation of any specific subplan when the number of available workers is less than optimal for some period
32. Management tools
33. Management tools
- Framework
- DBISQL interactive SQL query tool
- Sybase Central administration/development tool
- Tools
- Graphical query plans
- Stored procedure debugger
- Stored procedure profiler
- Index consultant
- Request-level logging
- Optimization logging
34. Graphical query plans
35. Stored procedure debugger
- Enables debugging of a PSM or Java stored
procedure - Features
- set/unset breakpoints and watch settings
- determine value of variables
- traverse the call stack
- execute queries with the values of variables at
the time of the break
36. Application Profiling scenarios
37. Application Profiling architecture
38. Stored procedure profiler
- Offers the ability to analyze the performance of a stored procedure/trigger/event
- Computes the cumulative invocations and elapsed time for each statement in a procedure
39. Index consultant
- Recommends indexes to improve query performance
- Main idea is to improve application performance
- Particularly useful when the DBA has limited experience
- Permits "what-if" analysis on an existing query load and database instance (see the sketch below)
- Allows the DBA to find statements that are the most sensitive to the presence of an index
- Can be used to find indexes that are unnecessary, i.e. those that are not utilized by the optimizer
- Can be used to estimate the disk storage overhead of additional indexes
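A toy what-if analysis in the spirit of the index consultant: cost each workload statement with and without a candidate index and rank candidates by net benefit after a storage penalty. The cost formulas, the penalty, and the workload shape are invented for illustration; the real consultant obtains its costs from the optimizer itself.

```python
import math

def query_cost(table_rows, selectivity, has_useful_index):
    if has_useful_index:
        return math.log2(table_rows) + selectivity * table_rows   # index probe + row fetches
    return table_rows                                             # full table scan

def evaluate_candidates(workload, candidates, table_rows, storage_penalty=500):
    """workload: list of (filtered_column, selectivity, executions)."""
    report = []
    for column in candidates:
        benefit = 0.0
        for col, sel, runs in workload:
            before = query_cost(table_rows, sel, has_useful_index=False)
            after = query_cost(table_rows, sel, has_useful_index=(col == column))
            benefit += runs * (before - after)
        report.append((column, benefit - storage_penalty))
    return sorted(report, key=lambda r: -r[1])

if __name__ == "__main__":
    workload = [("customer_id", 0.001, 5_000), ("region", 0.25, 40)]
    ranking = evaluate_candidates(workload, ["customer_id", "region"], table_rows=1_000_000)
    for column, net in ranking:
        print(column, round(net))   # the selective, frequently probed column wins
```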
40. Future plans
- Increasingly difficult to keep optimization costs low in the face of increasingly challenging workloads
- Trade off time for space (cached access plans)
- Advantages of plan caching are workload-specific: caching expense pays off only if optimization costs can be amortized over several executions
- Improve statistics and cost model management to permit self-repair, which will improve access plan quality
- Numerous other opportunities for self-tuning using feedback control mechanisms
- Memory allocation, multiprogramming level, intra-query parallelism controls, additional adaptive query optimization techniques
- Integrate server internals management with VM hypervisors to improve operation within guest VMs