1
EDU208 Improving performance with SQL Anywhere
Capacity Planning
  • Glenn Paulley and Anil Goel
  • Firstname.Lastname@ianywhere.com
  • http://iablog.sybase.com/paulley

2
Goals of this presentation
  • Describe a systematic methodology for conducting
    capacity planning and performance analysis with
    SQL Anywhere
  • Illustrate both pitfalls and best practices along
    the way

3
Resources
  • Two white papers
  • Capacity Planning with SQL Anywhere
  • Diagnosing Application Performance Issues with
    SQL Anywhere
  • Both are available from
    http://iablog.sybase.com/paulley and from
    http://www.sybase.com/ianywhere
  • Raj Jain, The Art of Computer Systems Performance
    Analysis, John Wiley and Sons, New York, 1991
  • Domenico Ferrari, Computer Systems Performance
    Evaluation, Prentice-Hall, 1978
  • Ferrari, Serazzi, and Zeigner, Measurement and
    Tuning of Computer Systems, Prentice-Hall, 1983

4
Outline
  • Introduction
  • Workload specification
  • SQL Anywhere performance factors
  • Experiment design
  • Conclusions

5
Outline
  • Introduction
  • Workload specification
  • SQL Anywhere performance factors
  • Experiment design
  • Conclusions

6
Why capacity planning is important
  • You wish to determine if
  • Your existing hardware is adequate for your
    current workload
  • Your existing hardware can handle a larger
    workload
  • Your application will scale to increasing demands
  • A smaller hardware configuration is adequate for
    a certain workload size
  • A specific DBMS is better for your application
    than another
  • A new server version provides the promised
    improvements in performance
  • There are extant software problems preventing
    optimal resource utilization

7
Why capacity planning is difficult
  • Difficult to construct representative workloads
    with which to test
  • If the workload is NOT representative, two
    undesirable outcomes of the study are likely:
    either
  • the actual system's performance will be worse
    than that predicted by the evaluation, or
  • significant effort will be expended during the
    evaluation to solve performance issues that will
    not occur in practice
  • Can consume significant time and energy
  • Both cost and benefit are typically unknown when
    you begin
  • Difficult to estimate the amount of time needed
    for problem determination and resolution
  • Benefits depend on how far the existing software
    configuration is from its optimal operating
    conditions

8
Why capacity planning is difficult
  • Database applications are often implicitly or
    explicitly tuned for a particular DBMS, DBMS
    version, or hardware platform
  • Designs or performance issues that aren't
    immediate problems remain untouched
  • Changing anything in the environment can lead to
    a scalability problem within the application that
    will require diagnosis
  • Such bottlenecks often will require subject
    matter experts to resolve
  • An application's scalability characteristics are
    usually just as important to system performance
    as the scalability of the database server
  • There are lots of performance factors to
    consider: how can we possibly make this
    manageable?

9
So... what usually happens?
  • Create a quick, easy, read-only simplistic
    workload
  • Such a workload is usually far from being
    representative, so its results suffer from the
    problems of interpretation
  • Simple requests mean no test of query
    optimization
  • Rely on industry-standard benchmark results
  • Never are representative of your application
  • In particular, industry-standard benchmarks do
    not test query optimizers to a significant degree
  • Data distributions are not skewed
  • Query complexity rarely matches real applications
  • While query execution performance may be
    assessed, query optimization is relatively
    untested

10
So... what usually happens?
  • Rely on a vendor-specific benchmark
  • We're not stupid
  • Use a qualitative approach, e.g. an IBM Redbook on
    capacity planning
  • For an OLTP connection, we require (say) 50KB of
    buffer pool
  • For a reporting application, we require (say) 5MB
    of buffer pool for each connection
  • Assume linear scalability of both software and
    hardware
  • Almost always leads to overspecification of the
    system
  • Don't bother: too costly to undertake
  • Significant element of risk

11
Some definitions
12
More definitions
13
Tenets
  • All database system vendors make performance
    tradeoffs and the details vary from system to
    system.
  • The most important aspects of a database system
    are not described in the ANSI SQL standard and
    are rarely described in system documentation
  • "All CPUs wait at the same speed"
  • Gordon Steindel, Technical Services Manager,
    Great-West Life Assurance Company
  • "There are no right answers, only tradeoffs"
  • William Cowan, Associate Professor, University of
    Waterloo

14
Outline
  • Introduction
  • Workload specification
  • SQL Anywhere performance factors
  • Experiment design
  • Conclusions

15
Why do we need a workload?
  • A workload is a model of an actual production
    system
  • Three types: real, artificial, synthetic
  • Real workloads
  • Actual production databases, applications
  • Representative, but difficult to scale, security
    issues
  • Artificial
  • No relationship to real system components but
    easy to create
  • Non-representative but useful to establish
    performance baselines
  • Synthetic
  • Mix of real and synthetic components, often using
    sampling
  • Usually the best approach, but representativeness
    is key

16
Characteristics of good workloads
  • Representative
  • Reproducible
  • Need to be able to re-run the workload over a
    variety of system configurations
  • Scalable
  • Alter the database size, transaction workload to
    simulate different installations, paying
    particular attention to the database working set
    and data skew
  • Convenient
  • Large enough for meaningful tests, small enough
    to be executed in finite time
  • Secure
  • Problematic to use real customer data
  • Problem: these are conflicting requirements

17
Constructing a representative database is
difficult
  • Difficult to model real-world skew in a synthetic
    database that mimics a real database instance
  • Application behavior is reliant on the database
    contents, particularly date, time, and timestamp
    columns
  • Database is difficult to scale
  • Difficult to reduce or enlarge the database size
    yet retain existing distributions, correlations,
    and skew
  • RI constraints may no longer hold
  • Difficult to scale the database size in
    conjunction with the server's working set

18
Constructing an application workload is difficult
too!
  • A SQL statement trace is a good starting point,
    but
  • Capturing a real trace from a production system
    may be inconvenient
  • May be difficult to apply a real trace to a
    synthetic database instance
  • Difficult to partition the trace without
    affecting its representativeness
  • May introduce artificial contention if simulating
    a larger number of connections from a given trace
  • Re-using a trace will likely not increase the
    server's working set size
  • Sample distribution of business transactions may
    not hold across all users
  • Difficult to handle updates as they can affect
    query parameters for subsequent requests in the
    trace

19
Scalability and application design
  • Assume we have a special table, surrogate, that
    contains one row for each business object, and
    whose row value is the next key to be utilized
    for an INSERT
  • Application inserts a new client as follows:
  • UPDATE surrogate SET @x = next_key,
    next_key = next_key + 1
    WHERE object_type = 'client'
  • INSERT INTO client VALUES (@x, ...)
  • COMMIT
  • All insertions are now serialized
  • Thread deadlock is inevitable at higher volumes

20
Application behavior and database scalability
  • One must be careful when crafting the workload to
    avoid creating artificial contention points, such
    as in the example shown previously
  • Points of contention within the server are
    system- and release-specific
  • Usual causes: hot rows or pages, DDL, client
    latency
  • Positive scalability is the target
  • Perfect scalability is very, very rare
  • Linear scalability is also rare
  • Negative scalability is possible due to contention

21
Client configuration is important
  • While server configuration is the focus,
    performance is almost always measured at the
    client, so client configuration is important
  • Bandwidth and latency are significant issues
  • How the client interacts with the server can
    brutally affect performance
  • Performance impacts can be hidden, such as the
    difference between jConnect and the iAnywhere
    native JDBC driver

22
Outline
  • Introduction
  • Workload specification
  • SQL Anywhere performance factors
  • Experiment design
  • Conclusions

23
Primary SQL Anywhere performance factors
  • Server multiprogramming level (-gn command line
    switch)
  • Server cache size
  • Database page size
  • Number of CPUs available for server thread
    execution (license dependent)
  • Database working set size
  • Speed and configuration of the server's disk
    subsystem
  • Workload characteristics: transaction
    interarrival rate, number of connections,
    workload mix
  • There can be other, secondary factors (a sample
    server start line follows below)
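
  • A minimal sketch of how the first two factors are
    set on the server command line; the executable
    name varies by release, and the server name
    (perftest) and database (demo.db) here are
    illustrative assumptions:

      dbsrv11 -n perftest -gn 30 -c 1200M demo.db

    Here -n names the server, -gn sets the
    multiprogramming level, and -c sets the initial
    cache size.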

24
The factorial problem
  • Single, one-time experiments provide relatively
    little insight into the capacity planning process
  • How much of the performance problem is I/O
    bandwidth? Cache? Page size? Workload? Or these
    factors in combination?
  • Are the second-level (or higher) interactions
    amongst these performance factors accounting for
    additional performance differences?
  • However
  • How does one possibly test the interactions of
    all these factors in a systematic way?
  • k factors, with the i-th factor having n_i levels,
    requires N = n_1 × n_2 × ... × n_k experiments
    (e.g., the seven factors above at just three
    levels each already require 3^7 = 2187 runs;
    see the sketch below)
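
  • A short Python sketch of this combinatorial
    explosion, enumerating a full factorial grid over
    hypothetical factor levels (the names and values
    are illustrative assumptions, not from the
    presentation):

      # Enumerate every combination of factor levels in a full
      # factorial design.
      from itertools import product

      factors = {
          "multiprogramming_level": [10, 20, 30],   # -gn
          "cache_size_mb": [600, 1200, 2400],
          "page_size_kb": [4, 8],
      }

      runs = list(product(*factors.values()))
      print(len(runs))        # 3 x 3 x 2 = 18 experiments, each run once
      for combo in runs[:3]:  # first few factor-level combinations
          print(dict(zip(factors.keys(), combo)))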

25
The factorial problem (continued)
  • In addition
  • The formula above embodies the execution of each
    experiment only once
  • Cannot analyze experimental error
  • If we execute each experiment r times, we get
    N = r × n_1 × n_2 × ... × n_k
  • There is potential for encountering serious
    performance bottlenecks with each experiment
  • Each of these must be investigated and either
    fixed or worked around for the performance
    analysis to continue
  • May require a complete restart of the entire
    analysis process

26
The factorial problem (continued)
  • This full factorial experimental design has the
    advantage of exploring every possible interaction
    of performance factors
  • But its main problem is its cost: there are
    simply too many experiments to run
  • Three ways to reduce the number of experiments
  • Reduce the number of levels for each performance
    factor
  • Reduce the number of factors
  • Use a fractional factorial design
  • Each of these reduction techniques has tradeoffs

27
Outline
  • Introduction
  • Workload specification
  • SQL Anywhere performance factors
  • Experiment design
  • Conclusions

28
2^k r experimental designs
  • 2^k r experiment designs are popular because they
  • Reduce the number of experiments to a manageable
    number
  • Are relatively straightforward to analyze for the
    effects of the various performance factors
  • Can use a 2^k r design to pinpoint more specific
    performance phenomena once an overall assessment
    has been made
  • Basic idea
  • Each of the k performance factors is restricted to
    two levels
  • Not usually a good idea to choose maximal bounds,
    but rather reasonable values
  • Experiments are repeated r times to analyze
    experimental error
  • Use linear regression techniques to analyze the
    results

29
Example of the approach
  • Illustrate a 2^2 experimental design
  • Additional material on 2^k r designs can be found
    in the Capacity Planning white paper
  • Performance analysis involves only two factors
  • Server multiprogramming level (-gn) (Factor A)
  • Server cache size (Factor B)
  • All other performance-related parameters remain
    unchanged
  • Workload, number of connections, database size,
    database page size, and so on

30
Example results
  • Results are measured in transactions-per-second
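  • The four measurements (hypothetical values,
    consistent with the regression coefficients
    derived on the following slides) are:

                       -gn 20    -gn 30
      600MB cache        9.4      10.7
      1200MB cache      12.8      16.7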

31
Linear regression analysis
  • With linear regression analysis, we wish to solve
    a linear regression equation of the form
    y = q0 + qA·xA + qB·xB + qAB·xA·xB
  • y is termed the response variable: the quantity
    being measured
  • We determine the coefficients q0, qA, qB, and qAB,
    which respectively are
  • Mean TPS for the set of experiments
  • Effect of factor A on performance
  • Effect of factor B on performance
  • Effect of the combination of both factors on
    performance

32
Linear regression analysis (cont'd)
  • We define indicator variables xA and xB as
    follows
  • xA
  • -1 if the multiprogramming level is 20
  • +1 if the multiprogramming level is 30
  • xB
  • -1 if the cache size is 600MB
  • +1 if the cache size is 1200MB
  • Then ignore any complex mathematics and use a
    very simple spreadsheet (a short script
    equivalent is sketched below)
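
  • A minimal Python sketch of that spreadsheet
    computation, using the four hypothetical TPS
    measurements assumed in this example:

      # 2^2 sign-table computation: rows are (xA, xB, y), y in TPS.
      runs = [(-1, -1,  9.4),
              (+1, -1, 10.7),
              (-1, +1, 12.8),
              (+1, +1, 16.7)]

      n = len(runs)
      q0  = sum(y           for xa, xb, y in runs) / n  # mean TPS
      qA  = sum(xa * y      for xa, xb, y in runs) / n  # effect of A
      qB  = sum(xb * y      for xa, xb, y in runs) / n  # effect of B
      qAB = sum(xa * xb * y for xa, xb, y in runs) / n  # interaction

      print(q0, qA, qB, qAB)  # approximately 12.4, 1.3, 2.35, 0.65

      # Allocation of variation: each effect's sum of squares is 4·q²
      print(4*qA**2, 4*qB**2, 4*qAB**2)  # approximately 6.76, 22.09, 1.69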

33
Sign table approach
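  • For the 2^2 example, the sign table (with the same
    assumed y values as above) looks like this:

       I      A      B      AB        y
       1     -1     -1     +1       9.4
       1     +1     -1     -1      10.7
       1     -1     +1     -1      12.8
       1     +1     +1     +1      16.7
      -----------------------------------
      49.6    5.2    9.4    2.6           (column sums of sign × y)
      12.4    1.3    2.35   0.65          (sums / 4 = q0, qA, qB, qAB)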
34
Linear regression equation
  • y = 12.4 + 1.3·xA + 2.35·xB + 0.65·xA·xB
  • Once again, y is the response variable: the TPS
    of the system reflected by the experiments
  • Mean TPS of the set of experiments is 12.4 TPS
  • Increasing the multiprogramming level from 20 to
    30 yields an additional 1.3 TPS
  • Increasing the cache size yields an additional
    2.35 TPS
  • Increasing both the cache size and the server
    multiprogramming level yields an additional 0.65
    TPS

35
Allocation of variation
  • Would like to quantify the importance of each
    factor assessed in the performance evaluation
  • Measured by the proportion of the variation in
    the test result (y) each factor is responsible
    for
  • Done by computing the SST (total sum of squares)
    of the set of experiments; in a 2^2 design each
    effect's sum of squares is 4·q^2
    (e.g., SSA = 4 × 1.3^2 = 6.76)
  • SST = SSA + SSB + SSAB
  • SST = 6.76 + 22.09 + 1.69 = 30.54

36
Allocation of variation
  • From our hypothetical example, then, we have
  • The multiprogramming level increase explains
    6.76/30.54 or 22.1 percent of the variation in
    overall performance
  • The cache size increase explains 22.09/30.54 or
    72.33 percent of the variation
  • Increasing both factors explains an additional
    1.69/30.54 or 5.5 percent of the variation in
    performance
  • Conclusion
  • For this workload and this set of experiments,
    increasing (or decreasing) the server's cache
    size yields much more of a performance gain
    (loss) than increasing (or decreasing) the
    server's multiprogramming level

37
Outline
  • Introduction
  • Workload specification
  • SQL Anywhere performance factors
  • Experiment design
  • Conclusions

38
Pitfalls
  • Linear regression assumes independent variables
  • Testing correlated variables, such as both
    additional indexes and database size, will lead
    to erroneous conclusions
  • Not all relationships are additive: some are
    multiplicative or even exponential
  • They can be converted to linear functions using
    logarithms (see the sketch after this list)
  • Tendency to test too many factors
  • Increases the cost of experimentation, but
    additional tests may provide little additional
    insight
  • Testing lower and upper bounds for any factor is
    not recommended
  • Underlying assumption is that system behavior at
    any midpoint is unaffected by second- or
    third-level interactions
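
  • A minimal sketch of the logarithm trick, with
    illustrative (assumed) measurements generated by a
    multiplicative model y = q0 · qA^xA · qB^xB:

      import math

      # Hypothetical measurements from a multiplicative model
      # (here, exactly y = 12.0 * 1.5^xA * 2.0^xB), as (xA, xB, y).
      runs = [(-1, -1, 4.0), (+1, -1, 9.0),
              (-1, +1, 16.0), (+1, +1, 36.0)]

      n = len(runs)
      # Regress on log y instead of y; the model becomes additive:
      # log y = log q0 + xA·log qA + xB·log qB
      log_q0 = sum(math.log(y)      for xa, xb, y in runs) / n
      log_qA = sum(xa * math.log(y) for xa, xb, y in runs) / n
      log_qB = sum(xb * math.log(y) for xa, xb, y in runs) / n

      # Back-transform to recover the multiplicative effects
      print(math.exp(log_q0), math.exp(log_qA), math.exp(log_qB))
      # approximately 12.0, 1.5, 2.0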

39
Pitfalls
  • Often helpful to run bare-metal performance
    tests to establish performance baselines
  • Helps to ensure your workload tests are
    reasonable
  • Gives an indication of what is possible on the
    particular platform
  • This may be particularly useful in client-server
    environments where network traffic is involved
  • Network latency or application code
    characteristics can be performance killers
  • In some cases server performance isn't really a
    factor at all
  • Use to compare with workload results when
    troubleshooting bottlenecks

40
More complex experimental designs
  • The sign table approach can be used for 2^k r
    experimental designs as well
  • Simply more rows and columns
  • An Excel spreadsheet can do all the computation
    for you
  • The hard part is working through the experiments
    and solving performance bottlenecks as you go
  • Sybase iAnywhere consulting is able to assist you
    in doing the performance evaluation in-house

41
Analysis tools
  • LoadRunner is one package that simulates a set of
    users executing SQL scripts and computes summary
    statistics for you
  • Commercial package from Mercury Interactive,
    certainly not free
  • Sybase iAnywhere Consulting offers Floodgate, a
    performance analysis package developed from our
    own in-house MobiLink performance tests

42
Questions
  • ?