1
Database Optimisation and Database Benchmarking
  • David Nelson
  • April 2007

2
Contents
  • Introduction
  • Database Benchmarking
  • Wisconsin, TPC-A, TPC-B, TPC-C, AS3AP, OO7,
    BUCKY
  • Query Performance and Optimisation
  • Query Processing
  • Indexes
  • Summary and Further Reading

3
Introduction
  • Software and systems development projects include
    performance evaluation work, but it is sometimes not
    sufficient to prevent major performance problems
  • Tools available
  • Prototyping
  • Benchmarking
  • Query processing
  • It's a black art!

4
Database Benchmarking
  • A tool for comparing the performance of DBMSs
  • summarises relative performance in a single figure
  • usually measured in transactions per second (tps)
  • with a cost measure in terms of system cost over
    5 years
  • Two principal uses of benchmarks
  • providing comparative performance measures
  • a source of data and queries that represent
    experimental approximations to other problems
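  • As a hedged illustration (the figures are invented,
    purely to show how the two measures combine): a
    system priced at $500,000 over 5 years that sustains
    100 tps would be reported at 100 tps and
    $500,000 / 100 = $5,000 per tps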

5
Data Generation Approaches
  • Artificial
  • generation of data with entirely artificial
    properties
  • designed to investigate particular aspects of a
    system, e.g. joins
  • e.g. Wisconsin
  • Synthetic Workload
  • produce a simplified version of an application
  • use synthetically generated data with properties
    similar to those of a real system, e.g. a banking
    application
  • e.g. Transaction Processing Performance Council
    (TPC)
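For a flavour of artificial data generation, a minimal
sketch in Oracle-style SQL (the table and column names
are invented, loosely echoing Wisconsin conventions;
CONNECT BY is Oracle-specific):

    -- Generate 10,000 tuples with uniformly distributed attribute values
    CREATE TABLE testtup AS
    SELECT LEVEL - 1            AS unique1,  -- 0..9999, all distinct
           MOD(LEVEL - 1, 10)   AS ten,      -- uniform over 0..9
           MOD(LEVEL - 1, 100)  AS hundred   -- uniform over 0..99
    FROM   dual
    CONNECT BY LEVEL <= 10000;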

6
Benchmarks
  • Relational Benchmarks
  • Artificial: Wisconsin, AS3AP
  • Synthetic: TPC-C (successor of TPC-A and TPC-B)
  • OO Benchmarks
  • Synthetic: oo7 (successor of oo1)
  • Object-Relational Benchmarks
  • Synthetic: BUCKY
  • There are many other benchmarks; these are the
    best known

7
Wisconsin Benchmark
  • First systematic benchmark definition
  • compares particular features of DBMS rather than
    a simple overall performance metric
  • A single-user series of tests, comprising
  • selections and projections with varying
    selectivities on clustered, indexed and
    non-indexed attributes
  • joins of varying selectivities
  • aggregate functions (e.g. min, max, sum)
  • updates/deletions involving key/non-key attributes
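For illustration, a typical Wisconsin-style selection (a
sketch assuming the benchmark's TENKTUP1 relation of
10,000 tuples and its unique2 attribute; the 1%
selectivity is one of several the benchmark varies):

    -- 1% selection on an indexed attribute of the 10,000-tuple relation
    SELECT *
    FROM   tenktup1
    WHERE  unique2 BETWEEN 0 AND 99;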

8
Wisconsin Benchmark
  • Straightforward to implement
  • Scalable, e.g. to parallel architectures
  • Useful, readily understandable results
  • Criticised for its lack of highly skewed attribute
    distributions and its simple join queries

9
TPC-A and TPC-B
  • Measures performance of a simple banking
    transaction
  • from initiation at a terminal until response
    arrives back from server
  • benchmark encompasses time taken by server,
    network and other system components
  • terminals emulated using a negative-exponential
    transaction arrival distribution
  • TPC-B only measures performance of the database
    server
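  • As a hedged aside on the arrival distribution
    (inverse-transform sampling is the usual technique,
    though the specification's exact method is not
    reproduced here): inter-arrival times can be drawn
    as t = -ln(U) / λ, where U is uniform on (0,1] and
    λ is the target transaction arrival rate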

10
TPC-C
  • Based on an order entry application
  • Scope of system similar to TPC-A, but more
    complex, i.e. 9 tables rather than 5, and a
    larger range of queries
  • closer to typical database applications
  • more comprehensive test of DBMS features
  • TPC-A and TPC-B now obsolete

11
TPC-C Schema
(Schema diagram omitted; taken from the TPC-C Standard
Specification, available at www.tpc.org)
12
TPC-C
  • 5 transactions covering:
  • a new order
  • a payment
  • an order status enquiry
  • a delivery
  • a stock level enquiry
  • 10 terminals at each warehouse
  • All 5 transactions available at each terminal
  • terminals produce an equal number of New-Order and
    Payment transactions, plus one Delivery, one
    Order-Status and one Stock-Level transaction for
    every ten New-Order transactions
  • Metric:
  • the number of New-Order transactions executed per
    minute (tpmC)
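For flavour, a simplified sketch of the Stock-Level
transaction's core query (condensed from the TPC-C
specification; :w_id, :d_id, :next_o_id and :threshold
are host variables bound by the application):

    -- Count distinct recently ordered items whose stock is below a threshold
    SELECT COUNT(DISTINCT s_i_id)
    FROM   order_line, stock
    WHERE  ol_w_id = :w_id
      AND  ol_d_id = :d_id
      AND  ol_o_id >= :next_o_id - 20
      AND  ol_o_id <  :next_o_id
      AND  s_w_id  = :w_id
      AND  s_i_id  = ol_i_id
      AND  s_quantity < :threshold;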

13
Other TPC Benchmarks
  • TPC-H
  • Ad-hoc decision support environments
  • TPC-R (now obsolete)
  • Business reporting in decision support
    environments
  • TPC-W (now obsolete)
  • Transactional web benchmark for e-commerce
  • TPC-App
  • Application server and web services benchmark
  • Further info on
  • www.tpc.org

14
AS3AP
  • ANSI SQL Standard Scalable and Portable Benchmark
  • based on Wisconsin
  • a series of two tests
  • single-user: basic features of an RDBMS
  • multi-user: characteristics of DB workloads, e.g.
    parallel and main-memory DB applications
  • metric based on throughput, excluding terminal
    time etc.
  • queries are scalable

15
oo7
  • A benchmark for Object Database Systems
  • An improvement on previous oo1 benchmark
  • Examines the performance characteristics of
    different types of retrieval/traversal, object
    creation/deletion, updates and the query processor
  • A number of sample implementations are provided
  • Based on a complex parts hierarchy
  • Further info
  • ftp.cs.wisc.edu

16
oo7 Tests
  • Test 1
  • Raw traversal speed, traversal with updates,
    operations on manuals
  • Tests with/without full cache
  • Test 2
  • Exact matches, range searches, path lookup, scan,
    make, join
  • Test 3
  • Insert/update a group of composite parts

17
BUCKY
  • An Object-Relational Benchmark
  • Objective
  • To test the key features that put the object into
    object-relational database systems, as defined by
    Stonebraker:
  • inheritance, complex objects, ADTs
  • not triggers
  • Based on a university database schema
  • Exists as an object-relational and a relational
    schema
  • can compare performance tradeoffs between using
    object aspects of DBMS compared to purely
    relational

18
BUCKY Schema
(Schema diagram omitted; taken from The BUCKY
Object-Relational Benchmark, Carey et al.)
19
BUCKY Queries
  • Aim is to test various object queries,
    involving:
  • 1. row types with inheritance
  • 2. inter-object references
  • 3. set-valued attributes
  • 4. methods of row objects
  • 5. ADT attributes and their methods
  • Two BUCKY performance metrics
  • O-R Efficiency Index, for comparing O-R and
    relational implementations
  • O-R Power Rating, for comparing O-R systems
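As an illustrative sketch only (Oracle-style
object-relational syntax with invented names; not the
actual BUCKY workload), a query navigating an
inter-object reference via a REF attribute:

    -- Implicitly dereference the advisor REF from student to professor
    SELECT s.name, s.advisor.name
    FROM   students s
    WHERE  s.advisor.dept = 'Computer Science';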

20
Query Performance
  • Good query performance is necessary for
    acceptable overall performance of an RDBMS
  • Various ways in which this can be achieved
  • De-normalisation of data to reduce joins
  • Creating indexes on frequently retrieved
    attributes
  • Clustering tables to reduce the number of disk
    reads
  • Automatic optimisation of queries
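As a hedged sketch of de-normalisation (using the
hypothetical student/course tables of the example on
the next slides; Oracle-style DDL):

    -- Copy course_name into student so a common lookup avoids the join;
    -- the trade-off is redundant data that must be kept in sync on update
    ALTER TABLE student ADD (course_name VARCHAR2(50));
    UPDATE student s
    SET    s.course_name = (SELECT c.course_name
                            FROM   course c
                            WHERE  c.course_id = s.course_id);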

21
Query Performance
  • Automatic query optimisation can dramatically
    improve query execution time
  • e.g. Consider the simple SQL query:
  • select s.student_no, s.student_name,
    c.course_name
  • from student s, course c
  • where s.course_id = c.course_id
  • and s.age > 25
  • This query executes faster if the selections and
    projections are performed before the join

22
Example
  • Suppose 1000 students, of which only 100 are over
    the age of 25, and 50 courses
  • Alternative 1: join first
  • read the 1000 students, reading all 50 courses
    1000 times (once for each student), and construct
    an intermediate table of 1000 records (which may
    be too large to fit in memory)
  • restrict the result to those over the age of 25
    (100 rows at most)
  • project the result over the required attributes

23
Example
  • Alternative 2: restrict first
  • read the 1000 tuples but restrict to those over
    the age of 25, returning an intermediate table of
    only 100 rows, which has a much better chance of
    fitting in main memory
  • join the result with the course table, again
    returning an intermediate table of only 100 rows
  • project the result over the required attributes
  • Obviously this version is BETTER!
  • Could be improved further by doing the projection
    before the join.
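  • To make the saving concrete (a rough tuple-read
    count under the naive nested-loop reading above,
    ignoring indexes and caching): alternative 1 reads
    1000 + 1000 × 50 = 51,000 tuples before
    restricting, whereas alternative 2 reads
    1000 + 100 × 50 = 6,000, nearly an order of
    magnitude fewer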

24
Query Processing Stages
  • Four stages in query processing
  • 1. Cast the query into internal form
  • normally tree based (relational algebra)
  • 2. Convert to canonical form
  • 3. Choose candidate low-level procedures
  • using indexes, clustering, etc.
  • 4. Generate query plans and choose and run
    the optimal query
  • based on cost formulas and database statistics
  • rule- or cost-based in Oracle
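As a hedged illustration of stage 4 in practice, Oracle
can be asked to record and display the plan it chooses
for the earlier student/course query:

    -- Record the chosen plan, then display it
    EXPLAIN PLAN FOR
      SELECT s.student_no, s.student_name, c.course_name
      FROM   student s, course c
      WHERE  s.course_id = c.course_id
      AND    s.age > 25;

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);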

25
Query Cast into Internal Form
RESULT
 └─ PROJECT over student_no, student_name, course_name
     └─ RESTRICT where age > 25
         └─ JOIN over course_id
             ├─ S
             └─ C
26
Canonical Form
  • Canonical form
  • given a set Q of queries, and a notion of
    equivalence between two queries q1 and q2 in Q, a
    subset C of Q is a set of canonical forms for Q if
    and only if every query q in Q is equivalent to
    exactly one query c in C
  • the query c is the canonical form of the query q
  • Uses expression transformation rules

27
Expression Transformation Rules
  • Examples (not complete):
  • (A WHERE p1) WHERE p2 ≡ A WHERE p1 AND p2
  • (A PROJECT x,y) PROJECT y ≡ A PROJECT y
  • (A UNION B) PROJECT x ≡ (A PROJECT x) UNION
    (B PROJECT x)
  • (A JOIN B) PROJECT x ≡ (A PROJECT x1) JOIN
    (B PROJECT x2)
  • A JOIN B ≡ B JOIN A
  • (A JOIN B) JOIN C ≡ A JOIN (B JOIN C)
  • (A JOIN B) PROJECT x ≡ A PROJECT x,
    where x is a FK from B to A
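For instance, the first rule corresponds to the
following SQL equivalence (a sketch on the student
table from earlier; the course_id value is invented):

    -- (A WHERE p1) WHERE p2  ≡  A WHERE p1 AND p2
    SELECT * FROM (SELECT * FROM student WHERE age > 25) t
    WHERE  t.course_id = 7;
    -- is equivalent to, and typically rewritten as
    SELECT * FROM student
    WHERE  age > 25 AND course_id = 7;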

28
Choose Candidate Low-Level Procedures
  • Decide how to execute the query represented by
    the converted form
  • Take into consideration
  • Indexes
  • Other physical access paths
  • Distribution of data values
  • Clustering
  • Specify as a series of low-level operations
  • Each low level operation has a set of predefined
    implementation procedures

29
Generate Query Plans/Choose Cheapest
  • Each query plan from stage 3 has a cost formula,
    built from the cost formulas of its low-level
    procedures
  • Oracle supports
  • Rule Based
  • Rank queries according to algebra operations
  • 15 rules in Oracle
  • Cost Based
  • the plan ranked best by the rules may not in fact
    be the cheapest to execute, e.g. because of join
    order
  • need to gather statistics
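The cost-based optimiser needs those statistics
gathered explicitly, e.g. in Oracle (a minimal sketch;
STUDENT is the running example's table):

    -- Gather table, column and index statistics for the cost-based optimiser
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname => USER,
        tabname => 'STUDENT',
        cascade => TRUE);  -- also gather statistics on the table's indexes
    END;
    /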

30
Indexes
  • Indexes can be created on relations to increase
    search/access performance
  • Types of index
  • Primary
  • Clustering
  • Secondary
  • Can be sparse or dense
  • Most DBMSs use B-trees to hold indexes
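For example (a minimal sketch on the running student
table; index names are invented):

    -- Secondary B-tree index on a frequently searched attribute
    CREATE INDEX idx_student_age ON student (age);

    -- Composite index covering the join attribute plus a filter attribute
    CREATE INDEX idx_student_course_age ON student (course_id, age);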

31
Summary
  • Have looked at methods for performance
    optimisation
  • benchmarking is a tool for evaluating and
    comparing performance of real systems
  • the oo7 benchmark is to ODBMSs what Wisconsin is
    to RDBMSs
  • RDBMS systems normally include query optimisers
    to attempt to convert user queries into optimal
    queries
  • indexes and clustering can significantly improve
    query performance

32
Further Reading
  • Bitton, DeWitt and Turbyfill, Benchmarking
    Database Systems: A Systematic Approach, Proc. 9th
    VLDB, 1983.
  • Carey et al., The BUCKY Object-Relational
    Benchmark,
  • http://www.cs.wisc.edu/~naughton/bucky.html
  • Date, chapter on Query Optimisation
  • Connolly and Begg, the OODB chapter has
    information on benchmarking and indexes