SIGMOD 97 Industrial Session 5

1
SIGMOD 97 Industrial Session 5
  • Standard Benchmarks for Database Systems
  • Chair: Dave DeWitt
  • (Jim Gray, in absentia)

2
TPC-CThe OLTP Benchmark
  • Charles Levine
  • Microsoft
  • clevine@microsoft.com

3
Benchmarks: What and Why
  • What is a benchmark?
  • Domain specific
  • No single metric possible
  • The more general the benchmark, the less useful
    it is for anything in particular.
  • A benchmark is a distillation of the essential
    attributes of a workload
  • Desirable attributes
  • Relevant ⇒ meaningful within the target domain
  • Understandable
  • Good metric(s) ⇒ linear, orthogonal, monotonic
  • Scaleable ⇒ applicable to a broad spectrum of
    hardware/architecture
  • Coverage ⇒ does not oversimplify the typical
    environment
  • Acceptance ⇒ vendors and users embrace it

4
Benefits and Liabilities
  • Good benchmarks
  • Define the playing field
  • Accelerate progress
  • Engineers do a great job once objective is
    measurable and repeatable
  • Set the performance agenda
  • Measure release-to-release progress
  • Set goals (e.g., 10,000 tpmC, < $50/tpmC)
  • Something managers can understand (!)
  • Benchmark abuse
  • Benchmarketing
  • Benchmark wars
  • more $ spent on ads than on development

5
Benchmarks have a Lifetime
  • Good benchmarks drive industry and technology
    forward.
  • At some point, all reasonable advances have been
    made.
  • Benchmarks can become counter productive by
    encouraging artificial optimizations.
  • So, even good benchmarks become obsolete over
    time.

6
What is the TPC?
  • TPC = Transaction Processing Performance Council
  • Founded in Aug/88 by Omri Serlin and 8 vendors.
  • Membership of 40-45 for last several years
  • Everybody who's anybody in software & hardware
  • De facto industry standards body for OLTP
    performance
  • Administered by Shanley Public Relations
    777 N. First St., Suite 600
    San Jose, CA 95112-6311
    ph: (408) 295-8894, fax: (408) 295-9768
    email: td@tpc.org
  • Most TPC specs, info, and results are on the web
    page: http://www.tpc.org

7
TPC-C Overview
  • Moderately complex OLTP
  • The result of 2 years of development by the TPC
  • Application models a wholesale supplier managing
    orders.
  • Order-entry provides a conceptual model for the
    benchmark; underlying components are typical of
    any OLTP system.
  • Workload consists of five transaction types.
  • Users and database scale linearly with
    throughput.
  • Spec defines full-screen end-user interface.
  • Metrics are new-order txn rate (tpmC) and
    price/performance ($/tpmC)
  • Specification was approved July 23, 1992.

8
TPC-C's Five Transactions
  • OLTP transactions
  • New-order: enter a new order from a customer
  • Payment: update customer balance to reflect a
    payment
  • Delivery: deliver orders (done as a batch
    transaction)
  • Order-status: retrieve status of customer's most
    recent order
  • Stock-level: monitor warehouse inventory
  • Transactions operate against a database of nine
    tables.
  • Transactions do update, insert, delete, and
    abort; primary and secondary key access.
  • Response time requirement: 90% of each type of
    transaction must have a response time ≤ 5
    seconds, except Stock-Level, which is ≤ 20 seconds.

9
TPC-C Database Schema
[Schema diagram: the nine TPC-C tables and their
one-to-many relationships; cardinalities scale with
the number of warehouses, except Item 100K (fixed)]
10
TPC-C Workflow
1. Select txn from menu:
   1. New-Order 45%
   2. Payment 43%
   3. Order-Status 4%
   4. Delivery 4%
   5. Stock-Level 4%
2. Measure menu Response Time
   Input screen
   Keying time
3. Measure txn Response Time
   Output screen
   Think time
   Go back to 1
Cycle Time Decomposition (typical values, in
seconds, for weighted average txn):
   Menu                0.3
   Keying              9.6
   Txn RT              2.1
   Think              11.4
   Average cycle time 23.4
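This cycle is where the 1.2 tpmC-per-user ceiling (slide 16) comes from: a user completes at most about 2.6 cycles per minute, and only 45% of them are New-Orders. A minimal sketch of the arithmetic, assuming the typical values shown above:

```python
# Per-user throughput ceiling implied by the typical cycle time above.
menu, keying, txn_rt, think = 0.3, 9.6, 2.1, 11.4  # seconds
cycle = menu + keying + txn_rt + think             # 23.4 s per transaction
txns_per_min = 60 / cycle                          # ~2.56 txns/min per user
tpmC_per_user = txns_per_min * 0.45                # New-Order is 45% of mix
print(f"~{tpmC_per_user:.2f} tpmC per user")       # ~1.15, i.e. roughly 1.2
```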
11
Data Skew
  • NURand - Non Uniform Random
  • NURand(A,x,y) (((random(0,A) random(x,y))
    C) (y-x1)) x
  • Customer Last Name NURand(255, 0, 999)
  • Customer ID NURand(1023, 1, 3000)
  • Item ID NURand(8191, 1, 100000)
  • bitwise OR of two random values
  • skews distribution toward values with more bits
    on
  • 75 chance that a given bit is one (1 - ½ ½)
  • skewed data pattern repeats with period of
    smaller random number
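A minimal Python sketch of NURand as defined above; per the spec, C is a run-time constant chosen once per field, so the value used here is an arbitrary illustrative choice:

```python
import random

# NURand(A, x, y): OR two uniform randoms (biasing bits toward 1),
# add the per-field constant C, and fold the result back into [x, y].
def nurand(A: int, x: int, y: int, C: int = 123) -> int:
    return (((random.randint(0, A) | random.randint(x, y)) + C) % (y - x + 1)) + x

# Example: skewed customer IDs in [1, 3000].
ids = [nurand(1023, 1, 3000) for _ in range(5)]
```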

12
NURand Distribution
13
ACID Tests
  • TPC-C requires transactions to be ACID.
  • Tests are included to demonstrate that the ACID
    properties are met.
  • Atomicity
  • Verify that all changes within a transaction
    commit or abort.
  • Consistency
  • Isolation
  • ANSI Repeatable reads for all but Stock-Level
    transactions.
  • Committed reads for Stock-Level.
  • Durability
  • Must demonstrate recovery from
  • Loss of power
  • Loss of memory
  • Loss of media (e.g., disk crash)

14
Transparency
  • TPC-C requires that all data partitioning be
    fully transparent to the application code. (See
    TPC-C Clause 1.6)
  • Both horizontal and vertical partitioning are
    allowed
  • All partitioning must be hidden from the
    application
  • Most DBMSs do this today for single-node
    horizontal partitioning.
  • Multiple-node transparency is much harder.
  • For example, in a two-node cluster

Any DML operation must be able to operate against
the entire database, regardless of physical
location.
15
Transparency (cont.)
  • How does transparency affect TPC-C?
  • Payment txn: 15% of Customer table records are
    non-local to the home warehouse.
  • New-order txn: 1% of Stock table records are
    non-local to the home warehouse.
  • In a distributed cluster, the cross-warehouse
    traffic causes cross-node traffic and either
    two-phase commit, distributed lock management, or
    both.
  • For example, with distributed txns:

    Number of nodes    Network Txns
          1                 0%
          2               5.5%
          3               7.3%
          n              10.9%
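These values can be roughly rederived from the remote-access rates above and the transaction mix (Payment 43%, New-Order 45% with about 10 items per order). A back-of-the-envelope sketch, assuming a remote record lands on another node with probability (n-1)/n:

```python
# Approximate fraction of TPC-C txns that cross nodes in an n-node cluster.
def network_txn_fraction(n: int) -> float:
    payment = 0.43 * 0.15                # Payment share x 15% remote customers
    new_order = 0.45 * (1 - 0.99 ** 10)  # P(>=1 of ~10 stock items is remote)
    return (payment + new_order) * (n - 1) / n

for n in (1, 2, 3, 1000):                # large n stands in for "n" in the table
    print(n, f"{100 * network_txn_fraction(n):.1f}%")
# 0.0%, 5.4%, 7.2%, 10.7% -- close to the table's 0 / 5.5 / 7.3 / 10.9
```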
16
TPC-C Rules of Thumb
  • 1.2 tpmC per User/terminal (maximum)
  • 10 terminals per warehouse (fixed)
  • 65-70 MB/tpmC priced disk capacity (minimum)
  • 0.5 physical IOs/sec/tpmC (typical)
  • 250-700 KB main memory/tpmC (how much do you
    have?)
  • So, use rules of thumb to size a 10,000 tpmC
    system (worked sketch after this list)
  • How many terminals?
  • How many warehouses?
  • How much memory?
  • How much disk capacity?
  • How many spindles?
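A worked sketch answering those questions with the rules of thumb above; the per-spindle I/O rate is an assumption about typical disks of the period, not part of the slide:

```python
# Sizing a 10,000 tpmC system from the rules of thumb above.
tpmC = 10_000
terminals  = tpmC / 1.2                  # ~8,333 terminals (1.2 tpmC each, max)
warehouses = terminals / 10              # ~833 warehouses (10 terminals each)
memory_gb  = (250e3 * tpmC / 1e9,        # 2.5 GB ...
              700e3 * tpmC / 1e9)        # ... to 7 GB (250-700 KB/tpmC)
disk_gb    = 65e6 * tpmC / 1e9           # ~650 GB priced capacity (minimum)
io_per_sec = 0.5 * tpmC                  # ~5,000 physical IOs/sec
spindles   = io_per_sec / 60             # ~83 disks, assuming ~60 IOs/sec each
```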

17
Typical TPC-C Configuration (Conceptual)
[Diagram: typical three-tier TPC-C test configuration]
  • Driver System (hardware): RTE (e.g., Empower,
    preVue, LoadRunner) emulates the user load.
    Response time is measured here.
  • Term. LAN connects the driver to the clients.
  • Client (presentation services): TPC-C application,
    txn monitor and/or RPC library (e.g., Tuxedo,
    ODBC).
  • C/S LAN connects the clients to the server.
  • Server (database functions, software): TPC-C
    application (stored procedures), database engine,
    txn monitor (e.g., SQL Server, Tuxedo).
18
Competitive TPC-C Configuration Today
  • 8,070 tpmC, $57.66/tpmC, 5-yr COO = $465K
  • 2 GB memory; disks: 37 x 4 GB + 48 x 9.1 GB
    (560 GB total)
  • 6,700 users

19
TPC-C Current Results
  • Best Performance is 30,390 tpmC @ $305/tpmC
    (Digital)
  • Best Price/Perf. is 7,693 tpmC @ $42.53/tpmC
    (Dell)

TPC-C results as of 5/9/97
20
TPC-C Results (by OS)
TPC-C results as of 5/9/97
21
TPC-C Results (by DBMS)
TPC-C results as of 5/9/97
22
Analysis from 30,000 ft.
  • Unix results are 2-3x more expensive than NT.
  • Doesn't matter which DBMS
  • Unix results are more scalable
  • Unix: 10-, 12-, 16-, 24-way SMPs
  • NT: 4-way SMP w/ Intel; 8-way SMP on Digital
    Alpha
  • Highest performance is on clusters
  • only a few results (trophy numbers?)

23
TPC-C Summary
  • Balanced, representative OLTP mix
  • Five transaction types
  • Database intensive; substantial I/O and cache load
  • Scaleable workload
  • Complex data: data attributes, size, skew
  • Requires Transparency and ACID
  • Full screen presentation services
  • De facto standard for OLTP performance

24
Reference Material
  • TPC Web site: www.tpc.org
  • TPC Results Database: www.microsoft.com/sql/tpc
  • IDEAS web site: www.ideasinternational.com
  • Jim Gray, The Benchmark Handbook for Database and
    Transaction Processing Systems, Morgan Kaufmann,
    San Mateo, CA, 1991.
  • Raj Jain, The Art of Computer Systems Performance
    Analysis: Techniques for Experimental Design,
    Measurement, Simulation, and Modeling, John Wiley
    & Sons, New York, 1991.
  • William Highleyman, Performance Analysis of
    Transaction Processing Systems, Prentice Hall,
    Englewood Cliffs, NJ, 1988.

25
TPC-D: The Industry Standard Decision Support
Benchmark
  • Jack Stephens
  • Informix
  • jms@informix.com

26
Outline
  • Overview
  • The Database
  • The Queries
  • The Execution Rules
  • The Results
  • Early Lessons
  • The Future

27
TPC-D Overview
  • Complex Decision Support workload
  • The result of 5 years of development by the TPC
  • Benchmark models ad hoc queries
  • extract database with concurrent updates
  • multi-user environment
  • Specification was approved April 5, 1995.

[Diagram: pyramid with OLTP Transactions supporting
Business Operations at the base, and DSS Queries
supporting Business Analysis at the top]
28


Outline
  • Overview
  • The Database
  • The Queries
  • The Execution Rules
  • The Results
  • Early Lessons
  • The Future

29
TPC-D Schema
  Region    5
  Nation    25
  Supplier  SF x 10K
  Customer  SF x 150K
  Part      SF x 200K
  PartSupp  SF x 800K
  Order     SF x 1,500K
  LineItem  SF x 6,000K
  Time      2,557 (optional; so far not used by anyone)
Legend: Arrows in the original diagram point in the
direction of one-to-many relationships. The value
after each table name is its cardinality. SF is the
Scale Factor.
30
Schema Usage
31
TPC-D Database Scaling and Load
  • Database size is determined from fixed Scale
    Factors (SF)
  • 1, 10, 30, 100, 300, 1000, 3000, 10000 (note
    that 3 is missing, not a typo)
  • These correspond to the nominal database size in
    GB (i.e., SF = 10 is approx. 10 GB, not
    including indexes and temp tables).
  • Indices and temporary tables can significantly
    increase the total disk capacity (3-5x is
    typical; see the sketch after this list).
  • Database is generated by DBGEN
  • DBGEN is a C program which is part of the TPC-D
    spec.
  • Use of DBGEN is strongly recommended.
  • TPC-D database contents must be exact.
  • Database Load time must be reported
  • Includes time to create indexes and update
    statistics.
  • Not included in primary metrics.
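To make the disk sizing concrete, a small sketch combining the nominal SF sizes with the 3-5x expansion factor noted above:

```python
# Nominal raw size is ~SF gigabytes; indexes and temp space typically
# push total disk to 3-5x that (per the slide above).
for sf in (1, 10, 30, 100, 300, 1000):
    print(f"SF {sf}: raw ~{sf} GB, total disk ~{3 * sf}-{5 * sf} GB")
```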

32

Outline
  • Overview
  • The Database
  • The Queries
  • The Execution Rules
  • The Results
  • Early Lessons
  • The Future

33
TPC-D Query Set
  • 17 queries written in SQL-92 to implement
    business questions.
  • Queries are pseudo ad hoc:
  • QGEN replaces substitution parameters with
    random constants
  • No host variables
  • No static SQL
  • Queries cannot be modified -- SQL as written
  • There are some minor exceptions.
  • All variants must be approved in advance by the
    TPC

34
Sample Query Definition
  • 2.3 Forecasting Revenue Change Query (Q6): This
    query quantifies the amount of revenue increase
    that would have resulted from eliminating
    company-wide discounts in a given percentage
    range in a given year. Asking this type of "what
    if" query can be used to look for ways to
    increase revenues.
  • 2.3.1 Business Question: The Forecasting Revenue
    Change Query considers all the lineitems shipped
    in a given year with discounts between
    DISCOUNT-0.01 and DISCOUNT+0.01. The query lists
    the amount by which the total revenues would have
    decreased if these discounts had been eliminated
    for lineitems with item quantities less than
    QUANTITY. Note that the potential revenue
    increase is equal to the sum of (L_EXTENDEDPRICE
    * L_DISCOUNT) for all lineitems with quantities
    and discounts in the qualifying range.
  • 2.3.2 Functional Query Definition:
    SELECT SUM(L_EXTENDEDPRICE * L_DISCOUNT) AS REVENUE
    FROM LINEITEM
    WHERE L_SHIPDATE >= DATE '[DATE]'
      AND L_SHIPDATE < DATE '[DATE]' + INTERVAL '1' YEAR
      AND L_DISCOUNT BETWEEN [DISCOUNT] - 0.01
          AND [DISCOUNT] + 0.01
      AND L_QUANTITY < [QUANTITY]
  • 2.3.3 Substitution Parameters: Values for the
    following substitution parameters must be
    generated and used to build the executable query
    text.
  • 1. DATE is the first of January of a randomly
    selected year within [1993 .. 1997]
  • 2. DISCOUNT is randomly selected within
    [0.02 .. 0.09]
  • 3. QUANTITY is randomly selected within [24 .. 25]
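For illustration only, the parameter generation QGEN performs for Q6 amounts to the following, using the ranges in 2.3.3 (the real QGEN is a C tool that ships with the spec):

```python
import random

# Illustrative Q6 substitution-parameter generation (ranges from 2.3.3).
year     = random.randint(1993, 1997)
date     = f"{year}-01-01"               # the first of January of that year
discount = random.randint(2, 9) / 100    # 0.02 .. 0.09
quantity = random.randint(24, 25)        # 24 .. 25
```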

35
Sample Query Definition (cont.)
  • 2.3.4 Query Validation: For validation against
    the qualification database, the query must be
    executed using the following values for the
    substitution parameters and must produce the
    following output.
    Values for substitution parameters:
    1. DATE = 1994-01-01
    2. DISCOUNT = 0.06
    3. QUANTITY = 24
    Query validation output data: 1 row returned,
    REVENUE = 11450588.04
  • Query validation demonstrates the integrity of an
    implementation
  • Query phrasings are run against a 100 MB data set
  • Data set must mimic the design of the test
    database
  • Answer sets must match those in the
    specification almost exactly
  • If the answer sets don't match, the benchmark is
    invalid!

36
Query Variations
  • Formal Query Definitions are ISO SQL-92
  • EQT (executable query text) must match, except
    for Minor Query Modifications:
  • Date/time syntax; AS clauses
  • Table naming conventions; ordinal Group By/Order
    By
  • Statement terminators; coding style (i.e., white
    space)
  • Any other phrasing must be a Pre-Approved Query
    Variant
  • Variants must be justifiable based on criteria
    similar to 0.2
  • Approved variants are included in the
    specification
  • An implementation may use any combination of
    Pre-Approved Variants, Formal Query Definitions,
    and Minor Query Modifications.

37
TPC-D Update Functions
  • Update 0.1% of data per query stream
  • About as long as a medium-sized TPC-D query
  • Implementation of updates is left to the sponsor,
    except:
  • ACID properties must be maintained
  • The update functions must be a set of logically
    consistent transactions
  • New Sales Update Function (UF1)
  • Insert new rows into ORDER and LINEITEM tables,
    equal to 0.1% of table size
  • Old Sales Update Function (UF2)
  • Delete rows from ORDER and LINEITEM tables, equal
    to 0.1% of table size

38

Outline
  • Overview
  • The Database
  • The Queries
  • The Execution Rules
  • The Results
  • Early Lessons
  • The Future

39
TPC-D Execution Rules
  • Power Test
  • Queries submitted in a single stream (i.e., no
    concurrency)
  • Each Query Set is a permutation of the 17
    read-only queries
  • Sequence: [diagram]
  • Throughput Test
  • Multiple concurrent query streams
  • Single update stream
  • Sequence: [diagram]

40
TPC-D Execution Rules (cont.)
  • Load Test
  • Measures the time to go from an empty database to
    reproducible query runs
  • Not a primary metric; appears on the executive
    summary
  • Sequence: [diagram]

41
TPC-D Metrics
  • Power Metric (QppD)
  • Geometric mean
  • Throughput Metric (QthD)
  • Arithmetic mean
  • Both metrics represent Queries per Gigabyte-Hour
    (see the formulas below)
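As a sketch of the definitions (paraphrased from the TPC-D spec), with QI(i) and UI(j) the measured power-test times for the 17 queries and 2 update functions, S the number of query streams, and T_s the throughput-test elapsed time in seconds:

$$\mathrm{QppD@Size} = \frac{3600 \cdot SF}{\Bigl(\prod_{i=1}^{17} QI(i) \cdot \prod_{j=1}^{2} UI(j)\Bigr)^{1/19}}
\qquad
\mathrm{QthD@Size} = \frac{17 \cdot S \cdot 3600}{T_s} \cdot SF$$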

42
TPC-D Metrics (cont.)
  • Composite Query-Per-Hour Rating (QphD)
  • The Power and Throughput metrics are combined to
    get the composite queries per hour (see below).
  • Reported metrics are:
  • Power: QppD@Size
  • Throughput: QthD@Size
  • Price/Performance: $/QphD@Size
  • Comparability
  • Results within a size category (SF) are
    comparable.
  • Comparisons among different size databases are
    strongly discouraged.
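Specifically, the composite is the geometric mean of the two primary metrics:

$$\mathrm{QphD@Size} = \sqrt{\mathrm{QppD@Size} \times \mathrm{QthD@Size}}$$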

43

Outline
  • Overview
  • The Database
  • The Queries
  • The Execution Rules
  • The Results
  • Early Lessons
  • The Future

44
Disclosure Requirements
  • All results must comply with standard TPC
    disclosure policies
  • Results must be reviewed by a TPC auditor
    certified for TPC-D
  • A Full Disclosure Report and Executive Summary
    must be on file with the TPC before a result is
    publicly announced
  • All results are subject to standard TPC review
    policies
  • Once filed, results are In Review for sixty days
  • While in review, any member company may file a
    challenge against a result that they think failed
    to comply with the specification
  • All challenges and compliance issues are handled
    by the TPC's judiciary, the Technical Advisory
    Board (TAB), and affirmed by the membership

45
TPC-D Current Results
Performance
Price/Performance
TPC-D results as of 5/9/97
46

Outline
  • Overview
  • The Database
  • The Queries
  • The Execution Rules
  • The Results
  • Early Lessons
  • The Future

47
Do good, Do well and TO-DO
  • First, the good news
  • TPC-D has improved products
  • First real quantification of optimizer
    performance for some vendors
  • TPC-D has increased competition
  • Then, some areas that bear watching:
  • Workload is maturing: indexing and query fixes
    are giving way to engineering
  • SMP/MPP price barrier is disappearing, but so is
    some of the performance difference
  • Meta-knowledge of the data is becoming critical:
    better stats, smarter optimizers, wiser data
    placement

48
Things we missed...
  • And finally, the trouble spots:
  • No metric will please customers, engineers, and
    marketing managers
  • TPC-D has failed to showcase multi-user decision
    support
  • No results yet on 10 GB or 30 GB
  • Decision support is moving faster than the TPC:
    OLAP, data marts, data mining, SQL3, ADTs,
    "Universal" servers (IBM, Informix, Oracle)

49

Outline
  • Overview
  • The Database
  • The Queries
  • The Execution Rules
  • The Results
  • Early Lessons
  • The Future

50
TPC-D, version 2 Overview
  • Goal: define a workload to take over for TPC-D
    1.x in time with its lifecycle (2 years from now)
  • Two areas of focus
  • Address the known deficiencies of the 1.x
    specification
  • Introduce data skew
  • Require multi-user executions
  • What number of streams is interesting?
  • Should updates scale with users? with data
    volume?
  • Broaden the scope of the query set and data set
  • Snowstorm schema
  • Larger query set
  • Batch and Trickle update models

51
An extensible TPC workload?
  • Make TPC-D extensible
  • Three types of extensions
  • Query: new question on the same schema
  • Schema: new representations and queries on the
    same data
  • Data: new data types and operators
  • Simpler adoption model than full specification
  • Mini-spec presented by three sponsors
  • Eval period for prototype/refinement (Draft
    status)
  • Acceptance as an extension
  • Periodic review for renewal, removal or promotion
    to base workload
  • The goal is an adaptive workload, more responsive
    to the market and more inclusive of new
    technology, without losing comparability or
    relevance

52
Want to learn more about TPC-D?
  • TPC WWW site: www.tpc.org
  • The latest specification, tools, and results
  • The version 2 white paper
  • TPC-D Training Video
  • Six-hour video by the folks who wrote the spec.
  • Explains, in detail, all major aspects of the
    benchmark.
  • Available from the TPC (Shanley Public Relations):
    777 N. First St., Suite 600
    San Jose, CA 95112-6311
    ph: (408) 295-8894, fax: (408) 295-9768
    email: td@tpc.org