Title: Administering SQL Server 2000: VLDB Best Practices
1. Database Indexes, BLOBs, and VLDB Best Practices
Bernt Castman, KTH Kista
Ryan Trout, Microsoft Corporation
2. What Is an Index?
- A specially structured data object that makes data easier to access
- Stored together with the data, or separately
- Normally maintained by the system (full-text indexes excepted)
- Without an index, all data access must go through table scans
3. What Is an Index? (cont.)
- Using an index can reduce the number of I/O operations, speed up other operations as well, and reduce the number of blocks and table locks
- If a query accesses more than 20% of the rows in a table, a table scan is better than an index. This also holds when many rows share the same value in the index column
4. Relationship Between an Index and a Table
5. Where Indexes Help Most
- Queries that specify a narrow search criterion, so the answer is only a few rows
- Queries that specify a range of values, so the answer is only a few rows
- Lookups within a JOIN; columns frequently used in JOINs are good index candidates
- Lookups that want data in a specific order without having to sort; with an index on (last name, first name), sorted output does not require ORDER BY
6. Where Indexes Help Most (cont.)
- Index in moderation
- Do not index small tables
- Use "covering queries," where the desired answer is contained entirely within the index pages (more on this later)
7. How Are Indexes Organized?
- Indexes are usually built as tree structures
- The root node is the start of the index
  - It contains a number of index rows
  - These rows contain keys and a pointer to an index page
- The tree can have several index levels
8. Index Updates
- The index key is the column or columns of a table used to generate the index
- Indexes are updated continuously
- Inserting a new data row into the table, for example, may require an extensive change to the index tree
- Because of the index's structure (fixed length, etc.), updates are fast even when space must be created for a new index page
- Creating that space is called a page split, and it often cascades through the rest of the tree
- Page splits can be reduced by balancing the index with a tuned fill factor
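A tuned fill factor is set when the index is created (or rebuilt). A minimal sketch in T-SQL; the Customers table and LastName column are illustrative names, not from the deck:

```sql
-- Leave 20% free space on each leaf page so new rows can land on
-- existing pages instead of forcing page splits.
CREATE INDEX IX_Customers_LastName
ON Customers (LastName)
WITH FILLFACTOR = 80
```

A lower fill factor trades read density (more pages to scan) for fewer page splits during inserts; the right value depends on the table's insert rate.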
9. Simple Index
- Defined on a single column of a table
- Referenced through the SQL WHERE clause
- If the chosen column holds the same value in many rows (e.g., "state"), the index will not be particularly effective
10. Composite Index
- A composite index is defined on more than one column, e.g., state, street, city
- The data can be addressed using different indexes
- May consist of at most 16 columns with at most 900 bytes of key data
- Queries against a composite index do not need to place every index key in the WHERE clause
11. Example
- An index is created on columns a, b, and c of a table
- It can be searched with SELECT ... WHERE
  - Equality on a AND b AND c is fastest
  - Equality on a AND b is next
  - Equality on a alone always gives an index scan
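The leading-column rule above can be sketched as follows (the table T and columns a, b, c are illustrative):

```sql
CREATE INDEX IX_T_abc ON T (a, b, c)

-- Fastest: all three leading keys are constrained
SELECT * FROM T WHERE a = 1 AND b = 2 AND c = 3

-- Still a targeted lookup on the (a, b) prefix
SELECT * FROM T WHERE a = 1 AND b = 2

-- Only the first key: the index is scanned over the range of a
SELECT * FROM T WHERE a = 1

-- No leading key: the index on (a, b, c) cannot be used to seek
SELECT * FROM T WHERE b = 2
```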
12. Index Scan in the Real World: A Customer Location Table
- Suppose we have a table that contains information about the location of customers of our business. An index is created on the columns state, county, and city and is stored in the tree structure in the following order: state, county, city.
- If a query specifies the state column value as Texas in the WHERE clause, the index will be used. Because values for the county and city columns are not given in the query, the index will return a number of rows based on all of the index records that contain Texas as the state column value.
- An index scan is used to retrieve a range of index pages and then a number of data pages based on values in the state column.
- The index pages are read sequentially in the same way that a table scan accesses data pages.
13. Index Uniqueness
- A uniquely defined index is always unique. If the column the index covers is unique, the index will be unique
- Fast and efficient, since only one additional I/O is needed to find the data. An attempt to insert a duplicate key value produces an error
- A non-unique index works the same way, but here duplicate index entries with the same value in the data column are allowed
- Used when a unique index cannot be created
14. Clustered Indexes
- A clustered index stores the actual row data in its own leaf nodes
  - As soon as the right entry is found, the data can be read without further I/O
  - Higher performance
  - All data is kept in index-sorted order
  - With a clustered index on state/county/city, a search on state also returns rows sorted by county and city
- Reaching other data in the table requires one additional access
- Only one clustered index per table
15. Nonclustered Indexes
- A nonclustered index points to the data in a table
- At most 249 nonclustered indexes per table
- Indexes can be created via the Create Index Wizard
- In SQL:
  CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED]
  INDEX index_name ON table_name
  (column_name [, column_name, ...])
  [WITH options] [ON filegroup_name]
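Filling in the syntax skeleton above, a minimal sketch (the CustomerLocation table and its columns are assumed names):

```sql
-- One clustered index keeps the rows in (state, county, city) order...
CREATE CLUSTERED INDEX IX_Location
ON CustomerLocation (state, county, city)

-- ...and up to 249 nonclustered indexes can point into the data.
CREATE UNIQUE NONCLUSTERED INDEX IX_CustomerId
ON CustomerLocation (customerId)
```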
16. Full-Text Indexes
- More like a catalog; not a tree structure
- A full-text index must contain a column that uniquely identifies each row
- One full-text index per table
- Not updated automatically by the system
- Often more than a text-search engine: phrases, groups of words, ...
17. Examples
- SELECT * FROM Customers WHERE ContactName LIKE 'PETE%'
- SELECT * FROM Customers WHERE CONTAINS(ContactName, ' "PETE*" ')
- The CONTAINS predicate can find text strings that contain the desired string, such as "PETER" or "PETEY," by using the full-text index.
- SELECT CategoryName FROM Categories WHERE FREETEXT(Description, 'Sweets candy bread')
- This query might find category names containing such words as "sweetened," "candied," or "breads."
18. Hints
- When the Query Optimizer generates a query execution plan, it chooses an index based on which index will provide the best performance: usually the index that will use the fewest I/O operations and retrieve the fewest rows.
- Although Query Optimizer usually chooses the most efficient query execution plan and access path for your query, you might be able to do better if you know more about your data than Query Optimizer does. For example, suppose you want to retrieve data about a person named "Smith" from a table with a column listing last names. Index statistics generalize based on a column. Suppose the statistics show that each last name appears three times on average in the column. This information provides fairly good selectivity; however, you know that the name "Smith" appears much more often than average. If you have an idea about how to better conduct an SQL operation, you can use a hint. A hint is simply advice you give to Query Optimizer specifying that it should not make the automatic choice.
19. Hints (cont.)
- Table scan: In some cases, you might decide that a table scan would be more efficient than an index lookup or an index scan. A table scan is more efficient when the index scan would retrieve more than 20 percent of the rows in the table, such as when 70 percent of the data is highly selectable and the other 30 percent is "Smith."
- Which index to use: You can specify a particular index to be the only index considered. You might not know which index SQL Server Query Optimizer would choose without your hint, but you feel the hinted index will perform best.
- Which group of indexes to select from: You can suggest several indexes to Query Optimizer, and it will use all of them (ignoring duplicates). This option is useful when you know that a set of indexes will work well.
- Locking method: You can tell Query Optimizer which type of lock to use when it is accessing the data from a particular table. If you feel that the wrong type of lock might be chosen for this table, you can specify that Query Optimizer should use a row lock, a page lock, or a table lock.
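In T-SQL these hints appear as table hints in the FROM clause. A sketch, assuming a hypothetical LastNames table with an index IX_LastName:

```sql
-- Force a table scan when we know "Smith" is far more common
-- than the index statistics suggest (INDEX(0) means scan)
SELECT * FROM LastNames WITH (INDEX(0))
WHERE surname = 'Smith'

-- Force a specific index to be the only one considered...
SELECT * FROM LastNames WITH (INDEX(IX_LastName))
WHERE surname = 'Rarename'

-- ...or specify the locking granularity for this table
SELECT * FROM LastNames WITH (TABLOCK)
WHERE surname = 'Smith'
```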
20. Hash Files
- The rows of a table are not stored sequentially on disk
- A hash function computes the address based on one or more columns of the row
- How do you guarantee a unique address?
- Prerequisite: you know something about which values can appear in a column
- An overflow area is needed
21. Example: Swedish Personal Identity Numbers
- Year: all values 1800-2001 are possible
- Month: 1-12
- Day: 1-31
- Serial number: 0001-9999
- What does a corporate registration number look like? 556380-4524; that is, the month-and-day pattern breaks the hash formula
- Solution: an overflow area; when the hash computation yields the same index as another row, the rows are stored there sequentially
22. Example
- A hash cluster (Oracle syntax) can be declared as:
  CREATE CLUSTER PropertyCluster
  (propertyNo VARCHAR2(5))
  HASH IS propertyNo HASHKEYS 300000
- Once the cluster has been created, we can create a table in it, e.g.:
  CREATE TABLE PropertyForRent
  (propertyNo VARCHAR2(5) PRIMARY KEY, .....)
  CLUSTER PropertyCluster (propertyNo)
23. Using Bulk Copy Program (BCP)
- BCP is an external program provided with Microsoft SQL Server 2000 to facilitate the loading of data files into a database. BCP can also be used to copy data out of a SQL Server table into a data file.
- Using the BULK INSERT command: The BULK INSERT Transact-SQL (T-SQL) command lets you copy large quantities of data from a data file into a SQL Server table, from within SQL Server. Because this command is an SQL statement (run within ISQL, OSQL, or Query Analyzer), the process runs as a SQL Server thread. You cannot use this command to copy data from SQL Server into a data file.
- Using Data Transformation Services (DTS): DTS is a set of tools provided with SQL Server that makes copying data into and out of SQL Server an easy task. DTS includes a wizard for importing data and a wizard for exporting data.
- Recovering a database from a backup file can also be considered a form of database loading.
- Although staging tables do not provide a method of loading data, they are commonly used in database loading.
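A minimal BULK INSERT sketch; the Sales table and file path are illustrative:

```sql
-- Load a comma-delimited file into an existing table,
-- taking a table lock for maximum load speed.
BULK INSERT Sales
FROM 'C:\load\sales.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK)
```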
24. User Table Data (DB2)
- By default, table data is stored on 4 KB pages. Each page (regardless of page size) contains 76 bytes of overhead for the database manager. This leaves 4020 bytes to hold user data (or rows), although no row on a 4 KB page can exceed 4005 bytes in length. A row will not span multiple pages. You can have a maximum of 500 columns when using a 4 KB page size.
25. Long Field Data (DB2)
- Long field data is stored in a separate table object that is structured differently from other data types.
- Data is stored in 32 KB areas that are broken up into segments whose sizes are "powers of two" times 512 bytes. (Hence these segments can be 512 bytes, 1024 bytes, 2048 bytes, and so on, up to 32 700 bytes.)
- Long field data types (LONG VARCHAR or LONG VARGRAPHIC) are stored in a way that enables free space to be reclaimed easily. Allocation and free space information is stored in 4 KB allocation pages, which appear infrequently throughout the object.
- The amount of unused space in the object depends on the size of the long field data, and whether this size is relatively constant across all occurrences of the data. For data entries larger than 255 bytes, this unused space can be up to 50 percent of the size of the long field data.
- If character data is less than the page size, and it fits into the record along with the rest of the data, the CHAR, GRAPHIC, VARCHAR, or VARGRAPHIC data types should be used instead of LONG VARCHAR or LONG VARGRAPHIC.
26. LOB Data Objects (DB2)
- Data is stored in 64 MB areas that are broken up into segments whose sizes are "powers of two" times 1024 bytes. (Hence these segments can be 1024 bytes, 2048 bytes, 4096 bytes, and so on, up to 64 MB.)
- To reduce the amount of disk space used by LOB data, you can specify the COMPACT option on the lob-options clause of the CREATE TABLE and the ALTER TABLE statements. The COMPACT option minimizes the amount of disk space required by allowing the LOB data to be split into smaller segments. This process does not involve data compression, but simply uses the minimum amount of space, to the nearest 1 KB boundary. Using the COMPACT option may result in reduced performance when appending to LOB values.
- The amount of free space contained in LOB data objects is influenced by the amount of update and delete activity, as well as the size of the LOB values being inserted.
27. Logging for BLOBs
- NOLOG
- Set (right-click) at definition time for the objects (columns) concerned
- Does not affect other objects (columns) in the table where the data is used
28. Windows DNA 2000: Next Generation Web Application Platform
(Architecture diagram)
- Client Tier: Browser, Rich Client, Devices
- Application Logic Tier: ASP/COM Application; Application Services: IIS/ASP, COM/MTS, MSMQ, ADO/OLE DB Data Access, Active Directory, XML, Commerce
- Data Tier: SQL Server; Data Access and Storage
- Tiers connected over the Internet (TCP/IP, HTTP, HTML, XML)
29. SQL Server 2000: VLDB
- Higher Availability
- Lower Administration Costs
- Larger Databases
- Simplified Planning
- Higher Performance
- Smooth Operations
- Based on customer feedback: Keep the Ideas Coming!
30. Overview
VLDB Techniques: Performance, Architecture, Features
- Backup and Recovery
- Index Maintenance
- Loading and Resizing
- Verification
31. Backup and Recovery
- Recovery Models
- Recovery Models and Backup
- Backup Enhancements
- Snapshot Backups
32. Log Basics
- Conceptually an infinite stream of log records
- Log files on disk provide a circular cache
- Optimized for sequential writes
- Not an audit trail
- A 10-byte Log Sequence Number (LSN) identifies the location and sequence of a log record; monotonically increasing
33. Page Structure
- Header: LSN, ID, record count, ...
  - The LSN identifies the log record corresponding to the last page modification
- Data records
- Offset table
(Diagram: a page holding Record 1, Record 3, Record 2, addressed through the offset table)
34. Transaction Log Example 1
(Diagram: log records for a transaction that is rolled back, then done)
35. Transaction Log Example 2
(Diagram: record-level locking; each log record is identified by its LSN)
36. Checkpoint
- Anchor in the log for recovery
- Lists active transactions
- Flushes hot and old dirty pages to disk
- Limits recovery time
- Allows the log to be reused
37. Recovery Overview
- Recovery makes 3 passes (phases) over the log
(Diagram: the recovery passes run over the log/time axis from the log start)
38. Recovery Models
- Simplify recovery planning
- Simplify backup and recovery procedures
- Clarify tradeoffs
  - Performance
  - vs. data loss exposure
  - vs. log space consumption
  - vs. simplicity
- Simplify transitions between common operational scenarios
- Maintain data protection
- Maintain availability
- Minimize disruption of automated maintenance procedures
39. Recovery Models
- Control media recovery only
- Choosing a model
  - Key benefits
  - Work loss exposure
  - Functional limitations
- Setting the model
- How it works
Recovery models: Full, Simple, Bulk_Logged
ALTER DATABASE database_name SET RECOVERY FULL
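Setting the model, completing the statement shown on the slide (the database name is illustrative):

```sql
-- Switch to minimal logging before a bulk window...
ALTER DATABASE Sales SET RECOVERY BULK_LOGGED

-- ...and back to full logging afterwards
ALTER DATABASE Sales SET RECOVERY FULL
```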
40. Recovery Models
- Full
  - No work loss
  - Supports recovery to any arbitrary point in time
- Simple
  - Simplest backup/restore strategy
  - Less log space required
  - Greatest work loss possible
- Bulk_Logged
  - High-performance bulk operations
  - Minimal log space for bulk operations
  - Some work loss exposure
41. How Recovery Models Work
- Full: everything is fully logged
- Bulk_Logged: minimal logging for some operations
  - CREATE INDEX
  - Bulk load
  - SELECT INTO
  - WRITETEXT, UPDATETEXT
  - Not settable per operation, due to the administration impact
- Simple: log truncation on checkpoint
42. Log Backups in Bulk_Logged Mode
- The log backup contains the extents modified by bulk-logged operations since the last log backup
- Avoids breaking the log chain
- Eliminates the need for a special full backup
- Minimal impact on automated backups
- Log shipping works!
(Diagram: the transaction log plus the modified data extents go into the log backup)
43. Available Backup Types for Each Model
44. Changing Recovery Model
- FULL ↔ BULK_LOGGED doesn't break automated backup procedures
- Transitions to and from SIMPLE require more care, but are uncommon
45. Recovery Models: VLDB Techniques
- Use Full, Bulk_Logged, or both
- Transitions are easy
- Don't use Simple for VLDB
  - Limited to database or database differential backup
  - Precludes file / filegroup backup
46. Database Backup: How It Works
- Fuzzy backup
  - Writes an inconsistent set of pages
  - Includes transaction log to make it consistent
- Minimal impact on running transactions
- Fast parallel sequential scans; no transformations to data
(Timeline diagram: data pages are scanned from the start to the end of the data backup; pages already written are not backed up again; the log from the oldest active transaction through the end of the data backup is included)
47. Log and File Backup
- Flexibility
- Log and file backups don't block each other
- Files are independent of each other
- Concurrent file backups must be done as a single operation
(Timeline diagram: log backups continue on a regular schedule while individual file backups are taken one at a time)
48. Differential Backups
- Fast: bitmap-driven
- New: differential file backup for VLDB
(Diagram: a bitmap tracks the extents changed since the database backup; only those extents go into the differential backup)
49. Snapshot Backup/Restore
- Equivalent to a full database or file backup
- Roll forward using regular differential and log backups
- History maintained in MSDB
- Requires a third-party VDI application with storage system support (split mirror or copy-on-write)
(Diagram: production storage on a 3-way mirror; one mirror is split off as a backup/passive copy and written to a tape library through the Virtual Device Interface for Backup)
50. Snapshot Benefits
- Very fast restore
  - From mirror in seconds
  - Longer from tape
- Backup with the minimum possible impact on the production server
- Very fast backup
- Create a reporting or test database
- Initialize a warm standby
51. Backup and Recovery: VLDB Techniques
- Online backup: no window needed
- Use file and log backups
- Consider file differentials to reduce recovery time
- Back up infrequently modified filegroups less often
- Consider snapshots for high availability
  - Very fast restore from disk
  - Very fast backup to disk
52. Index Maintenance
- Index Creation
- Analysis
- Online Reorganization
53. Parallel Index Creation
- Fast
- Degree of parallelism controllable
- How it works
  - A separate thread builds a sub-index for each range
  - Each thread is fed by a parallel scan
  - The sub-indexes are stitched together at the end
  - Optional use of TEMPDB
(Diagram: parallel scans feed one sub-index per range, which are combined into the complete index)
54. Fragmentation Analysis
- Fast mode avoids reading leaf-level pages
- Optional tabular results
- Optional check of all index levels
- Optional check of all indexes
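In SQL Server 2000 this analysis is exposed through DBCC SHOWCONTIG. A sketch mapping the options above to the command (the Sales table name is illustrative):

```sql
-- Fast mode (skips leaf-level pages), tabular output,
-- covering every index on the table
DBCC SHOWCONTIG ('Sales')
WITH FAST, TABLERESULTS, ALL_INDEXES
```

The scan density and logical fragmentation columns in the output are the usual inputs to the reorg-or-not decision.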
55. Online Reorg
- Low impact
  - ~20% degradation of a medium OLTP workload
- Uses minimal data space
- May be stopped and restarted
- Use analysis to decide when to reorg
56. Online Reorg
- Reorders the leaf level of an index
- Re-establishes the fill factor
- Pages remain in the same file
- Reorders one index at a time
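In SQL Server 2000 the online reorganization described above is exposed as DBCC INDEXDEFRAG; a sketch with illustrative database, table, and index names:

```sql
-- Reorders the leaf level of one index in place and
-- re-establishes the fill factor; safe to stop and restart.
DBCC INDEXDEFRAG ('Sales', 'Orders', 'IX_Orders_Date')
```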
57. Index Maintenance: VLDB Techniques
- Index creation: parallel and concurrent
  - Parallel for the largest indexes
  - Default for SMP
  - Concurrent for smaller indexes
  - A combination for mixed indexes
- Consider the Bulk_Logged recovery model
58. Index Maintenance: VLDB Techniques
- Maintenance
  - Fast analysis
  - Online reorganization at an appropriate threshold
  - Allow for larger log backups
  - Slower than rebuilding the index
59. Loading and Resizing
- Data Loading
- Shrinking the Log
60. Bulk Data Loading
- New data types; per-column collation
- Optional firing of triggers during load
- Constraint checks
  - During loading
  - New command to check after loading
- Biggest change is operational
  - Recovery models
  - Easy to trade off performance vs. concurrency
  - Easy to maintain recoverability
61. Bulk Load Performance
- Scales linearly with the number of CPUs
- Improvement: nearly 100% for each CPU
(Chart: typical results on 500 MHz Xeon CPUs)
62. Bulk Data Load: VLDB Techniques
- Initial load into empty tables
- Load data
  - No indexes
  - Use BULK INSERT
  - Parallel load from partitioned data files
  - One load stream per CPU
  - Bulk_Logged or Simple recovery model
  - TABLOCK option
- Create indexes
- Switch to the appropriate recovery model
- Perform backups
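The initial-load recipe above, sketched end to end; database, table, and file names are illustrative:

```sql
-- 1. Minimal logging for the load window
ALTER DATABASE Sales SET RECOVERY BULK_LOGGED

-- 2. One load stream per CPU, each from its own partitioned file
BULK INSERT Orders FROM 'C:\load\orders_1.txt' WITH (TABLOCK)
-- ...repeated in parallel sessions for orders_2.txt, orders_3.txt, ...

-- 3. Build indexes only after the data is in place
CREATE CLUSTERED INDEX IX_Orders_Date ON Orders (orderDate)

-- 4. Return to full logging and re-establish the backup chain
ALTER DATABASE Sales SET RECOVERY FULL
BACKUP DATABASE Sales TO DISK = 'C:\backup\sales_full.bak'
```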
63. Bulk Data Loading: VLDB Techniques
- For incremental loads
  - Load data with indexes in place
  - Performance and concurrency requirements determine locking granularity
- Recovery model changes
  - Full → Bulk_Logged recovery model, unless you need to preserve point-in-time recovery
  - Simple (no change)
64. Shrinking the Log
- Does what it can immediately
- Manually shrink again after a log backup
(Diagram: the head of the log moves back toward the target size after each log backup)
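One way to carry out the shrink-after-log-backup loop (file, database, and path names are illustrative):

```sql
-- Free the inactive portion of the log...
BACKUP LOG Sales TO DISK = 'C:\backup\sales_log.bak'

-- ...then shrink the log file toward a 500 MB target
DBCC SHRINKFILE ('Sales_Log', 500)
```

If the head of the log is still near the end of the file, repeat after the next log backup.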
65. Shrink Database: VLDB Techniques
- Long-running shrink-database operations
  - May be stopped without losing work
- Reclaim space by using TRUNCATE_ONLY
66. Verification
- Optional online check (allows updates)
  - Performance: ~15% transaction throughput drop during a medium workload
- Parallel check on SMP systems
  - Performance scales linearly at >80%
- Physical check
  - Very fast page-level check will catch problems caused by hardware
- CHECKDB, CHECKTABLE, CHECKFILEGROUP
67. Verification: VLDB Techniques
- Use the physical check
- For a full check, allow enough TempDB
  - Space: use the ESTIMATEONLY option
  - Read / write throughput
- Check part of the database at a time
  - Verify individual filegroups separately
  - Check infrequently modified data less often
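The checks above in T-SQL; the database and filegroup names are illustrative:

```sql
-- Fast page-level check for hardware-caused damage
DBCC CHECKDB ('Sales') WITH PHYSICAL_ONLY

-- Estimate the tempdb space a full check would need
DBCC CHECKDB ('Sales') WITH ESTIMATEONLY

-- Check one part of the database at a time
DBCC CHECKFILEGROUP ('PRIMARY')
```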
68. Verification: When?
- Not a part of regular maintenance
- Run at critical times
  - Hardware upgrade
  - Software upgrade
  - Whenever you suspect a problem, regardless of cause
- Run it occasionally for peace of mind
69. Distributed Partitioned Views
- Create identical tables on multiple servers
- Define a constraint on each table
- Create a UNION view over all the tables
- The query processor optimizes access based on the constraints
- The view is updateable
- Recoverable to a consistent state
(Diagram: one updateable union view spanning tables on several servers)
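The steps above can be sketched as follows; server, database, table, and range names are illustrative:

```sql
-- On each server, an identical table whose CHECK constraint
-- defines its partition range (the constraint column is part
-- of the primary key, so the view stays updateable)
CREATE TABLE Orders_1998 (
    orderId   INT NOT NULL,
    orderYear INT NOT NULL CHECK (orderYear = 1998),
    PRIMARY KEY (orderId, orderYear)
)
-- (Orders_1999 on Server2, Orders_2000 on Server3, ...)

-- The distributed partitioned view unions the member tables;
-- the optimizer uses the CHECK constraints to touch only the
-- servers a query's predicate can reach
CREATE VIEW AllOrders AS
    SELECT * FROM Orders_1998
    UNION ALL SELECT * FROM Server2.Sales.dbo.Orders_1999
    UNION ALL SELECT * FROM Server3.Sales.dbo.Orders_2000
```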
70. Smooth Operations
- Simple transitions between recovery models
- Operation conflicts eliminated
  - E.g., CREATE INDEX vs. database backup
- All operations recoverable
  - E.g., add file
- Maintain data protection
- Maintain availability
- No disruption of automated maintenance
71. Putting It All Together
(Timeline: normal OLTP runs under the Full recovery model; the database is switched to Bulk_Logged for bulk loads and index creation, then back to Full; online checks and online reorgs run alongside OLTP; log backups are taken continuously and file backups periodically throughout)