Title: Large%20Data%20Operations
1Large Data Operations
- Joe Chang
- jchang6_at_yahoo.com
- www.sql-server-performance.com/joe_chang.asp
2Large Data Operations Overview
- Updates Deletes
- Modifying large row counts can be very slow?
- Dropping indexes improves performance?
- Inserts See SQLDev.Net
- Covered in various presentations by Gert Drapers
3Execution Plan with Indexes
1
2
3
1. Insert multiple rows into table with clustered
index 2. Rows are spooled 3. Nonclustered indexes
are modified from the spooled data
Operations with indexes in place should be
faster Exception - large inserts where bulk log
requirements are met
4Execution Plan Cost Formula Review
Index Seek Plan Formula I/O Cost 0.006328500
0.000740741 per additional page(1GB)
0.003203425 0.000740741 per additional
page(gt1GB) CPU Cost 0.000079600 0.000001100
per additional row
Bookmark Lookup I/O Cost multiple of
0.006250000 (1GB) multiple of
0.003124925 (gt1GB) CPU Cost 0.0000011 per row
Table Scan or Index Scan I/O 0.0375785
0.000740741 per page CPU 0.0000785 0.0000011
per row
Insert, Update Delete IUD I/O Cost 0.01002
0.01010 (gt100 rows) IUD CPU Cost 0.000001 per
row
5Plan Cost Unit of Measure
- Time in seconds? CPU time?
Too fast for 7200RPM disk random I/Os.
0.0062500sec -gt 160/sec 0.000740741 -gt1350/sec
(8KB) -gt169/sec(64K)-gt 10.8MB/sec
About right for 1997 sequential disk transfer
rate?
S2K BOL Administering SQL Server, Managing
Servers, Setting Configuration Options cost
threshold for parallelism Opt Query cost refers
to the estimated elapsed time, in seconds,
required to execute a query on a specific
hardware configuration.
6Test Table
CREATE TABLE M3C_00 ( ID int NOT NULL, ID2 int
NOT NULL, ID3 int NOT NULL, ID4 int NOT
NULL, ID5 int NOT NULL, ID6 int NOT NULL, SeqID
int NOT NULL, DistID int NOT NULL, Value
char(10) NOT NULL, rDecimal decimal (9,4) NOT
NULL, rMoney money NOT NULL, rDate datetime NOT
NULL, sDate datetime NOT NULL ) CREATE CLUSTERED
INDEX IX_M3C_00 ON M3C_00 (ID) WITH
SORT_IN_TEMPDB
10M rows in table, 99 rows per page, 101,012
pages, 808MB 100K rows for each distinct value of
SeqID and DistID Common SeqID values are in
adjacent rows Common DistID values are in
separate 8KB pages (100 rows apart)
7Data Population Script
DECLARE _at_BatchStart int, _at_BatchEnd int ,
_at_BatchTotal int, _at_BatchSize int, _at_BatchRow int,
_at_RowTotal int, _at_I int , _at_p int, _at_sc1 int, _at_dv1
int SELECT _at_BatchStart 1, _at_BatchEnd 1000,
_at_BatchTotal 1000, _at_BatchSize 10000 SELECT
_at_RowTotal _at_BatchTotal_at_BatchSize , _at_p 100,
_at_sc1 100000 SELECT _at_I (_at_BatchStart-1)_at_BatchSi
ze1 , _at_dv1 _at_RowTotal/_at_sc1 WHILE _at_BatchStart lt
_at_BatchEnd BEGIN BEGIN TRANSACTION SELECT
_at_BatchRow _at_BatchStart_at_BatchSize WHILE _at_I lt
_at_BatchRow BEGIN INSERT M3C_00
(ID,ID2,ID3,ID4,ID5,ID6,SeqID,DistID,Value,rDecima
l,rMoney,rDate,sDate) VALUES ( _at_I, _at_I,
1(_at_I-1)_at_p/_at_RowTotal((_at_I-1)_at_p)_at_RowTotal,
(_at_I-1)(_at_sc1)1, (_at_I-1)/21, (_at_I-1)3201,
(_at_I-1)/_at_sc11, (_at_I-1)(_at_dv1)1,
CHAR(6526rand())CHAR(6526rand())CHAR(6526r
and()) CONVERT(char(6),CONVERT(int,100000(9.0
rand()1.0)))CHAR(6526rand()),
10000rand(), 10000rand(), DATEADD(hour,100000ra
nd(),'1990-01-01'), DATEADD(hour,_at_I/5,'1990-01-01'
) ) SET _at_I _at_I1 END COMMIT TRANSACTION
CHECKPOINT PRINT CONVERT(char,GETDATE(),121)
row ' CONVERT(char,_at_BatchRow)' Complete' SET
_at_BatchStart _at_BatchStart1 END
8Data Population Script Notes
- Double While Loop
- Each Insert/Update/Delete statement is an
implicit transaction - Gets separate transaction log entry
- Explicit transaction generates a single
transaction log write (max 64KB per IO) - Single TRAN for entire loop requires excessively
large log file - Inserts are grouped into intermediate size batches
9Indexes
CREATE INDEX IX_M3C_01_Seq ON M3C_01 (SeqID)
WITH SORT_IN_TEMPDB CHECKPOINT CREATE INDEX
IX_M3C_01_Dist ON M3C_01 (DistID) WITH
SORT_IN_TEMPDB CHECKPOINT UPDATE STATISTICS
M3C_01 (IX_M3C_01_Seq) WITH FULLSCAN UPDATE
STATISTICS M3C_01 (IX_M3C_01_Dist) WITH FULLSCAN
Common SeqID values are in adjacent rows Common
DistID values are in separate 8KB pages (100 rows
apart)
10Test Queries
-- Sequential rows, table scan SELECT AVG(rMoney)
FROM M3C_01 WHERE SeqID 91 -- Sequential
rows, index seek and bookmark lookup SELECT
AVG(rMoney) FROM M3C_01 WITH(INDEX(IX_M3C_01_Seq
)) WHERE SeqID 91 -- Distributed rows, table
scan SELECT AVG(rMoney) FROM M3C_01 WHERE
DistID 91 -- Distributed rows, index seek and
bookmark lookup SELECT AVG(rMoney) FROM M3C_01
WITH(INDEX(IX_M3C_01_Dist)) WHERE DistID 91
11Execution Plans - Select
Table scan involves 101,012 pages Bookmark Lookup
involves 100,000 rows 1 BL 3.6X more expensive
than 1 page in Table Scan
12Table Scan Cost Detail
I/O and CPU cost occasionally show ½ the expected
value, but combined cost shows the expected value
Table Scan Formula I/O 0.0375785 0.000740741
x 101,012 74.8 CPU 0.0000785 0.0000011 x
10M 11.0
13Index and Bookmark Details
Bookmark Lookup I/O 0.003124925x100Kx0.998
311.87 CPU 0.0000011x100K 0.11
14Measured Query Times
SELECT query 100K rows Sequential rows Sequential rows Distributed rows Distributed rows
256M Server mem Index BL Table Scan IndexBL Table Scan
Query time (sec) 0.3 10.5 167 10.5
Rows or Pages/sec 333,333(R) 9,620(P) 599(R) 9,620(P)
Disk IO/sec Low 1,200 600 1,200
Avg. Byte/Read N/A 64K 8K 64K
1154MB Server mem
Query time 0.266 1.076 0.373 1.090
Rows or Pages/sec 376,000 93,877 268,000 92,672
Test System 2x2.4GHz Xeon, data on 2 15K disk
drives
15Disk Bound Select Query Cost
- Performance limited by disk capability
- Random 300/disk (small portion of 18GB drive
high queue depth) - Sequential 38MB/sec (Seagate ST318451, first
generation 15K drive)
Disk drive random I/O 2X gain since
mid-1990s Sequential I/O 5X Cost formulas
underestimate current generation disk drive
sequential performance relative to
random However, SQL Server cost formulas do not
reflect in-memory costs
16Update Operation
17Update Details
18Actual Cost - Update
UPDATE query - 100K rows Sequential rows Sequential rows Distributed rows Distributed rows
256M server mem Index Table Scan Index Table Scan
Query time (sec) 1.3 12.6 476.6 28
Checkpoint time (sec) 0.4 0.6 14.5 8
Rows /sec 57,471 7,576 203 2,778
1154MB server mem
Query time (sec) 0.8 1.3 0.9 1.5
Checkpoint time (sec) 0.2 0.1 23 23
Rows /sec 100,000 71,429 4,184 4,082
19Update Variation
Default plan is now a table scan Column value is
not in the index, so a bookmark lookup is
required However data page must be loaded into
buffer cache before it can be modified
regardless!!
20Delete Operation
21Delete Details
22Delete Details (2)
23Delete - Actual Costs
Delete query - 100K rows Sequential rows Sequential rows Distributed rows Distributed rows
256M Server mem Index Table Scan Index Table Scan
Query time (sec) 4.8 88.52 282 41
Checkpoint time (sec) 8.4 4.52 8.4 14
Rows / sec 7,576 1,075 340 1,800
1154MB Server mem
Query time (sec) 4.1 6.4 4.2 5.3
Checkpoint time (sec) 3.7 3.9 28.6 28.6
Rows /sec 12,821 9,708 3,048 2,949
24Deleteno indexes
Delete query, no index 100K rows Sequential rows Distributed rows
256M server mem Table Scan Table Scan
Query time (sec) 11.5 26
Checkpoint time (sec) 0.1 4
Rows / sec 8,621 3,300
1154MB server mem
Query time (sec) 1.9 1.5
Checkpoint time (sec) 0.2 22
Rows /sec 47,619 4,255
25Delete with Foreign Keys
26Summary
- When large updates and deletes are slow
- Examine the execute plan
- Look for nonclustered index seeks on modified
tables with high row count - Use index hint to force table scan
27Additional Information
www.sql-server-performance.com/joe_chang.asp SQL
Server Quantitative Performance Analysis Server
System Architecture Processor Performance Direct
Connect Gigabit Networking Parallel Execution
Plans Large Data Operations Transferring
Statistics SQL Server Backup Performance with
Imceda LiteSpeed jchang6_at_yahoo.com