Siebel Data Loading Best Practices on SQL Server - PowerPoint PPT Presentation

1 / 47

About This Presentation

Title:

Siebel Data Loading Best Practices on SQL Server

Description:

Maybe hints in the Siebel EIM job. The pre-EIM Loading Issue ... Big performance gains can be made by just taking the hints out. ... – PowerPoint PPT presentation

Number of Views:1159

Avg rating:3.0/5.0

Slides: 48

Provided by: stephan159

Category:

more less

Transcript and Presenter's Notes

Title: Siebel Data Loading Best Practices on SQL Server

1
Siebel Data Loading Best Practices on SQL Server

Frank Earl McBath
frankmcb_at_microsoft.com

How many of you have used SQL Server and EIM?

3
Objectives

What is EIM?
What makes SQL Server different than other
platforms?
What can we optimize?
How do we optimize?
What tools can I use?
What techniques can I try?

4
What is EIM?

EIM is the process that Siebel uses to load data
into the database.
EIM is used on every platform.
It validates the data (PK/FK relationships) and
generates unique ROWIDs.
Every customer uses it.
It tends to be the first problem that every
customer hits with Siebel.
You cannot bypass EIM.

5
Symptoms of EIM?

The first several thousand will always go fast.
Performance deteriorates over time in a
logarithmic pattern.
Why? Because the b-trees grow to more levels.
Ex. The first 5K rows go in at a rate of 2
minutes. After 2 weeks of loading that same 5K
rows takes 1 hour.

6
What makes EIM query different? (vs. a normal
Siebel query)

EIM is batch oriented
Have better logging
Typically more complicated
Reads system catalog on each run.
You can alter the indexes on the EIM tables.
Typically less configurable than a business rule.

7
Common Siebel Performance Problems

Docking replication turned on
Too many indexes
Wrong indexes
Clustered index issues
Table spools (non-EIM)
Implicit cursor conversion (non-EIM)
Too much meta data being loaded
Batch size not optimum
Blocking/Deadlocks while loading in parallel
Stats are stale (EIM only)
Maybe hints in the Siebel EIM job

8
The pre-EIM Loading Issue

Siebel provides EIM, but no mechanism to populate
the EIM_ tables.
Every customer has to invent the wheel.
Some customers do it good others not.
Dont distribute over network
Dont validate/scrub while loading
What the Alliance is working on
A better wheel.

9
What are the typical problems with EIM? (I)

Too many indexes
Both on EIM_ tables and
Base tables
Incorrect indexes
Poor selectivity

10
What are the typical problems with EIM? (II)

You are allowed to modify/add/drop to the EIM_
tables
IMPORTANT You are not allowed to change base
table indexes without ES approval.
Build a case and present it to them.
They use expert mode for Siebel Tools and alter
the meta data.
Poor statistics Bad plans

11
Tools to Use (I)

SQL Server Profiler
Gives you a SQL Server kernel view of the query.
Level 8 EIM Trace
Will show you network times, etc
Will not always show you hints
Will not show you plans

12
Tools to Use (II)

Index Wizard
Will never suggest a better index. Why? Because
Siebel indexes just about every possible
configuration!
What it will show you is what indexes ARE being
used. Hence, you can deduce what indexes ARE NOT
being used.
DBCC SHOWSTATISTICS
Will show you what indexes are 100 of one value.
Probably not being used.

13
Tools to Use (III)

ISQLW
Use ODBC wrapper
Need to use proper wrapper
Will get incorrect plan if you do not
SET STATISTICS PROFILE
Will show plans with costing
SET STATISTICS IO
Shows the IO only
Only use one at a time.
DBCC DBREINDEX
UPDATE STATISTICS
Long Running Query

14
SQL Profiler

Powerful tool, but can burn up a lot of CPU if
you are not careful.
Not a Black Box flight recorder. Use it
sparingly.
Filter on the SPID or CLIENT PROCESS
Capture on the following Events Stored
Procedures - RPCCompleted and TSQL -
SQLBatchCompleted
Save off to a file, then to a table and analyze

15
How To Optimize EIM (I)

Turn on Level 8 EIM logging.
Very verbose
Turn on SQL Server Profiler for the run.
Run an EIM batch.

16
How to Optimize EIM (II)

Look at the EIM log. What is the longest
execution time?
Tip Load this log into Excel and sort. Then
search the original log.
Look at the SQL Profiler trace. Sort on longest
duration and then on reads.
These queries should match those in the EIM log.
Look at the execution plans. Why is it taking a
long time?
Use Excel to help sort.
Once the problem has been isolated, figure out
the fix.

17
Hints (I) SQL Server

Siebel uses hints by default. This helps rule
based optimizers, but hurts cost based ones.
Big performance gains can be made by just taking
the hints out.
Why you just cant drop the index the query will
not compile
DBCC trace flag will cause optimizer to bypass
all hints
DBCC TRACEON (-1,8602)
Query will compile even if index not there.

18
Hints (II) Siebel

From the IFB file configuration
Test with / without Index Hints
USEINDEXHINTS, FALSE
USEESSENTIALHINTS, FALSE

19
More IFB File Configuration

Consider using the parameters as appropriate
ONLY BASE TABLES,
IGNORE BASE TABLES,
ONLY BASE COLUMNS,
IGNORE BASE COLUMNS
This will reduce the amount of processing
performed by EIM.
ONLY and IGNORE are mutually exclusive.

20
Index Tuning Wizard

Tends not to help much because Siebel has just
about every index on there already.
Will not suggest a clustered index.
Just takes a long time to run
but it can help you determine which indexes to
drop.
By showing you what indexes are used, you can
infer which ones are not.

21
Which Indexes Are Used?

Look at the Profiler trace from the EIM run.
Within the trace, look at the execution plan.
Will tell you which indexes used.
Remember, you have to capture with the following
Event Performance - Show Plan Statistics
Data Columns BinaryData
Output Example
Clustered Index Scan(OBJECT(master.dbo.sysob
jects.sysobjects))

22
Index Removal (I)

Why remove indexes?
Penalties on INSERTS/DELETES which is what the
bulk of EIM
Many indexes are not used
Only keep the EIM indexes that are used.
Fewer indexes mean that there are fewer choices
and fewer chances of a wrong plan being
generated.
Less information to the optimizer can be good
sometimes.

23
Index Removal (II)

Feel free to remove indexes on the EIM_ tables
Work with Siebel ES to remove un-needed indexes
on the base tables.
Build a case. For example, show that a column is
100 NULL, etc
Optimizer probably will never use single value
indexes.
Only in a COUNT on that column would it use it.

24
Index Removal (III)

DBCC SHOWSTATISTICS
Look at Density and Steps

25
Does the Table have a Clustered Index?

Youd be surprised Many tables dont
Run the script in the notes to find which ones
dont
Compare the results to your mappings and ER
diagram
Ask Siebel why they dont have a clustered index

26
Clearly Think Out Your Clustered Index

Smart design can help in covering the index
Non clustered indexes are built on top of
clustered indexes
Point to the cluster key
Every Siebel query has ROWID in it, thats why
its the clustered index on all base tables.
Look at DBCC SHOW STATISTICS

27
Index Strategies (I)

A successful strategy used has been
Analyze what columns are being used
Put them in the clustered index
Use BOE method for selectivity
Drop all non-clustered indexes from the EIM
table.
The premise is that EIM is operating on every row
in the batch anyways, so why not just scan them
all.
Is it worth the extra overhead of more indexes
for what amounts to be a scan?

28
Index Strategies (II)

This strategy works because the BATCH_NUM
firewalls the scan from a full table scan.
The BATCH_NUM is the first column in the
clustered index. Thus, bounding the scan.

29
Performance Testing

After query is isolated
Make a baseline
Use ISQLW
Use SET STATISTICS IO ON
Use SET STATISTICS PROFILE ON
(One at a time. Turn off between.)
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
Change/Add index, rerun the query with IO and
PROFILE.
The McBath Oil Argument

30
The Oil Argument

Do you notice when you change the oil in your
car? Your engine does.
Looking at something that is massively iterated
on and saving 1 IO.
Reindex, fill factor
Resequencing columns in an index
Check IO with SET STATISTICS IO

31
Fragmentation Happens

During big data loads
Run DBCC REINDEX to correct
Think about
Fill Factor
Pad Index
Look at sample script to defrag
Defrag update stats between runs
DBCC INDEXDEFRAG, REINDEX, and drop recreate
Samples and run times
Pros and cons

32
Stale Statistics (I)

What happens when stats get old
Bad plans. A query that normally runs in a few
seconds can take 30 minutes.
How do the stats get stale?
EIM updates every row in the EIM_ table.
The thread that auto updates stats doesnt wake
up in time between runs.
Small tables will never be updated.

33
Stale Statistics (II)

Correct this by running UPDATE STATISTICS between
runs or a SQL AGENT job that wakes up and runs.
Consider turning off auto update stats for the
data load.
Its all about getting predictable performance.

34
Multiple EIM Batches

Number of batches directly related to your
clustered index design
Good index will keep deadlocks from happening
You can run multiple batches against the same _IF
table!
EIM Partioned tables
Use SP_INDEXOPTION to enable row level locks
only. Cuts down on blocking issues.

35
Efficiently Loading Data (I)

Load into pre-staging tables
Scrub in tempdb
Minimal logging
Scrub in SQL Server
Efficiencies of cache
Use set wise processing, not cursors
If have to use cursors, use Fast Forward/Read
Only
Run all local on the database server. Not
distributed over the network.

36
Efficiently Loading Data (II)

BULK INSERT vs BCP
BULK INSERT is in memory
BCP is more configurable
Both are single threaded
Only run on one CPU
Run multiple BULK INSERTS at once across multiple
CPUs.
If the order of the data is not a concern, or
you'd rather take the hit when creating the
index, it's best to run BULK INSERT into the EIM
tables in parallel by deploying a separate thread
per CPU. You can use the TABLOCK hint to allow
multiple non-blocking inserts.

37
Efficiently Loading Data (III)

Rules of thumb for previous
Use only 2 - 3 threads at max (only if you have
the processors)
Limit the batch Commit Size (batch size) to about
2,000 rows per batch.
Adjust up or down based on your testing.
Remember, if loading in clustered index sequence,
only use one thread.
Bulk operations are very high performance.
They do log.
Conditions in BOL (ex. TABLOCK)

38
Efficiently Loading Data (IV)

Disk Slices
Even with a SAN, break out the following on
separate slices if possible
EIM_ Tables
TEMPDB
Base (DATA) Tables
INDEXES
Do this by dropping the table and recreating on a
different file group
Cuts down on fragmentation and contention

39
Efficiently Loading Data (V)

RAID and EIM
Due to the constant UPDATE, INSERT and DELETE,
try and use RAID 01 if possible.
Parity bit calculation penalty can be
significant.
See Balanced System Design for more
information on slicing
Size the Siebel database (siebeldb or whatever it
is named in production) appropriately, and ensure
that it will not have to autogrow during the
process that will hurt disk I/O and performance.

40
Efficiently Loading Data (VI)

When running EIM itself, run processes in
parallel. Set different batch numbers, but they
can be executed against the same interface
tables.
Try and run from opposite ends of the batch
range. Can help cut down on the blocking.
Ex. Run 1 5 at the same time, not 1 2.
Test to see how many threads can be run on your
system. Start with two and add as appropriate.
If you are blocking and getting lock escalations,
use sp_indexoption and set the clustered index to
no page locks. See BOL for more information.

41
Efficiently Loading Data (VII)

Disable any triggers on the databases (such as
workflow triggers) and then re-apply/enable them
after the process is done.
If this is not for an initial data load, this
means that Workflow Manager or Assignment Manager
will not function during the load for the new or
updated data.

42
Efficiently Loading Data (VIII)

Load multiple base tables from one interface
table.
In the IFB table, set the parameter USING
SYNONYMS to FALSE only if you are not associating
multiple addresses with accounts.
If you have a 11 ratio, you are telling the EIM
process that account synonyms do not require
processing during the import, reducing the amount
of work EIM needs to do (as well as load on the
system).

43
Recovery Models During EIM

Use SIMPLE or BULK LOGGED if possible.
Run a full backup freqeuntly during the day.
Weigh the issues of recovery vs. lost data.
Note Switching from FULL to SIMPLE will break
the log chaing and have recovery consequeneces.
Always make a full backup after switching to
SIMPLE.

44
Bringing It All Together (I)

Optimize your EIM
Batch Size
Hint Removal Siebel and SQL Server
Turn off docking replication
Get rid of workflow triggers
Only load up the tables needed from the Siebel
meta data. Loading the whole catalog can
represent 25 the time of your whole batch run.
Run batches in parallel
Exclude validation of non-used data. If you know
something is never NULL (ex. EmpID), then dont
check for it.

45
Bringing It All Together (II)

Update Stats
Stale stats can cause problems
Investigate turning autostats off
Defrag between large runs
Defrag both EIM_ and base tables
On large initial loads, put fill factor and pad
index to 50. Cut down on page splits. Default
is 100 full.
Use minimally logged operations to load and
scrub.
Bulk Insert, SELECT/INTO
Recovery Models
Run all data loading locally
Scrub data inside SQL Server. No cursors.
Make the right indexes
Try monster clustered index only
Get rid of unused indexes
Add them back after runs
Work with ES to resolve support issues.

46
Bringing It Together (III)

Slice your disk right
More spindles More performance. Dont believe
the vendor when they say cache will solve your
problems. It helps hide them.
No RAID 5 if possible
Separate Data, Log, Indexes

47
Questions?