Title: BIN315 : SSIS Connectivity With ETL Best Practice
1BIN315 SSIS Connectivity WithETL Best Practice
- Itamar Ben Hemo
- CO-CEO Vision.bi
- Itamar_at_vision.bi
2Session Objectives
- Design ETL Why, What and How
- Describe connectivity and extensibility in SSIS
- Explain how the various data sources are
supported
Key Takeaways
- ETL Design - Success keys to make it smart and
simple - ?? ??? ???? ???? ???? ?? ????"
- Who are our connectivity partners?
- What data sources can SSIS load from/to?
- What to expect from these connections in terms
of performance and 64bit?
3ETL Why, What and How
Extract, Transform and Load Data from Operational
system to DWH environment (engine for
data-centric-integration). Data is recognized as
a strategic asset ,Many large organizations are
data processing engines E-World, Insurance,
banking, financial services The ETL Must be
clear, stable, easy for maintenance and with the
best of data quality (Bad/missing data have
greater implied costs)
4Independencies between packages
5Referential Integrity
Dont Handle Referential Integrity in the SSIS
ETL packages 5 Advantages for separating the
referential integrity Simple ETL packages- best
performance, simple for testing Independencies
between packages, dimensions and facts Save
memorySave maintenance timeOne convention for
all shared dimensions
6DWH Sample Architecture
7Operational Fields
Sources - SourceCreationTime, SourceUpdateTime DWH
- DwhCreationTime, DwhUpdateTime Dimensions
RI_Flag 5 Advantages for working with
Operational Fields Testing In development and
production Nearly real time DWH Gap time
between OS to DWH Fast recovery from garbage
data Quality Assurance - Rate of RI records for
each dimension
8Configuration files and Parameters
Incremental Repository One central table with
parameters for all source Table_ID, SourceDB,
SourceSchema, SourceTable, StatusLastETL,
NoRecordsLastETL, FromID, ToID, ChunkSize
Separate Configuration files (one for each Data
Source)- XML with Environment parameters 5
Advantages for working with MngSourceTables and
Configuration file Production Testing
Parameters and values for Quality Assurance tests
(Status, NoRecords, From, To) Recovery from
unexpected gap (Chunk size) History Migration at
the first upload (Chunk size)Transfer between
environments (Configuration Dev, Testing,
Production)Management all incremental processes
in one place
9Standard and other small tips- Dim
Dimension ETL Truncate insert to STGUpdate
Else Insert to DWHIf data in the source is
sorted Set IsSortedTrue. Performance- save
unnecessary sorting
10Standard and other small tips
- Convention names- Documentation, Debugging and
Package execution reports - Select just needed columns save warnings,
Save execution time - OLE DB Source component SQL statement
Select a.Version_id,a.Version_Name From - Lookup / Merge join Not absolute answer (even
though Lookup should be faster specially in
small tables). Anyway when using Lookup -gt
use cache type Full/partial
11Standard and other small tips
- Insert log into table (for analyzing and fine
tuning) - Filter the data in the source adapter- if the
source is file use the conditional split - Fine tuning of DefaultBufferMaxSize and
DefaultBufferMaxRows properties
12Standard and other small tips
- Use Templates for one standard (Logging, Event
handler and configuration), Our
recommendation
13Data Sources
14Data Source Categories
15Application Systems
16Relational DB Systems
17Structured Semi-Structured Data
18Queue Systems Protocols
19Providers Provider Stack
Source Provider
Destination Provider
20Providers Provider Stack
21SSIS vs. Provider Stacks
22SSIS Boundary
23SSIS Boundary
24SQL Server(built-in)Support
2564-bit Support
26Office (2007) Support
27Oracle Support
28CautionOracle Number type
29IBM DB2 Support
30SAP Integration
31Sowhat can SSIS connect to?
32Summary
- ETL Design - Design it smart and simple
Success keys to make it smart and simple - ?? ??? ???? ???? ???? ?? ????"
-
- Who are our connectivity partners?
- What data sources can SSIS load from/to?
- What to expect from these connections in terms
of performance and 64bit?
33(No Transcript)