Title: Copying, Managing, and Transforming Data With DTS
1Copying, Managing, and Transforming DataWith DTS
2Defining Bulk Insert Task Functionality
- Quickly Loads Data from a File into SQL Server
- Encapsulates the Transact-SQL Bulk Insert
Statement - Supports Table or View Destinations in SQL Server
- Loads Data with No Applied Transformations
- Supports Format Files to Specify File Layout
- Requires Sysadmin or Bulkadmin Fixed Server Roles
Membership
The Bulk Insert Task is One of Three Ways to Run
SQL Server Bulk Copy Operations
3Sidebar SQL Server Bulk Copy Operations
- What Do Bulk Copy Operations Offer?
- Allow Fast Loading of Data into SQL Server
- Configure Data Load Batches
- Allow You to Control Logging Operations
- Ways to Access Bulk Copy Operations
- Bcp Utility
- Bulk Insert Task or T-SQL Bulk Insert Statement
- Bulk Copy APIs for OLE DB, ODBC, DB-Library
Applications
4Defining the Sales_stage Table Load
DTS
Polaris
Tab Delimited File
- Using the Bulk Insert Task to Load Tab-delimited
File Data into Sales_stage - Loading Sales_stage with Data Bound for Sales_fact
5Defining Execute SQL Task Functionality
- Executing SQL Statements
- Source database must understand SQL syntax
- SQL statement determines task performance
- Task supports single or multiple SQL statements
- You can create queries in the DTS Query Designer
- Running Parameterized Queries
- Input parameters
- Output parameters
6Using Parameterized Queries
- Understanding Global Variable Basics
- User-defined storage locations
- Information is shared across package steps
- Using Parameters with Global Variables
- Assign global variable values to query input
parameters - Store query results to a global variable with
output parameters
7Creating Dynamic Queries
SELECT FROM product_dim WHERE product_name
? AND category_name ?
- ? Question Marks Represent Query Parameters
8Storing Query Results
Storing Row Values
SELECT begin_date, end_date FROM
financial_period WHERE quarter 1
Global Variables
Output Parameter
BeginDate EndDate
begin_date end_date
Storing Entire Rowsets
SELECT FROM product
Global Variable
Output Parameter
Product
Entire Rowset
Store Query Results in Global Variables
9Defining the Time_dim Data Load
Time_dim_build Stored Procedure
- Input Parameters
- _at_p_start_date
- _at_p_end_date
10Defining the DTS Data Pump
- DTS Mechanism for Moving and Transforming Data
- Allows for High-speed Batch Copying of Data
- Contains Supplied Data Transformations
- Can Also Define ActiveX Script Transformations
- Provides An Extendable COM-based Architecture
That Allows for Custom Transformations (C) - Permits the Application of Transformation Logic
to Specific Phases of a Data Pump Operation - Multi Phase Data Pump
11Understanding How the Data Pump Processes Data
X Forms
Source
Destination
ActiveX Script Copy Trim String Custom
OLE DB ODBC
DTS Data Pump
In
Out
- Connects to the source and destination
- Reads OLE DB metadata about source and
destination columns - Gathers data transformation definitions
- Implements the transformation
- Writes completed record to the destination
12Defining the Tasks That Transform Data
- The Transform Data Task
- Inserts
- The Data Driven Query Task
- Inserts
- Updates
- Deletes
- The ParallelDataPumpTask
- Processes hierarchical rowsets
- The Transform Data Task
- Inserts
13Defining the Transform Data Task
- Data Movement and Transformation Functionality
- Copying data between heterogeneous data sources
- Applying optional column level transformations
- Extended Data Transfer Functionality
- Supporting batch processing of data
- Providing error-handling capabilities
- Containing optimization settings for SQL Server
destinations
14Selecting Transformation Types
Transformation
Description
ActiveX Script
Invokes user-defined ActiveX scripts.
Copy Column
Copies data from source to destination.
DateTime String
Converts a date to a new destination format.
Lowercase String
Converts a string to lowercase characters.
Uppercase String
Converts a string to uppercase characters.
Middle of String
Extracts a sub string of source data.
Trim String
Removes white space from a source string.
Read File
Copies contents of a file to a destination
column. File path is specified by a source
column.
Write File
Copies contents of a source column to a file.
File path is specified by a second source column.
15Defining Column Mappings
- One-to-One Mappings
- Symmetric Many-to-Many Mappings
- Asymmetric Mappings
16Creating Efficient Column Mappings
- Minimizing the Number of Column Mappings
- Using Many-to-Many Mappings When Possible
- Grouping Common Transformations Together
17Loading Customer_dim
Northwind OLTP SQL Server Database
18Performance Settings
- Enabling Fast Load
- Using high-speed bulk copy processing
- Accepting batches of transformed data
- Only applies to SQL Server destinations
- Using a Table Lock
- Configuring Batch Size
19Configuring Batch Size
- Assembling Records into Groups
- DTS commits records to database as a group
- Insert batch size sets the number of records in
the group - Understanding Default Behavior
- Insert batch size is 0
- DTS assigns one batch for all records
- Setting the Insert Batch Size
- Value between 0-9999
- Setting value can improve performance
20Defining SQL Solutions
- You Can Use the Source Query of the Transform
Data Task to Implement Data Transformations - The Source SQL Statement Must Be Understood by
the Source Database - The Performance of the Source Query Depends on
the SQL Statement - You Can Use Parameters in the Source Query to
Create Dynamic Source SQL Statements - If You Use the Source Query to Manipulate Data,
You Can Use the Copy Column Transformation to
Load Data into the Destination
21Applying SQL Solutions to Load Fact Tables
- Using the Source Query to Join Staging Table Data
to Dimension Tables - Retrieving Primary Key Values to Store as Foreign
Keys on the Fact Table - Using a Copy Column Transformation in the
Transform Data Task - Configuring Fast Load for SQL Server Destinations
22Loading the Fact Table
Source Data
DimensionTables
customer id
product id
order date
quantity_sales
amount_sales
ALFI
123
1/1/2000
400
10,789
123
1/1/2000
400
10,789
customer_dim
ALFI
201 ALFI Alfreds
Sales Fact Data
cust_key
prod_key
time_key
quantity_sales
amount_sales
product_dim
25 123 Chai
123
1/1/2000
400
25
134
400
10,789
201
- Identifying Dimension Application Key Values in
the Fact Table Source Data - Retrieving Primary Keys from Each Dimension Table
to Assign Foreign Keys
23Loading Sales_fact
- Extracting Data from the Sales_stage Table
- Assigning Foreign Keys by Retrieving Primary Keys
from the Product_dim, Customer_dim, and Time_dim
Dimensions
24Best Practices - Performing Inserts
- Bulk Insert Task
- Accessing data in files
- Loading data into SQL Server destinations
- Copying data with no transformations
- Transform Data Task
- Accessing any source
- Loading to any destination
- Creating data transformations
- Using input parameters in the source query
- Applying custom logic to phases of the data pump
25Best Practices - Performance Settings
- Tuning the Transform Data Task
- Fast load for SQL Server destinations
- Batch size
- Table lock
- Tuning the Bulk Insert Task
- Sort order for clustered indexes
- Batch size
- Table lock
26Best Practices - Executing Flexible Queries
- The Data Driven Query Task
- Execute flexible queries on a row-by-row basis
- Meet flexibility needs that outweigh performance
needs - Perform non-insert queries
- The Execute SQL Task
- Execute SQL statements and extended SQL
statements - Perform parameterized queries
- Assign query outputs to global variables
27Best Practices - Using Custom Tasks
- Creating Reusable Functions and Utilities
- Adding Functionality to DTS Package Designer
- Implementing a Faster Alternative to ActiveX
Script Tasks
28Best Practices - Creating Efficient Column
Mappings
- Minimizing the Number of Column Mappings
- Using Many-to-Many Mappings When Possible
- Grouping Common Transformations Together
29Best Practices - The Right Transformation Type
- Using Supplied Transformations When Possible
- Minimizing ActiveX Script Transformations When
Performance Outweighs Flexibility - Using SQL Solutions with Copy Column
Transformations - Developing Custom Transformations as a Faster
Alternative to ActiveX Script Transformations