Copying, Managing, and Transforming Data With DTS - PowerPoint PPT Presentation

1 / 29
About This Presentation
Title:

Copying, Managing, and Transforming Data With DTS

Description:

Quickly Loads Data from a File into SQL Server. Encapsulates the Transact-SQL Bulk Insert Statement ... Task supports single or multiple SQL statements ... – PowerPoint PPT presentation

Number of Views:222
Avg rating:3.0/5.0
Slides: 30
Provided by: v1gr
Category:

less

Transcript and Presenter's Notes

Title: Copying, Managing, and Transforming Data With DTS


1
Copying, Managing, and Transforming DataWith DTS
2
Defining Bulk Insert Task Functionality
  • Quickly Loads Data from a File into SQL Server
  • Encapsulates the Transact-SQL Bulk Insert
    Statement
  • Supports Table or View Destinations in SQL Server
  • Loads Data with No Applied Transformations
  • Supports Format Files to Specify File Layout
  • Requires Sysadmin or Bulkadmin Fixed Server Roles
    Membership

The Bulk Insert Task is One of Three Ways to Run
SQL Server Bulk Copy Operations
3
Sidebar SQL Server Bulk Copy Operations
  • What Do Bulk Copy Operations Offer?
  • Allow Fast Loading of Data into SQL Server
  • Configure Data Load Batches
  • Allow You to Control Logging Operations
  • Ways to Access Bulk Copy Operations
  • Bcp Utility
  • Bulk Insert Task or T-SQL Bulk Insert Statement
  • Bulk Copy APIs for OLE DB, ODBC, DB-Library
    Applications

4
Defining the Sales_stage Table Load
DTS
Polaris
Tab Delimited File
  • Using the Bulk Insert Task to Load Tab-delimited
    File Data into Sales_stage
  • Loading Sales_stage with Data Bound for Sales_fact

5
Defining Execute SQL Task Functionality
  • Executing SQL Statements
  • Source database must understand SQL syntax
  • SQL statement determines task performance
  • Task supports single or multiple SQL statements
  • You can create queries in the DTS Query Designer
  • Running Parameterized Queries
  • Input parameters
  • Output parameters

6
Using Parameterized Queries
  • Understanding Global Variable Basics
  • User-defined storage locations
  • Information is shared across package steps
  • Using Parameters with Global Variables
  • Assign global variable values to query input
    parameters
  • Store query results to a global variable with
    output parameters

7
Creating Dynamic Queries
SELECT FROM product_dim WHERE product_name
? AND category_name ?
  • ? Question Marks Represent Query Parameters

8
Storing Query Results
Storing Row Values
SELECT begin_date, end_date FROM
financial_period WHERE quarter 1
Global Variables
Output Parameter
BeginDate EndDate
begin_date end_date
Storing Entire Rowsets
SELECT FROM product
Global Variable
Output Parameter
Product
Entire Rowset
Store Query Results in Global Variables
9
Defining the Time_dim Data Load
Time_dim_build Stored Procedure
  • Input Parameters
  • _at_p_start_date
  • _at_p_end_date

10
Defining the DTS Data Pump
  • DTS Mechanism for Moving and Transforming Data
  • Allows for High-speed Batch Copying of Data
  • Contains Supplied Data Transformations
  • Can Also Define ActiveX Script Transformations
  • Provides An Extendable COM-based Architecture
    That Allows for Custom Transformations (C)
  • Permits the Application of Transformation Logic
    to Specific Phases of a Data Pump Operation
  • Multi Phase Data Pump

11
Understanding How the Data Pump Processes Data
X Forms
Source
Destination
ActiveX Script Copy Trim String Custom
OLE DB ODBC
DTS Data Pump
In
Out
  • Connects to the source and destination
  • Reads OLE DB metadata about source and
    destination columns
  • Gathers data transformation definitions
  • Implements the transformation
  • Writes completed record to the destination

12
Defining the Tasks That Transform Data
  • The Transform Data Task
  • Inserts
  • The Data Driven Query Task
  • Inserts
  • Updates
  • Deletes
  • The ParallelDataPumpTask
  • Processes hierarchical rowsets
  • The Transform Data Task
  • Inserts

13
Defining the Transform Data Task
  • Data Movement and Transformation Functionality
  • Copying data between heterogeneous data sources
  • Applying optional column level transformations
  • Extended Data Transfer Functionality
  • Supporting batch processing of data
  • Providing error-handling capabilities
  • Containing optimization settings for SQL Server
    destinations

14
Selecting Transformation Types
Transformation
Description
ActiveX Script
Invokes user-defined ActiveX scripts.
Copy Column
Copies data from source to destination.
DateTime String
Converts a date to a new destination format.
Lowercase String
Converts a string to lowercase characters.
Uppercase String
Converts a string to uppercase characters.
Middle of String
Extracts a sub string of source data.
Trim String
Removes white space from a source string.
Read File
Copies contents of a file to a destination
column. File path is specified by a source
column.
Write File
Copies contents of a source column to a file.
File path is specified by a second source column.
15
Defining Column Mappings
  • One-to-One Mappings
  • Symmetric Many-to-Many Mappings
  • Asymmetric Mappings

16
Creating Efficient Column Mappings
  • Minimizing the Number of Column Mappings
  • Using Many-to-Many Mappings When Possible
  • Grouping Common Transformations Together

17
Loading Customer_dim
Northwind OLTP SQL Server Database
18
Performance Settings
  • Enabling Fast Load
  • Using high-speed bulk copy processing
  • Accepting batches of transformed data
  • Only applies to SQL Server destinations
  • Using a Table Lock
  • Configuring Batch Size

19
Configuring Batch Size
  • Assembling Records into Groups
  • DTS commits records to database as a group
  • Insert batch size sets the number of records in
    the group
  • Understanding Default Behavior
  • Insert batch size is 0
  • DTS assigns one batch for all records
  • Setting the Insert Batch Size
  • Value between 0-9999
  • Setting value can improve performance

20
Defining SQL Solutions
  • You Can Use the Source Query of the Transform
    Data Task to Implement Data Transformations
  • The Source SQL Statement Must Be Understood by
    the Source Database
  • The Performance of the Source Query Depends on
    the SQL Statement
  • You Can Use Parameters in the Source Query to
    Create Dynamic Source SQL Statements
  • If You Use the Source Query to Manipulate Data,
    You Can Use the Copy Column Transformation to
    Load Data into the Destination

21
Applying SQL Solutions to Load Fact Tables
  • Using the Source Query to Join Staging Table Data
    to Dimension Tables
  • Retrieving Primary Key Values to Store as Foreign
    Keys on the Fact Table
  • Using a Copy Column Transformation in the
    Transform Data Task
  • Configuring Fast Load for SQL Server Destinations

22
Loading the Fact Table
Source Data
DimensionTables
customer id
product id
order date
quantity_sales
amount_sales
ALFI
123
1/1/2000
400
10,789
123
1/1/2000
400
10,789
customer_dim
ALFI
201 ALFI Alfreds


Sales Fact Data
cust_key
prod_key
time_key
quantity_sales
amount_sales
product_dim
25 123 Chai
123
1/1/2000
400
25
134
400
10,789
201

  • Identifying Dimension Application Key Values in
    the Fact Table Source Data
  • Retrieving Primary Keys from Each Dimension Table
    to Assign Foreign Keys

23
Loading Sales_fact
  • Extracting Data from the Sales_stage Table
  • Assigning Foreign Keys by Retrieving Primary Keys
    from the Product_dim, Customer_dim, and Time_dim
    Dimensions

24
Best Practices - Performing Inserts
  • Bulk Insert Task
  • Accessing data in files
  • Loading data into SQL Server destinations
  • Copying data with no transformations
  • Transform Data Task
  • Accessing any source
  • Loading to any destination
  • Creating data transformations
  • Using input parameters in the source query
  • Applying custom logic to phases of the data pump

25
Best Practices - Performance Settings
  • Tuning the Transform Data Task
  • Fast load for SQL Server destinations
  • Batch size
  • Table lock
  • Tuning the Bulk Insert Task
  • Sort order for clustered indexes
  • Batch size
  • Table lock

26
Best Practices - Executing Flexible Queries
  • The Data Driven Query Task
  • Execute flexible queries on a row-by-row basis
  • Meet flexibility needs that outweigh performance
    needs
  • Perform non-insert queries
  • The Execute SQL Task
  • Execute SQL statements and extended SQL
    statements
  • Perform parameterized queries
  • Assign query outputs to global variables

27
Best Practices - Using Custom Tasks
  • Creating Reusable Functions and Utilities
  • Adding Functionality to DTS Package Designer
  • Implementing a Faster Alternative to ActiveX
    Script Tasks

28
Best Practices - Creating Efficient Column
Mappings
  • Minimizing the Number of Column Mappings
  • Using Many-to-Many Mappings When Possible
  • Grouping Common Transformations Together

29
Best Practices - The Right Transformation Type
  • Using Supplied Transformations When Possible
  • Minimizing ActiveX Script Transformations When
    Performance Outweighs Flexibility
  • Using SQL Solutions with Copy Column
    Transformations
  • Developing Custom Transformations as a Faster
    Alternative to ActiveX Script Transformations
Write a Comment
User Comments (0)
About PowerShow.com