Do you ELT on z? - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Do you ELT on z?


1
Do you ELT on z?
  • Mar 16, 2009

2
Agenda
  • Design Studio Overview
  • Developing SQW Flows using Design Studio
  • Physical Data Modeling
  • Data Flows
  • Control Flows
  • Deployment and Runtime Management

3
Inhibitors to data warehousing on System z?
  • DB2 functionality?
  • DB2 V8 and V9 made great strides
  • Costs?
  • Specialty processors (zIIPs) help
  • Lower-cost licensing for new DB2 workloads (DB2 VUE)
  • Perception?
  • Conventional wisdom over the last decade: use distributed systems for data warehouses
  • A lot of shops defied conventional wisdom!
  • Lack of skills and/or appropriate data warehouse
    building tooling on System z
  • Fewer people have skills on System z?
  • Recent graduates grew up using graphical tooling!
  • No green screen for them
  • IBM recognizes this and is porting tooling to
    System z

4
Data Movement and Transformation
  • Moving and transforming data is a key component
    to building a data warehouse
  • What do you use on System z?
  • Programs? COBOL? Rexx?
  • Database utilities? Import, Export?
  • SQL scripts?
  • Stored procedures?
  • FTPs?
  • ????

5
Alphabet Soup for Tools
E = Extract, T = Transform, L = Load
  • ETL
  • ELT
  • ETLT
  • TELT
  • TETLT
  • What???
  • Just marketing-speak for stating how and where the data movement and transformation activities occur
  • Leverage a stand-alone transformation engine
  • Leverage a database engine for data
    transformations

6
ETL: Extract, Transform, Load
Usually a stand-alone server separate from any
source or target systems
Referred to as an ETL Engine that performs all
extracts and transformations
Database agnostic
May push some processing to source databases
May invoke database utilities
Usually a procedural design slant
7
ELT: Extract, Load, Transform
Turns the relational database engine into a data
movement and transformation engine
The work is usually done at the target system
Can source from multiple database types but
typically supports a specific target
May push some processing to source databases
May invoke database utilities
By definition, has a set-based design slant
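As a minimal sketch of what that set-based slant looks like in practice (the table and column names here are hypothetical), an ELT step is typically just SQL that the target database executes itself:

-- Hypothetical ELT step: transform and load entirely inside the target DB2
INSERT INTO MART.SALES_DAILY (STORE_ID, SALE_DATE, TOTAL_AMOUNT, ITEM_COUNT)
SELECT S.STORE_ID,
       S.SALE_DATE,
       SUM(S.AMOUNT),          -- aggregation runs in the database engine
       COUNT(*)
  FROM STAGE.SALES_RAW S       -- data was extracted and loaded here first
 GROUP BY S.STORE_ID, S.SALE_DATE;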
8
Variations on a theme
  • ETLT, TETLT
  • ETL tools are now pushing some processing down
    into the source and/or target relational
    databases by generating SQL
  • TELT, TETLT
  • ELT tools can typically push SQL to remote
    databases
  • ELT tools can do some limited non-relational
    processing typically by calling executables or
    scripts

9
Reality
  • Don't get caught up in the ETL vs. ELT wars
  • There are advantages to doing some work outside the database
  • There are advantages to doing some work inside the database
  • Don't build an ETL or an ELT system
  • Build an architected Population Subsystem
  • Apply the appropriate tools to the appropriate
    function for the available (and future) skill
    level
  • ETL tools when appropriate
  • ELT tools when appropriate
  • Other tools when appropriate

10
A Population Subsystem
  • Consider using an architected model for building
    population subsystems
  • Each logical layer performs a specific kind of
    function and processing is encapsulated to that
    layer
  • Logical layers are grouped into physical
    components and a staging of data occurs between
    physical components
  • Each physical component will be implemented with
    technical functions implemented by one or more
    technologies

11
Population Subsystem

From IBM course DW130
12
Tools on System z (z/OS or Linux on System z)
  • Data Movement and Transformation
  • ETL: InfoSphere DataStage for Linux on System z
  • Data cleansing: InfoSphere QualityStage for Linux on System z
  • ELT: InfoSphere Warehouse on System z SQL Warehousing Tool
  • Expanded Sources
  • Heterogeneous data access: InfoSphere Federation Server
  • Classic data sources: InfoSphere Classic Federation Server for z/OS
  • Capturing Changes
  • SQL and Q-based data replication: InfoSphere Replication Server for z/OS
  • Classic data source replication: InfoSphere Classic Replication Server for z/OS
  • DB2 event publishing: InfoSphere Data Event Publisher for z/OS
  • Classic event publishing: InfoSphere Classic Data Event Publisher for z/OS

13
ELT on System z
  • InfoSphere Warehouse on System z
  • SQL Warehousing Tool
  • (SQW)

14
InfoSphere Warehouse on System z
(Architecture diagram)
  • Client layer: Eclipse-based design and admin client, plus BI/reporting tools and apps (IE/Firefox, Excel, Cognos 8 BI for System z, third parties/BPs) using MDX and SQL
  • Application server layer: WebSphere App Server hosting the Cubing Services engine, Administration, and the SQW runtime, running in a Linux on System z partition/IFL and connecting via JDBC/DB2 Connect
  • Data warehouse server: DB2 for z/OS holding the cube metadata, the control DB, and MQTs (materialized query tables, recommended by the MQT Advisor), fed from the source systems
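The MQT box in the diagram refers to DB2 materialized query tables, which the MQT Advisor recommends to speed up reporting queries. A minimal sketch of one, with hypothetical names:

-- Hypothetical summary MQT that DB2 can use to rewrite and speed up queries
CREATE TABLE MART.SALES_BY_MONTH AS
  (SELECT STORE_ID, MONTH_ID, SUM(SALES_AMOUNT) AS TOTAL_SALES
     FROM MART.SALES_FACT
    GROUP BY STORE_ID, MONTH_ID)
  DATA INITIALLY DEFERRED
  REFRESH DEFERRED
  MAINTAINED BY SYSTEM
  ENABLE QUERY OPTIMIZATION;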
15
Design Studio
16
InfoSphere Warehouse Design Studio Key features
  • Integrated Development Environment (IDE) for DB2 Warehouse projects
  • Integrated, consistent, and interoperable tools for
  • Connecting and browsing databases
  • Exploring data
  • Designing physical database models
    (reverse/forward engineering)
  • Designing OLAP objects
  • Designing data movement and transformation flows
  • The platform is extensible (Eclipse based) and
    can be easily extended with third party or
    customer developed plug-ins

17
Getting Started: The Workspace
  • A workspace is
  • A directory on the local file system where the
    projects (i.e. Metadata) created in the Design
    Studio are stored as XML files.
  • The Design Studio GUI.

18
The Business Intelligence Perspective
  • The Business Intelligence Perspective is the
    default perspective in the Design Studio.
  • It contains the views which are useful during the
    development of a Data Warehousing project.
  • Views can be moved/stacked by using drag and
    drop. They can be maximized by double-clicking on
    their title.
  • Views can be closed and reopened. To add a new view to the perspective, use the menu bar: Window > Show View

(Screenshot of the perspective, showing these views: Object Palette, Editor(s), Data Project Explorer, Outline, Data Output, Problems, Data Source Explorer, Properties)
19
Data Source Explorer
  • Define live JDBC connections to relational sources
  • Could be nicknames if Federated Server installed
  • Browse and work with objects in the live
    database
  • Live connection required for many Design Studio
    operations
  • Reverse engineering data models
  • Test execution of flows
  • Sampling data

20
Team component
  • Version and configuration management
  • Share resources with team via a repository
  • The Design Studio includes a CVS repository
    provider
  • Other repository providers can be used by
    installing the plugins provided by the repository
    vendors
  • Rational ClearCase
  • IBM CMVC
  • Merant PVCS Version Manager
  • See the eclipse.org community page for a list of available plugins

21
Integration with other IBM Tools: Eclipse Shell Sharing
  • Share the core Eclipse components so that they
    are not duplicated between each Eclipse-based
    product. Shell sharing eliminates the need to
    install several Eclipse platforms for each
    product, thus saving disk space and eliminating
    duplication of components.
  • Supported products that shell share with the
    Design Studio
  • Data Studio Developer (DSD) v2.1
  • Data Studio Administrator (DSA) v2.1
  • InfoSphere Data Architect (IDA) v7.5.1
  • Data Studio Optimization Expert for z/OS (DSOE)
    v2.1
  • Rational Application Developer (RAD) v7.5.1
  • Rational Software Architect (RSA) v7.5.1

22
Physical Data Modeling
23
Data Models
  • Two types of data models
  • Logical: the business representation of data without regard to the underlying DBMS
  • Physical: the representation of the data as it would appear in the DBMS
  • Design Studio supports development of physical
    data models
  • Metadata representation of actual objects that
    are present in the DBMS
  • Create from scratch
  • Reverse engineer from existing database or DDL
  • Physical data model required to provide database
    metadata to other SQW components

24
Data modeling overview
  • Design and modify physical database models (schema and storage design, as well as cubes, dimensions, hierarchies)
  • Key Features
  • Create a new DB design from scratch
  • Reverse engineer from an existing Database
    Explorer connection or from DDL
  • Create overview diagrams
  • Modify the schema graphically or in the project
    tree
  • Compare DB objects with each other or with
    objects existing in the database
  • Analyze design (best practices and dependencies), validation
  • Generate DDL script and deploy (see the DDL sketch after this list)
  • Impact Analysis
  • DB2 storage modeling: table space, buffer pool, partition
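A sketch of the kind of DB2 for z/OS DDL such a model can generate and deploy; the database, table space, and table names below are hypothetical:

-- Hypothetical generated DDL, including z/OS storage objects
CREATE DATABASE DWHDB;

CREATE TABLESPACE FACTTS IN DWHDB
  USING STOGROUP SYSDEFLT
  BUFFERPOOL BP1;

CREATE TABLE DWH.SALES_FACT
  (SALE_ID    INTEGER       NOT NULL,
   STORE_ID   INTEGER       NOT NULL,
   SALE_DATE  DATE          NOT NULL,
   AMOUNT     DECIMAL(11,2),
   PRIMARY KEY (SALE_ID))
  IN DWHDB.FACTTS;

-- Unique index that enforces the primary key
CREATE UNIQUE INDEX DWH.XSALES_FACT ON DWH.SALES_FACT (SALE_ID);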

25
Design Studio vs InfoSphere Data Architect
  • Design Studio includes a subset of functionality
    provided in InfoSphere Data Architect (IDA).
  • Design Studio includes the physical data modeling and corresponding SQL generation capabilities to help you implement and modify your physical model.

IDA only: logical data modeling, naming model, glossary model, other non-LUW advanced physical data modeling, web publishing and reporting, mapping editor, UML-LDM transformation
Design Studio: Data Project Explorer, Database Explorer, complete physical data modeling for DB2, basic physical data modeling for other databases, impact analysis
26
Design Studio - Physical Data Model
Data Project
  Physical data model
    Database (one per model)
      Schema
        Diagrams (logical folder, one per schema)
          Diagram
        SQL statements (logical folder, one per schema)
        Table
          Primary key column
          Column
          Primary key constraint
          Unique constraint
          Index
27
Physical Data Model - Diagram
(Screenshot of the diagram editor: drawing area, palette for selecting the element to create on the diagram, geometric shapes, hide/show palette control)
28
Embedded Data Movement: SQL Warehousing Tool (SQW)
29
SQL warehousing tool (SQW)
  • Build and execute intra-warehouse (SQL-based)
    data movement and transformation services
  • Integrated Development Environment and metadata
    system
  • Model logical flows of higher-level operations
  • Generate code and create execution plans
  • Test and debug flows
  • Package generated code and artifacts into a data
    warehouse application
  • Integrate SQW Flows and DataStage jobs
  • Generate DB2 z/OS specific optimized SQL code
    (Data Flows)
  • DB2 z/OS specific operators
  • DB2 z/OS specific code generation
  • Optimization across queries
  • Predicate pushdown and move-around
  • Elimination of unnecessary columns
  • Staging tables handled automatically by the engine
  • Integrate SQL based flows with non-database
    activities (Control Flows)
  • Sequence and manage activity flow

30
Data Flows
31
Data flows
  • Data flows are flow models that represent data
    movement and transformation requirements
  • SQW Codegen translates the models into
    repeatable, SQL-based warehouse building
    processes
  • Data from source files and tables moves through a series of transformation steps, then loads or updates a target table or creates a file

32
Would you rather type this?
SELECT SALES.OU_IP_ID AS STR_IP_ID,
       SALES.PD_ID AS PD_ID,
       SALES.MSR_PRD_ID AS TIME_ID,
       SALES.C_D_MKT_BSKT_TXN_ID AS NMBR_OF_MRKT_BSKTS,
       SALES.SUM_NBR_ITM AS NUMBER_OF_ITEMS,
       CASE WHEN SALES.M_BK_PD_SUB_DEPT_NM IN ('BATH AND SHOWER', 'CAMERAS')
            THEN SALES.BKP_SUM_NBR_ITMXPRC * DECIMAL(MARTS.RAND1N(5) + 123) / 100
            ELSE SALES.BKP_SUM_NBR_ITMXPRC * DECIMAL(MARTS.RAND1N(5) + 102) / 100
       END AS PRDCT_BK_PRC_AMUNT,
       CASE WHEN SALES.MIN_CG_PD_DEPT_NM IN ('TEEN BOYS', 'TEEN BOYS JEANS', 'DRESS FORMAL', 'MEN SHOES')
            THEN (DECIMAL(68 - MARTS.RAND1N(5)) / 100) * SALES.SUM_CG_NBR_ITMX_PRC
            WHEN SALES.MIN_CG_PD_DEPT_NM IN ('ELECTRICAL APPLIANCES', 'ELECTRONICS', 'COLORED TELEVISIONS', 'WOMEN SHOES')
            THEN (DECIMAL(77 - MARTS.RAND1N(5)) / 100) * SALES.SUM_CG_NBR_ITMX_PRC
            WHEN SALES.MIN_CG_PD_DEPT_NM IN ('HEALTH AND BEAUTY')
            THEN (DECIMAL(65 - MARTS.RAND1N(5)) / 100) * SALES.SUM_CG_NBR_ITMX_PRC
            ELSE (DECIMAL(72 - MARTS.RAND1N(5)) / 100) * SALES.SUM_CG_NBR_ITMX_PRC
       END AS CST_OF_GDS_SLD_CGS,
       SALES.SUM_NBR_ITMXSTM_PRC AS SALES_AMOUNT
  FROM SALES
33
or would you rather describe your logic at a higher level?
A simple Skills star schema
34
and have optimized SQL generated for you?
  • INSERT INTO OLAPANL.STAR_FACT_TABLE
  • (ID, COMPANY_ID, TIME_ID, SKILL_DETAILS_ID,
    NB_SKILLS)
  • WITH INPUT_04 (COMPANY_NAME, TIME, ID, SKILL_CAT,
    SKILL_DETAILS, SKILL_ID)
  • AS (
  • SELECT
  • COMPANY_NAME AS COMPANY_NAME,
  • TIME AS TIME,
  • ID AS ID,
  • SKILL_CAT AS SKILL_CAT,
  • SKILL_DETAILS AS SKILL_DETAILS,
  • SKILL_ID AS SKILL_ID
  • FROM
  • TXTANL.IT_SKILLS_ASKED INPUT_0281),
  • IN4_07 (ID, SKILLS_PER_OFFER)
  • AS (
  • SELECT
  • INPUT_04.ID AS ID,
  • COUNT(*) AS SKILLS_PER_OFFER
  • FROM

35
Data Flow Operators
Most operators are the same as in the LUW versions but generate DB2 z/OS-specific SQL (see the Key Lookup sketch after this list)
  • Sources / Targets
  • Table Source (Local and Remote)
  • Table Target (Local and Remote)
  • Data Set Import
  • Data Set Export
  • SQL Query Source
  • Data Station
  • SQL Transformation Operators
  • Select List
  • Distinct
  • Group By
  • Order By
  • Table Join
  • Where (Filter)
  • Union
  • Warehouse Operators
  • Fact Key Replace
  • Key Lookup
  • Pivot
  • Unpivot
  • Splitter
  • Custom Transformations
  • Custom SQL
  • DB2 Table Functions
  • DB2 User Defined Functions
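As an illustration of what the warehouse operators generate, here is a sketch of the kind of SQL a Key Lookup / Fact Key Replace step produces; the staging and dimension tables are hypothetical:

-- Hypothetical key lookup: swap the natural key for the surrogate key
INSERT INTO MART.SALES_FACT (PRODUCT_KEY, TIME_ID, SALES_AMOUNT)
SELECT D.PRODUCT_KEY,                        -- surrogate key from the dimension
       S.TIME_ID,
       S.SALES_AMOUNT
  FROM STAGE.SALES S
  JOIN MART.PRODUCT_DIM D
    ON D.PRODUCT_CODE = S.PRODUCT_CODE;      -- natural (business) key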

36
Data Flows that call DB2 z/OS utilities
  • Data Set Import Operator
  • Invokes the load utility to load data in a target
    table from a data set
  • Calls the DB2-supplied stored procedure SYSPROC.DSNUTILU (a call sketch follows this list)
  • Data Set Export Operator
  • Invokes the unload utility to unload data from a
    table to a BSAM sequential data set.
  • Call DB2-Supplied stored procedure
    SYSPROC.DSNUTILU
  • Cross Loader Operator
  • Invoke load utility to directly load the output
    of a dynamic SQL SELECT statement into a table
  • Call DB2-Supplied stored procedure
    SYSPROC.DSNUTILU
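A minimal sketch of such a call. DSNUTILU takes a utility ID, a restart option, the utility control statement, and returns a return code; the utility ID, data set, and table names below are hypothetical:

-- Hypothetical invocation of the LOAD utility through DSNUTILU
CALL SYSPROC.DSNUTILU(
  'SQWLOAD1',                                 -- utility ID
  'NO',                                       -- restart option
  'TEMPLATE SREC DSN(HLQ.SALES.EXTRACT)
   LOAD DATA INDDN SREC RESUME YES
   INTO TABLE DWH.SALES_FACT',                -- utility control statement
  ?);                                         -- output return code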

37
Execution Database
  • The DB2 for z/OS subsystem to which the generated SQL of a data flow is submitted
  • Source/target tables are local when they are in the execution subsystem, otherwise remote

38
Local Source, Local Target
  • Both Source and Target tables are in the same DB2
    for z/OS subsystem
  • SQL submitted to DB2C
  • No data flows outside of DB2C

39
Remote Source Table via JDBC, Local Target
  • Source table in remote database accessed as
    remote Table Source operator
  • A Java application runs on Linux and connects to DB2A and DB2C
  • Data flows through Linux

40
Remote DB2 Source via Cross Loader, Local Target
  • Source table in remote DB2 database accessed as
    remote Table Source operator but using Cross
    Loader Target operator
  • Cross loader utility invoked at DB2C (see the control-statement sketch after this list)
  • Uses DDF to access remote DB2 table over DRDA
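A sketch of the cross-loader style of control statement this operator drives; the cursor and table names are hypothetical (DB2A stands in for the remote source location):

-- Hypothetical cross-loader control statement: load directly from a cursor
EXEC SQL
  DECLARE C1 CURSOR FOR
  SELECT STORE_ID, SALE_DATE, AMOUNT
    FROM DB2A.STAGE.SALES_RAW                 -- remote table reached over DRDA
ENDEXEC
LOAD DATA INCURSOR(C1) REPLACE
  INTO TABLE DWH.SALES_FACT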

41
Local Source Table, Remote Target via JDBC
  • Source table in local DB2 z database and target
    is defined as remote Table Target operator
  • A Java application runs on Linux and connects to DB2A and DB2C
  • Data flows through Linux

42
Remote Source Table, Remote Target Table via JDBC
  • Source and Target tables are remote
  • A Java application runs on Linux and connects to DB2A and DB2C
  • Data flows through Linux
  • All SQL processing is in DB2 for z/OS at DB2C

43
Other data flow features
  • Variables
  • Variables can be used in Data Flows
  • Defer the definition of certain properties until
    a later phase in the life cycle.
  • File Names
  • Table Names
  • Database Schema Names
  • Many more
  • Generalize a Data Flow
  • Subflows
  • A subflow is a predefined set of operators that
    you can place inside a data flow.
  • Useful as a plugin into multiple versions of the
    same or similar data flows
  • Containers or building blocks for complex flows
    (division of labor)
  • Blue ports represent subflow inputs and outputs

44
Control Flows
45
Definition and simple example
  • A control flow is a flow model that sequences one
    or more data flows and integrates other data
    processing tasks and activities.
  • Control flows are the unit of execution.
  • This simple example processes two data flows in sequence. If either fails, an e-mail is sent to an administrator

46
Control Flow Operators
  • Task-oriented operators (Do things)
  • Data flow
  • Subprocess
  • JCL Job
  • Command (DB2 Shell/FTP)
  • Secure Command
  • Secure FTP
  • Email
  • Period row generator
  • Load
  • Unload
  • Reorg
  • Runstats
  • Table Partition
  • Stored procedure
  • DataStage job sequence
  • DataStage parallel job
  • Custom SQL

47
Control Flow Operators
  • Flow control operators (Manage things)
  • Parallel Container
  • Start
  • End
  • Iterator/End Iterator
  • Continue
  • Break
  • Fail
  • File wait
  • Variable assignment
  • Variable comparison
  • File Write

48
CF Operator Introduction (1)
  • Exchange Operator
  • Switches the contents of a base table and its associated clone table (see the SQL sketch after this list)
  • The exchange operation is not supported when running against a DB2 z/OS Version 8 database
  • Table Partition Operator
  • Perform table partition operation
  • Adding a partition
  • Rotating partitions
  • Changing partition boundary
  • Runstats Operator
  • Update the system catalog statistics for DB2 for
    z/OS database through the DB2 RUNSTATS utility.
  • Call DB2-Supplied stored procedure
    SYSPROC.DSNUTILU
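For reference, sketches of the SQL these operators correspond to; the table names and limit key below are hypothetical:

-- Hypothetical clone-table exchange (DB2 9 for z/OS and later)
EXCHANGE DATA BETWEEN TABLE DWH.SALES_FACT AND DWH.SALES_FACT_CLONE;

-- Hypothetical partition rotation on a range-partitioned table
ALTER TABLE DWH.SALES_FACT
  ROTATE PARTITION FIRST TO LAST
  ENDING AT ('2009-12-31') RESET;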

49
CF Operator Introduction (2)
  • Reorg Operator
  • Reorganize a table space or an index for DB2 for
    z/OS database through the DB2 REORG utility.
  • Call DB2-Supplied stored procedure
    SYSPROC.DSNUTILU
  • Unload Operator
  • Unloads data from an entire table space, or from selected tables and columns, to BSAM sequential data sets (see the control-statement sketch after this list)
  • Call DB2-Supplied stored procedure
    SYSPROC.DSNUTILU
  • DB2 Online Utility Operator
  • Runs any DB2 for z/OS utility that can be invoked
    by the stored procedure DSNUTILU
  • Call DB2-Supplied stored procedure
    SYSPROC.DSNUTILU
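The control statements these operators pass to SYSPROC.DSNUTILU look roughly like the following; the database, table space, template, and table names are hypothetical:

-- Hypothetical utility control statements driven through DSNUTILU
REORG TABLESPACE DWHDB.FACTTS SHRLEVEL REFERENCE

RUNSTATS TABLESPACE DWHDB.FACTTS TABLE(ALL) INDEX(ALL)

TEMPLATE UNLDDS DSN(HLQ.SALES.UNLOAD) DISP(NEW,CATLG,DELETE)
UNLOAD TABLESPACE DWHDB.FACTTS FROM TABLE DWH.SALES_FACT UNLDDN UNLDDS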

50
CF Operator Introduction (3)
  • Load Operator
  • Four load approaches
  • Load from z/OS data set
  • Load from cursor
  • Load from file on client side (using FTP to transfer the file to a z/OS data set)
  • Load from file on client side (using FTP to transfer the file to a z/OS BatchPipes data set)
  • Calls the DB2-supplied stored procedure SYSPROC.DSNUTILU for approaches 1-3; calls ADMIN_JOB_SUBMIT, ADMIN_JOB_QUERY, ADMIN_JOB_FETCH, and ADMIN_JOB_CANCEL for approach 4
  • Command Operator (FTP)
  • An Advanced Options tab has been added; the user can specify the following additional options for z/OS file transfers
  • Record length
  • Record format
  • Data transfer type

51
CF Operator Introduction(4)
  • JCL Operator
  • JCL is a control language used to identify a job to an operating system and to describe the job's requirements
  • The JCL Operator can submit a job, query the status of the job, fetch the output of the job, and purge the job
  • Supports three scenarios
  • JCL on the local machine
  • JCL on the z/OS side
  • Newly edited JCL
  • New edited JCL
  • Call DB2-Supplied stored procedures
    ADMIN_JOB_SUBMIT, ADMIN_JOB_QUERY,
    ADMIN_JOB_FETCH, ADMIN_JOB_CANCEL, ADMIN_DS_BROWSE

52
Deployment and Runtime
53
Deployment
The process of promoting a Warehouse Application
from a development environment to Test and
Production environments.
(Diagram: Design/Test and Prepare Application happen in the Development environment; Deploy/Install and Manage/Execute happen in the Test and Production environments)
Deployment preparation is done in the Design Studio; deployment is done via the Administration Console
54
InfoSphere Warehouse on System z
  • Administration Console manages the runtime environment
  • Deploy data movement applications
  • Schedule, Execute, Monitor flows
  • Define and manage cube servers
  • Manage OLAP Metadata
  • Assign cubes to cube servers

55
Admin Console - SQW
56
Contact Info: z Warehouse SWAT Team
  • Mgr: Beth Hamel, hameleb@us.ibm.com
  • Andy Perkins, aperkin@us.ibm.com
  • Jonathan Sloan, jonsloan@us.ibm.com
  • Sundari Voruganti, svoruga@us.ibm.com
  • Willie Favero, wfavero@us.ibm.com