Title: August 11, 2003
1 31 Schema Flavors Build a Warehouse Session
37018 OracleWorld San Francisco 2003 Robert
Corfman Boeing Commercial Airplanes Information
Systems
August 11, 2003
2What is this Session?
- Background on the Boeing Tool Management Project
- COTS Package Enhancements
- COTS Limitations
- Warehouse Limitations / Overview Design
- Specific Code Examples / Techniques
- Replication Usage
- Detecting Schema Differences
- Error Capture, Propagation, Reporting
- Performance Issues
3Presentation Perspective
- Application Architect
- Responsible for technical application decisions
- Enterprise project and Puget Sound
responsibilities - Programmer
- Multiple technical facets
- COTS modification specs, delivery, testing,
deployment - Radio Frequency Implementation
- Data Conversion
- Legacy application interfaces
- Warehouse
Not a Warehousing Specialist . . .
431 Schema Flavors
- Project Background
- And
- Warehouse Design Decisions
How did we get where we are?
5Boeing Tool Standardization
- Perishable / Portable Tools and Shop Supplies
- Drill Motors, Safety Glasses, Reamers
Non-Production - Retirement of multiple legacy applications
- Mergers and acquisitions
- Robust home-grown applications
- Enterprise Commodity Standardization
- Internal customer base already communicating
- Enterprise application
- Loose federation of independent site projects
6Site Distribution
7COTS Package
- Site vs. Enterprise daily use vs. high leverage
- Picked best identified product forsite-level
usability - No Enterprise features in selected product
- Cross-site visibility, item linkage, or transfers
- Supplier to build-in needed features
- No customizations
Be Nimble, Keep the Product Standard
8Data Warehouse Design Decisions
- Dont break the site-level features/usability
- Main reason for selecting this product
- Integrate warehouse features into the product
- Allow continued site operations when warehouse is
unavailable - Drove decision not to integrate all sites into a
single Corporate-level database - Create separate roll-up database of all
participating sites
Site Operations ?? Company Asset
9What Data Goes in the Warehouse?
- Data Subsets? Data Summarizations?
- How do we decide what data to store? Summarize?
- Volatile decision, different for different
Companies - Drove decision to replicate all data into the
warehouse and not pre-define summarizations - Application runs on multiple databases
- Expected warehouse structure defined by COTS
- Decision on exactly how to get data into designed
warehouse left outside of COTS package - Additional tables and summaries could be
added,but without direct integration in COTS
package
Be Nimble, Keep the Product Standard
10Application Look and Feel
Site Data View
11Application Look and Feel
Warehouse Data View
12Application Look and Feel
Inventory Reports View
13Warehouse Design Considerations
- 7 x 24 Operations
- Allowed to be non-real-time
- Includes Oracle and SQL Server sites
- COTS package allows customization of site
- User-defined fields in standard tables
- Modification of field lengths from standard
- Warehouse needs to handlegreatest common
denominator - Desire to leverage replication
14Warehouse Design Options
- Updateable Snapshots
- Warehouse is the Master, each site an updatable
snapshot - Doesnt work if sites have schema variants
- Master control of data no longer at the sites
- Replication directly into Warehouse
- Same schema variant problem
- Multiple replicants
- Custom procedures to consolidate into Master
warehouse - Allows Replication with Master control at site
- Accommodates variation in site schemas
ETL software now considered a possibility . . .
15Proposed Warehouse Design
Warehouse Site
Company Data Warehouse
Site 1 (Unix)
Site 2 (NT)
Site 3 (NT)
Site N
Oracle
Oracle
SQL Server
Site Data
16Initial Issues with Design
- Could perform Roll-up via Views
- Create or replace view as
- Select 1 siteno,item,desc,null,null from
site1.table - Union Select 2,item,desc,cust1,null from
site2.table - Union Select 3,item,desc,null,cust2 from
site3.table - Dynamically shows replicated data changes
- Significant difference in query times
- Oracle 9.2 showed an Order of magnitude
difference in query times - Oracle 8.1.7 was significantly worse
- this was the version we were using when the
decision was made not to use views
17Initial Issues with Design
- Strong desire to have 7x24 data availability
- Complexity of rebuilding in place
- Difficult to update existing data . . .
- Leverage transparency of synonyms
- Maintain two copies of warehouse
- Rebuild, then re-direct
- If rebuild fails, dont re-direct
- Built-in contingency . . .
Be Prepared for Success . . . (or Failure)
18Final Warehouse Design
Warehouse Site
Company Data Warehouse
Warehouse Data 1
Warehouse Data 2
Site 1 (Unix)
Site 2 (NT)
Site 3 (NT)
Site N
Oracle
Oracle
SQL Server
Site Data
19Replication of Oracle Data
- Standard Oracle fast refresh functionality
- Used to move data to the master warehouse site
- Flexible schedule, avoid replication during
rollup - Dbms_snapshot.refresh vs. Refresh Groups
- Groups provided cross-table read consistency,
required single commit - Perceived difficulties in schedule variations
- Difficulty in automating snapshot validity prior
to refresh - Custom column additions/deletions, column size
changes - Snapshot refresh chosen, acceptance of potential
for minor data inconsistencies - Managed replication via packages
- One package set on snapshot-site, companion
package on site - Designed validate and rebuild snapshot logs and
snapshots as required
20Oracle Snapshot Verification
- Verify_snapshots_sql
- 'select distinct upper(site.table_name)'
- ' from user_tab_columns snap, '
- ' user_tab_columns_at_' db_link ' site'
- ' where upper(site.table_name)
upper(snap.table_name())' - ' and upper(site.column_name)
upper(snap.column_name())' - ' and upper(site.table_name) not like
''MLOG''' - ' and upper(site.table_name) not like
''RUPD''' - ' and upper(site.table_name) not in '
- ' (select upper(tablename) from exclude_whse_at_'
db_link ')' - ' and ('
- ' snap.column_name is null'
- ' or snap.data_type ! site.data_type'
- ' or nvl(snap.data_precision,38)!nvl(site.data_p
recision,38)' - -- default to 38 as replicated number
have precision of 38... - ' or nvl(snap.data_length,-1) !
nvl(site.data_length,-1)' - ' or nvl(snap.data_scale,-1) !
nvl(site.data_scale,-1)' - ' )'
Never Apologize . . .
21Replication of SQL Server Data
- Expected to have full refresh
- Site administrators has strong preference
against - Apparent Options
- Oracle Transparent Gateway
- Two-part Replication
- Microsoft Incremental replication to central LAN
- Full replication to warehouse
- Final Decision
- Site researched push incremental replication from
SQL Server to Oracle It works. - Site schema changes are manually handled
22Build a Warehouse
- More Code
- For
- Rolling the Schemas Together
How did we get there from here?
23Automation with Configuration
- Packaged functionality run with Oracle Jobs
- Table-driven configuration
- Which schemas are included
- Snapshot and Warehouse Tables build parameters
- with defaults and individual table overrides
- Primary Keys and Indexes
- Includes infrastructure features
- Logging
- Full error trapping
- Continued operation after many failures
- Statistical performance information
- Positive response email results
Keep the DBAs Happy . . . What goes around . . .
24Warehouse Build Steps
- Drop sub-warehouse objects
- Cycle through snapshots and build /
modifysub-warehouse tables - Bulk insert each snapshot table into
sub-warehouse - Create primary keys and indexes
- Compute statistics
- Switch master warehouse synonyms to point
atnew sub-warehouse (upon successful
completion) - Email rebuild results
25Drop Sub-Warehouse Objects (Simple)
- PROCEDURE DROP_WAREHOUSE_TABLES IS
- cursor warehouse_tables is
- select table_name from user_tables order by 1
- mysql varchar2(255)
- BEGIN
- FOR tables_rec IN warehouse_tables LOOP
- mysql 'DROP TABLE ' tables_rec.table_name
- EXECUTE IMMEDIATE mysql
- END LOOP
- END DROP_WAREHOUSE_TABLES
26Drop Warehouse Objects-Add Logging
- PROCEDURE DROP_WAREHOUSE_TABLES IS
- cursor warehouse_tables is
- select table_name from user_tables order by 1
- mysql varchar2(255)
- me ERR_TYPE
- BEGIN
- Me.my_name 'DROP_WAREHOUSE_TABLES'
- log_activity(me,'Drop Warehouse tables
started','S') - FOR tables_rec IN warehouse_tables LOOP
- log_activity(me,'Dropping '
tables_rec.table_name) - mysql 'DROP TABLE ' tables_rec.table_nam
e - EXECUTE IMMEDIATE mysql
- END LOOP
- log_activity(me,'Drop Warehouse tables
completed','C') - END DROP_WAREHOUSE_TABLES
27Drop Warehouse Objects-Add Error Traps
- PROCEDURE DROP_WAREHOUSE_TABLES IS
- cursor warehouse_tables is
- select table_name from user_tables order by 1
- mysql varchar2(255)
- me ERR_TYPE
- BEGIN
- Me.my_name 'DROP_WAREHOUSE_TABLES'
- log_activity(me,'Drop Warehouse tables
started','S') - FOR tables_rec IN warehouse_tables LOOP
- log_activity(me,'Dropping '
tables_rec.table_name) - BEGIN
- mysql 'DROP TABLE '
tables_rec.table_name - EXECUTE IMMEDIATE mysql
- EXCEPTION WHEN OTHERS THEN
- INTERNAL_PROC_ERROR(me, SQLERRM, SQLCODE,
mysql) - END
- END LOOP
- log_activity(me,'Drop Warehouse tables
completed','C') - IF me.cnt_errors gt 0 THEN
28Warehouse Build Steps
- Drop sub-warehouse objects
- Cycle through snapshots and build /
modifysub-warehouse tables - Bulk insert each snapshot table into
sub-warehouse - Create primary keys and indexes
- Compute statistics
- Switch master warehouse synonyms to point
atnew sub-warehouse (upon successful
completion) - Email rebuild results
29Table Driven Configuration
. . . 6 TULSA 7 EVERETT 8 RENTON 9
AUBURN . . .
Warehouse_Sites SiteNo Schema_Name
- Allows for Adding and Removing Sites as needed
PROCEDURE VERIFY_TABLES IS BEGIN Loop Through
Each Schema (With Error Trapping and Logging)
Verify_All_Schema_Tables(Schema_Name) End
Loop END VERIFY_TABLES
30Build Warehouse Tables
- Verify Schema Tables performs the major warehouse
structure build-up work - 31 Flavors . . . more similar than different
- First Schema builds majority of tables / columns
- Follow-on schemas modify structure as needed
- Greatest common denominator
- Loops through all table columns usingdata
dictionary information - Very similar to snapshot verification routines
- Column names, data type, length, precision, scale
31More Table Driven Configuration
Storage_Parms Table_Size Table_Storage
Index_Storage
Table_Sizes Table_Name Table_Size
STANDARD TABLESPACE WH128K_DATA STORAGE(INITIAL
128K NEXT 128K) NOLOGGING TABLESPACE WH128K_INDX
STORAGE(INITIAL 128K NEXT 128K)
NOLOGGING LARGE TABLESPACE WH4M_DATA
STORAGE(INITIAL 4M NEXT 4M) NOLOGGING TABLESPACE
WH4M_INDX STORAGE(INITIAL 4M NEXT 4M) NOLOGGING
Select ' ' Table_Storage into BUILD_PARMS
from Storage_Parms Where Table_Size
'LARGE' My_SQL Create_Table_Statement
BUILD_PARMS Execute Immediate My_SQL
Used when Adding a New Table
32Warehouse Build Steps
- Drop sub-warehouse objects
- Cycle through snapshots and build /
modifysub-warehouse tables - Bulk insert each snapshot table into
sub-warehouse - Create primary keys and indexes
- Compute statistics
- Switch master warehouse synonyms to point
atnew sub-warehouse (upon successful
completion) - Email rebuild results
33Insert Mechanics
- Data Warehouse schema is the sum of every site
- Similar technique to identifying schema
- Loop through each schema and table
- Dynamically build Insert Statement
- Based on site columns
- Include unique ID for each schema
- Same logging techniques built-in
- Includes actual SQL statement
- Error trapping and continuation built-in
34Minimizing Archive Logs with NoLogging
- Archive logging in the production warehouse
instance - Some configuration data should be archived
- Normal Boeing Operation to have Archive Logs in
Production - Warehouse data is recoverable from sites
- Nologging specified on tablespaces and tables
- Oracle still logs changes and inserts
- Daily warehouse rebuild generates Significant
Archive Log activity - Big Surprise with initial production use
Ignorance is Bliss ???
35Using Bulk Inserts
- Bulk load can be specified on inserts
- Direct Load with sqlloader
- Append hint with SQL
- Only affects tables specified with NoLogging
- Insert / APPEND / into warehouse.inventory
(siteno, item, description)Select
1,item,description from site1.inventory - Invalidates session table use till commit
- ORA-12838cannot read/modify an object after
modifying it in parallel
36Warehouse Operations Log
Warehouse_Op_Log ID
Log_Date Proc Log_Type
Log_User Message Error
Log_Rows
1617893 08/08/03 020802 INSERT_SITE_TABLE_DATA
I CMPWH Inserting From WICHITA.STATION
0
0 1617894 08/08/03 020803 INSERT_SITE_TABLE_DAT
A I CMPWH Data Inserted from
WICHITA.STATION (5031 Rows) 0
5031
37Warehouse Build Steps
- Drop sub-warehouse objects
- Cycle through snapshots and build /
modifysub-warehouse tables - Bulk insert each snapshot table into
sub-warehouse - Create primary keys and indexes
- Compute statistics
- Switch master warehouse synonyms to point
atnew sub-warehouse (upon successful
completion) - Email rebuild results
38Indexing Mechanics
- Table-driven primary key and index list
- Uses table storage size information for
tablespace and other storage clause information - Dynamically build Alter table and Create Index
Statements - Same logging techniques built-in
- Includes actual SQL statement
- Error trapping and continuation built-in
39Error Logging - Continuation
- select id, log_date, proc, log_type, message
from warehouse_op_log where id between 1576388
and 1576390 order by id - 1576388 08/01/03 021211 CREATE_PK_CONSTRAINTS
I - Create Primary Key STATIONHIST.PK_STATIONHIST(CRIB
BIN,SITENO) - 1576389 08/01/03 021223 CREATE_PK_CONSTRAINTS
E - ORA-02437 cannot validate (CMPWH1.PK_STATIONHIST)
- primary key violated (alter table
CMPWH1.STATIONHIST ADD CONSTRAINT PK_STATIONHIST
PRIMARY KEY (CRIBBIN,SITENO) USING INDEX
TABLESPACE WH128K_INDX STORAGE(INITIAL 128K NEXT
128K) NOLOGGING) - 1576390 08/01/03 021223 CREATE_PK_CONSTRAINTS
I - Create Primary Key TASK.PK_TASK(TASKNO,SITENO)
40Error Logging - Propagation
- 1570901 08/01/03 020002 REBUILD_WAREHOUSE
S - REBUILDING WAREHOUSE - CMPWH1
- 1576389 08/01/03 021223 CREATE_PK_CONSTRAINTS
E - ORA-02437 cannot validate (CMPWH1.PK_STATIONHIST)
- primary key violated (alter table
CMPWH1.STATIONHIST ADD CONSTRAINT PK_STATIONHIST
PRIMARY KEY (CRIBBIN,SITENO) USING INDEX
TABLESPACE WH128K_INDX STORAGE(INITIAL 128K NEXT
128K) NOLOGGING) - 1576418 08/01/03 021249 CREATE_INDEXES
E - ORA-20000 1 Error Occurred in CREATE_PK_CONSTRAIN
TS (see log for details). Last was ORA-02437
cannot validate (CMPWH1.PK_STATIONHIST) - primary
key violated - 1576471 08/01/03 022031 REBUILD_WAREHOUSE
E - ORA-20000 1 Error Occurred in CREATE_INDEXES
(see log for details). Last was ORA-20000 1
Error Occurred in CREATE_PK_CONSTRAINTS (see log
for details). Last was ORA-02437 cannot
validate (CMPWH1.PK_STATIONHIST) - primary key
violated - 1576728 08/01/03 024158 REBUILD_WAREHOUSE
C - REBUILT WAREHOUSE - CMPWH1 (1 error encountered)
41Warehouse Build Steps
- Drop sub-warehouse objects
- Cycle through snapshots and build /
modifysub-warehouse tables - Bulk insert each snapshot table into
sub-warehouse - Create primary keys and indexes
- Compute statistics
- Switch master warehouse synonyms to point
atnew sub-warehouse (upon successful
completion) - Email rebuild results
42Warehouse Build Steps
- Drop sub-warehouse objects
- Cycle through snapshots and build /
modifysub-warehouse tables - Bulk insert each snapshot table into
sub-warehouse - Create primary keys and indexes
- Compute statistics
- Switch master warehouse synonyms to point
atnew sub-warehouse (upon successful
completion) - Email rebuild results
43Positive Email Response
- Email sent upon success and failure
- Catastrophic failure would result in no email
- Subject clearly states warehouse status
- Table-driven recipient list
- Uses dbms_smtp package
- Build higher-level packaged functionality
- Used for multiple purposes (such as interface
failures) - Email includes statistics regarding
- Last snapshot refresh times (system tables)
- Roll-up timing statistics (warehouse operation
log) - Warehouse table row counts (warehouse operation
log)
44Status Email Subject and Body
- WARNING Snapshots NOT Healthy, but Warehouse
Redirected - CMTWH1 - SUCCESS Production Warehouse Redirected - CMPWH2
- The CribMaster Oracle Data Warehouse was
successfully rebuild and activated - Last Refresh of Participating Sites
- MESA 07/31/03 014817
- KELLY 07/30/03 233445
- CECIL 07/30/03 231442
- . . .
- Attachment shows statistics
45Email Statistics Attachment
- Timing statistics for all major warehouse build
steps - 8/11/2003 20006 REBUILDING WAREHOUSE - CMPWH2
- 8/11/2003 20016 Dropping all tables in
Warehouse started - 8/11/2003 20031 Verify Tables for all Schemas
started - 8/11/2003 20211 Insert Table Data for all
Schemas started - 8/11/2003 20802 Create all Primary key
constraints started - 8/11/2003 21235 Create all indexes started
- 8/11/2003 22012 Counting Rows in CMPWH2 started
- 8/11/2003 22013 Analyze all tables in Warehouse
started - 8/11/2003 24152 Starting with REVOKE-CMPWH2
- 8/11/2003 24201 Completed REVOKE-CMPWH2
46Email Statistics Attachment
- Row count statistics for all warehouse tables
- TOTAL TABLE ROWS 15288750
-
- Row Counts for Significant Tables (rows over
100,000) - 8/11/2003 SERIALSTATUSHISTORY 7235395
- 8/11/2003 TRANS 2945618
- 8/11/2003 STATIONHISTORY 669932
- 8/11/2003 STATION 658313
- 8/11/2003 ALTVENDOR 549365
- 8/11/2003 ITEMSERIAL 546443
- 8/11/2003 EMPLOYEE 486319
- 8/11/2003 INVENTRY 359461
- . . .
47Failure Email
- ERROR Production Warehouse Rebuild Failed -
CMPWH2(All Snapshots ARE Healthy) - There was a problem rebuilding the warehouse.
Cutover did NOT occur.Please review the
warehouse_op_log for problems, correct them, and
establish the correct new warehouse.Statement
Executed was (BEGIN CMPWH2.WH_UTIL_MASTER.REBUILD_
WAREHOUSE('CMPWH2') END)Error was (ORA-20000
2 Error(s) Occurred in REBUILD_WAREHOUSE (see log
for details).Last was ORA-20000 6 Error(s)
Occurred in INSERT_ALL_DATA (see log for
details).Last was ORA-20000 105 Error(s)
Occurred in INSERT_SCHEMA_DATA (see log for
details).Last was ORA-20000 1 Error(s)
Occurred in INSERT_SITE_TABLE_DATA (see log for
details).Last was ORA-00942 table or view does
not exist)
48What to Take Away
- Work with COTS Supplier, Avoid Customizations
- Data needs in a warehouse
- Dont Over-Analyze, disk is cheap (now)
- Plan in failure contingency in Build / Rebuild
process - Table-drive the implementation
- Decisions and Structures change with time
- Provide Visibility of the Activity
- Ongoing logging and positive results notification
- Bulk Insert
- Communications key
- DBA, Developers, COTS Supplier, and Users
49Forever New Frontiers
Robert Corfman TSMS System Architect Robert.a.corf
man_at_Boeing.com Please turn in your evaluation
form