Title: Data Warehousing De-Mystified
1Data Warehousing De-Mystified
- Presented to the Annual
- Georgia Oracle Users Group
- Conference
- on March 20, 2006
- By Rusty Barnett
2Data Warehousing De-Mystified
- What do people mean when they call a grouping of
data a Data Warehouse? - Large Data Volume, Loaded using an ETL Process
(Extract, Transform, Load) - Data is Time-sensitive and Subject Oriented
- Enterprise Data Warehouse, Data Marts, ODS
(Operational Data Stores), DSS, EIS - Dimensional Model, Third-Normal Form
3Data Warehousing De-Mystified
- What do people mean when they refer to Data
Warehousing? - Dimensional Data Modeling
- ETL processing and administrating
- Business Intelligence Reporting
- Database managing and administrating
- As it pertains to a Data Warehouse
4Data Warehousing De-Mystified
- What is a Data Warehouse?
- "A data warehouse is a subject-oriented,
integrated, time-variant and non-volatile
collection of data in support of management's
decision making process. Bill Inmon, 1990. - "A data warehouse is the queryable presentation
resource for an enterprises data". Ralph
Kimball, page 19, "The Data Warehouse Lifecycle
Toolkit.
5Data Warehousing De-Mystified
- Definitions of Data Warehouse on the Web
- A data warehouse is a database geared towards the
business intelligence requirements of an
organization. www.oranz.co.uk/glossary_text.htm - An information infrastructure that enables
businesses to access and analyze detailed data
and trends.www.adobe.com/products/vdp/glossary.ht
ml - A collection of integrated, subject-oriented
databases designed to support the DSS function.
The data warehouse contains atomic data and
lightly summarized data.it.csumb.edu/departments/
data/glossary.html - A data warehouse is a central repository for all
or significant parts of the data that an
enterprise's various business systems
collect.www.pdacortex.com/glossary.htm
6Data Warehousing De-Mystified
- Definitions of Data Warehouse on the Web
Continued - The Data Warehouse is a central repository of
data that provides the MIT community with
integrated, up-to-date data from various
administrative systems. web.mit.edu/sapr3/docs/web
docs/glossary/glCD.html - An information repository from which queries
analysis are made.www.pcai.com/web/glossary/pcai_
d_f_glossary.html - This vast database stores information like a data
repository, but goes a step further, allowing
users to access data to perform research-oriented
analyses.www.payorid.com/glossary.asp - A collection of databases combined with a
flexible data extraction system.www.genpromag.com
/GlossaryLETTERD.html
7Data Warehousing De-Mystified
- Definitions of Data Warehouse on the Web
Continued - Datawarehouse (database) A generic term for a
system for storing, retrieving and managing large
amounts of any type of data. www2.themanualpage.or
g/glossary/glo_d.php3 - The department or entity charged with collecting
organization-wide data, verifying its accuracy,
and analyzing, managing, and distributing it
throughout the organization. In organizations
without a data warehouse, each department may
collect, analyze, manage, and distribute the data
it needs for its operations.www2.uta.edu/ssw/trai
nasfa/glossary.htm - A repository made up of databases of data
extracted from a variety of sources, with a view
to analysis to reveal additional
information.www.gbc.t-online.hu/english/bszotare2
.htm
8Data Warehousing De-Mystified
- Definitions of Data Warehouse on the Web
Continued - Central repository of data extracted from various
sources. The current CMSU DW is a read-only
system with extracted data from SIS, AFINS, HRS
and Enrollment Management.www.cmsu.edu/x18299.xml
- A collection of data and information from various
source systems.www.gov.bc.ca/prem/popt/service_pl
ans/srv_pln/pssg/appen_a.htm - A subject-oriented non-volatile collection of
data used to support strategic decision making.
The warehouse is the central point of data
integration for business intelligence. It is the
source of data for data marts within an
enterprise and delivers a common view of
enterprise data.publib.boulder.ibm.com/tividd/td/
TEDW/GC32-0744-01/en_US/HTML/insmst342.htm
9Data Warehousing De-Mystified
- Definitions of Data Warehouse on the Web
Continued - A logical collection of information, gathered
from many different operational databases, that
supports business analysis activities and
decision-making taskswww.321site.com/greg/courses
/mis1/glossary.htm - A data warehouse is, primarily, a record of an
enterprise's past transactional and operational
information, stored in a database designed to
favor efficient data analysis and reporting
(especially OLAP). Data warehousing is not meant
for current, "live" data. en.wikipedia.org/wiki/D
ata_warehouse -
10Data Warehousing De-Mystified
- Definitions of Data Warehouse on the Web
Continued - A very large repository of data comprising nearly
all of a companys information.www.mosaictec.com/
storage/storage_terms.htm - A large database capable of storing all the
information possessed by a large
organization.www.jqjacobs.net/edu/cis105/concepts
/CIS105_concepts_13.html - A database used for storing historical data,
which is used for data analysis.docs.rinet.ru/O8/
glossary.htm - One or more data stores originating from prime
authoritative data sources by an auditable
replication process.www.data-core.com/glossary-of
-terms.htm
11Data Warehousing De-Mystified
- Goal of Data Warehousing
- A System that displays Large Volumes of Data as
Business INFORMATION!
12Data Warehousing De-Mystified
- Terms
- Data Warehouse ---The term Data Warehouse was
coined by Bill Inmon in 1990 - Enterprise Data Warehouse --- Accumulation of
data from all significant areas of an Enterprise - Data Mart --- A subset of a data warehouse, for
use by a functional area (department) on a
subject - Dimensional Modeling --- A design technique that
seeks to present the data in a framework thats
intuitive, allows for high-performance access,
and resistance to change - ERD --- Entity Relational Diagram
13Data Warehousing De-Mystified
- Terms
- Schema --- A collection of database objects,
including tables, views, indexes, and synonyms - Star Schema --- A method of designing tables with
a Fact Table in the middle and Dimension Tables
around it a standard technique for designing and
building multi-dimensional databases - Snowflake Schema --- Star Schema with child
Dimensions - Star Query --- A join between a Fact table and a
number of Dimension tables, where the Dimension
tables are not joined to each other, only to the
Fact table
14Data Warehousing De-Mystified
- Terms
- Fact Table --- Primary table in Dimensional Model
that is meant to contain measurements of the
business, whose most useful columns are numeric
and additive - Dimension Table --- One of the set of companion
tables to a Fact Table, whose columns are usually
textual and are used for constraining, grouping,
ordering within queries - Grain --- The level of the data, like Weekly or
Yearly - Measures --- The numeric columns in a Fact Table
- Audit Columns --- Columns specifying change
(who,when)
15Data Warehousing De-Mystified
- Terms
- Conformance --- A Dimension is said to be
conformed if it can be used by more than one Fact
Table and/or Data Mart and/or subject area, i.e.,
shareable - Associative (Intersection) Table --- A table that
is used to associate a value from one table to a
value in another table in order to reduce
many-to-many relationships between tables - Staging Tables --- Tables used to stage data
along the way toward loading data, especially
useful in ETL - ETL --- Extract, Transform, and Load also a
Verb, meaning to load data into one system from
data in another
16Data Warehousing De-Mystified
- Data Warehousing Roles
- Business Users --- People with knowledge of the
business and needs for reporting - Report Writers --- People who write reports
(Developers) tools for writing reports - Data Modelers --- People who create data models
from Business Requirements - ETL Developers --- People who write / develop ETL
processes - DBA --- People who manage and support databases
17Data Warehousing De-Mystified
- Tips for Data Modelers
- Understanding the capabilities of the ETL tool,
the RDBMS, and the Reporting tool, is just as
important as understanding the needs of the
business and the users requirements - All Data Warehouse primary keys (PK) should be
single-column surrogate keys (sequence) - Keep Referential Integrity in the DW
- Use Associative tables to handle MM
- Order columns in tables by their fill factor
18Data Warehousing De-Mystified
- Tips for ETL Developers
- Embrace the use of multiple Staging Tables
- Primarily INSERT data in each ETL process step,
especially into the Staging Tables - Invalidate Indexes (other than Primary Keys,
Unique Keys, and Foreign Keys) before
loading/updating massive amounts of data. After
the data is loaded/updated, then rebuild these
Indexes - Be careful in usage of Oracle ROWNUM
- Test new features, like 10gs MERGE enhancements,
Pipeline Functions, Grouped Table Outer Joins,
Conditional Update/Insert/Delete statements, etc.
19Data Warehousing De-Mystified
- Tips for Report Developers
- Avoid using Hints in your SQL
- Consider using Materialized Views for reporting
on a common set of pre-processed data - Be very careful (in fact, be hesitant) in using
normal Views in Data Warehouses - Consider using the SQL clauses intersect,
minus, and union all in set processing - JOINs are expensive --- NOT, I/O is expensive
- Test new features, like 10gs Connect_by, MView
and MERGE enhancements, Pipeline Functions,
Grouped Table Outer Joins, Time Series Interrow
Calcs, etc.
20Data Warehousing De-Mystified
- Tips for DBAs
- Use as much of each Data Block as possible
(PCTFREE, PCTUSED) - Set optimizer_index_cost_adj to lower than 100
- Use ARCHIVELOG mode RMAN for backups
- Collect Statistics in best way for your Oracle
Release - Test Benefits of Parallel Query and Query
Re-write - Test new features of RDBMS --- 10g features like
BFT, new Flashback, Security, Partitioning, and
RMAN features, Improved VLDB support, etc.
21Data Warehousing De-Mystified
- Current Debates in Data Warehousing
- Fresh Data versus Live Data
- Dimensional Model versus Third-Normal Form
- Is a Snowflake schema bad for performance?
- Time spent to physicalize the Data Model?
- Who is responsible for Data Model?
- Manage DB via Data Model? (OWB)
- When to partition and when not to?
- Acceptance of Level 3 Changing Dimensions
22Data Warehousing De-Mystified
- Current Standards DW Tools
- Database --- Oracle 9i / 10g
- ERD Data Modeling --- ERwin (Oracle Designer)
- ETL --- Informatica (OWB)
- Report Writing --- Oracle Discoverer,
MicroStrategy, Business Objects, Cognos -
23Data Warehousing De-Mystified
- Questions
- And
- Answer
- Session
24Data Warehousing De-Mystified