Title: Data Warehouse Architecture
1Data Warehouse Architecture
2Lecture Objectives
- Define Data Warehouse Architecture
- Define Data Warehouse and Data Mart
- Discuss The Great Debate
- Present a Data Warehouse Architectural Framework
3Information Systems Architecture
- Information Systems Architecture is the process
of making the key choices that are essential to
the development of an information system.
Architecture includes - Guiding Principles
- Approaches/philosophies
- Logical representations of a system
- Hardware/Operating System
- Computing model client/server vs traditional vs
Web-based - Tools and technologies
- It is key, when making these choices that they
are - Requirements driven
- Take into consideration operational, technical
and financial feasibility - Made within an architectural framework
4Architecture Drivers
- There are a lot of Drivers of Architecture
Architecture
5Architecture to Design to Implementation
Looking at the Work of Developing a System as a
set of Choices, Architecture can be Described as
Highest Level Choices
Business Strategy Line of Code / Process
Step Employee / Computer
Removal of Choices
Architecture
Design
Implementation
6How is Architecture Different from Design?
- Its not Architecture can be considered
high-level design - Architecture includes those aspects of the design
that are essential to the information system - Architecture Example
- Users must be able to self-serve (guiding
principle) - We will use a hub and spoke design where data
will be placed in a central data warehouse, then
be propagated to one or more data marts.
(approach) - We will normalize data in the central warehouse
and use a dimensional design in the data marts
(approach) - We will use Oracle 8i as our DBMS (technical
architecture)
7Architecture vs Design
- Not Architecture
- The Order subject area will be composed of the
following tables order_fact, customer_dim,
product_dim and time_dim - The customer_dim table will have the following
attributes.
8The Value of Architecture
- Communication
- To business sponsors, and business users
- Between members of the project team
- Planning
- Cross Check for Project Plan
- Ensure that all important components of the data
warehouse are accounted for - Flexibility and Growth
- Thinking about overall architecture will reduce
risk associated with the success of the data
warehouse - Learning
- Productivity and Reuse
9Whats different about DW Architecture?
- Transaction processing systems growth is
(relatively) predictable - Example
- A company uses SAP for order processing
- They are opening a new retail store
- They predict (based on experience) 2000
transactions per week - To process this volume, we need 3 workstations to
capture the transactions - Peak time each day is 11-2 when 50 of
transactions occur
10Whats Different About Data Warehouse
Architecture?
- Success drives explosive growth
- More users
- More (complex) queries
- More data
- Performance is unpredictable
- Unpredictable queries
- Unpredictable use patterns
Data Warehouse
SAP R/3
Growth
Siebel
Time
11The Great Data Warehouse Architecture Debate
- Bill Inmon The enterprise data warehouse
- Ralph Kimball data marts
- The compromise Hub and Spoke or Federated
models
If you build it, They will come
12What is a Data Mart?
- A data mart is a collection of subject areas
organized for decision support based on the
specific needs of a given user group. - Each mart may widely different from others (as we
will see) - Typically, data marts are built on the
dimensional data model - Facts things that the organization wants to
measure revenue, orders, shipments, purchases,
etc. - Dimensions the means by which the organization
wants to analyze the measures (facts) by
customer, by time, by product BY ANY
COMBINATION!!
13What is a Data Mart?
- There are two kinds of data marts--dependent and
independent. - A dependent data mart is one whose source is a
data warehouse. - An independent data mart is one whose source is
the legacy applications environment. All
dependent data marts are fed by the same
source--the data warehouse. Each independent data
mart is fed uniquely and separately by the legacy
applications environment. - Dependent data marts are architecturally and
structurally sound. - Independent data marts have a number of
significant issues
14Data Warehouse Definition 3
- Provides the infrastructure to feed data marts,
which supply data to users. - Responsible for
- Acquiring data
- Cleaning data
- Transforming data
- Managing granular data
- Managing ALL data required for a specific subject
- Can be store data in a number of ways
- Flat files
- RDBMS
- Other technologies (compression)
15Data Warehouse Definition 3
- Ensures that
- Impact on source systems is minimized
- Consistent data definitions across data marts
- Conformed dimensions!!
- Appropriate history is maintained
16Architectural Framework
- Three Views
- Data (content of warehouse)
- Technical (functions to be performed by the
warehouse) - Infrastructure (Hardware, Communications)
- Four levels of detail for each view
- Business Requirements
- Architecture Models
- Detailed Models
- Implementation
17Data Architecture
- Business Requirements
- What information do we need to make better
business decisions? - What data assets are available?
- Architecture Models and Documents - The
Dimensional Model - What are the major entities (the facts and
dimensions) that make up this information? - How do they relate to each other?
- How should these entities be structured?
- Detailed Models and Specs - The Logical and
Physical Models - What are the individual elements, their
definitions, domains, and rules for derivation? - What are the sources and how do they map to the
targets? - Implementation
- Create and document the databases, indexes,
backup procedures, etc.
18Technical Architecture Back Room
- Business Requirements and Audit
- How will we get at the data, transform it, and
make it available to our users? - How is this done today?
- Architecture Models and Documents
- What are the specific capabilities needed to get
the data into a usable form in the desired
locations at the appropriate times? - What are the major data stores and where should
they be located? - Detailed Models and Specs
- What standards and products provide the needed
capabilities? - How will we hook them together?
- What are our development standards for code
management., naming, etc.? - Implementation
- Write the extracts and loads.
- Automate the process.
- Document the process.
19Technical Architecture Front Room
- Business Requirements and Audit
- What are the major business issues we face?
- How will we measure these issues?
- How do we want to analyze the data?
- Architecture Models and Documents
- What will users need to get the information out
in a usable form? - What major classes of analysis and reporting do
we need to provide? - What are the priorities?
- Detailed Models and Specs
- What tools do we select to provide capabilities
- What are the specifics for the report templates?
- Who needs them?
- How often?
- How do we distribute them?
20Technical Architecture Front Room
- Implementation
- Implement the reporting and analysis environment.
- Build the initial report set.
- Train the users.
- Document.
21Infrastructure
- Business Requirements and Audit
- What hardware and network capabilities do we need
to be successful? - What do we currently have in place?
- Architecture Models and Documents
- Where is the data coming from and going to?
- Do we have enough calculation and storage
capacity? - What are the specific capabilities we are
counting on? - Do they exist?
- Who is responsible for them?
- Detailed Models and Specs
- How do we interact with these capabilities?
- What are the system utilities, calls, APIs, etc.?
- Implementation
- Install and test new infrastructure components.
- Connect the sources to the targets to the
desktop. - Document.
22Technical Data Warehouse Framework
23Technical Data Warehouse Framework
Metadata Management
Metadata
Back Room
Data Staging Services
Front Room
Access
Engineering
Acquisition
Delivery
Storage
Client/ Server
Re-engd. Data
Extracted Data
Data Marts
Web Server
Workflow and System Management