Data Warehouse Architecture - PowerPoint PPT Presentation

1 / 23
About This Presentation
Title:

Data Warehouse Architecture

Description:

Present a Data Warehouse Architectural Framework. Information Systems Architecture. Information Systems Architecture is the process of making the key choices that ... – PowerPoint PPT presentation

Number of Views:2330
Avg rating:3.0/5.0
Slides: 24
Provided by: neilf8
Category:

less

Transcript and Presenter's Notes

Title: Data Warehouse Architecture


1
Data Warehouse Architecture
  • Lecture 1

2
Lecture Objectives
  • Define Data Warehouse Architecture
  • Define Data Warehouse and Data Mart
  • Discuss The Great Debate
  • Present a Data Warehouse Architectural Framework

3
Information Systems Architecture
  • Information Systems Architecture is the process
    of making the key choices that are essential to
    the development of an information system.
    Architecture includes
  • Guiding Principles
  • Approaches/philosophies
  • Logical representations of a system
  • Hardware/Operating System
  • Computing model client/server vs traditional vs
    Web-based
  • Tools and technologies
  • It is key, when making these choices that they
    are
  • Requirements driven
  • Take into consideration operational, technical
    and financial feasibility
  • Made within an architectural framework

4
Architecture Drivers
  • There are a lot of Drivers of Architecture

Architecture
5
Architecture to Design to Implementation
Looking at the Work of Developing a System as a
set of Choices, Architecture can be Described as
Highest Level Choices
Business Strategy Line of Code / Process
Step Employee / Computer
Removal of Choices
Architecture
Design
Implementation
6
How is Architecture Different from Design?
  • Its not Architecture can be considered
    high-level design
  • Architecture includes those aspects of the design
    that are essential to the information system
  • Architecture Example
  • Users must be able to self-serve (guiding
    principle)
  • We will use a hub and spoke design where data
    will be placed in a central data warehouse, then
    be propagated to one or more data marts.
    (approach)
  • We will normalize data in the central warehouse
    and use a dimensional design in the data marts
    (approach)
  • We will use Oracle 8i as our DBMS (technical
    architecture)

7
Architecture vs Design
  • Not Architecture
  • The Order subject area will be composed of the
    following tables order_fact, customer_dim,
    product_dim and time_dim
  • The customer_dim table will have the following
    attributes.

8
The Value of Architecture
  • Communication
  • To business sponsors, and business users
  • Between members of the project team
  • Planning
  • Cross Check for Project Plan
  • Ensure that all important components of the data
    warehouse are accounted for
  • Flexibility and Growth
  • Thinking about overall architecture will reduce
    risk associated with the success of the data
    warehouse
  • Learning
  • Productivity and Reuse

9
Whats different about DW Architecture?
  • Transaction processing systems growth is
    (relatively) predictable
  • Example
  • A company uses SAP for order processing
  • They are opening a new retail store
  • They predict (based on experience) 2000
    transactions per week
  • To process this volume, we need 3 workstations to
    capture the transactions
  • Peak time each day is 11-2 when 50 of
    transactions occur

10
Whats Different About Data Warehouse
Architecture?
  • Success drives explosive growth
  • More users
  • More (complex) queries
  • More data
  • Performance is unpredictable
  • Unpredictable queries
  • Unpredictable use patterns

Data Warehouse
SAP R/3
Growth
Siebel
Time
11
The Great Data Warehouse Architecture Debate
  • Bill Inmon The enterprise data warehouse
  • Ralph Kimball data marts
  • The compromise Hub and Spoke or Federated
    models

If you build it, They will come
12
What is a Data Mart?
  • A data mart is a collection of subject areas
    organized for decision support based on the
    specific needs of a given user group.
  • Each mart may widely different from others (as we
    will see)
  • Typically, data marts are built on the
    dimensional data model
  • Facts things that the organization wants to
    measure revenue, orders, shipments, purchases,
    etc.
  • Dimensions the means by which the organization
    wants to analyze the measures (facts) by
    customer, by time, by product BY ANY
    COMBINATION!!

13
What is a Data Mart?
  • There are two kinds of data marts--dependent and
    independent.
  • A dependent data mart is one whose source is a
    data warehouse.
  • An independent data mart is one whose source is
    the legacy applications environment. All
    dependent data marts are fed by the same
    source--the data warehouse. Each independent data
    mart is fed uniquely and separately by the legacy
    applications environment.
  • Dependent data marts are architecturally and
    structurally sound.
  • Independent data marts have a number of
    significant issues

14
Data Warehouse Definition 3
  • Provides the infrastructure to feed data marts,
    which supply data to users.
  • Responsible for
  • Acquiring data
  • Cleaning data
  • Transforming data
  • Managing granular data
  • Managing ALL data required for a specific subject
  • Can be store data in a number of ways
  • Flat files
  • RDBMS
  • Other technologies (compression)

15
Data Warehouse Definition 3
  • Ensures that
  • Impact on source systems is minimized
  • Consistent data definitions across data marts
  • Conformed dimensions!!
  • Appropriate history is maintained

16
Architectural Framework
  • Three Views
  • Data (content of warehouse)
  • Technical (functions to be performed by the
    warehouse)
  • Infrastructure (Hardware, Communications)
  • Four levels of detail for each view
  • Business Requirements
  • Architecture Models
  • Detailed Models
  • Implementation

17
Data Architecture
  • Business Requirements
  • What information do we need to make better
    business decisions?
  • What data assets are available?
  • Architecture Models and Documents - The
    Dimensional Model
  • What are the major entities (the facts and
    dimensions) that make up this information?
  • How do they relate to each other?
  • How should these entities be structured?
  • Detailed Models and Specs - The Logical and
    Physical Models
  • What are the individual elements, their
    definitions, domains, and rules for derivation?
  • What are the sources and how do they map to the
    targets?
  • Implementation
  • Create and document the databases, indexes,
    backup procedures, etc.

18
Technical Architecture Back Room
  • Business Requirements and Audit
  • How will we get at the data, transform it, and
    make it available to our users?
  • How is this done today?
  • Architecture Models and Documents
  • What are the specific capabilities needed to get
    the data into a usable form in the desired
    locations at the appropriate times?
  • What are the major data stores and where should
    they be located?
  • Detailed Models and Specs
  • What standards and products provide the needed
    capabilities?
  • How will we hook them together?
  • What are our development standards for code
    management., naming, etc.?
  • Implementation
  • Write the extracts and loads.
  • Automate the process.
  • Document the process.

19
Technical Architecture Front Room
  • Business Requirements and Audit
  • What are the major business issues we face?
  • How will we measure these issues?
  • How do we want to analyze the data?
  • Architecture Models and Documents
  • What will users need to get the information out
    in a usable form?
  • What major classes of analysis and reporting do
    we need to provide?
  • What are the priorities?
  • Detailed Models and Specs
  • What tools do we select to provide capabilities
  • What are the specifics for the report templates?
  • Who needs them?
  • How often?
  • How do we distribute them?

20
Technical Architecture Front Room
  • Implementation
  • Implement the reporting and analysis environment.
  • Build the initial report set.
  • Train the users.
  • Document.

21
Infrastructure
  • Business Requirements and Audit
  • What hardware and network capabilities do we need
    to be successful?
  • What do we currently have in place?
  • Architecture Models and Documents
  • Where is the data coming from and going to?
  • Do we have enough calculation and storage
    capacity?
  • What are the specific capabilities we are
    counting on?
  • Do they exist?
  • Who is responsible for them?
  • Detailed Models and Specs
  • How do we interact with these capabilities?
  • What are the system utilities, calls, APIs, etc.?
  • Implementation
  • Install and test new infrastructure components.
  • Connect the sources to the targets to the
    desktop.
  • Document.

22
Technical Data Warehouse Framework
23
Technical Data Warehouse Framework
Metadata Management
Metadata
Back Room
Data Staging Services
Front Room
Access
Engineering
Acquisition
Delivery
Storage
Client/ Server
Re-engd. Data
Extracted Data
Data Marts
Web Server
Workflow and System Management
Write a Comment
User Comments (0)
About PowerShow.com