Chapter 14 Physical Design - PowerPoint PPT Presentation

1 / 26
About This Presentation
Title:

Chapter 14 Physical Design

Description:

The Foundation for Design. Design Objectives. Designing a Data Mart ... Summaries may offload detail processing. Separate tables or multidimensional D/B ... – PowerPoint PPT presentation

Number of Views:142
Avg rating:3.0/5.0
Slides: 27
Provided by: deanca
Category:

less

Transcript and Presenter's Notes

Title: Chapter 14 Physical Design


1
Chapter 14 Physical Design
  • Summary of Topics
  • Design Process
  • The Foundation for Design
  • Design Objectives
  • Designing a Data Mart
  • Components of the Data Architecture
  • Design Example

2
Chapter 14 Physical Design
  • Design Process
  • Iterative process with following steps
  • ID user requirements and project scope
  • Develop subject area data model(s)
  • Develop data warehouse logical model
  • Develop data warehouse architecture
  • Design the physical database
  • Populate user-oriented repository/directory
  • Identify sources of data
  • Cleanse and integrate data from legacy systems
  • Populate the data warehouse
  • Test for user satisfaction (quality, performance)
  • Rework design as needed

3
Chapter 14 Physical Design
  • The Foundation for Design
  • Data model is the foundation
  • 3rd normal form model recommended for depth of
    understanding, though later de-normalization is
    common
  • Top-down, global model ideal, but
  • Lengthy duration
  • Corporate funding requirements
  • Bottom-up may be more practical
  • Build up ,one subject area at a time
  • Integrate into complete model over time

4
Chapter 14 Physical Design
  • Design Objectives
  • Define in terms of business objectives
  • Involves tradeoffs among
  • Performance
  • Flexibility
  • Scalability
  • Ease of Administration
  • Data Integrity
  • Data Consistency
  • Data Availability
  • User Satisfaction

5
Chapter 14 Physical Design
  • Design Objectives Performance
  • Response time in DW typically gt OLTP
  • Important to manage user expectations
  • Poor performance may result from
  • Inadequate hardware
  • Inflexible data architecture
  • Poor physical design
  • Unrealistic user expectations
  • Build performance bottom-up (Fig 14.1)
  • DBMS Installation parameters
  • Too few concurrent users
  • Poor selection of installation parameters
  • Database Design and Optimization
  • Application design, SQL quality/efficiency
  • Query efficiency
  • Tune performance from top-down

6
Figure 14.1Building Good Performance
7
Chapter 14 Physical Design
  • Server Scalability
  • Defn ability to handle increase in users
  • Old mainframes known for poor scalability
  • Many adopt multi-server environment
  • Server Flexibility
  • Achieve using data model and metadata as basis
    for physical design
  • May include giving users flexibility to handle
    analysis, query, reporting needs
  • Must accommodate change in todays business
    environment

8
Chapter 14 Physical Design
  • Designing a Data Mart
  • Defn A warehouse developed to meet requirements
    of a specific dept., group
  • A subset of a corporate warehouse
  • Caveat may result in isolated islands of
    information

9
Chapter 14 Physical Design
  • Components of the Data Architecture
  • May include the following elements
  • Legacy Systems
  • Relational format, Model, Metadata questionable
  • Data Warehouse
  • Historical detail, summary, external data
  • Specialized data subsets, multidimensional data
  • Data Marts
  • Tailored to specific dept. or workgroup
  • Personal Data Warehouse
  • Draw from warehouse, but not part of it
  • May contain external, personal data
  • Operational Data Store (See Fig. 14.2)
  • Optional
  • Shared between operational and warehouse envs
  • Source of detail data for warehouse
  • Typically updateable

10
Figure 14.2Data Flow Across the Corporation
11
Chapter 14 Physical Design
  • Types of Data Structures in the Warehouse
  • Detail Data
  • Summary Data
  • Multidimensional Data
  • Data Subsets
  • Specialized Data Caches
  • Replicated Data
  • Archived Data
  • Design choices depend on expected data usage,
    performance requirements.
  • Iterative process

12
Chapter 14 Physical Design
  • Detail Data
  • Most basic level of data
  • Subsets, summaries derived from detail
  • Should include only the data needed for decision
    support
  • Include the element of time
  • In theory, all queries reports could be run
    against detail data but usually denormalized for
    performance.
  • Advantages of denormalization
  • Better performance
  • More accessible
  • Spreadsheet view of data (intuitive)
  • Downside Implementation size

13
Chapter 14 Physical Design
  • Indexing for Performance
  • Warehouse typ. gt indexes than OLTP
  • Very beneficial to performance, but
  • Large of indexes degrade load time
  • Place indexes on primary, foreign keys to start
  • Add others as needed as feasible
  • Number of indexes varies with inversely with
    table size

14
Chapter 14 Physical Design
  • Partitioning Data
  • Can centralize on mainframe/server, or
  • May partition across multiple servers by
    workgroup, dept, subject area, application
  • Choice depends on expected pattern of use
  • Horizontal partitioning departmental data on
    dept. servers
  • Vertical partitioning implement columnar subset
    of normalized table across servers

15
Chapter 14 Physical Design
  • Parallel Processing
  • Key design issue
  • Appropriate database design to achieve potential
    for improved performance
  • Shared Resources
  • Disk
  • Memory
  • Advantages, Disadvantages
  • Choices based on expected use patterns

16
Chapter 14 Physical Design
  • Summary Levels
  • Summaries may offload detail processing
  • Separate tables or multidimensional D/B
  • Key design tradeoffs
  • Minimize I/O
  • Provide access to right data _at_ right time
  • Reasonable cost
  • Recommend incremental/spiral approach to build
    critical summary levels first

17
Chapter 14 Physical Design
  • External Data
  • Example government or industry-supplied
    demographic data
  • Individual queries may need to merge external,
    personal, and warehouse data
  • Document ext. data with metadata
  • Data quality may be questionable
  • Data Replication
  • Can improve performance, availability in
    client/server environment
  • Reduce network traffic
  • Redundant availability if primary server down

18
Chapter 14 Physical Design
  • Data Placement Guidelines
  • Do
  • Place tables commonly joined on same server
  • Consider replicating tables across servers if
    joins will cause severe network traffic issues
  • Consider placing shared detail data on
    centralized source
  • Dont
  • Place frequently joined tables on same device
    (e.g. same disk)
  • Place indexes on same devices as table on which
    they are built
  • All these point to reducing network traffic and
    device I/O.

19
Chapter 14 Physical Design
  • RAID Technology
  • Not a Dept. Of Defense Program!!
  • Redundant Arrays of Inexpensive Disks
  • Potential solution for Fault Tolerance
  • Techniques include
  • Disk Mirroring (2 disks, same controller)
  • Disk Duplexing (2 disks, 2 controllers)
  • Parity/Error Detection and Correction
  • Disk Striping

20
Chapter 14 Physical Design
  • RAID Levels
  • RAID 0 Sector interleave, no error checking (no
    redundancy)
  • RAID 1 Mirroring (duplicate copy)
  • RAID 2 Bit interleave with error correction
    codes on multiple drives
  • RAID 3 Bit interleave with error correction on
    single drive
  • RAID 4 Sector interleave with dedicated parity
    drive
  • RAID 5 Sector interleave, parity stored on all
    drives

21
Chapter 14 Physical Design
  • Archived Data
  • Cant store all the data all the time !!
  • Need to determine data retention needs
  • Issues with archiving and retrieval
  • What if data format changes?
  • What if we need archive data but no room in
    current table?

22
Chapter 14 Physical Design
  • OLAP and Multidimensional Data
  • Multidimensional Model
  • Data stored as facts and dimensions rather than
    rows and columns
  • A.K.A. Cubic, Star model
  • Major Fact tables, usually business related
  • Smaller dimension tables, descriptive data
  • Dimension table perspectives on factual data
  • by criteria
  • Sales by salesperson vs. sales by region
  • Fact quantitative or factual data
  • Usually a large denormalized table
  • Minimizes number of joins

23
Chapter 14 Physical Design
  • Designing Multidimensional Tables
  • Identify Requirements from analysts view (how
    will data be accessed?)
  • Identify facts, dimensions
  • Identify the roll-up (aggregation) levels
  • Design fact and dimension tables
  • Determine data retention, sizing requirements
  • Identify partitioning requirements
  • Validate the design
  • Modify as requirements changes

24
Chapter 14 Physical Design
  • Design Example
  • Figures 14.6 Partial E-R Diagram

25
Chapter 14 Physical Design
  • Figure 14.7 Transformation of E-R Diagram into
    Multidimensional Model
  • Entities mapped into dimensions
  • Product
  • Location
  • Time
  • Entities that belong to more than one dimension
    become facts

26
Chapter 14 Physical Design
  • Figure 14.10 Modified Star Schema
  • Note Level ID tags NULL entries
  • e.g. regional level data -gt STORE_ID NULL
  • Includes attribute descriptions
Write a Comment
User Comments (0)
About PowerShow.com