Title: Chapter 14 Physical Design
1Chapter 14 Physical Design
- Summary of Topics
- Design Process
- The Foundation for Design
- Design Objectives
- Designing a Data Mart
- Components of the Data Architecture
- Design Example
2Chapter 14 Physical Design
- Design Process
- Iterative process with following steps
- ID user requirements and project scope
- Develop subject area data model(s)
- Develop data warehouse logical model
- Develop data warehouse architecture
- Design the physical database
- Populate user-oriented repository/directory
- Identify sources of data
- Cleanse and integrate data from legacy systems
- Populate the data warehouse
- Test for user satisfaction (quality, performance)
- Rework design as needed
3Chapter 14 Physical Design
- The Foundation for Design
- Data model is the foundation
- 3rd normal form model recommended for depth of
understanding, though later de-normalization is
common - Top-down, global model ideal, but
- Lengthy duration
- Corporate funding requirements
- Bottom-up may be more practical
- Build up ,one subject area at a time
- Integrate into complete model over time
4Chapter 14 Physical Design
- Design Objectives
- Define in terms of business objectives
- Involves tradeoffs among
- Performance
- Flexibility
- Scalability
- Ease of Administration
- Data Integrity
- Data Consistency
- Data Availability
- User Satisfaction
5Chapter 14 Physical Design
- Design Objectives Performance
- Response time in DW typically gt OLTP
- Important to manage user expectations
- Poor performance may result from
- Inadequate hardware
- Inflexible data architecture
- Poor physical design
- Unrealistic user expectations
- Build performance bottom-up (Fig 14.1)
- DBMS Installation parameters
- Too few concurrent users
- Poor selection of installation parameters
- Database Design and Optimization
- Application design, SQL quality/efficiency
- Query efficiency
- Tune performance from top-down
6Figure 14.1Building Good Performance
7Chapter 14 Physical Design
- Server Scalability
- Defn ability to handle increase in users
- Old mainframes known for poor scalability
- Many adopt multi-server environment
- Server Flexibility
- Achieve using data model and metadata as basis
for physical design - May include giving users flexibility to handle
analysis, query, reporting needs - Must accommodate change in todays business
environment
8Chapter 14 Physical Design
- Designing a Data Mart
- Defn A warehouse developed to meet requirements
of a specific dept., group - A subset of a corporate warehouse
- Caveat may result in isolated islands of
information
9Chapter 14 Physical Design
- Components of the Data Architecture
- May include the following elements
- Legacy Systems
- Relational format, Model, Metadata questionable
- Data Warehouse
- Historical detail, summary, external data
- Specialized data subsets, multidimensional data
- Data Marts
- Tailored to specific dept. or workgroup
- Personal Data Warehouse
- Draw from warehouse, but not part of it
- May contain external, personal data
- Operational Data Store (See Fig. 14.2)
- Optional
- Shared between operational and warehouse envs
- Source of detail data for warehouse
- Typically updateable
10Figure 14.2Data Flow Across the Corporation
11Chapter 14 Physical Design
- Types of Data Structures in the Warehouse
- Detail Data
- Summary Data
- Multidimensional Data
- Data Subsets
- Specialized Data Caches
- Replicated Data
- Archived Data
- Design choices depend on expected data usage,
performance requirements. - Iterative process
12Chapter 14 Physical Design
- Detail Data
- Most basic level of data
- Subsets, summaries derived from detail
- Should include only the data needed for decision
support - Include the element of time
- In theory, all queries reports could be run
against detail data but usually denormalized for
performance. - Advantages of denormalization
- Better performance
- More accessible
- Spreadsheet view of data (intuitive)
- Downside Implementation size
13Chapter 14 Physical Design
- Indexing for Performance
- Warehouse typ. gt indexes than OLTP
- Very beneficial to performance, but
- Large of indexes degrade load time
- Place indexes on primary, foreign keys to start
- Add others as needed as feasible
- Number of indexes varies with inversely with
table size
14Chapter 14 Physical Design
- Partitioning Data
- Can centralize on mainframe/server, or
- May partition across multiple servers by
workgroup, dept, subject area, application - Choice depends on expected pattern of use
- Horizontal partitioning departmental data on
dept. servers - Vertical partitioning implement columnar subset
of normalized table across servers
15Chapter 14 Physical Design
- Parallel Processing
- Key design issue
- Appropriate database design to achieve potential
for improved performance - Shared Resources
- Disk
- Memory
- Advantages, Disadvantages
- Choices based on expected use patterns
16Chapter 14 Physical Design
- Summary Levels
- Summaries may offload detail processing
- Separate tables or multidimensional D/B
- Key design tradeoffs
- Minimize I/O
- Provide access to right data _at_ right time
- Reasonable cost
- Recommend incremental/spiral approach to build
critical summary levels first
17Chapter 14 Physical Design
- External Data
- Example government or industry-supplied
demographic data - Individual queries may need to merge external,
personal, and warehouse data - Document ext. data with metadata
- Data quality may be questionable
- Data Replication
- Can improve performance, availability in
client/server environment - Reduce network traffic
- Redundant availability if primary server down
18Chapter 14 Physical Design
- Data Placement Guidelines
- Do
- Place tables commonly joined on same server
- Consider replicating tables across servers if
joins will cause severe network traffic issues - Consider placing shared detail data on
centralized source - Dont
- Place frequently joined tables on same device
(e.g. same disk) - Place indexes on same devices as table on which
they are built - All these point to reducing network traffic and
device I/O.
19Chapter 14 Physical Design
- RAID Technology
- Not a Dept. Of Defense Program!!
- Redundant Arrays of Inexpensive Disks
- Potential solution for Fault Tolerance
- Techniques include
- Disk Mirroring (2 disks, same controller)
- Disk Duplexing (2 disks, 2 controllers)
- Parity/Error Detection and Correction
- Disk Striping
20Chapter 14 Physical Design
- RAID Levels
- RAID 0 Sector interleave, no error checking (no
redundancy) - RAID 1 Mirroring (duplicate copy)
- RAID 2 Bit interleave with error correction
codes on multiple drives - RAID 3 Bit interleave with error correction on
single drive - RAID 4 Sector interleave with dedicated parity
drive - RAID 5 Sector interleave, parity stored on all
drives
21Chapter 14 Physical Design
- Archived Data
- Cant store all the data all the time !!
- Need to determine data retention needs
- Issues with archiving and retrieval
- What if data format changes?
- What if we need archive data but no room in
current table?
22Chapter 14 Physical Design
- OLAP and Multidimensional Data
- Multidimensional Model
- Data stored as facts and dimensions rather than
rows and columns - A.K.A. Cubic, Star model
- Major Fact tables, usually business related
- Smaller dimension tables, descriptive data
- Dimension table perspectives on factual data
- by criteria
- Sales by salesperson vs. sales by region
- Fact quantitative or factual data
- Usually a large denormalized table
- Minimizes number of joins
23Chapter 14 Physical Design
- Designing Multidimensional Tables
- Identify Requirements from analysts view (how
will data be accessed?) - Identify facts, dimensions
- Identify the roll-up (aggregation) levels
- Design fact and dimension tables
- Determine data retention, sizing requirements
- Identify partitioning requirements
- Validate the design
- Modify as requirements changes
24Chapter 14 Physical Design
- Design Example
- Figures 14.6 Partial E-R Diagram
25Chapter 14 Physical Design
- Figure 14.7 Transformation of E-R Diagram into
Multidimensional Model - Entities mapped into dimensions
- Product
- Location
- Time
- Entities that belong to more than one dimension
become facts
26Chapter 14 Physical Design
- Figure 14.10 Modified Star Schema
- Note Level ID tags NULL entries
- e.g. regional level data -gt STORE_ID NULL
- Includes attribute descriptions