Title: Physical Database Design
1Chapter 6
2Introduction
- The purpose of physical database design is to
translate the logical description of data into
the technical specifications for storing and
retrieving data. - The goal is to create a design for storing data
that will provide adequate performance and insure
database integrity, security and recoverability.
3Inputs to Physical Design
- Normalized relations.
- Volume estimates.
- Attribute definitions.
- Data usage entered, retrieved, deleted, updated.
- Response time requirements.
- Requirements for security, backup, recovery,
retention, integrity. - DBMS characteristics.
4Physical Design Decisions
- Specifying attribute data types.
- Modifying the logical design.
- Specifying the file organization.
- Choosing indexes.
5Designing Fields
- Choosing data type.
- Coding, compression, encryption.
- Controlling data integrity.
- Default value.
- Range control.
- Null value control.
- Referential integrity.
6Example of Data Dictionary (for team deliverable)
7Example code-look-up table (Pine Valley Furniture
Company)
8Composite usage map (Pine Valley Furniture
Company)
9Designing Fields
- Handling missing data.
- Substitute an estimate of the missing value.
- Trigger a report listing missing values.
- In programs, ignore missing data unless the value
is significant.
10Designing Physical Files
- Physical File A file as stored on the disk.
- Constructs to link two pieces of data
- Sequential storage.
- Pointers.
- File Organization How the files are arranged on
the disk. - Access Method How the data can be retrieved
based on the file organization
11Creating Indexes
Consider this table EMPLOYEE(E_ID, Name,
Salary_Grade, Phone) Where E_ID is the primary
key And the domain for Salary_Grade is 1, 2, 3 or
4.
12Rules for Using Indexes
- 1. Use on larger tables.
- 2. Index the primary key of each table.
- 3. Index search fields.
- 4. Fields in WHERE clause of SQL commands.
- 5. When there are 100 values but not when there
are
13Rules for Using Indexes
- 6. DBMS may have limit on number of indexes per
table and number of bytes per indexed field(s). - 7. Null values will not be referenced from an
index. - 8. Use indexes heavily for non-volatile
databases limit the use of indexes for volatile
databases.
14Physical Design Decisions
- Denormalization
- Partitioning
- Selection of File Organization
- Clustering Files
- Placement of Indexes
- Derived Columns
- Repeating Groups Across Columns
- e.g., skill attribute
15Denormalization
- One-to-one relationship. Fig. 6-3 (former 7-3)
- STUDENT and APPLICATION become a single relation
STUDENT - Many-to-many relationship. Fig. 6-4 (former 7-4)
- Initial logical design suggests a need for 3
entities - Physical design suggests collapsing ITEM and
PRICE_QUOTE into a single relation ITEM_QUOTE - Small amount of data duplication in Description
field - One-to-many relationship. (next slide)
- Initial logical design suggests 2 relations
- STORAGE relation is combined into ITEM relation
yielding a single relation ITEM containing some
redundant data in Description field.
16A possible denormalization situation reference
data
17Partitioning
- Horizontal Partitioning Distributing the rows of
a table into several separate files. - Vertical Partitioning Distributing the columns
of a table into several separate files. - The primary key must be repeated in each file.
18Partitioning
- Advantages of Partitioning
- Records used together are grouped together.
- Each partition can be optimized for performance.
- Security, recovery.
- Partitions stored on different disks contention.
- Take advantage of parallel processing capability.
- Disadvantages of Partitioning
- Slow retrievals across partitions.
- Complexity.
19Designing Physical Files
- Physical file a named portion of secondary
memory (tape, hard disk, etc) allocated for the
purpose of storing physical records. - Logical versus physical
- Biblio.MDB physical file containing a MS Access
database - Authors.MDB is composed of tables, or logical
files Author, Title, Title_Author, Publisher
20Physical File Organization
- Sequential files
- Similar to alphabetical listing in the white
pages, or cassette tape - Indexed
- A table (index) is created to determine the
location of rows in a file that satisfy some
condition - Secondary key a field or combination of fields
used in the condition of a WHERE clause - eg, state, zip code
- Hashed the address of each record is determined
using an algorithm that converts a primary key
into a physical record address
21Rules for Adding Derived Columns
- Use when aggregate values are regularly
retrieved. - Use when aggregate values are costly to
calculate. - Permit updating only of source data.
- Create triggers to cascade changes from source
data.
22Rules for Storing Repeating Groups Across Columns
- Consider storing repeating groups across columns
rather than down rows when - The repeating group has a fixed number of
occurrences, each of which has a different
meaning or - The entire repeating group is normally accessed
and updated as one unit.
23Rules for Storing Repeating Groups Across Columns
Design Option EMPLOYEE(EmpID, EmpName,
) EMP_PHONE(EmpID, Phone) Another Design
Option EMPLOYEE(EmpID, EmpName, Phone1, Phone2,
)
24Selection of a Primary Key
- Consider contriving a shorter field or selecting
another candidate key to substitute for a long,
multi-field primary key (and all associated
foreign keys.) - System-generated non-information-carrying key
- Versus
- Primary key like Phone number
25RAID with four disks and striping
26RAID
- A set of physical disks that appear to the db
user/programs as if they are one large logical
storage unit. - Striping balances the workload by storing
sequential files across several disks can then
access them sequentially more rapidly - Risk a disk drive failure
- Solution redundant data stores