Physical Database Design - PowerPoint PPT Presentation

1 / 45

About This Presentation

Title:

Physical Database Design

Description:

The purpose of physical database design is to translate the logical description ... Consider contriving a shorter field or selecting another candidate key to ... – PowerPoint PPT presentation

Number of Views:169

Avg rating:3.0/5.0

Slides: 46

Provided by: wweymanwh

Category:

more less

Transcript and Presenter's Notes

Title: Physical Database Design

1
Physical Database Design

Chapter 7
MIS 2403
Dr. Segall
Fall 2001

2
Introduction

The purpose of physical database design is to
translate the logical description of data into
the technical specifications for storing and
retrieving data
The goal is to create a design for storing data
that will provide adequate performance, and
insure database integrity, security, and
recoverability

3
Inputs to Physical Design

Normalized relations
Attribute definitions
Estimations of data processing volume
Descriptions of where and when data are entered,
retrieved, deleted, and updated
Response time expectations/requirements
Requirements for data security, backup, recovery,
retention, and integrity
Characteristics of the DBMS to be used

4
Data Volume andUsage Analysis

Often the first step in the physical design
process
Relative numbers are what are most important(to
identify areas requiring attention to
performance)
Figures may be recorded and displayed using an
annotated entity-relationship diagram (Fig. 7-1)
Data volumes reflect number of records in tables
Access frequencies reflect number of table record
accesses per unit of time
Note what attributes are used in table
accesses(to aid design of table indexes)
Note of creations/retrievals/updates/deletions
(to aid selection of file organization technique)

5
Physical Design Decisions

Specify the data type for each attribute from the
logical data model
Specify physical records by grouping attributes
from the logical data model
Specify the file organization technique to use
for physical storage of data records
Specify indexes to optimize data retrieval
Specify query optimization strategies

6
Designing Fields

Choosing the field data type
Select from available types such as text, memo,
number, date/time, currency, etc.
Seek to
Minimize storage space
e.g., Integer vs. Floating Point
Represent all possible values
e.g., Floating Point vs. Integer
Improve data integrity (more on next slide)
e.g., Yes/No
Support all data manipulations
e.g., Date/Time

7
Designing Fields

Controlling data integrity
Default value
e.g., value FL for State field
Range control
e.g., value lt100 for Test_Score field
Null value control
e.g., prohibit leaving Date_of_Birth field blank
Referential integrity
e.g., restrict valid values for Part_No field in
Order table to the contents of this field in the
Part table

8
Designing Fields

Coding
e.g., J(onesboro), B(eebe), M(outain Home) for
the ASU branch campuses
implement by creating a look-up table
there is a trade-off in that you must create and
store a second table and you must access this
table to look up the code value
consider using when a field has a limited number
of possible values, each of which occupies a
relatively large amount of space, and the number
of records is large and/or the number of record
accesses is small.

(see next slide also)
9
Example Code Look-up Table (Figure 7-2)
10
Designing Fields

Fixed-Length Fields
Make it easy to locate a specific record in a
file and/or a specific field in that record
Each field has its maximum length specified and
unused space in any given field is padded with
spaces (text) or leading zeros (numeric)
Variable-Length Fields
When the need arises for a variable-length field
(e.g., a memo field), this field can be stored
separate from the rest of the record with a
pointer used to locate it when needed

11
Physical Records

Physical Record A group of fields stored in
adjacent memory locations and retrieved together
as a unit.
Page The amount of data read or written in one
secondary memory (disk) input or output
operation.
Blocking Factor The number of physical records
per page.

12
Database Access Model
The goal in structuring physical records is to
minimize performance bottlenecks resulting from
disk accesses (accessing data from disk is slow
compared to main memory)
13
Optimization Decisions

Denormalization
Partitioning
Selection of File Organization
Clustering Files
Creation of Indexes
RAID
Other

14
Denormalization

The process of transforming normalized relations
into unnormalized physical record specifications
for the purpose of improv-ing overall database
performance.

15
Denormalization

Involves a trade-off
Reduced disk accesses and greater performance
(due, for example, to fewer table joins)
- But -
Introduction of anomalies (and thus redundancies)
that will necessitate extra data maintenance

16
Partitioning

Horizontal Partitioning Distributing the rows
of a table into two or more separate files
e.g., Customer table is partitioned into four
separate files, one for each geographical region
Vertical Partitioning Distributing the columns
of a table into two or more separate files
e.g., Employee table is partitioned into public
file (name, office, extension, etc.) and private
file (salary, health history, etc.)
Note the primary key is repeated in each file

17
Partitioning

Advantages of Partitioning
Records used together are grouped together
Each partition can be optimized for performance
Security and recovery
Partitions stored on different disks less
contention
Parallel processing capability
Disadvantages of Partitioning
Slower retrievals when across partitions
Complexity for application programmers
Anomalies and extra storage space requirements
due to duplication of data across partitions

18
Physical Files

Physical File A file as stored on disk
Constructs to link two pieces of data
Sequential storage
Pointers
File Organization How the files are arranged on
the disk (more on this later)
Access Method How the data can be retrieved
based on the file organization
Relative - data accessed as an offset from the
most recently referenced point in secondary
memory
Direct - data accessed as a result of a
calculation to generate the beginning address of
a record

19
File Organizations

A technique for physically arranging the records
of a file on secondary storage devices.
Goals in selecting (trade-offs exist, of
course)
Fast data retrieval
High throughput for input and maintenance
Efficient use of storage space
Protection from failures or data loss
Minimal need for reorganization
Accommodation for growth
Security from unauthorized use

20
File Organizations

Sequential
Indexed
Indexed Sequential
Indexed Nonsequential
Hashed (also called Direct)
See Table 7-3 for comparison

21
Sequential File Organization

Records of the file are stored in sequence by the
primary key field values

22
Comparisons of file organizations (a) Sequential
23
Sequential Retrieval

Consider a file of 10,000 records each occupying
1 page
Queries that require processing all records will
require 10,000 accesses
e.g., Find all items of type 'E'
Many disk accesses are wasted if few records meet
the condition
However, very effective if most or all records
will be accessed (e.g., payroll)

24
Indexed File Organization

Index concept is like index in a book
Indexed-sequential file organization The
records are stored sequentially by primary key
values and there is an index built on the primary
key field (and possibly indexes built on other
fields, also)

25
(b) Indexed
26
Hashed File Organization

Hashing Algorithm Converts a primary key value
into a record address
Division-remainder method is common hashing
algorithm(more to come on this)

27
(c ) Hashed
28
Hashing

A technique for reducing disk accesses for direct
access
Avoids an index
Number of accesses per record can be close to one
The hash field is converted to a hash address by
a hash function

29
Hashing
30
Shortcomings of Hashing

Different hash fields may convert to the same
hash address
these are called Synonyms
store the colliding record in an overflow area
Long synonym chains degrade performance
There can be only one hash field per record
The file can no longer be processed sequentially

31
Clustering

In some relational DBMS, related records from
different tables that are often retrieved
together can be stored close together on disk
Since the related records are stored close to one
another on the physical disk, less time is needed
to retrieve the data
E.g., Customer data and Order data may frequently
be retrieved together
Can require substantial maintenance if the
clustered data changes frequently

32
Indexing

An index is a table file that is used to
determine the location of rows in another file
that satisfy some condition

33
Querying with an Index

Read the index into memory
Search the index to find records meeting the
condition
Access only those records containing required
data
Disk accesses are substantially reduced when the
query involves few records

34
Maintaining an Index

Adding a record requires at least two disk
accesses
Update the file
Update the index
Trade-off
Faster queries
Slower maintenance (additions, deletions, and
updates of records)
Thus, more static databases benefit more overall

35
Rules of Thumbfor Using Indexes

1. Indexes are most useful on larger tables
2. Index the primary key of each table(may be
automatic, as in Access)
3. Indexes are useful on search fields (WHERE)
4. Indexes are also useful on fields used for
sorting (ORDER BY) and categorizing (GROUP BY)
5. Most useful to index on a field when there are
many different values for that field

36
Rules of Thumbfor Using Indexes

6. Find out the limits placed on indexing by your
DBMS (Access allows 32 indexes per table, and no
index may contain more than 10 fields)
7. Depending on the DBMS, null values may not be
referenced from an index (thus, rows with a null
value in the field that is indexed may not be
found by a search using the index)

37
RAID

Redundant Arrays of Inexpensive Disks
Exploits economies of scale of disk manufacturing
for the computer market
Can give greater security
Increases fault tolerance of systems
Not a replacement for regular backup

38
RAID

The operating system sees a set of physical
drives as one logical drive
Data are distributed across physical drives
All levels, except 0, have data redundancy or
error-correction features
Parity codes or redundant data are used for data
recovery

39
Mirroring

Write
Identical copies of file are written to each
drive in array
Read
Alternate pages are read simultaneously from each
drive
Pages put together in memory
Access time is reduced by approximately the
number of disks in the array
Read error
Read required page from another drive
Tradeoffs
Provides data security
Reduces access time
Uses more disk space

40
Mirroring
Complete Data Set
Complete Data Set
No parity
41
Striping

Three drive model
Write
Half of file to first drive
Half of file to second drive
Parity bit to third drive
Read
Portions from each drive are put together in
memory
Read error
Lost bits are reconstructed from third drives
parity data
Tradeoffs
Provides data security
Uses less storage space than mirroring
Not as fast as mirroring

42
Striping
One-Half Data Set
One-Half Data Set
Parity Codes
43
One Other Rule of Thumbfor Increasing Performance

Consider contriving a shorter field or selecting
another candidate key to substitute for a long,
multi-field primary key (and all associated
foreign keys)

44
Database Architectures