Title: Database Design
1Database Design
Database Design
2Database Design
We Will Consider 3 general Topics Here
- Transforming (Mapping) ERDs into Relations
- How to Transfer your ERDs into Tabular form
- How to Maximize your table utility
- General Design Considerations
- Other Design Considerations
3ERDs to RDBMS
Transforming ERDs to the RDBMS
- Transforming (Mapping) ERDs into Relations is
relatively Straightforward
- Most CASE tools can automatically perform the
transformation
--- HOWEVER ---
- Many CASE tools can NOT transform certain complex
relationships (e.g, Ternary)
- There often many valid alternatives, and the
designer must choose the best one
- The designer must be prepared to evaluate the
correctness of the CASE generated solution
4ERDs to RDBMS
Step 1 Map Regular Entities
Given a simple table with no composite/complex
attributes
Physician
PhysID
Name
Specialty
Street
City
State
ZipCode
A Direct Mapping
5ERDs to RDBMS
The SQL Commands Needed would be
Physician Table Attributes/Fields
CREATE TABLE physician
( physid CHAR(9),
PhysID
Name
name CHAR(30),
Specialty
specialty CHAR(20),
Street
street CHAR(25),
city CHAR(20),
City
State
state CHAR(2),
zipcode CHAR(5),
Zipcode
Supervisor
supervisor CHAR(9),
PRIMARY KEY (physid),
FOREIGN KEY (supervisor)
REFERENCES TABLE physician (physid))
6ERDs to RDBMS
Some Composite attributes can also be easily
decomposed and Mapped
Patient
PatID
Name
Street
City
State
ZipCode
7ERDs to RDBMS
The SQL Commands Needed would be
Patient Table Attributes/Fields
CREATE TABLE patient
( patid CHAR(9),
PatID
Name
name CHAR(30),
street CHAR(25),
Street
City
city CHAR(20),
State
state CHAR(2),
Zipcode
zipcode CHAR(5),
Physid
physid CHAR(9),
PRIMARY KEY (patid),
FOREIGN KEY (physid)
REFERENCES TABLE physician (physid))
8ERDs to RDBMS
Multi-valued Attributes also may need
decomposition
Suppose that a Physician could have multiple
specialties
Physician
(We have omitted the field supervisor only to
save space)
How Many Specialties does a Physician have ??
One?? Two?? Twenty??
9ERDs to RDBMS
We already know the problems associated with such
a problem.
We must allocate enough field to meet the Maximum
Case.
Physician
PhysID
Name
Specialty1
Street
City
State
ZipCode
Specialtyn
The same problem we had when we put PhysID in
Patient
10ERDs to RDBMS
We need to decompose the multi-valued attributes
into separate relations, but, it isnt really too
difficult
It DOES depend on how we state the problem,
however
A Physician many have Many specialties.
A specialty can only have one Physician .
(A 1M Situation)
--- OR ---
A Physician many have Many specialties.
A specialty may be shared by many Physicians.
(A MM Situation)
11ERDs to RDBMS
In a 1M Physician - Specialty relationship
Physician
Specialty
The primary key for relation Specialty will be
the concatenated Key PhysID, SpecCode (which is
unique)
CREATE TABLE physician ( physid CHAR(9), name CHA
R(30), street CHAR(25), city CHAR(20), state CH
AR(2), zipcode CHAR(5), supervisor CHAR(9), PRI
MARY KEY (physid), FOREIGN KEY (supervisor)
REFERENCES TABLE physician (physid))
CREATE TABLE specialty ( physid CHAR(9), specCode
CHAR(10), Specname CHAR(20), PRIMARY KEY
(physid, specCode), FOREIGN KEY (physid)
REFERENCES TABLE physician
(physid))
NOTE There is no field specialty
12ERDs to RDBMS
Looking at the relations and their Mapping
Physician
Specialty
PhysID
Name
Street
City
State
ZipCode
SpecCode
Desc
13ERDs to RDBMS
What if the Relationship is MM ???
Physician
Specialty
The Relationship will become an Associative Entity
(More on that later)
14ERDs to RDBMS
Step 2 Map Weak Entities
Recall that a weak entity is one which exists
only through an identifying relationship with
another entity
Consider the following relationship
Physician
Where Dependents would not exist without
Physician (the owner of Dependents)
15ERDs to RDBMS
Notice that at this point in time there is no way
of associating the two relations (i.e., no common
key)
By now, however, we know to put the foreign key
in a 1M relationship on the side of the many
Physician
PhysID
Name
DepName
DOB
Gender
16ERDs to RDBMS
The SQL Commands Needed would be
Dependents Table Attributes/Fields
CREATE TABLE dependents
( depname CHAR(30),
depname
physID
physID CHAR(9),
DOB DATE,
DOB
gender
gender CHAR(1),
PRIMARY KEY (depname),
FOREIGN KEY (physid)
REFERENCES TABLE physician (physid))
17ERDs to RDBMS
Weak Relationships can be more complex, however
Suppose that two Physicians can be married to
each other (and hence, both have the same
dependents)
There are a few ways we could deal with this, but
lets take one (NOT necessarily the best)
Physician
PhysID
Name
DepName
PhysID1
DOB
Gender
PhysID2
18ERDs to RDBMS
How Else Could this be done ???
Since this is now a MM relationship, the
relationship Has could become an Associative
Entity
Physician
PhysID
Name
DepName
DOB
Gender
Again, More on this later
19ERDs to RDBMS
Step 3 Map Binary Relationships
Essentially, we have already done most of these
Binary 1M Relationships
Physician
PhysID
Name
PatID
Name
Street
City
State
ZipCode
Weve already seen the SQL Code Needed
20ERDs to RDBMS
Binary MM Relationships
Recall our previous example
Patient
Date
Time
PatID
Name
IllCode
Name
21ERDs to RDBMS
The SQL Commands Needed are
Illness Table Attributes/Fields
CREATE TABLE illness
Illcode
( illcode CHAR(9),
Name
name CHAR(30),
PRIMARY KEY (illcode))
Suffers Table Attributes/Fields
CREATE TABLE suffers
patID
Patient
( patID CHAR(9),
illcode
illcode CHAR(9),
DateTime
DateTime DATE,
PRIMARY KEY (patid,
illcode,
DateTime),
FOREIGN KEY (patID) REFERENCES TABLE patient
(patID))
FOREIGN KEY (illcode) REFERENCES TABLE illness
(illcode))
22ERDs to RDBMS
Binary 11 Relationships
Binary 11 Relationships ?? Like a person and
their Social Security Number ?? Why Binary ?? Why
Not put it in one Table ??
Usually, Yes.
Sometimes, 11 Binary relationships are very
useful
Situation 1 Storage Savings
Consider the field State (which we have used
quite often)
The Longest State Name is Massachusetts
(13-Bytes of storage 14, if we include the NULL
Character)
23ERDs to RDBMS
Given that there are only 50 states, we could
enter a state number in the main database (which
requires only 1-byte), and link it to a database
which stored the entire state name
1-byte of storage per record
1-byte of storage per record
14-bytes of storage per record
It Seems like we need 1 additional byte of
storage per record this way. Where is the Savings
??
24ERDs to RDBMS
Suppose that our Patient database had 5,000
records
If we did not have a separate database, we would
need
5,000 14-Bytes 70,000 Bytes of storage
If two databases, we need
5,000 1-Bytes 5,000 Bytes of storage
50 14-Bytes 50 1-Byte 750 Bytes of
storage
5,750 Bytes of storage
If there were 1,000,000 records that used state
as an attribute (not unrealistic), then our
savings would be
If in 1 database
1,000,000 14-Bytes 14,000,000 Bytes of storage
1,000,000 1-Byte 1,000,000 Bytes of storage
If in 2 databases
50 14-Bytes 50 1-Byte 750
Bytes of storage
1,000,750 Bytes of storage
A savings of 13,000,250 Bytes
25ERDs to RDBMS
The SQL Commands Needed are
Patient Table Attributes/Fields
CREATE TABLE patient
PatID
( patID CHAR(9),
Name
name CHAR(30),
State
state INTEGER,
PRIMARY KEY (patID),
FOREIGN KEY (state) REFERENCES TABLE state
(State_No))
State Table Attributes/Fields
CREATE TABLE state
( state_no INTEGER,
StateName
Statename CHAR(14),
PRIMARY KEY (state_no))
26ERDs to RDBMS
How can we refer to a table that doesnt yet
exist ???
(Table patient refers to table state, which
doesnt yet exist)
We Cant !
We Need to Reverse the order of Table creation
State Table Attributes/Fields
CREATE TABLE state
State_No
( State_no INTEGER,
StateName
StateName CHAR(14),
PRIMARY KEY (state_no))
Patient Table Attributes/Fields
CREATE TABLE patient
PatID
( patID CHAR(9),
name
name CHAR(30),
State
state INTEGER,
PRIMARY KEY (patID ))
FOREIGN KEY (state) REFERENCES TABLE state
(State_No))
27ERDs to RDBMS
This Does NOT appear to be a 11 relationship
!!!
Absolutely True !!
The Relationship is actually
Patient
State
(A M1 Relationship)
How Can this Be ???
Stuff Happens !!!
(ALWAYS be ready to change your ERD AND your
thinking)
28ERDs to RDBMS
Situation 2 Other Binary 11 Relationships
Suppose that each hospital clinic had one
physician assigned to be in charge of the clinic.
No Physician is assigned to more than 1 clinic
Physician
Clinic
Given 1 Physician, how many Clinics ??
Given 1 Clinic, how many Physicians ??
One
One
Mandatory??
Mandatory??
No
Yes
29ERDs to RDBMS
Mapping Binary 11 Relationships
Physician
Clinic
PhysID
Name
ClinID
Name
30ERDs to RDBMS
The SQL Commands Needed are
Clinic Table Attributes/Fields
CREATE TABLE clinic
ClinID
( ClinID CHAR(9),
Name
name CHAR(31),
PhysID
physID CHAR(9),
PRIMARY KEY (ClinID))
FOREIGN KEY (physID) REFERENCES TABLE physician
(physID))
For a total of 79 Bytes
(We Have Already Defined Table physician)
31ERDs to RDBMS
How Is This Helpful ????
Suppose we had included all the attributes in
one Table
Patient Table Attributes/Fields
CREATE TABLE patient
PatID
( patid CHAR(9),
Name
name CHAR(30),
Clinic
clinic CHAR(31),
PRIMARY KEY (patid))
In other words, Each record requires 70 Bytes
32ERDs to RDBMS
Suppose that there were 2,000 Physicians in the
database
We would need
2,000 70-Bytes 140,000 Bytes of storage
Assume that only 75 of those physicians were in
charge of a clinic. If there were 2 tables
Physician Table
2,000 48-Bytes 96,000 Bytes of storage
Clinic Table
75 79-Bytes 5,925 Bytes of storage
101,920 Bytes of storage
A Savings of 38,080 Bytes
(Remember, This is a Simplistic Example)
33ERDs to RDBMS
Step 4 Map Associative Entities (MM)
Recall the Relationship
Again, we have already Done this.
Patient
PatID
Name
IllCode
Name
34ERDs to RDBMS
The SQL Commands Needed are
Suffer Table Attributes/Fields
CREATE TABLE suffer
PatID
( PatID CHAR(9),
illcode
illcode CHAR(9),
TDATE
TDate DATE,
PRIMARY KEY (PatID,
illcode,
TDate),
FOREIGN KEY (PatID ) REFERENCES TABLE patient
(PatID),
FOREIGN KEY (illcode ) REFERENCES TABLE illness
(illcode))
35ERDs to RDBMS
Step 5 Map Unary Relations
Mapping Unary 1M Relationships
Recall the Relationship
Physician
PhysID
Name
Supervisor
36ERDs to RDBMS
The SQL Commands Needed are
Patient Table Attributes/Fields
CREATE TABLE patient
PatID
( PatID CHAR(9),
name
name CHAR(30),
Supervisor
Supervisor CHAR(9),
PRIMARY KEY (PatID),
FOREIGN KEY (Supervisor ) REFERENCES TABLE
patient (PatID))
37ERDs to RDBMS
Mapping Unary MM Relationships
Consider a Part in inventory
- A Part may not require any components
(e.g., a screw)
- A Part may consist of multiple parts
(e.g., a motor)
We Might set up the ERD as
Parts
A Many to Many (MM) Relationship
38ERDs to RDBMS
But, if there is a MM relationship dont we run
into the same problem as we did when we first
tried to relate Patient with Illness ??
(i.e., where do we put the foreign key ???)
- If we put PatID in Illness, we need multiple
fields
- If we put IllCode in Patient, we still need
multiple fields
The Solution is Also the same
Decompose the Relationship into 2 tables
All we need to do is identify the appropriate
attributes
39ERDs to RDBMS
Consider the Following attributes/Mapping
Lets Call the Table
Component
Parts
PartNo
Name
UnitCost
40ERDs to RDBMS
The SQL Commands Needed are
Part Table Attributes/Fields
CREATE TABLE part
partno
( partno CHAR(9),
Name
name CHAR(30),
unitcost
unitcost DECIMAL(7,2),
PRIMARY KEY (partno))
Component Table Attributes/Fields
CREATE TABLE component
partno
( partno CHAR(9),
compno
Compno CHAR(9),
PRIMARY KEY (partno,
compno),
FOREIGN KEY (partno) REFERENCES TABLE part
(partno),
FOREIGN KEY (compno) REFERENCES TABLE part
(partno))
41ERDs to RDBMS
Step 6 Map Ternary (and n-ary) relationships
Once again, we should already be familiar with
this
Recall the Relationship
Patient
Prescription
42ERDs to RDBMS
The SQL Commands Needed are
Treatment Table Attributes/Fields
CREATE TABLE treatment
patid
( patid CHAR(9),
illCode
illCode CHAR(9),
drugcode
drugcode CHAR(9),
TDate
TDate DATE,
PRIMARY KEY (patid,
illCode,
drugcode,
TDate),
FOREIGN KEY (patid) REFERENCES
TABLE patient (patid),
FOREIGN KEY (illcode)
REFERENCES TABLE illness (illcode),
FOREIGN KEY (drugcode)
REFERENCES TABLE prescription (drugcode))
43ERDs to RDBMS
The Final Mapping is
Patient
Prescription
44ERDs to RDBMS
Summary
Mapping ERDs into a Relational Database is
generally straightforward
---- BUT ----
ONLY if the ERD is well-conceived
45Normalization
Normalization
Why Normalization ??
Domain Constraints
- Assuring that there are no illegal values within
a domain (set) of attributes
Entity Integrity
- Assuring that Every relation has a primary key
- Assuring that every primary key is valid
- Assuring that every primary key is NOT NULL
46Normalization
Normalization is the formal process of
determining which attributes should be grouped
together in a relation
- A method to validate the logical design (e.g.,
ERD)
- A Tool to improve the logical design
- Satisfaction of certain constraints
- Avoidance of unnecessary data duplications
- The process of Decomposing relations with
anomalies to produce smaller, well-structured
relations
47Normalization
(Continued)
Why Normalization ??
Referential Integrity Constraint
- Assuring that consistency between relational rows
is maintained
- If a foreign key exists in one relation, it MUST
be
- A Primary Key in another relation,
OR
48Normalization
Normalization Steps
First Normal Form
(1NF)
- Removal of repeating groups (multi-valued
attributes) so that there is a single value at
the intersection of each row and column
Remember the first problem we encountered in our
ERD example
Each patient can have only one primary
physician.
Our first solution to this was not a very good
one
- Put PatID in the Physician database
49Normalization
The lay-out (for some of the attributes) then
became
Or perhaps (in tabular lay-out)
50Normalization
One solution, which we did not come up with
before, is
NOTE PatID does NOT repeat in a record
(There are NO Repeating Groups)
51Normalization
? A Table is said to be in 1NF if
- All of the prime key (or key) attributes are
defined
(In our case, BOTH PhysID and PatID)
- There are no repeating groups
- ALL Attributes are dependent upon the primary key
(In our case, ALL other attributes can be defined
by PhysID )
52Normalization
Naturally, we can see that there are problems
- In order to identify a record, we need to use a
combin-ation of PhysID AND PatID
- Our Primary key is no longer unique
53Normalization
In this situation, we have a FUNCTIONAL DEPENDENCY
? We can readily determine attribute (e.g.,
PatID) based on the value of another attribute
(e.g., PhysID)
? Unnecessary duplication
? Increased probability of entry errors
54Normalization
(2NF)
Second Normal Form
Assumes that the relation is already in 1NF AND
any of the following Conditions exist
- The primary key consists of only 1 attribute
- No non-key attributes exist
(all of the non-key attributes are components of
the primary key)
- Every non-key attribute is functionally depended
upon the full set of primary key attributes
Lets consider each of these conditions
independently
55Normalization
The Primary key key consists of only 1 attribute
In our Physician Relation
A record Can only be Uniquely identified ONLY
through the composite key PhysID, PatID
We could have specified our fields as
(Note This lay-out is different than our ERD
Example)
56Normalization
No non-key attributes exist
(all of the non-key attributes are related to the
primary key)
Consider the (Rearranged) Physician table fields
In the above table
- The name of the clerk who inputted (Input Clerk
Name) the record has no relation to either of the
keys (PhysID or PatID)
(Input Clerk Name is a non-key attribute)
57Normalization
Every non-key attribute is functionally depended
upon the full set of primary key attributes
Consider the (Rearranged) Physician table fields
In the above table
- PhysName and PhysAddr are functionally dependent
on the key field PhysID (but NOT on PatID)
- PatName and PatAddr are functionally dependent on
the key field PatID (but NOT on PhysID)
- There is a Partial Functional Dependency
58Normalization
How do we convert to 2NF ??
- Consider elimination if the attribute is not
necessary
(Non-key attributes)
- Decompose the relation into one (or more)
relations that satisfy one (or more) of the
conditions needed for 2NF
We already know (in our case) that PhyID should
be placed in Relation Patient as a Foreign Key
(Since there is a 1M relationship between
Physician and Patient)
59Normalization
If we now look at our relations, we might see
Physician Table Attributes/Fields
Patient Table Attributes/Fields
1
M
Which is (almost) how we laid-out the
relationship previously
60Normalization
Notice that all of the conditions for 2NF have
been met
- The primary key consists of only 1 attribute
(PhysID in table Physician PatID in table
Patient)
- There are NO non-key attributes
(in either table)
- Every non-key attribute is functionally depended
on the primary key
Physician(PhysID, Name, Street, City, State, Zip)
Patient(PatID, Name, Street, City, State, Zip,
PhysID, RX Needs)
61Normalization
Third Normal Form
(3NF)
- The relation is already in 2NF
- No Transitive Relationships exist
A Transitive Relationship is a functional
dependency between two (or more) non-key
attributes
Consider our Patient Relation, which might appear
as
62Normalization
Third Normal Form
(3NF)
- IF a physician can have only one specialty, then
the field RX Needs is defined by PhysID
In other words, RX Needs is transitively
dependent upon PhysID
(Again, assuming each Physician has a single
specialty)
63Normalization
SO ?? If there is a functional dependency between
PatID and their RX Needs, then we need that
information, dont we ??
Yes
--- BUT ---
If this is the ONLY relation we have, then the
only way we have of knowing what specialty a
physician has is to determine it vis-Ã -vis a
Patients treatment needs
Operationally, transitive dependencies can cause
severe maintenance problems
Such as ??
64Normalization
Transitivity Anomalies
Suppose that a new physician is assigned to a
specialty (e.g., Pediatrics). Until a patient is
assigned to that physician, we have no way of
knowing that physicians specialty
If a Patient is deleted from the relation, and
that is the only Patient a Physician has, we lose
information about the Physician
If a Patient is deleted from the relation, and
that is the only Patient a Physician has, we lose
information about the Physician
65Normalization
How do we convert to 3NF ??
- Simply remove the dependency, as we originally did
- Our new relations would appear (as before) as
Physician Table Attributes/Fields
Patient Table Attributes/Fields
1
M
66Normalization
Boyce-Codd Normal Form
(BCNF)
- Consider the following relation
- The table is in 3NF since no functional
dependencies exist between two (or more) non-key
attributes
- Major is NOT dependent on Advisor (Or vice
versa)
- Major is NOT dependent on MajGPA (Or vice versa)
- Advisor is NOT dependent on MajGPA (Or vice
versa)
67Normalization
Boyce-Codd Normal Form
(BCNF)
- HOWEVER, in the table, Beethoven appears twice
- If Beethoven were to quit and be replaced by Dr.
Dre, we would have to make TWO different changes
(A modification anomaly)
68Normalization
Boyce-Codd Normal Form
(BCNF)
- To correct this condition, we might set up two
tables
- By simply changing Beethoven to Dr. Dre we have
made all the changes necessary
69General Design Considerations
General Design Considerations
Simons Model of Decision Making
1. Intelligence
Is There a Problem?
2. Design
Enumeration of Alternatives
3. Choice
Selection from List of Alternatives
--------------------------------- Extensions
--------------------------------
4. Implementation
Putting Choice into Practice
5. Review
Evaluation of Choices Made
Modification as Necessary
70General Design Considerations
Systems Development Life Cycle (SDLC)
? Generic Term
? Describe a basic series of activities, not a
set series of tasks
? Corresponds to Simons Model of Decision
Making
? Follows the Systems Approach (which
Corresponds to Simon)
Simon
Systems Approach
SDLC
Text
Intelligence
Define Problem
Investigation
Planning
Design
Devel. Altern.
Analysis
Analysis
Choice
Select Solution
Design
Systems Specific.
Design Solution
Implement Sol.
Implement.
Imple-mentation
Implement.
Review
Maintenence
71General Design Considerations
SDLC Activities
Investigation/Planning
Analysis
- Logical Analysis (User Requirements)
Design/ System Spec.
- Code/test/debug Install/Fine-tune
Implement.
Maintain.
72General Design Considerations
DataBase Development Life Cycle (DBLC)
- Analyze Company Situation/Operations
- Analyze problems and Constraints
DB Study
- Define Objectives, Scope, and Boundaries
DB Design
- Logical Analysis (User Requirements)
Design/ System Spec.
- Code/test/debug Install/Fine-tune
Implement.
Maintain.
73General Design Considerations
There ARE alternatives
Prototyping Rapid Development and Testing
? Investigation/Analysis
ID End User Requirements
? Quick Requirements Definitions
? ID Several Alternatives
? Develop Prototype
? Include User Specified Feature(s)
? Establish Interactive Components
? Test Prototype
? Get User feedback
? Revise Prototype
? Based on User feedback
? Implement and Maintain
74General Design Considerations
Prototyping Vs. SDLC
? Prototyping Advantages
? Quicker Development/Implementation
? Generally Cheaper
? Users can better Identify requirements if they
see what they are getting
? Prototyping Disadvantages
? Bottom-Up Approach
? Code-Test-Patch
? More Prone to Errors
? Increased maintenance costs
? Users may want too much and could be
disappointed with resulting system
75General Design Considerations
SDLC Costs vs. Prototyping Costs
Prototype
SDLC
Time
76General Design Considerations
Physical DB Design Prerequisites
? Normalized Relations
? Attribute Definitions
? Descriptions of where and when data are used
? Entered
? Deleted
? Retrieved
? Updated
? Expectations/Requirements for
? Response Times
? Backup
? Retention
? Data Security
? Recovery
? Integrity
? Description of the technologies (DBMS) used
for implementing the system
77General Design Considerations
Estimating Data Volume and Usage
? Consider our Tables
(PKs FKs omitted to save space)
CREATE TABLE physician ( physid CHAR(9) NOT
NULL,
physname CHAR(30) NOT NULL,
specialty CHAR(15) NOT NULL,
street CHAR(20), city CHAR(20),
state CHAR(2), DEFAULT TX, zip CHAR(5))
101 7 108 Bytes
CREATE TABLE patient ( patid CHAR(9),
name CHAR(30) NOT NULL, street CHAR(20),
city CHAR(20), state CHAR(2), DEFAULT TX,
zip CHAR(5), physid CHAR(9))
95 7 102 Bytes
Per Record Entered
CREATE TABLE illness ( illcode CHAR(10) NOT
NULL,
name CHAR(20) NOT NULL)
30 2 32 Bytes
CREATE TABLE prescription ( drugcode CHAR(10),
name CHAR(20) NOT NULL)
30 2 32 Bytes
CREATE TABLE treatment ( patid CHAR(9 ),
illcode CHAR(10), drugcode CHAR(10),
tdate DATE)
33 3 36 Bytes
78General Design Considerations
Estimating Data Volume and Usage
? We Need to analyze our Present Usage first
? We have 50 Physicians
? 50 108 5,400 Bytes
? We have 1,000 Patients
? 1,000 102 102,000 Bytes
? There are 50,000 known Illnesses
We Require 2,477,840 Bytes of storage
? 50,000 32 1,600,000 Bytes
? We carry 8,000 different drugs
? 8,000 32 256,000 Bytes
? The Average Patient has 7.8 visits
? 1,000 7.8 36 280,800 Bytes
? The Ave. treatment requires 1.8 prescriptions
? (1.8 1) 1,000 7.8 36 224,640 Bytes
79General Design Considerations
Estimating Data Volume and Usage
? Next we need to figure out our needs for the
next five years
? We expect a 5 increase in Physicians per year
? fv(50, 5, 5) 64 108 6,912 Bytes
? We expect a 15 increase in Patients per year
? fv(1,000, 15, 5) 2,012 102 205,224 Bytes
? We expect at most a 1 increase in known
Illnesses
? (50,000 1.01) 32 1,616,000 Bytes
? We expect a 3 increase in the number of drugs
carried per year
? fv(8,000, 3, 5) 9,275 32 296,800 Bytes
? The Average Patient visits will increase to 10
? 2,012 10 36 724,320 Bytes
? The Ave. treatment will require 2.2
prescriptions
? (2.2 1) 2,012 10 36 869,184 Bytes
( a 50 increase)
We will need 3,718,440 Bytes of storage
80General Design Considerations
Estimating Data Volume and Usage
? OR, Should we plan on the following
? We expect a 5 increase in Physicians per year
? fv(50, 5, 5) 64 108 6,912 Bytes
? We expect a 15 increase in Patients per year
? (1,000 1,150 1,322 1,521 1,749
2,011) 102 892,806 Bytes
? We expect at most a 1 increase in known
Illnesses
? (50,000 1.01) 32 1,616,000 Bytes
? We expect a 3 increase in the number of drugs
carried per year
? fv(8,000, 3, 5) 9,275 32 296,800 Bytes
? The Average Patient treatments will increase
to 10
? 8,753 10 36 3,151,080 Bytes
? The Ave. treatment will require 2.2
prescriptions
? (2.2 1) 8,753 10 36 3,781,296 Bytes
( a 235 increase)
We will need 8,290,494 Bytes of storage
81General Design Considerations
Objectives in choosing Data Types
? Minimize Storage Requirements
? Can a SMALLINT (-32,768 to 32,767) be used?
? What confidence interval is acceptable??
? Maybe VARCHARs should be used (Careful)
? Represent ALL possible values
? Except perhaps unnecessarily long strings
? Improve data integrity
? FKs must match the referring field
? Support all data manipulations
? Make sure numeric data is entered into fields
requiring calculations
82General Design Considerations
Coding and Compression
? Some attributes have sparse sets of values or
are very large and might better be translated
into a code
? Suppose we were to find out that in our
patient table
? 72 of patients lived in El Paso (requiring
71 bytes of storage/field)
? 14 of patients lived in Juarez (requiring 61
bytes of storage/field)
? 10 of patients lived in Las Cruces (requiring
101 bytes of storage/field)
? The rest live in 15 other cities, the longest
city name requiring 25 bytes
---- SO ??? -----
? Presently, the city field in table patient
requires 15 bytes per record, or a total of 1,000
15 15,000 bytes of storage
---- SO ??? -----
? We can save space by using a code look-up table
83General Design Considerations
Coding and Compression
? Consider the following 2 tables
Physician Table Attributes/Fields
Cities Table Attributes/Fields
CREATE TABLE patient ( patid CHAR(9), name CHAR(3
0), street CHAR(20), citycode SMALLINT,
state CHAR(2), zip CHAR(5), physid CHAR(9),
PRIMARY KEY (patid), FOREIGN KEY (citycode)
REFERENCES physician (physid),
FOREIGN KEY (citycode) REFERENCES
cities (citcode))
CREATE TABLE cities ( Citcode SMALLINT, Citname C
HAR(15), PRIMARY KEY (citcode))
Each record requiring 19-Bytes
---- SO ??? -----
84General Design Considerations
Coding and Compression
? Our field citycode (now a SMALLINT) requires 2
1,000 2,000 bytes of storage (vs. 1,000 15
15,000 bytes of storage)
? Our table cities requires a total of 18 19
342 bytes of storage
- A 12,658 net byte savings
Patient Table
Cities Table
85General Design Considerations
Controlling Data Integrity
(Weve already seen this)
? Default Values
? Helps reduce input time
? Helps reduce input errors
? Range Control
? Helps reduce input errors
? Null Value Control
? Helps to ensure entity/referential Integrity
? Entity Integrity
? Every Table should have predefined primary key
? Referential Integrity
? Assures that every foreign key is either NULL
or has a corresponding value in the referenced
table
86General Design Considerations
File Organization
? How the records of a file are physically
arranged on secondary storage
? Objectives
? Fast data retrieval
? High throughput for processing
? Efficient use of storage space
? Protection from disk failure and data loss
? Minimization of need for reorganization
? Accommodation of growth
? Security from unauthorized use
87General Design Considerations
File Organization
? Sequential Access
? Records stored in sequence according to a
primary key
? Indexed File Organization
? Records may or may not be stored in sequence
? A separate file (index) is created to access
the records
? Indexed Sequential Files (layered lists)
88General Design Considerations
File Organization
? Hashing
? The record value is used to determine the
address (Hashing Algorithm)
? Suppose we wished to store 1,000 student
records
StudID
123456 mod 997 825
123456
234567 mod 997 272
234567
345678 mod 997 716
345678
456789 mod 997 163
456789
? There may be collisions (which we need not
get into here)
997 is the closest prime number to 1000 (the
number of records)
89General Design Considerations
Indices
? May be used for any field (most commonly the
primary key)
- Creating a primary key index
CREATE UNIQUE INDEX patindex ON patient (patid)
filename
Table
Field
- Creating a secondary key index
CREATE INDEX patnameindex ON patient (name)
-- Notice that we can not use the command UNIQUE
--
90General Design Considerations
When to Use Indices
? Most Useful on Large Tables
? Specify a unique Index for the primary key of
each table
? Most useful for frequently used WHERE clauses
which search for specific records
(e.g., . WHERE phsyid 123456789 )
? Most useful for frequently used ORDER BY
clauses on specific Attributes
? Most useful when there are a large number of
attribute values
Cautions
? Some DBMSs have limits on the number of
indices allowed
? Be careful about indexing on fields that have
NULL values
91Database Design