Title: Principles of Database Design
1Principles of Database Design
- James J. Cimino
- NIH Clinical Center
2Outline
- Definition
- Motivation
- History and evolution
- Design principles
- Design methods
- Exercises
- Take-Home Messages
3Database Definition
- A collection of data that
- is organized
- usually computer-based
- represents repetitive information implicitly
- supports retrieval
4Information
- Content
- Name
- Date
- Diagnosis
- Medication
- Problem
- Procedure
- Visit
- Structure
- Field
- Record
- Table
- Database
5Paper Database as Expert System
6Motivation
- Power and flexibility depend on data model
- Database is the realization of data model
- Evaluation of commercial products
- Communicating with vendors and IT staff
- Building your own databases
7In the beginning, there were punch cards
8Sequential Files
- Stored on magnetic tape
- Longer (or shorter) than 80 characters
- 8-bit characters (256 characters)
- Variable-length records
Len21 Data Len16
Data
0211234567SandiegoCarmen0161234568CiminoJim
ID Loc ID Loc
12345678901234567890123
SandiegoCarmenCiminoJim
1234567000112345680015
9Random Access Files
- Disk storage with moving heads
- Larger capacity (MB!)
- Addressable records and fields using pointers
- Indexes created as lists of pointers
- Separation of physical and logical models
- Can be difficult to recover if index corrupted
10Random Access Files
- 0000 Name1000Addr2000DOB 6000
- 1000 Sandiego, Carmen
- 2000 123 Main Street
- 3000
- 4000
- 5000
- 6000 Feb 1, 1948
11Indexed Files
0000 0000 0004 4000 0008 2000
0000 Cimino, James 1000 2000 Norton,
Cathy 3000 4000 Lindberg, Don
12Hierarchical Databases
1234567
Sandiego, Carmen
123 Main Street
Labs
Chem7
Chem7
K 3.9
Na142
K 4.3
Na136
13Hierarchical Databases
- Easy to use
- Efficient storage
- Tree walking is fast
- Queries across trees are slow
- Flexible
- Too flexible chaos is allowed
- Too easy to modify
- Difficult to document complex structures
14Hierarchical Databases
- EMR(1234567)Sandiego, Carmen
- EMR(1234567, Address)123 Main Street
- EMR(1234567, Chem7, 2/2/02, Na)136
- EMR(1234567, Chem7, 2/2/02, K)4.3
- EMR(1234567, Chem7, 2/3/02, Na)142
- EMR(1234567, Chem7, 2/3/02, K)3.9
15Hierarchical Chaos
1234567
Admissions
Admission 1
Admit Date 2/2/02
Primary DX CHF
Other DX
AODM
A Fib
Flag S
Flag P
16Network Databases
1234567
Gyn Clinic
2 Main St.
Sandiego
305-2500
Secretary
Gyn Clinic
8AM-5PM
Ms Smith
305-1000
Service
Pap
Dr. Jones
Gyn Visit
Beeper 34
17Extensible Markup Language (XML) Databases
- SGML is a metalanguage
- SGML is used to write Document Type Definitions
(DTDs) that define languages - HTML is a language with an SGML DTD
- Tags are for formatting/presentation syntax
- XML is a proper subset of SGML
- XML defines tags that convey semantics
- We could write Health Markup Language (HML)
in XML (if we could agree on the semantics and
tags) - Tags may or may not be stored with data
18ltdocumentgt lt/documentgt
ltdocument.idgtCXR001lt/document.idgt ltdoc.
dategt19991101lt/doc. dategt ltdocument.typegt lt/d
ocument.typegt ltdocument.bodygt ltdocument.bod
ygt
ltidentifiergtP5-00010lt/identifiergt
lttextgtChest X-Raylt/textgt
ltfindingsgtNo infiltrate, cardiac shadow not
enlarged...lt/findingsgt ltimpressiongtNormal
X-raylt/impressiongt
19ltpatientgt lt/patientgt
ltpatient.idgt lt/patient.idgt ltpatient.namegt lt/pa
tient.namegt ltpatient.dobgt19230113lt/patient.dobgt ltp
atient.sex value"male"/gt ltinpatient/gt
ltid.valuegt1234789lt/id.valuegt
ltfamily.namegtSandiegolt/family.namegt ltgiven.namegtCa
rmenlt/given.namegt ltsuffixgtM.D.lt/suffixgt
20Extensible Markup Language (XML) Databases
- Strengths
- Flexibility to represent wide range of data
- Data carries its field assignment
- Sparse data handled compactly
- Tags can have platform-specific display
- Weaknesses
- Immature database tools
- Verbose
- I/O intensive
- A trade-off of decreased efficiency for increased
flexibility ? scalability
21Relational Databases - Features
- Tables with columns and rows
- Logical vs. physical representation
- Multiple indexes
- Inter-table relationships
- Virtual sequential files (with simultaneous
update)
22Relational Databases
From table Patient, get Pt_UI where
LnameJones and FnameCasey
and then Get Tname and Date from table
Lab_test for the same Pt_UI
23Normalization
- Efficient database organization
- Eliminate redundant data
- Ensure data dependencies make sense
- E.F. Codd, 1970 five normal forms
- First Normal Form
- Eliminate duplicative columns
- Create separate tables for each group of related
data - Identify each row with a unique column or set of
columns (the primary key).
24Normalization (continued)
- Second normal form
- Remove subsets of data that apply to multiple
rows of a table and place them in a separate
table - Create relationships between these new tables and
their predecessors through the use of foreign
keys - Third normal form
- Remove columns that are not dependent upon the
primary key
25Relational Databases - Advantages
- Comprehensible
- Multiple views possible
- Easy to modify
- New elements dont break programs
- Database management systems (DBMS)
- Referential integrity
- Reorg for efficiency
- Access control
- Locking for multiple simultaneous use
26Relational Databases - Disadvantages
- Storage overhead
- I/O-intense
- Cost
27Systems Design Fundamentals
- The data model is the most critical aspect
- Data model should reflect real world objects and
relationships to ensure durability - A correct data model outlasts applications,
including many not anticipated at system start-up
28System Design Basic Concepts
- The world contains things
- Develop abstractions called objects
- Group objects by criteria which represent the
abstract object as an empty table
29Types of Objects (Tables)
- Tangible things (book, person)
- Roles (doctor, patient, supervisor)
- Events (ordering of a lab test)
- Interactions - bind two or more other objects via
a transaction (purchase relates buyer to seller)
30Objects
- All of the real-world things in the set (the
instances) have the same characteristics - All instances conform to the same rules
31Basic Concepts (continued)
- Empty tables can be filled in to represent the
real world things from which the object was
abstracted
32Basic Concepts (continued)
- Relationships between objects are attributes of
those objects
Relationship Has-Doc Patient Has-Doc
Physician
33Table Notation
Textual Form Patient_Admissions
(Pt_ID, Date_Adm, Time_Adm, Unit, Room)
34Formalisms for Tables
- Rule 1 One instance of an object has
- exactly one value for each attribute
- only one element per row-column intersection
- no repeating groups
- no true holes in table
- Rule 2 Attributes contain no internal structure
35Formalisms for Tables
36Formalisms for Tables
- Rule 1 One instance of an object has
- exactly one value for each attribute
- only one element per row-column intersection
- no repeating groups
- no true holes in table
- Rule 2 Attributes contain no internal structure
- Rule 3 Every attribute should represent a
characteristic of the entire object, not a
characteristic of a limited part of the object
37Formalisms for Tables
38Formalisms for Tables
- Rule 1 One instance of an object has
- exactly one value for each attribute
- only one element per row-column intersection
- no repeating groups
- no true holes in table
- Rule 2 Attributes contain no internal structure
- Rule 3 Every attribute should represent a
characteristic of the entire object, not a
characteristic of a limited part of the object
39Relationships
- Relationship an abstraction of an association
between real world things - Patient OCCUPIES Bed
- Library CONTAINS Books
- Specimen IS ASSAYED by Lab Method
- Inverse relationships
- Bed is OCCUPIED BY Patient
- Book IS LENT BY Library
40Relationship Types
41Modeling Many-to-Many Relationships
42Exercise Devise a Relational Model for MEDLINE
citations
PMID- 2405204 TI- Medical informatics. An
emerging academic discipline and institutional
priority. AB- Information management constitutes
a major activity of the health care\ AD- Departme
nt of Radiology, Brigham and Women's Hospital,
Boston, MA 02115 AU- Greenes RA AU- Shortliffe
EH LA- eng PT- Journal Article PT- Review JT- JAMA
the journal of the American Medical
Association SO- JAMA. 1990 Feb 23263(8)1114-20.
MH- Career Choice MH- Hospital Information
Systems MH- Information Systems MH- Medical
Informatics/education/organization
administration/trends MH- Medical Informatics
Applications MH- National Library of Medicine
(U.S.) MH- Research MH- Training
Support MH- United States
43Case Presentation
- The patient is a 50 year old, Native American
female who present to the emergency room (ER)
with the chief complaint of lip numbness, nausea
and chest pain. - The patient was generally well until about one
half hour prior to arrival in the ER, while
eating dinner at as seafood restaurant in Rock
Harbor, MA. She was finishing a dinner of New
England clam chowder, lobster, steamed clams, and
corn on the cob when she noted onset of symptoms.
Others in her party ate fish and chips, although
two other people ate the clam chowder none at
the steamers. - She gives a history of hypertension and states
that she was getting a "capsule, half green, half
blue-green" from her private doctor. She also
reports that she was treated in the past for
tuberculosis while she was pregnant, but doesn't
remember what she was treated with or for how
long. She reports that she was at another
hospital on the other side of town, where she had
a liver biopsy. She reports that he thinks the
diagnosis was "hemachromatosis". The patient
reports an allergy to Bufferin. - Physical examination revealed a well-developed,
well-nourished diaphoretic female in moderate
respiratory distress. Vital signs showed a pulse
of 110, a respiratory rate of 8, an oral
temperature of 100.3, and a blood pressure of
150/100. Examination revealed rales over both
lower lung fields. Abdominal exam revealed a
tender, palpable liver edge. Neurologic exam
reveals dysarthria, diffuse muscle weakness, and
hyperreflexia. - Chem7 (serum) Glucose 100 (70-105) Chem7
(plasma) Glucose 150 (75-110) - CBC Hgb 15 (12.0-15.8), Hct 45 (42.4-48.0), WBC
11,000 (3,540-9,060), Platelets 145K (165-415K) - A fingerstick blood sugar was 80
- Urinalysis showed protein of 1 and glucose of 0
- A blood culture was positive for
methicillin-resistant Staphylococcus aureus
(MRSA) - ECG - Sinus Rhythm, 74BPM, Axis -30 degrees, ST
segment 2mm elevated and - T-waves down in leads I, L, V5 and V6
- Chest X-ray Left upper lobe infiltrate, left
ventricular hypertrophy - The patient's nurse reported that the patient
seemed more worried about who would care for her
elderly father if anything happened to her. - A medical student reviewing the case wonders
whether paralytic shellfish poisoning could cause
a myocardial infarction she decides to do a
literature search. - The patient was treated with activated charcoal
and stomach lavage , followed by enteric-coated
aspirin. Due to worsening respiratory
insufficiency, she was intubated and placed on
mechanical ventilation.
44Take-Home Messages
- Data model is the most critical aspect of system
design and function - Data models should reflect real world objects and
their relationships to ensure durability - A correct data model outlasts applications,
including many not anticipated at system start-up