Title: Database Design, a Practical Guide
1Database Design, a Practical Guide
Gus Björklund (gus_at_progress.com)
Wizard, Progress Software Corporation
2Ask questions as we goif I am not being
clear.Warning there is a mistake in these
slides.
3Rules are made to be broken
To every rule, there is an exception!
4If you thought this talk was going to be about
indexing
It isnt. Nor is it about performance.
5Topics
- Theory
- What is Database Design
- Basic Elements
- Representing the Model as Tables
- Practice
- An Example
- Some Other Topics
6First, a little theory
7What do we mean by database design?
- A process for defining a model of a subset of the
real1 world, then representing it as data in
tables in a relational databaseAt least, thats
the definition we will use for the purposes of
this talk.
1 Well, for small values of real, anyway.
8Basic Elements
What do we put in our model?
- Just 3 Things
- Entities
- Attributes
- Relationships
The entity-relationship model was described by
Peter Chen in 1976. See http//bit.csc.lsu.edu/c
hen/chen.html
9Basic Elements Entities
- Can be thought of as nouns
- People
- author, composer, performer, seller, buyer
- Places
- home, IP address, URL, destination, factory,
store - Things
- song, recording, instrument, car, invoice
Is telephone number a place or a thing?
10Basic Elements Attributes
Entities have attributes
- Can be thought of as adjectives (but only
loosely) - Length
- Color
- Horsepower
- Part number
- Song Title
- Publication Date
- Size
- Fabric
- Owner
Is telephone number a attribute or an entity?
11Basic Elements Relationships
Entities are connected by relationships
- Can be thought of as verbs
- has a
- owns
- contains
- supervises
- performs
- called
- sold
- purchased
- proved
Is telephone number a relationship?
12Relationships have attributes too
In May, 1995, Andrew Wiles published a proof of
Fermats Last Theorem
13Relationships have attributes too
In May, 1995, Andrew Wiles published a proof of
Fermats Last Theorem
attribute
entity
relationship
entity
14What goes in an entity
- Identifying attributes
- Must be able to uniquely identify the entity
- Can have more than one way to id
- Id can be composite
- Descriptive attributes
- the values you need to keep track of
- generally should be simple, not complex
15What to include in your model
- The things your application has to keep track of
- Telephones, wires, switches
- The actions your application or its users perform
- Make calls, send telephone bills, collect
payments - Some attributes of the things and actions
- Originating number, date and time of call,
duration, called number - Keep it simple
- Be accurate
- Keep it up to date
16What to include in your model
- Consider the goals of the system
- Everything you include should be there for a
reason you can state - in no more than two sentences
- Everything should have a clear name
- if you cant name it, it doesnt belong
- Talk to the stakeholders !!!
17What to leave out of your model
- The real world has properties that dont matter
(to your application) - The real world has relationships that dont
matter - Things happen in the real world that dont matter
- Keep it simple
- If you cant say why you need it, leave it out
18Logical vs Physical Data Models
- Logical entities often require multiple tables to
represent them - Tables can be thought of as logical or physical
- It depends on your point of view
- There is also the physical storage database
layout - storage areas
- data extents
- disks
- etc.
- We arent going to talk about the physical
database layout - We will talk about tables
19Mapping Your Model to a Database
Simply put,
- Entities become tables
- Identifiers become indexes
- Attributes become columns
- Data types pick appropriate
- Relationships become tables or foreign keys
20In theory, there is no difference betweentheory
and practice, but in practice there is.Jan van
de Snepscheut
21Now for some practice.
22An example
- Music store
- Buys compact disc recordings from distributors
- Has inventory
- Allows customers to search for what they want
- Maybe in an in-store kiosk or on the web
- Sells compact discs to customers
23What should we do first?
24Activities
- We buy discs from a distributor
- Orders are sent to a distributor
- Orders are delivered to the store
- Orders may be cancelled
- We sell discs to customers in sales transactions
- Customers buy discs in sales transactions
- Customers search for what they want to buy
Which of these must be remembered by the system?
25What do we need to keep track of
- Discs we have
- Discs we sold
- Discs we know about and can get
- Discs we have ordered
- Information needed to do our income tax
- what we paid for stock
- when we bought it
- what we sold it for
- when we sold it
26Disc entities
- UPC Code 8697-07416-2
- Manufacturer Sony BMG
- Cost to us 2.00
- Price charged 17.95
- Tax charged 0.80
- Date purchased March 19, 2007
- Date sold June 9, 2007
27Disc table might look like this
28Whats wrong?
- Is upc a unique identifier?
- Might have bought from a distributor
- Have no information about what is on the disc
- How do customers search?
- Dont know when disc was made
- Could be more than one tax jurisdiction
- provincial tax, city tax
- Dont know if disc is on order
- Dont know who bought it
- Duplicated data
- Etc., etc.
29Disc entities take 2
- UPC Code 8697-07416-2
- Manufacturer Sony BMG
- Distributor Bobs Wholesale CDs
- Cost to us 2.00
- Price charged 17.95
- Tax charged 0.80
- Date ordered March 19, 2007
- Date received March 20, 2007
- Date sold June 9, 2007
- Disc Title The Essential Joshua Bell
- Artist Joshua Bell
- Track 1 Danse Russe
- Track 2 Violin Concerto in E Minor
- Track 3 Nocturne in C-sharp Minor
- etc.
30Example Now Whats wrong?
- This is getting messy
- Activities combined with discs attributes
- Have duplicated information
- How many tracks can there be?
- What if there is more than one artist?
- Dont have all the information a customer might
want to use to search
31Discs revisited
- Discs have titles
- Discs have pictures on the cover
- Discs contain tracks
- Discs are made by manufacturers
- Discs are purchased from distributors
- Discs are ordered from distributors
- Discs are delivered to the store
- Discs are sold to customers
32Discs contain tracks
- Tracks contain songs
- Tracks occur in order
- Tracks have a duration
- Songs are performed in performances
- Songs have performers (usually)
- Songs have composers
- Songs have names (titles)
- Songs have a key (but not always)
- Performances are done by performers
- Performers can be groups (bands, orchestras,
etc.) - Performances are performed in a location or venue
33We seem to need these entities
- Discs
- Manufacturers
- Distributors
- Orders
- Customers
- Inventory
- Tracks
- Songs
- Performers
- Groups ?
34Songs have names (titles).Are names properties
of songs?Or are they entities related to
songs?Or are they something else?
35Song data (track 1)
36Song data (track 2)
37Performance data
38Performance data take 2
39Performer data
40Performance to Performer Relationship
41Performance data take 3
42Track to Performance Relationship
43Relationships (so far)
track
performance
one to one
performer
performance
performance
disc
performer
track
performance
track
one to many
many to many
track
44What happened to Songs?
45Relationships (take 2)
song
performance
track
song
one to many
performance
one to one
performance
disc
performer
track
performance
track
performance
one to many
performer
track
performance
many to many
46Relationships (take 3)
47What aboutbusiness entities?Where are they?
48Business entities
Here is one kind of business entity
49Business entities
Here is a different kind of business entity
50Business entities
Here is still another kind of business entity
51Should you use arrays?
52Indexes
- Enforce uniqueness
- Make searches faster
- Enable fast retrieval of entities by their
identities - Enable finding entities with certain attributes
53What indexes do we needfor the music store
database?
54Tables
0) Discs1) Tracks2) Songs3) Performers4)
Performances5) Tracks of discs6) Performances
of songs7) Performers of performances
55What indexes do we need
0) Indexes for identifying attributes1) A unique
row identifier2) Indexes for the queries you
will do
56What should we do next ?
57Other Topics
- Normalization
- Unique keys
- Word indexes
- Naming
- Customisation
58Normalization
- Oversimplified, it means
- Dont duplicate data
- Attributes should be simple
- have only one value
- be necessary
- not derived data
- dont repeat
- Complicated attributes are often entities in
their own right - For example, addresses might be
59Unique keys
- EVERY table must have a unique key
- EVERY row needs a unique identifier
- that never changes even if moved to another
database (i.e. if you replicate) - Often, users dont need to see it
- Use a UUID or sequence or maybe datetime
- Unique key is the ONLY way to identify rows
unambiguously - ROWIDs are temporary and can change
- Use the same method throughout
- Youll be glad you did
60Word indexes
- Can be used to hold multiple status or attribute
values - Conflicts with normalisation
- Flexible
- Easy to add new ones
- Queries are fast
- Example
- Category classical, violin, orchestral, concerto
61Naming
Good names are crucial to understanding
- What is in the column GL01262 ?
62Naming
Good names are crucial to understanding
- Table and column names should have clear meanings
everyone can understand - GL01262 vs dateEntered
- Names with dashes cause inconvenience with SQL
- order-date
- Booleans should be named for truth value
- backOrdered
- No double negations
- notOutOfStock
63Making tables customizable
We will look at 4 ways
- Spare columns
- Separate table with spare columns
- Separate table with name/value pairs
- Name/value pairs in word-indexed column
64Table and columns
65Spare columns in table
66Spare columns in table
What data types should you use? How many spare
columns? Wasted columns when not used How do you
know what each spare got used for? How do you
know how many unused spares you have?
67Separate table for spare columns
68Separate table for spare columns
69Separate table with name/value pairs
70Name/value pairs in word-indexed column
71Modeling Tools
- PCase
- Enterprise Architect
- Power Designer
- ConceptDraw
- Erwin
- Rational
Pencil and paper !
Blackboard !
72Summary
- Understand the requirements
- Leave out what is not needed
- Review the design with stakeholders
- Evolve the design as changes come up
- Test to make sure it works
- Can it do everything that is needed?
- Does it perform adequately?
- Expect changes to come
73Homework
- Papers
- Wiles, A. "Modular elliptic curves and Fermat's
Last Theorem, Annals of Mathematics 141 (3)
443-551 - Chen, P. The Entity-Relationship Model --
Toward a Unified View of Data, ACM TODS Vol 1,
No 1, 1976 - Wikipedia articles to start from
- entity-relationship model
- data model
- Books
- Teorey, Lightstone, Nadeau Database Modeling
and Design, Morgan Kaufmann.
74Questions
?