Title: ARCH3: Database Design, a Practical Guide
1ARCH-3 Database Design, a Practical Guide
Gus Björklund
Wizard, Progress Software Corporation
2Rules are made to be broken
To every rule, there is an exception!
3If you thought this talk was going to be about
indexing
It isnt.
4Topics
- Theory
- What is Database Design
- Basic Elements
- Representing the Model as Tables
- Practice
- An Example
- Some Other Topics
5First, some theory
6What do we mean by database design?
- A process for defining a model of a subset of the
real world, then representing it as data in
tables in a relational databaseAt least, thats
the definition we will use for the purpose of
this talk.
7Basic Elements
What do we put in our model?
- Just 3 Things
- Entities
- Attributes
- Relationships
The entity-relationship model was invented by
Peter Chen in 1976. See http//bit.csc.lsu.edu/c
hen/chen.html
8Basic ElementsEntities
- Can be thought of as nouns
- People
- author, composer, performer, seller, buyer
- Places
- home, IP address, URL, destination, factory,
store - Things
- song, recording, instrument, car, invoice
Is telephone number a place or a thing?
9Basic ElementsAttributes
Entities have attributes
- Can be thought of as adjectives (but only
loosely) - Length
- Color
- Horsepower
- Part number
- Song Title
- Publication Date
- Size
- Fabric
- Owner
Is telephone number a attribute or an entity?
10Basic Elementsrelationships
Entities are connected by relationships
- Can be thought of as verbs
- has a
- owns
- contains
- supervises
- performs
- called
- sold
- purchased
- proved
11Basic elementsrelationships have attributes too
In May, 1995, Andrew Wiles published a proof of
Fermats Last Theorem
12What goes in an entity
- Identifying attributes
- Must be able to uniquely identify the entity
- Can have more than one way to id
- Id can be composite
- Descriptive attributes
- the values you need to keep track of
- generally should be simple, not complex
13What to include in your model
- The things your application has to keep track of
- Telephones, wires, switches
- The actions your application or its users perform
- Make calls, send telephone bills, collect
payments - Some attributes of the things and actions
- Originating number, date and time of call,
duration, called number - Keep it simple
- Be accurate
- Keep it up to date
14What to include in your model
- Consider the goals of the system
- Everything you include should be there for a
reason you can state - in no more than two sentences
- Everything should have a clear name
- if you cant name it, it doesnt belong
15What to leave out of your model
- The real world has properties that dont matter
(to your application) - The real world has relationships that dont
matter - Things happen in the real world that dont matter
- Keep it simple
- If you cant say why you need it, leave it out
16Logical vs Physical Data Models
- Logical entities often require multiple tables to
represent them - Tables can be thought of as logical or physical
- It depends on your point of view
- There is also the physical storage database
layout - storage areas
- data extents
- disks
- etc.
- We arent going to talk about the physical
database layout - We will talk about tables
17Mapping Your Model to a Database
Simply put,
- Entities become tables
- Identifiers become indexes
- Attributes become columns
- Data types pick appropriate
- Relationships become tables or foreign keys
18In theory, there is no difference betweentheory
and practice, but in practice there is.Jan van
de Snepscheut
19Now for some practice.
20An example
- Music store
- Buys compact disc recordings from distributors
- Has inventory
- Allows customers to search for what they want
- Maybe in an in-store kiosk or on the web
- Sells compact discs to customers
21What should we do first?
22Activities
- We buy discs from a distributor
- Orders are sent to a distributor
- Orders are delivered to the store
- Orders may be cancelled
- We sell discs to customers in sales transactions
- Customers buy discs in sales transactions
- Customers search for what they want to buy
Which of these must be remembered by the system?
23What do we need to keep track of
- Discs we have
- Discs we sold
- Discs we know about and can get
- Discs we have ordered
- Information needed to do our income tax
- what we paid for stock
- when we bought it
- what we sold it for
- when we sold it
24Disc entities
- UPC Code 8697-07416-2
- Manufacturer Sony BMG
- Cost to us 2.00
- Price charged 17.95
- Tax charged 0.80
- Date purchased March 19, 2007
- Date sold June 9, 2007
25Disc table might look like this
26Whats wrong?
- Is upc a unique identifier?
- Might have bought from a distributor
- Have no information about what is on the disc
- How do customers search?
- Dont know when disc was made
- Could be more than one tax jurisdiction
- provincial tax, city tax
- Dont know if disc is on order
- Dont know who bought it
- Duplicated data
- Etc., etc.
27Disc entities take 2
- UPC Code 8697-07416-2
- Manufacturer Sony BMG
- Distributor Bobs Wholesale CDs
- Cost to us 2.00
- Price charged 17.95
- Tax charged 0.80
- Date ordered March 19, 2007
- Date received March 20, 2007
- Date sold June 9, 2007
- Disc Title The Essential Joshua Bell
- Artist Joshua Bell
- Track 1 Danse Russe
- Track 2 Violin Concerto in E Minor
- Track 3 Nocturne in C-sharp Minor
- etc.
28Example Now Whats wrong?
- This is getting messy
- Activities combined with discs attributes
- Have duplicated information
- How many tracks can there be?
- What if there is more than one artist?
- Dont have all the information a customer might
want to use to search
29Discs revisited
- Discs have titles
- Discs have pictures on the cover
- Discs contain tracks
- Discs are made by manufacturers
- Discs are purchased from distributors
- Discs are ordered from distributors
- Discs are delivered to the store
- Discs are sold to customers
30Discs contain tracks
- Tracks contain songs
- Tracks occur in order
- Tracks have a duration
- Songs are performed in performances
- Songs have performers (usually)
- Songs have composers
- Songs have names (titles)
- Songs have a key (but not always)
- Performances are done by performers
- Performers can be groups (bands, orchestras,
etc.) - Performances are performed in a location or venue
31We seem to need these entities
- Discs
- Manufacturers
- Distributors
- Orders
- Customers
- Inventory
- Tracks
- Songs
- Performers
- Groups ?
32Songs have names (titles).Are names properties
of songs?Or are they entities related to
songs?Or are they something else?
33Song data (track 1)
34Song data (track 2)
35Performance data
36Performance data take 2
37Performer data
38Performance to Performer Relationship
39Performance data take 3
40Track to Performance Relationship
41Relationships
track
performance
one to one
performer
performance
performance
disc
performer
track
performance
track
one to many
many to many
track
42What happened to Songs?
43Relationships (take 2)
song
performance
track
song
one to many
performance
one to one
performance
disc
performer
track
performance
track
performance
one to many
performer
track
performance
many to many
44What aboutbusiness entities?
45Should you use arrays?
46Indexes
- Enforce uniqueness
- Make searches faster
- Allow fast retrieval of entity by any of its
identities
47What indexes do we needfor the music store
database?
48What should we do next ?
49Other Topics
- Normalization
- Unique keys
- Word indexes
- Naming
- Customisation
50Normalization
- It mostly means Dont duplicate data
- Properties should be simple
- have only one value
- are necessary
- are not derived data
- dont repeat
- Complicated properties are often entities in
their own right - For example, addresses might be
51Unique keys
- EVERY table must have a unique key
- Every row needs a unique identifier
- that never changes even if data moved to another
database (i.e. if you replicate) - Often, users dont need to see it
- Use a UUID or a sequence or maybe datetime
- Unique key is the ONLY way to identify rows
unambiguously - ROWIDs are temporary and can change
- Use the same method throughout
- Youll be glad you did
52Word indexes
- Can be used to hold multiple status or attribute
values - Conflicts with normalisation
- Flexible
- Easy to add new ones
- Queries are fast
- Example
- Category classical, violin, orchestral, concerto
53Naming
Good names are crucial to understanding
- Table and column names should have clear meanings
everyone can understand - GL01262 vs dateEntered
- Names with dashes cause inconvenience with SQL
- order-date
- Booleans should be named for truth value
- backOrdered
- No double negations
- notOutOfStock
54Making tables customizable
We will look at 3 ways
- Spare columns
- Separate table with spare columns
- Separate table with name/value pairs
55Spare columns
What data types should you use? How many spare
columns? Wasted columns when not used How do you
know what each spare got used for? How do you
know how many unused spares you have?
56Separate table with spare columns
57Separate table with spare columns
58Separate table with name/value pairs
59Modeling Tools
- Enterprise Architect
- Power Designer
- ConceptDraw
- Erwin
- Rational
Pencil and paper !
Blackboard !
60Summary
- Understand the requirements
- Leave out what is not needed
- Review the design with stakeholders
- Evolve the design as changes come up
- Test to make sure it works
- Can it do everything that is needed?
- Does it perform adequately?
61Where to learn more
- Papers
- Wiles, A. "Modular elliptic curves and Fermat's
Last Theorem, Annals of Mathematics 141 (3)
443-551 - Chen, P. The Entity-Relationship Model --
Toward a Unified View of Data, ACM TODS Vol 1,
No 1, 1976 - Wikipedia articles to start from
- entity-relationship model
- data model
- Books
- Teorey, Lightstone, Nadeau Database Modeling
and Design, Morgan Kaufmann.
62Questions
?