Title: Prof. Ray Larson
1Lecture 11 Intro to Database Design
SIMS 202 Information Organization and Retrieval
- Prof. Ray Larson Prof. Marc Davis
- UC Berkeley SIMS
- Tuesday and Thursday 1030 am - 1200 pm
- Fall 2003
- http//www.sims.berkeley.edu/academics/courses/is2
02/f03/
2Lecture Overview
- Review
- MediaStreams
- Databases and Database Design
- Database Life Cycle
- ER Diagrams
- Discussion
- Next Time/Readings
3Lecture Overview
- Review
- MediaStreams
- Databases and Database Design
- Database Life Cycle
- ER Diagrams
- Discussion
- Next Time/Readings
4Streams vs. Clips
5Stream-Based Representation
- Makes annotation pay off
- The richer the annotation, the more numerous the
possible segmentations of the video stream - Clips
- Change from being fixed segmentations of the
video stream, to being the results of retrieval
queries based on annotations of the video stream - Annotations
- Create representations which make clips, not
representations of clips
6Keywords vs. Semantic Descriptors
dog, biting, Steve
7Why Keywords Dont Work
- Are not a semantic representation
- Do not describe relations between descriptors
- Do not describe temporal structure
- Do not converge
- Do not scale
8Natural Language vs. Visual Language
Jack, an adult male police officer, while walking
to the left, starts waving with his left arm, and
then has a puzzled look on his face as he turns
his head to the right he then drops his facial
expression and stops turning his head,
immediately looks up, and then stops looking up
after he stops waving but before he stops
walking.
9After Capture Media Streams
10Media Streams Features
- Key features
- Stream-based representation (better segmentation)
- Semantic indexing (what things are similar to)
- Relational indexing (who is doing what to whom)
- Temporal indexing (when things happen)
- Iconic interface (designed visual language)
- Universal annotation (standardized markup schema)
- Key benefits
- More accurate annotation and retrieval
- Global usability and standardization
- Reuse of rich media according to content and
structure
11Video Retrieval In Media Streams
- Same interface for annotation and retrieval
- Assembles responses to queries as well as finds
them - Query responses use semantics to degrade
gracefully
12Lecture Overview
- Review
- MediaStreams
- Databases and Database Design
- Database Life Cycle
- ER Diagrams
- Discussion
- Next Time/Readings
13What is a Database?
14Files and Databases
- File A collection of records or documents
dealing with one organization, person, area or
subject (Rowley) - Manual (paper) files
- Computer files
- Database A collection of similar records with
relationships between the records (Rowley) - Bibliographic, statistical, business data,
images, etc.
15Database
- A Database is a collection of stored operational
data used by the application systems of some
particular enterprise (C.J. Date) - Paper Databases
- Still contain a large portion of the worlds
knowledge - File-Based Data Processing Systems
- Early batch processing of (primarily) business
data - Database Management Systems (DBMS)
16Why DBMS?
- History
- 50s and 60s all applications were custom built
for particular needs - File based
- Many similar/duplicative applications dealing
with collections of business data - Early DBMS were extensions of programming
languages - 1970 - E.F. Codd and the Relational Model
- 1979 - Ashton-Tate and first Microcomputer DBMS
17File Based Systems
File
Application
Toys
Addresses
Delivery List
Naughty
Coal Estimation
Toys
Nice
Just what asked for
18From File Systems to DBMS
- Problems with file processing systems
- Inconsistent data
- Inflexibility
- Limited data sharing
- Poor enforcement of standards
- Excessive program maintenance
19DBMS Benefits
- Minimal data redundancy
- Consistency of data
- Integration of data
- Sharing of data
- Ease of application development
- Uniform security, privacy, and integrity controls
- Data accessibility and responsiveness
- Data independence
- Reduced program maintenance
20Terms and Concepts
- Data independence
- Physical representation and location of data and
the use of that data are separated - The application doesnt need to know how or where
the database has stored the data, but just how to
ask for it - Moving a database from one DBMS to another should
not have a material effect on application program - Recoding, adding fields, etc. in the database
should not affect applications
21Database Environment
22Database Components
23Types of Database Systems
- PC databases
- Centralized database
- Client/server databases
- Distributed databases
- Database models
24PC Databases
E.g. Access FoxPro Dbase Etc.
25Centralized Databases
26Client Server Databases
27Distributed Databases
Location B
Location C
computer
computer
Homogeneous Databases
computer
Location A
28Distributed Databases
Client
Heterogeneous Or Federated Databases
Remote Comp.
Local Network
Database Server
Comm Server
Remote Comp.
Client
29Terms and Concepts
- A database application is an application
program (or set of related programs) that is used
to perform a series of database activities - Create
- Add new data to the database
- Read
- Read current data from the database
- Update
- Update or modify current database data
- Delete
- Remove current On behalf of database users
30Terms and Concepts
- Enterprise
- Organization
- Entity
- Person, Place, Thing, Event, Concept...
- Attributes
- Data elements (facts) about some entity
- Also sometimes called fields or items or domains
- Data values
- Instances of a particular attribute for a
particular entity
31Terms and Concepts
- Key
- An attribute or set of attributes used to
identify or locate records in a file - Primary Key
- An attribute or set of attributes that uniquely
identifies each record in a file
32Terms and Concepts
- Models
- (1) Levels or views of the Database
- Conceptual, logical, physical
- (2) DBMS types
- Relational, Hierarchic, Network, Object-Oriented,
Object-Relational
33Models (1)
More later on this
34Data Models(2) History
- Hierarchical Model (1960s and 1970s)
- Similar to data structures in programming
languages
35Data Models(2) History
- Network Model (1970s)
- Provides for single entries of data and
navigational links through chains of data.
36Data Models(2) History
- Relational Model (1980s)
- Provides a conceptually simple model for data as
relations (typically considered tables) with
all data visible
37Data Models(2) History
- Object Oriented Data Model (1990s)
- Encapsulates data and operations as Objects
38Data Models(2) History
- Object-Relational Model (1990s)
- Combines the well-known properties of the
Relational Model with such OO features as - User-defined datatypes
- User-defined functions
- Inheritance and sub-classing
- All of the major enterprise DBMS systems are now
Object-Relational or incorporate
Object-Relational features
39Lecture Overview
- Review
- MediaStreams
- Databases and Database Design
- Database Life Cycle
- ER Diagrams
- Discussion
- Next Time/Readings
40Database System Life Cycle
41Design
- Determination of the needs of the organization
- Development of the Conceptual Model of the
database - Typically using Entity-Relationship diagramming
techniques - Construction of a Data Dictionary
- Development of the Logical Model
42Physical Creation
- Development of the Physical Model of the Database
- Data formats and types
- Determination of indexes, etc.
- Load a prototype database and test
- Determine and implement security, privacy and
access controls - Determine and implement integrity constraints
43Conversion
- Convert existing data sets and applications to
use the new database - May need programs, conversion utilities to
convert old data to new formats
44Integration
- Overlaps with Phase 3
- Integration of converted applications and new
applications into the new database
45Operations
- All applications run full-scale
- Privacy, security, access control must be in
place - Recovery and Backup procedures must be
established and used
46Growth, Change, and Maintenance
- Change is a way of life
- Applications, data requirements, reports, etc.
will all change as new needs and requirements are
found - The Database and applications and will need to be
modified to meet the needs of changes
47Another View of the Life Cycle
Integration 4
Operations 5
Design 1
Physical Creation 2
Conversion 3
Growth, Change 6
48Lecture Overview
- Review
- MediaStreams
- Databases and Database Design
- Database Life Cycle
- ER Diagrams
- Discussion
- Next Time/Readings
49Database Design Process
50Entity
- An Entity is an object in the real world (or even
imaginary worlds) about which we want or need to
maintain information - Persons (e.g. customers in a business,
employees, authors) - Things (e.g. purchase orders, meetings, parts,
companies)
Employee
51Attributes
- Attributes are the significant properties or
characteristics of an entity that help identify
it and provide the information needed to interact
with it or use it (this is the Metadata for the
entities)
52Relationships
- Relationships are the associations between
entities - They can involve one or more entities and belong
to particular relationship types
53Relationships
54Types of Relationships
- Concerned only with cardinality of relationship
1
1
n
1
n
m
Chen ER notation
55Other Notations
Crows Foot
56Other Notations
IDEFIX Notation
57More Complex Relationships
1/1/1
1/n/n
n/n/1
SSN
Project
Date
1
4(2-10)
Manages
1
Is Managed By
n
58Weak Entities
- Owe existence entirely to another entity
59Supertype and Subtype Entities
60Many to Many Relationships
Employee
61Lecture Overview
- Review
- MediaStreams
- Databases and Database Design
- Database Life Cycle
- ER Diagrams
- Discussion
- Next Time/Readings
62Questions Brooke Maury
- Discussion Questions on Hoffer McFadden
- The relational database model has remained fairly
static since its inception in the 1970s. Is this
evidence of its strength as an organizational
model or an indication of its inflexibility?
63Questions Brooke Maury
- If the goal of the relational database model is
to encode a conceptual design into a logical
design, is it possible that improved technology
and the development of new modeling techniques
will supplant the RDBMS? Specifically, what
impact will XML and the development of document
engineering have on organizing information in
multiple normalized tables? - Conversely, what does the relational model have
that would be lost if a conceptual design was
encoded in another model?
64Questions Brooke Maury
- (Next time?) The drive to develop the RDBM was in
part motivated by a need to minimize the space
required and improve the performance of database
systems by removing redundancies. What impact
will very inexpensive data storage and computing
power have on the relational database model and
the third normal form especially?
65Questions Shane Ahern
- Discussion Questions for "Logical Database Design
and the Relational Model" - Is the normalization process described really
necessary? When I design a database schema, I
find that by thinking of tables in terms of they
entities they represent (employees, sales,
events), I avoid most of the problems of
normalization that the process seeks to address
(i.e. salesperson and region in Sales table,
salesperson is clearly a distinct entity from
sales). If the formal process described in the
article is not followed, are there potential
pitfalls that might lead to problems with your
database schema?
66Questions Shane Ahern
- The article points out that "the relational model
does not yet directly support supertype/subtype
relationships." Once the tables in a relational
database have been decomposed to third normal
form, the database is efficient from systems
point-of-view, but the tables no longer represent
a representation of the data that is intuitive to
humans. The object-oriented model more accurately
mirrors the way we think about the concepts that
we wish to store in databases. So perhaps
object-oriented database systems are worth
considering. What about XML databases?
67Questions Arthur Law
- The three models that we have been presented
with, Entity Relationship Model, NIAM Model, and
Object Oriented Model all enforce a specific
thought process in the organization and
relationship between items in a database. With
all of our recent discussion of computers
understanding natural language are these methods
now out of date with how we should be organizing
information? Should we use artificial
intelligence or learning algorithms to
statistically determine the relationship between
entities or is there still value in using these
models?
68Questions Arthur Law
- Each model is approximately one decade apart in
development and a quick Google search shows that
companies are using databases with one of the
three models. However, as new models arise there
doesn't seem too much interest in migrating from
one data model to another. Which makes sense
given that an organization using a given model
probably finds that it works. Now with the
proliferation of XML, we see more information
being shared between organizations, so are we
fated for an expensive and lengthy translation
process between databases? Or should all DB
administrators be responsible for upgrading to
the latest model?
69Lecture Overview
- Review
- MediaStreams
- Databases and Database Design
- Database Life Cycle
- ER Diagrams
- Database Design
- Discussion
- Next Time/Readings
70Next Time
- Database Design Normalization and SQL
- Readings (no additional DBMS readings)
- Additional Questions/ or revisit some of todays
discussion questions in the light of the next
lecture?