Title: CS157B Lecture 1
1CS157B Lecture 1
- Prof. Sin-Min Lee
- Department of Computer Science
- San Jose State University
2Tuesday Thursday 1015 1130 Also by
appointment
3Your evaluation in this course is determined by
30
Class Presentation 10 Presentation report
5
??!
4You are required to write up your report using
LaTeX. LaTeX is the standard tool for typesetting
scientific articles. Read http//www.latex-projec
t.org/intro.html
Inventor of TeX
5(No Transcript)
6 Sometimes
??
7(No Transcript)
8(No Transcript)
9(No Transcript)
10Outline of Course
- Study of principals and techniques of databases
- Grades assigned as in information sheet
- Examples of use of databases
- Programming projects in database design and
implementation - Programming in Microsoft Access
- Programming in Java with Oracle
- Development of a web site with database support
11Textbook and class meetings
- Principles of Database Systems With Internet and
Java Applications - by Greg Riccardi
- 2001, Addison-Wesley
- Lectures and recitations
12Students Role in Class
- Attend class, please.
- The class notes are available, but they are not
the full classroom experience - Recitation sections are provided to personalize
and enhance your learning environment - You are paying for this, take advantage of it
- Read the book.
- There are many topics covered in the text, but
not the lectures - There are many details and examples in the text
- Seek help during office hours
13Current Events
- Each lecture will cover current events that
affect the database industry - Please bring info to lectures and recitations
14Why Study Files and Databases?
- Next few slides address the following
- Importance of Databases to Economy
- Can you get a job in the database field?
- Representation of Information
- Add meaning to data
- Management of Complexity
- Divide system into layers, focus on data
- Management of Access and Security
- Efficiency of Access and Storage
- Separate data, allow optimizations
15Importance of Databases to Economy
- Expanding use of databases in retail sales
- Walmart, retail sales information tracking
- LL Bean, catalog sales information tracking
- Examples of analyses
- Sales of items
- Comparisons between daily totals of items sold
and items in inventory - Seasonal variations in sales of specific and
similar items - Relative sales of similar items with different
features - Market-basket collections (all items in a single
purchase) - Average and variation in total purchase amount
- Average and variation in number and price of
items - Correlation between sales of items in a single
purchase - Customer analysis
- Behavior of average customer
- Preferences of individual customers
16Importance of Database Companies
- Oracle is the 2nd largest software company
- Its stock has outperformed SP 500 and Microsoft
- This picture is the stock performance, as shown
on the BigCharts Web site from July, 1999 to
July, 2000
17E commerce
- Companies are fighting for the market
- See Oracle Web site
- See IBM Web site
- XML and XSL at Microsoft
- http//www.microsoft.com/xml
- http//msdn.microsoft.com/xml/demos/
- http//msdn.microsoft.com/isapi/msdnlib.idc?theURL
/library/techart/fm2koffice.htm
18Representation of Information
- Data is collections of bits
- physical database
- Information is data with meaning
- logical database
- Representation of meta-data
- database system is self-describing
- Database Management System (DBMS)
- define information content
- construct database
- manipulate by queries, reports and updates
- data plus software
19Management of Complexity
- Insulation between programs and data
- Program-data independence
- Program-operation independence
- Data abstraction
- conceptual model for users
- physical model for administrators
- Sharing data and multi-user transactions
- People
- Database administrators
- Database designers
- Applications programmers
- End users
20Management of Access and Security
- Controlling redundancy
- inconsistency and duplication
- Restricting unauthorized access
- Enforcing integrity constraints
- Providing backup and recovery
- Persistent storage for program objects
21Efficiency of Access and StorageCost of Access
for Seagate Cheetah Disk
- Seek time
- Move access arm to the cylinder
- Avg 6 msec, min 0.6 msec
- Rotational delay
- 1000 rpm, one revolution per 6 millisecond
- Average 3 msec
- Total latency max 12 msec, avg 9 msec
- Transfer rate
- 24 Mbytes/sec
- Speed of memory access, Athlon 750 mhz
- Latency lt100 nanosecond, 10,000 times faster than
disk - Transfer rate 1.6 GBytes/sec, 7 times faster than
disk
22Hierarchical Cost of Storage
- Registers and Cache are fixed size
- Primary storage, memory (RAM) limited by hardware
- 1000 Mbytes per CPU
- Secondary storage, disk, also limited by hardware
- 100 Gbytes per CPU
- Tertiary storage, tape, etc. limited by storage
volume
23Vocabulary
- Glossary of terms
- Define the terms as used in this subject
- Database literature is filled with terms
- Example of terms
- Data, bits
- Information, bits with meaning (type)
- Entity
- Schema
24Client-server computing
- Examples from web sites
- New York Times
- Pricewatch
- Industry movement
25What is a Database?
- Database is a collection of data
- data is known facts with implicit meaning
- database is logically coherent, organized.
- database is designed, built and populated for a
specific purpose. - Database management system
- collection of programs which support creation and
maintenance of databases.
26Time Line for Database Systems
- before 1960 transition from punched card and tape
- 1960s, from file management to databases
- Bachman (GE) IDS and data structure diagrams
- IMS from IBM, Hierarchical Data Model
- IMS DB/DC, Network Model and communication
- SABRE, multi-user access with network
- 1970s, CODASYL and Relational Model
- Codd (IBM) Relational Model
- Chen introduced Entity Relationship Model
- Query languages developed (SQL)
- 1980s, Client/Server DBs, Oracle, DB2
- PC databases, DBase, Paradox, etc.
- SQL standard for definition and manipulation
- 1990s, web-based information delivery
- Trends expert DBs, object DBs, distributed DBs
27Concepts and Architecture
- Data Model is a set of concepts that can be used
to describe the structure of a database - data types, relationships, and constraints
- basic operations, for retrieval and updates
- user-defined operations, behavior
- 3 types of data models
- High level or conceptual model
- entities,attributes, and relationships
- low-level or physical model
- record formats, indexes and access paths
- representational or implementation model
- record structures or object models
28Data Modeling
- A data model is a specification of the
information content of a system - conceptual data model describes information in
terms the users will understand - logical data model describes information in a way
that can be used to build a database - physical data model describes information in
terms of its representation in physical storage
29Schemas and Instances
- Schema is the structure of a database
- intention or meaning of the data
- data models are schemas
- table definitions are schemas
- class definitions are schemas
- Instances are the contents of a database
- extension or values of the data
- objects are instances
- objects in a database are typically rows in a
table
30Levels of database schemas
- Different schemas are presented to different users
31Data Independence
- Logical data independence
- Change in conceptual schema does not require
change in external schemas - Expand or contract database with no change to
external applications - View mappings must be changed
- Physical data independence
- Change in internal schema does require change in
conceptual schema - Reorganize the file and index structure,
especially for improved performance - Conceptual mapping must be changed
32Database Languages
- DDL, data definition language, conceptual schema
- describe conceptual schemas
- SDL, storage definition language, internal schema
- describe file structures, indexes
- VDL, view definition language, external schema
- DML, data manipulation language
- High-level or non-procedural (e.g. SQL)
- Select Last Name from Roster where Section 2
- Low-level or procedural
- For r in Roster loop if r.section 2
then result.Add ( r.lastname )
33Information Engineering
Planning
Analysis
Design
Implementation
34Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
35Stages in Database Design
- Requirements formulation and analysis
- Conceptual Design -- Conceptual Model
- Implementation Design -- Logical Model
- Physical Design --Physical Model
36Database Design Process
- Requirements formulation and analysis
- Purpose Identify and describe the data that are
used by the organization - Results Metadata identified, Data Dictionary,
Conceptual Model-- ER diagram
37Database Design Process
- Requirements Formulation and analysis
- Systems Analysis Process
- Examine all of the information sources used in
existing applications - Identify the characteristics of each data element
- numeric
- text
- date/time
- etc.
- Examine the tasks carried out using the
information - Examine results or reports created using the
information
38Database Design Process
- Conceptual Model
- Merge the collective needs of all applications
- Determine what Entities are being used
- Some object about which information is to
maintained - What are the Attributes of those entities?
- Properties or characteristics of the entity
- What attributes uniquely identify the entity
- What are the Relationships between entities
- How the entities interact with each other?
39Database Design Process
- Logical Model
- How is each entity and relationship represented
in the Data Model of the DBMS - Hierarchic?
- Network?
- Relational?
- Object-Oriented?
40Database Design Process
- Physical (AKA Internal) Model
- Choices of index file structure
- Choices of data storage formats
- Choices of disk layout
41Chapter 2, Representing Information with Data
Models
- Entity Relationship (ER) Model
- high-level, conceptual data model
- Specify conceptual schema
- conceptual database design
- Identify the data requirements of users and
detailed descriptions of data types,
relationships and constraints. - Concentrate on specifying the properties of the
data, not storage.
42An Example of ER Modeling
- Company database
- Department
- name, number, manager (employee), start date of
manager - Projects controlled by department
- name, number, single location
- Employees
- name, ssn, address, salary, sex, birthdate
- assigned to department, several projects
- Dependents of employees
43Principals of ER Modeling
- Entities and classes
- Entity, a thing in the real world
- Entity Class, the structure of a collection of
similar entities - Attributes
- Attribute, a property of an entity
- Each entity has a value for each of its
attributes - Types of attributes
- simple vs. composite, single-valued vs.
multi-valued, stored vs. derived - domains of attributes
44Relationships Between Entities
- Relationship type defines a set of associations
among given types. - Relationsip Instances are particular
relationships among objects. - Examples of relationship types in company
database - Manages 11 between employee and department
- Works-for 1N between department and employee
- Controls 1N between department and project
45Relationships, Roles, and Structural Constraints
- Roles are attributes that signify the function of
a particular entity (type) in a relationship - Employee manages department
- Department is managed by employee
- Employee works-for department
- Department has employees who work for it
- Constraints can be
- cardinality
- Each department can have no more than one manager
- participation
- Each department must have a manager
46ER schema diagram for Company
47Entity Classes for BigHit Video
48Sample Attribute Specifications
49Entity Classes, Attributes and Constraints
50Entities, instances of classes
51Relationships Between Entities
- Relationship type defines a set of associations
among given types. - Relationship Instances are particular
relationships among objects. - Examples of relationship types in company
database - Manages 11 between employee and department
- Works-for 1N between department and employee
- Controls 1N between department and project
52Relationships, Roles, and Structural Constraints
- Roles are attributes that signify the function of
a particular entity (type) in a relationship - Employee manages department
- Employee works-for department
- Constraints can be
- cardinality
- Each department can have no more than one manager
- participation
- Each department must have a manager
53Relationship Types and Instances
- Marriage relationship type
- Person related to Person
- One person has the role of wife one has the
role of husband - Relationship type may have one or more attributes
- e.g. weddingDate
- Marriage relationship (instance)
- Jane Block is married to Joe Block (relationship)
- Jane Block is the wife of Joe Block (role)
- Joe Block is the husband of Jane Block (role)
- Parent-child relationship type
- A person may have zero or more children
54Relationships are always one-to-one
- A relationship is an instance
- These pictures are sets of instances
55Find the Entities, Attributes and Relationships
56ER schema diagram for BigHit Video
57Chapter 4 Relational Model
- Structure of Relational Databases
Posted on Sun, Apr. 20, 2003 IBM database
developer dead at 79 RELATIONAL' MODEL IS
BASIS OF TODAY'S TRANSACTIONS By Lisa M.
Krieger Mercury News
58Edgar F. Codd, an IBM computer pioneer who
created the relational database model'' that
underlies a 7 billion industry of storing the
world's online business data, died of heart
failure at home Friday in Williams Island, Fla.
He was 79. Bank accounts, credit cards, stock
trading, travel reservations, online auctions and
innumerable other now-routine data transactions
all rely on Codd's model, based on highly
abstract and complex mathematical theory. Before
Codd's landmark research paper in 1970, it was
possible to store lots of information -- but
analyzing it was difficult, requiring lines and
lines of code for even simple tasks.
59His model made it possible to access large
amounts of data from small computers, giving
businesses and government agencies something they
desperately needed quick and easy access to
information. He had a vision about data that
was considered radical at the time,'' said
computer scientist Don Chamberlin, also of IBM.
Larry Ellison of Oracle used Codd's model to
build the first commercially available relational
database management system. As complex and
abstract as the math he loved, over the decades
Codd retained his British accent, his dry wit and
his love of a strong cup of tea, say family
members. Codd was the youngest of seven children
born to a leather processor and his schoolteacher
wife in the remote town of Portland, England. He
attended Oxford University on a full scholarship,
earning degrees in math and chemistry. Although
eligible for a military deferment because of his
studies, he chose to fly in the Royal Air Force.
60Codd first came to the United States in 1948, at
the age of 25. He found work with IBM as a
programming mathematician for an early
proto-computer that filled two floors of a
Manhattan office building. In 1953, Codd moved to
Canada, frustrated that no one insisted that Sen.
Joseph McCarthy produce proof of his charges that
Communists were embedded in the U.S.
government. He later returned and became a U.S.
citizen. In 1965, he earned a doctorate from the
University of Michigan in Ann Arbor. A
disappointing job rating from his supervisor in
Poughkeepsie, N.Y., spurred Codd to transfer to
IBM's Santa Teresa development laboratories in
San Jose. There he found existing data management
systems seat-of-the-pants, with no theory at
all,'' he recalled in one interview. I began
reading documentation,'' Codd said, and I was
disgusted.''
61He proposed a solution that leaned heavily on
mathematical logic the relational model. He
believed that all the information in a database
should be represented as values in the rows and
columns of tables, and that no information should
be represented by pointers or connections among
records. But support for the traditional database
system within IBM was large, powerful and
antagonistic. It was at a meeting of a high-level
IBM technical committee that the relational model
caught the attention of IBM chairman Frank Cary.
IBM subsequently announced SQL/DS, its first
relational product, in 1981. DB2, for larger MVS
machines, was announced in 1983. When he put
two and two together, he didn't think about what
they added up to, but what they meant,'' said son
Ronald Codd, 47, of Alamo. He had this natural
ability to see a situation and reach a conclusion
that was a step beyond what people would
ordinarily think
62Codd's life changed in 1983, when he suffered a
serious injury from a fall. After his recovery,
he retired from IBM and quit his beloved hobby of
recreational flying. But he continued to work
until 1999, commuting to his San Jose office at
Codd and Date Consulting Group, joined by
longtime IBM collaborator Chris Date and
mathematician Sharon Weinberg, another IBM
colleague, who after 12 years of courtship became
Codd's second wife. Edgar F. Codd Born Aug. 23,
1923, in Portland, England Died April 18, 2003,
in Williams Island, Fla.
63An Appreciation by C. J. Date
Ted Codd was a genuine computing pioneer. He was
also an inspiration to all of us who had the
fortune to know him and work with him. He began
his career in 1949 as a programming mathematician
for IBM on the Selective Sequence Electronic
Calculator. He subsequently participated in the
development of several important IBM products,
including its first commercial electronic
computer (IBM 701) and the STRETCH machine, which
led to IBM's 7090 mainframe technology. Then, in
the 1960's, he turned his attention to the
problem of managing large commercial databases
and over the next few years he created, single
handed, the invention with which his name will
forever be associated the relational model of
data.
64The relational model is widely recognized as one
of the great technical innovations of the 20th
century. Codd described it and explored its
implications in a series of research papers
staggering in their originality--which he
published throughout the period 1969-1979. The
effect of those papers was twofold They changed
for good the way the IT world (including the
academic component f that world in particular)
perceived the database management problem and
they laid the foundation for an entire new
industry, the relational database industry, now
worth many billions of dollars a year. In fact,
not only did Codd's relational model set the
entire discipline of database management on a
solid scientific footing, it also formed the
basis for a technology that has had, and
continues to have, a major impact on the very
fabric of our society. It is no exaggeration to
say that Ted Codd is the intellectual father of
the modern database field.
65Codd's supreme achievement with the relational
model should not be allowed to eclipse the fact
that he made major original contributions in
several other important areas as well, including
multiprogramming, natural language processing,
and more recently Enterprise Delta (a relational
approach to business rules management), for which
he and his wife were granted a US patent. The
depth and breadth of his contributions were
recognized by the long list of honors and elected
positions that were conferred on him during his
lifetime, including IBM Fellow elected ACM
Fellow elected Fellow of the Britain Computer
Society elected member of the National Academy
of Engineering and elected member of the
American Academy of Arts and Sciences. In 1981 he
received the ACM Turing Award, the most
prestigious award in the field of computer
science. He also received an outstanding
recognition award from IEEE the very first
annual Achievement Award from the international
DB2 Users Group and another annual achievement
award from DAMA in 2001. Computerworld, in
celebration of the 25th anniversary of its
publication, selected him as one of 25
individuals in or related to the field of
computing who have had the most effect on our
society. And Forbes magazine, which in December
2002 published a list of the most important
innovations and contributions for each of the 85
years of its existence, selected for the year
1970 the relational model of data, by E. F. Codd.
66Ted Codd was a native of England and a Royal Air
Force veteran of World War II. He moved to the
United States in 1946 and became a naturalized US
citizen. He held MA degrees in mathematics and
chemistry from Oxford University and MS and PhD
degrees in communication sciences from the
University of Michigan. He is survived by his
wife Sharon and her parents, Sol and Nora Boroff,
of Williams Island, Florida a brother David Codd
and his wife, Barbara and a sister, Katherine
Codd, all of England and a second sister, Lucy
Pickard, of Hamilton, Ontario. He also leaves
four children and their families Katherine Codd
Clark, her husband Lawrence, and their daughters,
Shannon and Allison, of Palo Alto, California
Ronald E. F. Codd, his wife Susie, and their son,
Ryan and daughter, Alexis, of Alamo, California
Frank Codd and his wife, Aydes of Castro Valley,
CA and David Codd, his wife Ileana, and their
daughter Melissa and son, Andrew, of Boca Raton,
Florida. He also leaves nieces and nephews in
England, Canada, and Australia, as well as many,
many friends and colleagues worldwide. He is
mourned and greatly missed by all.
67Example of a Relation
68Basic Structure
- Formally, given sets D1, D2, . Dn a relation r
is a subset of D1 x D2 x x DnThus a
relation is a set of n-tuples (a1, a2, , an)
where each ai ? Di - Example if
- customer-name Jones, Smith, Curry,
Lindsay customer-street Main, North,
Park customer-city Harrison, Rye,
PittsfieldThen r (Jones, Main, Harrison),
(Smith, North, Rye),
(Curry, North, Rye),
(Lindsay, Park, Pittsfield) is a relation over
customer-name x customer-street x customer-city
69Attribute Types
- Each attribute of a relation has a name
- The set of allowed values for each attribute is
called the domain of the attribute - Attribute values are (normally) required to be
atomic, that is, indivisible - E.g. multivalued attribute values are not atomic
- E.g. composite attribute values are not atomic
- The special value null is a member of every
domain - The null value causes complications in the
definition of many operations - we shall ignore the effect of null values in our
main presentation and consider their effect later
70Relation Schema
- A1, A2, , An are attributes
- R (A1, A2, , An ) is a relation schema
- E.g. Customer-schema
(customer-name, customer-street, customer-city) - r(R) is a relation on the relation schema R
- E.g. customer (Customer-schema)
71Relation Instance
- The current values (relation instance) of a
relation are specified by a table - An element t of r is a tuple, represented by a
row in a table
attributes (or columns)
customer-name
customer-street
customer-city
Jones Smith Curry Lindsay
Main North North Park
Harrison Rye Rye Pittsfield
tuples (or rows)
customer
72Relations are Unordered
- Order of tuples is irrelevant (tuples may be
stored in an arbitrary order) - E.g. account relation with unordered tuples
73Database
- A database consists of multiple relations
- Information about an enterprise is broken up into
parts, with each relation storing one part of the
information E.g. account stores
information about accounts
depositor stores information about which
customer
owns which account customer
stores information about customers - Storing all information as a single relation such
as bank(account-number, balance,
customer-name, ..)results in - repetition of information (e.g. two customers own
an account) - the need for null values (e.g. represent a
customer without an account) - Normalization theory (Chapter 7) deals with how
to design relational schemas
74The customer Relation
75The depositor Relation