Title: L01: Course Overview
1L01 Course Overview Review
- Course overview
- Review -- RDBMS
- Review computer networks
2The Teaching Staff
- Instructor Lu Hongjun
- Office 3543 (Lift 25-26), HKUST
- E-Mail luhj_at_cs.ust.hk
- URL http//www.cs.ust.hk/luhj
- Research Interests
- Data/Knowledge base management with emphasis on
query processing and optimization - Data warehousing and data mining
- Applied performance evaluation
- Database application development
- Parallel and distributed database systems
- TA
- Name Lou Wenwu
- Office 4212 (DB Lab) HKUST
- E-Mail wwlou_at_cs.ust.hk
- URL http//ihome.ust.hk/wwlou
3References
- Principles of Distributed Database Systems, M.
Tame Özsu Patrick Valduriez, Prentice-Hall,
1999, 2nd ed.
4Course Contents
- Distributed DBMS architecture
- Distributed database design
- Distributed query processing optimization
- Distributed transaction management
- Parallel database systems
- Other related issues
- Course Web Page http//course.cs.ust.hk/comp334/
5Grading
- Class participation/assignment
- Written assignment
- Mid-term exam
- Final exam
- Course project (50 )
- Carried in teams of two to four
- A distributed database applications on Oracle
- Submitted and graded in phases
6Summary
- It is a UG-level DB course
- Not a DBA course
- Not an introductory database course
- Not a programming course, but you need to know
how to write programs - Hopefully, you will leave with
- A good grade
- A good understanding of studied topics
7L01 Course Overview Review
- Course overview
- Review -- RDBMS
- Review computer networks
8Review -- RDBMS
- Relational database systems
- The basic concepts in database systems
- Relational data model
- Relational languages
- Database design
- Previous course conceptual and logic design
- This course physical database design
- Database management systems
- The basic components of DBMS
- Storage management
- Transaction management
- Query processing optimization
9What Is Database DBMS?
- Database a very large, integrated, persistent
collection of data. - Models real-world enterprise.
- Entities (e.g., students, courses)
- Relationships (e.g., James is taking CSIT530)
- A Database Management System (DBMS) is a software
package designed to store and manage databases.
10Data Models
- A data model is a collection of concepts for
describing - data and related operations,
- semantics of data,
- relationship among data, and
- constraints on data
- Two types of data models
- Conceptual models emphasize semantics of data
- Entity-Relationship model, Object-Oriented model
- Logical models ways how the data is organized in
the logical level - Hierarchical model, Network model, Relational
model
11Instances and Schemas
- A schema is a description of a particular
collection of data, using a given data model -
the logical structure of the database (e.g., set
of customers and accounts and the relationship
between them) - Schema Instance - the actual content of the
database at a particular point in time - Similar to types and variables in programming
languages
12Levels of Abstraction
- ANSI-SPARC three-level architecture
- Many views, single conceptual (logical) schema
and physical schema. - Views describe how users see the data.
- Conceptual schema defines logical structure
- Physical schema describes the files and indexes
used.
View
View
View
Conceptual Schema
Physical Schema
13Data Independence
- Applications insulated from how data is
structured and stored. - Ability to modify a schema definition in one
level without affecting a schema definition in
the next higher level. - The interfaces between the various levels and
components should be well defined so that changes
in some parts do not seriously influence others. - Logical data independence Protection from
changes in logical structure of data. - Physical data independence Protection from
changes in physical structure of data.
14Database Environment
Specifies enforces
Database Administrator
System administrator
Database Designer
manages
designs
Hardware
DBMS
Application Programs
use
write
Data
designs
15DBMS Related Languages
- Data Definition Language (DDL)
- Specification notation for defining the database
schema - Data storage and definition language - special
type of DDL in which the storage structure and
access methods used by the database system are
specified - Data Manipulation Language (DML)
- Language for accessing and manipulation the data
organized by the appropriate data model - Two classes of languages
- Procedural - user specifies what data is required
and how to get those data. - Nonprocedural - user specifies what data is
required without specifying how to get those data
16DBMS Related Languages
Host Language
Programming Language for DBMS Applications
DDL
Data Sublanguage
Procedural
DML
Non-Procedural
Query Language
17Evolution of Database Technology
- 1960s Hierarchical (IMS) network (CODASYL)
DBMS. - 1970s Relational data model, relational DBMS
implementation. - 1980 RDBMS rules the earth
- 1985- Advanced data models (extended-relational,
OO, deductive, etc.) - Application-oriented DBMS (spatial,
scientific, engineering, etc.). - 1990s ORDB, OLAP, Data mining, data warehousing,
multimedia databases, and
network databases.
18What is an RDBMS
- A piece of software that manages data based on
the relational model - Relational data, SQL queries
- Commercial products
- Oracle, IBM DB2, IBM Informix, Sybase, Microsoft
SQL Server - Each has 10 million lines of C/C code
- Smaller packages MySQL, PostgresSQL
19Relational Data Model
- Main concept relation
- A table with rows and columns
- Every relation has a schema
- Description of the columns, or fields
- Relational data rows in a table
- No order among the rows in a table
- The most widely used data model!
20University Database
- Conceptual schema
- Students (sid string, name string, login
string, age integer, gpareal) - Cardinality 3, degree 5 , all rows distinct
- Courses (cid string, cnamestring,
creditsinteger) - Enrolled (sidstring, cidstring, gradestring)
21Relational Languages
- Formal languages
- Relational algebra
- Relational calculus
- Commercial language SQL
- DDL (Data Definition Language)
- Create Table, Create Index, Create View
- DML (Data Manipulation Language)
- Queries
- Select
- Updates
- Insert, Delete, Update
22Creating Tables
CREATE TABLE Students (sid CHAR(20), name
CHAR(20), login CHAR(10), age INTEGER,
gpa REAL)
CREATE TABLE Enrolled (sid CHAR(20), cid
CHAR(20), grade CHAR(2))
23Primary Key Constraints
- A set of fields is a key for a relation if
- 1. Any two distinct tuples differ in some fields
of the set, and - 2. This is not true for any subset of the set.
- A superkey Condition 1 true and 2 false.
- E.g., sid is a key for Students. sid, gpa is a
superkey. - One primary key can be set per relation.
24Primary and Candidate Keys
CREATE TABLE Students (sid CHAR(20), name
CHAR(20), login CHAR(10), age INTEGER,
gpa REAL, PRIMARY KEY (sid), UNIQUE (login))
CREATE TABLE Enrolled (sid CHAR(20) cid
CHAR(20), grade CHAR(2), PRIMARY KEY
(sid,cid))
25Foreign Key Constraints
- Foreign key a set of fields in a relation
- Refers to the primary key of another relation
- Referential integrity
- No dangling references
CREATE TABLE Enrolled (sid CHAR(20), cid
CHAR(20), grade CHAR(2), PRIMARY KEY
(sid,cid), FOREIGN KEY (sid) REFERENCES
Students )
Enrolled
Students
26Integrity Constraints (ICs)
- IC condition that must be true for any db
instance - Domain constraints
- Primary constraints
- Foreign key constraints
- ICs are specified when a schema is defined.
- ICs are checked when relations are modified.
- A legal instance of a relation
- Satisfies all specified ICs
27Adding and Deleting Tuples
INSERT INTO Students (sid, name, login, age,
gpa) VALUES (53688, Smith, smith_at_ee, 18, 3.2)
DELETE FROM Students S WHERE S.name Smith
28Queries
SELECT FROM Students S WHERE S.sid 53688
29 Querying Multiple Tables
SELECT S.name, E.cid FROM Students S, Enrolled
E WHERE S.sidE.sid AND E.gradeA
Enrolled
Students
30Functional Components of DBMS
User/Application
Database Administrator
Security Control
Transaction Manager
DML Stmt.
DDL Command
Transaction Management
Query Processing Optimization
DDL Compiler
Concurrency Control
Recovery
Query Plan
Execution Engine
Query Processing
Lock Table
Log
Buffer
Index/file/record Management
Buffer Management
Storage Manager
Statistics Metadata
Indexes User data
Storage Management
31Query Optimization
- A major strength of RDBMS
- SQL queries are declarative
- Optimizer figures out how to answer them
- Re-order operations
- Pick among alternatives of one operation
- Ensure that the answer is correct!
32Transaction
- A key concept in databases
- An atomic sequence of actions (read/write)
- Brings DB from a consistent state to another
- ACID
- Atomicity
- Consistency
- Isolation
- Durability
33Concurrency Control Recovery
- Concurrency Control
- Essential for good DBMS performance
- Run several user programs concurrently
- Interleave actions of different users
- Ensure the correctness
- Users may think it is a single-user system.
- Recovery
- Essential for durability of transactions
34RDBMS Features
- Effective and efficient access
- Easier application development
- Data independence
- Data integrity and security
- Concurrent access
- Recovery from crashes
- Uniform data administration
35Summary
- DBMS used to maintain, query large datasets.
- Benefits include recovery from system crashes,
concurrent access, quick application development,
data integrity and security. - Levels of abstraction give data independence.
- A DBMS typically has a layered architecture.
- DBAs hold responsible jobs
- and are well-paid!
- DBMS RD is one of the broadest,
- most exciting areas in CS.
36L01 Course Overview Review
- Course overview
- Review -- RDBMS
- Review computer networks
37What is a Computer Network
- An interconnected collection of computers
- Autonomous, capable of communicating
- Components
- Nodes/Hosts/Servers/Sites, Switches, Routers
- Links / Wires / Channels / Circuits
- Examples
- Internet, Campus LAN, P2P Networks, MMOG, Sensor
Networks,
38Data Communication Concepts
- Data, Signal (Encoding of data), Signaling
- Channels/Links
- Capacity/Bandwidth, Simplex/Duplex
- Frames/Packets
Src, Dest, msg, packet, ack, control info
39Classifying Types of Networks
- By topology (interconnection structure)
- Star, ring, bus, meshed, irregular
- By mode of transmission
- Point-to-point (unicast), broadcast (multicast)
- By geographic distribution (scale)
- Local Area Networks, Wide Area Networks
40Network Topology - Star
- One central site
- Disadvantages
- Unreliability
- Unbalanced load
Satellite Node 2
Central Node
Satellite Node 3
Satellite Node 1
Satellite Node 4
41Network Topology - Ring
- Connected as a loop
- Usually unidirectional
- Use a control token
- Improving unreliability
- Bidirectional ring
- Central switch
42Network Topology - Bus
Node 1
Node 2
- A common shared link
- Link control schemes
- CSMA (Carrier Sense Multiple Access)
- CSMA/CD (CSMA w/ Collision Detection)
- Token
Node 4
Node 3
43Network Topology - Mesh
- Fully connected
- Many links
- High reliability
44Network Topology - Irregular
- Partially connected
- Example Internet
45Communication Schemes
- Point-to-point
- One or more (indirect) links between each pair of
nodes - Communication is always between two nodes
- Switching forwards a message to its next
immediate - Broadcast
- A common communication channel utilized by all
- Messages transmitted to all the nodes at the same
time - Multicasting (a special case of broadcasting)
- Messages transmitted to a subset of nodes
46Local Area Networks
- Covering small geographical areas
- High bandwidth communication links
- Variants of bus or ring topologies
47Wide Area Networks
- Wide geographical area
- Heterogeneous
- Slower than LANs
- More error prone
Figure taken from Kurose Book slides
48ISO/OSI Reference Architecture
- ISO - International Standards Organization
- OSI - Open Systems Interconnections
- Seven layers of protocol stack
- Nodes talk at the same layer
49Five-Layer Protocol Stack
- Application ftp, smtp, http
- Transport tcp, udp
- Network ip, routing protocols
- link ppp, ethernet
- physical bits on the wire
50Data Passing Through Layers
Figure taken from Kurose Book Slides
51Header Changes Across Layers
- Each layer takes data from above
- Adds header information to create new data unit
- Passes new data unit to layer below
source
destination
message
segment
datagram
frame
Figure taken from Kurose Book Slides
52Broadband Networks
- High capacity
- Ability of carrying multiple data streams
- Possibility of negotiating for a level of QoS
- ATM (Asynchronous Transfer Mode)
- A popular broadband network technology
53Wireless Networks
- Cellular networks in mobile computing
- A wired backbone of control stations
- Cells of star topology around control stations
- Different cells talk through control stations
- Handover required when cell boundary crossed
- Prone to disconnection, noise, low bandwidth
- Mobility causes volatility of data
- Portability may limit storage and op duration
54Internet Nuts and Bolts View
- Protocols control sending, receiving of msgs
- e.g., TCP, IP, HTTP, FTP, PPP
- Internet network of networks
- Loosely hierarchical
- Public Internet vs private intranet
- Internet standards
- RFC Request for comments
- IETF Internet Engineering Task Force
Local ISP
Regional ISP
Company network
Figure taken from Kurose Book Slides
55Internet a Service View
- Communication infrastructure
- enables distributed applications
- WWW, email, games, e-commerce, databases, voting,
file (MP3) sharing
Figure taken from Kurose Book Slides
56Summary
- Computer Networks connected computers
- Autonomous, communicating to one another
- Networks can be classified by
- Topology, communication, or geographical area
- Network protocols are built in layers
- ISO/OSI and simpler five-layer models
- Internet, cellular networks, sensor networks