L01: Course Overview - PowerPoint PPT Presentation

1 / 56
About This Presentation
Title:

L01: Course Overview

Description:

L01: Course Overview & Review -- 2. The Teaching Staff. Instructor: ... Smaller packages MySQL, PostgresSQL. H.Lu/HKUST. L01: Course Overview & Review -- 19 ... – PowerPoint PPT presentation

Number of Views:71
Avg rating:3.0/5.0
Slides: 57
Provided by: Lu47
Category:

less

Transcript and Presenter's Notes

Title: L01: Course Overview


1
L01 Course Overview Review
  • Course overview
  • Review -- RDBMS
  • Review computer networks

2
The Teaching Staff
  • Instructor Lu Hongjun
  • Office 3543 (Lift 25-26), HKUST
  • E-Mail luhj_at_cs.ust.hk
  • URL http//www.cs.ust.hk/luhj
  • Research Interests
  • Data/Knowledge base management with emphasis on
    query processing and optimization
  • Data warehousing and data mining
  • Applied performance evaluation
  • Database application development
  • Parallel and distributed database systems
  • TA
  • Name Lou Wenwu
  • Office 4212 (DB Lab) HKUST
  • E-Mail wwlou_at_cs.ust.hk
  • URL http//ihome.ust.hk/wwlou

3
References
  • Principles of Distributed Database Systems, M.
    Tame Özsu Patrick Valduriez, Prentice-Hall,
    1999, 2nd ed.

4
Course Contents
  • Distributed DBMS architecture
  • Distributed database design
  • Distributed query processing optimization
  • Distributed transaction management
  • Parallel database systems
  • Other related issues
  • Course Web Page http//course.cs.ust.hk/comp334/

5
Grading
  • Class participation/assignment
  • Written assignment
  • Mid-term exam
  • Final exam
  • Course project (50 )
  • Carried in teams of two to four
  • A distributed database applications on Oracle
  • Submitted and graded in phases

6
Summary
  • It is a UG-level DB course
  • Not a DBA course
  • Not an introductory database course
  • Not a programming course, but you need to know
    how to write programs
  • Hopefully, you will leave with
  • A good grade
  • A good understanding of studied topics

7
L01 Course Overview Review
  • Course overview
  • Review -- RDBMS
  • Review computer networks

8
Review -- RDBMS
  • Relational database systems
  • The basic concepts in database systems
  • Relational data model
  • Relational languages
  • Database design
  • Previous course conceptual and logic design
  • This course physical database design
  • Database management systems
  • The basic components of DBMS
  • Storage management
  • Transaction management
  • Query processing optimization

9
What Is Database DBMS?
  • Database a very large, integrated, persistent
    collection of data.
  • Models real-world enterprise.
  • Entities (e.g., students, courses)
  • Relationships (e.g., James is taking CSIT530)
  • A Database Management System (DBMS) is a software
    package designed to store and manage databases.

10
Data Models
  • A data model is a collection of concepts for
    describing
  • data and related operations,
  • semantics of data,
  • relationship among data, and
  • constraints on data
  • Two types of data models
  • Conceptual models emphasize semantics of data
  • Entity-Relationship model, Object-Oriented model
  • Logical models ways how the data is organized in
    the logical level
  • Hierarchical model, Network model, Relational
    model

11
Instances and Schemas
  • A schema is a description of a particular
    collection of data, using a given data model -
    the logical structure of the database (e.g., set
    of customers and accounts and the relationship
    between them)
  • Schema Instance - the actual content of the
    database at a particular point in time
  • Similar to types and variables in programming
    languages

12
Levels of Abstraction
  • ANSI-SPARC three-level architecture
  • Many views, single conceptual (logical) schema
    and physical schema.
  • Views describe how users see the data.
  • Conceptual schema defines logical structure
  • Physical schema describes the files and indexes
    used.

View
View
View
Conceptual Schema
Physical Schema
13
Data Independence
  • Applications insulated from how data is
    structured and stored.
  • Ability to modify a schema definition in one
    level without affecting a schema definition in
    the next higher level.
  • The interfaces between the various levels and
    components should be well defined so that changes
    in some parts do not seriously influence others.
  • Logical data independence Protection from
    changes in logical structure of data.
  • Physical data independence Protection from
    changes in physical structure of data.

14
Database Environment
Specifies enforces
Database Administrator
System administrator
Database Designer
manages
designs
Hardware
DBMS
Application Programs
use
write
Data
designs
15
DBMS Related Languages
  • Data Definition Language (DDL)
  • Specification notation for defining the database
    schema
  • Data storage and definition language - special
    type of DDL in which the storage structure and
    access methods used by the database system are
    specified
  • Data Manipulation Language (DML)
  • Language for accessing and manipulation the data
    organized by the appropriate data model
  • Two classes of languages
  • Procedural - user specifies what data is required
    and how to get those data.
  • Nonprocedural - user specifies what data is
    required without specifying how to get those data

16
DBMS Related Languages
Host Language
Programming Language for DBMS Applications
DDL
Data Sublanguage
Procedural
DML
Non-Procedural
Query Language
17
Evolution of Database Technology
  • 1960s Hierarchical (IMS) network (CODASYL)
    DBMS.
  • 1970s Relational data model, relational DBMS
    implementation.
  • 1980 RDBMS rules the earth
  • 1985- Advanced data models (extended-relational,
    OO, deductive, etc.)
  • Application-oriented DBMS (spatial,
    scientific, engineering, etc.).
  • 1990s ORDB, OLAP, Data mining, data warehousing,
    multimedia databases, and
    network databases.

18
What is an RDBMS
  • A piece of software that manages data based on
    the relational model
  • Relational data, SQL queries
  • Commercial products
  • Oracle, IBM DB2, IBM Informix, Sybase, Microsoft
    SQL Server
  • Each has 10 million lines of C/C code
  • Smaller packages MySQL, PostgresSQL

19
Relational Data Model
  • Main concept relation
  • A table with rows and columns
  • Every relation has a schema
  • Description of the columns, or fields
  • Relational data rows in a table
  • No order among the rows in a table
  • The most widely used data model!

20
University Database
  • Conceptual schema
  • Students (sid string, name string, login
    string, age integer, gpareal)
  • Cardinality 3, degree 5 , all rows distinct
  • Courses (cid string, cnamestring,
    creditsinteger)
  • Enrolled (sidstring, cidstring, gradestring)

21
Relational Languages
  • Formal languages
  • Relational algebra
  • Relational calculus
  • Commercial language SQL
  • DDL (Data Definition Language)
  • Create Table, Create Index, Create View
  • DML (Data Manipulation Language)
  • Queries
  • Select
  • Updates
  • Insert, Delete, Update

22
Creating Tables
CREATE TABLE Students (sid CHAR(20), name
CHAR(20), login CHAR(10), age INTEGER,
gpa REAL)
CREATE TABLE Enrolled (sid CHAR(20), cid
CHAR(20), grade CHAR(2))
23
Primary Key Constraints
  • A set of fields is a key for a relation if
  • 1. Any two distinct tuples differ in some fields
    of the set, and
  • 2. This is not true for any subset of the set.
  • A superkey Condition 1 true and 2 false.
  • E.g., sid is a key for Students. sid, gpa is a
    superkey.
  • One primary key can be set per relation.

24
Primary and Candidate Keys
CREATE TABLE Students (sid CHAR(20), name
CHAR(20), login CHAR(10), age INTEGER,
gpa REAL, PRIMARY KEY (sid), UNIQUE (login))
CREATE TABLE Enrolled (sid CHAR(20) cid
CHAR(20), grade CHAR(2), PRIMARY KEY
(sid,cid))
25
Foreign Key Constraints
  • Foreign key a set of fields in a relation
  • Refers to the primary key of another relation
  • Referential integrity
  • No dangling references

CREATE TABLE Enrolled (sid CHAR(20), cid
CHAR(20), grade CHAR(2), PRIMARY KEY
(sid,cid), FOREIGN KEY (sid) REFERENCES
Students )
Enrolled
Students
26
Integrity Constraints (ICs)
  • IC condition that must be true for any db
    instance
  • Domain constraints
  • Primary constraints
  • Foreign key constraints
  • ICs are specified when a schema is defined.
  • ICs are checked when relations are modified.
  • A legal instance of a relation
  • Satisfies all specified ICs

27
Adding and Deleting Tuples
INSERT INTO Students (sid, name, login, age,
gpa) VALUES (53688, Smith, smith_at_ee, 18, 3.2)
DELETE FROM Students S WHERE S.name Smith
28
Queries
SELECT FROM Students S WHERE S.sid 53688
29
Querying Multiple Tables
SELECT S.name, E.cid FROM Students S, Enrolled
E WHERE S.sidE.sid AND E.gradeA
Enrolled
Students
30
Functional Components of DBMS
User/Application
Database Administrator
Security Control
Transaction Manager
DML Stmt.
DDL Command
Transaction Management
Query Processing Optimization
DDL Compiler
Concurrency Control
Recovery
Query Plan
Execution Engine
Query Processing
Lock Table
Log
Buffer
Index/file/record Management
Buffer Management
Storage Manager
Statistics Metadata
Indexes User data
Storage Management
31
Query Optimization
  • A major strength of RDBMS
  • SQL queries are declarative
  • Optimizer figures out how to answer them
  • Re-order operations
  • Pick among alternatives of one operation
  • Ensure that the answer is correct!

32
Transaction
  • A key concept in databases
  • An atomic sequence of actions (read/write)
  • Brings DB from a consistent state to another
  • ACID
  • Atomicity
  • Consistency
  • Isolation
  • Durability

33
Concurrency Control Recovery
  • Concurrency Control
  • Essential for good DBMS performance
  • Run several user programs concurrently
  • Interleave actions of different users
  • Ensure the correctness
  • Users may think it is a single-user system.
  • Recovery
  • Essential for durability of transactions

34
RDBMS Features
  • Effective and efficient access
  • Easier application development
  • Data independence
  • Data integrity and security
  • Concurrent access
  • Recovery from crashes
  • Uniform data administration

35
Summary
  • DBMS used to maintain, query large datasets.
  • Benefits include recovery from system crashes,
    concurrent access, quick application development,
    data integrity and security.
  • Levels of abstraction give data independence.
  • A DBMS typically has a layered architecture.
  • DBAs hold responsible jobs
  • and are well-paid!
  • DBMS RD is one of the broadest,
  • most exciting areas in CS.

36
L01 Course Overview Review
  • Course overview
  • Review -- RDBMS
  • Review computer networks

37
What is a Computer Network
  • An interconnected collection of computers
  • Autonomous, capable of communicating
  • Components
  • Nodes/Hosts/Servers/Sites, Switches, Routers
  • Links / Wires / Channels / Circuits
  • Examples
  • Internet, Campus LAN, P2P Networks, MMOG, Sensor
    Networks,

38
Data Communication Concepts
  • Data, Signal (Encoding of data), Signaling
  • Channels/Links
  • Capacity/Bandwidth, Simplex/Duplex
  • Frames/Packets

Src, Dest, msg, packet, ack, control info
39
Classifying Types of Networks
  • By topology (interconnection structure)
  • Star, ring, bus, meshed, irregular
  • By mode of transmission
  • Point-to-point (unicast), broadcast (multicast)
  • By geographic distribution (scale)
  • Local Area Networks, Wide Area Networks

40
Network Topology - Star
  • One central site
  • Disadvantages
  • Unreliability
  • Unbalanced load

Satellite Node 2
Central Node
Satellite Node 3
Satellite Node 1
Satellite Node 4
41
Network Topology - Ring
  • Connected as a loop
  • Usually unidirectional
  • Use a control token
  • Improving unreliability
  • Bidirectional ring
  • Central switch

42
Network Topology - Bus
Node 1
Node 2
  • A common shared link
  • Link control schemes
  • CSMA (Carrier Sense Multiple Access)
  • CSMA/CD (CSMA w/ Collision Detection)
  • Token

Node 4
Node 3
43
Network Topology - Mesh
  • Fully connected
  • Many links
  • High reliability

44
Network Topology - Irregular
  • Partially connected
  • Example Internet

45
Communication Schemes
  • Point-to-point
  • One or more (indirect) links between each pair of
    nodes
  • Communication is always between two nodes
  • Switching forwards a message to its next
    immediate
  • Broadcast
  • A common communication channel utilized by all
  • Messages transmitted to all the nodes at the same
    time
  • Multicasting (a special case of broadcasting)
  • Messages transmitted to a subset of nodes

46
Local Area Networks
  • Covering small geographical areas
  • High bandwidth communication links
  • Variants of bus or ring topologies

47
Wide Area Networks
  • Wide geographical area
  • Heterogeneous
  • Slower than LANs
  • More error prone

Figure taken from Kurose Book slides
48
ISO/OSI Reference Architecture
  • ISO - International Standards Organization
  • OSI - Open Systems Interconnections
  • Seven layers of protocol stack
  • Nodes talk at the same layer

49
Five-Layer Protocol Stack
  • Application ftp, smtp, http
  • Transport tcp, udp
  • Network ip, routing protocols
  • link ppp, ethernet
  • physical bits on the wire

50
Data Passing Through Layers
Figure taken from Kurose Book Slides
51
Header Changes Across Layers
  • Each layer takes data from above
  • Adds header information to create new data unit
  • Passes new data unit to layer below

source
destination
message
segment
datagram
frame
Figure taken from Kurose Book Slides
52
Broadband Networks
  • High capacity
  • Ability of carrying multiple data streams
  • Possibility of negotiating for a level of QoS
  • ATM (Asynchronous Transfer Mode)
  • A popular broadband network technology

53
Wireless Networks
  • Cellular networks in mobile computing
  • A wired backbone of control stations
  • Cells of star topology around control stations
  • Different cells talk through control stations
  • Handover required when cell boundary crossed
  • Prone to disconnection, noise, low bandwidth
  • Mobility causes volatility of data
  • Portability may limit storage and op duration

54
Internet Nuts and Bolts View
  • Protocols control sending, receiving of msgs
  • e.g., TCP, IP, HTTP, FTP, PPP
  • Internet network of networks
  • Loosely hierarchical
  • Public Internet vs private intranet
  • Internet standards
  • RFC Request for comments
  • IETF Internet Engineering Task Force

Local ISP
Regional ISP
Company network
Figure taken from Kurose Book Slides
55
Internet a Service View
  • Communication infrastructure
  • enables distributed applications
  • WWW, email, games, e-commerce, databases, voting,
    file (MP3) sharing

Figure taken from Kurose Book Slides
56
Summary
  • Computer Networks connected computers
  • Autonomous, communicating to one another
  • Networks can be classified by
  • Topology, communication, or geographical area
  • Network protocols are built in layers
  • ISO/OSI and simpler five-layer models
  • Internet, cellular networks, sensor networks
Write a Comment
User Comments (0)
About PowerShow.com