First 4 Weeks - PowerPoint PPT Presentation

About This Presentation
Title:

First 4 Weeks

Description:

... Language (DDL): used by DBA or database designer to define database schemas. ... Efficient algorithms for data collections that reside on disks (or which are ... – PowerPoint PPT presentation

Number of Views:80
Avg rating:3.0/5.0
Slides: 49
Provided by: Christo9
Learn more at: https://www2.cs.uh.edu
Category:
Tags: first | weeks

less

Transcript and Presenter's Notes

Title: First 4 Weeks


1
First 4 Weeks
  • Introduction to Databases
  • Course Information
  • Grading and Other Things
  • Questionnaire
  • The Relational Data Model
  • Relational Algebra / SQL Part1
  • The E/R Data Model
  • SQL Part2

Week1/2
Weeks 2-4
2
Textbooks for COSC 6340
  • Required Text Raghu Ramakrishnan and Johannes
    Gehrke, Data Management Systems, McGraw Hill,
    Third Edition, 2002
  • Other books with relevant material Ramez Elmasri
    and Shamkant Navathe, Fundamentals of Database
    Systems, Fourth Edition

3
Lectures in COSC 3480
  • Basic Concepts of Database Management (2 classes
    Chapter 1, 2.1,2.2, 2.3 instructor teaching
    material)
  • Introduction to the Relational Data Model (1.5
    classes Chapter 3.1, 3.2, 3.3, 3.4, 3.6)
  • Introduction to the Relational Algebra and SQL (3
    classes Chapter 4.2, Chapter 5)
  • Conceptual Schema Design using the Entity
    Relationship Data Model (2-3 classes instructor
    material Chapters 2.4, 2.5)
  • Relational Database Design and Normalization (2
    classes instructor material, Chapter 19)
  • Introduction to KDD and Data Warehousing (1.5
    classes instructor material, Chapters 25 and 26)
  • Disks, Files, Storage Structures, Index
    Structures and Physical Database Design (4
    classes, Chapter 8, 9, 10, 11.1, 11.2, 13, 20)
  • Spatial Data Management (1,5 classes, Chapter 28)
  • Internet Databases and XML (2.5 classes Chapter
    7 and 27)
  • Query Optimization (1 class Chapter 12, only if
    enough time left)
  • Summary Where Do We Stand? (1 class instructor
    material)

4
Other News
  • The lab will start on Th, January 26, 2006,
    830a. More details about the lab will be
    discussed next week.
  • There will be three exams in Spring 2006 Tu.,
    Feb. 28, Th., April 6, and ??, May ??.
  • All important information about the course can be
    found in the webpage associated with this course
    http//www2.cs.uh.edu/ceick/3480.html
  • Please inspect the webpage regularly.
  • The webpage is an evolving information document.
    Most of the information still refers to the Fall
    2005 teaching of the course. The webpage will be
    updated during the course of the semester!

5
UH Data Mining and Machine Learning Group
(UH-DMML) Christoph F. Eick and Ricardo Vilalta
  • http//www.tlc2.uh.edu/dmmlg
  • Goal Development of data analysis and data
    mining techniques and the application of these
    techniques to challenging problems in physics,
    geology, astronomy, environmental sciences, and
    medicine.
  • Topics investigated
  • Meta Learning
  • Classification and Learning from Examples
  • Clustering
  • Distance Function Learning
  • Using Reinforcement Learning for Data Mining
  • Spatial Data Mining
  • Knowledge Discovery

6
Databases
  • Definition A database is a collection of data
    with the following properties
  • It represents certain aspect of the real-world.
  • Its data are logically related.
  • It is created for a specific purpose.

7
DBMS
  • Definition A database management system (DBMS)
    is a set of software that are used to define,
    store, manipulate and control the data in a
    database.
  • define --- define data types, structures and
    constraints.
  • store --- store data provide efficient access.
  • manipulate --- perform retrieval and update
    operations using a query language.
  • control --- control access to data.
  • Database System Database DBMS

8
A Brief History Note
  • Database technology has a history of about 40
    years.
  • Database technology has gone through several
    generations .
  • First Generation File systems, 50's -- 60's
  • A typical file system consists of a set of
    independent files, and a number of application
    programs
  • Definition A file stores a set of record (on a
    disk drive) all of which have the same format.

9
An Example File System
  • A banking system may have
  • files for customers, saving accounts and checking
    accounts
  • application programs to deposit and withdraw
    money, to find balance, etc.
  • different files are used for customers, saving
    and checking accounts

10
Problems of File Systems (1)
  • It is difficult to support new applications. Two
    existing application programs
  • (i) find customers who have a checking
    account
  • (ii) find customers who have a saving
    account
  • Need a new program to find the customers who
    have a checking account and a saving account.

11
Problems of File Systems (2)
  • It has no centralized control of all data.
  • Files are often created for a particular
    application.
  • Files are created and managed independently.

12
Problems of File Systems (3)
  • There often exists severe data redundancy and
    inconsistency.
  • Checking-Account Acct, Owner-name,
    Owner-SSN, Owner-Addr, Balance, ...
  • Saving-Account Acct, Owner-name, Owner-SSN,
    Owner-Addr, Balance, Interest,

13
Problems of File Systems (4)
  • It lacks concurrency control.
  • Concurrency control prevent mutual
    interference of concurrent requests.
  • Example (Airplane ticket reservation) Consider
    the situation when two customers are trying to
    book the only ticket left for a flight through
    two operators at about the same time.

14
Problems of File Systems (5)
  • Weak security
  • Can not provide multiple views of the same data
  • Lack isolation between program and data
  • Lack self-describing feature

15
Database History (Continued)
  • Second Generation Hierarchical database systems
    (HDBS), late 60's -- early 70's
  • Best known HDBS IMS (Information Management
    System of IBM).
  • One-to-many relationships between parent records
    and child records which can have different types.
  • Data are organized in trees
  • Records are connected by pointers.

16
An IMS Query
  • Query find all Binghamton University students
    whose major is computer science and whose GPA is
    higher than 3.5.
  • GU University (Name Binghamton
    University')
  • Department (Name Computer
    Science')
  • Student (GPA gt 3.5)
  • L1 GNP Student (GPA gt 3.5)
  • Goto L1

17
History of Database (Continued)
  • Third Generation Network database systems
    (NDBS), late 60's -- early 70's
  • Some commercial NDBSs IDS II (Honeywell), DMS II
    (UNISYS).
  • In NDBS, record types are organized into an
    acyclic graph.
  • Main problem with HDBS and NDBS difficult to
    use.

18
History of Database (Continued)
  • Fourth Generation Relational database systems
    (RDBS), early 70's -- now
  • Example relational DBSs Oracle 7, Sybase,
    Informax, DB2, Ingres, ...
  • In RDBS, data are organized into tables
    (relations).

19
History of Database (Continued)
  • Fifth Generation Object-oriented and
    Object-Relational database systems (OODBS), 80's
    -- now
  • Example OODBSs O2, Objectivity, ObjectStore,
    Versant,
  • Example ORDBSs Oracle 8, Informix, UniSQL/X.

20
Database Languages
  • Data Definition Language (DDL) used by DBA or
    database designer to define database schemas.
  • Data Manipulation Language (DML) used by
    database users to retrieve, insert, delete and
    update data in the database.
  • Query language The part of DML that is used to
    retrieve data.
  • Data Control Language (DCL) used by database
    owners and DBA to control the access of data.

21
Persons Involving DBS (1)
  • DBMS developers Those who design and implement
    DBMS software buffer manager, query processor,
    transaction manager, interface, ...
  • Database designers Those who are responsible for
    determining
  • what data should be stored in the database
  • how data in the database should be organized
  • the design of customized views
  • the design of special data structures to improve
    the performance of the system.

22
Persons Involving DBS (2)
  • Database administrator (DBA) Those who manage
    and monitor the daily operation of a database
    system.
  • authorization for database access, e.g., who can
    access what data in what mode.
  • routine maintenance backup, install new tools,
    ...
  • modification to existing database design.

23
Persons Involving DBS (3)
  • End-users
  • Casual users those who access the database using
    SQL directly.
  • Naive users those who access the database using
    pre-prepared packages.
  • Application programmers Those who write menu
    applications for naive users, typically, through
    database calls embedded in a program.

24
After This Course, You Will Be
  • familiar with the relational data model
  • a decent database designer
  • a sophisticated casual user
  • a good application programmer
  • knowledgeable with major aspects on how to use a
    DBMS
  • knowledge in a few advanced topics with respect
    to database systems
  • This course will not teach you to become a
    database administrator
  • The implementation of DBMS will only be partially
    covered.

25
Popular Topics in Databases
  • Efficient algorithms for data collections that
    reside on disks (or which are distributed over
    multiple disk drives, multiple computers or over
    the internet).
  • Study of data models (knowledge representation,
    mappings, theoretical properties)
  • Algorithms to run a large number of transactions
    on a database in parallel finding efficient
    implementation for queries that access large
    databases database backup and recovery,
  • Database design
  • How to use database management systems as an
    application programmer / end user.
  • How to use database management systems as
    database administrator
  • How to implement database management systems
  • Data summarization, knowledge discovery, and data
    mining
  • Special purpose databases (genomic, geographical,
    internet,)

26
Review Why are integrated databases popular?
Bookkeeping Device
Integrated Database
Car Salesman
27
Review Why are integrated databases popular?
  • Avoidance of uncontrolled redundancy
  • Making knowledge accessible that would otherwise
    not be accessible
  • Standardization --- uniform representation of
    data facilitating import and export
  • Reduction of software development (though the
    availability of data management systems)
  • Support for Parallel Access and Data Security

Bookkeeping Device
Integrated Database
Car Salesman
28
Data Model
Data Model
is used to define
Schema (defines a set of database states)
Current Database State
29
Schema for the Library Example using the E/R
Data Model
author
B
Book
(0,35)
(0,1)
Many-to-Many
1-to-1
1-to Many
Many-to-1
30
Relational Schema for Library Example in SQL/92
CREATE TABLE Book (B INTEGER, title
CHAR(30), author CHAR(20), PRIMARY KEY
(B))
CREATE TABLE Person (ssn CHAR(9), name
CHAR(30), phone INTEGER, PRIMARY KEY
(ssn))
CREATE TABLE Checkout( book INTEGER,
person CHAR(9), since DATE, PRIMARY KEY
(book), FOREIGN KEY (book) REFERENCES Book,
FOREIGN KEY (person) REFERENCES Person))
31
Example Instances
  • Person(name, ssn, phone) (Eick,111111111.33345),
    (Miller, 222222222,33337)
  • Book(B,title, author) (1,Today,Yu), (2, Today,
    Yu), (7, Blue, Xu)
  • Checkout(book,person,since) (2,222222222.8/8/05),
    (7,222222222,8/8/05)

32
Referential Integrity in SQL/92
  • SQL/92 supports all 4 options on deletes and
    updates.
  • Default is NO ACTION (delete/update is
    rejected)
  • CASCADE (also delete all tuples that refer to
    deleted tuple)
  • SET NULL / SET DEFAULT (sets foreign key value
    of referencing tuple)

CREATE TABLE Enrolled (sid CHAR(20), cid
CHAR(20), grade CHAR(2), PRIMARY KEY
(sid,cid), FOREIGN KEY (sid) REFERENCES
Students ON DELETE CASCADE ON UPDATE SET
DEFAULT )
33
Example of an Internal Schemafor the Library
Example
  • INTERNAL Schema Library12 references Library.
  • Book is stored sequentially,
  • index on B using hashing,
  • index on Author using hashing.
  • Person is stored using hashing on ssn.
  • Check_out is stored sequentially,
  • index on since using B-tree.

34
Example Stored Database
Index on B Block B mod 10
Index on Author
Relation Book
1 11 51
20 30
0
W,
(1, C,W)
(20, Y,W) (51, C, B)
1
(11, Y,W) (30, Z, B)
Relation Checkout
(101,)
(200,) (500,)
0
Index on since
Relation Person Block sss mod 10
1
35
3 Schema Architecture
How users see the Data
External Schema
External Schema
External Schema
What the database contains and what constraints
hold with respect to the database
Conceptual Schema
How the data are physically stored
Internal Schema
36
Data Independence
  • Data Independence the ability to modify the
    lower level descriptions of a database without
    causing application programs to be rewritten.
  • Logical Data Independence the ability to modify
    the conceptual schema without causing application
    programs to be rewritten.
  • Physical Data Independence the ability to modify
    the internal schema without causing application
    programs to be rewritten.
  • Data independence is achieved through proper
    manipulation of the above two mappings.

37
Modern Relational DBMS
Support for Web-Interfaces, XML, and Data Exchange
Transaction Concepts capability of running
many transactions in parallel support
for backup and recovery.
Modern DBMS
Support for OO capability to store operations
Support for data- driven computing
Efficient Implementation of Queries (Query
Optimization, Join Selection Indexing
techniques)
Support for Data Mining operations
Support for OLAP and Data Warehousing
Support for special Data-types long
fields, images, html-links, DNA-sequences, spatial
information,
Support for higher level user interfaces graphica
l, natural language, form-based,
38
Disks and Files
  • DBMS stores information on (hard) disks.
  • This has major implications for DBMS design!
  • READ transfer data from disk to main memory
    (RAM).
  • WRITE transfer data from RAM to disk.
  • Both are high-cost operations, relative to
    in-memory operations, so must be planned
    carefully!

39
Why Not Store Everything in Main Memory?
  • Costs too much. 100 will buy you either 512MB
    of RAM or 50GB of disk today --- that is disk
    storage 100 times cheaper (but it is approx.
    10000 times slower).
  • Main memory is volatile. We want data to be
    saved between runs. (Obviously!)
  • Typical storage hierarchy
  • Main memory (RAM) for currently used data.
  • Disk for the main database (secondary storage).
  • Tapes for archiving older versions of the data
    (tertiary storage).

Remark All reported disk performance/prize data
are as of middle of 2003
40
Components of a Disk
Spindle
Disk head
  • The platters spin (say, 90rps).
  • The arm assembly is moved in or out to position
    a head on a desired track. Tracks under heads
    make a cylinder (imaginary!).

Sector
Platters
  • Only one head reads/writes at any one time.
  • Block size is a multiple of sector
    size (which is fixed).

41
Accessing a Disk Page
  • Time to access (read/write) a disk block
  • seek time (moving arms to position disk head on
    track)
  • rotational delay (waiting for block to rotate
    under head)
  • transfer time (actually moving data to/from disk
    surface)
  • Seek time and rotational delay dominate.
  • Seek time varies from about 1 to 20msec
  • Rotational delay varies from 0 to 10msec
  • Transfer rate is about 1msec per 32KB page

42
DBMS Support Transactions
  • Database management systems provide powerful
    transaction concepts that guarantee ACID
    properties
  • Transaction
  • Begin_Transaction
  • ltset of operations that read and modify the
    databasegt
  • End_Transaction
  • Usually 2 commands are available to terminate a
    transaction Abort and Commit

43
Review The ACID properties
  • A tomicity All actions of the transaction
    happen, or none happen.
  • C onsistency If each transaction is consistent,
    and the DB starts consistent, it ends up
    consistent.
  • I solation Execution of one transaction is
    isolated from that of other transaction s.
  • D urability If a transaction commits, its
    effects persist.
  • The Recovery Manager guarantees Atomicity
    Durability.

44
Example
  • Consider two transactions (Xacts)

T1 BEGIN AA100, BB-100 END T2 BEGIN
A1.06A, B1.06B END
  • Intuitively, the first transaction is
    transferring 100 from Bs account to As
    account. The second is crediting both accounts
    with a 6 interest payment.
  • There is no guarantee that T1 will execute before
    T2 or vice-versa, if both are submitted together.
    However, the net effect must be equivalent to
    these two transactions running serially in some
    order.

45
Atomicity of Transactions
  • A transaction might commit after completing all
    its actions, or it could abort (or be aborted by
    the DBMS) after executing some actions.
  • A very important property guaranteed by the DBMS
    for all transactions is that they are atomic.
  • DBMS logs all actions so that it can undo the
    actions of aborted transactions and redo the
    actions of successful transactions.

46
Concurrency in a DBMS
  • Users submit transactions, and can think of each
    transaction as executing by itself.
  • Concurrency is achieved by the DBMS, which
    interleaves actions (reads/writes of DB objects)
    of various transactions.
  • Each transaction must leave the database in a
    consistent state if the DB is consistent when the
    transaction begins.
  • DBMS will enforce some ICs, depending on the ICs
    declared in CREATE TABLE statements.
  • Beyond this, the DBMS does not really understand
    the semantics of the data. (e.g., it does not
    understand how the interest on a bank account is
    computed).
  • Issues Effect of interleaving transactions, and
    crashes.

47
Example (Contd.)
  • Consider a possible interleaving (schedule)

T1 AA100, BB-100 T2
A1.06A, B1.06B
  • This is OK. But what about

T1 AA100, BB-100 T2
A1.06A, B1.06B
  • The DBMSs view of the second schedule

T1 R(A), W(A), R(B), W(B) T2
R(A), W(A), R(B), W(B)
48
Summary
  • Concurrency control and recovery are among the
    most important functions provided by a DBMS.
  • Users need not worry about concurrency.
  • System automatically inserts lock/unlock requests
    and schedules actions of different transactions
    in such a way as to ensure that the resulting
    execution is equivalent to executing the
    transactions one after the other in some order.
  • Write-ahead logging (WAL) is used to undo the
    actions of aborted transactions and to restore
    the system to a consistent state after a crash.
Write a Comment
User Comments (0)
About PowerShow.com