CSC 485D/586D/SEng 480D Introduction - PowerPoint PPT Presentation

About This Presentation
Title:

CSC 485D/586D/SEng 480D Introduction

Description:

... over a specified field, e.g. on the name field for employees, allows fast ... One more access gets us the tuple for 'Sally' ... – PowerPoint PPT presentation

Number of Views:29
Avg rating:3.0/5.0
Slides: 18
Provided by: scie232
Category:
Tags: 480d | 485d | 586d | csc | field | introduction | sally | seng

less

Transcript and Presenter's Notes

Title: CSC 485D/586D/SEng 480D Introduction


1
CSC 485D/586D/SEng 480D Introduction
2
Whats a database?
  • In essence a database is nothing more than a
    collection of information that exists over a long
    period of time.
  • Databases are empowered by a body of knowledge
    and technology embodied in specialized software
    called a database management system, or DBMS.
  • A DBMS is a powerful tool for creating and
    managing large amounts of data efficiently and
    allowing it to persist over long periods of time,
    safely.
  • Among the most complex types of software
    available.

3
The database management system
  • Allows users to create new databases and specify
    their schema (logical structure of the data),
    using a data-definition language.
  • Gives user the ability to query the data and
    modify the data, using a query language and
    data-manipulation language.
  • Supports intelligent storage of very large
    amounts of data.
  • Protects the data from accident or not proper
    use.
  • Example We can require from the DBMS to not
    allow the insertion of two different employees
    with the same SIN.
  • Allows efficient access to the data for queries
    and modifications.
  • Example The use of indexes over a specified
    field, e.g. on the name field for employees,
    allows fast response for queries asking a
    specific name.
  • Controls access to data from many users at once
    (concurrency), without allowing bad
    interactions that can corrupt the data
    accidentally.
  • Recovers from failures and crashes.

4
Relational Model
  • Based on tables
  • Today used in most DBMSs
  • Oracle, SQL-Server, IBM DB2, Sybase, Microsoft
    Access etc
  • Challengers Object oriented DBs (ObjectStore,
    Objectivity)
  • Object-relational oo-extension of rels

5
Database Studies
  • Design of databases.
  • What kinds of information go into the database?
  • How is the information structured?
  • How do data items connect?
  • Database programming.
  • How does one express queries on the database?
  • How does one use other capabilities of a DBMS,
    such as transactions or constraints, in an
    application?
  • How is database programming combined with
    conventional programming?
  • Database system implementation.
  • How does one build a DBMS, including such matters
    as query processing, transaction processing and
    organizing storage for efficient access?

6
Fictitious Megatron 2006 DBMS
  • Stores relations as Unix files
  • Students(name, sid, dept) is stored in the file
    /home/megatron/students as
  • Smith123CS
  • Jones533EE
  • Schemas are stored in /home/megatron/schemas e.g.
  • StudentsnameSTRidINTdeptSTR
  • DeptsnameSTRofficestr

7
Megatron sample session
  • mayne megatron
  • WELCOME TO MEGATRON 2006
  • megaSQL SELECT FROM Students
  • Name id dept
  • ----------------------------------
  • Smith 123 CS
  • Johnson 522 EE
  • megaSQL

8
Megatron sample session II
  • megaSQL SELECT FROM Students
  • WHERE id gt 500 HighId.txt
  • megaSQL more HighId.txt
  • Jones522EE
  • megaSQL quit
  • THANK YOU FOR USING MEGATRON 2006
  • mayne

9
Megatron Implementation
  • To execute SELECT FROM R WHERE ltCONDgt
  • Read file schema to get attributes of R
  • Check that the ltCONDgt is semantically valid for R
  • Read file R,
  • for each line
  • check condition
  • if OK, display
  • If we pipe the result into a file, say T, then
    add an entry for T in the file /home/megatron/sche
    mas

10
Megatron Implementation II
  • To execute
  • SELECT office
  • FROM Students, Dept
  • WHERE Students.name 'Smith' AND
  • Students.dept Depts.name
  • Read file schema to get attributes and do
    semantic check.
  • If Ok, then,
  • for each tuple s in Students
  • for each tuple d in Depts
  • if s and d satisfy the WHERE condition,
  • display the office value from s

11
Whats wrong with Megatron?
  • Tuple layout on disk no flexibility for DB
    modifications.
  • Change CS to ECON and the entire file has to be
    rewritten.
  • Search Expensive no indexes always read entire
    relation.
  • Bruteforce query processing.
  • Did we need to look at all pairs of studentdept
    tuples?
  • No buffer manager everything comes off of disk
    all the time.
  • No concurrency control several users can modify
    a file at the same time with unpredictable
    results.
  • No reliability can lose data in a crash or leave
    operations half done.
  • Little security file system protection too
    coarse.

12
Architecture of a DBMS
  • The cylindrical component contains not only
    data, but also metadata, i.e. info about the
    structure of data.
  • If the DBMS is relational the metadata includes
  • names of relations,
  • names of attributes of those relations, and
  • data types for those attributes (e.g., integer or
    character string).
  • Often a database maintains indexes for the data.
  • Indexes are part of the stored data.
  • A description of which attributes have indexes is
    part of the metadata.

13
Storage and BufferManager
  • The job of the storage manager is
  • to obtain requested information from the data
    storage, and
  • to modify the information to the data storage
    when requested.
  • The buffer manager handles main memory. It
    obtains and returns blocks of data from/to the
    file manager and stores the blocks temporarily in
    main memory pages.
  • E.g. 1 block 1 page 4,000 to 16,000 bytes.  
  • One block is the smallest unit of data that is
    read/written from/to disk.

14
Query Processor
  • The query processor handles queriesmodifications
    to the data.
  • Its job is to find the best way to carry out a
    requested operation and,
  • to issue commands to the storage manager that
    will carry them out.
  • E.g. A bank has a DB with two relat.  
  • Customers (name, ssn, address),
  • Accounts (accountNo, balance, ssn)
  • Query Find the balances of all accounts of
    which Sally is the owner.
  • SELECT Accounts.balance
  • FROM Customers, Accounts
  • WHERE Customers.ssn Accounts.ssn AND
    Customers.name Sally

15
Query Processor (Cont.)
  • What this query logically says is
  • Make the Cartesian product of the tables
    specified in the FROM-clause,
  • i.e. associate each tuple of Customers with each
    tuple of Accounts.
  • We get a new temporary relation R with longer
    tuples,
  • the attributes are renamed so as to include the
    name of originating relation. (Customer.ssn etc.)
  • Chose from R only the tuples satisfying the
    condition in the WHERE clause.
  • Produce in the answer only the values of
    attributes in SELECT-clause.
  • Of course, if we would answer this query as it
    says the performance would be terrible (step 1).
  • Supp. we have an index on name of Customer and an
    index on ssn of Accounts.
  • Using the index on name of Customer we need
    usually three disk access.
  • One more access gets us the tuple for Sally
  • Similarly we need four disk accesses for finding
    the correspond. account.
  • If there are several accounts of Sally we need
    a few more accesses.

16
Transaction Manager
  • The transaction manager is responsible for the
    integrity of the system. It must assure that
  • several queries running simultaneously do not
    interfere with each other and that,
  • the system will not lose data even if there is a
    power failure.
  • The transaction manager interacts with
  • execution engine,
  • it may need to delay certain queries or
    operations in order to avoid conflicts.
  • storage manager
  • schemes for protecting the data usually involve
    storing a log of changes to the data.

17
What will be covered
  • Storage Systems
  • Physical devices and characteristics, especially
    disks.
  • Logical layout of data data structures,
    especially, indexedsequential files, Btrees,
    hashing.
  • Multidimensional indexes for GIS and OLAP
  • Query optimization (we concentrate a lot here)
  • Queryplan generation algebraic transformations.
  • Join methods.
  • Resilience
  • Logging.
  • Authorization and encryption.
  • Transaction processing Serialization, deadlocks,
    locking, timestamping.
  • Distributed DB's.
  • OLAP in detail
  • Data Integration
Write a Comment
User Comments (0)
About PowerShow.com