CSC 485E/CSC 571 Advanced Databases Introduction - PowerPoint PPT Presentation

About This Presentation
Title:

CSC 485E/CSC 571 Advanced Databases Introduction

Description:

... Megatron 2006 DBMS. Stores relations as Unix files. Students(name, sid, dept) is stored in the file /home/megatron/students ... Megatron Implementation II ... – PowerPoint PPT presentation

Number of Views:34
Avg rating:3.0/5.0
Slides: 16
Provided by: scie232
Category:

less

Transcript and Presenter's Notes

Title: CSC 485E/CSC 571 Advanced Databases Introduction


1
CSC 485E/CSC 571 Advanced Databases
Introduction
2
Whats a database?
  • In essence a database is nothing more than a
    collection of information that exists over a long
    period of time.
  • Databases are empowered by a body of knowledge
    and technology embodied in specialized software
    called a database management system, or DBMS.
  • A DBMS is a powerful tool for creating and
    managing large amounts of data efficiently and
    allowing it to persist over long periods of time,
    safely.
  • Among the most complex types of software
    available.

3
The database management system
  • Allows users to create new databases and specify
    their schema (logical structure of the data),
    using a data-definition language.
  • Enables users to query and modify the data, using
    a query language and data-manipulation language.
  • Supports intelligent storage of very large
    amounts of data.
  • Protects data from accident or not proper use.
  • Example We can require from the DBMS to not
    allow the insertion of two different employees
    with the same SIN.
  • Allows efficient access to the data for queries
    and modifications.
  • Example Indexes over a specified fields
  • Controls access to data from many users at once
    (concurrency), without allowing bad
    interactions that can corrupt the data
    accidentally.
  • Recovers from software failures and crashes.

4
Early DBMSs (1960s)
  • They encouraged the user to view the data much as
    it was stored.
  • The chief models were the Hierarchical and
    Network.
  • The main characteristic of these models was the
    possibility of easy jumping or navigating from
    one object to another through pointers.
  • E.g. From one employee to his department.
  • However these models didnt provide a high-level
    query language for the data.
  • So, one had still to write programs for querying
    the data.
  • Also they didnt allow on-line schema
    modifications.

5
Relational databases
  • Codd (1970)
  • A database system should present the user with a
    view of data organized as tables (also called
    relations).
  • Behind the scene there could be a complex data
    structure that allows rapid response to a variety
    of queries.
  • But the user would not be concerned with the
    storage structure.
  • Queries could be expressed in a very high-level
    language, which greatly increases the efficiency
    of database programmers.
  • This high-level query language for relational
    databases is called Structured Query Language
    (SQL)

6
Database Studies
  • Design of databases.
  • What kinds of information go into the database?
  • How is the information structured?
  • How do data items connect?
  • Database programming.
  • How does one express queries on the database?
  • How does one use other capabilities of a DBMS,
    such as transactions or constraints, in an
    application?
  • How is database programming combined with
    conventional programming?
  • Database system implementation.
  • How does one build a DBMS, including such matters
    as query processing, transaction processing and
    organizing storage for efficient access?

Well focus on this part
7
Fictitious Megatron 2006 DBMS
  • Stores relations as Unix files
  • Students(name, sid, dept) is stored in the file
    /home/megatron/students as
  • Smith123CS
  • Jones533EE
  • Schemas are stored in /home/megatron/schemas e.g.
  • StudentsnameSTRidINTdeptSTR
  • DeptsnameSTRofficestr

8
Megatron sample session
  • mayne megatron
  • WELCOME TO MEGATRON 2006
  • megaSQL SELECT FROM Students
  • Name id dept
  • ----------------------------------
  • Smith 123 CS
  • Johnson 522 EE
  • megaSQL

9
Megatron sample session II
  • megaSQL SELECT FROM Students
  • WHERE id gt 500
  • Johnson522EE
  • megaSQL quit
  • THANK YOU FOR USING MEGATRON 2006
  • mayne

10
Megatron Implementation
  • To execute SELECT FROM R WHERE ltCONDgt
  • Read file schema to get attributes of R
  • Check that the ltCONDgt is semantically valid for R
  • Read file R,
  • for each line
  • check condition
  • if OK, display
  • If we pipe the result into a file, say T, then
    add an entry for T in the file /home/megatron/sche
    mas

11
Megatron Implementation II
  • To execute
  • SELECT office
  • FROM Students, Dept
  • WHERE Students.name 'Smith' AND
  • Students.dept Depts.name
  • Read file schema to get attributes and do
    semantic check.
  • If Ok, then,
  • for each tuple s in Students
  • for each tuple d in Depts
  • if s and d satisfy the WHERE condition,
  • display the office value from s

12
Whats wrong with Megatron?
  • Tuple layout on disk no flexibility for DB
    modifications.
  • Change CS to ECON and the entire file has to be
    rewritten.
  • Search Expensive no indexes always read entire
    relation.
  • Bruteforce query processing.
  • Did we need to look at all pairs of studentdept
    tuples?
  • No buffer manager everything comes off of disk
    all the time.
  • No concurrency control several users can modify
    a file at the same time with unpredictable
    results.
  • No reliability can lose data in a crash or leave
    operations half done.
  • Little security file system protection too
    coarse.

13
Architecture of a DBMS
  • The cylindrical component contains not only
    data, but also metadata, i.e. info about the
    structure of data.
  • If DBMS is relational, metadata includes
  • names of relations,
  • names of attributes of those relations, and
  • data types for those attributes (e.g., integer or
    character string).
  • A database also maintains indexes for the data.
  • Indexes are part of the stored data.
  • Description of which attributes have indexes is
    part of the metadata.

14
What will be covered
  • Secondary Storage Management
  • Disks
  • Accelerated access
  • Handling disk failures
  • Arranging data on disk
  • Index Structures
  • B-Trees, Extensible Hash Tables, etc.
  • Multidimensional Indexes (for GIS and OLAP)
  • Query Execution (we concentrate a lot here)
  • Algorithms for relational operators
  • Join methods.
  • Query Compiler (we concentrate a lot here)
  • Algebraic laws for improving query plans.
  • Cost based plan selection
  • Join orders

15
What will be covered
  • Concurrency Control
  • Pessimistic schemes (locking)
  • Optimistic schemes (timestamps)
  • Parallel and Distributed Databases
  • Parallel algorithms on relations
  • Distributed query processing
  • Distributed transactions
  • Googles Map-Reduce framework
  • Peer-to-peer distributed search
  • Data Mining
  • Frequent-Itemset Mining
  • Finding similar items
  • Clustering of large-scale data
  • Databases and the Internet
  • Search engines
  • PageRank
  • Data streams
  • Data mining of streams
Write a Comment
User Comments (0)
About PowerShow.com