CS157B - Fall 2004 - PowerPoint PPT Presentation

1 / 163
About This Presentation
Title:

CS157B - Fall 2004

Description:

Title: Slide 1 Last modified by: Fain Butt Document presentation format: Custom Other titles: Times New Roman Arial WingDings LotusWP Type Default Design Slide 1 ... – PowerPoint PPT presentation

Number of Views:114
Avg rating:3.0/5.0
Slides: 164
Provided by: csSjsuEd2
Category:

less

Transcript and Presenter's Notes

Title: CS157B - Fall 2004


1
CS157B - Fall 2004
2
Entity-Relationship Data Model
  • Entity Sets
  • a collection of similar entities
  • Attributes
  • properties of the entities in an entity set
  • Relationships
  • connection among two or more entity sets
  • Constraints
  • Keys uniquely identify an entity within its
    entity set
  • Single-value constraints requirement that the
    value in a certain context be unique
  • Referential integrity constraints existence
    test
  • Domain constraints range test
  • General constraints data set constraints

3
E/R - Cont.
  • Constraints Are of the Schema
  • Should be part of the DB design
  • Must be carefully constructed
  • Can be troublesome if major redesign is required
  • Subclasses
  • Special "isa" relationship
  • Similar to OO's definition
  • Weak Entity Sets
  • Some of its key may be composed of attributes
    which belong to another entity set
  • Need additional qualifier

4
The Relational Data Model
  • Relation
  • Represented by a two-demensional table
  • Attribute
  • Represented as columns in a table
  • Schema
  • The name of a relation and the set of attributes
    of this relation
  • A database schema consists of one of more
    relation schemas
  • Tuple
  • A row in the table
  • Domain
  • Each attribute of a relation is an elementary
    type, thus each tuple must be atomic

5
Relational - Cont
  • Converting Subclass Structures to Relations
  • E/R-Style Conversion
  • Object-Oriented Approach
  • Null Values
  • Functional Dependencies (FD)
  • An unique-value constraint in a relational schema
    design
  • Keys of Relations
  • A(1), ..., A(n) is a key of relation R if
  • no distinct tuples agree on all of the ...
  • no other subset of determines all other
    attributes of R, must be minimal
  • If more than one key, then designate one of the
    keys as the primary key
  • Superkey Composed by a set of attributes

6
Relational - Cont
  • Design of Relational Database Schemas
  • Anomalies
  • Redundancy Many repeated information in each
    tuple
  • Update Anomalies One tuple update does not
    cause other tuples to be updated
  • Deletion Anomalies Lose other information as a
    side effect when a set of values becomes empty
  • Decomposing Relations
  • Splitting the attributes of R to make the schemas
    of two new relations
  • Boyce-Codd Normal Form (BCNF)
  • The left side of every nontrivial FD must be a
    superkey
  • Recursively decompose into smaller sets of tables
    until they all satisfy the BCNF rule
  • Must be able to join all set of tables into the
    original tuple

7
Relational - Cont
  • Third Nomal Form (3NF)
  • A relation R is in 3NF if whenever A(1)...A(n)
    -gtB is a nontrivial FD, either A(1)...A(n) is a
    superkey, or B is a member of some key
  • May allow minimal redundancy in the end
  • Multivalued Dependencies (MVD)
  • A statement that two sets of attributes in a
    relation have sets of values that appear in all
    possible combinations
  • Example of MVD in p.118 figure 3.29
  • No BCNF violation
  • Fourth Normal Form (4NF)
  • A relation R is in 4NF if whenever
    A(a)..A(n)-gt-gtB(1)..B(m) is a nontrivial MVD,
    A(1)...A(n) is a superkey
  • Can be decomposed similar to the BCNF
    decomposition algorithm

8
Other Data Models
  • Review of Object-Oriented Concepts
  • Type System
  • Record structures
  • Collection types
  • Reference types
  • Classes and Objects
  • Contain attributes and methods
  • Object Identity
  • OID
  • Methods
  • Functions within a class
  • Hierarchies
  • Superclass and subclass relationship
  • Introduction to ODL
  • Object-Oriented Design
  • Three main properties attributes, relationships,
    and methods

9
Other Data Models - Cont
  • Class Declarations
  • e.g. class ltnamegt ltlist of propertiesgt
  • Attributes in ODL
  • can be simple or complex type
  • Relationships in ODL
  • e.g. in Movie relationship SetltStargt stars
  • Inverse Relationships
  • e.g. in Star relationship SetltMoviegt starredIn
  • Methods in ODL
  • signatures
  • Types in ODL
  • Atomic types
  • integer
  • float
  • character
  • character string
  • boolean
  • enumerations

10
Other Data Models - Cont
  • Structure types
  • e.g. class name
  • Structures - similar to a tuple of values
  • Collection types
  • Set - distinct unordered elements
  • Bag - unordered elements
  • List - ordered elements
  • Array - e.g. Arrayltchar,10gt
  • Dictionary - e.g. Dictionary(Keytype, Rangetype)
  • Additional ODL Concepts
  • Multiway Relationships in ODL
  • ODL supports only binary relationships
  • Use several binary, many-one relationship instead
  • Subclasses in ODL
  • Inherits all the properties of its superclass

11
Other Data Models - Cont
  • Multiple Inheritance in ODL
  • Needs to resolve name conflicts from multiple
    superclass
  • Keys in ODL
  • Optional because of the existence of an OID
  • Can be declared with one or more attributes
  • ODL Designs to Relational Designs
  • Many issues
  • Possibly no key in ODL
  • Relational can't handle structure and collection
    types directly
  • Convert any ODL type constructor can lead to a
    BCNF violation
  • Representing ODL Relationships
  • one relation for each inverse pairs

12
Other Data Models - Cont
  • Object-Relational Model
  • O/R Features
  • Structured types for attributes
  • Methods
  • Identifiers for tuples (like OID)
  • References
  • Compromise Between Pure OO and Relational
  • Semistructured Data
  • "Schemaless" - the schema is attached to the data
    itself
  • Represented in a collection of nodes
  • Interior node describes the data with a label
  • Leaf node contains the data of any atomic type
  • Legacy-database problem

13
Relational Algebra
  • Algebra of Relational Operations
  • Bags rather than sets can be more efficient
    depending on the operation such as an union of
    two relations which contain duplicates
  • Components
  • Variables that stand for relations and Constants
    which are finite relations
  • Expressions of relational algebra are referred to
    as queries
  • Set Operations on Relations
  • R u S, the union of R and S (distinct)
  • R n S, the intersection of R and S (distinct)
  • R - S, the difference of R and S, is the set of
    elements that are in R but not in S different
    from S - R
  • Tuples must be in the same order and attribute
    types must be the same

14
Relational Algebra - Cont
  • Projection
  • The projection operator is used to produce from a
    relation R a new relation that has only some of
    R's columns
  • e.g. Views
  • Selection
  • The selection operator produces a new relation R
    with a subset of R's tuples the tuples in the
    resulting relation are those that satisfy some
    condition C that involves the attributes of R
  • e.g. SQL select statement
  • Cartesian Product
  • A cross-product of two sets R and S
  • e.g. R x S if R has 2 tuples and S has 3
    tuples, the result has 6 tuples
  • Natural Joins
  • Use matching attributes

15
Relational Algebra - Cont
  • Theta-Joins
  • Join two relations with a condition denoted by
    'C'
  • Combining Operations to Form Queries
  • Multiple queries can be combined into a complex
    query
  • e.g. AND, OR, (), ...
  • Renaming
  • Control the names of the attributes used for
    relations that are constructed by applying
    relational-algebra operations
  • Dependent and Independent Operations
  • Some relational expression can be "rewritten" in
    a different expression
  • e.g. R n S R - (R - S)
  • e.g. Glass half-empty or half-full?

16
Relational Algebra - Cont
  • Extended Operators of Relational Algebra
  • Duplicate Elimination
  • This operator converts a bag to a set
  • e.g. SQL keyword DISTINCT
  • Aggregation Operators
  • SUM(), AVG(), MIN(), MAX(), COUNT()
  • Grouping
  • This operator allows us to group a relation
    and/or aggregate some columns
  • Extended Projection
  • This allows expressions as part of an attribute
    list
  • Sorting Operator
  • This operator is anomalous, in that it is the
    only operator in our relational algrebra whose
    result is a list of tuples, rather than a set
  • Outerjoin
  • Takes care of the dangling tuples denote with
    special "null" symbol

17
SQL
  • Simple Queries in SQL
  • select A(...) from R where C
  • Projection Selection in SQL
  • select title, length from Movie where studioName
    'Disney' and year 1990
  • String Comparison
  • Bit Strings (bit data)
  • e.g. B'011' or X'7ff'
  • Lexicographic order
  • e.g. 'A' lt 'B' 'a' lt 'b' 'a' gt 'B'?
  • Depends on encoding scheme or standard
  • (e.g. ASCII, UTF8)
  • LIKE keyword
  • "s like p" denotes where s is a string and p is a
    pattern
  • Special character
  • Dates and Times
  • DATE '2002-02-04'
  • TIME '190030.5'

18
SQL - Cont
  • Null Values
  • 3 cases unknown, inapplicable, withheld
  • NOT a constant
  • Test expression for IS NULL
  • Truth-Value "unknown"
  • Pitfalls regarding nulls
  • e.g. select from Movie
  • where length lt 120 or length gt 120
  • Ordering the Output
  • ORDER BY ltlist of attributesgt
  • Can be ASC or DESC, default is ASC
  • Queries with gt 1 Relation
  • e.g. select name from Movie, MovieExec where
    title 'Star Wars' and producerC cert
  • Disambiguating Attributes
  • select MovieStar.name, MovieExec.name from
    MovieStar, MovieExec where MovieStar.address
    MovieExec.address

19
SQL - Cont
  • UNION, INTERSECT, EXCEPT Keywords
  • Same logic as the set operators of u, n, and -
  • Subqueries
  • Can return a single constant or relations in the
    WHERE clause
  • Can have relations appear in the FROM clauses
  • Scalar Value
  • An atomic value that can appear as one component
    of a tuple (e.g. constant, attribute)
  • e.g. select name from MovieExec where
    cert(select producerC from Movie where
    title'Star Wars')
  • Conditions Involving Relations
  • If R is a relation, then EXISTS R is a condition
    that is true if R is not empty

20
SQL - Cont
  • s IN R is true if s is equal to one of the values
    in R s NOT IN R is true if s is equal to no
    value in R
  • s gt ALL R is true if s is greater than every
    value in unary relation R
  • s gt ANY R is true if s is greater than at least
    one value in unary relation R
  • EXISTS, ALL, ANY operators can be negated by
    putting NOT in front of the entire expression
  • Subqueries in FROM Clauses
  • Can substitute a R in the FROM clause with a
    subquery
  • e.g. select name from MovieExec, (select
    producerC from Movie, StarsIn where title
    movieTitle and year movieYear and starname
    'Harrison Ford'

21
SQL - Cont
  • Cross Joins
  • Known as Cartesian product or just product
  • e.g. Movie CROSS JOIN StarsIn
  • Natural Joins
  • The join condition is that all pairs of
    attributes from the two relations having a common
    name are equated, and no other conditions
  • One of each pair of equated attributes is
    projected out
  • Outerjoins
  • e.g. MovieStar NATURAL FULL OUTER JOIN MovieExec
  • Full-Relation Operations
  • Eliminating Duplicates
  • Use the DISTINCT keyword in SELECT
  • Performance consideration

22
SQL - Cont
  • Duplicates in U, I, and D
  • By default, UID operations convert bags to sets
  • Use keyword ALL after UNION, INTERSECT EXCEPT
    keywords to prevent the elimination of duplicates
  • Grouping and Aggregation in SQL
  • Use the special GROUP BY clause
  • Aggregation Operators
  • SUM, AVG, MIN, MAX are used by applying them to a
    scalar-valued expression, typically a column
    name, in a SELECT clause
  • COUNT() is used to counts all the tuples in R
    that is constructed from the FROM clause and
    WHERE clause of the query
  • HAVING Clause
  • Use in conjunction with GROUP BY to narrow the
    aggregated list

23
SQL - Cont
  • Database Modifications (DML)
  • Insertion
  • Insert into R(A1...An) values (V1,...Vn)
  • e.g. insert into Studio(name) values('S1')
  • Can insert multiple tuples with subquery
  • e.g. insert into Studio(name) select distinct
    studioName from Movie where studioName not in
    (select name from studio)
  • Deletion
  • Delete from R where ltconditiongt
  • Can delete multiple tuples with 1 delete
    statement depending on ltconditiongt
  • Updates
  • Update R set ltnew-value assignmentgt where
    ltconditiongt
  • Can update multiple tuples with 1 update
    statement depending on ltconditiongt

24
SQL - Cont
  • DDL in SQL
  • Data Types
  • Character strings, fixed or variable length
  • Bit strings, fixed or variable length
  • Boolean true, false, unknown
  • Integer or int shortint
  • Floating-point numbers
  • e.g. decimal(n,d) where n is total number of
    digits with d is the decimal point from the
    right
  • 1234.56 can be described as decimal(6,2)
  • Dates and times can be represented by the data
    types DATE and TIME respectively
  • Table Declarations
  • Use the keywords CREATE TABLE followed by the R
    name and list of As and their types
  • e.g. create table MovieStar(name char(30),
    address varchar(256), gender char(1), birthday
    DATE)

25
SQL - Cont
  • Modifying Relation Schemas
  • Drop table MovieStar
  • Alter table MovieStar add phone char(16)
  • Alter table MovieStar drop birthdate
  • Default Values
  • Use the DEFAULT keyword to set default values for
    a column
  • e.g. alter table MovieStar add phone char(16)
    default 'unlisted'
  • Indexes
  • Allow faster access to data
  • e.g. create index YearIndex on Movie(year)
  • Can be one or more attributes
  • e.g. create index KeyIndex on Movie(title,year)
  • Delete the index using drop index statement
  • Selection of Indexes
  • Selection vs IUD performance

26
SQL - Cont
  • View Definitions
  • View does not contains any physical data
  • "virtual relation"
  • Declaring Views
  • Create view ltview-namegt as ltview-definitiongt
  • Querying Views
  • Use the normal select syntax with a view name in
    place of the table name
  • Renaming Attributes
  • Can map table attribute name from the base table
    to a new name in a view definition
  • Modifying Views
  • Updatable views are useful in special cases
    selective IUDs

27
Constraints Triggers
  • Keys Foreign Keys
  • Primary Keys
  • Each relation can have only one primary key
  • Primary key attribute(s) can not be NULL
  • Two ways to specify the primary key
  • 1) create table MovieStar(
  • name char(30) primary key,
  • address varchar(255),
  • gender char(1),
  • birthdate date)
  • 2) create table MovieStar(
  • name char(30),
  • address varchar(255),
  • gender char(1),
  • birthdate date,
  • primary key(name, birthday))
  • Unique Keys
  • Each relation can have gt1 unique keys
  • Declared the same way as primary key

28
Constraints Triggers - Cont
  • Enforcing Key Constraints
  • During insertion or update to the relation
  • Foreign-Key
  • The "referenced" attribute(s) must be declared
    unique or the primary key for their relation it
    must not have a NULL value
  • create table Studio(
  • name char(30) primary key,
  • address varchar(255),
  • presC int references MovieExec(cert)
  • )
  • create table Studio(
  • name char(30) primary key,
  • address varchar(255),
  • presC int,
  • foreign key (presC) references
    MovieExec(cert)
  • )

29
Constraints Triggers - Cont
  • Maintaining Referential Integrity
  • Reject Violating Modifications (Default)
  • Insert or update Studio tuple whose presC value
    is not NULL and is not the cert component of any
    MovieExec tuple
  • Delete a MovieExec tuple and its cert component
    appears as the presC component of one or more
    Studio tuples
  • Update a MovieExec tuple cert value but the old
    cert is the value of presC of some movie studio
    in Studio
  • Cascade Policy
  • When deleting the MovieExec tuple for the
    president of a studio, then it will delete the
    referencing tuple from Studio
  • By changing the cert value for a MovieExec tuple
    from c1 to c2 and there was some Studio tuple
    with c1 as the value of its presC component,
    then it will update this presC component to have
    the value c2

30
Constraints Triggers - Cont
  • Set-Null Policy
  • Can handle the delete and update problem by
    setting the presC to NULL
  • e.g. create table Studio (
  • name char(30) primary key,
  • address varchar(255),
  • presC int references MovieExec(cert)
  • on delete set null
  • on update cascade
  • )
  • Deferring Checking of Constraints
  • Do selective insert to default the presC to null
  • Insert tuple into MovieExec with new cert
  • Update the Studio tuple with matching presC
  • Use keyword DEFERRABLE and DEFERRED to delay the
    checking until the whole tranaction is
    "committed"
  • Reverse the DEFERRED case with keyword IMMEDIATE

31
Constraints Triggers - Cont
  • Constraints on Attributes and Tuples
  • Not-Null Constraints
  • Use the NOT NULL keywords in create table
    statement for any attribute
  • Attribute-Based Constraints
  • Use the CHECK keyword in create table statement
  • Limit the value for an attribute
  • e.g. gender char(1) check (gender in ('F','M'))
  • Tuple-Based Constraints
  • Use the CHECK keyword in create table statement
  • Can compose of complex expression of multiple
    attributes
  • Constraints Modification
  • Naming Constraints
  • In order to change, it must have a name
  • Use the CONSTRAINT keyword

32
Constraints Triggers - Cont
  • Altering Constraints on Tables
  • Can use ALTER TABLE to add or drop a constraint
  • Can use SET CONSTRAINTS to set it for deferred or
    immediate
  • Schema-Level Constraints and Triggers
  • Assertions (General Constraint)
  • A boolean-valued SQL expression that must be true
    at all times
  • create assertion ltnamegt check (ltconditiongt)
  • e.g.
  • create assertion RichPres check (not exists
    (select from Studio, MovieExec where presC
    cert AND netWorth lt 10000000))
  • Event-Condition-Action Rules (ECA Rules)
  • Triggers are awakened by certain events
  • The "action" will be preform only if C true

33
Constraints Triggers - Cont
  • Triggers in SQL
  • create trigger NetWorthTrigger
  • after update of netWorth ON MovieExec
  • referencing
  • old row as OTuple,
  • new row as NTuple
  • for each row
  • when (OTuple.netWorth gt NTuple.netWorth)
  • update MovieExec
  • set netWorth OTuple.netWorth
  • where cert NTuple.cert
  • Default is "for each statement"
  • Besides update, can use insert and delete
  • Action can be "before" or "after" the event
  • Use BEGIN...END for multiple statements
  • Instead-Of Triggers
  • Not part of SQL-99
  • Replace event with new defined operations
  • Very powerful when used on a view

34
System Aspects of SQL
  • SQL Programming Environment
  • Host language Embedded SQL
  • v
  • Preprocessor
  • v
  • Host language Function calls
  • v
  • Host-language compiler lt SQL Library
  • v
  • Object-code program
  • Impedance Mismatch Problem
  • Different data model between SQL statements and
    programming langauges
  • SQL/Host Language Interface
  • Use EXEC SQL keywords in front of an SQL
    statement
  • Use shared (host) variables for SQL stmt
  • Check SQLSTATE for SQL errors

35
System Aspects of SQL - Cont
  • The DECLARE Section and Its Usage
  • Shared variables are declared between two
    embedded SQL statements.
  • e.g.
  • EXEC SQL BEGIN DECLARE SECTION
  • char studioName50, studioAddr256
  • char SQLSTATE6
  • EXEC SQL END DECLARE SECTION
  • A shared variable can be used within the SQL
    statement by placing a colon in front it.
  • e.g.
  • EXEC SQL INSERT INTO
  • Studio(name, address)
  • VALUES (studioName, studioAddr)
  • Single-Row Select Statement
  • e.g.
  • EXEC SQL SELECT netWorth
  • INTO presNetWorth
  • FROM Studio, MovieExec
  • WHERE presC cert AND
  • Studio.name studioName

36
System Aspects of SQL - Cont
  • Cursors
  • Allow programs to "fetch" multiple rows from a
    relation
  • Here are the steps for using a cursor
  • EXEC SQL DECLARE ltcursorgt CURSOR FOR ltquerygt
  • EXEC SQL OPEN ltcursorgt
  • EXEC SQL FETCH FROM ltcursorgt INTO
    ltlist-of-variablesgt
  • If SQLSTATE is "02000", then goto close ltcursorgt
    otherwise fetch next row
  • EXEC SQL CLOSE ltcursorgt
  • Row Modification with Cursor
  • Use the WHERE CURRENT OF keywords
  • e.g.
  • EXEC SQL DELETE FROM MovieExec
  • WHERE CURRENT OF execCursor
  • EXEC SQL UPDATE MovieExec
  • SET netWorth 2 netWorth
  • WHERE CURRENT OF execCursor

37
System Aspects of SQL - Cont
  • Concurrent Update of Tuple
  • Use keywords INSENSITIVE CURSOR to ignore new
    changes which may affect the current cursor
  • Use Keywords FOR READ ONLY to signal that this
    cursor does not allow any modification
  • Scrollable Cursors
  • Allow a set of movements within a cursor
  • Dynamic SQL
  • Flexibility to enter SQL statement at run time
  • Use EXEC SQL EXECUTE IMMEDIATE or
  • ( EXEC SQL PREPARE ... and
  • EXEC SQL EXECUTE ... )
  • e.g.
  • EXEC SQL BEGIN DECLARE SECTION
  • char query
  • EXEC SQL END DECLARE SECTION
  • / Allocate memory pointed to by query
  • and fill in the SQL statement /
  • EXEC SQL EXECUTE IMMEDIATE query

38
System Aspects of SQL - Cont
  • Procedures Stored in the Schema
  • Persistent Stored Modules (PSM)
  • Can build module to handle complex computations
    which cannot be expressed using SQL
  • PSM Functions Procedures
  • CREATE PROCEDURE ltnamegt (ltparamgt)
  • local declarations
  • procedure body
  • Procedure parameter can be input-only,
    output-only, or both
  • CREATE FUNCTION ltnamegt (ltparamgt) RETURNS lttypegt
  • local declarations
  • function body
  • Function parameter can only be input as PSM
    forbids side-effects in functions

39
System Aspects of SQL - Cont
  • Statements in PSM
  • Call statement
  • CALL ltproc namegt (ltarg listgt)
  • e.g. EXEC SQL CALL Foo(x, 3)
  • RETURN ltexpressiongt
  • DECLARE ltnamegt lttypegt
  • SET ltvariablegt ltexpressiongt
  • BEGIN ... END
  • IF ltconditiongt THEN
  • ltstatement listgt
  • ELSEIF ltconditiongt THEN
  • ltstatement listgt
  • ELSEIF
  • ...
  • ELSE ltstatement listgt
  • END IF
  • SELECT ltattrgt INTO ltvargt FROM lttablegt
  • WHERE ltconditiongt
  • LOOP ltstatement listgt END LOOP

40
System Aspects of SQL - Cont
  • FOR ltloop namegt AS ltcursor namegt CURSOR FOR
  • ltquerygt
  • DO
  • ltstatement listgt
  • END FOR
  • Support WHILE and REPEAT loops
  • Exception Handler in PSM
  • DECLARE ltwhere to gogt HANDLER FOR ltcondition
    listgt ltstatementgt
  • ltwhere to gogt can be
  • CONTINUE - executing the handler statement and
    then execute the next statement after the one
    which cause the exception
  • EXIT - execute the handler statement and then
    control leaves the BEGIN...END block in which the
    handler is declared
  • UNDO - same as EXIT except that any changes to
    the DB or local variables that were made by the
    statements of the block are "undone"

41
System Aspects of SQL - Cont
  • SQL Environment
  • Schema
  • A collection of tables, views, assertions,
    triggers, PSM modules, etc
  • CREATE SCHEMA ltnamegt ltdeclarationsgt
  • Use SET SCHEMA to change schema name
  • Catalog
  • A collection of schemas
  • CREATE CATALOG ltcatalog namegt
  • Use SET CATALOG to change the current catalog
  • Cluster
  • A collection of catalogs
  • Can be view as a set of all catalogs accessible
    to a user
  • Client/Server
  • Both client and server can be on the different or
    the same machine

42
System Aspects of SQL - Cont
  • Connection
  • CONNECT TO ltserver namegt AS ltconnection namegt
    AUTHORIZATION ltname and passwordgt
  • SET CONNECTION ltnamegt
  • DISCONNECT ltnamegt
  • Call-Level Interface (CLI)
  • In C, each CLI program must include sqlcli.h
    where it contains all the function, structure,
    constant, and type definitions
  • 4 kinds of records SQLHENV, SQLHDBC, SQLHSTMT,
    and SQLHDESC.
  • Use SQLAllocHandle(hType, hIn, hOut)
  • Processing Statements
  • Use SQLPrepare(sh, st, sl) SQLExecute(sh)
  • or use SQLExecDirect(sh, st, sl)
  • Use SQLFetch(sh) from a query result

43
System Aspects of SQL - Cont
  • Use SQLBindCol(sh, colNo, colType, pVar, varSize,
    varInfo) for column binding
  • Can use SQLGetData(...) in place of
    SQLBindCol(...) to extract data from a query
  • Passing Parameters to Query
  • e.g.
  • SQLPrepare(myStmt, "INSERT INTO Studio(name,
    address) VALUES (?, ?)", SQL_NTS)
  • SQLBindParameter(myStmt, 1, ..., studioName,
    ...)
  • SQLBindParameter(myStmt, 2, ..., studioAddr,
    ...)
  • SQLExecute(myStmt)
  • Transactions in SQL
  • Serializability
  • Multiple selects followed by multiple updates to
    the same tuple e.g. chooseSeat()
  • Use locks to handle this problem

44
System Aspects of SQL - Cont
  • Atomicity
  • Single user transaction may have multiple updates
    to different tables e.g. transfer from account A
    to account B
  • Only "commit" after all the changes are made
  • Transaction
  • A collection of one or more operations on the
    database that must be executed atomically
  • Use START TRANSACTION to begin
  • Use SQL COMMIT to commit
  • Use SQL ROLLBACK to abort and undo the changes
    prior to the start of the transaction
  • Can set the transaction to READ ONLY
  • Dirty Reads (Uncommitted Reads)
  • Data read that were "dirty" or uncommitted
  • Isolation Levels
  • Serializable, uncommitted read, committed read,
    and repeatable-read

45
System Aspects of SQL - Cont
  • Security User Authorization in SQL
  • Privileges
  • SQL defines nine types of privileges SELECT,
    INSERT, DELETE, UPDATE, REFERENCES, USAGE,
    TRIGGER, EXECUTE, and UNDER
  • Authorization Checking
  • First at connect time
  • Second at statement time
  • Additional checks with modules
  • Grant Revoke
  • GRANT ltprivilege listgt ON ltdatabase elementgt TO
    ltuser listgt
  • Allow other user to perform certain actions
  • REVOKE ltprivilege listgt ON ltdatabase elementgt
    FROM ltuser listgt
  • Disallow a previously granted privilege

46
Data Storage
  • Megatron 2002 Database System
  • Store relation in ASCII text file
  • Store the schema also in ASCII file
  • Obvious problems
  • Tuple layout on disk is not flexible any small
    change may shuffle the whole file
  • Searching is expensive must read the whole file
  • Query-processing is by brute force nested loop
    to examine all possibilities
  • No memory buffering, every query requires direct
    access to disk
  • No concurrency control
  • No reliability e.g. no crash recovery
  • The Memory Hierarchy
  • Cache Memory
  • Fast access to and from processor or I/O
    controller

47
Data Storage - Cont
  • Main Memory
  • Random access (RAM)
  • Both OS and applications reside in RAM
  • Virtual Memory
  • Allows each application to have their own private
    memory space which mapped to physical memory
    (RAM) or disk memory
  • A page is a memory block used by main memory
    to/from disk
  • Secondary Storage
  • Much slower than main memory
  • Two type of disk I/O
  • Disk read means moving a block from disk to main
    memory
  • Disk write means moving a block from main memory
    to disk
  • Most DBMS will manage disk blocks itself, rather
    than relying on the OS file manager

48
Data Storage - Cont
  • Volatile and Nonvolatile Storage
  • Main memory is typically volatile thus when the
    power is off, the content is gone
  • Flash memory are nonvolatile but it is very
    expensive and currently not used in main memory
  • An alternative is to use "RAM disk" combine with
    a battery backup to the power supply
  • Disks
  • Disk Components
  • Head, platter (2 surfaces each), cylinder,
    tracks, sectors, gap
  • Disk Controller
  • Controls the movement of the disk head(s) to a
    specific track and preforms reads and writes
  • Tranfers data to and from main memory

49
Data Storage - Cont
  • Effective Use of Secondary Storage
  • CS studies of algorithm often assumes that the
    data are always in main memory this is not a
    valid assumption for DBMS
  • I/O Model of Computation
  • Dominance of I/O cost
  • If a block needs to be moved between disk and
    main memory, then the time taken to perform the
    read/write is much larger than the time for
    manipulating that data in main memory thus the
    I/O time is a good approximation of the total
    time
  • Similar to Big O notation for algorithm study
  • Sorting Data in Secondary Storage
  • If we need to sort 1.64 billion bytes and a disk
    block is configured to handle 16384 bytes, then
    100000 blocks are required to read each tuple
    once from disk
  • Quicksort is one of the fastest algorithm but its
    assumption is all entries are in memory

50
Data Storage - Cont
  • Two-Phase, Multiway Merge-Sort (TPMMS)
  • Consists of 2 phases
  • Phase 1 Sort main-memory-sized pieces of the
    data, so every record is part of a sorted list
    that just fits in the availabe main memory the
    results are a set of sorted sublists on disk
    which we merge in the next phase
  • Phase 2 Merge all the sorted sublists into a
    single sorted list
  • Example
  • If we have 100 MB of main memory using 16384 size
    block sorting 1.64 billion bytes, we can fit 6400
    blocks at a time in main memory thus the
    results from phase 1 will have 16 sorted sublists
  • If merge two sublists at a time, we need 8 disk
    I/O's performed on it
  • The better approach is to read the first block
    from each of the sorted list into main-memory
    buffer. Find smallest element into a output
    buffer and flush/reload when necessary.

51
Data Storage - Cont
  • Accelerating Access to Secondary Storage
  • TPMMS example in 11.4.4 assumed that data was
    stored on a single disk and the blocks were
    chosen randomly
  • There are certainly room for improvement with the
    following methods with their advantages and
    disadvantages
  • Cylinder-Based Organization
  • Can reduce disk block access time in phase one of
    TPMMS by more than 95
  • Excellent for applications that has only one
    process accessing the disk and block reads are
    grouped in logical sequences
  • Not useful when reading random blocks
  • Multiple Disks
  • Increase both group and random access time
  • Same disk access can't be parallel
  • Can be expensive since single large disk is
    usually more cost effective than multiple smaller
    disks with the same capacity

52
Data Storage - Cont
  • Mirroring
  • Reduce access time for read/write requests
  • Built-in fault tolerance for all applications
  • Must pay for 2 disks for the capacity of only 1
  • Elevator Algorithm
  • Reduce read/write access time when the blocks are
    random
  • The average delays for each request can be high
    for any high-traffic system
  • Prefetching/Double Buffering
  • Greatly improve access when the blocks are known
    or grouped together.
  • Require extra main-memory buffers
  • No help when accesses are random
  • Disk Failures
  • Intemittent Failures
  • An attempt to read or write a sector failed, but
    successful after n number of retries

53
Data Storage - Cont
  • Checksum
  • Widely used method to detect media errors
  • Use a collection of bits to calculate a fixed
    number when the recalculation failed, then a
    media error is the likely cause
  • Detect errors but does not fix them
  • Stable Storage
  • Similar to the disk mirroring except that this is
    achieved at the software/application level
  • Keep an extra "delta" copy of the data to prevent
    media error and possible data corruption caused
    by power failure
  • Disk Crash Recovery
  • Redundant Arrays of Independent Disks-RAID
  • Redundancy Technique - Mirroring
  • Known as RAID level 1
  • When one of the disk failed, then the other
    "mirroring" disk will become the main disk

54
Data Storage - Cont
  • Parity Blocks
  • Known as RAID level 4
  • Use only 1 redundant disk no matter how many data
    disks it may support
  • Utilizing the modulo-2 sum for parity checks
  • Too many disks can cause the redundant disk to
    perform poorly since each disk write in any n
    disks can cause the check-sum bits to change
  • RAID 5
  • Improve the RAID 4 approach by sharing the
    redundant disk workload into all n disks
  • Multiple Disk Crash
  • RAID 6 use error-correcting codes such as
    Hamming code
  • Use a combination of data and redundant disks to
    determine how many of each are required to
    prevent concurrent failures or "n" disks

55
Representing Data Elements
  • Data Elements and Fields
  • Relational Database Elements
  • Since a relation is a set of tuples, and tuples
    are similar to a record/structure in C or C, we
    may imagine that each tuple will be stored on
    disk as a record.
  • Objects
  • An object is like a tuple with its instance
    variables are attributes.
  • Data Elements
  • INTEGER
  • Usually 2 or 4 bytes long
  • FLOAT
  • Usually 4 or 8 bytes long
  • CHAR(n)
  • Fixed length denoted by n

56
Representing Data Elements - Cont.
  • VARCHAR(n)
  • Variable length with n as the maximum
  • Two ways to represent varchar
  • Length plus content
  • Null-terminated string
  • Dates and Times
  • Date is usually represented as char(n)
  • Time is represented as varchar(n) because of the
    support for fractional of seconds
  • Bits
  • Can pack 8 bits into a byte, use an 8 bits
    boundary meaning rounded into the next byte.
  • Enumerated Types
  • Using a byte to represent each item, thus can
    have 256 different values

57
Representing Data Elements - Cont.
  • Records
  • Building Fixed-Length Records
  • Can concatenate the fields to form a record
  • Be aware of the 4 and 8 bytes boundary depending
    the HW and OS therefore must organize data
    accordingly
  • Record Headers
  • Also known as the record descriptor
  • Information about the record such as length,
    timestamp, record id, record type, etc.
  • Packing Fixed-Length Records into Blocks
  • Using a block header followed by multiple records

58
Representing Data Elements - Cont.
  • Representing Block and Record Addresses
  • Client-Server Systems
  • The server's data lives in a database address
    space. The address space can refer to blocks and
    possibly to offsets within the block.
  • Physical Addresses
  • Byte strings that can determine the location
    within secondary storage system where the block
    or record can be found. Information such as
    hostname, cylinder number, track number, block
    number, and offset from the beginning of a record
    within a block.
  • Logical Addresses
  • Can be view as a flat model where all the records
    are in logical sequence in memory.
  • Use a mapping table to map logical to physical
    addresses.

59
Representing Data Elements - Cont.
  • Logical and Structured Addresses
  • Why logical addressing
  • Movement of data can be done by changing the
    logical to physical mapping table rather than
    moving the actual data itself.
  • Structured addressing
  • Using a key value and the physical address of a
    block can easily locate a record
  • Can be view as a form of "hashing"
  • Fast lookup if the each record is fixed-length
  • Offset table
  • Keeping an offset table as part of a block header
    can handle variable length record with fast
    lookup.
  • Allow easy movement of data as one of the main
    advantage of logical addressing.

60
Representing Data Elements - Cont.
  • Pointer Swizzling
  • It means to translate the embedded pointers from
    secondary (database address) to main memory
    (virtual address).
  • A pointer usually consists of a bit indicating
    whether the pointer is currently a database
    address or a (swizzled) memory address and the
    actual database or memory pointer.
  • Automatic swizzling means when we load a block,
    all its pointers and addresses are put into the
    translation table if not already existed.
  • Anthoer approach is to translate the pointer only
    when it is being used.
  • Programmer can control pointer swizzling by using
    a look-ahead logic (e.g. prefetch).

61
Representing Data Elements - Cont.
  • Pinned Records and Blocks
  • A block in memory is pinned if it cannot be
    written back to disk safely.
  • Can view this as a constraint or dependency from
    other pointers in another block.
  • Variable-Length Data and Records
  • Variable-Length Fields Record
  • Must keep the length and offsets of the variable
    length fields.
  • One simple but effective method is to put all the
    fixed-length fields in the beginning and then
    follow by the variable-length fields.
  • Repeating Fields Record
  • Use the same method as above but can move the
    repeating fixed-length fields to another block.

62
Representing Data Elements - Cont.
  • Variable-Format Records
  • In certain situation, records may not have a
    fixed schema. The fields or their order are not
    completely determined by the relation.
  • Use tagged fields to handle such cases.
  • We stored attribute or field name, the type of
    the field, and the length of the field.
  • Very similar to the SQLDA definition and the
    bind-in and bind-out operations of SQL
  • Spanned Records
  • When a record can't fit into a block and must be
    broken up into multiple blocks, it is called
    spanned.
  • Spanned records require extra header information
    to keep track of their fragments.

63
Representing Data Elements - Cont.
  • Binary Large Objects (BLOBS)
  • Can hold large audio and image files
  • Stored in a sequence of blocks but also can be
    striped across multiple disk for faster retrieval
  • Retrieval is usually done in small chunks
  • Record Modifications
  • Insertion
  • If order is not important, then just add it to
    the end of the free space within a block.
  • If order is important, then we must slide the
    data to fit the new record. In this case, the
    offset table implementation can help reduce the
    actual movement of data.
  • If the block is full, then either a) find space
    on a "nearby" block and keep the forwarding
    address or b) use a chain of overflow blocks.

64
Representing Data Elements - Cont.
  • Deletion
  • When a record is deleted, the system must reclaim
    its space.
  • If using an offset table, then it should shuffle
    the free space to a central location
  • An alternative is to keep track of a link-listed
    of free space (e.g. the freelist).
  • When the record is spanned or flowed to a
    different block (nearby or overflow), we may need
    to do a "reorganizing" of all the blocks.
  • To avoid dangling pointers, we can replace some
    of the records with tombstones (dummy record
    indicating a dead end).
  • Update
  • If the record is fixed-length, then there is no
    effect on the storage.
  • Otherwise we have the same problems as insertion
    or deletion of variable-length fields.

65
Index Structures
  • Indexes on Sequential Files
  • Usually have a data file and an index file. A
    data file is a sorted file. An index file
    contains only keys and pointers related to the
    data file.
  • Sequential Files
  • A file which contains records that were sorted by
    the keys defined by its index.
  • Dense Indexes
  • Every key from the data file is represented in
    the index.
  • The index entry is small compare to an record
    entry. Thus we may be able to keep the index
    file content in memory, rather than read from the
    index file.
  • Since the keys are sorted, we can use binary
    search to find the key (K). Search time is about
    log n (base 2).
  • Sparse Indexes
  • Hold only a subset of the dense indexes
  • Use much less space but slower search time.

66
Index Structures - Cont.
  • Multiple Levels of Index
  • If an index itself cover many blocks, then a
    binary search will still need to do many disk
    I/O's to get to the correct record. We can solve
    this problem by putting an index on the index.
  • The outer level of the index will be more sparse
    compared to the inner level.
  • Indexes With Duplicate Search Keys
  • If the index is not unique, we can have multiple
    records with the same search key.
  • An efficient approach is to have only one record
    in the dense index for each search key K. Then
    find the record within the sorted sub-list.
  • Managing Indexes During IUDs
  • In Chapter 12, recall that different methods were
    discussed to handle IUDs of fixed or variable
    length records, similar logic applies to index
    files as well.

67
Index Structures - Cont.
  • Here is a list of actions on the sequential file
    which affect the index file
  • Action Dense Sparse
  • --------------------------------------------------
    -----------
  • Create ltgt overflow block none none
  • Delete ltgt overflow block none none
  • Create ltgt sequential block none insert
  • Delete ltgt sequential block none delete
  • Insert record insert update(?)
  • Delete record delete update(?)
  • Slide record update update(?)
  • Empty overflow block has no effect because sparse
    index keep track of only the primary blocks, not
    overflow blocks.
  • IUD may or may not cause the sparse index to
    change depending whether it was the sparse key or
    if any record was slided.

68
Index Structures - Cont.
  • Secondary Index
  • This is a dense index, usually with duplicates.
  • Does not require the underlying file to be sorted
    on the search key like in primary index.
  • The keys in the secondary index are sorted.
  • The pointers in one index block can go to many
    different data blocks.
  • Clustered file structure can help manage the
    many-one relationship. An example of this is the
    number of columns or indexes within a table.
  • Indirection in Secondary Indexes
  • Using buckets in this index scheme to avoid
    duplicates in the higher level.
  • e.g. SELECT title FROM Movie WHERE studioName
    'Disney' AND year 1995
  • If studioName is the primary index and year is
    the secondary index, then the number of tuples
    which satisfy both condition will be reduced
    significantly.

69
Index Structures - Cont.
  • Document Retrieval and Inverted Indexes
  • The WWW has brought many new requirements for
    document retrieval and pattern match. This
    results in newer and better search engine as time
    passes.
  • e.g. Search all the documents which contain the
    words "database", "programming", and
    "implementation".
  • A relational view of the Doc search
  • A document may be thought of as a tuple in a
    relation. Each attribute/word can be represent
    as a bit and set to true if the Doc has at least
    one match. Use a secondary index on each of the
    attributes of Doc but only keep entries which has
    the search-key value TRUE. Instead of creating a
    separate index for each attribute, the indexes
    are combined into one, called inverted index.
  • Each inverted index will point us to the bucket
    entry where we can "join" the each list of
    pointer
Write a Comment
User Comments (0)
About PowerShow.com