What is a Database - PowerPoint PPT Presentation

1 / 50
About This Presentation
Title:

What is a Database

Description:

'A set of associations between one or more entity type.' ( Connolly & Begg, 2002) Examples: ... ISBN, PIN, Drivers Licence Number, Course Code... Foreign Keys ... – PowerPoint PPT presentation

Number of Views:36
Avg rating:3.0/5.0
Slides: 51
Provided by: uobcommun
Category:
Tags: database | keys

less

Transcript and Presenter's Notes

Title: What is a Database


1
What is a Database?
  • Definition A database is a structure that can
    store information about multiple types of
    entities, the attributes of those entities, and
    the relationships among the entities. (Pratt
    Adamski, 2003)
  • Collection of valuable data that exists over a
    period of time
  • Relationships exist between the data
  • Structured
  • Managed by a database management system (DBMS)

2
DBMS
  • DBMS Database Management System
  • Program that enable the creation and maintenance
    of databases
  • Many different functions including control of
    access and design tools
  • May contain more than one databases
  • Used to provide decision support and transaction
    processing

3
Advantages of a DBMS
  • Reduction of redundancies and errors within the
    data
  • Easier to extract information
  • The system is more flexible
  • Security is improved
  • Integrity can be improved
  • Data model needs to be produced

4
Disadvantages of a DBMS
  • Takes longer to access the data
  • Need special knowledge
  • Applications may be dependant to specific DBMS
    versions
  • Initial cost may be large.

5
Characteristics
  • A good DBMS has the following
  • Data integrity
  • Security
  • Concurrency control
  • Transaction processing
  • Recovery functions

6
Database Lifecycle
7
Database Life Cycle
  • System Definition
  • Identify the requirements that a system must meet
    to satisfy the needs of end users, including
    input, processing, storage, output, control, etc.
  • Database Design
  • Conceptual design
  • Analyse overall data requirements needed for
    business function
  • Develop preliminary conceptual data model (ER)
  • Logical DB design
  • Transform the conceptual data model into
    relational model
  • Map conceptual design to database schema
  • Identify data integrity and security requirements
  • Populate repository
  • Physical DB design
  • Performance issues decide on physical
    organisation of data, DBMS, external programs,
    hardware, operating system, data communication
    network, etc

8
Database Life Cycle
  • Implementation and Loading
  • Code database processing programs
  • Convert data from prior systems
  • Install database and import data
  • Complete database training materials and
    documentation
  • Testing and Evaluation
  • A set of activities conducted with the intent of
    finding errors in software
  • Identify the correctness, completeness and
    quality of developed programs
  • Maintenance and Evolution
  • Analyse database and DB application to ensure
    that user and system requirements are met
  • Audit security and disaster recovery procedures
  • Fix errors in DB and DB applications

9
Data Modelling
  • Important step in the development of the DB
  • Good model
  • Completeness, accuracy
  • No redundancy
  • Data reusability
  • Stability, flexibility, simplicity
  • Communication effectiveness

10
Entity-Relationship Model
  • E-R model
  • Visual representation of the structure

Relationship
11
ER Terminology
  • Entity
  • Real work class of facts
  • Named using singular words
  • Definitions used to clarify
  • Relationship
  • A set of associations between one or more entity
    type. (Connolly Begg, 2002)
  • Examples
  • A Staff member works at a Branch
  • A Branch has Staff members

works at
has
12
ER Terminology
  • Cardinality
  • One-to-One
  • One-to-Many
  • Many-to-Many
  • Participation
  • Mandatory or optional
  • A customer may not have an invoice
  • An invoice has to be associated with a customer
  • Represented by circles and bars

13
Example Relationship
14
Attribute
  • State of an entity
  • Correspond to columns in a table
  • Book
  • ISBN
  • Author
  • Name
  • Publisher

15
Primary Key
  • Attribute(s) that are distinct for each entry
  • Primary key is UNIQUE
  • Examples Student Number, Book ISBN, PIN,
    Drivers Licence Number, Course Code...

16
Foreign Keys
  • A foreign key attribute set is a primary key
    attribute in one table that is repeated as a
    foreign key in another table
  • A non-key attribute in one relation that also
    appears as a primary key in another relation

17
What is Normalization?
  • Reorganisation and evaluation of table structures
  • Produces a set of stable, well structured tables.
  • Remove repeating/redundant data

18
Why Normalize?
  • Remove redundant data
  • Reduce update anomalies
  • Reduce delete anomalies
  • Reduce insert anomalies

19
Normal Forms
  • UNF a table that contains one or more repeating
    groups
  • 1NF A relation is in 1NF is all value are
    atomic and no rows are repeated.
  • 2NF A relation is said to be in 2NF if it is in
    1NF and every non-key attribute is fully
    functionally dependent on the primary key
  • 3NF A relation is said to be in 3NF if it is in
    2NF and every non-key is non-transitively
    dependent on the primary key
  • All attributes in every table must be determined
    by the key, the whole key, and nothing but the
    key

20
Un-normalized Data
  • Grade-Report
  • sNumber sName sAddress uNumber
    uTitle iName iRoom sMark
  • 38214 Bright 123 Smith St IS350
    Database CODD B104 A
  • 38214 Bright 123 Smith St IS465 Sys
    Anal KEMP B213 C
  • 69173 Smith 35 Main Rd IS465 Sys
    Anal KEMP B213 A
  • 69173 Smith 35 Main Rd PM300 Proj
    Mgt LEWIS D317 B
  • 69173 Smith 35 Main Rd QM440 OpSys KEMP B213
    A
  • What entities are present?
  • What fields do they have?
  • Can the table be decomposed further? State the
    reason.
  • Decompose the table into 3NF

21
SQL (Structured Query Language)
  • Developed by IBM in late 70s for DB2
  • SEQUEL
  • Used to
  • Build
  • Alter
  • Query

22
Data Manipulation Language
  • Used to retrieve, add, modify and delete
  • Commands
  • SELECT
  • INSERT
  • UPDATE
  • DELETE
  • ANSI vs Various DBMS implementations

23
SELECT STATEMENT
  • SELECT column(s)
  • FROM table(s)
  • WHERE IS, NOT, NULL, , ltgt, gt , lt, IN, NOT IN,
    BEETWEEN, LIKE
  • GROUP BY column(s)
  • HAVING COUNT
  • ORDER BY column(s) ASC, DESC

24
SQL Functions
  • Five functions
  • AVG
  • MAX
  • MIN
  • SUM
  • COUNT
  • MS-Access functions
  • DATE()
  • LCASE/UCASE
  • MID
  • LTRIM/RTRIM
  • Etc

25
Database Development Activities
  • Six main phases
  • Enterprise modelling
  • Conceptual data modelling
  • Logical database design
  • Physical database design and creation
  • Database implementation
  • Database maintenance

26
Data and Database Administration
  • Two factors are driving the changes in the data
    administration and DB administration
  • the availability of more technologies and
    platforms that must be managed concurrently
  • The increased pace of business changes

27
Main Activities Performed by DBA
  • Physical data modelling and meta-data management
  • Space planning, creation and upkeeping
  • Database object maintenance and security
  • Application design and execution problem
    identification
  • Backing up, recovering, loading, and moving data
  • Pro-actively monitoring the database and taking
    corrective actions

28
Growth Change
  • What happens when the database grows?
  • Analyse space utilization and performance
  • Additional space allocated
  • Change in structure
  • Check if already available (where?)
  • Y use current data
  • N add new data-types or relationships
  • Change in usage
  • Move to faster devices
  • Change placement (clustering)
  • Change contents

29
Six Types of Information Systems
  • Transaction Processing Systems (TPS)
  • Management Information Systems (MIS)
  • Decision Support Systems (DSS)
  • Executive Information Systems (EIS)
  • Expert Systems (ES)
  • Geographical Information Systems (GIS)

30
Organisations
  • Organisation classification
  • Machine Bureaucracy
  • Divisionalised Bureaucracy
  • Entrepreneurial structure
  • Professional
  • Ad hoc racy

31
Distributed DBMS
  • Distribute Databases
  • DDB is a single logical database that is
    physically divided among computers at several
    sites on a network.
  • Distribute Database Management Systems
  • DDBMS governs the storage and processing of
    logically related data over interconnected
    computer systems in which both data and
    processing functions are distributed among
    several sites.

32
Distributed DBMS Characteristics
  • A collection of logically related shared data
  • Data is split into a number of fragments
  • Fragments may be replicated
  • Fragments/replicates are allocated to sites
  • The sites are linked by a communication network
  • The data at each site is under the control of a
    DBMS
  • The DBMS at each site can handle local
    applications
  • Each DBMS participates in at least one global
    application

33
Distributed Processing Vs Distributed Databases
  • Distributed processing
  • The ability to have several computers working
    together in a network, where each processor runs
    different activities
  • E.g. A centralized database that can be accessed
    over a computer network
  • Distributed processing does not require
    distributed database
  • Distributed database requires distributed
    processing
  • Both require network to connect all components

34
Distributed Database Design
  • Data partitioning
  • Horizontal partitioning
  • Division of relation into subsets of tuples
  • Each stored in different node
  • Represents a SELECT statement with a WHERE clause
  • Vertical partitioning
  • Division of relation into attribute subsets
  • Each located at different node
  • Equivalent of a PROJECT statement
  • Mixed
  • Combination of horizontal and vertical
  • Table divided horizontally, each having subset of
    attributes

35
Distributed Database Design
  • Data replication
  • Storage of data copies at multiple sites served
    by a computer network
  • Replication conditions
  • Fully replicated multiple copies of all
    partitions at multiple sites
  • Partially replicated multiple copies of some
    partitions at multiple sites
  • Factors for data replication decision
  • Database size
  • Usage frequency
  • costs

36
Distributed Database Design
  • Location transparency
  • One of the major characteristics of DDBMS, which
    states that users do not need to be aware of the
    location of data in a distributed databases
  • Replication transparency
  • One of the major characteristics of DDBMS. The
    steps taken by the DDBMS to update the various
    copies of data should be done behind the scenes
    users should be unaware of the steps
  • Fragmentation transparency
  • Users should not be aware of the fragmentation.
    If users are unaware of fragmentation, the DDBMS
    has fragmentation transparency

37
Transaction processing terms
  • A transaction is an atomic unit of work. i.e.
    changes to the database that must be done
    together.
  • For example, in discharging a patient from a
    hospital the system needs to update tables to
    indicate his bed is free and needs to update
    tables to store his bill. Doing one without the
    other would be incomplete.

38
Transaction processing terms
  • A program that changes data in tables is called
    transaction.
  • Single user vs Multi user. DBMS may allow more
    than one user to update data stored in the DB.
    However users may update the same data at the
    same time i.e. Concurrently
  • Concurrency control is a term that describes
    procedures used to ensure correct interleaving of
    transactions

39
Transaction processing
  • Begin transaction
  • Read, Write
  • End transaction
  • Commit transaction
  • Rollback
  • Redo
  • Undo

40
Concurrency Control
  • A program that changes data in tables (as an
    atomic unit of work) is called transaction
  • Concurrency control (CC) controls flow in a
    multi-user database
  • CC is managing and maintaining data integrity
  • CC is related to transactions

41
Locks
  • Two types of locks
  • Read / Shared others allowed to read locked
    data but cannot update. Placing a shared lock
    prevents another user for placing exclusive lock
    on that record set
  • Write / Exclusive prohibits other transaction
    from accessing data (reading). A transaction
    should put a write lock on record set when is
    about to update that record set. Placing write
    lock prevents users from placing any type of lock
    on that record set
  • LOCK TABLE statement
  • LOCK TABLE tableName IN SHARE/EXCLUSIVE MODE

42
Deadlocks
  • Also referred to as Deadly Embrace
  • Locking introduces this problem
  • Example
  • Application A puts read lock on table X
  • Application B puts read lock on table X
  • A requests write lock to X (wait)
  • B requests write lock to X (wait)

43
Deadlock
  • Results when two or more transactions have locked
    a common resource, and each waits for the other
    to unlock that resource.
  • Unless the DBMS intervenes (by maintaining a
    matrix of usage), the transactions will wait
    indefinitely.

44
Decision Support Systems (DSS)
  • Methodology designed to extract information from
    data accumulated over time
  • Arrangement of computerized tools to assist
    decision making
  • Used at all levels tailored to focus on
    specific areas or problems
  • Provides ability to conduct ad hoc queries

45
Data Warehouse
  • Data Warehouse (DW) is a broad based, shared
    database for management decision making that
    contains data that has been accumulated over time
  • Data must be high quality, aggregated, often
    denormalised and is not necessarily absolutely
    current
  • The Data Warehouse is an integrated,
    subject-oriented, time-variant, non-volatile
    database that provides support for
    decision-making.

46
On-Line Analytical Processing (OLAP)
  • Decision support methodology based on viewing
    data in multiple dimensions
  • OLAP is well suited for querying and multi time
    period trend analysis
  • OLAP concepts
  • Drill down (yearly-gtmonthly-gtweekly)
  • Slice (subset of the data e.g. products-gtshoes)
  • Rotate (interchange the dimensions)

47
Data Mining
  • DSS tools reactive
  • Data-mining proactive
  • Automatically search for anomalies and possible
    relationships

48
03P3 Final Exam
  • Theory test
  • When
  • Tuesday, 22 November 2005
  • Reading commences 920am
  • Writing Commences 930am
  • Duration 2 hours
  • Where Exam Hall
  • Exam timetable
  • http//www.ballarat.edu.au/aasp/student/admin/he/f
    inal_exam_timetable_210905.pdf

49
03P3 Final Exam
  • Question 1- Normalization
  • Functional dependency
  • 1NF, 2NF and 3NF
  • Decompose tables to normalized form
  • Maintain relationship between tables
  • Question 2 - ER design
  • Entities
  • Relationships (resolve many-to-many)
  • Attributes
  • Cardinalities
  • keys

50
03P3 Final Exam
  • Question 3 - Short answered questions
  • Important points
  • Short answers
  • Read question carefully
  • Question 4 SQL
  • Select
  • Insert
  • Delete
  • Update
  • Create
  • Etc
Write a Comment
User Comments (0)
About PowerShow.com