Database - PowerPoint PPT Presentation

1 / 49
About This Presentation
Title:

Database

Description:

Database An introduction to database concepts and vocabulary Ubiquity of Databases Mauchly and Eckert designed the ENIAC to perform calculations (shell trajectories). – PowerPoint PPT presentation

Number of Views:196
Avg rating:3.0/5.0
Slides: 50
Provided by: Informati375
Category:

less

Transcript and Presenter's Notes

Title: Database


1
Database
  • An introduction to database
  • concepts and vocabulary

2
Ubiquity of Databases
  • Mauchly and Eckert designed the ENIAC to perform
    calculations (shell trajectories).
  • After the ENIAC was built, it was used to do
    thermonuclear chain reaction calculations.
  • But when Mauchly and Eckert went into business,
    their first customer was the census bureau.
  • And ever since computers have played an important
    role in filing and record keeping.
  • Suffice it to say
  • Databases are very important.
  • Databases are all around us.

3
Mauchly Eckert and early programmers of the
ENIAC
4
What do we want?
  • Desired Features of our database
  • Storage
  • We want to store data efficiently, have it
    centralized (or at least seemingly centralized).
  • Centralization (integration) are subject to
    bottlenecks and single-point-of-failure issues.
  • Retrieval
  • We want to have the data at our fingertips when
    we want it.
  • Querying
  • We want to ask various questions about the data
    (and get answers in a timely manner).
  • (These desires are to some extent in conflict.)

5
Automating the procedure
  • We would like to have the computer perform the
    tedious aspects of such tasks.
  • An outdated approach would be to use a file-based
    system, that is, to have the data stored in
    various (flat, simple text) files and write a
    program that reads the files, parses the
    information, does the required searching,
    sorting, correlating, etc.

6
Entities
  • Even in the file-based approach, one must
    identify units of information that will be
    contained in a single file. These are known as
    the entities.
  • An entity is somewhat similar to an object in
    programming, it collects data that belongs
    together in some immediate way.
  • Entities also separate the data into distinct
    units.
  • Database entities often reflect real
    objects/entities (persons, buildings, courses,
    etc.)

7
Fields
  • The lower-level pieces of data gathered together
    to form an entity are known as fields or
    attributes or properties.
  • The Person entity might consist of fields like
    FirstName, LastName, JobType, SocSecNum, etc.
  • Fields are analogous to properties of an object.
  • Fields have a type (Text, Number, Yes/No, Memo,
    Date/Time, etc.) which indicate how the
    information is to be stored and interpreted.

8
Relationship
  • The various entities may be distinct, but they
    are not completely disconnected.
  • E.g. a Customer places an order
  • An association between two entities is known as a
    relationship.
  • The Customer-places-Order relationship was
    realized in Access by using the Lookup Wizard to
    ensure that the two tables had a common field
    (CustomerID).

9
ER Diagram
  • One can visualize the entities and their
    relationship using an Entity-Relationship (ER)
    diagram.
  • The entities are represented by rectangles.
  • The relationships are represented by arrows
    between the rectangles.
  • The arrow may include a verb to capture the
    nature of the relationship (as well as other
    notations).

10
ER Diagram Example
Customer
CustomerID CustomerFirstName
Places
Order
OrderID ShippingCost
Is part of
Item
ItemID ItemDescription
11
File-based Systems
  • In a file-based approach, there would be a file
    corresponding to each entity.
  • (There may be more files than entities since some
    relationships are realized through their own
    tables/files.)
  • These files must be located, read, parsed. The
    data is then used to initialize some variables
    and/or objects which are then analyzed (searched,
    sorted, etc.) by the remainder of the program.

12
Details, details
  • The programmer must have information about the
    data files. For example
  • where they are to be found
  • the order in which the fields occur
  • the length of the fields and/or the delimiter
    used
  • Changing the length of a field or adding a field
    may require that all of the corresponding
    programs be rewritten.
  • Such features of the file-based approach are
    called program-data dependence.

13
Automating the Automation
  • Since
  • Reading is reading
  • Parsing is parsing
  • Searching is searching
  • Sorting is sorting
  • Why have programmers continually repeating these
    tedious tasks?
  • Automate and/or generalize the process.
  • These are some of the aspects of a database
    management system (DBMS).

14
Specific Info in Database
  • The generalized routines for reading, parsing,
    searching, sorting etc. are in the DBMS.
  • But information specific to a particular case
    (number of fields, their type, size and so on) is
    still required. This data is placed together with
    the actual data in the database.

15
Meta-data
  • This data about the data is known as meta-data.
  • Meta a prefix meaning after, along with or
    beyond
  • The meta-data describes the actual data, and so
    databases are sometimes called self-describing.
  • Related terms include data dictionary, system
    catalog and schema.

16
Meta-data Open a database file in Notepad
Some actual data
One can see theres more to this file than just
customer data.
17
Layers
  • The inclusion of the meta-data (the
    self-describing aspect of a database) allows a
    separation of the data from the processing,
    providing program-data independence.
  • Another way to say this is that there is a
    separation between the database (specific) and
    the database management system (generic).

18
Database/DBMS Distinction
Database Raw-data and meta-data
DBMS
User
Application
Users and applications interact with a database
only through the DBMS.
19
Pros of Database Approach
  • Control of data redundancy
  • Data consistency
  • More info from same data
  • Sharing of data
  • Improved data integrity
  • Improved security
  • Enforcement of standards
  • Economy of scale
  • Balancing of conflicting requirements
  • Improved accessibility and responsiveness
  • Improved maintenance through data independence
  • Increased concurrency
  • Improved backup and recovery services

20
Pros
  • Control of data redundancy and consistency
  • If the same data is entered more than once, it is
    said to be redundant.
  • An obvious point is that this wastes space.
  • If the data is updated, it must be updated in
    several places or the data will be inconsistent.
  • Relationships are realized through repeated data,
    but one tries to use something like an ID (a
    name might change but an ID does not have to).
  • (Redundancy reduction and query simplicity can be
    at odds, sometimes one sacrifices redundancy in
    order to make querying easier, e.g. in data
    mining. )

21
Pros (Cont.)
  • More information, sharing of data and
    standardization
  • Because databases facilitate querying, they can
    yield more information.
  • A database approach often centralizes
    (integrates) the records of different
    departments, making more (raw) data and
    information available to the users
  • Integration often leads to standardization,
    consistent naming schemes, consistent report
    formats, etc.

22
Pros (Cont.)
  • Improved data integrity
  • An old computing axiom says garbage in, garbage
    out (GIGO). If the raw data is bad, so too is the
    resulting information.
  • In the database approach, one can apply
    constraints to help ensure that the data is
    reliable.
  • Accesss lookup table for the foreign keys is an
    example. A foreign key is supposed to match an
    entry from another table, the lookup table helps
    ensure that.
  • We also saw that we could Enforce Referential
    Integrity.
  • We also mentioned masks, which are another
    integrity check.

23
Pros (Cont.)
  • Improved security
  • Part of the meta-data can be used to authenticate
    users who are allowed to access the data.
  • Different users may have different access
  • Data is often not entered directly into a Table
    using the DataSheet but by using Views and/or
    Forms, which can hide sensitive data from certain
    users.

24
Pros (Cont.)
  • Economy of scale
  • A benefit of an organization centralizing
    (integrating) its record-keeping efforts is the
    money applied to individual departments is
    pooled.
  • Not only is duplication of effort reduced or
    eliminated, but so too is duplication of hardware
    and software.
  • Balancing of conflicting requirements
  • Integration can lead to a resolution or at least
    a balancing of different departments, which may
    have conflicting goals.

25
Pros (Cont.)
  • Improved data accessibility and responsiveness
    and increased productivity
  • Because nitty-gritty details (reading, parsing,
    sorting, searching, etc.) are built into the
    DBMS, the database staff work at a higher level
    closer to the users, responding to their
    particular needs.
  • Again with fewer details to attend to, more work
    can be accomplished.

26
Pros (Cont.)
  • Improved maintenance through data independence
  • Change of a fields type or size or introduction
    of a new field changes only the database and not
    the DBMS.
  • This layering yields independence which
    simplifies maintenance. Changing the database
    does not require changing the DBMS, which was not
    the case in the file-based approach.

27
Pros (Cont.)
  • Increased concurrency
  • The DBMS can handle multi-users using and even
    updating the database.
  • There are built-in mechanisms to prevent two
    users from changing the data in conflicting ways.
  • Improved backup and recovery services
  • Backing up and recovering the database may be
    handled by the DBMS (that is, they are
    integrated services) rather than externally.

28
Cons of Database Approach
  • Complexity
  • Size
  • Cost of DBMS
  • Additional hardware costs
  • Cost of conversion
  • Performance
  • Higher impact of failure

29
Cons
  • Complexity and Size
  • Because so many features have been integrated
    into the DBMSs, they have become complicated
    software packages. One must understand these
    features to utilize them properly.
  • Integrating information from various departments
    makes the database more complicated. Good design
    is crucial.
  • Integration of features into the DBMS and data
    into the database means that both may become
    quite large.

30
Cons (Cont.)
  • Cost OF DBMSs and the hardware
  • Again the size and complexity of the software
    means that such packages are expensive.
  • The larger, more complex software requires more
    powerful hardware to run on.
  • It also requires a knowledgeable, well-trained
    (hopefully high paid) staff.

31
Cons (Cont.)
  • Conversion cost
  • Legacy system
  • Performance
  • More complexity may slow down some tasks.
  • Higher impact of failure.
  • Integrating (centralizing) the information can
    mean that everything is lost at once.

32
Things in the database environment
  • In addition to the data, theres
  • Hardware that stores and manipulates the data
  • Software to
  • Interface with the hardware
  • (actually the operating system which
    interfaces with the BIOS which interfaces
    with the hardware)
  • Provide the data with structure
  • Interface with the user and/or applications
  • People

33
Hardware
  • Could be
  • A single PC
  • A mainframe and terminals
  • A network of computers

34
A scenario
Database
Database Server
Network Server
Network Server
Network Server
Client
Client
Client
Client
Client
Client
35
Client-Server
  • The client-server model is a way for transactions
    to take place.
  • The transaction is viewed as a service.
  • The client requests the service.
  • The server provides the service.
  • For example, to query a networked database
  • A client would request the network server(s) to
    connect it to the database server
  • The database server queries the database
  • The result is passed from database server to
    network server to client.
  • The client-server terminology can be applied to
    both software and hardware.

36
Front-end and Back-end
  • In large-scale client-server interaction, there
    may be many intermediate client-server
    interactions (e.g. the network servers become
    clients of the database server).
  • The software and hardware near the beginning of
    the transaction (initial request) is called
    front-end while that near the ultimate providing
    of the service is known as back-end.
  • In the analogy of getting a meal at a
    restaurant, the waiter is front-end and the cook
    is back-end.

37
Software
  • The bulk of the software is contained in the
    database management system (DBMS). It handles
    everything from storage and structure to security
    and integrity.
  • There may also be application software that
    interfaces with the DBMS.
  • The DBMS allows one to interface with the
    database on a higher level.

38
Prescriptive vs. Descriptive
  • In a file-based approach, ones program is a
    step-by-step procedure explicitly determining how
    a question will be answered
  • Read this file, parse it this way, create these
    objects, search them this way
  • This approach is sometimes called prescriptive
  • Prescription originally meant a set of
    instructions for preparing and/or taking a drug,
    only later did the word become synonymous with
    the drug itself.

39
Prescriptive vs. Descriptive (Cont.)
  • In the database approach, most of the
    nitty-gritty, step-by-step instructions are
    hidden in the DBMS and the user need only
    describe the data (the meta-data, the
    self-describing database) and describe what he or
    she wants from the data.
  • This approach is sometimes called descriptive.

40
Language Generations
  • People talk about generations of programming
    languages or the level of a language.
  • A first generation language (1GL) is machine
    code, that is, a binary representation of
    instructions (e.g. 11001101)
  • A second generation language (2GL) is assembly
    language, that is, mnemonics for machine code
    (e.g. STA 13)
  • A third generation language (3GL) is a high-level
    language, which includes most compiled languages,
    such as Fortran, C, BASIC, Java, etc. (e.g. int
    a 13)
  • A fourth generation language (4GL) is used to
    develop database applications. They are designed
    to be closer to natural language.

41
SQL
  • SQL (Structured Query Language), pronounced S-Q-L
    or See-Quel, has become the standard language for
    relational databases.
  • SQL is part third generation and part fourth
    generation.

SQL
SQL
SQL
SQL
SQL
42
SQL is the sequel to SEQUEL
  • The original version was called SEQUEL and was
    developed at IBM in the mid-70s.
  • However, Oracle Corporation was the first company
    to use SQL in a commercial product in 1979.

43
Whats it made of?
  • SQL has 3 components
  • Data Definition Language (DDL)
  • The part that allows you to establish the
    structure of the database
  • Data Manipulation Language (DML)
  • The part that allows you to enter data, update
    data and ask questions of the data (queries)
  • Data Control Language (DCL)
  • The part that allows you to add security features
    (e.g. user authentication), concurrency
    (multi-user) features, recovery features, etc.

44
The People Whos involved with this database
anyway?
  • Data Administrator (DA)
  • Oversees data resources.
  • More of a hands-off role.
  • Deals with other managers.
  • Sets policies.
  • Handles budgets.
  • Plans for future.

45
Whos involved with this database anyway? (Cont.)
  • Database administrator (DBA)
  • More hands-on and more technical than the Data
    Administrator (DA)
  • Oversees hardware and software design,
    implementation and maintenance
  • Responsible for security and integrity
  • Ensures users have appropriate accessibility
  • Etc.

46
Whos involved with this database anyway? (Cont.)
  • Database Designer
  • Logical database designer
  • Identifies entities, fields, relationships
  • Applies high-level constraints including the
    business rules
  • E.g. A Simpsons database might have a business
    rule that there must be between 10 and 30
    episodes in a complete season
  • Physical database designer
  • Actually creates tables
  • Implements constraints
  • Introduces security measures
  • Etc.

47
Whos involved with this database anyway? (Cont.)
  • Application Developer
  • After the overall structure of the database is
    laid out and implemented, the application
    developer considers the more individual needs,
    such as what software does the payroll department
    need
  • Application may involve third-generation or
    fourth generation languages or a combination
  • E.g. a Visual Basic program could use an SQL
    statement to query a database

48
Whos involved with this database anyway? (Cont.)
  • End-Users
  • Naïve
  • Has little to no database knowledge
  • Uses applications that simplify interaction with
    the database
  • Cashier scanning an items barcode
  • Sophisticated
  • Knows something to a lot about databases
  • May use SQL to update or query database

49
References
  • Database Systems Rob and Coronel
  • Database Systems, Connolly and Begg
  • SQL for Dummies, Taylor
  • http//www.metacard.com/wp1a.html
  • http//www.oracle.com/glossary/index.html?axx.html
  • Concepts of Database Management, Pratt and Adamski
Write a Comment
User Comments (0)
About PowerShow.com