Title: Database
1Database
- An introduction to database
- concepts and vocabulary
2Ubiquity of Databases
- Mauchly and Eckert designed the ENIAC to perform
calculations (shell trajectories). - After the ENIAC was built, it was used to do
thermonuclear chain reaction calculations. - But when Mauchly and Eckert went into business,
their first customer was the census bureau. - And ever since computers have played an important
role in filing and record keeping. - Suffice it to say
- Databases are very important.
- Databases are all around us.
3Mauchly Eckert and early programmers of the
ENIAC
4What do we want?
- Desired Features of our database
- Storage
- We want to store data efficiently, have it
centralized (or at least seemingly centralized). - Centralization (integration) are subject to
bottlenecks and single-point-of-failure issues. - Retrieval
- We want to have the data at our fingertips when
we want it. - Querying
- We want to ask various questions about the data
(and get answers in a timely manner). - (These desires are to some extent in conflict.)
5Automating the procedure
- We would like to have the computer perform the
tedious aspects of such tasks. - An outdated approach would be to use a file-based
system, that is, to have the data stored in
various (flat, simple text) files and write a
program that reads the files, parses the
information, does the required searching,
sorting, correlating, etc.
6Entities
- Even in the file-based approach, one must
identify units of information that will be
contained in a single file. These are known as
the entities. - An entity is somewhat similar to an object in
programming, it collects data that belongs
together in some immediate way. - Entities also separate the data into distinct
units. - Database entities often reflect real
objects/entities (persons, buildings, courses,
etc.)
7Fields
- The lower-level pieces of data gathered together
to form an entity are known as fields or
attributes or properties. - The Person entity might consist of fields like
FirstName, LastName, JobType, SocSecNum, etc. - Fields are analogous to properties of an object.
- Fields have a type (Text, Number, Yes/No, Memo,
Date/Time, etc.) which indicate how the
information is to be stored and interpreted.
8Relationship
- The various entities may be distinct, but they
are not completely disconnected. - E.g. a Customer places an order
- An association between two entities is known as a
relationship. - The Customer-places-Order relationship was
realized in Access by using the Lookup Wizard to
ensure that the two tables had a common field
(CustomerID).
9ER Diagram
- One can visualize the entities and their
relationship using an Entity-Relationship (ER)
diagram. - The entities are represented by rectangles.
- The relationships are represented by arrows
between the rectangles. - The arrow may include a verb to capture the
nature of the relationship (as well as other
notations).
10ER Diagram Example
Customer
CustomerID CustomerFirstName
Places
Order
OrderID ShippingCost
Is part of
Item
ItemID ItemDescription
11File-based Systems
- In a file-based approach, there would be a file
corresponding to each entity. - (There may be more files than entities since some
relationships are realized through their own
tables/files.) - These files must be located, read, parsed. The
data is then used to initialize some variables
and/or objects which are then analyzed (searched,
sorted, etc.) by the remainder of the program.
12Details, details
- The programmer must have information about the
data files. For example - where they are to be found
- the order in which the fields occur
- the length of the fields and/or the delimiter
used - Changing the length of a field or adding a field
may require that all of the corresponding
programs be rewritten. - Such features of the file-based approach are
called program-data dependence.
13Automating the Automation
- Since
- Reading is reading
- Parsing is parsing
- Searching is searching
- Sorting is sorting
- Why have programmers continually repeating these
tedious tasks? - Automate and/or generalize the process.
- These are some of the aspects of a database
management system (DBMS).
14Specific Info in Database
- The generalized routines for reading, parsing,
searching, sorting etc. are in the DBMS. - But information specific to a particular case
(number of fields, their type, size and so on) is
still required. This data is placed together with
the actual data in the database.
15Meta-data
- This data about the data is known as meta-data.
- Meta a prefix meaning after, along with or
beyond - The meta-data describes the actual data, and so
databases are sometimes called self-describing. - Related terms include data dictionary, system
catalog and schema.
16Meta-data Open a database file in Notepad
Some actual data
One can see theres more to this file than just
customer data.
17Layers
- The inclusion of the meta-data (the
self-describing aspect of a database) allows a
separation of the data from the processing,
providing program-data independence. - Another way to say this is that there is a
separation between the database (specific) and
the database management system (generic).
18Database/DBMS Distinction
Database Raw-data and meta-data
DBMS
User
Application
Users and applications interact with a database
only through the DBMS.
19Pros of Database Approach
- Control of data redundancy
- Data consistency
- More info from same data
- Sharing of data
- Improved data integrity
- Improved security
- Enforcement of standards
- Economy of scale
- Balancing of conflicting requirements
- Improved accessibility and responsiveness
- Improved maintenance through data independence
- Increased concurrency
- Improved backup and recovery services
20Pros
- Control of data redundancy and consistency
- If the same data is entered more than once, it is
said to be redundant. - An obvious point is that this wastes space.
- If the data is updated, it must be updated in
several places or the data will be inconsistent. - Relationships are realized through repeated data,
but one tries to use something like an ID (a
name might change but an ID does not have to). - (Redundancy reduction and query simplicity can be
at odds, sometimes one sacrifices redundancy in
order to make querying easier, e.g. in data
mining. )
21Pros (Cont.)
- More information, sharing of data and
standardization - Because databases facilitate querying, they can
yield more information. - A database approach often centralizes
(integrates) the records of different
departments, making more (raw) data and
information available to the users - Integration often leads to standardization,
consistent naming schemes, consistent report
formats, etc.
22Pros (Cont.)
- Improved data integrity
- An old computing axiom says garbage in, garbage
out (GIGO). If the raw data is bad, so too is the
resulting information. - In the database approach, one can apply
constraints to help ensure that the data is
reliable. - Accesss lookup table for the foreign keys is an
example. A foreign key is supposed to match an
entry from another table, the lookup table helps
ensure that. - We also saw that we could Enforce Referential
Integrity. - We also mentioned masks, which are another
integrity check.
23Pros (Cont.)
- Improved security
- Part of the meta-data can be used to authenticate
users who are allowed to access the data. - Different users may have different access
- Data is often not entered directly into a Table
using the DataSheet but by using Views and/or
Forms, which can hide sensitive data from certain
users.
24Pros (Cont.)
- Economy of scale
- A benefit of an organization centralizing
(integrating) its record-keeping efforts is the
money applied to individual departments is
pooled. - Not only is duplication of effort reduced or
eliminated, but so too is duplication of hardware
and software. - Balancing of conflicting requirements
- Integration can lead to a resolution or at least
a balancing of different departments, which may
have conflicting goals.
25Pros (Cont.)
- Improved data accessibility and responsiveness
and increased productivity - Because nitty-gritty details (reading, parsing,
sorting, searching, etc.) are built into the
DBMS, the database staff work at a higher level
closer to the users, responding to their
particular needs. - Again with fewer details to attend to, more work
can be accomplished.
26Pros (Cont.)
- Improved maintenance through data independence
- Change of a fields type or size or introduction
of a new field changes only the database and not
the DBMS. - This layering yields independence which
simplifies maintenance. Changing the database
does not require changing the DBMS, which was not
the case in the file-based approach.
27Pros (Cont.)
- Increased concurrency
- The DBMS can handle multi-users using and even
updating the database. - There are built-in mechanisms to prevent two
users from changing the data in conflicting ways.
- Improved backup and recovery services
- Backing up and recovering the database may be
handled by the DBMS (that is, they are
integrated services) rather than externally.
28Cons of Database Approach
- Complexity
- Size
- Cost of DBMS
- Additional hardware costs
- Cost of conversion
- Performance
- Higher impact of failure
29Cons
- Complexity and Size
- Because so many features have been integrated
into the DBMSs, they have become complicated
software packages. One must understand these
features to utilize them properly. - Integrating information from various departments
makes the database more complicated. Good design
is crucial. - Integration of features into the DBMS and data
into the database means that both may become
quite large.
30Cons (Cont.)
- Cost OF DBMSs and the hardware
- Again the size and complexity of the software
means that such packages are expensive. - The larger, more complex software requires more
powerful hardware to run on. - It also requires a knowledgeable, well-trained
(hopefully high paid) staff.
31Cons (Cont.)
- Conversion cost
- Legacy system
- Performance
- More complexity may slow down some tasks.
- Higher impact of failure.
- Integrating (centralizing) the information can
mean that everything is lost at once.
32Things in the database environment
- In addition to the data, theres
- Hardware that stores and manipulates the data
- Software to
- Interface with the hardware
- (actually the operating system which
interfaces with the BIOS which interfaces
with the hardware) - Provide the data with structure
- Interface with the user and/or applications
- People
33Hardware
- Could be
- A single PC
- A mainframe and terminals
- A network of computers
34A scenario
Database
Database Server
Network Server
Network Server
Network Server
Client
Client
Client
Client
Client
Client
35Client-Server
- The client-server model is a way for transactions
to take place. - The transaction is viewed as a service.
- The client requests the service.
- The server provides the service.
- For example, to query a networked database
- A client would request the network server(s) to
connect it to the database server - The database server queries the database
- The result is passed from database server to
network server to client. - The client-server terminology can be applied to
both software and hardware.
36Front-end and Back-end
- In large-scale client-server interaction, there
may be many intermediate client-server
interactions (e.g. the network servers become
clients of the database server). - The software and hardware near the beginning of
the transaction (initial request) is called
front-end while that near the ultimate providing
of the service is known as back-end. - In the analogy of getting a meal at a
restaurant, the waiter is front-end and the cook
is back-end.
37Software
- The bulk of the software is contained in the
database management system (DBMS). It handles
everything from storage and structure to security
and integrity. - There may also be application software that
interfaces with the DBMS. - The DBMS allows one to interface with the
database on a higher level.
38Prescriptive vs. Descriptive
- In a file-based approach, ones program is a
step-by-step procedure explicitly determining how
a question will be answered - Read this file, parse it this way, create these
objects, search them this way - This approach is sometimes called prescriptive
- Prescription originally meant a set of
instructions for preparing and/or taking a drug,
only later did the word become synonymous with
the drug itself.
39Prescriptive vs. Descriptive (Cont.)
- In the database approach, most of the
nitty-gritty, step-by-step instructions are
hidden in the DBMS and the user need only
describe the data (the meta-data, the
self-describing database) and describe what he or
she wants from the data. - This approach is sometimes called descriptive.
40Language Generations
- People talk about generations of programming
languages or the level of a language. - A first generation language (1GL) is machine
code, that is, a binary representation of
instructions (e.g. 11001101) - A second generation language (2GL) is assembly
language, that is, mnemonics for machine code
(e.g. STA 13) - A third generation language (3GL) is a high-level
language, which includes most compiled languages,
such as Fortran, C, BASIC, Java, etc. (e.g. int
a 13) - A fourth generation language (4GL) is used to
develop database applications. They are designed
to be closer to natural language.
41SQL
- SQL (Structured Query Language), pronounced S-Q-L
or See-Quel, has become the standard language for
relational databases. - SQL is part third generation and part fourth
generation.
SQL
SQL
SQL
SQL
SQL
42SQL is the sequel to SEQUEL
- The original version was called SEQUEL and was
developed at IBM in the mid-70s. - However, Oracle Corporation was the first company
to use SQL in a commercial product in 1979.
43Whats it made of?
- SQL has 3 components
- Data Definition Language (DDL)
- The part that allows you to establish the
structure of the database - Data Manipulation Language (DML)
- The part that allows you to enter data, update
data and ask questions of the data (queries) - Data Control Language (DCL)
- The part that allows you to add security features
(e.g. user authentication), concurrency
(multi-user) features, recovery features, etc.
44The People Whos involved with this database
anyway?
- Data Administrator (DA)
- Oversees data resources.
- More of a hands-off role.
- Deals with other managers.
- Sets policies.
- Handles budgets.
- Plans for future.
45Whos involved with this database anyway? (Cont.)
- Database administrator (DBA)
- More hands-on and more technical than the Data
Administrator (DA) - Oversees hardware and software design,
implementation and maintenance - Responsible for security and integrity
- Ensures users have appropriate accessibility
- Etc.
46Whos involved with this database anyway? (Cont.)
- Database Designer
- Logical database designer
- Identifies entities, fields, relationships
- Applies high-level constraints including the
business rules - E.g. A Simpsons database might have a business
rule that there must be between 10 and 30
episodes in a complete season - Physical database designer
- Actually creates tables
- Implements constraints
- Introduces security measures
- Etc.
47Whos involved with this database anyway? (Cont.)
- Application Developer
- After the overall structure of the database is
laid out and implemented, the application
developer considers the more individual needs,
such as what software does the payroll department
need - Application may involve third-generation or
fourth generation languages or a combination - E.g. a Visual Basic program could use an SQL
statement to query a database
48Whos involved with this database anyway? (Cont.)
- End-Users
- Naïve
- Has little to no database knowledge
- Uses applications that simplify interaction with
the database - Cashier scanning an items barcode
- Sophisticated
- Knows something to a lot about databases
- May use SQL to update or query database
49References
- Database Systems Rob and Coronel
- Database Systems, Connolly and Begg
- SQL for Dummies, Taylor
- http//www.metacard.com/wp1a.html
- http//www.oracle.com/glossary/index.html?axx.html
- Concepts of Database Management, Pratt and Adamski