Title: Distributed Databases
1 Chapter 12
Database Administration
(With Modifications)
2Once upon a time, the typical IS Organization
appeared as
CEO
Why ???
- Information Systems were applied where they were
most needed
- Other standardized, routine applications
3As information became used for more purposes and
across more functions, the IS Organization
changed
CEO
- Information Systems were applied everywhere
- Information Systems were recognized as an
Organizational Resource
4Basic Definitions
- A high-level function that is responsible for the
overall management of data resources in an
organization
- Database Administrator (DBA)
- A technical function that is responsible for the
physical database design and such issues as
security enforcement and database performance
- A administrative function that is responsible for
assuring that organizational applications meet
the enterprise goals
5Data Administration Functions
- Explicit statement of goals, objectives, and
targets
- Goal To Support Cost-Effective Use of the
computer environment
- Objective To improve sharing of information
across organizational units
- Target Linking of all departmental databases
within 2 years
- Written Statement of actions to be taken for a
certain activity
- In the event of a database failure, the DBA will
-
- Explicit statement of conventions to be followed
in data usage
- All table names will be prefaced by their
physical location
- All fields containing age, weight, . Will
contain the data type short
6Data Administration Functions
- Development of the Organizations IT Strategy
- Must correspond to the Organizations Business
Strategy
- E.g., Consider the Difference between UTEP and
Harvard
- Development of the enterprise model
- Top-Down versus Bottom-Up Viewpoint
- Development of cost/benefit model
- Targets must be measurable
- Design of the database environment
- Centralized, distributed, Decentralized?? How??
- Develop the data administration plan
- A lower-level plan for database implementation,
maintenance and growth
7Data Administration Functions
- Define and model data requirements
- Define and model business rules
- Define operational requirements
- Maintain corporate data dictionary
- The department, the business subunit, the
corporation?
- Procedures MUST be established in advance
8Data Administration Functions
- Information Systems are political entities
- The DA must sell their arguments
- Recall the Systems Trinity
- The Manager The person in charge of the
functional department
- The System Developer The person developing the
system
- The User The person who will use the system
- Lack of Top management support
- It is the DAs job to make sure that ALL
stakeholders are happy
9Data Administration Functions
- Managing the Data Repository
- Used by the DA to manage the information-processin
g environment
- Contain metadata that describes the
organizations data and data processing resources
- Replacing Data Dictionaries (simple data-element
documentation tools)
- Provides information about
- What users must know what
- What automated CASE tools that are used to
specify and develop information systems
- All Applications that access and manipulate data
- DBMS that maintain the repository and update
system privileges, passwords, and other
information
10Database Administration Functions
- Selection of Hardware and Software
- Difficult to keep abreast of current technology
- Difficult to predict future changes
- Emphasis on established off-the-shelf products
- Managing Data Security and Privacy
- Establishment of user privileges
- Complicated by use of distributed systems
- Maintaining data relationships
11Database Administration Functions
- We must assume that a database will eventually
fail
- Establishment of procedures
- How often should the data be back-up?
- What data should be backed-up more frequently?
- Who is responsible for the back-ups?
- Application of proven strategies for
reinstallation of database after crash
12Shared Administration Activities
Logical Design
- External Model Design (Subschemas)
- Physical Design/Construction
- Design Integrity Controls
- Establish Security Controls
- Supervise Database Loading
- Develop Programming Standards
- Establish Back-up/Recovery Procedures
13Shared Administration Activities
- Operations and maintenance
- Monitor database performance
- Tune and reorganize databases as needed
- Enforce standards and procedures
Support Users
- Implement Change-Control Procedures
- Plan for growth and change
- Evaluate new technologies
14Data Warehouse Administration
- New function due to increased use of data
warehousing
- (Massively) Integrated decision support databases
from various sources
- Emphasis on integration and coordination of data
and metadata from multiple databases
1. Support decision-oriented applications
2. Manage data warehouse (exponential) growth
2. Establish service level agreements
15Data Dictionaries and Repositories
- Documents data and metadata elements of a database
- System-generated database that describes all
database objects
- Stores metadata describing data and data
processing resources
- Information Repository Dictionary System (IRDS)
- A software tool managing and controlling access
to the Information Repository
16Data Dictionaries and Repositories
- Components of the repository system architecture
A schema of the repository information
Software that manages the repository objects
Where repository objects are stored
17Database performance tuning
- Setting installation parameters
- Choosing background processes
- Deciding who gets what and when
- How to distribute heavily accessed files
- Modification of SQL code in applications
18Database Security
- Protection of data against accidental or
intentional loss, destruction, or misuse
- Increased difficulty due to internet access and
client-server technologies
19Threats to Data Security
- Establishment of firewalls
- Be careful of disgruntled employees
- Loss of Privacy (Personal data)
- Loss of Confidentiality (Corporate data)
20Threats to Data Security
- Data may be compromised due to database crashes
- Improper recovery can be costly
- Through Sabotage/Data Misplacement
21Managing Data Security
- Minimization of user data entry
- Only certain values can be entered
- Echoing of input to user for confirmation
22Managing Data Security
- Views are not only useful, but can also restrict
user access to data
- Recall our Physician/Patient Database View
CREATE VIEW drugs_given AS SELECT physname,
patient.name, illness.name, prescription.drugcod
e FROM physician, patient, treatment, illness,
prescription WHERE physician.physid
patient.physid AND patient.patid
treatment.patid AND treatment.illcode
illness.illcode AND treatment.drugcode
prescription.drugcode ORDER BY physname
- The user might be restricted from using the view
- The user might be restricted from seeing the
views code
(And hence seeing the physical relationships)
23Managing Data Security
24Managing Data Security
- The Conceptual Model
- Only the datasets with common attributes and
their statistics are made available
- No data manipulation language is allowed to merge
and intersect populations
2. Query Restriction
- Query-set Size controls (large only)
- Number of over-lapping entities among successive
queries
- Clustering individual entities in mutually
exclusive subsets
25Managing Data Security
3. Output Perturbation
- Queries made on actual data
- Output perturbed so that statistical
characteristics remain but individual data is
non-sensical
4. Data Perturbation
- The entire database is first perturbed
- All statistical relations are maintained in the
perturbed dataset
- User allowed to make all queries on the perturbed
data set (individual data entities show no
relationship to the real data)
26Managing Data Security
- Problem Passwords are flawed
- Sometimes easy to determine
- User write them down and they get copied
- Automatic logon scripts make it unnecessary to
enter them manually
- Unencrypted passwords travel the internet
- Goal Verify User Identity
27Managing Data Security
- Randomly Assigned Passwords
28Managing Data Security
- Encryption (The Second Oldest Profession)
- The earliest recorded use of cryptography is 1900
BC in Egypt.
- The scribes who sketched the hieroglyphs telling
the story of the life of Khnumhotep II in the
town of Menet Khufu used a substitution cipher to
encrypt the names and titles of individuals in
the story.
29Managing Data Security
- Encryption (The Second Oldest Profession)
- The Original symbols are substituted for other
symbols
- Plain Text ABCDEFGHIJKLMNOPQRSTUVWXYZ
Cipher Text XYZABCDEFGHIJKLMNOPQRSTUVW
30Managing Data Security
- Pretty Good Privacy (PGP)
- Should the Government have the right to a Master
Key?
- Target of 3-year investigation that he violated
export laws
31Database Recovery
- Mechanisms for restoring a database quickly and
accurately after loss of damage
- Recovery Facilities/Components
1. Back-up Facilities
- Periodic back-up copies of the entire database
2. Journalizing Facilities
- To maintain audit trails of transactions and logs
of database changes
3. Checkpoint Facilities
- When the DBMS temporarily halts all activities
and synchronizes all files and journals
4. Recovery Manager
- A DBMS component that restores the database to a
correct condition and restarts processing
activities
32Database Recovery
Backup Facility Automatic periodic duplication
of entire Database
- Before and after images of records that have been
changed
DBMS
Journalizing Facility Logging of Transactions
and Database Changes
- Logging of every transaction along with
timestamps
33Database Recovery
- Periodic/On Demand Facilities
Checkpoint Facility The processing is stopped
and database synchronized
Recovery Manager Upon crash, the database is
rebuilt using the Database backup, DB Change log,
and Transaction Log
34Database Recovery
- How long between backup (hourly, daily, weekly)
is a policy determined by the DA
- Frequent back-ups increase reliability BUT each
takes some time
- Back-ups should be stored off-site
- Database shut down during back-up
- More secure BUT transactions delayed
- Selected portion of database is shut down during
back-up
- Not as disruptive BUT more complicated
35Database Recovery
- Every transaction is stored to the transaction
log as well as the database
- Record of essential data for each transaction
processed against the database
- Before-Images of records (before transaction)
- After-Images of records (After modification)
36Database Recovery
(Recap)
DBMS
Transaction
Effect of transaction added to current database
Copy of record affected by transaction stored
Copy of transaction stored
(In case of database failure)
37Database Recovery
- At some specified point in time (by the DA) the
DBMS refuses all transactions
(The system is in a Quiet state)
- The database and the transaction logs are
synchronized
DBMS
Transaction
38Database Recovery
- Module of DBMS that restores the database to a
correct position when a failure occurs
Why do databases Fail?
- The transaction terminates abnormally due to
human error, input of invalid data, loss of
transmission, hardware failure, deadlock, etc.
- Incorrect, but valid, data entered
- E.g., incorrect account number, customer payment
- E.g., Power loss, operator error, systems
software failure
- The database is NOT damaged
- The database is lost, destroyed, or can not be
read
- Often due to disk failure
39Database Recovery
- Recovery and Restart Procedures
- 2 mirror-image databases maintained
- All transactions stored/updated in both databases
- Upon failure, the database is switched for the
mirror image
- Generally stored across distributed databases
- Does not protect against power failures or
catastrophes
40Database Recovery
- Recovery and Restart Procedures
- The previous transactions are reprocessed (up to
the point of the failure) against the backup copy
of the database
- The most recent copy of the database is mounted
and the latest transactions rerun
- May take considerable time to reprocess
- Resequencing errors may occur
41Database Recovery
- Recovery and Restart Procedures
- Backward Recovery (Rollback)
- Unwanted changes are undone through the use of
Before images (in the Database Change Log)
42Database Recovery
- Recovery and Restart Procedures
- Forward Recovery (Rollforward)
- After images (in the Database Change Log) are
applied to the Database Backup
43Database Recovery
What Strategy should be applied?
- That depends on the type of failure
- Alternative Rollforward (To a state just prior
to the abort)
- Preferred First correct data (if possible) then
rollback and rollforward with corrected data
- Alternative Compensating transactions (debit
then re-credit)
- System Failure (Database intact)
- Alternatives (1) Rollback (2) Restart from
Checkpoint
- Preferred Suicide (unless you can Switch)
- Alternatives (1) Rollforward (2) Reprocess
transactions
44Transaction Management
- A logical unit of work that must be either
entirely completed or aborted
- No intermediate states are acceptable.
- Most real-world database transactions are formed
by two or more database requests.
- A database request is the equivalent of a single
SQL statement in an application program or
transaction
- A transaction that changes the contents of the
database must alter the database from one
consistent database state to another.
- To ensure consistency of the database, every
transaction must begin with the database in a
known consistent state.
45Transaction Management
- A transaction is a SINGLE (indivisible),
invisible, logical unit of work
- A database request and ALL related operations
MUST be completed
- If ALL requirements are not, the transaction is
aborted
- A transaction must be PERMANENT
- When a transaction is completed, it has reached
(and must remain) in a permanent state
- Once in a permanent state, it can not be lost
- Even if the database fails, the transaction
remains
46Transaction Management
- Each concurrent transaction is treated as thought
they were received and executed in a serial (one
after the other) fashion
- This is true even in a multi-user or distributed
database
- If transactions do occur simultaneously, one is
assigned precedence over the other
- Data/Information provided/updated by a
transaction can not be used by another (later
transaction) until the first transaction is
complete (i.e., accepted)
47Transaction Management
- Suppose that we wish to withdraw items from
inventory
Bye!
IF we sell 25 Erasers
1. Find the part Number
2. Read the number onhand
3. If the number onhand is lt 25, ABORT the
transaction
4. If the number onhand is gt 25, calculate the
new number onhand quantity
500 - 25 475
5. Enter (update) the new number onhand quantity
(The DBMS will update the Transaction log and
Database Change Log)
48Transaction Management
- The SQL Commands needed are (sort-of)
straight-forward
SELECT onhand FROM inventory
WHERE part 02
UPDATE inventory SET onhand 475
OR Maybe
SELECT onhand FROM inventory
WHERE descrip Erasers
UPDATE inventory SET onhand onhand - 25
COMMIT
49Transaction Management
Why did you say sort-of ??
- Notice we didnt check to see if there were 25
Erasers available
- If there were not, we could not complete the
transaction
How do we do that ??
- That is why we are going to learn SQL/PL
(Structured Query Language/Programming Language)
50Transaction Management
- Of course, even simple transactions are sometimes
problematic
- Suppose that Dr. Mary Smith (physid 123456789)
Transfers all her patients to Dr. Von Bulow
(physid 374659201)
UPDATE patient SET patient.physid
374659201 WHERE patient.physid 123456789
- Will NOT be accepted unless we first enter the
command
INSERT INTO patient VALUES (374659201, Von
Bulow, Klaus, )
51Transaction Management
- Consider the following Statement
A credit card transaction is a ternary
relationship between a customer, a merchant, and
a bank
Given 1 merchant and 1 bank, how many customers?
Given 1 customer and 1 bank, how many Merchants?
Mandatory?
Mandatory?
Given 1 customer and 1 Merchant, how many banks?
Which Makes the relationship?
Mandatory?
52Transaction Management
- Assume that the following attributes apply
Too Simple?
Probably!!
53Transaction Management
- Our actual tables might appear as
Table Transaction
Table Customer
Table Merchant
Table Bank
54Transaction Management
- A few Activities need to be carried out
- When a transaction takes place, all of the
attributes in the associative entity TRANSACTION
must be recorded
- At the same time, the customers CreditLim and
balance must be checked
- If CredLim Balance TransAmt lt 0, the purchase
is denied (Aborted)
- If CredLim Balance TransAmt gt 0, the
purchase is Accepted
- IFF the purchase is accepted
- Customer Balance Must be updated
- MerchantBal Must be updated
- IFF the purchase is denied
- The entire TRANSACTION is deleted
55Transaction Management
(A Quick and Dirty Review)
- The DBMS uses transaction logs (A Table) to keep
track of all transactions on a database
- Intended as an organizational record of
transactions
- Necessary if a ROLLBACK is issued
- Necessary in case of a database failure/crash
- In case of failure, the transaction log is used
to ROLLFORWARD
- Transactions added since the previous COMMIT are
added and COMMITted to the database
56Transaction Management
- Consider a Sample Transaction Log for our
Previous Problem
Assigned by DBMS
NOTE Only Information about affected Tables
Included
57Transaction Management
- If the transaction is aborted, we can rollback
with the transaction log
(Before)
(After)
58Concurrency Control
- In a multi-user environment, simultaneous access
to data can result in interference and data loss
Concurrency Control
- The process of managing simultaneous operations
against a database so that data integrity is
maintained and the operations do not interfere
with each other in a multi-user environment.
59Concurrency Control
- Issues in Concurrency Control
- Assume that there are two individuals sharing a
checking account with a present balance of 400
- One of the individuals deposits 200
- Shortly afterward, one withdraws 150
Balance if Deposit is Lost
Time
Event
Process
Balance
110400
READ BAL
400.00
Deposit
110401
BAL BAL 200
600.00
WRITE BAL
110412
600.00
Withdrawl
111026
READBAL
600.00
400.00
BAL BAL - 150
111112
450.00
250.00
111152
WRITE BAL
450.00
250.00
Inaccurate Balance
60Concurrency Control
- Issues in Concurrency Control
- If A ROLLBACK is to take place, it must occur
BEFORE any New Transactions
- Consider our previous example With the proper
ROLLBACK
Time
Event
Process
Balance
110400
READ BAL
400.00
Deposit
110401
BAL BAL 200
600.00
WRITE BAL
110412
600.00
110449
ROLLBACK
Withdrawl
111026
READBAL
400.00
BAL BAL - 150
250.00
111112
111152
WRITE BAL
250.00
61Concurrency Control
- Issues in Concurrency Control
- Now consider what would occur if the rollback
takes place AFTER the second withdrawl
Time
Event
Process
Balance
110400
READ BAL
400.00
Deposit
110401
BAL BAL 200
600.00
WRITE BAL
110412
600.00
Withdrawl
111026
READBAL
600.00
BAL BAL - 150
450.00
111112
111139
ROLLBACK
111152
WRITE BAL
400.00
62Concurrency Control
- Issues in Concurrency Control
- Occur when a transaction calculates results while
another operation is taking place
Time
Event
Process
Balance
110400
READ BAL
400.00
Deposit
110401
BAL BAL 200
600.00
Withdrawl
BAL BAL - 150
110415
250.00
110432
600.00
WRITE BAL
Withdrawl Occurs while Deposit Update taking place
63Concurrency Control
- Establishes the order in which concurrent
transactions are processed
- Interleaves (meshes) the execution of database
operations to ensure serializability
- Bases actions on time stamping and locking
techniques (to be explained)
- Attempts to Optimize CPU usage by not having the
CPU wait for a WRITE to occur after a READ
- In our previous examples, transactions would be
written to the log, and a read/write would not be
processed until the previous transactions write
was processed
64Concurrency Control
- Most common technique to achieve serialization
- Guarantees exclusive use of data items to a
current transaction
- The Lock denies access (update) to another
transaction until the previous transaction is
committed
- Locks prevent another transaction from reading
inconsistent data
- DBMSs automatically enforce locking procedures
through the use of a Lock Manager
65Concurrency Control
- The level at which the data is locked
- Entire database is locked
- No transaction can access the data until the
previous transaction has been committed
- Preferable for batch operations
- Inadequate for multi-user databases
User A requests data from Table 1
User B requests data from Table 2
(Database Locked)
(Wait ---- Database Locked)
User A Commits or aborts
(Database Unlocked)
User B Transaction initiated
66Concurrency Control
- Only the table accessed by a transaction is locked
- Less restrictive, but still inadequate for
multi-user databases
User B requests data from Table 1
User A requests data from Table 2
(OK ---- Table Available)
(Table 2 Locked)
User C requests data from Table 2
(Wait ---- Table Locked)
User A Commits or aborts
(Table 2 Unlocked)
User C Transaction initiated
67Concurrency Control
- A Page is a pre-specified amount of data (4K, 8K,
etc.) which is read into memory from the database
(stored on the disk)
- Allows for some multi-user transactions, but
requires detailed checking
(i.e., are the records requested by a transaction
being used by a previous transaction)
68Concurrency Control
- ONLY the record requested is locked
- All other records are available for subsequent
transactions
- Generally suitable for most multi-user systems
- Only the individual field accessed is locked
- Excellent for multi-user systems
--- BUT ----
- Requires involved programmatic checking
69Concurrency Control
- All locks are Binary They are either locked or
unlocked
- Regardless of level of granularity, if locked the
data is unavailable to other transactions
- Multiple users can read, but NOT update, data
- If data is S-Locked, an X-Lock (Below) can not be
placed on it
- Exclusive Locks (X-Locks)
- Data can NOT be accessed, even for reading, by
other users
- If X-Locked, no other lock type can be placed on
it
70Concurrency Control
- Impasse resulting from two or more transactions
locking the same data at the same time
- Each must wait for the other to unlock the data
- Assume 2 people share a checking account and both
try to withdraw money from an ATM at the same
time
71Concurrency Control
- When accessed, all records necessary are X-Locked
- Other users must wait for the records to be
released
- Deadlock Detection/Resolution
- The DBMS periodically scans for deadlocks
- If detected, one of the transactions is
backed-out
- Any transactions made during the deadlock are
aborted
- When he resources become unlocked, the process is
restarted
(Note that this requires additional Computer
Resources)
72Concurrency Control
- UNIQUE, MONOTONIC (i.e., increasing) time applied
to each transaction
- One time stamp for last read
- One time stamp for last update
(additional record fields required)
- Read time stamp can not precede update time stamp
- Transaction is aborted and rescheduled
- Transaction submitted for processing in order of
time stamp
73Concurrency Control
- Versioning (Optimistic Management)
- Assumes that in most cases the same record will
NOT be accessed concurrently OR will simply be
read
- Each time a record is requested, the DBMS creates
a new record version
- Any changes made are made to the DB version
- The changed version is compared to the original
- If no conflicts exist, the version is accepted
- Otherwise, the changes are aborted, and the
system is rolled-back
74Concurrency Control
- Versioning (Optimistic Management)
- Consider our previous example
Check Against Original
Withdraw 300 New Balance 400
Check Against (new) Original
75 ??? Any Questions ???
76(No Transcript)