Information Resources Management

Slides: 89
Provided by: KevinSt4


1
Information Resources Management
  • April 17, 2001

2
Agenda
  • Administrivia
  • Database Architectures

3
Administrivia
  • Homework 8

4
Database Architectures
  • Centralized
  • Client-Server
  • Parallel - single site
  • Distributed - multiple sites

5
Database Architectures
(Diagram labels: Centralized (Parallel), Distributed, Client-Server, Function, Data)

6
Centralized
  • PC, Mini, or Mainframe
  • Single Database
  • Single Database Manager
  • One or More Users
  • Data and Function in One Place

7
Client-Server
  • PCs to Mainframes to Minis
  • PC to PC
  • Mainframe to Mainframe
  • Use Desktop Processing Power
  • Better User Interface
  • Greater Functionality
  • Retain Centralized Control of Data

8
Client-Server Basic Model
(Diagram: several clients each send a Request to one server, which returns a Result)

9
Servers
  • Supercomputer
  • Mainframe
  • Mini
  • PC Server
  • All retain all data

10
Client-Server Architecture
(Diagram labels: Data, Function, Thin Client, Fat Client, Server (Back-End), Client (Front-End))

11
Functionality
  • Presentation
  • I/O Processing
  • Validation
  • Business Rules
  • Application Logic
  • Data Management
  • Validation
  • Error Handling

12
Thin Client
  • Presentation Services Only
  • Accept Input
  • Format Output
  • Display
  • Server does all processing

13
Fat Client
  • Presentation
  • Validation
  • Application Logic - Programs
  • Data Management
  • Send SQL to Server
  • Server is just DBMS

14
In Between Client
  • Client
  • Presentation
  • Some Application Logic
  • Server
  • Some Application Logic
  • Data Management and Services

15
Benefits of Client-Server
  • Use Local Processing Power
  • Better User Interface
  • Some Functionality if System Down
  • Use Sunk Costs of PCs
  • Support Reengineering
  • Support Intranets
  • Flexibility, Scalability, Customizability

16
Challenges of Client-Server
  • Cost of (Upgraded) PCs
  • Network Reliance
  • Distributing Application Updates
  • Management of Complex System
  • Problem Identification and Resolution
  • Application Partitioning

17
Other Client-Server Architectures
  • Traditional is Two-Tiered (client-server)
  • Three-Tiered
  • Client-Application Server-DB Server
  • (PC - Mini - Mainframe)
  • (PC - PC Server - Mainframe)
  • Beyond Three
  • PC - PC Server - Web Server - Mini - Mainframe

18
Client-Server vs. Distributed
  • Client-Server: Application Distribution
  • Distributed: Data Distribution
  • Often, client-server is used to refer to
    either application distribution or data
    distribution or both.

19
Middleware
  • What if
  • Multiple databases (sources) need to be accessed
    from a single client?
  • Different kinds of clients?
  • Mix of clients and servers?
  • Want to take advantage of existing base of
    applications (legacy systems)?

20
Middleware
  • Fat Clients just send SQL transactions
  • Other types of transactions may be needed based
    on the server (system)

21
Middleware
Software that shields applications from the
complexity of the operating environment.
(Diagram: three clients connect through middleware to two legacy systems)

22
Types of Middleware
  • Transaction Processing (TP) Monitor
  • Database Middleware
  • Remote Procedure Call (RPC)
  • Message-Oriented Middleware (MOM)
  • Object-Request Brokers
  • (CORBA - ORB)

23
TP Monitor
  • Synchronous - sender must wait
  • Queuing
  • Message Delivery
  • Assured Delivery
  • Either Direction

24
Database Middleware
  • Variety of Clients/Platforms
  • Variety of Servers/DBMSs/Platforms
  • Specific to DB transactions (SQL)

25
Message-Oriented Middleware (MOM)
  • Asynchronous - clients do not wait
  • Queues - Queue Management/Recovery
  • Message Delivery
  • Assured Delivery
  • Either Direction
  • (like email or EDI, but for transactions only)
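
The asynchronous, queued delivery listed above can be sketched roughly as follows. This is only an illustration: Python's standard `queue` and `threading` modules stand in for a real message broker, and the names `send_transaction` and `server_worker` are hypothetical.

```python
import queue
import threading

# In-process queue standing in for the middleware's message queue (assumption).
message_queue = queue.Queue()
processed = []

def send_transaction(txn):
    """Client side: enqueue the transaction and return at once, without waiting."""
    message_queue.put(txn)

def server_worker():
    """Server side: take messages off the queue until a None sentinel arrives."""
    while True:
        txn = message_queue.get()
        if txn is None:
            break
        processed.append(txn.upper())  # placeholder for real processing

worker = threading.Thread(target=server_worker)
worker.start()
for txn in ["debit 10", "credit 25"]:
    send_transaction(txn)
send_transaction(None)  # tell the worker to stop
worker.join()
print(processed)
```

The client never blocks on the server; the queue holds each message until the server is ready, which is the contrast with the synchronous TP monitor.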

26
Advantages of Middleware
  • Leverage sunk costs (legacy systems)
  • Reduce development cost
  • Reduce development time
  • Increase responsiveness
  • Improve overall systems management
  • Consolidate diffuse information

27
Challenges of Middleware
  • Cost
  • Session management - Transaction state
  • Security
  • Network reliance
  • Diversity of systems - lack of standards
  • Constant technology change
  • Availability of talent
  • Middleware Management

28
Parallel and Distributed
  • Client-Server is an attempt to improve
    performance
  • Reduce time to execute a transaction - Parallel
  • Reduce time to get the data - Distributed

29
Parallel Systems
  • Single site for data
  • Very Large databases
  • Operations performed simultaneously

30
Parallel Database Architectures
  • Shared Memory
  • Shared Disk
  • Shared Nothing
  • Hierarchical

31
Shared Memory
(Diagram: three processors (P) sharing one memory (M))

32
Shared Memory
  • Advantages
  • Extremely efficient communications
  • Disadvantages
  • Max of 32/64 processors
  • Bus becomes bottleneck

33
Shared Disk
(Diagram: three processors (P), each with its own memory (M), sharing disks)

34
Shared Disk
  • Advantages
  • No bus bottleneck
  • Fault tolerance provided
  • Disadvantages
  • Disk access becomes bottleneck

35
Shared Nothing
(Diagram: three processors (P), each with its own memory (M), nothing shared)

36
Shared Nothing
  • Advantages
  • No disk bottleneck
  • Highly scalable
  • Disadvantages
  • High communication overhead/cost
  • Between processors
  • To another processor's data

37
Hierarchical
(Diagram: processor (P) and memory (M) nodes combined in a hierarchy)

38
Hierarchical
  • Advantages
  • Best of all worlds
  • Disadvantages
  • Worst of all worlds
  • Some high communication overhead/cost
  • Between subsystems
  • Complexity

39
Distributed Databases
  • Client-Server - distribute functionality
  • What about distributing data?

40
Distributed Databases
  • Overview
  • Distributed Storage
  • Distributed Queries
  • Distributed Transactions
  • Multidatabase (Middleware)

41
Distributed Databases
  • Multiple locations
  • Single logical database
  • Several physical databases
  • Network connections

42
Advantages
  • Sharing across locations
  • Local control
  • Availability

43
Challenges
  • Development costs
  • People and Equipment
  • Testing
  • Problem identification and resolution
  • Technical expertise
  • Network dependence
  • Increased processing overhead

44
Distributed Data Storage
  • Replication
  • Fragmentation
  • Both

45
Replication
  • Data is repeated
  • Spectrum of options available
  • Temporary replication of specific rows
  • Replicate infrequently changed data
  • Replicate by site
  • Central site - all / each local site - their data
    only
  • Full replication
  • Everything everywhere

46
Concerns with Replication
  • Availability needed
  • Amount of parallelism in reads
  • Overhead of updates
  • Keeping replicas updated
  • Conflicting updates

47
Fragmentation
  • Partitioning
  • Divide data into subsets based on need
  • Have to be able to pull back together to get
    original tables

48
Fragmentation
  • Horizontal
  • by rows
  • specified conditions
  • Vertical
  • by column
  • each requires primary key (or created key)
  • Mixed
  • by row and column

49
Fragmentation and Replication
  • Repeat as necessary
  • Replicate fragments
  • Fragment replicas
  • Don't lose track of what you have and where it
    is!

50
Network Transparency
  • Distributing data should not require that the
    user know where or how it's been distributed.
  • The database should be seen as a single entity no
    matter how fragmented and replicated it becomes.

51
Network Transparency
  • Some DBMSs are starting to provide this level of
    functionality so transparency exists even at the
    program level, but in many cases this
    transparency must be programmed into the
    applications.
  • It must always be designed into the database.

52
Distributed Queries
  • How do you query data that is everywhere?

53
Efficiency vs. Overhead
  • Splitting the query apart
  • Keeping track of the data/locations
  • Making sure everything gets executed
  • Putting the results back together
  • Generating network traffic
  • Handling partial results

54
Distributed Queries
  • Full replication can avoid the overhead
  • Huge increase in update overhead
  • Parallel execution no longer possible
  • Additional costs of replication

55
Example
  • 5 sites - NY, Pgh, Chicago, Dallas, Los Angeles
  • Data fragmented by site - no replication
  • Query (in Pgh)
  • SELECT Name, MAX(Salary) FROM Employee

56
Option 1 - High Bandwidth
  • 1. Have all sites send their full employee tables
    to Pgh.
  • 2. Build a temporary employee table.
  • 3. Run the query against this table.

57
Option 2 - Not so High Bandwidth
  • 1. Examine the query and determine it can be run
    separately at each location and the results
    combined.
  • 2. Submit just the query to each location.
  • 3. Wait for the results from each city.
  • 4. As results return, build a temporary table (5
    rows only).
  • 5. Find the max using the temporary table.
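
Option 2 can be sketched with in-memory SQLite databases standing in for the five sites. The employee rows are made up, and the per-site query is rewritten as `ORDER BY ... LIMIT 1` on the assumption that the intent is "name of the highest-paid employee"; the query as written on the example slide is not portable SQL in most DBMSs.

```python
import sqlite3

# Hypothetical per-site data; each site holds only its own fragment.
site_rows = {
    "NY": [("Ann", 90000), ("Bob", 70000)],
    "Pgh": [("Cy", 65000)],
    "Chicago": [("Dee", 82000), ("Eli", 95000)],
    "Dallas": [("Fay", 60000)],
    "Los Angeles": [("Gus", 88000)],
}

def make_site_db(rows):
    """Create an in-memory database standing in for one site's DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (Name TEXT, Salary INTEGER)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?)", rows)
    return conn

sites = {city: make_site_db(rows) for city, rows in site_rows.items()}

# Steps 2-3: ship only the query; each site returns its single best row.
LOCAL_QUERY = "SELECT Name, Salary FROM Employee ORDER BY Salary DESC LIMIT 1"
partial_results = [conn.execute(LOCAL_QUERY).fetchone() for conn in sites.values()]

# Steps 4-5: the temporary table has one row per site; take the max of it.
top_name, top_salary = max(partial_results, key=lambda row: row[1])
print(top_name, top_salary)
```

Only five rows cross the network here, instead of every site's full employee table as in Option 1.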

58
Distributed Transactions
  • Transaction Types
  • Coordinators
  • Commit Protocols
  • Concurrency Controls
  • Deadlocks

59
Transaction Types
  • Local - transaction only needs local data
  • Global - transaction uses non-local data
  • My global becomes someone else's local
  • Either type of transaction must still have ACID
    properties - global is the concern

60
System Structure
  • Things to do
  • 1. Process local transactions
  • (transaction manager)
  • 2. Process and track global transactions
  • (transaction coordinator)

61
Global Processing
  • 1. Recognize as global
  • 2. Break up transaction
  • 3. Distribute pieces
  • 4. Assemble results
  • 5. Coordinate termination
  • 6. Handle problems

62
Coordinator of Coordinators
  • Coordinate among sites
  • Detect problems
  • Attempt to fix
  • Share status with others

63
Coordinator Failure
  • Backup Coordinator
  • receives all messages - maintains state
  • monitors coordinator
  • automatically takes over if coordinator down
  • avoids delays - increases overhead
  • Election
  • highest pre-assigned number
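
The election rule above (highest pre-assigned number wins) can be sketched in a few lines. The function name and the `is_alive` callback are hypothetical; a real system would discover live sites by exchanging messages.

```python
def elect_coordinator(site_numbers, is_alive):
    """Return the highest pre-assigned number among live sites, or None."""
    live = [n for n in site_numbers if is_alive(n)]
    return max(live) if live else None

# Site 5 was the coordinator and has gone down.
down = {5}
new_coordinator = elect_coordinator([1, 2, 3, 4, 5], lambda n: n not in down)
print(new_coordinator)
```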

64
Commit Protocols
  • Two-Phase
  • Three-Phase
  • All sites must commit or all sites have to
    rollback
  • Replicated data only

65
Two-Phase Commit
  • Phase 1
  • Send PREPARE to all sites
  • Sites respond READY or ABORT
  • Phase 2
  • If all sites READY,
  • COMMIT locally - Send COMMITs
  • If not READY or time expires
  • ROLLBACK locally - Send ROLLBACK
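
The two phases above can be sketched as follows. This is a simplified model, not a production protocol: the `Site` class and its `can_commit` flag are hypothetical, messages are plain method calls, and the timeout case is left out.

```python
class Site:
    """Hypothetical participant; `can_commit` simulates its local outcome."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "ACTIVE"

    def prepare(self):
        """Phase 1: answer PREPARE with READY or ABORT."""
        self.state = "READY" if self.can_commit else "ABORT"
        return self.state

    def finish(self, decision):
        """Phase 2: apply the coordinator's COMMIT or ROLLBACK."""
        self.state = decision

def two_phase_commit(sites):
    # Phase 1: send PREPARE to all sites and collect their votes.
    votes = [site.prepare() for site in sites]
    # Phase 2: commit only if every site voted READY; otherwise roll back.
    decision = "COMMIT" if all(v == "READY" for v in votes) else "ROLLBACK"
    for site in sites:
        site.finish(decision)
    return decision

print(two_phase_commit([Site("NY"), Site("Pgh")]))
print(two_phase_commit([Site("NY"), Site("Dallas", can_commit=False)]))
```

A single ABORT vote is enough to roll back everywhere, which is what gives the all-or-nothing outcome.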

66
Two-Phase Commit
(Diagram: coordinator and three sites; a site requests commit)

67
Two-Phase Commit - Phase 1
(Diagram: coordinator sends PREPARE to all sites)

68
Two-Phase Commit - Phase 1
(Diagram: sites respond READY to the coordinator)

69
Two-Phase Commit - Phase 2
(Diagram: coordinator COMMITs locally)

70
Two-Phase Commit - Phase 2
(Diagram: coordinator sends COMMIT to all sites)

71
Two-Phase Commit - Phase 1
(Diagram: a site responds ABORT or does not respond)

72
Two-Phase Commit - Phase 2
(Diagram: coordinator ROLLBACKs locally)

73
Two-Phase Commit - Phase 2
(Diagram: coordinator sends ROLLBACK to all sites)

74
Site Failure - Recovery
  • COMMIT and ROLLBACK as normal
  • If READY only
  • Check with coordinator or other sites
  • Either COMMIT or ROLLBACK
  • If no one found, ROLLBACK

75
Coordinator Failure
  • Ask the sites
  • If one has COMMIT, then REDO
  • If one has ROLLBACK, then UNDO
  • If one doesn't have READY, UNDO
  • If all READY only
  • Coordinator must decide
  • Sites must wait and locks are held
  • Blocking occurs
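
The decision rules above can be written as one small function. The state names follow the bullets; the function name, the "ACTIVE" state for a site that never reached READY, and the "BLOCKED" return value are assumptions for illustration.

```python
def recover_decision(site_states):
    """Decide the outcome from the states reported by the surviving sites."""
    if "COMMIT" in site_states:
        return "REDO"      # someone already committed, so all must commit
    if "ROLLBACK" in site_states:
        return "UNDO"      # someone already rolled back
    if any(state != "READY" for state in site_states):
        return "UNDO"      # a site never voted READY, so no commit was sent
    return "BLOCKED"       # all READY: only the coordinator knows the decision

print(recover_decision(["READY", "COMMIT", "READY"]))
print(recover_decision(["READY", "READY", "READY"]))
```

The last branch is the blocking case: every site is READY, holds its locks, and has to wait for the coordinator to come back.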

76
Three-Phase Commit
  • Phase 1
  • Send PREPARE
  • Sites respond READY or ABORT
  • Phase 2
  • If all sites READY, send PRECOMMIT
  • Else, ROLLBACK
  • Sites must ACKNOWLEDGE
  • Phase 3
  • If at least K sites ACKNOWLEDGE, send COMMIT

77
Coordinator Failure
  • Three-Phase Commit prevents blocking
  • If coordinator fails
  • New coordinator is selected
  • Sites queried to determine status
  • New coordinator resumes

78
Network Partitioning
  • Network split creates two separate networks
  • Each half selects a coordinator
  • Coordinators make independent decisions
  • Result could be different decisions
  • Resolution of network problem may create need to
    resolve database problems

79
Concurrency Control
  • Single Lock Manager
  • Multiple Lock Managers

80
Single Lock Manager
  • One site for all locking
  • All other sites must go to it
  • Can read from anywhere
  • Updates must be to all copies
  • Advantages: Simple, easy deadlock detection
  • Disadvantages: Bottleneck, vulnerability

81
Simple Multiple Lock Mgrs
  • Each site locks a unique partition of the data
  • non-replicated data
  • Advantages: Fairly simple, reduced bottlenecks
  • Disadvantages: Complicated deadlock detection

82
Majority Protocol
  • Each site locks its own data
  • replication possible
  • Ask the owner for a lock on data that isn't local
  • When multiple owners, n/2 + 1 (majority) must
    provide the lock
  • Advantages: No bottlenecks
  • Disadvantages: More messages sent, complicated
    deadlock detection, more deadlocks (each gets 1/2)
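
The majority rule can be sketched as a simple count. The function names are hypothetical, `grants` is an assumed list with one boolean per replica's lock manager, and n/2 is taken as integer division.

```python
def majority_needed(n_replicas):
    """Smallest number of replicas that forms a majority: n/2 + 1."""
    return n_replicas // 2 + 1

def majority_lock_granted(grants):
    """True if a majority of the replicas' lock managers granted the lock."""
    return sum(grants) >= majority_needed(len(grants))

print(majority_needed(5))
print(majority_lock_granted([True, True, False]))
print(majority_lock_granted([True, True, False, False]))
```

The last call shows the "each gets 1/2" problem: with four replicas, two transactions can each hold two locks, neither has a majority, and both wait.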

83
Biased Protocol
  • Reduced form of Majority Protocol
  • For a READ, only need any single lock
  • For a WRITE, need all locks
  • Advantages: No bottlenecks, reduced traffic
  • Disadvantages: Update traffic, deadlocks
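
A sketch of the biased rule, under the same assumption that `grants` holds one boolean per replica's lock manager; the function name is hypothetical.

```python
def biased_lock_granted(mode, grants):
    """READ needs a lock from any one replica; WRITE needs locks from all."""
    if mode == "READ":
        return any(grants)
    if mode == "WRITE":
        return all(grants)
    raise ValueError(f"unknown lock mode: {mode}")

print(biased_lock_granted("READ", [False, True, False]))
print(biased_lock_granted("WRITE", [True, True, False]))
```

Reads are cheap because one message suffices, while every write must reach every copy, which is where the update traffic comes from.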

84
Primary Copy
  • Site designated to hold primary copy
  • Multiple sites
  • Replicated Data
  • All locks through that site
  • Advantages: Fairly simple, reduced bottlenecks
  • Disadvantages: Vulnerability, complicated
    deadlock detection

85
Other Than Locking
  • Timestamps
  • Centralized generation
  • Local generation
  • Timestamp tests determine ability to read or write
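
One common form of these timestamp tests is basic timestamp ordering, sketched below. Whether this exact variant is the one intended here is an assumption, and the `Item` class and function names are hypothetical.

```python
class Item:
    """A data item that remembers the latest read and write timestamps."""

    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def can_read(item, ts):
    """Reject a read that arrives after a younger transaction wrote the item."""
    if ts < item.write_ts:
        return False
    item.read_ts = max(item.read_ts, ts)
    return True

def can_write(item, ts):
    """Reject a write if a younger transaction already read or wrote the item."""
    if ts < item.read_ts or ts < item.write_ts:
        return False
    item.write_ts = ts
    return True

item = Item()
print(can_write(item, 5))  # first write is accepted
print(can_read(item, 3))   # older transaction must not see a newer write
print(can_read(item, 7))   # younger transaction may read
print(can_write(item, 6))  # rejected: the item was already read at timestamp 7
```

A rejected transaction would normally be rolled back and restarted with a new timestamp, so no locks are held and no deadlock can arise.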

86
Deadlocks and Distributed Data
  • Centralized
  • One Site
  • Distributed
  • Centralized - same advantages and disadvantages
    as other centralized control (database or locking)

87
Distributed Deadlock Detection
  • Each site tracks all transactions accessing its
    own data
  • Dummy transaction for transactions that
    originated here but are executing elsewhere
  • If deadlock found that includes dummy transaction
  • Must send deadlock information to other sites
  • They check for deadlock
  • May have to pass on to another site
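
The per-site check can be sketched with a local wait-for graph in which the dummy transaction appears as one extra node. The `"EXT"` node name, the dictionary representation, and the function names are assumptions; the search returns the first cycle it finds.

```python
def find_cycle(wait_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None."""
    def visit(node, path):
        if node in path:
            return path[path.index(node):]
        for nxt in wait_for.get(node, []):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in wait_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None

def check_site(wait_for, dummy="EXT"):
    """Classify the local graph: no deadlock, local deadlock, or must forward."""
    cycle = find_cycle(wait_for)
    if cycle is None:
        return "no deadlock"
    if dummy in cycle:
        return "forward to other sites"  # cycle may continue at another site
    return "local deadlock"

print(check_site({"T1": ["T2"], "T2": ["T1"]}))
print(check_site({"T1": ["EXT"], "EXT": ["T1"]}))
```

A cycle through the dummy node only suggests a deadlock; the site that receives the forwarded information has to merge it with its own graph and check again.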

88
Homework 9
  • Continuing with the Carnegie Library
  • Client/Server
  • Distributed Database