Title: Modern Trends in Databases
1Modern Trends in Databases
- Database Systems Lecture 17
2Last Week
- Database Security and Integrity
- Aspects of security
- Access to databases
- Making sure the correct data goes in.
- 1) Privileges
- 2) Views
- 3) Integrity constraints
3The Penultimate Lecture
- GOOD Modern Databases
- Distributed DBs
- Web-based DBs
- Multimedia DBs
4Distributed Databases
- Distributed database management system (DDBMS)
- A DBMS (or set of them) to control the databases
- Communication software to handle interaction
between sites
- A distributed DB system consists of several sites
- Sites are connected by a network
- Each site can hold data and process it
- It shouldn't matter where the data is - the
system is a single entity
5Types of Distribution
- There are two basic options with which we will be
concerned when it comes to distribution
- Distributed processing
- Distributed data
- Distributed data requires distributed processing, so
distributed data with non-distributed processing is the
one combination that cannot occur; otherwise neither of
these necessarily implies the other
6Distributed Processing
[Figure: groups of clients in New York, London, Moscow and Beeston connected by a wide area network to a single DBMS]
7What is a Distributed Database?
- A distributed database system is a collection of
logically related databases that co-operate in a
transparent manner.
- There should be location independence
- i.e. because the user is unaware of where the data is
located, it is possible to move the data from one
physical location to another without affecting
the user.
8Distributed Database
[Figure: New York, London, Moscow and Beeston each have their own clients and their own DBMS; the four sites are connected by a wide area network]
9Reasons for Distribution
- Reduced Communication Overhead: Most data access
is local, less expensive and performs better.
- Improved Processing Power: Instead of one server
handling the full database, we now have a
collection of machines handling the same
database.
- Removal of Reliance on a Central Site: If a
server fails, then the only part of the system
that is affected is the relevant local site. The
rest of the system remains functional and
available.
10Reasons for Distribution
- Expandability: It is easier to accommodate
increasing the size of the global (logical)
database.
- Local autonomy: The database is brought nearer
to its users. This can effect a cultural change
as it allows potentially greater control over
local data.
11Reasons against Distribution
- Complexity (distributed database systems,
especially, are considerably more complex than
centralized or client/server ones)
- Security (more opportunities for protection
failure or attack)
- Software management costs
- Lack of standards
- Data integrity more difficult to maintain
12Transparency
- To obtain the benefits of distributed data
without incurring added operational complexity,
distributed database systems should be
transparent
- What is transparency?
- A transparent distributed database system would
look, to a user, just like a centralized database
system.
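A minimal sketch of what location transparency could look like from the user's side. Two in-memory SQLite databases stand in for remote sites; the class name `TransparentDatabase`, the helper `make_site` and the sample rows are assumptions made for illustration, not part of any real DDBMS.

```python
import sqlite3


def make_site(rows):
    """Create one site's local database (in-memory SQLite stands in for a remote DBMS)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?, ?)", rows)
    return conn


class TransparentDatabase:
    """Hypothetical front end: the caller never names a site."""

    def __init__(self, sites):
        self.sites = sites  # site name -> connection

    def query(self, sql, params=()):
        """Run the same query at every site and concatenate the results."""
        results = []
        for conn in self.sites.values():
            results.extend(conn.execute(sql, params).fetchall())
        return results


db = TransparentDatabase({
    "New York": make_site([("A-305", "Hillside", 500)]),
    "Beeston": make_site([("A-177", "Valleyview", 205)]),
})

# The user writes an ordinary query; where each row lives is hidden.
rows = db.query("SELECT account_number FROM account WHERE balance > ?", (100,))
```

Concatenating per-site results is only correct for simple selections like this one; aggregates, joins across sites and updates need considerably more machinery, which is part of the complexity listed under the reasons against distribution.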
13Fragmentation
- When you split data up over separate locations
you have to make a choice
- Do you split up the rows of a table, or the
columns of a table?
- These are horizontal and vertical fragmentation
respectively.
14Horizontal Fragmentation
New York
branch_name   account_number   balance
Hillside      A-305            500
Hillside      A-226            336
Hillside      A-155            62

Beeston
branch_name   account_number   balance
Valleyview    A-177            205
Valleyview    A-402            10000
Valleyview    A-408            1123
Valleyview    A-639            750
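A minimal sketch of horizontal fragmentation using the account rows from this slide. Whole rows are assigned to a site by branch_name, and the logical table is recovered as the union of the fragments. The function names `fragment_horizontally` and `union_fragments` are hypothetical.

```python
# The full (logical) account relation: (branch_name, account_number, balance).
ACCOUNT = [
    ("Hillside", "A-305", 500),
    ("Hillside", "A-226", 336),
    ("Hillside", "A-155", 62),
    ("Valleyview", "A-177", 205),
    ("Valleyview", "A-402", 10000),
    ("Valleyview", "A-408", 1123),
    ("Valleyview", "A-639", 750),
]

# As on the slide: Hillside rows are held in New York, Valleyview rows in Beeston.
SITE_FOR_BRANCH = {"Hillside": "New York", "Valleyview": "Beeston"}


def fragment_horizontally(rows):
    """Assign each complete row to one site according to its branch_name."""
    fragments = {site: [] for site in SITE_FOR_BRANCH.values()}
    for row in rows:
        fragments[SITE_FOR_BRANCH[row[0]]].append(row)
    return fragments


def union_fragments(fragments):
    """Rebuild the logical relation as the union of all fragments."""
    return sorted(row for fragment in fragments.values() for row in fragment)


fragments = fragment_horizontally(ACCOUNT)
```

Every fragment has the same columns as the original table; only the set of rows differs from site to site.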
15Vertical Fragmentation
New York
branch_name   customer_name   tuple_id
Hillside      Lowman          1
Hillside      Camp            2
Valleyview    Camp            3
Valleyview    Kahn            4

Beeston
account_number   balance   tuple_id
A-305            500       1
A-226            336       2
A-177            205       3
A-402            10000     4
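A minimal sketch of vertical fragmentation using the rows from this slide. Each site keeps a subset of the columns plus the shared tuple_id, and the logical table is recovered by joining the fragments on tuple_id. The function names `fragment_vertically` and `join_fragments` are hypothetical.

```python
# The full (logical) relation:
# (tuple_id, branch_name, customer_name, account_number, balance)
DEPOSIT = [
    (1, "Hillside", "Lowman", "A-305", 500),
    (2, "Hillside", "Camp", "A-226", 336),
    (3, "Valleyview", "Camp", "A-177", 205),
    (4, "Valleyview", "Kahn", "A-402", 10000),
]


def fragment_vertically(rows):
    """Split the columns between two sites; tuple_id is kept in both."""
    new_york = [(tid, branch, customer) for tid, branch, customer, _, _ in rows]
    beeston = [(tid, account, balance) for tid, _, _, account, balance in rows]
    return new_york, beeston


def join_fragments(new_york, beeston):
    """Rebuild the logical relation by joining the fragments on tuple_id."""
    by_id = {tid: (account, balance) for tid, account, balance in beeston}
    return [
        (tid, branch, customer, *by_id[tid])
        for tid, branch, customer in new_york
    ]


new_york, beeston = fragment_vertically(DEPOSIT)
```

The tuple_id column is what makes the split reversible: without a key shared by both fragments there would be no way to pair the two halves of a row again.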
16Transactions in Distributed Database Systems
- Transactions in a distributed database system may
be either global or local
- Support for global transactions is provided by
the DDBMS (Distributed DBMS) in a true
distributed database system.
- This is not a simple task.
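One reason global transactions are hard is that every site involved must reach the same outcome. The lecture does not name a protocol; the sketch below uses two-phase commit, a widely used approach, purely as an illustration. The `Site` class and `global_transfer` function are hypothetical, and failure handling (crashes, timeouts, logging) is deliberately left out.

```python
class Site:
    """A participant holding one local balance (hypothetical, for illustration)."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.pending = None

    def prepare(self, delta):
        """Phase 1: vote yes only if the local change could be applied."""
        if self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self):
        """Phase 2, all sites voted yes: apply the pending change."""
        if self.pending is not None:
            self.balance += self.pending
            self.pending = None

    def abort(self):
        """Phase 2, some site voted no: discard the pending change."""
        self.pending = None


def global_transfer(source, target, amount):
    """Move money between two sites so that both change or neither does."""
    changes = [(source, -amount), (target, amount)]
    votes = [site.prepare(delta) for site, delta in changes]
    if all(votes):
        for site, _ in changes:
            site.commit()
        return True
    for site, _ in changes:
        site.abort()
    return False


new_york = Site("New York", 500)
beeston = Site("Beeston", 205)
ok = global_transfer(new_york, beeston, 100)        # both sites can apply it
refused = global_transfer(new_york, beeston, 1000)  # New York votes no
```

After the first call both balances change together; after the second neither changes, because a single "no" vote aborts the transaction at every site.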
17Distributed vs. centralized
- Both have pros and cons
- In other words, if you choose a distributed
database you are spreading out lots of small
headaches rather than having one central migraine.
18[Figure: terminals attached through a network connection to a mainframe computer; labelled layers: presentation logic, business logic, data logic]
19Client/Server Architecture
- The client/server architecture is a general model
for systems where a service is provided by one
system (the server) to another (the client)
- Server
- Hosts the DBMS and database
- Stores the data
- Client
- User programs that use the database
- Use the server for database access
20Client/Server Architecture
[Figure: a server hosting the DBMS and DB, with Client 1, Client 2 and Client 3 connected to it; labelled layers: data logic, presentation logic, business logic]
21Transactions in Client/Server Systems
- Transactions in a single server environment are
simply the same as in a centralized system
- Transactions in a multi-server system are
server-oriented.
- That is, a single transaction cannot involve
multiple servers because the servers operate
completely independently of each other
22Web-based Databases
- Database access over the internet
- Web-based clients
- Web server
- Database server(s)
- Web server serves pages to browsers (clients) and
can access database(s)
- Typical operation
- Client sends a request for a page to the web
server
- Web server sends SQL to database
- The web server uses results to create page
- The page is returned to the client
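The typical operation above can be sketched in a few lines. This is an illustration under stated assumptions: an in-memory SQLite database stands in for the database server, `handle_request` stands in for the web server's script, and the table and sample rows are invented for the example.

```python
import sqlite3
from html import escape


def make_database():
    """Stand-in for the database server (in-memory SQLite with sample rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)"
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?, ?)",
        [("A-305", "Hillside", 500), ("A-177", "Valleyview", 205)],
    )
    return conn


def handle_request(conn, branch_name):
    """Stand-in for the web server: query the database, then build the page."""
    # The web server sends SQL to the database.
    rows = conn.execute(
        "SELECT account_number, balance FROM account "
        "WHERE branch_name = ? ORDER BY account_number",
        (branch_name,),
    ).fetchall()
    # The web server uses the results to create the page.
    items = "".join(
        f"<li>{escape(account)}: {balance}</li>" for account, balance in rows
    )
    # The finished page is what gets returned to the client.
    return f"<html><body><h1>{escape(branch_name)}</h1><ul>{items}</ul></body></html>"


page = handle_request(make_database(), "Hillside")
```

The client only ever sees the HTML; the table name, column names and SQL stay on the server side.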
23Web-based Databases
Client (Browser)
Web Server
Database Server
24Web-based Databases
- Advantages
- World-wide access
- Internet protocols (HTTP, SSL, etc) give uniform
access and security
- Database structure is hidden from clients
- Uses a familiar interface
- Disadvantages
- Security can be a problem if you are not
extremely careful
- Interface is less flexible using standard
browsers
- Limited interactivity over slow connections
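One concrete way security goes wrong, offered as an example rather than something the lecture names, is SQL injection: building a query by pasting user input into the SQL text. The sketch below contrasts that with a parameterised query. The table, the sample rows and the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def lookup_unsafe(name):
    """Do NOT do this: user input becomes part of the SQL text."""
    sql = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def lookup_safe(name):
    """Pass user input as a parameter so it is always treated as data."""
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)
    ).fetchall()


# Input a hostile web user might type into a form field.
malicious = "nobody' OR '1'='1"

leaked = lookup_unsafe(malicious)   # the WHERE clause is now always true
blocked = lookup_safe(malicious)    # no user has this literal name
```

With the unsafe version the attacker's text rewrites the query and every row comes back; with the parameterised version the same text is just an unusual name that matches nothing.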
25Microsoft anyone?
Internet Explorer
ASP.NET
MS SQL Server
26Corporate Style
Internet Explorer
JSP
Oracle
27Open Source
Firefox
PHP
PHP is a scripting language originally designed
for producing dynamic web pages. It has evolved
to include a command line interface capability
and can be used in standalone graphical
applications.
MySQL
28Even more choice-o-rama
Firefox
PHP
Perl
Ruby
Python
Python is an interpreted, object-oriented,
high-level programming language for Rapid
Application Development, as well as for use as a
scripting or glue language to connect existing
components together.
PostgreSQL
29Web Based Approaches
- The scripting language generates the query
depending on what you, the web user, request. It
then takes the results and formats them into HTML
and JavaScript.
30Multimedia Databases
- Multimedia DBs can store complex information
- Images
- Music and audio
- Video and animation
- Full texts of books
- Web pages
- They can be used in a wide range of application
areas
- Entertainment
- Marketing
- Medical imaging
- Digital publishing
- Geographic Information Systems
31Querying Multimedia DBs
- Metadata searches
- Information about the multimedia data (metadata)
is stored
- This can be kept in a standard relational
database and queried normally
- Limited by the amount of metadata available
- Content searches
- The multimedia data is searched directly
- Potential for much more flexible search
- Depends on the type of data being used
- Often difficult to determine what the correct
results are
32Metadata Searches
- Example - indexing films we might store
- Title
- Year
- Genre(s)
- Actor(s)
- Director(s)
- Producer(s)
- Keywords (!)
- We can then search for things like
- Films starring Kevin Spacey
- Films directed by Peter Jackson
- Dramas produced in 2000
- We don't actually search the films themselves.
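A minimal sketch of the three example searches as ordinary relational queries. The schema is an assumption (one genre and one director per film, to keep it short), and the film titles and the names "Director X", "Director Y" and "Actor Z" are placeholders, not real filmography data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film (
    film_id  INTEGER PRIMARY KEY,
    title    TEXT,
    year     INTEGER,
    genre    TEXT,
    director TEXT
);
CREATE TABLE film_actor (
    film_id INTEGER REFERENCES film(film_id),
    actor   TEXT
);
""")
# Placeholder rows for illustration only.
conn.executemany("INSERT INTO film VALUES (?, ?, ?, ?, ?)", [
    (1, "Sample Film A", 2000, "Drama", "Director X"),
    (2, "Sample Film B", 2001, "Fantasy", "Peter Jackson"),
    (3, "Sample Film C", 1999, "Drama", "Director Y"),
])
conn.executemany("INSERT INTO film_actor VALUES (?, ?)", [
    (1, "Kevin Spacey"), (3, "Kevin Spacey"), (2, "Actor Z"),
])


def films_starring(actor):
    rows = conn.execute(
        "SELECT f.title FROM film f JOIN film_actor a ON a.film_id = f.film_id "
        "WHERE a.actor = ? ORDER BY f.title", (actor,))
    return [title for (title,) in rows]


def films_directed_by(director):
    rows = conn.execute(
        "SELECT title FROM film WHERE director = ? ORDER BY title", (director,))
    return [title for (title,) in rows]


def films_by_genre_and_year(genre, year):
    rows = conn.execute(
        "SELECT title FROM film WHERE genre = ? AND year = ? ORDER BY title",
        (genre, year))
    return [title for (title,) in rows]
```

Calling `films_starring("Kevin Spacey")`, `films_directed_by("Peter Jackson")` or `films_by_genre_and_year("Drama", 2000)` touches only these small tables; the film content itself is never read.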
33Metadata Searches
- Advantages
- Metadata can be structured in a traditional DBMS
- Metadata is generally concise and so efficient to
store
- Metadata enriches the content
- Disadvantages
- Metadata can't always be found automatically, and
so requires data entry
- It restricts the sorts of queries that can be made
34Content Searches
- An alternative to metadata is to search the
content directly
- Multimedia is less structured than metadata
- It is a richer source of information but harder
to process
- Examples of content-based retrieval
- Find images similar to a given sample
- Hum a tune and find out what it is
- Search for features, such as cuts or transitions
in films
35Content-Based Retrieval
QBIC (Query By Image Content) from IBM -
searches for images having similar colour or
layout
http://wwwqbic.almaden.ibm.com/cgi-bin/stamps-demo
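A rough sketch of searching by similar colour, one of the ideas mentioned above. This is not QBIC's actual algorithm: it is a simple colour-histogram comparison chosen for illustration, with images represented as plain lists of (r, g, b) pixels. The function names and the tiny synthetic "images" are assumptions.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Normalised colour histogram of a non-empty list of (r, g, b) pixels (0-255)."""
    step = 256 // bins_per_channel

    def bin_of(value):
        # Clamp so that 255 always falls in the last bin.
        return min(value // step, bins_per_channel - 1)

    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        index = (bin_of(r) * bins_per_channel + bin_of(g)) * bins_per_channel + bin_of(b)
        hist[index] += 1
    return [count / len(pixels) for count in hist]


def similarity(hist_a, hist_b):
    """Histogram intersection: near 1.0 for matching colour mixes, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def most_similar(sample, library):
    """Return the name of the library image whose colours best match the sample."""
    sample_hist = colour_histogram(sample)
    return max(
        library,
        key=lambda name: similarity(sample_hist, colour_histogram(library[name])),
    )


# Tiny synthetic images: mostly-red rose, mostly-red car, all-green field.
red_rose = [(220, 20, 30)] * 90 + [(30, 120, 40)] * 10
library = {
    "red_car": [(230, 10, 20)] * 80 + [(40, 40, 40)] * 20,
    "green_field": [(30, 130, 50)] * 100,
}
best = most_similar(red_rose, library)
```

Here the rose sample matches the red car, not anything flower-like: colour alone says nothing about what an image depicts, which is why judging similarity is harder than it first appears.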
36Content-Based Retrieval
- Image retrieval is hard.
- It is often not clear when two images are
similar
- Image interpretation is unsolved and expensive
- Different people expect different things
- What do we look for?
- Images of roses?
- Images of red things?
- Images of flowers?
- Images of red flowers?
- Images of red roses?