Title: Mariposa: a wide-area distributed database system
- Kumar Ramdurgkar.
- CIS 661
Mariposa Distributed Database Management System
- Principal Investigator: Prof. Michael Stonebraker
SECTION 1
LAN vs. WAN databases
- LAN database management is common, most often in industries where the data is local to the installation.
- A LAN database has a single RDBMS source.
- A LAN database is maintained under a well-defined set of rules, data types, and services.
- The difference?
WAN Databases
- Many databases interconnected over a WAN.
- In a WAN, many sites participate in the DBMS.
- Different site administrators.
- Different data types, extensions, and service-handling times.
- How do we interconnect them?
- What are the issues?
Issues and problems
- Network connections and traffic.
- Different load-handling capabilities and service times.
- Different data types and extensions.
- A single program acting as a query optimizer will NOT work.
- Continued
Issues and problems (continued)
- Cost-based optimization does not respond well to site-specific type extensions, access constraints, charging algorithms, and time-of-day constraints.
- LAN algorithms do not scale properly to WAN DBMSs.
- The solution?
An excellent idea! MARIPOSA
- UBID! Have you been there?
- Mariposa is a distributed DBMS built on the economic paradigm of bidding.
- Mariposa was proposed by Michael Stonebraker, Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl Staelin, and Andrew Yu.
- Proposed Nov 1994; accepted Sept 1995.
Mariposa vision
- A standard approach for distributed data.
- A set of standard guidelines for WAN databases.
- Query storage and optimization from a different perspective.
- Scalability and handling of data explosion.
- A query optimizer for the WWW?
- Need to formalize.
WAN Guidelines for Mariposa
- Scalability to a large number of cooperating sites.
- Data mobility.
- No global synchronization of data.
- Total local autonomy and complete control.
- Easily configurable policies for changing the behavior of Mariposa.
Mariposa System architecture
- Microeconomic mechanisms.
- All Mariposa clients and servers have an account with a network bank.
- A user allocates each query a budget in the currency of this bank.
- The goal of the query processing system is to solve the query within its allotted budget by contracting with various Mariposa sites to process parts of it.
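The bank-and-budget idea can be pictured as a shared ledger. The sketch below assumes a single in-memory ledger, and the names `NetworkBank`, `open_account`, and `transfer` are hypothetical. They are not Mariposa's interface; they only show a client paying a site out of its account.

```python
class NetworkBank:
    """Hypothetical in-memory ledger: every client and server holds an account."""

    def __init__(self):
        self.balances = {}

    def open_account(self, owner, initial=0.0):
        self.balances[owner] = initial

    def transfer(self, payer, payee, amount):
        # Refuse negative amounts and overdrafts, so a client cannot spend
        # more than it holds.
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if self.balances[payer] < amount:
            raise ValueError(f"{payer} has insufficient funds")
        self.balances[payer] -= amount
        self.balances[payee] += amount


bank = NetworkBank()
bank.open_account("client", 100.0)
bank.open_account("site_a")
bank.transfer("client", "site_a", 30.0)  # pay a site for one piece of a query
print(bank.balances)  # {'client': 70.0, 'site_a': 30.0}
```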
Mariposa Broker mechanism
- Obtains bids from sites for the pieces of a query.
- Uses a distributed advertising system instead of the usual metadata mechanisms used on a LAN.
- The server that has advertised the best time for the given query wins.
Scalability
- A site can join Mariposa by buying objects and advertising services.
- A site can leave Mariposa by selling its objects and ceasing to bid.
- Hence a highly scalable system.
- In fact, the success of Mariposa depends on a large number of sites participating in the system.
Storage decisions
- Objects have no notion of a home site.
- All secondary indices move with their objects.
- The economic paradigm makes it simpler to avoid global synchronization.
- Mariposa fosters data mobility and free trade of objects (here, an object means data).
Total local control
- Since each Mariposa site is free to bid on any business of interest, it has total local autonomy.
- Each site is expected to maximize its individual profit per unit of operating time and to bid on those queries that it feels will accomplish this goal.
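The profit-per-unit-time goal can be turned into a simple bid/no-bid test. The sketch below is hypothetical: it assumes a site knows the price it could charge, its own cost, how long the job would occupy it, and the rate it already earns (`current_rate`). None of these names come from Mariposa.

```python
def should_bid(price, cost, busy_time, current_rate):
    """Hypothetical bid/no-bid rule: bid only if the job's profit per unit of
    operating time is at least what the site already earns (current_rate)."""
    if busy_time <= 0:
        raise ValueError("busy_time must be positive")
    profit_rate = (price - cost) / busy_time
    return profit_rate >= current_rate


print(should_bid(price=50, cost=20, busy_time=10, current_rate=2.0))  # True (3.0 per unit time)
print(should_bid(price=25, cost=20, busy_time=10, current_rate=2.0))  # False (0.5 per unit time)
```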
Sounds good. Any drawbacks?
- Some queries may not be solvable, either because nobody will bid on them or because the minimum bid exceeds what the client is willing to pay.
- A site can refuse to give up objects.
- A site may not find buyers for objects that it wants to sell.
SECTION 2
Mariposa Architectural details
- Hardware flow chart
- Processes (bidding, bid protocols, acceptance, finding bidders, subquery bidding, network bidding, splitting and combining)
- Coding language (Rush)
- Mariposa experiments and results
- Conclusions
Architecture overview
- Client queries in SQL3.
- The middleware consists of several query separators and query brokers.
- The broker and bidder are coded in Rush.
- Local execution happens at the site that wins the bid.
- Details
Architecture details
Processes: Bidding
- Each query Q has a budget B(t) that can be used to solve the query.
- The budget is a value the user gives for solving this query, as a function of the delay t.
- The broker receives the query plan for Q and tries to obtain bids for and solve each fragment, using either the expensive bid protocol or the cheaper purchase order protocol.
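A budget B(t) lets the broker check whether a bid with cost C and delay D is affordable: the bid fits if C <= B(D). The sketch below assumes, purely for illustration, that B(t) falls linearly with delay. The helper names `make_budget` and `acceptable` are hypothetical.

```python
def make_budget(start, slope):
    """Hypothetical linearly decreasing budget curve B(t): the user pays up to
    `start` for an instant answer and `slope` less per unit of delay."""
    def budget(t):
        return max(0.0, start - slope * t)
    return budget


def acceptable(budget, cost, delay):
    # A bid is affordable only if its cost fits under the curve at its delay.
    return cost <= budget(delay)


B = make_budget(start=100.0, slope=5.0)
print(acceptable(B, cost=60.0, delay=4.0))   # True: B(4) = 80
print(acceptable(B, cost=60.0, delay=10.0))  # False: B(10) = 50
```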
Processes: Bidding
- Brokers split each query into subqueries and seek bids for each subquery.
- Subqueries execute in a set sequence.
- The broker finds the right winners with a greedy algorithm.
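The slides do not spell out the broker's greedy algorithm, so the sketch below shows one plausible greedy (hill-climbing) selection. It is not claimed to be Mariposa's exact procedure. It assumes that subqueries run strictly in sequence, so delays add up, and that the broker maximizes the user's surplus B(total delay) minus total cost. `Bid` and `greedy_select` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    site: str
    cost: float
    delay: float


def total(plan):
    # Assumption: subqueries run in a fixed sequence, so costs and delays add up.
    return sum(b.cost for b in plan), sum(b.delay for b in plan)


def greedy_select(bids_per_subquery, budget):
    """Hypothetical greedy winner selection: start from the cheapest bid for
    every subquery, then repeatedly make the single bid swap that most
    increases the surplus B(delay) - cost, until no swap helps."""
    plan = [min(bids, key=lambda b: b.cost) for bids in bids_per_subquery]

    def surplus(p):
        cost, delay = total(p)
        return budget(delay) - cost

    while True:
        best, best_gain = None, 0.0
        for i, bids in enumerate(bids_per_subquery):
            for bid in bids:
                candidate = plan[:i] + [bid] + plan[i + 1:]
                gain = surplus(candidate) - surplus(plan)
                if gain > best_gain:
                    best, best_gain = candidate, gain
        if best is None:
            break
        plan = best
    cost, delay = total(plan)
    return plan if cost <= budget(delay) else None  # None: no affordable plan


bids = [[Bid("a", 10, 8), Bid("b", 15, 2)], [Bid("c", 5, 5)]]
plan = greedy_select(bids, lambda t: max(0.0, 100 - 5 * t))
print([b.site for b in plan])  # ['b', 'c']
```

Each accepted swap strictly increases the surplus, so the loop always terminates. Like any greedy method, however, it can stop at a local optimum.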
Processes: Bid Protocols
- The expensive bid protocol has 2 phases:
- Phase 1: the broker sends requests, and each bidder sends back a triplet (Ci, Di, Ei), meaning it will process subquery Qi at cost Ci with a delay of Di, and the bid expires at Ei.
- Phase 2: the broker notifies the winners (and losers).
- The purchase order protocol is faster: the broker sends the query directly to the site most likely to process it. There is a risk that the query might not be processed in the given time.
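The two protocols can be contrasted in a few lines. This is a hedged sketch with several assumptions: each bidder is a callable that returns a hypothetical `BidReply` (C, D, E) or `None` to decline, and the broker simply picks the cheapest unexpired bid. The real broker weighs bids against B(t). `send` stands in for whatever transport delivers a purchase order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BidReply:
    site: str
    cost: float     # C_i
    delay: float    # D_i
    expires: float  # E_i: time after which the bid no longer holds


def expensive_bid(subquery, bidders, now):
    """Phase 1: ask every bidder for a (C, D, E) triplet. Phase 2: pick the
    cheapest unexpired bid and tell every bidder whether it won or lost."""
    replies = [ask(subquery) for ask in bidders.values()]
    live = [r for r in replies if r is not None and r.expires > now]
    winner = min(live, key=lambda r: r.cost, default=None)
    notices = {site: (winner is not None and site == winner.site) for site in bidders}
    return winner, notices


def purchase_order(subquery, likely_site, send):
    # One message, no bidding round: faster, but the site may not finish in time.
    return send(likely_site, subquery)


bidders = {
    "berkeley": lambda q: BidReply("berkeley", 12.0, 3.0, expires=50.0),
    "ucsd": lambda q: BidReply("ucsd", 9.0, 6.0, expires=5.0),  # expired at now=10
    "ucsb": lambda q: None,                                     # declines to bid
}
winner, notices = expensive_bid("Q1", bidders, now=10.0)
print(winner.site, notices)  # berkeley {'berkeley': True, 'ucsd': False, 'ucsb': False}
```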
Finding Bidders
- Brokers examine ad tables to find the servers that are willing to perform the task at hand.
- A server posts its bids as records in an ad table.
- Ad tables typically hold bidding information for the sample query structures run on that server.
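An ad table lookup amounts to filtering advertisements by query structure and validity. The sketch below uses a hypothetical `Ad` record with illustrative fields (template, server, price, delay, expiration). These are not Mariposa's actual ad-table columns. The sketch returns matching servers, cheapest first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    # Hypothetical ad-table record; field names are illustrative only.
    template: str   # query structure the server advertises for
    server: str
    price: float
    delay: float
    expires: float


def find_bidders(ad_table, template, now):
    """Return servers whose unexpired ads match the query template, cheapest first."""
    matches = [ad for ad in ad_table if ad.template == template and ad.expires > now]
    return [ad.server for ad in sorted(matches, key=lambda ad: ad.price)]


ads = [
    Ad("scan(R1)", "berkeley", 8.0, 2.0, 100.0),
    Ad("scan(R1)", "ucsd", 5.0, 4.0, 100.0),
    Ad("join(R1,R2)", "ucsb", 20.0, 6.0, 100.0),
    Ad("scan(R1)", "ucsb", 1.0, 9.0, 3.0),  # expired before now=10
]
print(find_bidders(ads, "scan(R1)", now=10.0))  # ['ucsd', 'berkeley']
```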
Sample Ad Table design
- Not all fields need to be used.
Bidding strategies
- Bulk purchase contracts allowing lower-than-normal (wholesale) bids.
- Coupons.
- Sales.
- Broker intelligence (remember the history of the last successful bids and try that site/query combination again).
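The strategies above can be illustrated with a toy pricing function and a broker memory. Everything in this sketch is hypothetical: the rates, the thresholds, the order in which discounts apply, and the names `quote` and `BrokerMemory`. It only shows how wholesale, sale, and coupon adjustments might combine, and how a broker might recall past winners.

```python
def quote(base_price, quantity=1, bulk_threshold=10, bulk_discount=0.2,
          coupon=0.0, sale=0.0):
    """Hypothetical bidder pricing: wholesale discount for bulk contracts,
    then a percentage sale, then a fixed-value coupon (never below zero).
    All rates and thresholds are illustrative, not Mariposa's values."""
    price = base_price * quantity
    if quantity >= bulk_threshold:
        price *= 1 - bulk_discount
    price *= 1 - sale
    return max(0.0, price - coupon)


class BrokerMemory:
    """Remember which site last won each kind of query, to try it again first."""

    def __init__(self):
        self.last_winner = {}

    def record(self, template, site):
        self.last_winner[template] = site

    def preferred(self, template):
        return self.last_winner.get(template)


memory = BrokerMemory()
memory.record("scan(R1)", "ucsd")
print(quote(10.0, coupon=3.0), memory.preferred("scan(R1)"))  # 7.0 ucsd
```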
Processes: Network Bidding
- Accounts for network bandwidth.
- Data size comes into consideration.
- The minimum available bandwidth is calculated from node to node.
- This bandwidth must be reserved to achieve the desired performance.
- Mariposa uses the Tenet protocols RTIP and RCAP for network bidding.
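The node-to-node bandwidth calculation comes down to a bottleneck computation: the slowest link on the path limits the transfer. The sketch below assumes per-hop bandwidths in bytes per second. The link values are hypothetical, and the sketch ignores latency and protocol overhead, which a real network bid would also have to account for.

```python
def bottleneck_bandwidth(path_links):
    """Minimum available bandwidth along a node-to-node path (bytes/s)."""
    return min(path_links)


def transfer_time(data_bytes, path_links):
    # The slowest link limits the whole transfer; latency and protocol
    # overhead are ignored in this simplified estimate.
    return data_bytes / bottleneck_bandwidth(path_links)


links = [10e6, 1.5e6, 10e6]  # hypothetical per-hop bandwidths, bytes/s
print(transfer_time(3e6, links))  # 2.0 (seconds, over the 1.5e6 B/s bottleneck)
```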
Coding (Rush language)
- Mariposa provides a low-level, very efficient embedded scripting language and rule system called Rush.
- Using Rush, it is straightforward to change policy decisions: one simply modifies the rules by which these modules are implemented.
- The Mariposa architecture is primarily coded in Rush.
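The point of a rule system is that policy lives in replaceable rules rather than in fixed code. The sketch below illustrates that idea with a generic event-condition-action engine. It is written in plain Python, not Rush syntax, and `RuleEngine`, `on`, and `fire` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RuleEngine:
    """Hypothetical event-condition-action rule set: policy lives in
    replaceable rules rather than in fixed code. Plain Python, not Rush."""
    rules: dict = field(default_factory=dict)

    def on(self, event, condition: Callable, action: Callable):
        self.rules.setdefault(event, []).append((condition, action))

    def fire(self, event, **ctx):
        # Run the action of every rule for this event whose condition holds.
        return [action(ctx) for cond, action in self.rules.get(event, []) if cond(ctx)]


bidder = RuleEngine()
bidder.on("bid_request", lambda c: c["load"] < 0.8, lambda c: ("bid", c["query"]))
print(bidder.fire("bid_request", query="Q1", load=0.5))  # [('bid', 'Q1')]
print(bidder.fire("bid_request", query="Q2", load=0.9))  # []
```

In this sketch, changing the bidding policy means replacing the entries in `bidder.rules`; the engine itself stays unchanged.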
SECTION 3
- Mariposa experiments and results
Operational system
- Mariposa is operational on Digital Equipment Corp. Alpha AXP workstations at UC Berkeley.
- The basic server engine is that of POSTGRES.
- Implementing the Rush language itself required careful design and performance engineering.
- A multithreaded network communication package was required.
Experiment setup
- Workstations connected by 10 Mb/s Ethernet.
- WAN experiments were conducted at night.
- The benchmark database consists of three tables, R1, R2 and R3.
- The workload query is an equijoin of all three tables:
  SELECT * FROM R1, R2, R3 WHERE R1.u1 = R2.u1 AND R2.u1 = R3.u1
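The workload query can be run on a single machine to see what the three-way equijoin returns. This sketch uses Python's built-in `sqlite3` module with tiny hypothetical tables; the `payload` column and all row values are invented for illustration. It assumes the query selects all columns (`SELECT *`) and joins on equality of `u1`. It reproduces only the query semantics, not the benchmark data or the distributed execution.

```python
import sqlite3

# Hypothetical toy data; the real benchmark tables are not reproduced here.
conn = sqlite3.connect(":memory:")
for name in ("R1", "R2", "R3"):
    conn.execute(f"CREATE TABLE {name} (u1 INTEGER, payload TEXT)")
conn.executemany("INSERT INTO R1 VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO R2 VALUES (?, ?)", [(1, "d"), (3, "e")])
conn.executemany("INSERT INTO R3 VALUES (?, ?)", [(3, "f"), (4, "g")])

rows = conn.execute(
    "SELECT * FROM R1, R2, R3 WHERE R1.u1 = R2.u1 AND R2.u1 = R3.u1"
).fetchall()
print(rows)  # [(3, 'c', 3, 'e', 3, 'f')]
```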
- In the wide-area case, the query originates at Berkeley and performs the join over the WAN connecting UC Berkeley, UC Santa Barbara, and UC San Diego.
Timing Results
Conclusions
- Mariposa is a prototype data management system that unifies the best features of distributed operating system and distributed database management system research.
- Distributed query optimization has been identified as an area that will receive strong emphasis; the Mariposa team will also examine how to build a system with a rule system at its core.
Conclusions (continued)
- Future work remains in the areas of system robustness, distributed failure recovery, and performance assessment.
References
- Mariposa home: http://s2k-ftp.cs.berkeley.edu:8000/mariposa/index.html
Thank you.