Mariposa: a wide-area distributed database system

1
Mariposa: a wide-area distributed database system
  • Kumar Ramdurgkar.
  • CIS 661

2
Mariposa Distributed Database Management System
Principal Investigator: Prof. Michael Stonebraker

3
SECTION 1
  • Introduction to Mariposa

4
LAN vs. WAN databases
  • LAN database management is common and is most
    often used in industries where the data is local
    to the installation.
  • A LAN has a single RDBMS source.
  • A LAN database is maintained by a well-defined
    set of rules, data types, and services.
  • The difference?

5
WAN Databases
  • Many databases are interconnected over a WAN.
  • In a WAN, many sites participate in the DBMS.
  • Different site administrators.
  • Different data types, extensions, and service
    handling times.
  • How do we interconnect?
  • What are the issues?

6
Issues and problems
  • Network connections and traffic.
  • Different load-handling capabilities and
    service times.
  • Different data types and extensions.
  • A single program acting as a query optimizer will
    NOT work.
  • (continued)

7
Issues and problems
  • Cost-based optimization does not respond well to
    site-specific type extensions, access
    constraints, charging algorithms, and time-of-day
    constraints.
  • LAN algorithms do not scale properly to suit a
    WAN DBMS.
  • The solution?

8
An excellent idea! MARIPOSA
  • UBID!! Have you been there?
  • Mariposa is a distributed DBMS built on the
    economic paradigm of bidding.

Mariposa was proposed by Michael Stonebraker,
Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam
Sah, Jeff Sidell, Carl Staelin, and Andrew Yu.
Proposed Nov 1994. Accepted Sept 1995.

9
Mariposa vision
  • Standard approach for distributed data.
  • A set of standard guidelines for WAN databases.
  • Application of query storage and optimization
    using a different perspective.
  • Scalability and data explosion handling.
  • A query optimizer for the WWW?
  • Need to formalize

10
WAN Guidelines for Mariposa
  • Scalability to a large number of cooperating
    sites.
  • Data mobility.
  • No global synchronization of data.
  • Total local autonomy and complete control.
  • Easily configurable policies for changing the
    behavior of Mariposa.

11
Mariposa System architecture
  • Microeconomic mechanisms.
  • All Mariposa clients and servers have an account
    with a network bank.
  • A user allocates a budget in the currency of this
    bank to each query.
  • The goal of the query processing system is to
    solve the query within the allotted budget by
    contracting with various Mariposa sites.

12
Mariposa Broker mechanism
  • Obtains bids for pieces of a query from sites.
  • Uses a distributed advertising system instead of
    the usual metadata mechanisms used in a LAN.
  • The server that has advertised the best time for
    the given query wins.

13
Scalability
  • A site can join Mariposa by buying objects and
    advertising services.
  • A site can leave Mariposa by selling objects and
    by ceasing to bid.
  • Hence a highly scalable system.
  • In fact, the success of Mariposa depends on a
    large number of sites participating in the system.

14
Storage decisions
  • Objects have no notion of home.
  • All secondary indices are moved with the objects.
  • Avoidance of global sync is simplified because of
    the economic paradigm.
  • Mariposa fosters data mobility and free trade of
    objects
  • Object here means data

15
Total local control
  • Since each Mariposa site is free to bid on any
    business of interest, it has total local
    autonomy.
  • Each site is expected to maximize its individual
    profit per unit of operating time and to bid on
    those queries that it feels will accomplish this
    goal.
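
The profit-maximizing behavior described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Request` fields, the `min_rate` threshold, and the idea that a site ranks work by (price minus cost) divided by service time. It is not taken from the Mariposa bidder itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    """Hypothetical description of a query a site could bid on."""
    query_id: str
    offered_price: float   # what the site expects to be paid
    resource_cost: float   # the site's own estimated cost
    service_time: float    # estimated operating time, must be > 0


def profit_rate(req: Request) -> float:
    """Profit per unit of operating time for one request."""
    return (req.offered_price - req.resource_cost) / req.service_time


def queries_to_bid_on(requests: List[Request], min_rate: float) -> List[str]:
    """Return the ids of requests worth bidding on, most profitable first."""
    worthwhile = [r for r in requests if profit_rate(r) >= min_rate]
    worthwhile.sort(key=profit_rate, reverse=True)
    return [r.query_id for r in worthwhile]
```

A site with a different local policy would simply swap in a different ranking function, which is the point of local autonomy.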

16
Sounds good. Any drawbacks?
  • Some queries may not be solvable, either because
    nobody will bid on them or because the minimum bid
    exceeds what the client is willing to pay.
  • A site can refuse to give up objects
  • A site may not find buyers for objects that it
    wants to sell.

17
SECTION 2
  • Mariposa architecture

18
Mariposa Architectural details
  • Hardware Flow chart
  • Processes (bidding, bid protocols, acceptance,
    finding bidders, subquery bidding, network
    bidding, splitting and combining)
  • Code languages (RUSH)
  • Mariposa experiments and results
  • Conclusions

19
Architecture overview
  • Client query in SQL3
  • Middleware consists of several modules, including
    a query separator and a query broker.
  • Broker and Bidder coded in RUSH.
  • Local execution at the site that wins the bid.
  • Details

20
Architecture details

21
Processes: Bidding
  • Each query Q has a budget B(t) that can be used
    to solve the query.
  • The budget is the value the user places on
    solving this query.
  • The broker receives the query plan for Q and
    tries to obtain bids for, and solve, each fragment
    using either the expensive bid protocol or the
    cheaper purchase order protocol.
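
One way to picture a budget B(t) is as a curve giving the most the user will pay for an answer delivered after time t. The sketch below assumes a simple shape (full price up to a fast deadline, then a linear drop to zero at a hard deadline). The class name, field names, and curve shape are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Hypothetical budget curve B(t) for one query."""
    max_price: float      # paid in full if the answer arrives quickly
    fast_deadline: float  # up to this delay the full price applies
    hard_deadline: float  # at or after this delay the answer is worth nothing

    def __call__(self, t: float) -> float:
        """Return B(t), the amount the user will pay for delay t."""
        if t <= self.fast_deadline:
            return self.max_price
        if t >= self.hard_deadline:
            return 0.0
        remaining = self.hard_deadline - t
        window = self.hard_deadline - self.fast_deadline
        return self.max_price * remaining / window


def affordable(budget: Budget, cost: float, delay: float) -> bool:
    """A (cost, delay) offer is acceptable if it lies on or under the curve."""
    return cost <= budget(delay)
```

With `Budget(100.0, 10.0, 30.0)`, an offer costing 40 with delay 20 is affordable, while one costing 60 with the same delay is not.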

22
Processes: Bidding
  • Brokers split each query into subqueries and
    solicit bids for each subquery.
  • The subqueries execute in a set sequence.
  • Finding the right winners is implemented as a
    greedy algorithm at the broker.
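
A greedy winner selection of this kind might look like the sketch below. It is an illustration under stated assumptions, not necessarily the algorithm the Mariposa broker uses: bids are (site, cost, delay) tuples, subqueries run one after another so delays add up, and the broker tries to maximize B(total delay) minus total cost. The function name and data layout are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Bid = Tuple[str, float, float]  # (site, cost, delay), a hypothetical layout


def greedy_select(bids: Dict[str, List[Bid]],
                  budget: Callable[[float], float]) -> Dict[str, Bid]:
    """Pick one bid per subquery by repeated single-bid substitution.

    Every subquery must have at least one bid.
    """
    # Start from the fastest bid for every subquery.
    chosen = {sq: min(bs, key=lambda b: b[2]) for sq, bs in bids.items()}

    def surplus(selection: Dict[str, Bid]) -> float:
        total_cost = sum(b[1] for b in selection.values())
        total_delay = sum(b[2] for b in selection.values())  # sequential execution
        return budget(total_delay) - total_cost

    while True:
        current = surplus(chosen)
        best_gain, best_move = 0.0, None
        # Try every single substitution and remember the most profitable one.
        for sq, candidates in bids.items():
            for bid in candidates:
                if bid == chosen[sq]:
                    continue
                trial = dict(chosen)
                trial[sq] = bid
                gain = surplus(trial) - current
                if gain > best_gain:
                    best_gain, best_move = gain, (sq, bid)
        if best_move is None:
            return chosen  # no substitution improves the surplus
        chosen[best_move[0]] = best_move[1]
```

Because each step strictly increases the surplus, the loop terminates, but like any greedy method it can stop at a local optimum.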

23
Processes: Bid Protocols
  • The expensive bid protocol has two phases.
  • The broker sends requests, and each bidder sends
    back a triple (Ci, Di, Ei) for subquery Qi: cost
    Ci, delay Di, and bid expiration Ei.
  • The broker notifies the winners (and losers).
  • The purchase order protocol is faster: the broker
    sends the query directly to the site most likely
    to process it. There is a risk that the query
    might not be processed in the given time.
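
The two phases of the bid protocol can be sketched as follows. This is a toy, in-process model: bidders are plain functions rather than remote sites, and the rule for picking the winner (lowest cost, ties broken by lowest delay) is an assumption for illustration. The `Bid` fields mirror the (Ci, Di, Ei) triple, while all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Bid:
    site: str
    cost: float        # C_i
    delay: float       # D_i
    expires_at: float  # E_i, the time after which the bid is void


# A bidder takes the subquery text and returns a Bid, or None to decline.
Bidder = Callable[[str], Optional[Bid]]


def run_bid_protocol(subquery: str, bidders: Dict[str, Bidder],
                     now: float) -> Tuple[Optional[Bid], Dict[str, str]]:
    """Return the winning bid and the notification sent to each bidding site."""
    # Phase 1: request bids and keep those that are present and not expired.
    replies = (bidder(subquery) for bidder in bidders.values())
    bids = [b for b in replies if b is not None and b.expires_at > now]
    if not bids:
        return None, {}

    # Phase 2: choose a winner, then notify winners and losers.
    winner = min(bids, key=lambda b: (b.cost, b.delay))
    notices = {b.site: ("accepted" if b is winner else "rejected") for b in bids}
    return winner, notices
```

A purchase order, by contrast, would skip phase 1 entirely and send the subquery straight to one chosen site, trading certainty for speed.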

24
Finding Bidders
  • Brokers examine Ad Tables to find the servers
    that are willing to perform the task at hand.
  • A server posts its bids as records in an Ad
    Table.
  • Ad Tables typically hold the bidding information
    for sample query structures run on that server.
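
A broker's lookup against an Ad Table could be sketched like this. The record layout is an assumption for illustration (a server id, a query template, a validity window, and optional price and delay), as is the exact-match rule on templates. The transcript does not show the real table design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Ad:
    """Hypothetical Ad Table record posted by a server."""
    server_id: str
    query_template: str
    start_time: float
    expiration_time: float
    price: Optional[float] = None  # optional: not every field has to be used
    delay: Optional[float] = None


def find_bidders(ad_table: List[Ad], template: str, now: float) -> List[str]:
    """Return the servers with a currently valid ad for this query template."""
    servers = {
        ad.server_id
        for ad in ad_table
        if ad.query_template == template
        and ad.start_time <= now < ad.expiration_time
    }
    return sorted(servers)
```

The broker would then contact only the returned servers instead of broadcasting the request to every site.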

25
Sample Ad Table design
  • Not all fields might be used

26
Bidding strategies
  • Bulk purchase contracts allowing lower-than-normal
    bids (wholesale)
  • Coupons
  • Sale
  • Broker intelligence (remember the last successful
    bid and try that site/query combination again)
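
The "broker intelligence" strategy amounts to a small cache keyed by query shape. The sketch below is a minimal illustration. The class and method names are hypothetical, and a real broker would presumably also record prices and age out stale entries.

```python
from typing import Dict, Optional


class BrokerMemory:
    """Remembers which site last won each kind of query (hypothetical helper)."""

    def __init__(self) -> None:
        self._last_winner: Dict[str, str] = {}

    def record_success(self, query_template: str, site: str) -> None:
        """Store the site that most recently completed this query template."""
        self._last_winner[query_template] = site

    def preferred_site(self, query_template: str) -> Optional[str]:
        """Return the site to try first, or None if there is no history."""
        return self._last_winner.get(query_template)
```

A remembered site is a natural target for the cheaper purchase order protocol, since the broker already has evidence that it can handle the query.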

27
Processes: Network Bidding
  • Accounts for network bandwidth.
  • Data size is taken into consideration.
  • The minimum available bandwidth is calculated
    from node to node.
  • This bandwidth must be reserved to achieve the
    desired performance.
  • Mariposa uses the Tenet protocols RTIP and RCAP
    for network bidding.
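
Reading "minimum available bandwidth from node to node" as the bottleneck along a path, the delay estimate is simple arithmetic: data size divided by the smallest link bandwidth. The sketch below assumes that reading, and its function names and units (megabits and megabits per second) are chosen for illustration only.

```python
from typing import List


def bottleneck_bandwidth(link_bandwidths_mbps: List[float]) -> float:
    """The usable bandwidth of a path is that of its slowest link."""
    if not link_bandwidths_mbps:
        raise ValueError("path must contain at least one link")
    return min(link_bandwidths_mbps)


def transfer_delay(data_size_mbit: float,
                   link_bandwidths_mbps: List[float]) -> float:
    """Seconds needed to ship the data if the bottleneck bandwidth is reserved."""
    bandwidth = bottleneck_bandwidth(link_bandwidths_mbps)
    if bandwidth <= 0:
        raise ValueError("no bandwidth available on this path")
    return data_size_mbit / bandwidth
```

For example, shipping 80 Mbit over links offering 10, 4, and 8 Mb/s is limited by the 4 Mb/s link and takes 20 seconds.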

28
Coding (RUSH language)
  • Mariposa provides a low-level, very efficient
    embedded scripting language and rule system
    called Rush.
  • Using Rush, it is straightforward to change
    policy decisions: one simply modifies the rules
    by which these modules are implemented.
  • The Mariposa architecture is primarily coded in
    Rush.

29
SECTION 3
  • Mariposa experiments and results

30
Operational system
  • Mariposa is operational on Digital Equipment
    Corp. Alpha AXP workstations at UC Berkeley.
  • The basic server engine is that of POSTGRES.
  • Implementation of the Rush language itself has
    required careful design and performance
    engineering.
  • A multithreaded network communication package is
    required.

31
Experiment setup
  • Workstations connected by 10 Mb/s Ethernet.
  • WAN experiments conducted at night.
  • The benchmark database consists of three tables,
    R1, R2 and R3.
  • The workload query is an equijoin of all three
    tables:
      SELECT * FROM R1, R2, R3
      WHERE R1.u1 = R2.u1
      AND R2.u1 = R3.u1
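
The three-way equijoin on u1 can be tried on a single machine with Python's built-in sqlite3 module. This is only a local illustration of what the query computes, not a reproduction of the experiment. The second column (`payload`) and the helper name are assumptions, since the transcript gives only the table names and the join column.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str]  # (u1, payload); payload is a made-up second column


def run_benchmark_query(r1: List[Row], r2: List[Row], r3: List[Row]) -> list:
    """Load three small tables into memory and run the equijoin on u1."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, rows in (("R1", r1), ("R2", r2), ("R3", r3)):
            conn.execute(f"CREATE TABLE {name} (u1 INTEGER, payload TEXT)")
            conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT * FROM R1, R2, R3 "
            "WHERE R1.u1 = R2.u1 AND R2.u1 = R3.u1 "
            "ORDER BY R1.u1"
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

Only rows whose u1 value appears in all three tables survive the join, which is why the placement of the tables across sites matters so much for the amount of data shipped.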

32
  • In the wide-area case, the query originates at
    Berkeley and performs the join over the WAN
    connecting UC Berkeley, UC Santa Barbara, and UC
    San Diego.

33
Timing Results

34
Conclusions
  • Mariposa is a prototype data management system
    that unifies the best features of distributed
    operating system and distributed database
    management system research.
  • Distributed query optimization has been
    identified as an area that will receive strong
    emphasis, and we will also examine how to build a
    system that has a rule system at its core.

35
Conclusions
  • Future work remains in the areas of system
    robustness, distributed failure recovery, and
    performance assessment.

36
References
  • Mariposa home
  • http://s2k-ftp.cs.berkeley.edu:8000/mariposa/index.html

37
Thank you.