The Porcupine Scalable Mail Server

1
The Porcupine Scalable Mail Server
  • Yasushi Saito
  • Eric Hoffman
  • Brian Bershad
  • Hank Levy
  • David Becker
  • Bertil Folliot

http://porcupine.cs.washington.edu/
University of Washington, Department of Computer Science and Engineering
Sep 7, 1998
2
Why Is Mail an Interesting Problem?
  • Cluster research has focused on web services
  • Mail is an example of a write-intensive application
      • disk-bound workload
      • reliability requirements
      • failure recovery
  • Mail servers have relied on a brute-force approach to scaling
      • Big-iron file server, RDBMS

3
Goals
  • Use networked PCs to build a fast, scalable and
    easy-to-manage mail server

1 billion messages/day (100x existing systems)
100 million users (10x existing systems)
1000 nodes (50x existing systems)
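
To put these targets in per-node terms, here is a back-of-the-envelope calculation. It assumes load is spread evenly over the day and across nodes, which the slides do not claim; real mail traffic is bursty, so peak rates would be higher.

```python
# Back-of-the-envelope load implied by the stated goals.
# Assumption: traffic is uniform over time and across nodes.
MESSAGES_PER_DAY = 1_000_000_000
USERS = 100_000_000
NODES = 1_000
SECONDS_PER_DAY = 24 * 60 * 60

cluster_rate = MESSAGES_PER_DAY / SECONDS_PER_DAY    # messages/sec, whole cluster
per_node_rate = cluster_rate / NODES                 # messages/sec, one node
users_per_node = USERS // NODES                      # mailboxes per node
messages_per_user = MESSAGES_PER_DAY / USERS         # messages/day, one user

print(f"cluster:  {cluster_rate:,.0f} messages/sec")
print(f"per node: {per_node_rate:.1f} messages/sec")
print(f"per node: {users_per_node:,} users")
print(f"per user: {messages_per_user:.0f} messages/day")
```

Under these assumptions the cluster handles roughly 11,600 messages per second, or about 12 per node, with 100,000 users per node.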
4
Conventional Mail Servers
  • SMTP/POP front-end hosts
  • Distributed file system for message store
  • Dedicated user DB server

[Figure: conventional mail server architecture. Labels: The Internet, User DB Server, NFS Server (x2)]
5
Problems of Conventional Architecture
  • Hardware expense
  • Dedicated file servers, DBMS
  • Management expense
  • Limited fault tolerance
  • Static configuration
  • Performance
  • Synchronization based on file system mechanisms
  • Slow legacy software

6
Porcupine Mail Server
  • Symmetric function distribution
      • Distribute user database and user mailbox
  • Lazy data management
  • Self-management
      • Automatic load balancing, membership management
  • Graceful degradation
      • Cluster remains functional despite any number of failures

7
Porcupine Architecture
[Figure: Porcupine architecture. Nodes A, B, ..., Z are connected by a local area network. Labels inside a node: SMTP server, POP server, IMAP server, cluster membership manager, hash map (B C A C A B D D), RPC selector, user DB cache, mailbox spool, user DB]
8
User Management
[Figure: handling "USER jim" across Node A and Node C. Labels: SMTP server, POP server, hash(jim) = 3, membership, hash map (B C A C A B D D), RPC selector, mailbox spool, user DB, user DB cache, LAN. User DB cache contents: jim: INBOX -> {A,D}, oldmail -> {B}; john: INBOX -> {A,C}; ...]
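
The lookup path in the figure can be sketched as follows. This is a minimal illustration, not Porcupine's code: the MD5-based `bucket_of`, the helper names, and keeping the user DB in a single dictionary are assumptions. The eight-entry hash map and the jim/john entries follow the figure labels, read here as "bucket to managing node" and "folder to nodes holding its spool fragments".

```python
import hashlib

# Hash map from the figure: bucket index -> node that manages the
# user DB entries hashing to that bucket (our reading of the figure).
HASH_MAP = ["B", "C", "A", "C", "A", "B", "D", "D"]

# User DB entries from the figure: folder -> nodes holding spool fragments.
# Assumption: one dictionary stands in for the per-node user DBs.
USER_DB = {
    "jim": {"INBOX": {"A", "D"}, "oldmail": {"B"}},
    "john": {"INBOX": {"A", "C"}},
}


def bucket_of(user: str, num_buckets: int) -> int:
    """Map a user name to a bucket. MD5 is an assumed hash function."""
    digest = hashlib.md5(user.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def manager_of(user: str) -> str:
    """Node responsible for this user's DB entry, via the hash map."""
    return HASH_MAP[bucket_of(user, len(HASH_MAP))]


def spool_nodes(user: str, folder: str = "INBOX") -> set:
    """Nodes the manager would report as holding the folder's messages."""
    return USER_DB[user][folder]


# A POP server on any node can resolve a login in two steps:
manager = manager_of("jim")      # 1. whom to ask about jim
fragments = spool_nodes("jim")   # 2. where jim's INBOX is spooled
print(f"ask node {manager}; fetch INBOX from nodes {sorted(fragments)}")
```

Because every node holds the same hash map, any SMTP or POP server can find the managing node without contacting a central directory.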
9
Node Failure and Recovery
  • Membership protocol runs
  • Hash map reconfigured
      • dead node removed, recovered node added
      • load balanced by assigning an approximately equal number of entries to each node
  • Recovered node scans its spool and notifies user DB caches about the spool content
  • User DB reconciliation runs optimistically

10
Implementation Status
  • Linux, pthreads, Flick
  • Basic functions implemented (SMTP, POP)
  • Lacks frills (mail address rewriting, filtering)
  • Robust recovery
  • Cluster of up to 15 PCs connected by 100 Mb/s Ethernet
  • Porcupine simulator for larger clusters

11
Status Monitor Screen
[Screenshot: status monitor showing SMTP, POP, and RPC throughput; cluster members and their CPU utilization; and a status report]
12
Performance
  • Questions
      • How does the system scale?
      • How costly is the failure recovery procedure?
  • Two scenarios tested
      • Steady state
      • Node failure
  • Platform
      • 300 MHz Pentium II, 128 MB memory, 100 Mb/s Ethernet
      • Linux 2.0.32, glibc 2.0.7

13
Scalability
[Figure: SMTP sessions/sec vs. number of nodes]
14
Failure Recovery
  • 8-node cluster. Two nodes fail, later recover

[Figure: SMTP sessions/sec vs. time (seconds). Annotations: first node fails, second node fails, both nodes recover]
15
Summary and Future Work
  • Porcupine is designed to be cheap, fast,
    scalable, and easy to manage
  • System throughput scales
  • Large-scale membership, RPC
  • Geographically distributed clustering
  • Porcupine as a distributed service workbench

16
Discussion Points
  • Distributed membership protocol
      • Cristian's 3/5-phase algorithm vs. two-tiered broadcast algorithm vs. ...
  • Hashing algorithm
      • Centralized hash computation vs. consistent hashing
  • Is a 1000-node cluster really realizable?
  • Load balancing
  • File system design for high-throughput mail
    server
  • Other services using Porcupine infrastructure
  • Management Interface
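
For the hashing discussion point, the consistent-hashing alternative can be sketched as follows. This is a generic textbook ring with virtual nodes, not something the slides describe Porcupine as using; the `HashRing` class, the MD5 hash, and the 64 virtual nodes per server are all assumptions.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Position of a key on the ring (MD5 is an assumed hash function)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent-hash ring: a user belongs to the first virtual node
    found clockwise from the user's own position."""

    def __init__(self, nodes, replicas=64):
        ring = sorted((_point(f"{node}#{i}"), node)
                      for node in nodes for i in range(replicas))
        if not ring:
            raise ValueError("need at least one node")
        self._points = [point for point, _ in ring]
        self._owners = [node for _, node in ring]

    def owner(self, user: str) -> str:
        idx = bisect.bisect_right(self._points, _point(user))
        return self._owners[idx % len(self._owners)]  # wrap around the ring


ring = HashRing(["A", "B", "C", "D"])
print(ring.owner("jim"))
```

With this scheme each node can compute ownership locally from the membership list alone, and removing a node only reassigns the users that node owned. The trade-off against a centrally computed hash map is that balance is statistical rather than exact.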