Title: The Secret Life of your email server
1. The Secret Life of your email server
- Colin Chaplin BSc (Hons) MCSE
- Technical Architect, Unisys Global Information Services
2. Topics
- Email Server Concepts
- Email Components
- Email Routing
- Design Considerations
- Availability
- Backup and restore
- Sizing
3. History
- Multitude of different, non-communicating systems since the 1970s
- Internet and SMTP (RFC 821, Jonathan Postel) made email the de facto standard from 1982
- Massive growth as a serious business and personal communication tool
4. Email System Components
- Client: what the end user controls
- Server(s): provide the following services
  - Database: holds the emails, communicates with clients
  - Mail Transfer Agent (MTA): responsible for shipping the email to the outside world. Called a Connector
  - Directory: tells the MTA where to route emails and can also allow clients to pick addresses
5. The Raw Parts
[Diagram labels: To/From foreign MTAs; Client Actions: receive, retrieve, send email; Email In/Out]
6. Email Routing across internet
- mailbox@organisation, e.g. Colin.Chaplin@Unisys.com
- MTA uses DNS to translate organisation into one (or more) servers it can ship to
- unisys.com -> smtp1.unisys.com -> 151.176.34.2
- MTA ships email to remote MTA using SMTP protocol
- Remote mail system decides what to do with the email
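The lookup steps above can be sketched in Python. This is only an illustration, not how any particular MTA is implemented: the `MX_TABLE` and `A_TABLE` dictionaries are stand-ins for real DNS queries, and the preference numbers and the address given for smtp2.unisys.com are invented for the example. Only unisys.com, smtp1.unisys.com and 151.176.34.2 come from the slide.

```python
# Stand-ins for DNS (assumptions for illustration; a real MTA queries DNS).
# MX records map a domain to (preference, host); lower preference is tried first.
MX_TABLE = {
    "unisys.com": [(20, "smtp2.unisys.com"), (10, "smtp1.unisys.com")],
}
A_TABLE = {
    "smtp1.unisys.com": "151.176.34.2",  # from the slide
    "smtp2.unisys.com": "192.0.2.10",    # made-up documentation address
}


def split_address(address):
    """Split mailbox@organisation into its two parts."""
    mailbox, _, organisation = address.rpartition("@")
    if not mailbox or not organisation:
        raise ValueError(f"not a mailbox@organisation address: {address!r}")
    return mailbox, organisation.lower()


def route(address):
    """Return the (host, ip) pairs the MTA would try, best preference first."""
    _, organisation = split_address(address)
    records = sorted(MX_TABLE.get(organisation, []))
    return [(host, A_TABLE[host]) for _, host in records]


print(route("Colin.Chaplin@Unisys.com"))
```

An organisation with no entry in the table yields an empty list, which a real MTA would treat as undeliverable or retry later.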
7. Email Routing Example
[Diagram labels: Trifle.dessert.local; DNS; Smtp1.unisys.com; Smtp2.unisys.com; Smtp3.unisys.com; Active Directory]
8. SMTP Communication
- 220 usbb-lacimss1 Trend Micro InterScan Messaging Security Suite, Version 5.5 ready at Sun, 27 Nov 2005 07:36:10 -0500
- helo trifle.dessert.local
- 250 usbb-lacimss1 Hello 62.49.21.166
- mail from:<colin@chaplin.me.uk>
- 250 <colin@chaplin.me.uk> Sender Ok
- rcpt to:<colin.chaplin@unisys.com>
- 250 <colin.chaplin@unisys.com> Recipient Ok
- data
- 354 usbb-lacimss1 Send data now. Terminate with "."
- subject: test
- hello
- 123
- How Now Brown Cow
- .
- 250 usbb-lacimss1 Message accepted for delivery
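The client side of the conversation above can be sketched as a function that produces the lines to send, in order. This is a simplified illustration that does not open a network connection. The function names are made up for the example, the closing QUIT is not shown on the slide, and commands are written in upper case (SMTP commands are case-insensitive, so the slide's lower-case form is equally valid).

```python
def smtp_client_lines(helo_host, sender, recipient, message_lines):
    """Return the lines a client sends, in order, for one message."""
    lines = [
        f"HELO {helo_host}",
        f"MAIL FROM:<{sender}>",
        f"RCPT TO:<{recipient}>",
        "DATA",
    ]
    for line in message_lines:
        # A message line starting with "." gets an extra "." so the server
        # does not mistake it for the end-of-message marker.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")     # a lone "." terminates the message
    lines.append("QUIT")  # assumption: the slide stops before this step
    return lines


def reply_ok(reply):
    """True if a server reply starts with a 2xx or 3xx code."""
    return reply[:1] in ("2", "3")


for line in smtp_client_lines(
    "trifle.dessert.local",
    "colin@chaplin.me.uk",
    "colin.chaplin@unisys.com",
    ["subject: test", "hello", "123", "How Now Brown Cow"],
):
    print(line)
```

Nothing in this exchange proves that the sender address belongs to the person typing it, which is the weakness the next slide points out.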
10. SMTP Continued
- Simple!
- NO authentication of sender (president@whitehouse.gov)
- The reason why spam exists
- Only a method for sending email, NOT a method for retrieving
11. Microsoft Exchange
- Industry standard corporate messaging system
- 5th version (2003), mature
- Own client and protocol (Outlook, MAPI) plus industry standards (POP3, IMAP4, HTTP)
- Rich functionality
- Scalable
- Probably the No. 1 most common business app
12. Design Considerations
- Size
  - How many users (50 to 100,000)
  - How much email (50MB per user?)
  - Where are the users?
- Availability
  - How much downtime is tolerated (SLA)
  - How to mitigate single points of failure
- BACKUP AND RESTORE
13. Sizing
- An email server typically demands a high bandwidth connection with its clients, and connection speed can dictate placement of servers
- Political and administrative boundaries can also dictate placement of servers
- Cheaper bandwidth means businesses can run email from one datacenter
14. Less than 500 Users
- Single Server Solution, basic configuration
- Local Backup (Direct Attached Tape Drives)
- Supported occasionally by local techie
15. 500 Plus Users
- Multiple Mailbox Servers
- Perhaps multiple connector (MTA) servers
- Perhaps Clustered
- Workgroup Class Backup (small tape robots, etc)
- Supported by support team as part of their duties
16. 1000 to 100,000 Plus Users
- Multiple Mailbox Servers, very high spec
- Multiple Servers for other features: MTA, Web Access, FAX-to-email gateway, Blackberry
- Enterprise Class Storage and Backup (SAN and Tape Library)
- Cold Standby parts
- Dedicated Support Staff
17. Backup and Restore
- Restore time drives many design decisions: even the fastest backup/restore system can take hours with big databases
- Backups typically done nightly to tape, direct-to-disk, or tape robots. Stored offsite
- Restore process documented, practised, and tested regularly
- Goal for backup and restore is NO email loss and quick recovery
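A rough way to see why restore time drives design is to divide the database size by the rate at which the restore system can write data back. The sketch below does just that. The 20 MB/s throughput, the database sizes and the 4-hour target are illustrative assumptions rather than measured figures, and real restores also spend time on tape handling and log replay.

```python
def restore_hours(database_gb, throughput_mb_per_s):
    """Hours needed to copy database_gb back from backup at a steady rate."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = database_gb * 1000 / throughput_mb_per_s  # 1 GB taken as 1000 MB
    return seconds / 3600


def meets_target(database_gb, throughput_mb_per_s, target_hours):
    """True if the raw copy alone fits inside the recovery target."""
    return restore_hours(database_gb, throughput_mb_per_s) <= target_hours


for size_gb in (50, 500, 2000):  # assumed database sizes
    hours = restore_hours(size_gb, 20)  # assumed 20 MB/s restore rate
    print(f"{size_gb} GB: {hours:.1f} hours, within 4 hours: {meets_target(size_gb, 20, 4)}")
```

Under these assumptions a 50 GB database restores comfortably, while a 500 GB one already misses a 4-hour target, which is one reason large sites split mailboxes across several databases and servers.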
18. The Wonder of Transaction Logs
- Restore last night's backup
- Logfiles will feed data back into database
19. Transaction Logs (cont.)
- Transaction logs record all changes to the database
- If transaction logs are applied to a restored database, Exchange will play forward and bring the database up to the point it was lost
- Transaction log access is serial read/write, database access is random read/write
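The play-forward idea can be shown with a toy model. This is not Exchange's actual log format: the database is a plain dictionary, each log entry is a hypothetical `(sequence, operation, key, value)` tuple, and the backup remembers the last sequence number it contains.

```python
def replay(backup, backup_seq, log):
    """Apply every logged change newer than the backup to a copy of it."""
    database = dict(backup)  # leave the backup itself untouched
    for seq, op, key, value in sorted(log, key=lambda entry: entry[0]):
        if seq <= backup_seq:
            continue  # this change is already inside the backup
        if op == "put":
            database[key] = value
        elif op == "delete":
            database.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return database


# Last night's backup holds one message and covers the log up to entry 1.
backup = {"msg1": "hello"}
log = [
    (1, "put", "msg1", "hello"),
    (2, "put", "msg2", "lunch?"),
    (3, "delete", "msg1", None),
]
print(replay(backup, 1, log))
```

Because the log is only ever appended to and read in order, it suits a disk that does serial writes well, while the database itself is read and written at random positions.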
20. Achieving High Availability
- Resilient Servers: redundant fans, disks (RAID), anything that moves!
- Multiple Servers performing same job
- Clustering (NOT perfect!)
- Proper Design
- Monitoring and Support
21. 5 Minute RAID
- Redundant Array of Inexpensive/Independent Disks
- Split data across multiple disks for resiliency and/or performance. Hot swappable
- RAID 0: split data across two or more disks, no resilience, but fast (never use in a high-availability design)
- RAID 1: copy the same data onto two or more disks (often called mirroring). With two disks, ½ of the disk space is spent on providing resiliency
- RAID 5: compute a checksum and use maths to figure out what information is on the missing disk. Costs one disk's worth of space
- More spindles, more speed!
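The space cost of each level can be put into a few lines of Python. This is a simplified sketch that assumes identical disks and ignores controller overhead. The function names are invented for the example.

```python
def usable_gb(level, disks, disk_gb):
    """Usable capacity of a RAID set built from identical disks."""
    if level == 0:
        if disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return disks * disk_gb          # every disk holds data
    if level == 1:
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_gb                  # every disk holds the same copy
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb    # one disk's worth goes to checksums
    raise ValueError(f"unsupported RAID level: {level}")


def failures_survived(level, disks):
    """How many disks can fail before data is lost."""
    return {0: 0, 1: disks - 1, 5: 1}[level]


for level, disks in ((0, 2), (1, 2), (5, 4)):
    print(f"RAID {level}, {disks} x 18 GB: {usable_gb(level, disks, 18)} GB usable, "
          f"survives {failures_survived(level, disks)} failed disk(s)")
```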
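22. Let's store the number 123456 on a RAID set
RAID 0
[Diagram: the digits 1 to 6 split across the disks; when one disk fails, ALL data is lost]
RAID 5
[Diagram: each row holds two data chunks plus their checksum (1, 2, 3; then 3, 4, 7; then 5, 6, 11), with the checksum on a different disk in each row; when one disk fails, all OK, all data intact]
- Split data into chunks
- Compute a checksum every nth chunk, where n is the number of disks
- Write the chunks to disk, and put the checksum on a different disk every time we write a line
- Use simple maths if we lose a disk
- Row 1, 2, 3: the lost disk held the checksum, no maths needed
- Row 3, 4, 7: 7 - 3 = 4
- Row 5, 6, 11: 11 - 5 = 6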
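The slide's example can be replayed in code. The sketch below follows the slide in using addition as the checksum. Real RAID 5 implementations use XOR parity instead, which works the same way in principle. The function names are invented, and the sketch assumes the chunks fill every row exactly.

```python
def build_rows(chunks, n_disks):
    """Lay chunks out in rows: n_disks - 1 data chunks plus one checksum."""
    per_row = n_disks - 1
    if len(chunks) % per_row:
        raise ValueError("sketch assumes the chunks fill every row")
    rows = []
    for r in range(len(chunks) // per_row):
        data = chunks[r * per_row:(r + 1) * per_row]
        pos = (n_disks - 1 - r) % n_disks  # checksum moves one disk per row
        rows.append((data[:pos] + [sum(data)] + data[pos:], pos))
    return rows


def lose_disk(rows, failed):
    """Blank out everything stored on the failed disk."""
    return [([None if i == failed else v for i, v in enumerate(row)], pos)
            for row, pos in rows]


def recover(rows, failed):
    """Rebuild the original chunks from the surviving disks."""
    chunks = []
    for row, pos in rows:
        if failed != pos:
            # A data chunk is missing: checksum minus the surviving data.
            others = sum(v for i, v in enumerate(row) if i not in (failed, pos))
            row = [row[pos] - others if i == failed else v
                   for i, v in enumerate(row)]
        # If the checksum itself was lost, no maths is needed.
        chunks.extend(v for i, v in enumerate(row) if i != pos)
    return chunks


rows = build_rows([1, 2, 3, 4, 5, 6], 3)
print([row for row, _ in rows])        # [[1, 2, 3], [3, 7, 4], [11, 5, 6]]
print(recover(lose_disk(rows, 2), 2))  # [1, 2, 3, 4, 5, 6]
```

Losing any single disk gives the same recovered data. Losing two disks at once leaves too little information to rebuild.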
23. Clustering
- 2 or more node servers, each connected to a common disk backend (can access the same data)
- Virtual Server presented to the network, owned by one of the machines (the active node)
- When one of the other nodes detects the active node is unavailable, it will take over
- Active/Passive; Active/Active also possible
24. Client PC
[Diagram: the Client PC sends requests to the Virtual Server. The active node responds to requests to the Virtual Server. If it fails, the other node detects the failure, obtains the resources, and responds to requests to the Virtual Server.]
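The takeover behaviour can be modelled in a few lines. This is a toy active/passive model, not how Windows clustering is implemented: the class, the node names and the `heartbeat` method are all invented for illustration, and the shared disk backend is left out.

```python
class Cluster:
    """Two or more nodes behind one virtual server name."""

    def __init__(self, virtual_name, nodes):
        self.virtual_name = virtual_name
        self.alive = {node: True for node in nodes}
        self.active = nodes[0]

    def fail(self, node):
        """Simulate a node going down."""
        self.alive[node] = False

    def heartbeat(self):
        """Run by the other nodes: take over if the active node is down."""
        if not self.alive[self.active]:
            standby = [node for node, up in self.alive.items() if up]
            if not standby:
                raise RuntimeError("no surviving node to take over")
            self.active = standby[0]
        return self.active

    def handle(self, request):
        """Clients only ever address the virtual server name."""
        if not self.alive[self.active]:
            raise ConnectionError("active node is down and nobody took over yet")
        return f"{self.virtual_name} ({self.active}) handled {request}"


cluster = Cluster("mail", ["node-a", "node-b"])  # hypothetical names
print(cluster.handle("open mailbox"))
cluster.fail("node-a")
cluster.heartbeat()
print(cluster.handle("open mailbox"))
```

The client uses the same virtual name before and after the failure. Between the failure and the next heartbeat, requests fail, which is the short outage a failover still causes.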
25. Great... so what?
- Provides a method for complete server resiliency
- Technically possible to have nodes in separate locations
- Applications must be cluster aware (Exchange is)
- BUT
  - Database is still a single point of failure, perception challenge
  - Hardware costs double
  - Clustering complexity can cause outages!
26. Design a server for 1000 people
- Mailbox size 50MB
- Additional part of existing infrastructure
- 4 hours SLA
- On-site computer room lacking in space
- Gbit backbone network
- Large Tape Robot System
- Single Site, LAN speed
- 50MB x 1000 users = 50GB database size
- Single Server
- Use existing backup system, over LAN
- Exchange 2003 running on Windows Server 2003, McAfee GroupShield for email virus scanning, NetIQ AppManager for monitoring
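The sizing arithmetic on this slide is simple enough to keep as a small helper, which makes it easy to try other user counts or quotas. The sketch below takes 1 GB as 1000 MB, as the slide does, and assumes every mailbox is full, so it gives a worst case. The extra scenarios in the loop are invented.

```python
def database_size_gb(users, mailbox_mb):
    """Worst-case database size if every mailbox is full (1 GB = 1000 MB)."""
    if users < 0 or mailbox_mb < 0:
        raise ValueError("users and mailbox_mb must not be negative")
    return users * mailbox_mb / 1000


print(database_size_gb(1000, 50))  # the slide's case: 50.0

# Assumed what-if scenarios, not from the slide.
for users, mailbox_mb in ((1000, 100), (5000, 50)):
    print(users, "users at", mailbox_mb, "MB:", database_size_gb(users, mailbox_mb), "GB")
```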
27. Typical Spec
- HP Compaq DL380 G4
- 4GB RAM
- 2 Processors
- Onboard SCSI RAID
- 4 x 18GB Disks, 2 x 144GB Disks
- Redundant fans, power supply, network connection
28. Design a server for 1000 people
- 2 x 18GB, RAID 1: C: drive, System (Windows)
- 2 x 18GB, RAID 1: L: drive, Transaction Logs
- 2 x 144GB, RAID 1: N: drive, Database
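As a sanity check, the layout above can be compared with the disks in the typical spec and with the 50GB database estimate. The sketch below is one way to write that check. The data structures and function names are invented, and it assumes a two-disk RAID 1 pair gives the usable space of one disk. Keeping logs and database on separate mirrors matches the earlier point that log access is serial while database access is random.

```python
SPEC_DISKS = {18: 4, 144: 2}  # disk size in GB -> count, from the spec slide

LAYOUT = [
    # (drive letter, purpose, number of disks, disk size in GB)
    ("C", "System (Windows)", 2, 18),
    ("L", "Transaction logs", 2, 18),
    ("N", "Database", 2, 144),
]


def disks_used(layout):
    """Count how many disks of each size the layout consumes."""
    used = {}
    for _, _, count, size_gb in layout:
        used[size_gb] = used.get(size_gb, 0) + count
    return used


def usable_by_drive(layout):
    """RAID 1 mirrors the pair, so usable space equals one disk."""
    return {letter: size_gb for letter, _, _, size_gb in layout}


database_gb = 50  # 50MB x 1000 users, from the design slide
print("uses exactly the spec's disks:", disks_used(LAYOUT) == SPEC_DISKS)
print("usable space per drive (GB):", usable_by_drive(LAYOUT))
print("database fits on N:", usable_by_drive(LAYOUT)["N"] >= database_gb)
```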
29. Questions