Title: PS1 Prototype Systems Design, Jan Vandenberg, JHU
1. PS1 Prototype Systems Design (Jan Vandenberg, JHU)
2. Engineering Systems to Support the Database Design
- Raw data size
- Index size
- Most end-user operations are I/O bound
- Loading/ingest is more CPU-bound, though we still need solid write performance
- Time to do full table scans
- Time to do index scans
- Need to do most work where the data is; can't sling TBs over the network quickly
  - though we can brute-force past 1 Gbit Ethernet if necessary
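The "time to do full table scans" point above is just arithmetic over table size and aggregate sequential bandwidth. A back-of-the-envelope sketch; the table size and throughput figures below are hypothetical, not numbers from this deck:

```python
# Back-of-the-envelope scan-time estimate (illustrative numbers only).
def scan_time_hours(table_tb: float, throughput_mb_s: float) -> float:
    """Time for one sequential pass over table_tb terabytes at
    throughput_mb_s MB/s of aggregate sequential read bandwidth."""
    table_mb = table_tb * 1024 * 1024   # TB -> MB (binary units)
    return table_mb / throughput_mb_s / 3600

# e.g. a hypothetical 10 TB table at 1000 MB/s aggregate:
print(f"{scan_time_hours(10, 1000):.1f} h")  # ~2.9 h
```

This is exactly why most work has to happen where the data is: at 1 Gbit Ethernet (~100 MB/s usable) the same pass takes roughly ten times longer.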
3. Fibre Channel, SAN
- Expensive but not-so-fast physical links (4 Gbit, 10 Gbit)
- Expensive switches
- Potentially very flexible
- Industrial-strength manageability
- Little control over RAID controller bottlenecks
4. SATA
- Fast
- Cheap
- Ugly, spooky
- Tough to manage
5. SAS
- For our purposes, it's SATA without the ugliness
- Fast: 12 Gbit/s FD building blocks
- Cheap: PS1 prototype MD1000 pricing versus Newegg media costs
- Not ugly: IB cables versus a rat's nest
- Industrial-strength manageability: pretty blinking lights and management apps, versus downtime plus white knuckles
6. I/O Performance of Dell SAS Systems in the PS1 Prototype
7. SAS Performance, Gory Details
8. Per-Controller Performance
- Luckily, one controller is fast enough for one SATA disk box
9. Resulting PS1 Prototype I/O Topology
10. RAID-5 vs. RAID-10?
- Primer, anyone?
- RAID-5 is probably feasible with a contemporary controller
  - though the real-world effects of latency are tough to predict
  - and it's not a ton of redundancy
- But after we add enough disks to meet performance goals, we have enough storage to run RAID-10 anyway!
  - Remember the sub-Newegg media costs
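The capacity side of that argument can be sketched numerically. The disk count and size below are hypothetical (an MD1000-sized shelf), not the prototype's actual configuration, and hot spares and controller overhead are ignored:

```python
# Usable capacity for RAID-5 vs. RAID-10 across n identical disks
# (simple illustration; disk counts and sizes are hypothetical).
def raid5_usable(n_disks: int, disk_tb: float) -> float:
    return (n_disks - 1) * disk_tb   # one disk's worth of parity

def raid10_usable(n_disks: int, disk_tb: float) -> float:
    return n_disks // 2 * disk_tb    # every disk is mirrored

# 15 x 0.5 TB disks:
print(raid5_usable(15, 0.5))    # 7.0 TB
print(raid10_usable(15, 0.5))   # 3.5 TB
```

Once performance goals (spindle count) rather than capacity drive the disk purchase, giving up half the raw space to mirroring costs nothing that matters.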
11. RAID-10 Performance
- Executive summary: RAID0/2 throughput for single-threaded reads, RAID0 performance for 2-user/2-thread workloads, and RAID0/2 for writes
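One possible numeric reading of this executive summary, taking "RAID0/2" as half of RAID-0 throughput. The model (N/2 mirrored pairs, a per-disk streaming rate, and two threads reaching both mirror halves) is our assumption, not a measurement from the prototype:

```python
# Toy model of the RAID-10 read summary above (our interpretation,
# not measured data). n_disks disks form n_disks//2 mirrored pairs,
# each disk streaming at r MB/s.
def raid10_read_mb_s(n_disks: int, r: float, threads: int) -> float:
    pairs = n_disks // 2
    if threads == 1:
        return pairs * r                  # one side of each mirror: RAID0/2
    return min(threads, 2) * pairs * r    # 2 threads keep both halves busy: ~RAID0

# Hypothetical 14 disks at 75 MB/s each:
print(raid10_read_mb_s(14, 75, 1))   # 525.0 (half of RAID-0)
print(raid10_read_mb_s(14, 75, 2))   # 1050.0 (full RAID-0)
```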
12. PS1 Prototype Servers
13. PS1 Prototype Servers
14. Projected PS1 Systems Design
15. Backup/Recovery/Replication Strategies
- No formal backup
  - except maybe for MyDBs, f(cost, policy)
- 3-way replication
- Replication ≠ backup
  - Little or no history
  - Replicas can be a bit too cozy: must notice badness before replication propagates it
- Replicas provide redundancy and load balancing
  - Fully online: zero time to recover
  - Replicas needed for happy production performance plus ingest, anyway
- Off-site geoplex
  - Provides continuity if we lose HI (local or trans-Pacific network outage, facilities outage)
  - Could help balance trans-Pacific bandwidth needs (service continental traffic locally)
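The "must notice badness before replication propagates it" point implies a validation gate before each replication round. A minimal sketch, assuming a simple per-table checksum comparison; the checksum scheme and row representation are illustrative, not the actual PS1 mechanism:

```python
# Sketch of a pre-replication sanity gate (illustrative only):
# refuse to replicate if the primary and replica copies of a table
# no longer agree, so mangling is caught before it spreads.
import hashlib

def table_checksum(rows) -> str:
    """Order-independent checksum of a table's rows."""
    h = hashlib.sha256()
    for row in sorted(map(repr, rows)):
        h.update(row.encode())
    return h.hexdigest()

def safe_to_replicate(primary_rows, replica_rows) -> bool:
    return table_checksum(primary_rows) == table_checksum(replica_rows)

print(safe_to_replicate([(1, "a")], [(1, "a")]))  # True
print(safe_to_replicate([(1, "a")], [(1, "b")]))  # False
```

This also shows why off-the-shelf near-realtime replication doesn't fit: the gate only works if there is slack between rounds in which to notice the badness.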
16. Why No Traditional Backups?
- Not super pricey
  - but not very useful, relative to a replica, for our purposes
- Time to recover
- Money no object? Do traditional backups too!
- Synergy and economy of scale with other collaboration needs (IPP?)? Do traditional backups too!
17. Failure Scenarios
- Easy, zero-downtime:
  - Disks
  - Power supplies
  - Fans
- Not so spooky, maybe some downtime and a manual replica cutover:
  - System board (rare)
  - Memory (rare, and usually proactively detected and handled via scheduled maintenance)
  - Disk controller (rare; potentially minimal downtime via a cold-spare controller)
  - CPU (not utterly uncommon; can be tough and time-consuming to diagnose correctly)
- More spooky:
  - Database mangling by human or pipeline error
  - Gotta catch this before replication propagates it everywhere
  - Can't replicate too aggressively
  - (and so off-the-shelf near-realtime replication tools don't help us)
- Catastrophic loss of the datacenter:
  - We have the geoplex
  - but we're dangling by a single copy till recovery completes
  - but are we still screwed? Depending on colo scenarios, did we also lose the IPP and flatfile archive?
  - Terrifying
18. State Diagram for Replicas?
- Loading
- Replicating
- Load balancing
- Failing
- Recovering
- Possibly repeat-loading
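The states listed above can be sketched as a small transition table. The state names come from this slide; the allowed edges are our assumption about how the life cycle fits together:

```python
# Minimal replica life-cycle sketch. States are from the slide;
# the transition edges are assumed, not specified in the deck.
ALLOWED = {
    "loading":        {"replicating", "failing"},
    "replicating":    {"load_balancing", "failing"},
    "load_balancing": {"failing"},
    "failing":        {"recovering"},
    "recovering":     {"loading"},   # "possibly repeat-loading"
}

def transition(state: str, new_state: str) -> str:
    if new_state not in ALLOWED.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

s = "loading"
s = transition(s, "replicating")
s = transition(s, "load_balancing")
print(s)  # load_balancing
```

Making the legal transitions explicit is what would let monitoring flag a replica that, say, starts serving load before replication has finished.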
19. Operating Systems, DBMS?
- SQL Server 2005 EE x64
- Why?
- Why not DB2, Oracle RAC, PostgreSQL, MySQL, ...?
- (Win2003 EE x64)
- Platform rant from JVV available over beers
20. Systems/Database Management
- Active Directory infrastructure
- Windows patching tools, methodology
- Linux patching tools, methodology
- Monitoring
- Staffing requirements
21. Facilities/Infrastructure Projections for PS1
- Cooling
- Rack space
- Network ports
- (plus AD/WSUS/monitoring infrastructure above)
22. Operational Handoff to UofH
23. Mahalo! (See Ya, Hon!)