Title: PS1 Prototype Systems Design, Jan Vandenberg, JHU
1. PS1 Prototype Systems Design (Jan Vandenberg, JHU)
2. Engineering Systems to Support the Database Design
- Raw data size
- Index size
- Most end-user operations are I/O bound
- Loading/ingest is more CPU-bound, though we still need solid write performance
- Time to do full table scans
- Time to do index scans
- Need to do most work where the data is; can't sling TBs over the network quickly
  - though we can brute-force past 1 Gbit Ethernet if necessary
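The "time to do full table scans" point above is just arithmetic over table size and aggregate sequential bandwidth. A back-of-the-envelope sketch; the table size and throughput figures below are hypothetical, not numbers from this deck:

```python
# Back-of-the-envelope scan-time estimate (illustrative numbers only).
def scan_time_hours(table_tb: float, throughput_mb_s: float) -> float:
    """Time for one sequential pass over table_tb terabytes at
    throughput_mb_s MB/s of aggregate sequential read bandwidth."""
    table_mb = table_tb * 1024 * 1024   # TB -> MB (binary units)
    return table_mb / throughput_mb_s / 3600

# e.g. a hypothetical 10 TB table at 1000 MB/s aggregate:
print(f"{scan_time_hours(10, 1000):.1f} h")  # ~2.9 h
```

This is exactly why most work has to happen where the data is: at 1 Gbit Ethernet (~100 MB/s usable) the same pass takes roughly ten times longer.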
3. Fibre Channel, SAN
- Expensive but not-so-fast physical links (4 Gbit, 10 Gbit)
- Expensive switches
- Potentially very flexible
- Industrial-strength manageability
- Little control over RAID controller bottlenecks
4. SATA
- Fast
- Cheap
- Ugly, spooky
- Tough to manage
5. SAS
- For our purposes, it's SATA without the ugliness
- Fast: 12 Gbit/s FD building blocks
- Cheap: PS1 prototype MD1000 pricing versus Newegg media costs
- Not ugly: IB cables versus a rat's nest
- Industrial-strength manageability: pretty blinking lights and management apps, versus downtime plus white knuckles
6. I/O Performance of Dell SAS Systems in the PS1 Prototype
7. SAS Performance, Gory Details
8. Per-Controller Performance
- Luckily, one controller is fast enough for one SATA disk box
9. Resulting PS1 Prototype I/O Topology
10. RAID-5 vs. RAID-10?
- Primer, anyone?
- RAID-5 is probably feasible with a contemporary controller
  - though the real-world effects of latency are tough to predict
  - and it's not a ton of redundancy
- But after we add enough disks to meet performance goals, we have enough storage to run RAID-10 anyway!
  - Remember the sub-Newegg media costs
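The capacity side of that argument can be sketched numerically. The disk count and size below are hypothetical (an MD1000-sized shelf), not the prototype's actual configuration, and hot spares and controller overhead are ignored:

```python
# Usable capacity for RAID-5 vs. RAID-10 across n identical disks
# (simple illustration; disk counts and sizes are hypothetical).
def raid5_usable(n_disks: int, disk_tb: float) -> float:
    return (n_disks - 1) * disk_tb   # one disk's worth of parity

def raid10_usable(n_disks: int, disk_tb: float) -> float:
    return n_disks // 2 * disk_tb    # every disk is mirrored

# 15 x 0.5 TB disks:
print(raid5_usable(15, 0.5))    # 7.0 TB
print(raid10_usable(15, 0.5))   # 3.5 TB
```

Once performance goals (spindle count) rather than capacity drive the disk purchase, giving up half the raw space to mirroring costs nothing that matters.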
11. RAID-10 Performance
- Executive summary: RAID0/2 throughput for single-threaded reads, RAID0 performance for 2-user/2-thread workloads, and RAID0/2 for writes
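One possible numeric reading of this executive summary, taking "RAID0/2" as half of RAID-0 throughput. The model (N/2 mirrored pairs, a per-disk streaming rate, and two threads reaching both mirror halves) is our assumption, not a measurement from the prototype:

```python
# Toy model of the RAID-10 read summary above (our interpretation,
# not measured data). n_disks disks form n_disks//2 mirrored pairs,
# each disk streaming at r MB/s.
def raid10_read_mb_s(n_disks: int, r: float, threads: int) -> float:
    pairs = n_disks // 2
    if threads == 1:
        return pairs * r                  # one side of each mirror: RAID0/2
    return min(threads, 2) * pairs * r    # 2 threads keep both halves busy: ~RAID0

# Hypothetical 14 disks at 75 MB/s each:
print(raid10_read_mb_s(14, 75, 1))   # 525.0 (half of RAID-0)
print(raid10_read_mb_s(14, 75, 2))   # 1050.0 (full RAID-0)
```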
12. PS1 Prototype Servers
13. PS1 Prototype Servers
14. Projected PS1 Systems Design
15. Backup/Recovery/Replication Strategies
- No formal backup
  - except maybe for MyDBs, f(cost, policy)
- 3-way replication
- Replication ≠ backup
  - Little or no history
  - Replicas can be a bit too cozy: must notice badness before replication propagates it
- Replicas provide redundancy and load balancing
  - Fully online: zero time to recover
  - Replicas needed for happy production performance plus ingest, anyway
- Off-site geoplex
  - Provides continuity if we lose HI (local or trans-Pacific network outage, facilities outage)
  - Could help balance trans-Pacific bandwidth needs (service continental traffic locally)
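The "must notice badness before replication propagates it" point implies a validation gate before each replication round. A minimal sketch, assuming a simple per-table checksum comparison; the checksum scheme and row representation are illustrative, not the actual PS1 mechanism:

```python
# Sketch of a pre-replication sanity gate (illustrative only):
# refuse to replicate if the primary and replica copies of a table
# no longer agree, so mangling is caught before it spreads.
import hashlib

def table_checksum(rows) -> str:
    """Order-independent checksum of a table's rows."""
    h = hashlib.sha256()
    for row in sorted(map(repr, rows)):
        h.update(row.encode())
    return h.hexdigest()

def safe_to_replicate(primary_rows, replica_rows) -> bool:
    return table_checksum(primary_rows) == table_checksum(replica_rows)

print(safe_to_replicate([(1, "a")], [(1, "a")]))  # True
print(safe_to_replicate([(1, "a")], [(1, "b")]))  # False
```

This also shows why off-the-shelf near-realtime replication doesn't fit: the gate only works if there is slack between rounds in which to notice the badness.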
16. Why No Traditional Backups?
- Not super pricey
  - but not very useful, relative to a replica, for our purposes
- Time to recover
- Money no object? Do traditional backups too!
- Synergy and economy of scale with other collaboration needs (IPP?)? Do traditional backups too!
17. Failure Scenarios
- Easy, zero-downtime:
  - Disks
  - Power supplies
  - Fans
- Not so spooky, maybe some downtime and a manual replica cutover:
  - System board (rare)
  - Memory (rare, and usually proactively detected and handled via scheduled maintenance)
  - Disk controller (rare; potentially minimal downtime via a cold-spare controller)
  - CPU (not utterly uncommon; can be tough and time-consuming to diagnose correctly)
- More spooky:
  - Database mangling by human or pipeline error
  - Gotta catch this before replication propagates it everywhere
  - Can't replicate too aggressively
  - (and so off-the-shelf near-realtime replication tools don't help us)
- Catastrophic loss of the datacenter:
  - We have the geoplex
  - but we're dangling by a single copy till recovery completes
  - but are we still screwed? Depending on colo scenarios, did we also lose the IPP and flatfile archive?
  - Terrifying
18. State Diagram for Replicas?
- Loading
- Replicating
- Load balancing
- Failing
- Recovering
- Possibly repeat-loading
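The states listed above can be sketched as a small transition table. The state names come from this slide; the allowed edges are our assumption about how the life cycle fits together:

```python
# Minimal replica life-cycle sketch. States are from the slide;
# the transition edges are assumed, not specified in the deck.
ALLOWED = {
    "loading":        {"replicating", "failing"},
    "replicating":    {"load_balancing", "failing"},
    "load_balancing": {"failing"},
    "failing":        {"recovering"},
    "recovering":     {"loading"},   # "possibly repeat-loading"
}

def transition(state: str, new_state: str) -> str:
    if new_state not in ALLOWED.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

s = "loading"
s = transition(s, "replicating")
s = transition(s, "load_balancing")
print(s)  # load_balancing
```

Making the legal transitions explicit is what would let monitoring flag a replica that, say, starts serving load before replication has finished.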
19. Operating Systems, DBMS?
- SQL Server 2005 EE x64
- Why?
- Why not DB2, Oracle RAC, PostgreSQL, MySQL, ...?
- (Win2003 EE x64)
- Platform rant from JVV available over beers
20. Systems/Database Management
- Active Directory infrastructure
- Windows patching tools, methodology
- Linux patching tools, methodology
- Monitoring
- Staffing requirements
21. Facilities/Infrastructure Projections for PS1
- Cooling
- Rack space
- Network ports
- (plus AD/WSUS/monitoring infrastructure above)
22. Operational Handoff to UofH
23. Mahalo! (See Ya, Hon!)