PS1 Prototype Systems Design, Jan Vandenberg, JHU

1
PS1 Prototype Systems Design
Jan Vandenberg, JHU
2
Engineering Systems to Support the Database Design
  • Raw data size
  • Index size
  • Most end-user operations I/O bound
  • Loading/ingest more CPU-bound, though we still
    need solid write performance
  • Time to do full table scans
  • Time to do index scans
  • Need to do most work where the data is: can't
    sling TBs over the network quickly
  • though we can brute-force past 1 Gbit Ethernet
    if necessary
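
To put rough numbers on the scan-time and network bullets above, the sketch below compares reading a table locally with shipping it over a network link. It is back-of-the-envelope arithmetic only: the function names and the sample figures (a 10 TB table, a 1000 MB/s local array) are illustrative assumptions, not measurements from the PS1 prototype.

```python
def transfer_hours(terabytes, link_gbit_per_s, efficiency=1.0):
    """Hours to move `terabytes` (decimal TB) over a network link.

    `efficiency` is the assumed fraction of line rate actually achieved.
    """
    bits = terabytes * 1e12 * 8
    return bits / (link_gbit_per_s * 1e9 * efficiency) / 3600.0


def scan_hours(terabytes, mbytes_per_s):
    """Hours to read `terabytes` sequentially at `mbytes_per_s` (decimal MB/s)."""
    return terabytes * 1e12 / (mbytes_per_s * 1e6) / 3600.0


if __name__ == "__main__":
    table_tb = 10.0  # hypothetical table size, not a PS1 figure
    print(f"network, 1 Gbit/s: {transfer_hours(table_tb, 1.0):.1f} h")
    print(f"local, 1000 MB/s : {scan_hours(table_tb, 1000.0):.1f} h")
```

At full line rate, 1 TB over 1 Gbit/s Ethernet takes 8,000 seconds, a little over two hours, before any protocol overhead.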

3
Fibre Channel, SAN
  • Expensive but not-so-fast physical links (4 Gbit,
    10 Gbit)
  • Expensive switch
  • Potentially very flexible
  • Industrial strength manageability
  • Little control over RAID controller bottlenecks

4
SATA
  • Fast
  • Cheap
  • Ugly, spooky
  • Tough to manage

5
SAS
  • For our purposes, it's SATA without the ugliness
  • Fast: 12 Gbit/s FD building blocks
  • Cheap: PS1 prototype MD1000 pricing versus Newegg
    media costs
  • Not ugly: IB cables versus rat's nest
  • Industrial-strength manageability: pretty
    blinking lights and mgmt apps versus downtime
    plus white knuckles

6
I/O Performance of Dell SAS Systems in the PS1 Prototype
7
SAS Performance, Gory Details
  • SAS v. SATA differences

8
Per-Controller Performance
  • Luckily, one controller is fast enough for one
    SATA disk box

9
Resulting PS1 Prototype I/O Topology

10
RAID-5 v. RAID-10?
  • Primer, anyone?
  • RAID-5 probably feasible with contemporary
    controller
  • though tough to predict real-world effects of
    latency
  • and not a ton of redundancy
  • But after we add enough disks to meet performance
    goals, we have enough storage to run RAID-10
    anyway!
  • Remember sub-Newegg media costs
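
The capacity side of this argument can be sketched as a small calculation: pick the disk count that the I/O goals demand, then check whether RAID-10 still leaves enough usable space. The function names, the single-group RAID-5 layout, and any disk counts or sizes passed in are assumptions for illustration; the slides do not give the actual figures.

```python
def usable_tb(disks, disk_tb, level):
    """Usable capacity of one array of `disks` drives of `disk_tb` TB each.

    Assumes a single RAID-5 group (one disk's worth of parity) or a
    RAID-10 set of mirrored pairs; real layouts may use several groups.
    """
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        return (disks - 1) * disk_tb
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID-10 needs an even number of disks, at least 4")
        return (disks // 2) * disk_tb
    raise ValueError(f"unknown RAID level: {level}")


def raid10_fits(disks_for_performance, disk_tb, required_tb):
    """True if the disk count chosen for I/O also covers capacity in RAID-10."""
    return usable_tb(disks_for_performance, disk_tb, "raid10") >= required_tb
```

If `raid10_fits` returns true for the performance-driven disk count, the extra capacity RAID-5 would offer buys nothing, which is the point of the slide.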

11
RAID-10 Performance
  • Executive summary: RAID0/2 for single-threaded
    reads, RAID0 perf for 2-user/2-thread workloads,
    RAID0/2 for writes
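
The executive summary can be written down as a rule of thumb. This sketch reads "RAID0/2" as half the throughput of a RAID-0 stripe over the same disks, which is an interpretation of the slide's shorthand; the function name is hypothetical, and treating more than two threads the same as two is an added assumption.

```python
def raid10_estimate(raid0_mb_s, op, threads=1):
    """Rule-of-thumb RAID-10 throughput, given RAID-0 throughput on the same disks.

    Encodes the slide's summary only; it is not a measured model.
    """
    if op == "write":
        # Every write goes to both halves of each mirror.
        return raid0_mb_s / 2
    if op == "read":
        # Single-threaded reads: half of RAID-0; two or more concurrent
        # readers: full RAID-0 (assumed to hold beyond two threads).
        return raid0_mb_s if threads >= 2 else raid0_mb_s / 2
    raise ValueError(f"unknown operation: {op}")
```

For example, `raid10_estimate(800.0, "read", threads=2)` returns the full 800.0, while the same call with `threads=1` or with `op="write"` returns 400.0.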

12
PS1 Prototype Servers
  • interconnects

13
PS1 Prototype Servers

14
Projected PS1 Systems Design

15
Backup/Recovery/Replication Strategies
  • No formal backup
  • except maybe for mydbs, f(cost, policy)
  • 3-way replication
  • Replication != backup
  • Little or no history
  • Replicas can be a bit too cozy: must notice
    badness before replication propagates it
  • Replicas provide redundancy and load balancing
  • Fully online: zero time to recover
  • Replicas needed for happy production performance
    plus ingest, anyway
  • Off-site geoplex
  • Provides continuity if we lose HI (local or
    trans-Pacific network outage, facilities outage)
  • Could help balance trans-Pacific bandwidth needs
    (service continental traffic locally)

16
Why No Traditional Backups?
  • Not super pricey
  • but not very useful relative to a replica for
    our purposes
  • Time to recover
  • Money no object: do traditional backups too!!!
  • Synergy, economy of scale with other
    collaboration needs (IPP?): do traditional
    backups too!!!

17
Failure Scenarios
  • Easy, zero-downtime
  • Disks
  • Power supplies
  • Fans
  • Not so spooky, maybe some downtime and manual
    replica cutover
  • System board (rare)
  • Memory (rare and usually proactively detected and
    handled via scheduled maintenance)
  • Disk controller (rare, potentially minimal
    downtime via cold-spare controller)
  • CPU (not utterly uncommon, can be tough and
    time-consuming to diagnose correctly)
  • More spooky
  • Database mangling by human or pipeline error
  • Gotta catch this before replication propagates it
    everywhere
  • Can't replicate too aggressively
  • (and so off-the-shelf near-realtime replication
    tools don't help us)
  • Catastrophic loss of datacenter
  • Have the geoplex
  • but we're dangling by a single copy till
    recovery complete
  • but are we still screwed? Depending on colo
    scenarios, did we also lose the IPP and flatfile
    archive?
  • Terrifying
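
One way to picture "can't replicate too aggressively" is a hold window: a loaded batch is pushed to the replicas only after it has passed sanity checks and has aged long enough for someone to notice damage. The slides do not describe the actual mechanism, so everything below (the `Batch` type, the `validated` flag, the hold length) is a hypothetical sketch.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    batch_id: int
    loaded_at: float  # seconds since epoch when loaded on the primary copy
    validated: bool   # result of (hypothetical) post-load sanity checks


def ready_to_replicate(batch, now, hold_seconds):
    """True only if the batch passed its checks and has outlived the hold window."""
    return batch.validated and (now - batch.loaded_at) >= hold_seconds


def select_for_replication(batches, now, hold_seconds):
    """Return the batches that may be propagated to replicas, oldest first."""
    ready = [b for b in batches if ready_to_replicate(b, now, hold_seconds)]
    return sorted(ready, key=lambda b: b.loaded_at)
```

The trade-off is that a longer hold window gives more time to catch mangled data, but leaves the replicas further behind the primary.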

18
State Diagram for Replicas?
  • Loading
  • Replicating
  • Load balancing
  • Failing
  • Recovering
  • Possibly repeat-loading

19
Operating Systems, DBMS?
  • SQL Server 2005 EE x64
  • Why?
  • Why not DB2, Oracle RAC, PostgreSQL, MySQL,
    ?
  • (Windows Server 2003 EE x64)
  • Platform rant from JVV available over beers

20
Systems/Database Management
  • Active Directory infrastructure
  • Windows patching tools, methodology
  • Linux patching tools, methodology
  • Monitoring
  • Staffing requirements

21
Facilities/Infrastructure Projections for PS1
  • Cooling
  • Rack space
  • Network ports
  • (plus AD/WSUS/monitoring infrastructure above)

22
Operational Handoff to UofH
23
Mahalo! (See Ya, Hon!)