Yottabytes and Beyond

1
Yottabytes and Beyond
  • Demystifying Storage and
  • Building large Storage Networks
  • Part I
  • by Bhavin Turakhia, CEO, Directi
  • bhavin.t_at_directi.com
  • (shared under Creative Commons Attribution
    Share-alike License incorporated herein by
    reference)
  • (http://creativecommons.org/licenses/by-sa/3.0/)

2
Why is storage important?
  • Web 2.0 applications are an extension of your
    Desktop
  • SaaS is here and growing
  • Broadband is a reality
  • Storage costs are dropping
  • Everyone expects near-unlimited storage online:
    YouTube, Flickr, Facebook et al. are storing your
    life online
  • (... and yes, let's not forget your personal
    BitTorrent collection)

It would take 1400 TB to store your entire life
in video. 5700 TB if you want to know what was
happening around you. Another 73 TB for the audio
files of everything you heard (MP3 quality).
That's about 6000 TB for a copy of your life.
3
Agenda
  • Hard disks
  • SATA, SAS, FC, Solid State
  • RAID
  • DAS
  • SAN

4
  • Large scale storage requires careful planning

5
  • Choosing your Hard Disk
  • (SATA, FC, SAS, SCSI, Solid State)

6
Introduction to Hard Drives
  • Basic physical storage unit (aka Physical block
    device)
  • Variables to consider when selecting a drive
  • Type (SAS, SATA, FC)
  • RPM
  • Capacity
  • MTBF (Mean Time between Failures)
  • Life Expectancy

7
Hard Disk types
8
Hard Disk types
9
Hard Disk Conclusions
  • For high-IOPS workloads such as database
    applications with low storage requirements, you
    have a choice between FC and SAS
  • SAS currently seems like the better option
  • Future SAS standards promise to be faster than FC
    (though it is likely they may remain neck to
    neck)
  • For high-storage requirements (video server, file
    servers, photo storage, archivals, mail servers,
    backup servers) SATA is the way to go
  • You can combine SAS and SATA drives to reduce
    average cost and still meet your goals,
    especially since SAS backplanes also accept
    SATA drives
  • Read the spec sheets of the hard drives you plan
    to use to determine the specifics

10
Solid State Drives
  • Uses solid state memory to store persistent data
  • Eliminates mechanical parts
  • Useful for creating efficient in-between caches
    or storing small to mid-sized high performance
    databases

11
Solid State Drives
  • References
  • Intro - http://en.wikipedia.org/wiki/Solid_state_disk
  • RAM vs Flash based - http://www.storagesearch.com/ssd-ram-v-flash.html
  • SSD based SAN!!! ? - http://www.superssd.com/

12
  • RAID Primer
  • (0, 1, 2, 3, 4, 5, 6, TP, 01, 10, 50, 60)

13
Introduction to RAID
  • allows multiple disks to appear as a single
    contiguous physical block device
  • provides redundancy / high availability
  • A RAID group appears as a single physical block
    device

(Diagram: a RAID group built from disks HD1 and HD2)
14
Comparison of Single RAID Levels
15
Comparison of Single RAID Levels
16
Comparison of Single RAID Levels
17
Comparison of Single RAID Levels
18
Understanding the Parity Penalty
  • RAID 5 and RAID 6 store parity information
    against data for rebuild
  • Single Parity can be calculated using a simple
    XOR
  • E.g. ABCDEFGHIJKL on a 4-disk RAID 5 array
  • If Disk 2 fails then the data B can be
    recalculated as (01000001 XOR 01000011 XOR
    01000000) = 01000010 = B
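
The XOR reconstruction above can be sketched in a few lines of Python. This is a minimal illustration only, assuming one stripe of three single-byte data blocks (A, B, C) plus one parity block; the helper name xor_blocks is made up for this example and is not taken from any RAID implementation.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One stripe on a 4-disk array: three data blocks plus one parity block.
a, b, c = b"A", b"B", b"C"
parity = xor_blocks(a, b, c)

# Disk 2 (holding B) fails: rebuild its block from the surviving disks.
rebuilt = xor_blocks(a, c, parity)

print(format(parity[0], "08b"))  # 01000000
print(rebuilt)                   # b'B'
```

The same function works for blocks of any equal length, which is why a rebuild has to read every surviving disk in the stripe.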

19
Understanding the Parity Penalty
  • Steps to change B to X on Disk 2
  • Read A, C and P
  • Recalculate P as A XOR X XOR C
  • Write X and P
  • A single update required 3 reads and 2 writes
  • Random writes in RAID 5 and RAID 6 are very very
    expensive
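
The I/O count for the update above can be reproduced with a small sketch. The CountingDisk class and small_write function are hypothetical helpers written for this illustration; each "disk" holds a single byte. The parity read is included only to mirror the steps listed on the slide, since this particular recalculation does not use the old parity value. A common alternative is to read just the old block and the old parity and compute the new parity from those.

```python
class CountingDisk:
    """Hypothetical single-block disk that counts its I/O operations."""

    def __init__(self, value: int):
        self.value = value
        self.reads = 0
        self.writes = 0

    def read(self) -> int:
        self.reads += 1
        return self.value

    def write(self, value: int) -> None:
        self.writes += 1
        self.value = value


def small_write(data_disks, parity_disk, index, new_value):
    """Update one data block following the steps listed on the slide."""
    others = [d.read() for i, d in enumerate(data_disks) if i != index]
    parity_disk.read()  # the slide also reads the old parity P
    new_parity = new_value
    for value in others:
        new_parity ^= value
    data_disks[index].write(new_value)
    parity_disk.write(new_parity)


data = [CountingDisk(ord(ch)) for ch in "ABC"]
parity = CountingDisk(ord("A") ^ ord("B") ^ ord("C"))
small_write(data, parity, index=1, new_value=ord("X"))

reads = sum(d.reads for d in data) + parity.reads
writes = sum(d.writes for d in data) + parity.writes
print(reads, writes)  # 3 2
```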

20
Understanding the Parity Penalty
  • Rebuilding in RAID 5 and RAID 6 is expensive
  • The cost increases with increase in number of
    disks
  • As if this isn't enough, there is an additional
    penalty
  • All the writes after the computation (i.e. parity
    and the changed block) must be simultaneous
    (involving a two-phase commit operation)
  • The impact can be marginally reduced through
    write-back caching

21
Comparison of Nested RAID Levels
22
Comparison of Nested RAID Levels
23
Comparison of Nested RAID Levels
24
Nested RAID Misc Notes
  • RAID 10 tolerates more drive failures and rebuilds
    faster than RAID 01 for the same cost
  • RAID 60 is similar to RAID 50 except that the
    striped sets with parity contain dual parity
  • Ideally RAID 10 and RAID 50 will be the only
    nested RAID levels you will use
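
One way to compare RAID 10 and RAID 01 is to enumerate every two-disk failure on a small array and count how many the array survives. The sketch below assumes a 4-disk array and simplified survival rules (RAID 10 needs one working disk per mirror, RAID 01 needs one fully intact stripe set); the function names are invented for this example.

```python
from itertools import combinations


def raid10_survives(failed, mirrors):
    """Stripe of mirrors: survives while every mirror keeps one disk."""
    return all(any(d not in failed for d in mirror) for mirror in mirrors)


def raid01_survives(failed, stripes):
    """Mirror of stripes: survives while one whole stripe set is intact."""
    return any(all(d not in failed for d in stripe) for stripe in stripes)


def survival_count(survives, groups, n_disks, n_failures=2):
    """Return (survivable cases, total cases) for n_failures failed disks."""
    cases = list(combinations(range(n_disks), n_failures))
    ok = sum(survives(set(case), groups) for case in cases)
    return ok, len(cases)


groups = [(0, 1), (2, 3)]  # two mirrors (RAID 10) or two stripe sets (RAID 01)
print(survival_count(raid10_survives, groups, 4))  # (4, 6)
print(survival_count(raid01_survives, groups, 4))  # (2, 6)
```

With the same four disks, the stripe of mirrors survives four of the six possible double failures, while the mirror of stripes survives only two.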

25
RAID Considerations
  • Select your stripe size by empirical testing
  • A smaller stripe size increases transfer
    performance but decreases positioning performance,
    and vice versa
  • Ideal stripe sizes depend on your application,
    the typical amount of data read per request,
    sequential vs random reads etc.
  • Try to select hard drives from separate
    production batches
  • Maintain sufficient Spares in a large array
    (typically 1 per 10-15 disks is sufficient)
  • Use Global spares across RAID groups if your
    controller supports it
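
The stripe size trade-off can be illustrated by checking how many disks a single request touches. This is a rough sketch assuming a plain RAID 0 style layout in which "stripe size" means the chunk written to one disk before moving to the next; disks_touched is a hypothetical helper, and real controllers may lay data out differently.

```python
def disks_touched(offset, length, stripe_size, n_disks):
    """Return the set of disks a request touches in a simple striped layout."""
    first_unit = offset // stripe_size
    last_unit = (offset + length - 1) // stripe_size
    return {unit % n_disks for unit in range(first_unit, last_unit + 1)}


KIB = 1024

# A 64 KiB read on a 4-disk array with two candidate stripe sizes.
print(sorted(disks_touched(0, 64 * KIB, 16 * KIB, 4)))   # [0, 1, 2, 3]
print(sorted(disks_touched(0, 64 * KIB, 256 * KIB, 4)))  # [0]
```

With the small stripe the request is spread over all four disks (better transfer rate, but every disk has to seek); with the large stripe one disk serves it and the others stay free for other requests.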

26
RAID Considerations
  • Use hardware RAID unless performance is not a
    consideration
  • Nested and parity-based RAID levels in particular
    consume more CPU cycles and increase rebuild
    time if implemented in software
  • General rule about controller cache: the more
    the better
  • Ensure the controller has battery backup to
    retain its cache in case of power failure
  • For internal RAID controller cards use faster PCI
    buses (PCI-X)

27
  • The fun starts: let's build our storage system

28
  • Passive Disk Enclosure based Direct Attached
    Storage (PDE based DAS)

29
Passive Disk Enclosure based DAS
  • DAS: Direct Attached Storage
  • RAID controller inside host machine
  • External chassis is simply a JBOD (Just a Bunch Of
    Disks)
  • (or what I'd like to call a Passive Disk Enclosure
    or PDE)
  • A PDE enables stringing together a larger number
    of drives than an internal RAID array
  • E.g. Dell PowerVault MD1000

30
Passive Disk Enclosure based DAS
  • Passive Disk Enclosure can consist of SAS, SATA
    or FC drives
  • Passive Disk Enclosure to RAID Controller
    connectivity can be SAS, FC, SCSI (possibly
    different from the backplane)
  • Multiple PDEs can be daisy chained if they
    support it
  • RAID card is a single point of failure
  • Only one host machine supported
  • Array of disks can be divided into multiple RAID
    groups

31
Passive Disk Enclosure based DAS
  • Array of disks can be divided into multiple
    heterogeneous RAID groups
  • Size and type of a RAID group depends on RAID
    card
  • PDE may have multiple paths to system with
    possibility of multiplexing for increased speed
  • Global spares can be defined on the RAID card
  • Maximum storage size = maximum number of PDEs
    that can be daisy chained x drives per PDE x
    size of each drive
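
As a quick worked example of the maximum storage size, the sketch below multiplies out the raw capacity, assuming every enclosure holds the same number of identical drives. The function name and all figures are hypothetical; check the enclosure's spec sheet for real limits.

```python
def max_raw_capacity_tb(max_enclosures, drives_per_enclosure, drive_size_tb):
    """Raw capacity before RAID overhead and hot spares are subtracted."""
    return max_enclosures * drives_per_enclosure * drive_size_tb


# Hypothetical figures: 3 chained enclosures, 15 bays each, 1 TB drives.
print(max_raw_capacity_tb(3, 15, 1.0))  # 45.0
```

Usable capacity will be lower once parity, mirroring and spares are taken into account.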

32
Passive Disk Enclosure based DAS
  • Performance Considerations
  • Drives
  • RAID configuration
  • PDE Interconnect
  • PDE to RAID Card connect
  • RAID card config (cache etc)
  • PCI bus

33
  • Active Disk Enclosure based Direct Attached
    Storage (ADE based DAS)

34
Active Disk Enclosure based DAS
  • ADE Difference - RAID Card is not in the host
    machine but in the enclosure
  • Host machine has a SAS/FC Host Bus Adaptor (HBA)
    depending on ADE to Host connectivity support
  • Some ADEs may support multiple connection
    protocols
  • ADE may support SAS/FC/SATA drives
  • ADE can support daisy-chaining PDEs
  • Examples of ADEs: Dell MD3000, Infortrend EonStor
    devices, Nexsan SATABeast and SATABoy etc.

35
Active Disk Enclosure based DAS
  • ADE may support dual RAID Controllers
  • RAID Controllers can be used as Active-Active
    (in case of multiple RAID Groups), otherwise as
    Active-Passive
  • RAID Controller to HBA connectivity can be
    multiplexed - if supported - for higher
    throughput
  • ADEs are wrongly but commonly referred to as a SAN
    ("SAN device" would still be all right)

36
  • Partitioning and Mounting

37
Logical Volumes
  • A RAID Group is a physical unit of storage
  • At the operating system level a Logical Group (a
    Volume Group in LVM terms) can be created out of
    multiple RAID Groups
  • Each Logical Group can be further divided into
    Logical Volumes
  • Each Logical Volume represents a mountable block
    device
  • In Linux this is done using LVM
  • In LVM Logical Volumes are resizable

38
  • SAN (Storage Area Network)

39
SAN
  • Multiple host machines connected to an ADE
    through a SAN switch
  • SAN refers to the whole combination: interconnect,
    switch, ADE and PDE
  • Switch and HBA can be SAS / FC depending on
    interconnect type supported by ADE
  • ADE would support creation of Volumes
  • These can be mounted onto Client and further
    subdivided

40
SAN
  • Care must be taken to mount each Logical Volume
    onto a single client (unless you are running a
    Clustered File System)
  • This can be achieved by host masking supported by
    ADE and/or the Switch
  • Without careful host masking and mounting data
    corruption can take place

41
SAN
  • Complex SAN configs include multiple hosts and
    multiple ADEs connected to active-active switches
    with multiplexed connections
  • Client hosts can be of heterogeneous operating
    systems
  • (Funnily, ADE to PDE paths sometimes are not
    multiplexed)

42
SAN
  • While this looks complex just think of it as
    removing hard disks from the machine and hosting
    them outside in separate enclosures
  • Each machine mounts an independent partition from
    the SAN

43
SAN
  • Performance Considerations
  • All variables we covered before
  • Switch config
  • Ensure that the switch / HBA / interconnect does
    not become the bottleneck, so that full HDD
    throughput can be utilized

44
Throughput Calculations
  • Hard disk performance: type, RPM etc.
  • Data distribution and type of data access
  • RAID performance: number of drives, RAID type
  • RAID card performance: cache, active-active
    config etc.
  • ADE to switch connection speed
  • Switch to HBA connection speed
  • HBA to PCI bus speed

45
  • That's all, folks
  • Let's go build out our yottabyte arrays and fill
    'em up

Considerable hyperbole, given that the combined
storage of all computers in the world today (2007)
doesn't add up to 1 yottabyte (2^80 bytes). In fact
the entire world's storage is projected to hit
988 exabytes (1 exabyte = 2^60 bytes) by 2010.
6th Sep 2007 - http://www.networkworld.com/newsletters/stor/2007/0903stor2.html
Nanotech breakthrough could put entire YouTube
contents on an iPod-size device
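
To put the note above in numbers, here is a short calculation using binary units (1 exabyte = 2^60 bytes, 1 yottabyte = 2^80 bytes). The variable names are only for this illustration.

```python
EXABYTE = 2 ** 60
YOTTABYTE = 2 ** 80

projected_2010 = 988 * EXABYTE

print(YOTTABYTE // EXABYTE)                  # 1048576
print(f"{projected_2010 / YOTTABYTE:.4%}")   # 0.0942%
```

So even the 2010 projection comes to less than a tenth of a percent of a single yottabyte.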
46
Part II sneak preview
  • Complex SAN configurations
  • iSCSI
  • NAS
  • Clustered Storage
  • GFS
  • Backups
  • Storage Monitoring
  • Storage Benchmarking
  • Some Commercial storage vendors

47
Shameless HR Propaganda Slide
  • Directi builds cool Web products
  • Deployed on distributed architecture
  • Using terabytes of storage
  • Used by millions of users
  • Generating billions of pageviews and
    transactions
  • Spanning every possible software engineering
    technology

http://careers.directi.com
http://wiki.directi.com
http://cosmos.directi.com
Personal Blog: http://bhavin.directi.com
Mail: bhavin.t_at_directi.com