Title: Storage Architecture 101
1Storage Architecture 101
- Charlie Cassidy
- Cassidy Consulting Group
November 16, 2009
2New Technology Hype Cycle
Figure: hype cycle curve of Attention (vertical axis) versus Time (horizontal axis).
Phases: Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity.
Annotation: Fails to Establish Compelling End-User Value.
Technology labels on the curve: SATA, iSCSI and Infiniband; SAN, NAS; Fibre Channel, 1394; ATA, SCSI, 100baseT, RAID.
Source: Gartner Group
3Basic Storage and Systems Architecture Concepts
- Network architecture
- Network topologies
- Protocol stacks
- Network components
- Storage architecture
- Storage interconnects
- Storage management
4Network Architecture Concepts
- Network topologies
- Bus
- Star
- Loop
- Physical vs. Logical
- Protocol stacks
- TCP/IP Ethernet
- Network components
- Hub
- Router
- Bridge
- Switch
- Firewalls, proxy servers
5Basic Network Topologies
6Bus
- What is it?
- Transmission path where each node picks up all
signals
- Advantages
- Architectural simplicity
- Every node sees everything
- Disadvantages
- Limited scalability
- Speed / scalability tradeoff
- Transmission line effects
- Electrical complexity
- Major uses
- PCI
- SCSI
- Ethernet
7Loop or Ring
- What is it?
- Transmission path where each node connects to two
other devices
- Advantages
- Ease of cabling
- Physically easy to scale
- High aggregate bandwidth if you take advantage of
temporal reuse
- Disadvantages
- Need to deal with loop breakage
- Long loops have long latency
- Major uses
- Fibre Channel Arbitrated Loop
- Token Ring
8Star
- What is it?
- Also called hub and spoke
- Transmission path where each node is connected to
a central location
- Advantages
- Simple, point-to-point cabling
- Easier electrical interface
- Can remove nodes without disturbing others
- Disadvantages
- Lots of wires
- Wire length
- Cost of hubs/switches
- Major uses
- 10/100 base T (physically, also logically if switched)
- VAXcluster CI
- POTS
9Networking Components
- Hub
- Simple interconnect point for a star
- Switch
- Interconnect point for a star with multiple internal paths
- Operates at datalink level
- Bridge
- Interconnection between two networks at datalink layer
- May do protocol translation (e.g. FC-SCSI bridge)
- Router
- Interconnection between two networks at network layer
- Often combined with switches
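The difference between a hub and a switch can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real product: the class names, the port numbering and the single-letter addresses are all assumptions made for the example. The hub repeats every frame to every other port; the switch learns which port each source address lives on and then forwards only where needed.

```python
class Hub:
    """Repeats every frame out of every port except the one it arrived on."""

    def __init__(self, num_ports):
        self.num_ports = num_ports

    def forward(self, in_port, src, dst):
        # A hub ignores addresses entirely.
        return [p for p in range(self.num_ports) if p != in_port]


class Switch:
    """Learns source addresses per port and forwards at the datalink layer."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.table = {}  # address -> port (hypothetical learning table)

    def forward(self, in_port, src, dst):
        self.table[src] = in_port  # remember where the sender is attached
        if dst in self.table:
            out_port = self.table[dst]
            # Nothing to send if the destination is on the arrival port.
            return [] if out_port == in_port else [out_port]
        # Unknown destination: fall back to flooding, like a hub.
        return [p for p in range(self.num_ports) if p != in_port]


hub = Hub(4)
switch = Switch(4)
print(hub.forward(0, "A", "B"))     # [1, 2, 3]
print(switch.forward(0, "A", "B"))  # [1, 2, 3]  (B not learned yet)
print(switch.forward(1, "B", "A"))  # [0]        (A was learned on port 0)
print(switch.forward(0, "A", "B"))  # [1]        (B was learned on port 1)
```

Once both addresses are learned, traffic between A and B no longer reaches ports 2 and 3, which is what "multiple internal paths" buys over a hub.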
10Most Networks Are Combinations
11Protocol Stacks
- Layering of functionality that trades off a little efficiency for:
- Simplicity
- Extensibility
- Robustness
- Efficiency often gained back by the gains in simplicity
Figure: nested frames. Each layer treats what it receives from the layer above as Data and adds its own header (Header 4, Header 3, Header 2, Header 1).
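The layering idea can be sketched as header encapsulation. This is a minimal sketch under stated assumptions: the bracketed header strings are made up for illustration, and the example assumes Header 4 belongs to the highest layer and Header 1 to the lowest, so Header 1 ends up outermost on the wire.

```python
def encapsulate(payload: bytes, headers: list[bytes]) -> bytes:
    """Wrap payload in headers; the first header in the list ends up innermost."""
    frame = payload
    for header in headers:
        frame = header + frame  # each layer prepends its own header
    return frame


def decapsulate(frame: bytes, headers: list[bytes]) -> bytes:
    """Strip the headers from the outside in, checking each one."""
    for header in reversed(headers):
        if not frame.startswith(header):
            raise ValueError("unexpected header")
        frame = frame[len(header):]
    return frame


# Hypothetical headers, listed from the highest layer to the lowest.
headers = [b"[H4]", b"[H3]", b"[H2]", b"[H1]"]
frame = encapsulate(b"data", headers)
print(frame)                        # b'[H1][H2][H3][H4]data'
print(decapsulate(frame, headers))  # b'data'
```

Each layer only needs to understand its own header, which is where the simplicity and extensibility come from; the cost is the extra bytes per frame.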
12Protocol Stack
Layer comparison: OSI, IP, Fibre Channel
OSI layer         IP
7 Application     NFS, FTP, SMTP, SNMP, Telnet
6 Presentation    XDR
5 Session         RPC
4 Transport       TCP, UDP
3 Network         IP, Routing, ICMP
2 Data Link       ARP, RARP
1 Physical        Not Specified
Fibre Channel layers, top to bottom: FC-4 Upper Mapping, FC-3 Common Services, FC-2 Signaling, FC-1 Encode/Decode, FC-0 Media
13SCSI Architecture
14The SNIA Storage Networking Model: Infrastructure for Data
- Storage Clients: Consumers such as file systems, DBMS, applications
- Virtualization: Aggregators (such as LUN managers, RAID, etc.) and attribute-extenders
- Storage Devices: Disk Subsystems, Tape
Figure layers, top to bottom: File/Record Subsystem, Storage Interconnect, Block Aggregation, Device Interconnect, Storage Device
Source: SNIA web site
15What is a SAN?
Figure: a SAN mapped onto the SNIA model layers.
File/Record Subsystem: HP/UX, Linux, AIX/Monterey, Windows NT/2000, Solaris
Storage Interconnect: SAN
Block Aggregation: SAN Services
Device Interconnect
Storage Devices: Virtual Array, Disk Array (two shown), Tape Library, Tape
Source: SNIA web site
16The SNIA Storage Model
Source: SNIA web site
17Storage Architecture Concepts
- Interconnect vs. Architecture
- SAN: Storage (or Systems) Area Network
- NAS: Network Attached Storage
- RAID: Redundant Arrays of Inexpensive (or Independent) Disks
- JBOD: Just a Bunch of Disks
18SAN
- What is it?
- Storage (or System) Area Network
- Usually Fibre Channel (over either copper or optical fiber)
- Block addressed storage
- Advantages
- System scalability
- Advanced, flexible availability features (RAID)
- Advanced management features (Storage Resource
Management SW, GUIs)
- Disadvantages
- Very high cost
- Lack of interoperability between vendors
(although this is improving)
- Major players
- EMC, IBM, Sun, Compaq
- Brocade, McData/EMC, Ancor/Qlogic, Gadzoox, Vixel
19NAS
- What is it?
- Network Attached Storage
- Usually Ethernet attached
- File accessed storage
- Advantages
- Lower cost than SAN
- Ubiquitous interoperability w/ Ethernet
- Cross-platform file sharing
- Scalability
- Disadvantages
- Limited scalability for a single file system
(needs SW assist)
- Limited flexibility and robustness at low end (RAID0,1 only or single RAID5 LUN)
- Major players
- Network Appliance, EMC
- Quantum/Snap, Maxtor
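The block-addressed versus file-accessed distinction between SAN and NAS can be sketched as two interfaces. This is a toy illustration, not how any real array or filer is built: the class names, the 512-byte block size and the trivial allocation scheme are all assumptions. A SAN client names blocks by number; a NAS client names files, and the server decides which blocks hold them.

```python
BLOCK_SIZE = 512  # bytes per block; an assumed value for the sketch


class BlockDevice:
    """SAN-style storage: the client addresses numbered blocks."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, lba):
        return self.blocks[lba]

    def write_block(self, lba, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        self.blocks[lba] = bytes(data)


class FileServer:
    """NAS-style storage: the client names files; the server owns the layout."""

    def __init__(self, device):
        self.device = device
        self.files = {}      # name -> (list of block numbers, length in bytes)
        self.next_free = 0   # naive allocator: blocks are never reused

    def write_file(self, name, data):
        lbas = []
        for offset in range(0, len(data), BLOCK_SIZE):
            chunk = data[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            self.device.write_block(self.next_free, chunk)
            lbas.append(self.next_free)
            self.next_free += 1
        self.files[name] = (lbas, len(data))

    def read_file(self, name):
        lbas, length = self.files[name]
        raw = b"".join(self.device.read_block(lba) for lba in lbas)
        return raw[:length]  # drop the padding in the last block


disk = BlockDevice(16)
nas = FileServer(disk)
nas.write_file("report.txt", b"x" * 700)   # file access: by name
print(len(nas.read_file("report.txt")))    # 700
print(disk.read_block(1)[:4])              # block access: by number
```

Because the file server hides the block layout, clients on different platforms can share the same files, which is the cross-platform advantage listed above.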
20RAID
- What is it?
- Redundant Arrays of Inexpensive (or Independent) Disks
- A way to gang multiple disk drives together to enhance capacity and availability
- Advantages
- Availability
- Capacity
- Performance (?)
- Disadvantages
- Complexity
- Scalability (being fixed)
- Manageability
- Major players
- EMC, IBM, Sun, CPQ, HP
- A host of smaller VARs (MTI, Data Direct)
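One way ganged drives gain availability is parity. The sketch below assumes a RAID-5-style XOR parity scheme purely for illustration (RAID levels differ, and real arrays rotate parity across drives and work on much larger stripes). The function name and the four-byte "disks" are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "disks" holding one small block each (illustrative contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # stored on a fourth disk

# Simulate losing disk 1: XOR of the survivors and the parity rebuilds it.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])  # True
```

The capacity cost is one drive's worth of parity for the group, and every write must also update parity, which is one reason the performance advantage carries a question mark.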
21Reliability vs. Availability vs. Lifetime
- Availability expressed as uptime (e.g. 99.9%)
- Availability = MTBF / (MTBF + MTTR), where MTTR is Mean Time To Repair
- Reliability expressed as Mean Time Between Failures (in hours)
- Only applies to a population
- Lifetime is the life expectancy of a device
Figure: The Reliability Bathtub Curve (failure rate over time). Regions: Infant Mortality (Quality), Steady State, Wearout. Labels: MTBF, Lifetime.
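The availability formula, and the point that MTBF describes a population rather than a single drive, can be worked through in a short sketch. The MTBF, MTTR and population figures below are illustrative assumptions, not vendor data, and the failure estimate assumes the constant failure rate of the steady-state part of the curve.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours, mttr_hours):
    """Fraction of time the device is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_failures_per_year(mtbf_hours, population):
    """Expected failures across a population, assuming a constant failure rate."""
    return population * HOURS_PER_YEAR / mtbf_hours


# Assumed figures for illustration only.
mtbf = 500_000   # hours
mttr = 8         # hours to replace and rebuild
drives = 1_000

print(f"availability: {availability(mtbf, mttr):.6f}")
print(f"expected failures per year: {expected_failures_per_year(mtbf, drives):.1f}")
```

A 500,000-hour MTBF does not mean one drive lasts 57 years; it means a large population in steady state loses drives at that average rate, while each drive's lifetime is bounded by wearout.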
22Specific Interconnect Technologies
- ATA
- SATA
- SCSI
- USB
- FCAL
- GbE
- Infiniband
- 1394 (Firewire, iLink)
23Storage Interconnect Market Vision
- SERVERS DISAGGREGATE and network storage becomes the standard attachment and configuration
- IP storage will dominate after 2004
- Fibre Channel is growing but will plateau within 3 years
- STORAGE CONNECTS to the fabric: InfiniBand, Ethernet, SAN
- Dedicated-function SERVER AND STORAGE APPLIANCES ship in the millions
Chart: Percent of Total New Server Shipments (axis from 0 to 100)
Source: Strategic Research, 9/00
24ATA
- What is it?
- AT Attach
- A.k.a. IDE (Integrated Drive Electronics)
- Low cost, ubiquitous HDD interface for PCs
- Advantages
- Lowest cost
- Relatively fast performance for single drive on
modern versions (ATA/66, ATA/100)
- Disadvantages
- Limited scalability (2 drives per channel)
- Lack of performance features (command queuing,
large caches, high RPM motors)
- Major players
- All HDD vendors (Quantum leads on development)
- All PC manufacturers
25SATA
- What is it?
- Serial version of ATA interface
- Up to 1.5 Gb/s in the first generation
- Advantages
- Low cost
- Easier cabling than ATA
- Disadvantages
- Limited scalability (point-to-point, one drive per port)
- Lack of performance features (command queuing, large caches, high RPM motors); THIS WILL GET FIXED
- Major players
- Quantum, WD, Seagate, Maxtor
- IBM, Sun
26SCSI
- What is it?
- Small Computer Systems Interface
- Legacy midrange interconnect for disks, tapes,
storage subsystems and other peripherals
(Fishfinders)
- Advantages
- Medium cost
- Scalable up to 15 drives per bus
- Advanced performance features (command queuing,
large caches, high RPM motors)
- Disadvantages
- Higher cost / lower volumes than ATA
- Higher complexity (termination issues, cabling issues)
- Fat, short, expensive cables
- Major players
- IBM, Seagate, Quantum, Fujitsu
- Dell, Compaq, EMC, IBM, HWP
27USB
- What is it?
- Universal Serial Bus
- Modern serial peripheral bus for fishfinders
(printers, scanners, cameras, floppies, some
HDDs)
- Advantages
- Simple
- Low Cost
- Scalable through inexpensive hubs
- Disadvantages
- Slow (USB 1.1)
- Major players
- Driven by Intel
- All major PC manufacturers and peripheral makers
28FCAL
- What is it?
- Fibre Channel Arbitrated Loop
- Copper version of Fibre Channel connected in loop topology
- Advantages
- Scalable to very large configurations
- Same protocol as SAN, more efficient
- Disadvantages
- Need to deal with loop breakage (can use switched star topology to deal with this)
- Higher cost and power
- Interoperability has been an issue
- Complexity
- Major players
- Seagate, IBM, Fujitsu
- EMC, Sun, CPQ, IBM, HWP, Dell
- Brocade, Crossroads,
29Infiniband
- What is it?
- Next generation computer room interconnect specifically targeted at Internet computing
- Serial, 2.5 Gb/s per 1x link (wider 4x and 12x links multiply this)
- Serious competition to FC for next generation SAN interconnect
- Advantages
- Low cost
- Low overhead
- Marries well to VI architecture to give data
transfer to/from user space
- Disadvantages
- Time to market (TTM) of the infrastructure and of a complete solution
- Major players
- Intel, IBM
- Qlogic, LSI Logic
30Gigabit Ethernet
- What is it?
- Next Generation LAN (1 Gb/s)
- Either block accessed (iSCSI) or file accessed (NAS)
- Another serious contender for next generation SANs (w/ iSCSI)
- Advantages
- Hardware will be ubiquitous
- Well developed industry infrastructure with
excellent interoperability
- Disadvantages
- TCP/IP stack is inefficient for storage (several companies are looking at implementing it in ASICs)
- Major players
- Cisco, IBM, Sun, EMC
31IEEE 1394 (Firewire, iLink)
- What is it?
- A/V interconnect for digital video, consumer electronics
- 400 Mb/s in 1st generation
- Advantages
- Low cost
- Isochronous transfers (guaranteed BW for streaming applications)
- Disadvantages
- Limited success
- Complex architecture with many options (a la
SCSI, FC)
- Major players
- Sony (iLink) , Apple (Firewire), Maxtor
32Key Storage User Concerns
- Management
- Enormous growth, larger assets, multiple assets
- Interoperability
- Heterogeneous server, storage and infrastructure
- Disaster recovery
- Shrinking backup windows, backup consolidation,
network and server congestion
Source: SNIA, 2/2001
33Storage Management Costs
- Lack of management can increase costs 3 to 10x
- Continually adding additional HDDs only works for small sites
- Decentralization is adding to storage costs
- Unix and NT significantly behind enterprise (IBM) in storage management
- Reduced management cost is one of the major goals of SANs
Chart: Storage Management Costs (per MB per month), comparing Without Storage Management and With Storage Management; IS Personnel Cost is labeled.
Source: Fred Moore, Storage Panorama 2000
34The (Proposed) Answer: Virtualization
- Decouple the physical storage from the objects being managed
- Virtual disks have
- Capabilities: performance, reliability, availability
- Attributes: cost
- Data has
- Needs: performance, reliability, availability
- Management becomes matching capabilities and needs, which can be automated
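Matching capabilities to needs can be sketched as a simple placement rule. Everything concrete here is an assumption for illustration: the unitless 1 to 10 scores, the class and field names, and the "cheapest disk that satisfies every need" policy are one possible approach, not a description of any shipping product.

```python
from dataclasses import dataclass


@dataclass
class VirtualDisk:
    name: str
    performance: int    # capability scores, higher is better (assumed scale)
    reliability: int
    availability: int
    cost: float         # attribute: relative cost per unit of capacity


@dataclass
class DataSet:
    name: str
    performance: int    # needs, on the same assumed scale
    reliability: int
    availability: int


def meets(disk, data):
    """True if every capability of the disk covers the matching need."""
    return (disk.performance >= data.performance
            and disk.reliability >= data.reliability
            and disk.availability >= data.availability)


def place(data, disks):
    """Pick the cheapest virtual disk that satisfies all needs, or None."""
    candidates = [d for d in disks if meets(d, data)]
    return min(candidates, key=lambda d: d.cost) if candidates else None


disks = [
    VirtualDisk("mirrored-fast", 9, 9, 9, cost=10.0),
    VirtualDisk("parity-mid", 5, 7, 7, cost=4.0),
    VirtualDisk("jbod-cheap", 3, 2, 2, cost=1.0),
]
orders_db = DataSet("orders-db", performance=4, reliability=6, availability=6)
print(place(orders_db, disks).name)  # parity-mid
```

Once needs and capabilities are expressed in comparable terms, the administrator manages policies rather than individual drives.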
35Virtualization
- Can be done in host software, subsystem or by a special virtualization appliance
- Can be in the data path (in-band, symmetric pooling) or out of the data path (out-of-band, asymmetric pooling)
- Presents a single unified model for storage
- Reduces the number of objects under management
- Can be scaled easily and dynamically
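The "single unified model" amounts to an address map from one virtual disk onto several physical ones. The sketch below assumes simple concatenation of whole disks; real virtualization layers may also stripe, mirror or remap extents dynamically. The function names and disk sizes are hypothetical.

```python
def build_pool(disk_sizes):
    """Concatenate disks into one address space.

    Returns a list of extents: (first_virtual_block, disk_index, size_in_blocks).
    """
    extents = []
    start = 0
    for index, size in enumerate(disk_sizes):
        extents.append((start, index, size))
        start += size
    return extents


def resolve(extents, virtual_block):
    """Translate a virtual block number to (disk_index, physical_block)."""
    for start, disk, size in extents:
        if start <= virtual_block < start + size:
            return disk, virtual_block - start
    raise IndexError("virtual block out of range")


# Three physical disks of different sizes appear as one 350-block volume.
pool = build_pool([100, 50, 200])
print(resolve(pool, 0))    # (0, 0)
print(resolve(pool, 120))  # (1, 20)
print(resolve(pool, 349))  # (2, 199)
```

Adding capacity means appending an extent to the map, which is why a pool can grow without the host seeing a new object to manage.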