Transcript and Presenter's Notes

1
(No Transcript)
2
High Performance Communication for Oracle using InfiniBand
Session ID: 36568
  • Ross Schibler, CTO, Topspin Communications, Inc.
  • Peter Ogilvie, Principal Member of Technical Staff, Oracle Corporation
3
Session Topics
  • Why the Interest in InfiniBand Clusters
  • InfiniBand Technical Primer
  • Performance
  • Oracle 10g InfiniBand Support
  • Implementation details

4
Why the Interest in InfiniBand
  • InfiniBand is a key new feature in Oracle 10g
  • Enhances price/performance and scalability; simplifies systems
  • InfiniBand fits the broad movement towards lower costs
  • Horizontal scalability, converged networks, system virtualization... grid
  • Initial DB performance and scalability data is superb
  • Network tests done; application-level benchmarks now in progress
  • InfiniBand is a widely supported standard, available today
  • Oracle, Dell, HP, IBM, Network Appliance, Sun and 100+ others involved
  • Tight alliance between Oracle and Topspin enables IB for 10g
  • Integrated and tested; delivers the complete Oracle wish list for high speed interconnects

5
System Transition Presents Opportunity
  • Major shift to standard systems - blade impact
    not even factored in yet
  • Customers benefit from scaling horizontally across standard systems
  • Lower up-front costs, granular scalability, high availability

6
The Near Future
(Chart: Server Revenue Mix. Share of revenues by price band, from $0-2.9K up to $3M+, for Web Services, Enterprise Apps, Legacy Big Iron Apps, and Database Clusters/Grids, with the lower bands annotated Scale Out and the upper bands Scale Up.)
  • Market splits around scale-up vs. scale-out
  • Database grids provide the foundation for scale-out
  • InfiniBand switched computing interconnects are a critical enabler

7
Traditional RAC Cluster
(Diagram: application servers connect over Gigabit Ethernet to the Oracle RAC nodes, which attach to shared storage over Fibre Channel.)
8
Three Pain Points
(Diagram: the same cluster, annotated with three pain points.)
  • Scalability within the database tier limited by interconnect latency, bandwidth, and overhead
  • Throughput between the application tier and database tier limited by interconnect bandwidth and overhead
  • I/O requirements driven by the number of servers instead of application performance requirements
9
Clustering with Topspin InfiniBand
(Diagram: application servers, Oracle RAC, and shared storage connected through a Topspin InfiniBand fabric.)
10
Removes all Three Bottlenecks
(Diagram: the InfiniBand cluster, annotated with the three bottlenecks removed.)
  • InfiniBand provides a 10 Gigabit, low latency interconnect for the cluster
  • The application tier can run over InfiniBand, benefiting from the same high throughput and low latency as the cluster
  • Central server-to-storage I/O scalability through the InfiniBand switch removes I/O bottlenecks to storage and provides smoother scalability
11
Example Cluster with Converged I/O
  • Ethernet to InfiniBand gateway for LAN access
    • Four Gigabit Ethernet ports per gateway
    • Creates a virtual Ethernet pipe to each server
  • Fibre Channel to InfiniBand gateway for storage access
    • Two 2 Gbps Fibre Channel ports per gateway
    • Creates a 10 Gbps virtual storage pipe to each server
  • InfiniBand switches for cluster interconnect
    • Twelve 10 Gbps InfiniBand ports per switch card
    • Up to 72 total ports with optional modules
    • Single fat pipe to each server for all network traffic

12
Topspin InfiniBand Cluster Solution
Cluster interconnect with gateways for I/O virtualization
  • Ethernet or Fibre Channel gateway modules
  • Family of switches
  • Host Channel Adapter with upper layer protocols
  • Integrated system and subnet management
  • Protocols: uDAPL, SDP, SRP, IPoIB
  • Platform support: Linux (Red Hat, Red Hat AS, SuSE), Solaris 10, Windows 2000/2003
  • Processors: Xeon, Itanium, Opteron

13
InfiniBand Primer
  • InfiniBand is a new technology used to
    interconnect servers, storage and networks
    together within the datacenter
  • Runs over copper cables (<17 m) or fiber optics (<10 km)
  • Scalable interconnect
    • 1X: 2.5 Gb/s
    • 4X: 10 Gb/s
    • 12X: 30 Gb/s
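
A quick C sketch (added here for illustration; not part of the original deck) of what those link widths mean in practice, assuming the 8b/10b encoding these early InfiniBand links use, which leaves 80% of the signaling rate as usable data bandwidth:

    #include <stdio.h>

    /* Illustrative only: SDR InfiniBand signals at 2.5 Gb/s per lane;
     * 8b/10b encoding carries 8 data bits in every 10 line bits. */
    int main(void) {
        const double lane_gbps = 2.5;           /* per-lane signaling rate */
        const double encoding_efficiency = 0.8; /* 8b/10b overhead */
        const int widths[] = {1, 4, 12};        /* 1X, 4X, 12X links */

        for (int i = 0; i < 3; i++) {
            double signal = widths[i] * lane_gbps;
            printf("%2dX link: %4.1f Gb/s signaling, %4.1f Gb/s data\n",
                   widths[i], signal, signal * encoding_efficiency);
        }
        return 0;
    }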

14
InfiniBand Nomenclature
15
InfiniBand Nomenclature
  • HCA - Host Channel Adaptor
  • SM - Subnet Manager
  • TCA - Target Channel Adaptor

(Diagram: the host's CPUs, memory controller and system memory connect through the host interconnect to an HCA; IB links run from the HCA through a switch, managed by the SM, to TCAs that bridge to an Ethernet link and a Fibre Channel link.)
16
Kernel Bypass
(Diagram: Kernel Bypass Model. The conventional path runs from the application through the sockets layer and the kernel TCP/IP transport down to the driver and hardware; the bypass paths let the application reach the hardware through uDAPL or async sockets over SDP without traversing the kernel TCP/IP stack.)
17
Copy on Receive
(Diagram: with a conventional NIC, data arriving from the interconnect lands in an OS buffer, and the CPU then copies it across the host interconnect into the application buffer in system memory.)
18
With RDMA and OS Bypass
(Diagram: with RDMA and OS bypass, the HCA writes incoming data directly into the application buffer in system memory, skipping the OS buffer and the CPU copy; a code sketch follows.)
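
As a concrete sketch of this zero-copy path (using the libibverbs API as a modern stand-in, not the uDAPL/SDP interfaces this deck actually describes), an RDMA write names a locally registered buffer plus the peer's remote address and key, and the HCA moves the bytes with no OS buffer in between; queue pair and memory registration setup is assumed to have been done at connection time:

    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <string.h>

    /* Sketch: post a zero-copy RDMA write from a registered local buffer
     * to a peer's registered buffer. qp, mr, remote_addr and rkey are
     * assumed to come from connection establishment. */
    int rdma_write_block(struct ibv_qp *qp, struct ibv_mr *mr,
                         void *local_buf, uint32_t len,
                         uint64_t remote_addr, uint32_t rkey)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)local_buf,   /* data stays in the app buffer */
            .length = len,
            .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr, *bad_wr = NULL;
        memset(&wr, 0, sizeof(wr));
        wr.opcode              = IBV_WR_RDMA_WRITE;  /* HCA performs the transfer */
        wr.sg_list             = &sge;
        wr.num_sge             = 1;
        wr.send_flags          = IBV_SEND_SIGNALED;  /* completion reported later */
        wr.wr.rdma.remote_addr = remote_addr;
        wr.wr.rdma.rkey        = rkey;

        return ibv_post_send(qp, &wr, &bad_wr);      /* queued to hardware, no copy */
    }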
19
APIs and Performance
(Diagram: the API stack, with the application on top of uDAPL and BSD sockets with the async I/O extension; sockets run over SDP with RDMA or over TCP/IP with IPoIB, and the transports are compared on 1GE (roughly 0.8 Gb/s) and 10G IB.)
20
Why SDP for OracleNet and uDAPL for RAC?
  • RAC IPC
    • Message based
    • Latency sensitive
    • Mixture of previous APIs
    • → use of uDAPL
  • OracleNet
    • Streams based
    • Bandwidth intensive
    • Previously written to sockets
    • → use of the Sockets Direct Protocol (SDP) API (sketched below)
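
To make the contrast concrete, here is a minimal byte-stream send loop of the kind a sockets-based layer is written to (an illustrative sketch, not Oracle code). Because SDP preserves stream socket semantics, the same logic can be carried over InfiniBand on systems that map stream sockets onto SDP, for example through a preload library, without changing the application:

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <string.h>
    #include <unistd.h>

    /* Sketch: ordinary stream-socket client code. With SDP underneath,
     * the same byte-stream calls run over InfiniBand unchanged. */
    int send_all(const char *ip, unsigned short port,
                 const void *buf, size_t len)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);    /* stream semantics */
        if (fd < 0) return -1;

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(port);
        inet_pton(AF_INET, ip, &addr.sin_addr);

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }

        const char *p = buf;
        while (len > 0) {                            /* bandwidth-oriented streaming */
            ssize_t n = write(fd, p, len);
            if (n <= 0) { close(fd); return -1; }
            p   += n;
            len -= (size_t)n;
        }
        close(fd);
        return 0;
    }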

21
InfiniBand Cluster Performance Benefits
(Chart: network-level cluster performance for Oracle RAC, 16 KB block transfers/sec. Source: Oracle Corporation and Topspin, on dual Xeon processor nodes.)
InfiniBand delivers 2-3x higher block transfers/sec as compared to GigE.
22
InfiniBand Application to Database Performance
Benefits
(Chart: percent improvement. Source: Oracle Corporation and Topspin.)
InfiniBand delivers 30-40% lower CPU utilization and 100% higher throughput as compared to Gigabit Ethernet.
23
Broad Scope of InfiniBand Benefits
(Diagram: InfiniBand benefits span the whole stack, from application servers through Oracle RAC to shared storage: intra-RAC IPC over uDAPL over IB, OracleNet over SDP over IB, an Ethernet gateway to the network/NAS, an FC gateway with host/LUN mapping to the SAN, and DAFS over IB. Callouts cite a 2x improvement in throughput with 45% less CPU, a 20% improvement in throughput, a 3-4x improvement in block updates/sec, and a 30% improvement in DB performance.)
24
uDAPL Optimization Timeline
(Timeline across the stack layers: workload, database, Cache Fusion, LM, skgxp, uDAPL, CM, IB HW/FW.)
  • Sept 2002: uDAPL functional with 6 Gb/s throughput
  • Dec 2002: Oracle interconnect performance released, showing improvements in bandwidth (3x), latency (10x) and CPU reduction (3x)
  • Jan 2003: added Topspin CM for improved scaling of the number of connections and reduced setup times
  • Feb 2003: Cache Block Updates show a fourfold performance improvement in 4-node RAC
  • April-August 2003: gathering OAST and industry standard workload performance metrics; fine tuning and optimization at the skgxp, uDAPL and IB layers
25
RAC Cluster Communication
  • High speed communication is key: it must be faster to fetch a block from a remote cache than to read the block from disk
  • Scalability is a function of communication CPU overhead
  • Two primary Oracle consumers
    • Lock manager / Oracle buffer cache
    • Inter-instance parallel query communication
  • SKGXP: Oracle's IPC driver interface
    • Oracle is coded to skgxp
    • skgxp is coded to vendor high performance interfaces
    • IB support is delivered as a shared library, libskgxp10.so (see the sketch below)
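
A hypothetical sketch of how a vendor interconnect driver delivered as a shared library might be resolved at run time. The symbol name below is invented purely for illustration; the real skgxp entry points are Oracle-internal and not described in this deck:

    #include <dlfcn.h>
    #include <stdio.h>

    /* Hypothetical illustration: load a vendor IPC library and resolve
     * a send routine. "ipc_send_block" is a made-up symbol name. */
    typedef int (*ipc_send_fn)(const void *buf, unsigned len);

    int main(void) {
        void *lib = dlopen("libskgxp10.so", RTLD_NOW);  /* vendor-supplied library */
        if (!lib) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }
        ipc_send_fn send_block = (ipc_send_fn)dlsym(lib, "ipc_send_block");
        if (!send_block) {
            fprintf(stderr, "symbol not found: %s\n", dlerror());
            dlclose(lib);
            return 1;
        }
        char block[16 * 1024] = {0};                    /* e.g. one 16 KB cache block */
        printf("send returned %d\n", send_block(block, sizeof(block)));
        dlclose(lib);
        return 0;
    }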

26
Cache Fusion Communication
(Diagram: Cache Fusion communication. A shadow process sends a lock request to the LMS process on the remote instance; the block is transferred directly between the two buffer caches by RDMA and the shadow process returns the result to the client.)
27
Parallel Query Communication
(Diagram: parallel query communication. PX servers on different instances exchange message and data traffic, and results flow back to the client.)
28
Cluster Interconnect Wish List
  • OS bypass (user mode communication)
  • Protocol offload
  • Efficient asynchronous communication model
  • RDMA with high bandwidth and low latency
  • Huge memory registrations for Oracle buffer
    caches
  • Support large number of processes in an instance
  • Commodity Hardware
  • Software interfaces based on open standards
  • Cross platform availability

InfiniBand is the first interconnect to meet all of these requirements
29
Asynchronous Communication
  • Benefits
  • Reduces impact of latency
  • Improves robustness by avoiding communication deadlock
  • Increases bandwidth utilization
  • Drawback
  • Historically costly, as synchronous operations
    are broken into separate submit and reap
    operations

30
Protocol Offload / OS Bypass
  • Bypass makes submit cheap
    • Requests are queued directly to hardware from Oracle
  • Offload
    • Completions move from the hardware into Oracle's memory
    • Oracle can overlap communication and computation without a trap to the OS or a context switch (see the sketch below)
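
A minimal sketch of the reap side of this model, again using libibverbs as a stand-in for the interfaces named in this deck: the HCA deposits completions into a queue that lives in user-accessible memory, so reaping them is a user-mode poll with no system call on the fast path (creation of the completion queue is assumed):

    #include <infiniband/verbs.h>

    /* Sketch: reap completions the HCA has already written into the
     * completion queue. Polling is a user-space memory read; no trap
     * into the OS is required. */
    int reap_completions(struct ibv_cq *cq, int max_batch)
    {
        struct ibv_wc wc[16];
        if (max_batch > 16) max_batch = 16;

        int n = ibv_poll_cq(cq, max_batch, wc);   /* non-blocking, user mode */
        for (int i = 0; i < n; i++) {
            if (wc[i].status != IBV_WC_SUCCESS)
                return -1;                        /* surface the failed request */
            /* wc[i].wr_id identifies the request submitted earlier */
        }
        return n;                                 /* operations completed */
    }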

31
InfiniBand Benefits by Stress Area
Stress Area       Benefit
Cluster Network   Extremely low latency; 10 Gig throughput
Compute CPU       Kernel offload removes TCP overhead; frees CPU cycles
Server I/O        Single converged 10 Gig network for cluster, storage, LAN; central I/O scalability

Stress level varies over time with each query; InfiniBand provides substantial benefits in all three areas.
32
Benefits for Different Workloads
  • High bandwidth and low latency benefits for
    Decision Support (DSS)
  • Should enable serious DSS workloads on RAC
    clusters
  • Low latency benefits for scaling Online
    Transaction Processing (OLTP)
  • Our estimate: one IB link replaces 6-8 Gigabit Ethernet links

33
Commodity Hardware
  • Higher capabilities and lower cost than proprietary interconnects
  • InfiniBand's large bandwidth capability means that a single link can replace multiple GigE and FC interconnects

34
Memory Requirements
  • The Oracle buffer cache can consume 80% of a host's physical memory
  • 64-bit addressing and decreasing memory prices mean ever larger buffer caches
  • InfiniBand provides
    • Zero-copy RDMA between very large buffer caches
    • Large shared registrations, which move memory registration out of the performance path (see the sketch below)
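
A sketch of the register-once idea, with libibverbs standing in for the APIs above; the allocation and size here are illustrative stand-ins for the buffer cache. The region is pinned and registered up front, so later RDMA transfers reference it by key and pay no registration cost on the data path:

    #include <infiniband/verbs.h>
    #include <stdlib.h>

    /* Sketch: register one large region once at startup; reuse mr->lkey
     * and mr->rkey for every subsequent RDMA to or from it. */
    struct ibv_mr *register_cache(struct ibv_pd *pd, size_t cache_bytes,
                                  void **cache_out)
    {
        void *cache = NULL;
        if (posix_memalign(&cache, 4096, cache_bytes) != 0)   /* illustrative allocation */
            return NULL;

        struct ibv_mr *mr = ibv_reg_mr(pd, cache, cache_bytes,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_READ |
                                       IBV_ACCESS_REMOTE_WRITE);
        if (!mr) {
            free(cache);
            return NULL;
        }
        *cache_out = cache;
        return mr;   /* keep for the life of the instance */
    }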

35
Two Efforts Coming Together: RAC/Cache Fusion and Oracle Net
  • Two Oracle engineering teams working at cluster
    and application tiers
  • 10g incorporates both efforts
  • Oracle Net benefits from many of the same
    capabilities as Cache Fusion
  • OS kernel bypass
  • CPU offload
  • New transport protocol (SDP) support
  • Efficient asynchronous communication model
  • RDMA with high bandwidth and low latency
  • Commodity hardware
  • Working on external and internal deployments

36
Open Standard Software APIs: uDAPL and Async Sockets/SDP
  • Each new communication driver is a large
    investment for Oracle
  • One stack which works across multiple platforms
    means improved robustness
  • Oracle grows closer to the interfaces over time
  • Ready today for emerging technologies
  • Ubiquity and robustness of IP for high speed
    communication

37
Summary
  • Oracle and major system storage vendors are
    supporting InfiniBand
  • InfiniBand presents superb opportunity for
    enhanced horizontal scalability and lower cost
  • Oracle Net's InfiniBand support significantly improves performance for both the app server and the database in Oracle 10g
  • InfiniBand provides the performance to move applications to low cost Linux RAC databases

38
Q & A
39
Next Steps.
  • See InfiniBand demos first hand on the show floor
  • Dell, Intel, Netapp, Sun, Topspin (booth 620)
  • Includes clustering, app tier and storage over
    InfiniBand
  • InfiniBand whitepapers on both Oracle and Topspin
    websites
  • www.topspin.com
  • www.oracle.com