Title: Marvel EV7 for OpenVMS: Proof Points from Live Customer Production Systems Tech Update, September 20
1. Marvel EV7 for OpenVMS: Proof Points from
Live Customer Production Systems
Tech Update, September 2003
Steve Lieman, OpenVMS Performance Group
2. Marvel Performance Characterization Project
- Unique OpenVMS approach
- Proof points
- Live customer systems
- Pre-release based on customer benchmarks
- Early adopter mission critical production systems
- and now mainstream production systems
- First use of proof points with Marvel creates the
foundation and infrastructure for future work
3. How much benefit for you?
- How much improvement will you see when you
upgrade your largest, most heavily loaded OpenVMS
systems to Marvel EV7?
(Chart labels: GS160, GS1280)
4. Want even more detail?
- The electronic version of this presentation
contains extensive notes pages for your further
study, reflection, and review.
5. Which performance tests inspire the most
confidence for you?
- Chip speed, cache size, memory bandwidth?
- Heavily tuned industry standard tests?
- Customer developed benchmark tests?
- How well do these help you predict the actual
benefit that you will achieve in your situation?
6. Which performance tests inspire the most
confidence for you?
- A unique OpenVMS alternative to traditional
methods: Production Proof Points
- From live mission-critical systems
- A growing series of proof points
- Each backed with detailed, extensive hard data
- Taken from early adopters who are now mainstream users
- Showing before-and-after proofs in detail
- Running application software similar to your
usage
- Bottom line: the unique OpenVMS approach to
performance (using live production proof points)
provides the highest predictive value
7. Definition of Headroom
- Headroom helps explain performance on live
customer systems
- Predicted height of the roofline of maximum
throughput
- Actual throughput PLUS estimated spare
capacity
- The point of maximum throughput is reached when
load increases until throughput levels off; in
recently upgraded live systems, this typically
does not happen immediately.
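The definition above (actual throughput plus estimated spare capacity) can be sketched in a few lines. The slides do not say how spare capacity is estimated, so this sketch assumes, purely for illustration, that throughput grows in proportion to CPU utilization up to a usable ceiling. The function name, the ceiling parameter, and the sample numbers are hypothetical.

```python
def estimate_headroom(actual_tps, cpu_busy_pct, usable_ceiling_pct=100.0):
    """Return (spare_tps, headroom_tps) under a linear-extrapolation assumption.

    Assumption (not from the slides): throughput scales linearly with
    CPU utilization until usable_ceiling_pct is reached.
    """
    if not 0 < cpu_busy_pct <= usable_ceiling_pct:
        raise ValueError("cpu_busy_pct must be in (0, usable_ceiling_pct]")
    tps_per_pct = actual_tps / cpu_busy_pct          # throughput per 1% CPU busy
    headroom_tps = tps_per_pct * usable_ceiling_pct  # predicted roofline
    spare_tps = headroom_tps - actual_tps            # estimated spare capacity
    return spare_tps, headroom_tps


if __name__ == "__main__":
    # Hypothetical system: 400 TPS observed at 40% CPU busy.
    spare, headroom = estimate_headroom(actual_tps=400.0, cpu_busy_pct=40.0)
    print(f"spare capacity: {spare:.0f} TPS, headroom: {headroom:.0f} TPS")
```

Real workloads rarely scale linearly all the way to saturation, so a measured scaling curve would give a better roofline than this extrapolation.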
8. Performance Peace of Mind
- Raising the Roof
- A long-standing OpenVMS tradition
- Marvel EV7 creates an especially strong upward
step
- Why is this 25-year-long series of systematic
increases in OpenVMS headroom so important a
factor for you to consider?
- Why are headroom comparisons between OpenVMS
systems running on older and newer servers so
revealing of future value?
9. 4P head-to-head test: Application Y
Approx. 2X more powerful at 4P
(Chart annotation: Marvel finishes here)
10. 16P head-to-head test: Application Y
More than 3X more powerful at 16P
11. Application Y's SMP Scaling Curve
Throughput compared to linear scaling
Further scaling past 16p likely
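Comparing throughput with linear scaling, as this slide does, amounts to computing a scaling efficiency at each CPU count. The following is a minimal sketch; the throughput figures are made-up placeholders, not Application Y's measured data, and the function name is hypothetical.

```python
def scaling_efficiency(throughput_by_cpus):
    """Map CPU count -> fraction of linear scaling achieved.

    Linear scaling is projected from the smallest configuration present
    in the input (throughput per CPU at that size times CPU count).
    """
    base_cpus = min(throughput_by_cpus)
    per_cpu_base = throughput_by_cpus[base_cpus] / base_cpus
    return {
        cpus: tps / (per_cpu_base * cpus)
        for cpus, tps in sorted(throughput_by_cpus.items())
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    sample = {1: 100.0, 4: 380.0, 8: 720.0, 16: 1280.0}
    for cpus, eff in scaling_efficiency(sample).items():
        print(f"{cpus:>2} CPUs: {eff:.0%} of linear")
```

An efficiency that is still well above zero and falling slowly at the largest tested size is the kind of evidence behind a statement such as "further scaling past 16P likely".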
12. SMP Scaling
EV7 X Curve
EV7 Z Curve
EV7 Y Curve
EV68 Z Curve
13. Early VMS on Marvel EV7 Results Look Strong
- Better than Wildfire in every case
- Especially strong for SMP scaling
- Large drop in MPsynch
- Big jump in maximum projected headroom
- Maximum gains from 1.4X to 3.5X
14. Gains in VMS OS Scaling: Greater TPS
(Chart: throughput (TPS) versus # of CPUs. Curves: linear scaling,
V7.3-1, V7.3, V7.2-1H1, with the point of maximum throughput marked.
Axis notes: TPS varies with CPU model; # of CPUs also varies by workload.)
15. VMS on Marvel EV7 Scaling Gains
(Chart: throughput (TPS) versus # of CPUs. Curves: Marvel scaling,
Marvel linear scaling, Wildfire scaling, Wildfire linear scaling.
Axis note: # of CPUs varies by workload.)
16. 1.4X to 3.5X boost in maximum headroom
More than 2X increase in headroom in this case
(Chart labels: GS160, GS1280)
17. Comparing the Relative Performance of the ES47 to
the ES45
NOTE: The Rdb1 and RMS1 tests are based on VMS
customer workloads.
18. Upgrade Path for Maxed-out ES45 Systems that need
more scaling
- For ES45 systems that have reached their maximum
throughput and capacity, an ES80 or a GS1280 will
prove to be an excellent and effective upgrade
path.
19. Factors determining size of gain
- Current AlphaServer model and current CPU speed
- Number of CPUs
- Type of workload and its SMP scalability
- Mix and intensity of spinlock usage
- Current operating system version
- Current versions of Oracle, TCPIP, and your
application
- Current bottleneck or limiting factor
- Best to focus on Marvel's impact on your
predicted headroom
20. What to Expect with Marvel EV7
- Best server platform ever for VMS
- Best SMP scaling ever for VMS
- Best throughput and headroom ever for VMS
- More VMS applications will get useful scaling
results to 12-16 CPUs and beyond
- Excellent out-of-the-box performance with further
opportunities for tuning
21. Proof Points of Olympic Proportions
23. Background Slides
Passing the Baton
EV68 performance
Upgrade to EV7
24. Passing the Baton
What happened with other live production
systems? Let's take a look using data captured
with T4 automated collection and viewed with our
internal timeline visualizer (TLViz).
Bottom line: massive increase in maximum OpenVMS
headroom
26. Background Slides
27. 16-CPU GS1280 Memory Latency
Latency values shown for the 16 CPUs (ns):
172, 136, 172, 208, 172, 136, 70, 136,
172, 136, 172, 208, 172, 244, 208, 208
Average: 170 ns
5 CPUs <= 136 ns
6 CPUs <= 172 ns
5 CPUs <= 244 ns
EV67 GS320: local latency 330 ns, remote 960 ns
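The summary figures on this slide can be checked directly from the sixteen per-CPU values. The sketch below recomputes the average and the three bucket counts. It assumes the garbled "lt" stands for "less than or equal to" and that each bucket excludes the values already counted in the previous one; the function and variable names are hypothetical.

```python
from statistics import mean

# The sixteen latency values from the slide, in nanoseconds.
latencies_ns = [172, 136, 172, 208, 172, 136, 70, 136,
                172, 136, 172, 208, 172, 244, 208, 208]


def bucket_counts(values, upper_bounds):
    """Count values in (previous_bound, bound] for each ascending bound."""
    counts = []
    lower = float("-inf")
    for upper in upper_bounds:
        counts.append(sum(1 for v in values if lower < v <= upper))
        lower = upper
    return counts


average_ns = mean(latencies_ns)
counts = bucket_counts(latencies_ns, [136, 172, 244])

if __name__ == "__main__":
    print(f"average: {average_ns:.3f} ns")
    print(f"CPUs per bucket (<=136, <=172, <=244): {counts}")
    # Ratio against the GS320 local latency quoted on the slide (330 ns).
    print(f"GS320 local / GS1280 average: {330 / average_ns:.2f}x")
```

Under those assumptions the recomputed values agree with the slide: the mean rounds to 170 ns and the buckets hold 5, 6, and 5 CPUs.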
28. Performance Improvements in V7.2-2 and V7.3
- V7.2-2 and V7.3 (and Penguin)
- Dedicated-CPU lock manager
- Process scheduling, idle loop
- MUTEX without SCHED spinlock
- SYS$RESCHED (used by DECthreads and Oracle)
- SYS$GETJPI
- Mailbox driver
- V7.3
- Fibre fastpath
- SCSI fastpath
29. Performance Improvements in V7.3-1
- AST Delivery
- Mailboxes
- RMS Global Buffer Locking
- Reduce IOLOCK8 usage by Fibre/SCSI
- Improved IO Completion for RAMdisk, Mailbox
Shadowing IO
- Reduced Balance Slot size
- Timer Queue Processing
- Distributed Interrupts for Fast Path Drivers
- Various NUMA Changes
30. Performance Improvements beyond V7.3-1
- LAN
- Fastpath LAN drivers
- Fastpath PEdriver
- TCPIP
- Scaling changes
- Remove WSMAX and BALSETCNT restrictions
- XFC
- Alleviate SMP bottlenecks with very high cache
rates
- Continued reduction of SCHED spinlock usage
31. LAN and PE Fastpath
- LAN Drivers
- Move off of IOLOCK8 to LAN device-specific
spinlocks
- Allow device interrupts to CPUs other than the
primary
- PEdriver
- Move off of IOLOCK8 to PE-specific spinlocks
- Allow a specific CPU to be chosen for PEdriver
processing
32. TCPIP Performance: Current Synchronization
Mechanisms
- Single threaded
- One user/operation in execution at any instant
- Needed to guarantee synchronization of internal
kernel data structures
- True regardless of the number of CPUs or users
- Synchronization achieved using a single global
spinlock, IOLOCK8
- Contention with other IOLOCK8 users
- DECnet, LAN drivers, SCS, etc. Everybody!
33. TCPIP Performance: Future Synchronization
Mechanisms
- Multiple dynamic spinlocks
- No more IOLOCK8
- Queue KRP (kernel request packet)
- Handled by fork thread on non-primary CPU
- Similar to dedicated lock manager
- Improve concurrency
- Multiple concurrent network I/O
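The move from one global spinlock to multiple finer-grained locks can be illustrated with a user-level analogy. This is only a sketch of the locking structure, not OpenVMS kernel code: Python threads and `threading.Lock` stand in for CPUs and spinlocks, and the class names and per-connection granularity are hypothetical. Because of Python's global interpreter lock, it shows the design difference rather than a measurable speedup.

```python
import threading


class GlobalLockStack:
    """Every connection shares one lock (analogy for a single global spinlock)."""

    def __init__(self, n_connections):
        self._lock = threading.Lock()
        self._counters = [0] * n_connections

    def record_packet(self, conn):
        with self._lock:  # all connections serialize here
            self._counters[conn] += 1

    def totals(self):
        with self._lock:
            return list(self._counters)


class PerConnectionLockStack:
    """Each connection has its own lock (analogy for multiple spinlocks)."""

    def __init__(self, n_connections):
        self._locks = [threading.Lock() for _ in range(n_connections)]
        self._counters = [0] * n_connections

    def record_packet(self, conn):
        with self._locks[conn]:  # only users of this connection contend
            self._counters[conn] += 1

    def totals(self):
        result = []
        for lock, index in zip(self._locks, range(len(self._counters))):
            with lock:
                result.append(self._counters[index])
        return result


def run(stack, n_connections, threads_per_conn=2, packets=1000):
    """Drive the stack from several threads and return per-connection totals."""
    def worker(conn):
        for _ in range(packets):
            stack.record_packet(conn)

    threads = [threading.Thread(target=worker, args=(conn,))
               for conn in range(n_connections)
               for _ in range(threads_per_conn)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stack.totals()


if __name__ == "__main__":
    print(run(GlobalLockStack(4), 4))
    print(run(PerConnectionLockStack(4), 4))
```

Both variants produce the same totals; the difference is that in the second one, work on unrelated connections never waits on the same lock, which is the concurrency gain the slide describes.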