Edward Jones IS Capacity Planning and Performance Management - PowerPoint PPT Presentation

Slides: 30

Provided by: RickPr9

Transcript and Presenter's Notes

Title: Edward Jones IS Capacity Planning and Performance Management

1
Edward Jones IS Capacity Planning and Performance
Management

Jim Poletti
October 23, 2007

2
About Edward Jones. . .

Full service investment firm
10,000 branches in the US, Canada, and UK
1 "broker" and 1 branch office administrator per
branch
Land-line WAN: DSL or T1
St Louis datacenter is hub for most traffic
Tempe datacenter primarily DR for mainframe
21,000 users signed on to CICS at the high-water mark

3
IS Capacity Planning and Performance Management

Rich Unnerstall (Director Data Center
Operations)
Art Morlock (Department Leader)

Jim Poletti (MF Performance Analyst)
Gerry Oliver (MF Performance Analyst)
Greg Volk (Network Performance Analyst)
Rick Pranger (Open Systems Performance Analyst)
Dwayne Allen (Open Systems Performance Analyst)
Tom Siech (Load Tester)
Brandy Brown (Load Tester)

4
St. Louis Mainframe Hardware

All LPARs run on 1 physical mainframe
IBM z9 2094-707, 3,516 MIPS, z/OS 1.7
80 GB memory
40 TB DASD, EMC RAID-1 and -7, 5 ms
Older Symmetrix being replaced with DMX-4
Data replication to Tempe using SRDF

5
CPU by LPAR
6
Production Environment/LPAR

1 LPAR (no data-sharing SYSPLEX yet)
25 CICS regions: 19 AORs, 5 TORs, 1 FOR
32 million CICS transactions/day, 7 million user
"enters"
DB2: 1 subsystem
IDMS: 5 regions, 15 million run units/day
RRDF replication of DB2 and IDMS to Tempe

7
Responsibilities

Assure system performance and scalability.
Provide capacity planning support for purchasing
decisions.
Tune the mainframe hardware "till the wheels come
off", then buy capacity.
Hotline, war room participation.
Performance Testing.

8
Early Morning "System Checks"

Check system "barometers" from yesterday
Check performance graphs and reports
CICS transactions: volume, CPU, response
LPAR CPU
Memory
DASD
DB2
IDMS
Development response time: TSO, compiles

9
Houston, we have a problem!

Go into detective mode
Start at high level, look at service classes
within LPAR for abnormalities

10
Daily Workload Statistics for 9:30-10:30 on Wed, Oct 17, 2007, Compared to Prior 4 Wednesdays

Service   CPU Util  CPU Util    Change in  % Change  Real Memory  Real Memory
Class     17-Oct    Prior 4     CPU Util   CPU       Gb           Prior 4
                    Wednesdays                                    Wednesdays
BAT_HOT   0.3       0.3         0          -8        7.6          8.6
BAT_1     1.6       1.5         0.1        5         20.1         15.7
BAT_2     3.6       3.6         0          1         52           126.2
CICS_1    11.8      11.2        0.6        6         1490         1490
CICS_2    33.4      34.5        -1.2       -3        2037         2246
CICS_3    0.6       0.8         -0.2       -27       315.5        352.5
DB2_HI    1.6       1.8         -0.2       -11       6648         6636
DB2_LO    0.6       0.6         -0.1       -11       21.9         25.5
IDMS      11.3      11.9        -0.6       -5        1390         1398
MQSERIES  0.3       0.2         0.1        35        775          418.7
NEWWORK   0         0           0          -44       0
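
The comparison behind a report like this can be sketched in a few lines of Python. This is only an illustration, not the SAS code that produced the slide: the function names, the 25 percent threshold, and the sample figures are assumptions made for the example.

```python
from statistics import mean


def compare_to_baseline(today, prior_values):
    """Return (baseline, change, pct_change) for one service class."""
    baseline = mean(prior_values)
    change = today - baseline
    # Percent change is undefined when the baseline is zero.
    pct_change = None if baseline == 0 else 100.0 * change / baseline
    return baseline, change, pct_change


def flag_abnormal(report, threshold_pct=25.0):
    """Names of service classes whose percent change exceeds the threshold."""
    flagged = []
    for name, (today, prior_values) in report.items():
        _, _, pct = compare_to_baseline(today, prior_values)
        if pct is not None and abs(pct) > threshold_pct:
            flagged.append(name)
    return flagged


# Hypothetical CPU utilization: today's value, then the prior four Wednesdays.
sample = {
    "CLASS_A": (12.0, [11.0, 12.0, 12.5, 12.5]),
    "CLASS_B": (0.6, [0.2, 0.2, 0.2, 0.2]),
}
flagged = flag_abnormal(sample)
```

A percent threshold alone over-flags tiny workloads (a move from 0.2 to 0.3 is a 50 percent change), so in practice it would probably be paired with a minimum absolute change.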
11
Dig deeper into details of the workload

Program   SUM CPU Time CICS CPU Time  CPU Time      Pct Change  DB2 CPU Time  DB2 CPU Time  Pct Change  Resp Time  Resp Time
Name      9:30-10:30   Per Tran       Prior 4 Weds  CPU         Per Tran      Prior 4 Weds  DB2                    Prior 4 Weds
CMSOC300  884          0.0025         0.0025        1           0.0021        0.0021        1           0.076      0.078
DFHMIRS   424          0.0006         0.0006        -2          0             0             .           0.031      0.034
MYDOC016  391          0.0072         0.0075        -3          0.006         0.0062        -3          0.301      0.314
PRTOC515  284          0.0141         0.0145        -3          0.0102        0.0104        -3          0.189      0.21
BRHOC053  190          0.0008         0.0008        1           0.0006        0.0006        1           0.011      0.012
PRTOC630  188          0.0111         0.0116        -4          0.0053        0.0056        -5          0.07       0.077
CMSOC320  187          0.0052         0.0052        1           0.0048        0.0048        1           0.149      0.153
CHSOC120  133          0.0025         0.0025        -2          0.0006        0.0006        -2          0.052      0.057
CMSOC330  95           0.006          0.0059        2           0.0058        0.0057        2           0.182      0.184
BRIOC022  93           0.001          0.001         0           0             0             1           0.018      0.019
IAAOC222  91           0.0156         0.0156        0           0.0116        0.0116        0           0.482      0.485
PRTOC001  84           0.005          0.005         0           0.0019        0.0019        0           0.074      0.08
12
Once problem is found, find cause

Run Strobe on CICS or batch job.
Ask if program was changed.
Was a system parm changed?
Did a lurking problem surface when user patterns
changed?
Did a new system go in?

13
Recommend change to fix problem

Code fix
Parameter change
SQL or IDMS call change
Run workload at a different time to smooth peaks
Redesign database or add index
Completely shut down workload
If you don't know how to fix it, ask others

14
It helps to make performance recommendations if...

You were a programmer in a previous life
You were a DBA in a previous life
You are knowledgeable in MVS, CICS, DASD, etc.

15
Integrity matters

Be right, study before you speak
Go for tuning that gives a payback
If the workload isn't measurable, put in
mechanisms to measure it before doing the tuning
change
Do some PR work - Send tuning results to
programmer and their management

16
Mainframe tools

SAS
MXG
Strobe
Jones-built performance repositories
Our performance website
RMF 3
Omegamon

17
Capacity Management's Prime Objective: When Do
We Run Out?

When do we need more of a resource?
How much lead time do you need?
Approval cycle
Floor space
Vendor Delivery Time
Installation Time
Acceptable Risk
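
One way to turn "when do we run out?" into a date is to fit a straight line to recent peak utilization and subtract the lead time. The sketch below assumes linear growth, monthly samples, and made-up numbers. The function names are hypothetical, and a real forecast would also fold in the business forecasts and acceptable risk.

```python
def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; needs 2+ samples.
    Returns (slope, intercept)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def months_until_exhausted(values, capacity):
    """Months after the last sample until the trend line reaches capacity.
    Returns None when the trend is flat or falling."""
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None
    crossing = (capacity - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


def months_left_to_decide(values, capacity, lead_time_months):
    """Time remaining before the purchase must be started (negative = late)."""
    remaining = months_until_exhausted(values, capacity)
    return None if remaining is None else remaining - lead_time_months


# Hypothetical monthly peak CPU utilization (percent) and a 90 percent ceiling.
monthly_peak = [60, 62, 64, 66, 68, 70]
# Hypothetical lead time: approval 2 + vendor delivery 3 + installation 1.
lead_time = 2 + 3 + 1
slack = months_left_to_decide(monthly_peak, capacity=90, lead_time_months=lead_time)
```

With these sample numbers the trend reaches the ceiling 10 months out, which leaves 4 months before the approval cycle has to start.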

18
Forecasting Processes
Business Forecasts
Performance and Workload Data Repositories
Workload Models
Performance Prediction
Resource Utilization Models
Resource Utilization Trends
Validate, Assess and Revise
19
Performance Tuning

We continually tune hardware and software, as
well as their interrelationships, to improve the
performance of systems.
Ownership is shared across multiple departments.
Highly iterative: never done!
Why?
Direct positive impact upon end user experience.
Tuning = cost avoidance.

20
Performance Tuning: How do we improve programs?

Divide and Conquer
Which program in a batch job takes the longest?
Which program uses the most CPU?
Profile Code
Tune infrastructure (including network).
Prioritize process

21
Performance Tuning
Identify Opportunities for Improvement aka
"Hawgs" and "Dawgs".

Which programs are slowest (Dawgs)?
Which programs use the most resources (Hawgs)?
Which programs are used the most?
Business criticality: How important are they to
the business?
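
A rough sketch of the "Hawgs" and "Dawgs" ranking, using four rows from the program table on slide 11 (total CPU time and response time for the 9:30 to 10:30 hour, in the units reported there). The class and function names are hypothetical, and business criticality would still have to be weighed by hand.

```python
from dataclasses import dataclass


@dataclass
class ProgramStats:
    name: str
    total_cpu: float      # SUM CPU Time for the interval
    resp_time: float      # average response time


def top_hawgs(stats, n=2):
    """Programs using the most resources (here: total CPU time)."""
    ranked = sorted(stats, key=lambda s: s.total_cpu, reverse=True)
    return [s.name for s in ranked[:n]]


def top_dawgs(stats, n=2):
    """Slowest programs (here: highest response time)."""
    ranked = sorted(stats, key=lambda s: s.resp_time, reverse=True)
    return [s.name for s in ranked[:n]]


stats = [
    ProgramStats("CMSOC300", 884, 0.076),
    ProgramStats("DFHMIRS", 424, 0.031),
    ProgramStats("MYDOC016", 391, 0.301),
    ProgramStats("IAAOC222", 91, 0.482),
]
hawgs = top_hawgs(stats)
dawgs = top_dawgs(stats)
```

The two lists need not overlap: CMSOC300 burns the most CPU in total but responds quickly, while IAAOC222 is slow per transaction but small in aggregate.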

22
Performance Data Repositories

We maintain many performance data repositories.
These tend to be collections of statistics, not
detail data.
For example, we will not retain CICS transaction
detail, but we will calculate counts of
transactions by region by transaction name as
well as average, maximum and percentile
statistics for a variety of variables and
intervals.
SAS is our primary tool.
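
The summarization described above can be sketched as follows. The slide names SAS as the tool, so this Python version is only an illustration of the idea: the record layout, the region and transaction names, and the nearest-rank percentile method are all assumptions.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records, pct=95):
    """Collapse (region, tran, response_time) detail into per-group statistics."""
    groups = defaultdict(list)
    for region, tran, response_time in records:
        groups[(region, tran)].append(response_time)
    return {
        key: {
            "count": len(values),
            "avg": sum(values) / len(values),
            "max": max(values),
            "pctl": percentile(values, pct),
        }
        for key, values in groups.items()
    }


# Hypothetical detail records; only the summary would be kept.
detail = [
    ("AOR1", "TRN1", 0.25),
    ("AOR1", "TRN1", 0.5),
    ("AOR1", "TRN1", 0.75),
    ("AOR1", "TRN1", 1.0),
    ("AOR2", "TRN1", 2.0),
]
summary = summarize(detail)
```

Keeping counts alongside the averages means daily summaries can later be rolled up into weekly or monthly figures without going back to the detail.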

23
Performance Data Repositories: Data Sources

CICS: by day, by tran
DASD Type 74: by day, by LPAR, by VOLSER
Jones application instrumentation
MVS level: by day, by LPAR
IDMS: by day, by program
DB2: by day, by tran
Service and report classes: by day, by service
class
PROC SUMMARY, PROC APPEND

24
Business Metrics and Workloads

Business Metrics typically use different time
frames than workload metrics.
Business doesn't forecast in terms of megabytes
of DASD, CPU seconds used, interactive sessions,
concurrent users or paging rates.
They refer to branches, IRs, customers, trades,
purchases, payments, visits, exorbitant cost
of IT, ...

25
Loved Ones: Sorry, all apps are not equal

What is the business importance of the
application / workload?
If there are diverse workloads on a system, it is
necessary to prioritize the work to ensure that
the work is processed in an order that reflects
its business priority.
To understand priorities you have to understand
the business.
Capacity planning activities should also ensure
that when work is constrained, the highest
priority work is favored.
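
As a toy illustration of favoring the highest-priority work when capacity is constrained, the sketch below hands out a fixed amount of capacity in strict priority order. It is a deliberate simplification, not how the workload manager actually dispatches work. The workload names, priorities, and demands are invented.

```python
def allocate(capacity, workloads):
    """Grant capacity in priority order (1 = most important).

    workloads is a list of (name, priority, demand) tuples; the result maps
    each name to the amount of capacity it receives.
    """
    remaining = capacity
    granted = {}
    for name, _priority, demand in sorted(workloads, key=lambda w: w[1]):
        share = min(demand, remaining)
        granted[name] = share
        remaining -= share
    return granted


# Hypothetical demands that add up to more than the 100 units available.
workloads = [
    ("batch", 3, 40),
    ("online", 1, 70),
    ("development", 2, 20),
]
granted = allocate(100, workloads)
```

Here the online and development work are satisfied in full and batch absorbs the entire shortfall, which is the outcome the priorities are meant to produce.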

26
Performance testing

Jones has a clone environment of production
Use LoadRunner tool to generate transactions
Think time adjustable
A few hundred users is usually enough
All major system enhancements are load tested

27
Load Testing Objectives

Is End User Performance acceptable?
Will the introduction of these new features
threaten the health of other applications?
How do response and resource utilization compare
to current production levels?
Reproduce and troubleshoot production problems.
Will we need to add capacity?
In stress testing we measure response times at
production peak load and 5x production peak.
Often identify 'Break Points' to watch for in
production.
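
Finding a break point from stress-test results can be sketched as below. The slide only mentions measuring at production peak and 5x peak, so the intermediate load steps, the one-second limit, and the function name are assumptions for the example.

```python
def find_break_point(results, max_response_seconds):
    """Return the lowest load level whose response time exceeds the limit.

    results is a list of (load_multiple, response_seconds) pairs, where
    load_multiple is relative to production peak. Returns None if the
    limit is never exceeded.
    """
    for load, response in sorted(results):
        if response > max_response_seconds:
            return load
    return None


# Hypothetical measurements from 1x to 5x production peak.
stress_results = [(1, 0.3), (2, 0.35), (3, 0.5), (4, 1.4), (5, 3.9)]
break_point = find_break_point(stress_results, max_response_seconds=1.0)
```

With these sample figures the application holds up through 3x peak and breaks at 4x, so production volume approaching that level is the thing to watch for.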

28
Interaction with Availability

A badly performing application is effectively the
same as the application being unavailable.
Capacity and Availability Management share common
goals / tools and complement each other.
Capacity Management needs to be aware of
Availability techniques deployed, such as
mirroring, load balancers or clustering, in order
to plan accurately for Capacity.

29
Questions
