2
RMS Block 1
  • Keith Parris Systems/Software Engineer
  • HP
  • Tuesday, May 20 and Thursday, May 22; Session 250

3
Detecting and Solving Performance Bottlenecks
Using Locking Data
4
Background
  • OpenVMS system managers have traditionally looked
    at performance in 3 areas
  • CPU
  • Memory
  • I/O
  • But in OpenVMS clusters, the Distributed Lock
    Manager is another important element involved in
    performance, and while it can sometimes
    complicate matters, it can also provide some
    valuable clues in identifying performance
    bottlenecks in various areas

5
Distributed Lock Manager Monitoring
6
Available Tools
  • MONITOR
  • MONITOR LOCK, DLOCK, RLOCK, MODES, CLUSTER, etc.
  • SHOW CLUSTER/CONTINUOUS
  • Availability Manager (or DECamds), particularly
    Lock Contention
  • Performance data collectors/analyzers
  • T4 / Tlviz
  • HP Perfdat
  • DECps (Unicenter Performance Management for
    OpenVMS)
  • Perfcap (PAWZ)
  • ECP / TDC
  • SDA and SDA Extensions
  • SHOW LOCK, SHOW RESOURCE, etc.
  • Extensions LCK, CNX, SPL, etc.
  • V6 Freeware CD directory KP_LOCKTOOLS
  • LOCKTIME.COM, LOCK_ACTV.COM, LCKQUE.COM
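
For orientation, a quick DCL session touching several of the tools listed
above might look like the following sketch (the directory where the Freeware
KP_LOCKTOOLS procedures were copied is an assumption):

    $ MONITOR LOCK,DLOCK/INTERVAL=5        ! live lock-manager rates
    $ MONITOR MODES/ALL/INTERVAL=5         ! watch interrupt-state time
    $ SHOW CLUSTER/CONTINUOUS              ! then ADD CONNECTIONS, ADD CR_WAITS
    $ ANALYZE/SYSTEM                       ! SDA on the running system
    SDA> SHOW LOCK/SUMMARY                 ! lock manager performance counters
    SDA> EXIT
    $ ! Freeware V6 KP_LOCKTOOLS procedures (hypothetical copy location):
    $ @DISK$TOOLS:[KP_LOCKTOOLS]LOCK_ACTV  ! cluster-wide lock activity by tree
    $ @DISK$TOOLS:[KP_LOCKTOOLS]LCKQUE     ! look for lock queues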

7
MONITOR LOCK
                             OpenVMS Monitor Utility
                           LOCK MANAGEMENT STATISTICS
                                 on node XYZB11
                             1-JUN-2000 09:16:38.82

                                        CUR        AVE        MIN        MAX
    New ENQ Rate                    4362.00    4010.50    3659.00    4362.00
    Converted ENQ Rate               690.00     636.66     583.33     690.00
    DEQ Rate                        4363.33    4011.83    3660.33    4363.33
    Blocking AST Rate                  6.00       4.83       3.66       6.00
    ENQs Forced To Wait Rate          45.00      39.83      34.66      45.00
    ENQs Not Queued Rate               0.66       1.16       0.66       1.66
    Deadlock Search Rate               0.00       0.00       0.00       0.00
    Deadlock Find Rate                 0.00       0.00       0.00       0.00
    Total Locks                   419770.00  419767.50  419765.00  419770.00
    Total Resources               180201.00  180194.00  180187.00  180201.00
8
MONITOR DLOCK
                             OpenVMS Monitor Utility
                     DISTRIBUTED LOCK MANAGEMENT STATISTICS
                                 on node XYZB11
                             1-JUN-2000 09:17:27.77

                                        CUR        AVE        MIN        MAX
    New ENQ Rate        (Local)      902.00     808.00     528.66     993.33
                        (Incoming)  2302.00    2149.00    2017.66    2302.00
                        (Outgoing)   962.66     950.11     824.33    1063.33
    Converted ENQ Rate  (Local)      198.33     199.66     149.33     251.33
                        (Incoming)   168.66     222.44     168.66     284.33
                        (Outgoing)   181.00     113.22      75.66     181.00
    DEQ Rate            (Local)      899.00     807.11     528.00     994.33
                        (Incoming)  2300.66    2148.88    2018.66    2300.66
                        (Outgoing)   971.00     952.77     824.33    1063.00
    Blocking AST Rate   (Local)        0.00       0.33       0.00       0.66
                        (Incoming)     1.33       0.88       0.00       1.33
                        (Outgoing)     5.66       6.11       3.33       9.33
    Dir Functn Rate     (Incoming)    16.33      16.66      14.00      19.66
                        (Outgoing)    49.00      24.77       9.66      49.00
    Deadlock Message Rate              0.00       0.00       0.00       0.00
9
MONITOR RLOCK
                             OpenVMS Monitor Utility
                      DYNAMIC LOCK REMASTERING STATISTICS
                                 on node KEITH
                             6-MAY-2002 18:42:35.07

                                        CUR        AVE        MIN        MAX
    Lock Tree Outbound Rate            0.00       0.00       0.00       0.00
       (Higher Activity)               0.00       0.00       0.00       0.00
       (Higher LCKDIRWT)               0.00       0.00       0.00       0.00
       (Sole Interest)                 0.00       0.00       0.00       0.00
    Remaster Msg Send Rate             0.00       0.00       0.00       0.00
    Lock Tree Inbound Rate             0.00       0.00       0.00       0.00
    Remaster Msg Receive Rate          0.00       0.00       0.00       0.00
10
MONITOR CLUSTER
                             OpenVMS Monitor Utility
                               CLUSTER STATISTICS
                              Statistic: CURRENT
                              4-APR-2001 17:57:18

    CPU: CPU Busy (scale 0-100)         MEMORY: Memory In Use (scale 0-100)
    ----------------------------        ----------------------------
    XYZB14    57                        XYZB15    48
    XYZB13    57                        XYZB16    48
    XYZB18    48                        XYZB22    48
    XYZB22    42                        XYZB21    47
    XYZV      40                        XYZB18    46
    XYZB11    40                        XYZV      42

    DISK: I/O Operation Rate (0-100)    LOCK: Tot ENQ/DEQ Rate (0-500)
    ----------------------------        ----------------------------
    $256$DPA23    1312                  XYZB14    22297
    $256$DPA14     161                  XYZB13     5814
    $256$DPA26     125                  XYZB11     1844
    $256$DPA122     78                  XYZB20     1452
    $256$DPA3500    77                  XYZV        888
    DSA6002         66                  XYZB21      857


11
MONITOR SCS (example: /ITEM=MESSAGE_RECEIVED)
                             OpenVMS Monitor Utility
                                SCS STATISTICS
                                 on node XYZB11
                             4-APR-2001 18:04:57.44

    Message Receive Rate                CUR        AVE        MIN        MAX
    XYZB11                             0.00       0.00       0.00       0.00
    HS100B                             1.33       3.90       1.33       7.33
    XYZB21                            85.00     120.84      38.66     189.33
    HS101B                             2.33       2.37       0.66       3.33
    HS103B                             7.66       8.61       5.33      15.00
    HS102B                             0.33       0.33       0.00       0.66
    XYZB14                           169.33     237.17      97.00     400.33
    XYZB22                            11.00      14.51       2.00      33.00
    XYZB16                             4.00       5.09       0.00      14.00
    XYZB18                            46.00      18.46       1.00      46.00
    XYZB15                            65.00      77.53      30.66     135.33
    SYZB13                            59.00      93.90      47.66     159.00
    XYZV                              31.66      56.87      29.00     114.00
    XYZB19                             5.00      22.70       0.00      49.66
    XYZB12                           128.66      61.78      33.66     128.66
12
Monitoring tools
  • SHOW CLUSTER/CONTINUOUS
  • ADD CONNECTIONS
  • ADD CR_WAITS

13
SHOW CLUSTER/CONTINUOUS
View of Cluster from system ID 12345  node: NODE01          6-MAY-2002 14:45:16
-------------------------------------------------------------------------------
      SYSTEMS           MEMBERS          CONNECTIONS               COUNTERS
  NODE      SOFTWARE    STATUS     LOC_PROC_NAME       CON_STA     CR_WAITS
-------------------------------------------------------------------------------
  NODE01    VMS E7.3    MEMBER     SCS$DIRECTORY       LISTEN
                                   MSCP$DISK           LISTEN
                                   MSCP$TAPE           LISTEN
                                   VMS$SDA_AXP         LISTEN
                                   VMS$VAXcluster      LISTEN
                                   SCA$TRANSPORT       LISTEN
  NODE02    VMS V7.2    MEMBER     SCA$TRANSPORT       OPEN            0
                                   VMS$DISK_CL_DRVR    OPEN            0
                                   MSCP$DISK           OPEN            0
                                   MSCP$TAPE           OPEN            0
                                   VMS$VAXcluster      OPEN         1024
  NODE03    VMS V7.3    MEMBER     VMS$DISK_CL_DRVR    OPEN            0
                                   SCA$TRANSPORT       OPEN            0
                                   VMS$TAPE_CL_DRVR    OPEN            0
                                   MSCP$DISK           OPEN            0
                                   MSCP$TAPE           OPEN            0
                                   VMS$VAXcluster      OPEN            3
  NODE04    VMS V7.2    MEMBER     SCA$TRANSPORT       OPEN            0
14
Monitoring tools
  • ANALYZE/SYSTEM
  • SDA> SHOW RESOURCE
  • SDA> SHOW LOCK
  • SDA> LCK !New SDA extension, 7.2-2 and
    above

15
Monitoring tools
  • ANALYZE/SYSTEM
  • SHOW LOCK qualifiers (OpenVMS 7.2 and above)
  • /WAITING
  • Displays only the waiting lock requests (those
    blocked by other locks)
  • /SUMMARY
  • Displays summary data and performance counters
  • SHOW RESOURCE qualifier (OpenVMS 7.2 and above)
  • /CONTENTION
  • Displays resources which are under contention

16
Monitoring tools
  • ANALYZE/SYSTEM
  • New SDA extension LCK (OpenVMS 7.2-2 and above)
  • SDA> LCK !Shows help text with command summary
  • Can display various additional lock manager
    statistics
  • SDA> LCK STATISTIC !Shows lock manager statistics
  • Can show busiest resource trees by lock activity
    rate
  • SDA> LCK SHOW ACTIVE !Shows lock activity
  • Can trace lock requests
  • SDA> LCK LOAD !Load the debug execlet
  • SDA> LCK START TRACE !Start tracing lock requests
  • SDA> LCK STOP TRACE !Stop tracing
  • SDA> LCK SHOW TRACE !Display contents of trace
    buffer
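
Putting the SHOW RESOURCE/SHOW LOCK qualifiers and the LCK extension together,
a lock-contention look-around in SDA might run roughly as follows (output
omitted; the trace commands assume the debug execlet loads cleanly):

    $ ANALYZE/SYSTEM
    SDA> SHOW RESOURCE/CONTENTION     ! resources which currently have waiters
    SDA> SHOW LOCK/WAITING            ! the lock requests blocked behind them
    SDA> SHOW LOCK/SUMMARY            ! lock manager performance counters
    SDA> LCK                          ! help text for the LCK extension (7.2-2+)
    SDA> LCK STATISTIC                ! lock manager statistics
    SDA> LCK SHOW ACTIVE              ! busiest resource trees on this node
    SDA> LCK LOAD                     ! load the debug execlet for tracing
    SDA> LCK START TRACE              ! begin tracing lock requests
    SDA> LCK STOP TRACE
    SDA> LCK SHOW TRACE               ! display the trace buffer
    SDA> EXIT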

17
Availability Manager
  • Lock Contention
  • Data collection for Lock Contention must be
    enabled
  • Detects lock queues (lock requests waiting on
    other locks). Identifies
  • Lock holder
  • Lock waiters
  • Can take action to free the blocking lock: either
    force image exit or delete the offending process
  • Lock contention events are logged in the file
    AMDS$SYSTEM:AMDS$LOCK_LOG.LOG
  • Also available in the older DECamds product

18
DECps Example
Full Analysis  XYZB11 (Compaq AlphaServer GS140)                       Page 13
PSPA V2.1.5                              Wednesday 17-NOV-1999 06:30 to 07:30

CONCLUSION 5.   R0300

There are many lock requests per second that are put into the lock wait
queue; applications may be experiencing delays. This situation usually
indicates that users are contending for shared resources. Two common reasons
for this symptom:

  Applications may inherently cause this behavior and not affect the general
  workload, so be sensitive to response time degradation to rule this out.
  These applications might be redesigned to lock resources at a lower level,
  to lower the contention.

  Disk volumes (including solid-state devices) might be under too much
  contention by too many users across the cluster. Response time problems
  would affect the users of these disks. Try to redistribute the activity to
  alleviate the contention.
19
DECps Example
Lastly you may look at the lock wait queue with SDA to isolate the users who
are waiting, and resources they are waiting for.

Total number of samples supporting this conclusion: 20

CONDITIONS  1. ENQUE_LOCKS_FORCED_TO_WAIT_RATE .GT. 1.00 * CPU_VUP_RATING
            2. OCCURRENCES .GE. 1

EVIDENCE
  Enque lck     Process w/ highest disk I/O           Volume w/
  wait          -------------------------------       highest       Time of
  rate          Username   Imagename   Vol w/hgstIO   I/O rate      Occurrence
  --------      ---------  ----------  ------------   ------------  ---------------
   557.25       WXYZ       ZZSERVER    SYSFILES       SS1           17-NOV 06:32:00
   671.42       WXYZ       ZZSERVER    SYSFILES       SS1           17-NOV 06:34:00
   599.68       WXYZ       ZZSERVER    SS100000000D   SS1           17-NOV 06:36:00
   689.27       WXYZ       ZZSERVER    SS1000000000   SS1           17-NOV 06:38:00
   920.70       WXYZ       ZZSERVER    SS1000000007   SS1           17-NOV 06:40:00
20
DECps Example
------------------------------- CLUSTER Lock ---------------------------------
!                          H-Orig     Out     Enq     Dir Op     R-Orig      !
!                          Lck Act    Bound   Wait    Incomg     Lck Act     !
!                          (/sec)     (%)     (%)     (/sec)     (/sec)      !
!                          -------    -----   -----   ------     -------     !
!   Node Average            6991.0    26.8    2.6       24.6      1873.3     !
!   Node Minimum              42.1    13.9    0.9        0.0         0.1     !
!   Node Maximum           14123.3    72.8    9.7       84.7     14658.5     !
!   Cluster Total         118844.3    26.8    2.6      417.8     31845.1     !
-------------------------------------------------------------------------------
21
DECps Example
------------------------------- CLUSTER Lock ---------------------------------
!                          H-Orig     Out     Enq     Dir Op     R-Orig      !
!                          Lck Act    Bound   Wait    Incomg     Lck Act     !
!                          (/sec)     (%)     (%)     (/sec)     (/sec)      !
!                          -------    -----   -----   ------     -------     !
!   XYZA03                    42.1    37.3    2.2        0.0         0.3     !
!   XYZA07                    97.4    61.1    3.9        0.0         0.1     !
!   XYZA08                   102.2    47.6    2.1        0.0         0.3     !
!   XYZA09                   105.5    51.3    4.2        0.0         0.4     !
!   XYZB11                  9433.7    24.9    9.7       56.8     14658.5     !
!   XYZB12                 11385.7    24.0    1.2       16.0       818.8     !
!   XYZB13                  4738.7    62.0    3.6       32.9      1051.8     !
!   XYZB14                 14123.3    13.9    4.6       16.3     10279.1     !
!   XYZB15                 10011.6    26.0    2.1       33.8      2330.8     !
!   XYZB16                  7148.1    35.7    1.7       24.8      1428.8     !
!   XYZB18                 13447.1    21.5    1.0       27.2        88.6     !
!   XYZB19                 13569.2    19.5    0.9       52.5        18.3     !
!   XYZB20                  6252.1    42.1    2.7       36.5       906.6     !
!   XYZB21                 12083.0    22.3    1.1       84.7       126.8     !
!   XYZB22                 12061.4    23.5    1.2       15.6       131.7     !
!   XYZA05                   608.7    29.3    1.0        0.0         0.2     !
!   XYZB23                  3637.9    72.8    3.7       20.8         4.3     !
-------------------------------------------------------------------------------
22
OpenVMS Clusters and the Distributed Lock Manager
  • An OpenVMS Cluster is a set of distributed
    systems which cooperate
  • Cooperation requires coordination
  • The Distributed Lock Manager is critical to
    making that coordination possible

23
Foundation for Shared Access
(Diagram)  Users
           Applications (one per node)
           Nodes (multiple cluster members)
           Distributed Lock Manager
           Connection Manager (Rule of Total Connectivity and Quorum Scheme)
           Shared resources (files, disks, tapes)
24
Distributed Lock Manager
  • The Lock Manager provides mechanisms for
    coordinating access to physical devices, both for
    exclusive access and for various degrees of
    sharing

25
Distributed Lock Manager
  • Physical resources that the Lock Manager is used
    to coordinate access to include
  • Tape drives
  • Disks
  • Files
  • Records within a file
  • as well as internal operating system cache
    buffers and so forth

26
Distributed Lock Manager
  • Physical resources are mapped to symbolic
    resource names, and locks are taken out and
    released on these symbolic resources to control
    access to the real resources

27
Distributed Lock Manager
  • System services ENQ and DEQ allow new lock
    requests, conversion of existing locks to
    different modes (or degrees of sharing), and
    release of locks, while GETLKI allows the lookup
    of lock information

28
OpenVMS Cluster Distributed Lock Manager
  • Physical resources are protected by locks on
    symbolic resource names
  • Resources are arranged in trees
  • e.g. File → Data bucket → Record
  • Different resources (disk, file, etc.) are
    coordinated with separate resource trees, to
    minimize contention

29
Symbolic lock resource names
  • Symbolic resource names
  • Common prefixes
  • SYS$ for OpenVMS executive
  • F11B$ for XQP, file system
  • RMS$ for Record Management Services
  • See the book OpenVMS Internals and Data
    Structures by Ruth Goldenberg, et al
  • Appendix H in Alpha V1.5 version
  • Appendix A in Alpha V7.0 version

30
Resource names
  • Example: Device Lock
  • Resource name format is
  • SYS$ + Device Name in ASCII text

3A3333324147442431245F24535953   "SYS$_$1$DGA233:"
SYS$         → (SYS$ facility)
_$1$DGA233:  → (Device name)
31
Resource names
  • Example: RMS lock tree for an RMS indexed file
  • Resource name format is
  • RMS$ + File ID + Flags byte + Lock Volume Name
  • Identify the filespec using the File ID
  • Flags byte indicates shared or private disk mount
  • Pick up the disk volume name
  • This is the label as of the time the disk was mounted
  • Sub-locks are used for buckets and records within
    the file

32
Decoding an RMS File Root Resource Name
RMS$t......FDDI_COMMON ...
000000   204E4F4D4D4F435F49444446 02 000000011C74 24534D52

24534D52                   RMS$ → (RMS Facility, RMS File Root Resource)
00 00 0001 1C74            RVN, FilX, Sequence_Number, File_Number → File ID (7284,1,0)
02                         Flags byte (Disk is mounted /SYSTEM)
204E4F4D4D4F435F49444446   "FDDI_COMMON " → (Disk label)

$ dump/header/id=7284/block=count=0 disk$FDDI_COMMON:[000000]indexf.sys
Dump of file _DSA100:[SYSEXE]SYSUAF.DAT;1 ...  → (File)
33
Internal Structure of an RMS Indexed File
34
RMS Data Bucket Contents
(Diagram)  A Data Bucket containing multiple Data Records
35
RMS Indexed File: Bucket and Record Locks
  • Sub-locks of the RMS File Lock
  • Have to look at the Parent lock to identify the file
  • Bucket lock
  • 4 bytes: VBN of the first block of the bucket
  • Record lock
  • 8 bytes: Record File Address (RFA) of the record

36
Distributed Lock Manager: Lock Master nodes
  • OpenVMS assigns a single node at a time to keep
    track of all the resources in a given resource
    tree, and any locks taken out on those resources
  • This node is called the Lock Master node for that
    tree
  • Different trees often have different Lock Master
    nodes
  • OpenVMS tends to dynamically move Lock Mastership
    duties to the node with the most locking activity
    on that tree

37
Distributed Locks
Node A (Lock Master for resource X):
    Lock on resource X
    Copy of Node B's lock on resource X
    Copy of Node C's lock on resource X
Node B:    Lock on resource X
Node C:    Lock on resource X
38
Directory Lookups
  • This is how OpenVMS finds out which node is the
    lock master
  • Only needed for 1st lock request on a particular
    resource tree on a given node
  • Resource Block (RSB) remembers master node CSID
  • Basic conceptual algorithm: Hash the resource name
    and index into the lock directory weight vector,
    which has been created based on LOCKDIRWT values
    for each node

39
Lock Directory Weight Vector
Big node PAPABR:           Lock Directory Weight = 2
Middle-sized node MAMABR:  Lock Directory Weight = 1
Satellite node BABYBR:     Lock Directory Weight = 0

Resulting lock directory weight vector:  PAPABR, PAPABR, MAMABR
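
LOCKDIRWT is a SYSGEN parameter, so the per-node weights behind this vector
can be inspected and, with care, changed from DCL. A minimal sketch, assuming
the change goes through MODPARAMS.DAT and AUTOGEN (LOCKDIRWT is not dynamic,
so a reboot is needed for a new value to take effect):

    $ RUN SYS$SYSTEM:SYSGEN
    SYSGEN> SHOW LOCKDIRWT                 ! current directory weight on this node
    SYSGEN> EXIT
    $ ! Record the desired value in MODPARAMS.DAT and let AUTOGEN apply it
    $ OPEN/APPEND PARAMS SYS$SYSTEM:MODPARAMS.DAT
    $ WRITE PARAMS "LOCKDIRWT = 1    ! equal weight on all large nodes"
    $ CLOSE PARAMS
    $ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS NOFEEDBACK
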
40
Lock Directory Lookup
(Diagram)  User lock request on the Local node
           → Directory node (lookup of the lock master)
           → Lock Master Node
           → User request satisfied
41
Performance Hints
  • Avoid DEQing a lock which will be re-acquired
    later
  • Instead, convert to Null lock, and convert back
    up later
  • Or, create locks as sub-locks of a parent lock
    that remains held
  • This avoids directory lookups
  • and also avoids losing the activity counts in the
    root RSB used for lock-remastering decisions

42
Performance Hints
  • For lock trees which come and go often
  • A separate program could take out a Null lock on
    the root resource on each node at boot time, and
    just hold it forever
  • This avoids lock directory lookup operations for
    that tree

43
Lock Request Latencies
  • Latency depends on several things
  • Directory lookup needed or not
  • Local or remote directory node
  • ENQ or DEQ operation (acquiring or releasing a
    lock)
  • Local (same node) or remote lock master node
  • And if remote, the speed of interconnect used

44
ENQueue
  • New lock request (0-2 round trips)
  • No off-node traffic if this node is lock master
  • 1 round trip if
  • no other node has interest, or
  • directory node is also lock master, or
  • local node is directory node
  • 2 round trips if
  • directory node is not also the lock master node

45
ENQueue
  • Conversion or sub-lock (0 or 1 round-trip)
  • No off-node traffic if this node is lock master
  • 1 round trip to lock master (except 2PC)
  • RSB already contains CSID of lock master node, so
    we never need to do a directory lookup

46
DEQueue
  • 1-way message
  • No response expected
  • Client doesn't wait
  • SCS message guarantee ensures eventual arrival
  • SCS credits may limit number of these in-flight
    at once

47
Lock Request Latency: Local vs. Remote
  • Local requests are fastest
  • Remote requests are significantly slower
  • Code path 20 times longer
  • Interconnect also contributes latency
  • Total latency up to 2 orders of magnitude (100x)
    higher than local requests

48
Lock Request Latencies
  • I used the LOCKTIME program pair written by Roy
    G. Davis, author of the book VAXcluster
    Principles, to measure latency of lock requests
    locally and across different interconnects
  • LOCKTIME algorithm
  • Take out 5000 locks on remote node, making it
    lock master for each of 5000 lock trees (requires
    ENQLM > 5000)
  • Do 5000 ENQs, lock converts, and DEQs, and
    calculate average latencies for each
  • Lock conversion request latency is roughly
    equivalent to round-trip time between nodes over
    a given interconnect
  • See LOCKTIME.COM from OpenVMS Freeware V6,
    KP_LOCKTOOLS directory
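
As a rough sketch only (the Freeware copy location and the procedure's exact
parameters are assumptions; check the comments in LOCKTIME.COM itself), a
latency measurement from one node against a chosen remote node could be
driven like this:

    $ ! Verify the process ENQ quota first; LOCKTIME needs ENQLM > 5000
    $ WRITE SYS$OUTPUT "ENQLM = ", F$GETJPI("","ENQLM")
    $ SET DEFAULT DISK$TOOLS:[KP_LOCKTOOLS]   ! hypothetical Freeware location
    $ @LOCKTIME                               ! supply the remote node when prompted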

49
Lock Request Latency: Local
Client process on the same node as the Lock Master: 2-4 microseconds
50
Lock Request Latency: Remote
Client across Gigabit Ethernet: 200 microseconds
(Client node → GbE Switch → Lock Master node)
51
Lock Request Latencies
52
Lock Mastership
  • Lock mastership node may change for various
    reasons
  • Lock master node goes down -- new master must be
    elected
  • OpenVMS may move lock mastership to a better
    node for performance reasons
  • LOCKDIRWT imbalance found (pre-8.3), or
  • Activity-based Dynamic Lock Remastering
  • Lock Master node no longer has interest

53
Lock Mastership
  • Lock master selection criteria
  • Interest
  • Only move resource tree to a node which is
    holding at least some locks on that resource tree
  • Lock Directory Weight (LOCKDIRWT) (pre-8.3)
  • Move lock tree to a node with interest and a
    higher LOCKDIRWT
  • Lock Remaster Weight (LOCKRMWT) (8.3 and above)
  • Bias lock tree movement toward nodes with a
    higher Remaster Weight
  • Activity Level
  • Move lock tree to a node with interest and a
    higher average activity level (biased by Remaster
    Weight for 8.3 and above)

54
How to measure locking activity
  • OpenVMS keeps counters of lock activity for each
    resource tree
  • but not for each of the sub-resources
  • So you can see the lock rate for an RMS indexed
    file, for example
  • but not for individual buckets or records within
    that file
  • SDA extension LCK in OpenVMS V7.2-2 and above can
    show lock rates, and even trace all lock requests
    if needed. This displays data on a per-node
    basis.
  • Cluster-wide summary is available using
    LOCK_ACTV.COM from OpenVMS Freeware V6,
    KP_LOCKTOOLS directory

55
Lock Remastering
  • Circumstances under which remastering occurs, and
    does not
  • LOCKDIRWT values
  • Pre-8.3, OpenVMS tends to remaster to node with
    higher LOCKDIRWT values, never to node with lower
    LOCKDIRWT
  • Shifting initiated based on activity counters in
    root RSB (biased by Remaster Weight for 8.3 and
    above)
  • PE1 parameter being non-zero can prevent movement
    or place threshold on lock tree size
  • Shift if existing lock master loses interest

56
Lock Remastering
  • OpenVMS rules for dynamic remastering decision
    based on activity levels (prior to version 8.3)
  • assuming equal LOCKDIRWT values
  • 1) Must meet general threshold of at least 80
    lock requests so far (LCK$GL_SYS_THRSH)
  • 2) New potential master node must have at least
    10 more requests per second than current master
    (LCK$GL_ACT_THRSH)

57
Lock Remastering
  • OpenVMS rules for dynamic remastering (pre-8.3)
  • 3) Estimated cost to move (based on size of lock
    tree) must be less than estimated savings (based
    on lock rate)
  • except if new master meets criteria (2) for 3
    consecutive 8-second intervals, cost is ignored
  • 4) No more than 5 remastering operations can be
    going on at once on a node (LCK$GL_RM_QUOTA)

58
Lock Remastering
  • OpenVMS rules for dynamic remastering (pre-8.3)
  • 5) If PE1 on the current master has a negative
    value, remastering trees off the node is disabled
  • 6) If PE1 has a positive, non-zero value on the
    current master, the tree must be smaller than PE1
    in size or it will not be remastered

59
Lock Remastering
  • Implications of dynamic remastering rules
    (pre-8.3)
  • LOCKDIRWT must be equal for lock activity levels
    to control choice of lock master node
  • PE1 can be used to control movement of lock trees
    OFF of a node, but not ONTO a node
  • RSB stores lock activity counts, so even high
    activity counts can be lost if the last lock is
    DEQueued on a given node and thus the RSB gets
    deallocated

60
Lock Remastering
  • Implications of dynamic remastering rules
    (pre-8.3)
  • With two or more large CPUs of equal size running
    the same application, lock mastership thrashing
    is not uncommon
  • 10 more lock requests per second is not much of a
    difference when you may be doing 100s or 1,000s
    of lock requests per second
  • Whichever new node becomes lock master may then
    see its own lock rate slow somewhat due to the
    remote lock request workload

61
How to Detect Lock Mastership Thrashing
  • Detection of remastering activity
  • MONITOR RLOCK in 7.3 and above (not 7.2-2)
  • SDA> SHOW LOCK/SUMMARY in 7.2 and above
  • Change of mastership node for a given resource
  • Check message counters under SDA
  • SDA> EXAMINE PMS$GL_RM_RBLD_SENT
  • SDA> EXAMINE PMS$GL_RM_RBLD_RCVD
  • Counts which increase suddenly by a large amount
    indicate remastering of large tree(s)
  • SENT: off of this node
  • RCVD: onto this node
  • See example procedures WATCH_RBLD.COM and RBLD.COM
    (a rough sketch of the same idea follows below)
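
A minimal watcher in the spirit of WATCH_RBLD.COM (this is a sketch, not the
actual Freeware procedure) can simply sample the two PMS cells at intervals
so that sudden jumps stand out:

    $ ! Sample the remaster rebuild counters once a minute
    $ LOOP:
    $   ANALYZE/SYSTEM
    EXAMINE PMS$GL_RM_RBLD_SENT      ! trees remastered OFF of this node
    EXAMINE PMS$GL_RM_RBLD_RCVD      ! trees remastered ONTO this node
    EXIT
    $   WAIT 00:01:00
    $   GOTO LOOP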

62
SDA> SHOW LOCK/SUMMARY

$ analyze/system
OpenVMS (TM) Alpha system analyzer

SDA> show lock/summary
...
Lock Manager Performance Counters
----------------------------------
...
Lock Remaster Counters
    Tree moved to this node                   0
    Tree moved to another node                0
    Tree moved due to higher Activity         0
    Tree moved due to higher LOCKDIRWT        0
    Tree moved due to Single Node Locks       0
    No Quota for Operation                    0
    Proposed New Manager Declined             0
    Operations completed                      0
    Remaster Messages Sent                    0
63
MONITOR RLOCK
                             OpenVMS Monitor Utility
                      DYNAMIC LOCK REMASTERING STATISTICS
                                 on node KEITH
                             6-MAY-2002 18:42:35.07

                                        CUR        AVE        MIN        MAX
    Lock Tree Outbound Rate            0.00       0.00       0.00       0.00
       (Higher Activity)               0.00       0.00       0.00       0.00
       (Higher LCKDIRWT)               0.00       0.00       0.00       0.00
       (Sole Interest)                 0.00       0.00       0.00       0.00
    Remaster Msg Send Rate             0.00       0.00       0.00       0.00
    Lock Tree Inbound Rate             0.00       0.00       0.00       0.00
    Remaster Msg Receive Rate          0.00       0.00       0.00       0.00
64
How to Prevent Lock Mastership Thrashing
  • Upgrade to 8.3 or above
  • If that's not an option, consider
  • Unbalanced node power
  • Unequal workloads
  • Unequal values of LOCKDIRWT
  • Non-zero values of PE1

65
Impact of Non-zero PE1 Values
  • Concern: Locking down remastering with PE1 (to
    avoid lock mastership thrashing) can result in
    sub-optimal lock master node selections over time

66
Mitigating Impact of Non-zero PE1 Values
  • Possible ways of mitigating side-effects of
    preventing remastering using PE1
  • Adjust PE1 value as high as you can without
    producing noticeable delays
  • Set PE1 to 0 for short periods, periodically
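
One way to implement the second idea is a small periodic procedure that opens
a brief remastering window by setting PE1 to 0 and then restores the
protective value. A sketch; the 10000-block threshold is a hypothetical
site-specific choice, and PE1 is dynamic, so SYSGEN WRITE ACTIVE takes effect
immediately:

    $ ! Allow dynamic lock remastering for 5 minutes, then lock it down again
    $ RUN SYS$SYSTEM:SYSGEN
    USE ACTIVE
    SET PE1 0          ! open the remastering window
    WRITE ACTIVE
    EXIT
    $ WAIT 00:05:00
    $ RUN SYS$SYSTEM:SYSGEN
    USE ACTIVE
    SET PE1 10000      ! hypothetical lock-tree size threshold for this site
    WRITE ACTIVE
    EXIT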

67
Deadlock searches
  • The OpenVMS Distributed Lock Manager
    automatically detects lock deadlock conditions,
    and generates an error to one of the programs
    causing the deadlock

68
Deadlock searches
  • Deadlock searches can take lots of time and
    interrupt-state CPU time
  • DECps Performance Analysis report can identify
    when these are occurring
  • DEADLOCK_WAIT parameter controls how long we wait
    before starting a deadlock search

69
Interrupt-state/stack saturation
  • Too much lock mastership workload, MSCP-serving,
    etc. can saturate a CPU in interrupt state
  • See utilization (and detect saturation, > 90%)
    with
  • MONITOR MODES/CPU=n/ALL
  • where n is the number of the CPU of interest
    (the Primary CPU number can be determined using
    SHOW CPU); see the example below
  • T4 MON.MODE Interrupt State per-CPU data,
    displayed using TLviz
  • Adding RMS Global Buffers may help reduce lock
    rates (post 7.2-1H1)
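
For example, interrupt-state time on the Primary CPU specifically can be
watched like this (the commands are standard; CPU 0 as the Primary is only an
assumption, so confirm it with SHOW CPU first):

    $ SHOW CPU                             ! identify the Primary CPU
    $ MONITOR MODES/CPU=0/ALL/INTERVAL=5   ! interrupt state on that one CPU
    $ MONITOR MODES/ALL/INTERVAL=5         ! all CPUs, for comparison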

70
Response time vs. Utilization
71
Interrupt-state/stack saturation
  • FAST_PATH
  • Can shift interrupt-state workload off primary
    CPU in SMP systems
  • IO_PREFER_CPUS value of an even number avoids
    sending interrupts from FAST_PATH devices to the
    Primary CPU
  • Consider limiting interrupts to a subset of
    non-primary CPUs
  • FAST_PATH for CI since 7.1
  • FAST_PATH for SCSI and FC is in 7.3 and above
  • FAST_PATH for LANs and PEDRIVER in 7.3-2
  • Even with FAST_PATH enabled, the Primary CPU
    still received the device interrupt, but handed
    it off immediately via an inter-processor
    interrupt
  • 7.3-1 allowed FAST_PATH interrupts to bypass the
    Primary CPU entirely and go directly to a
    non-primary CPU on hardware platforms which
    support this
  • No FAST_PATH for Memory Channel (and most likely
    never will be)
  • No FAST_PATH for Galaxy Shared Memory Cluster
    Interconnect

72
How should Fast_Path assignments to CPUs be done?
  • Customer question: How should interrupts from LAN
    adapters/PEDRIVER and Fibre Channel HBAs be
    directed to CPUs within each box?
  • Logic might indicate three possible approaches
  • Use the default OpenVMS Fast_Path assignments
    as-is
  • Spread interrupts as broadly as possible across
    CPUs in an attempt to avoid saturation of any
    single CPU, or
  • Put like devices on the same CPU for better
    efficiency (fewer inter-processor interrupts,
    better cache-line sharing for driver code)
  • Conclusion after customer test: Putting like
    devices on the same CPU is more efficient and
    results in better response times under heavy test
    workload (see the sketch below).
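
On versions with CPU-specific Fast_Path assignment, grouping like devices can
be expressed with SET DEVICE/PREFERRED_CPUS. The device names and CPU numbers
below are purely illustrative assumptions:

    $ ! Put both Fibre Channel HBAs on CPU 2, and the LAN adapters used by
    $ ! PEDRIVER on CPU 3, keeping both groups off the Primary CPU
    $ SET DEVICE FGA0: /PREFERRED_CPUS=2
    $ SET DEVICE FGB0: /PREFERRED_CPUS=2
    $ SET DEVICE EWA0: /PREFERRED_CPUS=3
    $ SET DEVICE EWB0: /PREFERRED_CPUS=3
    $ SHOW DEVICE/FULL FGA0:               ! verify the preferred CPU took effect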

73
Effect of Fast_Path Options on Interrupt State
(Chart: interrupt-state time under the three Fast_Path options: Spread,
Together, and Original)
74
Effect of Optimal Fast_Path Settings
75
Dedicated-CPU Lock Manager
  • With 7.2-2 and above, you can choose to dedicate
    a CPU to do lock management work. This may help
    reduce MP_Synch time.
  • Using this can be helpful when
  • You have more than 5 CPUs in the system, and
  • You're already wasting more than a CPU's worth in
    MP_Synch time contending for the LCKMGR spinlock
  • See SDA> SPL extension and SYS$EXAMPLES:SPL.COM
  • LCKMGR_MODE parameter
  • 0: Disabled
  • >1: Enable if at least this many CPUs are
    running
  • LCKMGR_CPUID parameter specifies which CPU to
    dedicate to LCKMGR_SERVER process
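
A sketch of enabling the dedicated lock manager on, say, an 8-CPU system,
dedicating the highest-numbered CPU (the CPU number and threshold here are
illustrative assumptions; the supported route is MODPARAMS.DAT plus AUTOGEN):

    $ ! In SYS$SYSTEM:MODPARAMS.DAT:
    $ !   LCKMGR_MODE = 6      ! enable only if at least 6 CPUs are running
    $ !   LCKMGR_CPUID = 7     ! CPU to dedicate to the LCKMGR_SERVER process
    $ ! then run AUTOGEN to apply the change:
    $ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS NOFEEDBACK
    $
    $ ! Afterwards, confirm the dedicated process exists:
    $ SHOW SYSTEM                          ! look for LCKMGR_SERVER in the list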

76
Troubleshooting Real-World Problems Using Locking
Data
  • Techniques
  • Monitor for
  • High lock rates
  • High lock queues
  • Primary CPU Interrupt-State Saturation
  • SCS credit waits
  • Deadlock Searches and Finds

77
LOCK_ACTV.COM Example
0000002020202020202020203153530200004C71004624534D52   RMS$F.qL...SS1 ...
RMS lock tree for file (70,19569,0) on volume SS1
File specification: DISKSS1DATA8PDATA.IDX1

    Total    11523
    XYZB12    6455   <-- Lock Master Node for the resource
    XYZB11     746
    XYZB14     611
    XYZB15     602
    XYZB23     564
    XYZB13     540
    XYZB19     532
    XYZB16     523
    XYZB20     415
    XYZB22     284
    XYZB18     127
    XYZB21     125

This is a fairly hot file. Here the lock master node is optimal.
78
LOCK_ACTV.COM Example
0000002020202032454C494653595302000000D3000C24534D52   RMS$......SYSFILE2 ...
RMS lock tree for file (12,211,0) on volume SYSFILE2
File specification: DISKSYSFILE2SYSFILE2SYSUAF.DAT5

    Total     184
    XYZB16     75
    XYZB20     48
    XYZB23     41
    XYZB21     16
    XYZB19      2
    XYZB15      1
    XYZB13      1
    XYZB14      0
    XYZB12      0

This reflects user logins, process creations, password changes, and such.
Note the poor lock master node selection here (XYZB16 would be optimal).
79
Example: Application (re)opens file frequently
  • Symptom: High lock rate on the File Access
    Arbitration Lock for an application data file
  • Cause: BASIC program re-executing the OPEN
    statement for a file; BASIC dutifully closes and
    then re-opens the file
  • Fix: Modify the BASIC program to execute the OPEN
    statement only once, at image startup time

80
LOCK_ACTV.COM Example
00000016202020202020202031505041612442313146   F11B$aAPP1 ....
Files-11 File Access Arbitration lock for file (22,,0) on volume APP1
File specification: DISKAPP1DATAXDATA.IDX1

    Total      50
    XYZB15      8
    XYZB21      7
    XYZB16      7
    XYZB19      6
    XYZB20      6
    XYZB23      6
    XYZB18      5
    XYZB13      3
    XYZB12      1
    XYZB22      1
    XYZB14      1

This shows where the application is apparently opening (or re-opening) this
particular file 50 times per second.
81
Example: Directory File Grows Large
  • Symptom: High queue length on the file serialization
    lock for a .DIR file
  • Cause: Directory file has grown to over 127
    blocks
  • (VMS version 7.1-2 or earlier; 7.2 and later are
    much less sensitive to this problem, so queuing
    occurs at directory file sizes more like
    1,000-3,000 blocks)
  • Fix: Delete or rename files out of the directory
    (see the sketch below)
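
A quick way to find oversized directory files and thin them out; the disk
name, size threshold, and archive destination below are assumptions for
illustration:

    $ ! Find .DIR files of 127 blocks or more anywhere on the volume
    $ DIRECTORY/SIZE=ALL/SELECT=SIZE=MINIMUM=127 DISK$APP1:[000000...]*.DIR
    $ ! Then rename (or delete) files out of the offending directory, e.g.:
    $ RENAME DISK$APP1:[DATA]*.TMP;* DISK$APP1:[DATA.ARCHIVE]*.*;*
    $ PURGE/KEEP=2 DISK$APP1:[DATA]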

82
LCKQUE.COM Example
Here are examples where a directory file got very large under 7.1-2:

'F11B$vAPP2    '    202020202020202032505041762442313146
    Files-11 Volume Allocation lock for volume APP2

'F11B$sH...'        00000148732442313146
    Files-11 File Serialization lock for file (328,,0) on volume APP2
    File specification: DISKAPP2DATA.DIR1
    Convert queue: 0, Wait queue: 95

'F11B$vLOGFILE '    2020202020454C4946474F4C762442313146
    Files-11 Volume Allocation lock for volume LOGFILE

'F11B$s....'        00000A2E732442313146
    Files-11 File Serialization lock for file (2606,,0) on volume LOGFILE
    File specification: DISK$LOGFILE:[000000]LOGS.DIR;1
    Convert queue: 0, Wait queue: 3891
83
Example: Fragmented File Header
  • Symptom: High queue length on the File Serialization
    Lock for an application data file
  • Cause: CONVERTs onto a disk without sufficient
    contiguous space resulted in highly-fragmented
    files, increasing the I/O load on the disk array.
    The file was so fragmented it had 3 extension file
    headers
  • Fix: Defragment the disk, or do an /IMAGE
    Backup/Restore (see the sketch below)
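
To confirm the diagnosis and then rebuild the file contiguously, something
along these lines can be used (file, disk, and tape names are illustrative
assumptions; DUMP/HEADER shows the map area and any extension headers):

    $ DUMP/HEADER/BLOCK=COUNT=0 DISK$THDATA:[DATA]THOT.IDX   ! look for extension headers
    $ ! Option 1: rebuild just this file contiguously (needs contiguous free space)
    $ COPY/CONTIGUOUS DISK$THDATA:[DATA]THOT.IDX DISK$THDATA:[DATA]THOT.IDX
    $ ! Option 2: image backup and restore of the whole volume to defragment it
    $ BACKUP/IMAGE DISK$THDATA: MKA600:THDATA.BCK/SAVE_SET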

84
LCKQUE.COM Example
'F11B$s....'    0000000E732442313146
    Files-11 File Serialization lock for file (14,,0) on volume THDATA
    File specification: DISKTHDATATHOT.IDX1
    Convert queue: 0, Wait queue: 28

This is an example of the result of reorganizing RMS indexed files with
CONVERTs over a weekend without enough contiguous free space available,
causing a lot of file fragmentation and dramatically increasing the I/O load
on a RAID array on the next busy day (we had to fix this fragmentation with a
backup/restore cycle soon after). The file shown here had gotten so
fragmented as to have 3 extension file headers. The lock we're queuing on
here is the file serialization lock for this RMS indexed file.
85
OpenVMS Cluster DLM Resources
  • OpenVMS Documentation on the Web
  • http://h71000.www7.hp.com/doc
  • OpenVMS Cluster Systems
  • Guidelines for OpenVMS Cluster Configurations
  • Book: VAXcluster Principles by Roy G. Davis,
    Digital Press, 1993, ISBN 1-55558-112-9

86
Tools & Techniques for Solving OpenVMS
Distributed Lock Manager Performance Issues
87
Common Problems
88
Common Problem
  • Unbalanced LOCKDIRWT settings
  • Cluster typically consists of at least 2
    (commonly more) nodes of same or similar model or
    horsepower
  • Typically all cluster nodes run same or similar
    workload, evenly balanced across the cluster
    nodes
  • History of resource tree mastership thrashing
    between nodes in the past
  • LOCKDIRWT is set higher on just one node in the
    cluster
  • It then becomes resource master for all shared
    resource trees
  • Resource tree mastership thrashing problem is
    cured
  • Solution works very well, until the one node gets
    overwhelmed

89
Common Problem: Unbalanced LOCKDIRWT settings
  • Symptoms
  • Perception of slow response times under heavy
    load; tendency to fall off a performance cliff
  • RWSCS states may show up in SHOW SYSTEM displays
    on nodes with lower LOCKDIRWT
  • MONITOR DLOCK shows mostly outgoing lock requests
    on all but one node; one node shows mostly
    incoming lock requests
  • CPU usage (particularly interrupt state) much
    higher on the node with the higher LOCKDIRWT
    setting

90
Common Problem: Unbalanced LOCKDIRWT settings
  • Troubleshooting steps
  • Check for bottlenecks in CPU
  • Check for interrupt-state saturation using
    MONITOR MODES
  • Use PC Sampling to verify source of
    interrupt-state time (lock traffic)
  • Look for causes of locking imbalance among nodes
  • Double-check incoming workload balancing
    mechanisms
  • Use SDA> LCK SHOW ACTIVE on all nodes or the V6
    Freeware tool LOCK_ACTV.COM to identify resource
    master nodes for the busiest lock trees, and
    their relative activity levels, especially
    looking for cases where the resource master
    node's lock request rate to the resource tree is
    lower than another node's rate
  • Examine system parameter values
  • See if LOCKDIRWT parameter has been set to a
    higher value on one node than other nodes and
    that node is now the resource master for many
    lock trees and has become overloaded as a result

91
Common Problem: Unbalanced LOCKDIRWT settings
  • Problem resolution
  • Set LOCKDIRWT to the same value on all nodes to
    allow OpenVMS to move resource tree mastership
    based on relative activity levels; then, to
    prevent resource tree mastership thrashing:
  • If possible, add one new faster-CPU system to the
    cluster
  • Try biasing incoming workload so 1 node has a
    slightly-higher workload level than the other
    nodes
  • If there are multiple types of workload, try to
    direct a little more of each workload type to a
    different node in the cluster
  • If these measures are insufficient to prevent
    resource tree mastership thrashing, set the PE1
    parameter to a positive, non-zero value (as large
    a value as possible and still prevent thrashing).
    This non-zero setting may only be necessary
    during high-workload periods.
  • Alternatively, upgrade to OpenVMS version 8.3 or
    higher

92
Common Problem
  • Hard-coded PE1 settings
  • Cluster typically consists of at least 2
    (commonly more) nodes of same or similar model or
    horsepower
  • Typically all cluster nodes run same or similar
    workload, evenly balanced across the cluster
    nodes
  • History of resource tree mastership thrashing
    between nodes in the past
  • PE1 is set to a non-zero value on all nodes in
    the cluster
  • Resource master node assignments become fixed in
    place
  • Resource tree mastership thrashing problem is
    cured
  • PE1 may even be set to -1 so as to prevent ALL
    remastering (not just activity-based) in the
    interest of saving even the little
    interrupt-state time spent keeping lock-request
    counts

93
Common Problem: Hard-coded PE1 settings
  • Symptoms
  • Perception of slow response times under heavy
    load
  • One or more nodes may be experiencing CPU
    saturation in interrupt state
  • Processes in RWSCS state may often show up in
    SHOW SYSTEM displays on some nodes
  • MONITOR DLOCK shows imbalance of incoming and
    outgoing lock request rates between nodes, and
    fewer local lock requests than normal as a
    fraction of total lock requests
  • SDA> LCK SHOW ACTIVE or the V6 Freeware tool
    LOCK_ACTV.COM show that many of the busiest
    resource trees are not mastered on the node with
    the highest lock request rate to that resource
    tree (or else these counts are all zero or values
    remain static)

94
Common Problem: Hard-coded PE1 settings
  • Troubleshooting steps
  • Check for bottlenecks in CPU
  • Check for interrupt-state saturation using
    MONITOR MODES
  • Use PC Sampling to verify source of
    interrupt-state time (lock traffic)
  • Look for causes of locking imbalance among nodes
  • Double-check incoming workload balancing
    mechanisms
  • Use SDA> LCK SHOW ACTIVE on all nodes or the V6
    Freeware tool LOCK_ACTV.COM to identify resource
    master nodes for the busiest lock trees, and
    their relative activity levels, especially
    looking for cases where the resource master
    node's lock request rate to the resource tree is
    lower than another node's rate
  • (Note: these activity counts will be unavailable
    if PE1 is set to a negative value)
  • Examine system parameter values
  • See if PE1 parameter has been set to a non-zero
    value, especially -1 or a very-small positive
    value like 1

95
Common Problem: Hard-coded PE1 settings
  • Problem resolution
  • Set PE1 to zero; then, to prevent resource tree
    mastership thrashing:
  • If possible, add one new faster-CPU system to the
    cluster
  • Try biasing incoming workload so 1 node has a
    slightly-higher workload level than the other
    nodes
  • If there are multiple types of workload, try to
    direct a little more of each workload type to a
    different node in the cluster
  • If these measures are insufficient to prevent
    resource tree mastership thrashing, set the PE1
    parameter to a positive, non-zero value (as large
    a value as possible and still prevent thrashing).
    Setting PE1 to -1 prevents tracking lock-request
    rates and should be a last resort. This non-zero
    setting may only be necessary during
    high-workload periods.
  • Alternatively, upgrade to OpenVMS version 8.3

96
Common Problem: Hard-coded PE1 settings
  • Real-life example
  • Two-node cluster for high availability
  • Rdb database run on only 1 of 2 nodes at a time
    (for best performance), with opposite node only
    there for failover
  • Symptoms
  • After failover between nodes, very poor
    performance coupled with high interrupt-state
    time
  • High outgoing lock rate on active node, and
    high incoming lock rate on inactive node
  • PE1 hard-coded to -1 years ago with the intent of
    preventing resource tree mastership thrashing at
    a time when both nodes were active at once
  • In failover process, locks apparently stayed
    around on inactive node for some reason
  • Setting PE1 to 0 solved the problem instantly

97
Common Problem
  • CPU interrupt-state saturation
  • Symptoms
  • A variety of strange performance anomalies
  • May be sporadic or intermittent
  • Detection
  • MONITOR MODES, Availability Manager, other
    performance data collectors (e.g. DECps, ECP/TDC)
  • Be aware of potential data collector blindness
    (resulting in missing data) during
    interrupt-state saturation

98
Common Problem: CPU interrupt-state saturation
  • Contributing factors
  • Lack of Fast_Path support
  • Early OpenVMS versions (e.g. PEDRIVER prior to
    7.3-2)
  • Memory Channel adapters lack Fast_Path support
  • Large cluster node counts, powerful systems, all
    running same application workload (thus sharing
    same resource trees)
  • Nodes can gang up on resource master nodes

99
Common Problem: CPU interrupt-state saturation
  • Mitigation
  • Reduce locking demand if possible
  • Use No-Query locking for read-only access
  • Cache data if possible instead of re-reading it
  • Use Fast_Path to spread CPU interrupt workload
    across CPUs
  • PEDRIVER and LAN adapters together on a separate
    CPU from other I/O devices if using LAN as the
    cluster interconnect
  • Distribute workload more evenly across nodes in a
    cluster
  • Buy faster CPUs

100
Common Problem
  • Using the File System as a Database
  • Symptoms
  • Feeling of slow I/O performance but no detectable
    controller or disk bottlenecks
  • Application design tends to create large numbers
    of separate files, rather than creating records
    within files
  • Directory files may often grow large in size
    (thousands of blocks)
  • Detection
  • Availability Manager (or DECamds) or SDA> SHOW
    RESOURCE/CONTENTION or the V6 Freeware tool
    LCKQUE.COM may show lock contention on volume
    allocation locks (F11B$v) or on file
    serialization locks (F11B$s) for directory files

101
Common Problem: Using the File System as a Database
  • Problem resolution / mitigation
  • Redesign application to create records within
    files instead of separate individual files
  • Or spread files around among many
    disks/directories
  • To avoid contention on volume allocation locks,
    spread files and directories across more separate
    volumes
  • To avoid contention on file serialization locks
    on directory files, spread files among more
    separate directory files, and be sure to clear
    files out of directories to prevent directory
    files from growing large

102
Common Problem: Using the File System as a Database
  • Real-life example
  • Customer had application which tended to create
    large numbers of files, which it kept track of
    using records in RMS indexed files
  • Had recently upgraded disk subsystem from
    HS-series controllers to EVAs
  • Customer combined directories which had
    previously been spread across 9 HS-series disk
    volumes onto 1 EVA disk volume, based on the fact
    that the EVA internally spreads data across many
    disk spindles, so it could easily handle the
    combined I/O rate of all 9 disk units
  • Result was heavy contention for volume allocation
    lock
  • Mitigation
  • Spread files across multiple volumes again
  • Increased size of disk allocation units to reduce
    rate of requests for volume allocation lock
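
The second mitigation corresponds to choosing a larger volume cluster size
(set when the volume is initialized), and a related knob is the default file
extension quantity on the mounted volume; the values below are illustrative
assumptions:

    $ ! Cluster size is fixed at INITIALIZE time (re-INIT and restore the data)
    $ INITIALIZE/CLUSTER_SIZE=64 $1$DGA500: DATA1
    $ ! Default extension quantity can be raised on a mounted volume
    $ SET VOLUME/EXTENSION=512 DISK$DATA1:
    $ SHOW DEVICE/FULL $1$DGA500:          ! confirm cluster size and extend quantity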

103
Common Problem
  • Many-node cluster
  • Symptoms
  • Slow response times due to lock-request latencies
  • Cluster consists of many (e.g. 8) small (e.g.
    4-CPU) nodes
  • Often a defensive response to NUMA scaling issues
    on Wildfire (16-CPU GS-160 or 32-CPU GS-320)
    boxes; the technique was to partition them into
    4-CPU (QBB) nodes and re-combine the partitions
    into a cluster to get around NUMA performance
    problems across QBB boundaries

104
Common Problem: Many-node cluster
  • Problem definition
  • Local lock requests are fastest
  • Roughly 10-20 times higher CPU cost for remote
    requests
  • Roughly 50 times higher elapsed time for remote
    requests
  • Ideal case for lock-request latency is a single
    SMP box, but
  • High Availability implies you need at least 2
    nodes
  • The more nodes in a cluster, the higher the
    probability of remote lock requests

105
Common Problem: Many-node cluster
106
Unusual Problems
107
Unusual Problem 1
  • Symptoms
  • Intermittent pauses of 10 seconds to a minute in
    application activity across the entire cluster
  • MONITOR CLUSTER shows basically all disk I/O,
    locking, and CPU drops to next to nothing during
    these pauses
  • Problem occurs only during heavy workload
    periods. Problem recurs about every 2 to 5
    minutes for an hour or two.
  • Problem started about the time the 2nd Alpha 8400
    was added to a cluster of 3 VAX 7000-700s and 1
    VAX 7000-800. Problem only gets worse and worse
    as more 8400s are added to the cluster.
  • VMS version is 7.1-2

108
Unusual Problem 1
  • Troubleshooting steps
  • Check for traditional bottlenecks in CPU, memory,
    I/O.
  • DECps Analysis Report indicated nothing unusual
  • Memory utilization was reasonable at 70%, and
    stayed the same during the pauses
  • CPU, I/O, and locking rates all went to
    essentially zero during the pauses, so obviously
    CPU usage was not the bottleneck