2
RMS Block 1
  • Keith Parris Systems/Software Engineer
  • HP
  • Tuesday, May 20 and Thursday, May 22; Session 250

3
Detecting and Solving Performance Bottlenecks
Using Locking Data
4
Background
  • OpenVMS system managers have traditionally looked
    at performance in 3 areas
  • CPU
  • Memory
  • I/O
  • But in OpenVMS clusters, the Distributed Lock
    Manager is another important element involved in
    performance, and while it can sometimes
    complicate matters, it can also provide some
    valuable clues in identifying performance
    bottlenecks in various areas

5
Distributed Lock Manager Monitoring
6
Available Tools
  • MONITOR
  • MONITOR LOCK, DLOCK, RLOCK, MODES, CLUSTER, etc.
  • SHOW CLUSTER/CONTINUOUS
  • Availability Manager (or DECamds), particularly
    Lock Contention
  • Performance data collectors/analyzers
  • T4 / Tlviz
  • HP Perfdat
  • DECps (Unicenter Performance Management for
    OpenVMS)
  • Perfcap (PAWZ)
  • ECP / TDC
  • SDA and SDA Extensions
  • SHOW LOCK, SHOW RESOURCE, etc.
  • Extensions LCK, CNX, SPL, etc.
  • V6 Freeware CD directory KP_LOCKTOOLS
  • LOCKTIME.COM, LOCK_ACTV.COM, LCKQUE.COM
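
For orientation, a quick DCL session touching several of the tools listed
above might look like the following sketch (the directory where the Freeware
KP_LOCKTOOLS procedures were copied is an assumption):

    $ MONITOR LOCK,DLOCK/INTERVAL=5        ! live lock-manager rates
    $ MONITOR MODES/ALL/INTERVAL=5         ! watch interrupt-state time
    $ SHOW CLUSTER/CONTINUOUS              ! then ADD CONNECTIONS, ADD CR_WAITS
    $ ANALYZE/SYSTEM                       ! SDA on the running system
    SDA> SHOW LOCK/SUMMARY                 ! lock manager performance counters
    SDA> EXIT
    $ ! Freeware V6 KP_LOCKTOOLS procedures (hypothetical copy location):
    $ @DISK$TOOLS:[KP_LOCKTOOLS]LOCK_ACTV  ! cluster-wide lock activity by tree
    $ @DISK$TOOLS:[KP_LOCKTOOLS]LCKQUE     ! look for lock queues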

7
MONITOR LOCK
                             OpenVMS Monitor Utility
                           LOCK MANAGEMENT STATISTICS
                                 on node XYZB11
                             1-JUN-2000 09:16:38.82

                                        CUR        AVE        MIN        MAX
    New ENQ Rate                    4362.00    4010.50    3659.00    4362.00
    Converted ENQ Rate               690.00     636.66     583.33     690.00
    DEQ Rate                        4363.33    4011.83    3660.33    4363.33
    Blocking AST Rate                  6.00       4.83       3.66       6.00
    ENQs Forced To Wait Rate          45.00      39.83      34.66      45.00
    ENQs Not Queued Rate               0.66       1.16       0.66       1.66
    Deadlock Search Rate               0.00       0.00       0.00       0.00
    Deadlock Find Rate                 0.00       0.00       0.00       0.00
    Total Locks                   419770.00  419767.50  419765.00  419770.00
    Total Resources               180201.00  180194.00  180187.00  180201.00
8
MONITOR DLOCK
                             OpenVMS Monitor Utility
                     DISTRIBUTED LOCK MANAGEMENT STATISTICS
                                 on node XYZB11
                             1-JUN-2000 09:17:27.77

                                        CUR        AVE        MIN        MAX
    New ENQ Rate        (Local)      902.00     808.00     528.66     993.33
                        (Incoming)  2302.00    2149.00    2017.66    2302.00
                        (Outgoing)   962.66     950.11     824.33    1063.33
    Converted ENQ Rate  (Local)      198.33     199.66     149.33     251.33
                        (Incoming)   168.66     222.44     168.66     284.33
                        (Outgoing)   181.00     113.22      75.66     181.00
    DEQ Rate            (Local)      899.00     807.11     528.00     994.33
                        (Incoming)  2300.66    2148.88    2018.66    2300.66
                        (Outgoing)   971.00     952.77     824.33    1063.00
    Blocking AST Rate   (Local)        0.00       0.33       0.00       0.66
                        (Incoming)     1.33       0.88       0.00       1.33
                        (Outgoing)     5.66       6.11       3.33       9.33
    Dir Functn Rate     (Incoming)    16.33      16.66      14.00      19.66
                        (Outgoing)    49.00      24.77       9.66      49.00
    Deadlock Message Rate              0.00       0.00       0.00       0.00
9
MONITOR RLOCK
                             OpenVMS Monitor Utility
                      DYNAMIC LOCK REMASTERING STATISTICS
                                 on node KEITH
                             6-MAY-2002 18:42:35.07

                                        CUR        AVE        MIN        MAX
    Lock Tree Outbound Rate            0.00       0.00       0.00       0.00
       (Higher Activity)               0.00       0.00       0.00       0.00
       (Higher LCKDIRWT)               0.00       0.00       0.00       0.00
       (Sole Interest)                 0.00       0.00       0.00       0.00
    Remaster Msg Send Rate             0.00       0.00       0.00       0.00
    Lock Tree Inbound Rate             0.00       0.00       0.00       0.00
    Remaster Msg Receive Rate          0.00       0.00       0.00       0.00
10
MONITOR CLUSTER
                             OpenVMS Monitor Utility
                               CLUSTER STATISTICS
                              Statistic: CURRENT
                              4-APR-2001 17:57:18

    CPU: CPU Busy (scale 0-100)         MEMORY: Memory In Use (scale 0-100)
    ----------------------------        ----------------------------
    XYZB14    57                        XYZB15    48
    XYZB13    57                        XYZB16    48
    XYZB18    48                        XYZB22    48
    XYZB22    42                        XYZB21    47
    XYZV      40                        XYZB18    46
    XYZB11    40                        XYZV      42

    DISK: I/O Operation Rate (0-100)    LOCK: Tot ENQ/DEQ Rate (0-500)
    ----------------------------        ----------------------------
    $256$DPA23    1312                  XYZB14    22297
    $256$DPA14     161                  XYZB13     5814
    $256$DPA26     125                  XYZB11     1844
    $256$DPA122     78                  XYZB20     1452
    $256$DPA3500    77                  XYZV        888
    DSA6002         66                  XYZB21      857


11
MONITOR SCS (example: /ITEM=MESSAGE_RECEIVED)
                             OpenVMS Monitor Utility
                                SCS STATISTICS
                                 on node XYZB11
                             4-APR-2001 18:04:57.44

    Message Receive Rate                CUR        AVE        MIN        MAX
    XYZB11                             0.00       0.00       0.00       0.00
    HS100B                             1.33       3.90       1.33       7.33
    XYZB21                            85.00     120.84      38.66     189.33
    HS101B                             2.33       2.37       0.66       3.33
    HS103B                             7.66       8.61       5.33      15.00
    HS102B                             0.33       0.33       0.00       0.66
    XYZB14                           169.33     237.17      97.00     400.33
    XYZB22                            11.00      14.51       2.00      33.00
    XYZB16                             4.00       5.09       0.00      14.00
    XYZB18                            46.00      18.46       1.00      46.00
    XYZB15                            65.00      77.53      30.66     135.33
    SYZB13                            59.00      93.90      47.66     159.00
    XYZV                              31.66      56.87      29.00     114.00
    XYZB19                             5.00      22.70       0.00      49.66
    XYZB12                           128.66      61.78      33.66     128.66
12
Monitoring tools
  • SHOW CLUSTER/CONTINUOUS
  • ADD CONNECTIONS
  • ADD CR_WAITS

13
SHOW CLUSTER/CONTINUOUS
View of Cluster from system ID 12345  node: NODE01          6-MAY-2002 14:45:16
-------------------------------------------------------------------------------
      SYSTEMS           MEMBERS          CONNECTIONS               COUNTERS
  NODE      SOFTWARE    STATUS     LOC_PROC_NAME       CON_STA     CR_WAITS
-------------------------------------------------------------------------------
  NODE01    VMS E7.3    MEMBER     SCS$DIRECTORY       LISTEN
                                   MSCP$DISK           LISTEN
                                   MSCP$TAPE           LISTEN
                                   VMS$SDA_AXP         LISTEN
                                   VMS$VAXcluster      LISTEN
                                   SCA$TRANSPORT       LISTEN
  NODE02    VMS V7.2    MEMBER     SCA$TRANSPORT       OPEN            0
                                   VMS$DISK_CL_DRVR    OPEN            0
                                   MSCP$DISK           OPEN            0
                                   MSCP$TAPE           OPEN            0
                                   VMS$VAXcluster      OPEN         1024
  NODE03    VMS V7.3    MEMBER     VMS$DISK_CL_DRVR    OPEN            0
                                   SCA$TRANSPORT       OPEN            0
                                   VMS$TAPE_CL_DRVR    OPEN            0
                                   MSCP$DISK           OPEN            0
                                   MSCP$TAPE           OPEN            0
                                   VMS$VAXcluster      OPEN            3
  NODE04    VMS V7.2    MEMBER     SCA$TRANSPORT       OPEN            0
14
Monitoring tools
  • ANALYZE/SYSTEM
  • SDA> SHOW RESOURCE
  • SDA> SHOW LOCK
  • SDA> LCK !New SDA extension, 7.2-2 and
    above

15
Monitoring tools
  • ANALYZE/SYSTEM
  • SHOW LOCK qualifiers (OpenVMS 7.2 and above)
  • /WAITING
  • Displays only the waiting lock requests (those
    blocked by other locks)
  • /SUMMARY
  • Displays summary data and performance counters
  • SHOW RESOURCE qualifier (OpenVMS 7.2 and above)
  • /CONTENTION
  • Displays resources which are under contention

16
Monitoring tools
  • ANALYZE/SYSTEM
  • New SDA extension LCK (OpenVMS 7.2-2 and above)
  • SDA> LCK !Shows help text with command summary
  • Can display various additional lock manager
    statistics
  • SDA> LCK STATISTIC !Shows lock manager statistics
  • Can show busiest resource trees by lock activity
    rate
  • SDA> LCK SHOW ACTIVE !Shows lock activity
  • Can trace lock requests
  • SDA> LCK LOAD !Load the debug execlet
  • SDA> LCK START TRACE !Start tracing lock requests
  • SDA> LCK STOP TRACE !Stop tracing
  • SDA> LCK SHOW TRACE !Display contents of trace
    buffer
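
Putting the SHOW RESOURCE/SHOW LOCK qualifiers and the LCK extension together,
a lock-contention look-around in SDA might run roughly as follows (output
omitted; the trace commands assume the debug execlet loads cleanly):

    $ ANALYZE/SYSTEM
    SDA> SHOW RESOURCE/CONTENTION     ! resources which currently have waiters
    SDA> SHOW LOCK/WAITING            ! the lock requests blocked behind them
    SDA> SHOW LOCK/SUMMARY            ! lock manager performance counters
    SDA> LCK                          ! help text for the LCK extension (7.2-2+)
    SDA> LCK STATISTIC                ! lock manager statistics
    SDA> LCK SHOW ACTIVE              ! busiest resource trees on this node
    SDA> LCK LOAD                     ! load the debug execlet for tracing
    SDA> LCK START TRACE              ! begin tracing lock requests
    SDA> LCK STOP TRACE
    SDA> LCK SHOW TRACE               ! display the trace buffer
    SDA> EXIT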

17
Availability Manager
  • Lock Contention
  • Data collection for Lock Contention must be
    enabled
  • Detects lock queues (lock requests waiting on
    other locks). Identifies
  • Lock holder
  • Lock waiters
  • Can take action to free the blocking lock: either
    force image exit or delete the offending process
  • Lock contention events are logged in the file
    AMDS$SYSTEM:AMDS$LOCK_LOG.LOG
  • Also available in the older DECamds product

18
DECps Example
Full Analysis  XYZB11 (Compaq AlphaServer GS140)                       Page 13
PSPA V2.1.5                              Wednesday 17-NOV-1999 06:30 to 07:30

CONCLUSION 5.   R0300

There are many lock requests per second that are put into the lock wait
queue; applications may be experiencing delays. This situation usually
indicates that users are contending for shared resources. Two common reasons
for this symptom:

  Applications may inherently cause this behavior and not affect the general
  workload, so be sensitive to response time degradation to rule this out.
  These applications might be redesigned to lock resources at a lower level,
  to lower the contention.

  Disk volumes (including solid-state devices) might be under too much
  contention by too many users across the cluster. Response time problems
  would affect the users of these disks. Try to redistribute the activity to
  alleviate the contention.
19
DECps Example
Lastly you may look at the lock wait queue with SDA to isolate the users who
are waiting, and resources they are waiting for.

Total number of samples supporting this conclusion: 20

CONDITIONS  1. ENQUE_LOCKS_FORCED_TO_WAIT_RATE .GT. 1.00 * CPU_VUP_RATING
            2. OCCURRENCES .GE. 1

EVIDENCE
  Enque lck     Process w/ highest disk I/O           Volume w/
  wait          -------------------------------       highest       Time of
  rate          Username   Imagename   Vol w/hgstIO   I/O rate      Occurrence
  --------      ---------  ----------  ------------   ------------  ---------------
   557.25       WXYZ       ZZSERVER    SYSFILES       SS1           17-NOV 06:32:00
   671.42       WXYZ       ZZSERVER    SYSFILES       SS1           17-NOV 06:34:00
   599.68       WXYZ       ZZSERVER    SS100000000D   SS1           17-NOV 06:36:00
   689.27       WXYZ       ZZSERVER    SS1000000000   SS1           17-NOV 06:38:00
   920.70       WXYZ       ZZSERVER    SS1000000007   SS1           17-NOV 06:40:00
20
DECps Example
------------------------------- CLUSTER Lock ---------------------------------
!                          H-Orig     Out     Enq     Dir Op     R-Orig      !
!                          Lck Act    Bound   Wait    Incomg     Lck Act     !
!                          (/sec)     (%)     (%)     (/sec)     (/sec)      !
!                          -------    -----   -----   ------     -------     !
!   Node Average            6991.0    26.8    2.6       24.6      1873.3     !
!   Node Minimum              42.1    13.9    0.9        0.0         0.1     !
!   Node Maximum           14123.3    72.8    9.7       84.7     14658.5     !
!   Cluster Total         118844.3    26.8    2.6      417.8     31845.1     !
-------------------------------------------------------------------------------
21
DECps Example
------------------------------- CLUSTER Lock ---------------------------------
!                          H-Orig     Out     Enq     Dir Op     R-Orig      !
!                          Lck Act    Bound   Wait    Incomg     Lck Act     !
!                          (/sec)     (%)     (%)     (/sec)     (/sec)      !
!                          -------    -----   -----   ------     -------     !
!   XYZA03                    42.1    37.3    2.2        0.0         0.3     !
!   XYZA07                    97.4    61.1    3.9        0.0         0.1     !
!   XYZA08                   102.2    47.6    2.1        0.0         0.3     !
!   XYZA09                   105.5    51.3    4.2        0.0         0.4     !
!   XYZB11                  9433.7    24.9    9.7       56.8     14658.5     !
!   XYZB12                 11385.7    24.0    1.2       16.0       818.8     !
!   XYZB13                  4738.7    62.0    3.6       32.9      1051.8     !
!   XYZB14                 14123.3    13.9    4.6       16.3     10279.1     !
!   XYZB15                 10011.6    26.0    2.1       33.8      2330.8     !
!   XYZB16                  7148.1    35.7    1.7       24.8      1428.8     !
!   XYZB18                 13447.1    21.5    1.0       27.2        88.6     !
!   XYZB19                 13569.2    19.5    0.9       52.5        18.3     !
!   XYZB20                  6252.1    42.1    2.7       36.5       906.6     !
!   XYZB21                 12083.0    22.3    1.1       84.7       126.8     !
!   XYZB22                 12061.4    23.5    1.2       15.6       131.7     !
!   XYZA05                   608.7    29.3    1.0        0.0         0.2     !
!   XYZB23                  3637.9    72.8    3.7       20.8         4.3     !
-------------------------------------------------------------------------------
22
OpenVMS Clusters and the Distributed Lock Manager
  • An OpenVMS Cluster is a set of distributed
    systems which cooperate
  • Cooperation requires coordination
  • The Distributed Lock Manager is critical to
    making that coordination possible

23
Foundation for Shared Access
(Diagram)  Users
           Applications (one per node)
           Nodes (multiple cluster members)
           Distributed Lock Manager
           Connection Manager (Rule of Total Connectivity and Quorum Scheme)
           Shared resources (files, disks, tapes)
24
Distributed Lock Manager
  • The Lock Manager provides mechanisms for
    coordinating access to physical devices, both for
    exclusive access and for various degrees of
    sharing

25
Distributed Lock Manager
  • Physical resources that the Lock Manager is used
    to coordinate access to include
  • Tape drives
  • Disks
  • Files
  • Records within a file
  • as well as internal operating system cache
    buffers and so forth

26
Distributed Lock Manager
  • Physical resources are mapped to symbolic
    resource names, and locks are taken out and
    released on these symbolic resources to control
    access to the real resources

27
Distributed Lock Manager
  • System services ENQ and DEQ allow new lock
    requests, conversion of existing locks to
    different modes (or degrees of sharing), and
    release of locks, while GETLKI allows the lookup
    of lock information

28
OpenVMS Cluster Distributed Lock Manager
  • Physical resources are protected by locks on
    symbolic resource names
  • Resources are arranged in trees
  • e.g. File → Data bucket → Record
  • Different resources (disk, file, etc.) are
    coordinated with separate resource trees, to
    minimize contention

29
Symbolic lock resource names
  • Symbolic resource names
  • Common prefixes
  • SYS$ for OpenVMS executive
  • F11B$ for XQP, file system
  • RMS$ for Record Management Services
  • See the book OpenVMS Internals and Data
    Structures by Ruth Goldenberg, et al
  • Appendix H in Alpha V1.5 version
  • Appendix A in Alpha V7.0 version

30
Resource names
  • Example: Device Lock
  • Resource name format is
  • SYS$ + Device Name in ASCII text

3A3333324147442431245F24535953   "SYS$_$1$DGA233:"
SYS$         → (SYS$ facility)
_$1$DGA233:  → (Device name)
31
Resource names
  • Example: RMS lock tree for an RMS indexed file
  • Resource name format is
  • RMS$ + File ID + Flags byte + Lock Volume Name
  • Identify the filespec using the File ID
  • Flags byte indicates shared or private disk mount
  • Pick up the disk volume name
  • This is the label as of the time the disk was mounted
  • Sub-locks are used for buckets and records within
    the file

32
Decoding an RMS File Root Resource Name
RMS$t......FDDI_COMMON ...
000000   204E4F4D4D4F435F49444446 02 000000011C74 24534D52

24534D52                   RMS$ → (RMS Facility, RMS File Root Resource)
00 00 0001 1C74            RVN, FilX, Sequence_Number, File_Number → File ID (7284,1,0)
02                         Flags byte (Disk is mounted /SYSTEM)
204E4F4D4D4F435F49444446   "FDDI_COMMON " → (Disk label)

$ dump/header/id=7284/block=count=0 disk$FDDI_COMMON:[000000]indexf.sys
Dump of file _DSA100:[SYSEXE]SYSUAF.DAT;1 ...  → (File)
33
Internal Structure of an RMS Indexed File
34
RMS Data Bucket Contents
(Diagram)  A Data Bucket containing multiple Data Records
35
RMS Indexed File: Bucket and Record Locks
  • Sub-locks of the RMS File Lock
  • Have to look at the Parent lock to identify the file
  • Bucket lock
  • 4 bytes: VBN of the first block of the bucket
  • Record lock
  • 8 bytes: Record File Address (RFA) of the record

36
Distributed Lock Manager: Lock Master nodes
  • OpenVMS assigns a single node at a time to keep
    track of all the resources in a given resource
    tree, and any locks taken out on those resources
  • This node is called the Lock Master node for that
    tree
  • Different trees often have different Lock Master
    nodes
  • OpenVMS tends to dynamically move Lock Mastership
    duties to the node with the most locking activity
    on that tree

37
Distributed Locks
Node A (Lock Master for resource X):
    Lock on resource X
    Copy of Node B's lock on resource X
    Copy of Node C's lock on resource X
Node B:    Lock on resource X
Node C:    Lock on resource X
38
Directory Lookups
  • This is how OpenVMS finds out which node is the
    lock master
  • Only needed for 1st lock request on a particular
    resource tree on a given node
  • Resource Block (RSB) remembers master node CSID
  • Basic conceptual algorithm: Hash the resource name
    and index into the lock directory weight vector,
    which has been created based on LOCKDIRWT values
    for each node

39
Lock Directory Weight Vector
Big node PAPABR:           Lock Directory Weight = 2
Middle-sized node MAMABR:  Lock Directory Weight = 1
Satellite node BABYBR:     Lock Directory Weight = 0

Resulting lock directory weight vector:  PAPABR, PAPABR, MAMABR
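
LOCKDIRWT is a SYSGEN parameter, so the per-node weights behind this vector
can be inspected and, with care, changed from DCL. A minimal sketch, assuming
the change goes through MODPARAMS.DAT and AUTOGEN (LOCKDIRWT is not dynamic,
so a reboot is needed for a new value to take effect):

    $ RUN SYS$SYSTEM:SYSGEN
    SYSGEN> SHOW LOCKDIRWT                 ! current directory weight on this node
    SYSGEN> EXIT
    $ ! Record the desired value in MODPARAMS.DAT and let AUTOGEN apply it
    $ OPEN/APPEND PARAMS SYS$SYSTEM:MODPARAMS.DAT
    $ WRITE PARAMS "LOCKDIRWT = 1    ! equal weight on all large nodes"
    $ CLOSE PARAMS
    $ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS NOFEEDBACK
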
40
Lock Directory Lookup
(Diagram)  User lock request on the Local node
           → Directory node (lookup of the lock master)
           → Lock Master Node
           → User request satisfied
41
Performance Hints
  • Avoid DEQing a lock which will be re-acquired
    later
  • Instead, convert to Null lock, and convert back
    up later
  • Or, create locks as sub-locks of a parent lock
    that remains held
  • This avoids directory lookups
  • and also avoids losing the activity counts in the
    root RSB used for lock-remastering decisions

42
Performance Hints
  • For lock trees which come and go often
  • A separate program could take out a Null lock on
    the root resource on each node at boot time, and
    just hold it forever
  • This avoids lock directory lookup operations for
    that tree

43
Lock Request Latencies
  • Latency depends on several things
  • Directory lookup needed or not
  • Local or remote directory node
  • ENQ or DEQ operation (acquiring or releasing a
    lock)
  • Local (same node) or remote lock master node
  • And if remote, the speed of interconnect used

44
ENQueue
  • New lock request (0-2 round trips)
  • No off-node traffic if this node is lock master
  • 1 round trip if
  • no other node has interest, or
  • directory node is also lock master, or
  • local node is directory node
  • 2 round trips if
  • directory node is not also the lock master node

45
ENQueue
  • Conversion or sub-lock (0 or 1 round-trip)
  • No off-node traffic if this node is lock master
  • 1 round trip to lock master (except 2PC)
  • RSB already contains CSID of lock master node, so
    we never need to do a directory lookup

46
DEQueue
  • 1-way message
  • No response expected
  • Client doesn't wait
  • SCS message guarantee ensures eventual arrival
  • SCS credits may limit number of these in-flight
    at once

47
Lock Request Latency: Local vs. Remote
  • Local requests are fastest
  • Remote requests are significantly slower
  • Code path 20 times longer
  • Interconnect also contributes latency
  • Total latency up to 2 orders of magnitude (100x)
    higher than local requests

48
Lock Request Latencies
  • I used the LOCKTIME program pair written by Roy
    G. Davis, author of the book VAXcluster
    Principles, to measure latency of lock requests
    locally and across different interconnects
  • LOCKTIME algorithm
  • Take out 5000 locks on remote node, making it
    lock master for each of 5000 lock trees (requires
    ENQLM > 5000)
  • Do 5000 ENQs, lock converts, and DEQs, and
    calculate average latencies for each
  • Lock conversion request latency is roughly
    equivalent to round-trip time between nodes over
    a given interconnect
  • See LOCKTIME.COM from OpenVMS Freeware V6,
    KP_LOCKTOOLS directory
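
As a rough sketch only (the Freeware copy location and the procedure's exact
parameters are assumptions; check the comments in LOCKTIME.COM itself), a
latency measurement from one node against a chosen remote node could be
driven like this:

    $ ! Verify the process ENQ quota first; LOCKTIME needs ENQLM > 5000
    $ WRITE SYS$OUTPUT "ENQLM = ", F$GETJPI("","ENQLM")
    $ SET DEFAULT DISK$TOOLS:[KP_LOCKTOOLS]   ! hypothetical Freeware location
    $ @LOCKTIME                               ! supply the remote node when prompted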

49
Lock Request Latency: Local
Client process on the same node as the Lock Master: 2-4 microseconds
50
Lock Request Latency: Remote
Client across Gigabit Ethernet: 200 microseconds
(Client node → GbE Switch → Lock Master node)
51
Lock Request Latencies
52
Lock Mastership
  • Lock mastership node may change for various
    reasons
  • Lock master node goes down -- new master must be
    elected
  • OpenVMS may move lock mastership to a better
    node for performance reasons
  • LOCKDIRWT imbalance found (pre-8.3), or
  • Activity-based Dynamic Lock Remastering
  • Lock Master node no longer has interest

53
Lock Mastership
  • Lock master selection criteria
  • Interest
  • Only move resource tree to a node which is
    holding at least some locks on that resource tree
  • Lock Directory Weight (LOCKDIRWT) (pre-8.3)
  • Move lock tree to a node with interest and a
    higher LOCKDIRWT
  • Lock Remaster Weight (LOCKRMWT) (8.3 and above)
  • Bias lock tree movement toward nodes with a
    higher Remaster Weight
  • Activity Level
  • Move lock tree to a node with interest and a
    higher average activity level (biased by Remaster
    Weight for 8.3 and above)

54
How to measure locking activity
  • OpenVMS keeps counters of lock activity for each
    resource tree
  • but not for each of the sub-resources
  • So you can see the lock rate for an RMS indexed
    file, for example
  • but not for individual buckets or records within
    that file
  • SDA extension LCK in OpenVMS V7.2-2 and above can
    show lock rates, and even trace all lock requests
    if needed. This displays data on a per-node
    basis.
  • Cluster-wide summary is available using
    LOCK_ACTV.COM from OpenVMS Freeware V6,
    KP_LOCKTOOLS directory

55
Lock Remastering
  • Circumstances under which remastering occurs, and
    does not
  • LOCKDIRWT values
  • Pre-8.3, OpenVMS tends to remaster to node with
    higher LOCKDIRWT values, never to node with lower
    LOCKDIRWT
  • Shifting initiated based on activity counters in
    root RSB (biased by Remaster Weight for 8.3 and
    above)
  • PE1 parameter being non-zero can prevent movement
    or place threshold on lock tree size
  • Shift if existing lock master loses interest

56
Lock Remastering
  • OpenVMS rules for dynamic remastering decision
    based on activity levels (prior to version 8.3)
  • assuming equal LOCKDIRWT values
  • 1) Must meet general threshold of at least 80
    lock requests so far (LCK$GL_SYS_THRSH)
  • 2) New potential master node must have at least
    10 more requests per second than current master
    (LCK$GL_ACT_THRSH)

57
Lock Remastering
  • OpenVMS rules for dynamic remastering (pre-8.3)
  • 3) Estimated cost to move (based on size of lock
    tree) must be less than estimated savings (based
    on lock rate)
  • except if new master meets criteria (2) for 3
    consecutive 8-second intervals, cost is ignored
  • 4) No more than 5 remastering operations can be
    going on at once on a node (LCK$GL_RM_QUOTA)

58
Lock Remastering
  • OpenVMS rules for dynamic remastering (pre-8.3)
  • 5) If PE1 on the current master has a negative
    value, remastering trees off the node is disabled
  • 6) If PE1 has a positive, non-zero value on the
    current master, the tree must be smaller than PE1
    in size or it will not be remastered

59
Lock Remastering
  • Implications of dynamic remastering rules
    (pre-8.3)
  • LOCKDIRWT must be equal for lock activity levels
    to control choice of lock master node
  • PE1 can be used to control movement of lock trees
    OFF of a node, but not ONTO a node
  • RSB stores lock activity counts, so even high
    activity counts can be lost if the last lock is
    DEQueued on a given node and thus the RSB gets
    deallocated

60
Lock Remastering
  • Implications of dynamic remastering rules
    (pre-8.3)
  • With two or more large CPUs of equal size running
    the same application, lock mastership thrashing
    is not uncommon
  • 10 more lock requests per second is not much of a
    difference when you may be doing 100s or 1,000s
    of lock requests per second
  • Whichever new node becomes lock master may then
    see its own lock rate slow somewhat due to the
    remote lock request workload

61
How to Detect Lock Mastership Thrashing
  • Detection of remastering activity
  • MONITOR RLOCK in 7.3 and above (not 7.2-2)
  • SDA> SHOW LOCK/SUMMARY in 7.2 and above
  • Change of mastership node for a given resource
  • Check message counters under SDA
  • SDA> EXAMINE PMS$GL_RM_RBLD_SENT
  • SDA> EXAMINE PMS$GL_RM_RBLD_RCVD
  • Counts which increase suddenly by a large amount
    indicate remastering of large tree(s)
  • SENT: off of this node
  • RCVD: onto this node
  • See example procedures WATCH_RBLD.COM and RBLD.COM
    (a rough sketch of the same idea follows below)
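
A minimal watcher in the spirit of WATCH_RBLD.COM (this is a sketch, not the
actual Freeware procedure) can simply sample the two PMS cells at intervals
so that sudden jumps stand out:

    $ ! Sample the remaster rebuild counters once a minute
    $ LOOP:
    $   ANALYZE/SYSTEM
    EXAMINE PMS$GL_RM_RBLD_SENT      ! trees remastered OFF of this node
    EXAMINE PMS$GL_RM_RBLD_RCVD      ! trees remastered ONTO this node
    EXIT
    $   WAIT 00:01:00
    $   GOTO LOOP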

62
SDA> SHOW LOCK/SUMMARY

$ analyze/system
OpenVMS (TM) Alpha system analyzer

SDA> show lock/summary
...
Lock Manager Performance Counters
----------------------------------
...
Lock Remaster Counters
    Tree moved to this node                   0
    Tree moved to another node                0
    Tree moved due to higher Activity         0
    Tree moved due to higher LOCKDIRWT        0
    Tree moved due to Single Node Locks       0
    No Quota for Operation                    0
    Proposed New Manager Declined             0
    Operations completed                      0
    Remaster Messages Sent                    0
63
MONITOR RLOCK
                             OpenVMS Monitor Utility
                      DYNAMIC LOCK REMASTERING STATISTICS
                                 on node KEITH
                             6-MAY-2002 18:42:35.07

                                        CUR        AVE        MIN        MAX
    Lock Tree Outbound Rate            0.00       0.00       0.00       0.00
       (Higher Activity)               0.00       0.00       0.00       0.00
       (Higher LCKDIRWT)               0.00       0.00       0.00       0.00
       (Sole Interest)                 0.00       0.00       0.00       0.00
    Remaster Msg Send Rate             0.00       0.00       0.00       0.00
    Lock Tree Inbound Rate             0.00       0.00       0.00       0.00
    Remaster Msg Receive Rate          0.00       0.00       0.00       0.00
64
How to Prevent Lock Mastership Thrashing
  • Upgrade to 8.3 or above
  • If that's not an option, consider
  • Unbalanced node power
  • Unequal workloads
  • Unequal values of LOCKDIRWT
  • Non-zero values of PE1

65
Impact of Non-zero PE1 Values
  • Concern: Locking down remastering with PE1 (to
    avoid lock mastership thrashing) can result in
    sub-optimal lock master node selections over time

66
Mitigating Impact of Non-zero PE1 Values
  • Possible ways of mitigating side-effects of
    preventing remastering using PE1
  • Adjust PE1 value as high as you can without
    producing noticeable delays
  • Set PE1 to 0 for short periods, periodically
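
One way to implement the second idea is a small periodic procedure that opens
a brief remastering window by setting PE1 to 0 and then restores the
protective value. A sketch; the 10000-block threshold is a hypothetical
site-specific choice, and PE1 is dynamic, so SYSGEN WRITE ACTIVE takes effect
immediately:

    $ ! Allow dynamic lock remastering for 5 minutes, then lock it down again
    $ RUN SYS$SYSTEM:SYSGEN
    USE ACTIVE
    SET PE1 0          ! open the remastering window
    WRITE ACTIVE
    EXIT
    $ WAIT 00:05:00
    $ RUN SYS$SYSTEM:SYSGEN
    USE ACTIVE
    SET PE1 10000      ! hypothetical lock-tree size threshold for this site
    WRITE ACTIVE
    EXIT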

67
Deadlock searches
  • The OpenVMS Distributed Lock Manager
    automatically detects lock deadlock conditions,
    and generates an error to one of the programs
    causing the deadlock

68
Deadlock searches
  • Deadlock searches can take lots of time and
    interrupt-state CPU time
  • DECps Performance Analysis report can identify
    when these are occurring
  • DEADLOCK_WAIT parameter controls how long we wait
    before starting a deadlock search

69
Interrupt-state/stack saturation
  • Too much lock mastership workload, MSCP-serving,
    etc. can saturate a CPU in interrupt state
  • See utilization (and detect saturation, > 90%)
    with
  • MONITOR MODES/CPU=n/ALL
  • where n is the number of the CPU of interest
    (the Primary CPU number can be determined using
    SHOW CPU); see the example below
  • T4 MON.MODE Interrupt State per-CPU data,
    displayed using TLviz
  • Adding RMS Global Buffers may help reduce lock
    rates (post 7.2-1H1)
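
For example, interrupt-state time on the Primary CPU specifically can be
watched like this (the commands are standard; CPU 0 as the Primary is only an
assumption, so confirm it with SHOW CPU first):

    $ SHOW CPU                             ! identify the Primary CPU
    $ MONITOR MODES/CPU=0/ALL/INTERVAL=5   ! interrupt state on that one CPU
    $ MONITOR MODES/ALL/INTERVAL=5         ! all CPUs, for comparison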

70
Response time vs. Utilization
71
Interrupt-state/stack saturation
  • FAST_PATH
  • Can shift interrupt-state workload off primary
    CPU in SMP systems
  • IO_PREFER_CPUS value of an even number avoids
    sending interrupts from FAST_PATH devices to the
    Primary CPU
  • Consider limiting interrupts to a subset of
    non-primary CPUs
  • FAST_PATH for CI since 7.1
  • FAST_PATH for SCSI and FC is in 7.3 and above
  • FAST_PATH for LANs and PEDRIVER in 7.3-2
  • Even with FAST_PATH enabled, the Primary CPU
    still received the device interrupt, but handed
    it off immediately via an inter-processor
    interrupt
  • 7.3-1 allowed FAST_PATH interrupts to bypass the
    Primary CPU entirely and go directly to a
    non-primary CPU on hardware platforms which
    support this
  • No FAST_PATH for Memory Channel (and most likely
    never will be)
  • No FAST_PATH for Galaxy Shared Memory Cluster
    Interconnect

72
How should Fast_Path assignments to CPUs be done?
  • Customer question: How should interrupts from LAN
    adapters/PEDRIVER and Fibre Channel HBAs be
    directed to CPUs within each box?
  • Logic might indicate three possible approaches
  • Use the default OpenVMS Fast_Path assignments
    as-is
  • Spread interrupts as broadly as possible across
    CPUs in an attempt to avoid saturation of any
    single CPU, or
  • Put like devices on the same CPU for better
    efficiency (fewer inter-processor interrupts,
    better cache-line sharing for driver code)
  • Conclusion after customer test: Putting like
    devices on the same CPU is more efficient and
    results in better response times under heavy test
    workload (see the sketch below).
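
On versions with CPU-specific Fast_Path assignment, grouping like devices can
be expressed with SET DEVICE/PREFERRED_CPUS. The device names and CPU numbers
below are purely illustrative assumptions:

    $ ! Put both Fibre Channel HBAs on CPU 2, and the LAN adapters used by
    $ ! PEDRIVER on CPU 3, keeping both groups off the Primary CPU
    $ SET DEVICE FGA0: /PREFERRED_CPUS=2
    $ SET DEVICE FGB0: /PREFERRED_CPUS=2
    $ SET DEVICE EWA0: /PREFERRED_CPUS=3
    $ SET DEVICE EWB0: /PREFERRED_CPUS=3
    $ SHOW DEVICE/FULL FGA0:               ! verify the preferred CPU took effect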

73
Effect of Fast_Path Options on Interrupt State
(Chart: interrupt-state time under the three Fast_Path options: Spread,
Together, and Original)
74
Effect of Optimal Fast_Path Settings
75
Dedicated-CPU Lock Manager
  • With 7.2-2 and above, you can choose to dedicate
    a CPU to do lock management work. This may help
    reduce MP_Synch time.
  • Using this can be helpful when
  • You have more than 5 CPUs in the system, and
  • You're already wasting more than a CPU's worth in
    MP_Synch time contending for the LCKMGR spinlock
  • See SDA> SPL extension and SYS$EXAMPLES:SPL.COM
  • LCKMGR_MODE parameter
  • 0: Disabled
  • >1: Enable if at least this many CPUs are
    running
  • LCKMGR_CPUID parameter specifies which CPU to
    dedicate to LCKMGR_SERVER process
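
A sketch of enabling the dedicated lock manager on, say, an 8-CPU system,
dedicating the highest-numbered CPU (the CPU number and threshold here are
illustrative assumptions; the supported route is MODPARAMS.DAT plus AUTOGEN):

    $ ! In SYS$SYSTEM:MODPARAMS.DAT:
    $ !   LCKMGR_MODE = 6      ! enable only if at least 6 CPUs are running
    $ !   LCKMGR_CPUID = 7     ! CPU to dedicate to the LCKMGR_SERVER process
    $ ! then run AUTOGEN to apply the change:
    $ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS NOFEEDBACK
    $
    $ ! Afterwards, confirm the dedicated process exists:
    $ SHOW SYSTEM                          ! look for LCKMGR_SERVER in the list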

76
Troubleshooting Real-World Problems Using Locking
Data
  • Techniques
  • Monitor for
  • High lock rates
  • High lock queues
  • Primary CPU Interrupt-State Saturation
  • SCS credit waits
  • Deadlock Searches and Finds

77
LOCK_ACTV.COM Example
0000002020202020202020203153530200004C71004624534D52   RMS$F.qL...SS1 ...
RMS lock tree for file (70,19569,0) on volume SS1
File specification: DISKSS1DATA8PDATA.IDX1

    Total    11523
    XYZB12    6455   <-- Lock Master Node for the resource
    XYZB11     746
    XYZB14     611
    XYZB15     602
    XYZB23     564
    XYZB13     540
    XYZB19     532
    XYZB16     523
    XYZB20     415
    XYZB22     284
    XYZB18     127
    XYZB21     125

This is a fairly hot file. Here the lock master node is optimal.
78
LOCK_ACTV.COM Example
0000002020202032454C494653595302000000D3000C24534D52   RMS$......SYSFILE2 ...
RMS lock tree for file (12,211,0) on volume SYSFILE2
File specification: DISKSYSFILE2SYSFILE2SYSUAF.DAT5

    Total     184
    XYZB16     75
    XYZB20     48
    XYZB23     41
    XYZB21     16
    XYZB19      2
    XYZB15      1
    XYZB13      1
    XYZB14      0
    XYZB12      0

This reflects user logins, process creations, password changes, and such.
Note the poor lock master node selection here (XYZB16 would be optimal).
79
Example: Application (re)opens file frequently
  • Symptom: High lock rate on the File Access
    Arbitration Lock for an application data file
  • Cause: BASIC program re-executing the OPEN
    statement for a file; BASIC dutifully closes and
    then re-opens the file
  • Fix: Modify the BASIC program to execute the OPEN
    statement only once, at image startup time

80
LOCK_ACTV.COM Example
00000016202020202020202031505041612442313146   F11B$aAPP1 ....
Files-11 File Access Arbitration lock for file (22,,0) on volume APP1
File specification: DISKAPP1DATAXDATA.IDX1

    Total      50
    XYZB15      8
    XYZB21      7
    XYZB16      7
    XYZB19      6
    XYZB20      6
    XYZB23      6
    XYZB18      5
    XYZB13      3
    XYZB12      1
    XYZB22      1
    XYZB14      1

This shows where the application is apparently opening (or re-opening) this
particular file 50 times per second.
81
Example: Directory File Grows Large
  • Symptom: High queue length on the file serialization
    lock for a .DIR file
  • Cause: Directory file has grown to over 127
    blocks
  • (VMS version 7.1-2 or earlier; 7.2 and later are
    much less sensitive to this problem, so queuing
    occurs at directory file sizes more like
    1,000-3,000 blocks)
  • Fix: Delete or rename files out of the directory
    (see the sketch below)
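
A quick way to find oversized directory files and thin them out; the disk
name, size threshold, and archive destination below are assumptions for
illustration:

    $ ! Find .DIR files of 127 blocks or more anywhere on the volume
    $ DIRECTORY/SIZE=ALL/SELECT=SIZE=MINIMUM=127 DISK$APP1:[000000...]*.DIR
    $ ! Then rename (or delete) files out of the offending directory, e.g.:
    $ RENAME DISK$APP1:[DATA]*.TMP;* DISK$APP1:[DATA.ARCHIVE]*.*;*
    $ PURGE/KEEP=2 DISK$APP1:[DATA]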

82
LCKQUE.COM Example
Here are examples where a directory file got very large under 7.1-2:

'F11B$vAPP2    '    202020202020202032505041762442313146
    Files-11 Volume Allocation lock for volume APP2

'F11B$sH...'        00000148732442313146
    Files-11 File Serialization lock for file (328,,0) on volume APP2
    File specification: DISKAPP2DATA.DIR1
    Convert queue: 0, Wait queue: 95

'F11B$vLOGFILE '    2020202020454C4946474F4C762442313146
    Files-11 Volume Allocation lock for volume LOGFILE

'F11B$s....'        00000A2E732442313146
    Files-11 File Serialization lock for file (2606,,0) on volume LOGFILE
    File specification: DISK$LOGFILE:[000000]LOGS.DIR;1
    Convert queue: 0, Wait queue: 3891
83
Example: Fragmented File Header
  • Symptom: High queue length on the File Serialization
    Lock for an application data file
  • Cause: CONVERTs onto a disk without sufficient
    contiguous space resulted in highly-fragmented
    files, increasing the I/O load on the disk array.
    The file was so fragmented it had 3 extension file
    headers
  • Fix: Defragment the disk, or do an /IMAGE
    Backup/Restore (see the sketch below)
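
To confirm the diagnosis and then rebuild the file contiguously, something
along these lines can be used (file, disk, and tape names are illustrative
assumptions; DUMP/HEADER shows the map area and any extension headers):

    $ DUMP/HEADER/BLOCK=COUNT=0 DISK$THDATA:[DATA]THOT.IDX   ! look for extension headers
    $ ! Option 1: rebuild just this file contiguously (needs contiguous free space)
    $ COPY/CONTIGUOUS DISK$THDATA:[DATA]THOT.IDX DISK$THDATA:[DATA]THOT.IDX
    $ ! Option 2: image backup and restore of the whole volume to defragment it
    $ BACKUP/IMAGE DISK$THDATA: MKA600:THDATA.BCK/SAVE_SET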

84
LCKQUE.COM Example
'F11B$s....'    0000000E732442313146
    Files-11 File Serialization lock for file (14,,0) on volume THDATA
    File specification: DISKTHDATATHOT.IDX1
    Convert queue: 0, Wait queue: 28

This is an example of the result of reorganizing RMS indexed files with
CONVERTs over a weekend without enough contiguous free space available,
causing a lot of file fragmentation and dramatically increasing the I/O load
on a RAID array on the next busy day (we had to fix this fragmentation with a
backup/restore cycle soon after). The file shown here had gotten so
fragmented as to have 3 extension file headers. The lock we're queuing on
here is the file serialization lock for this RMS indexed file.
85
OpenVMS Cluster DLM Resources
  • OpenVMS Documentation on the Web
  • http://h71000.www7.hp.com/doc
  • OpenVMS Cluster Systems
  • Guidelines for OpenVMS Cluster Configurations
  • Book: VAXcluster Principles by Roy G. Davis,
    Digital Press, 1993, ISBN 1-55558-112-9

86
Tools & Techniques for Solving OpenVMS
Distributed Lock Manager Performance Issues
87
Common Problems
88
Common Problem
  • Unbalanced LOCKDIRWT settings
  • Cluster typically consists of at least 2
    (commonly more) nodes of same or similar model or
    horsepower
  • Typically all cluster nodes run same or similar
    workload, evenly balanced across the cluster
    nodes
  • History of resource tree mastership thrashing
    between nodes in the past
  • LOCKDIRWT is set higher on just one node in the
    cluster
  • It then becomes resource master for all shared
    resource trees
  • Resource tree mastership thrashing problem is
    cured
  • Solution works very well, until the one node gets
    overwhelmed

89
Common Problem: Unbalanced LOCKDIRWT settings
  • Symptoms
  • Perception of slow response times under heavy
    load; tendency to fall off a performance cliff
  • RWSCS states may show up in SHOW SYSTEM displays
    on nodes with lower LOCKDIRWT
  • MONITOR DLOCK shows mostly outgoing lock requests
    on all but one node; one node shows mostly
    incoming lock requests
  • CPU usage (particularly interrupt state) much
    higher on the node with the higher LOCKDIRWT
    setting

90
Common Problem: Unbalanced LOCKDIRWT settings
  • Troubleshooting steps
  • Check for bottlenecks in CPU
  • Check for interrupt-state saturation using
    MONITOR MODES
  • Use PC Sampling to verify source of
    interrupt-state time (lock traffic)
  • Look for causes of locking imbalance among nodes
  • Double-check incoming workload balancing
    mechanisms
  • Use SDA> LCK SHOW ACTIVE on all nodes or the V6
    Freeware tool LOCK_ACTV.COM to identify resource
    master nodes for the busiest lock trees, and
    their relative activity levels, especially
    looking for cases where the resource master
    node's lock request rate to the resource tree is
    lower than another node's rate
  • Examine system parameter values
  • See if LOCKDIRWT parameter has been set to a
    higher value on one node than other nodes and
    that node is now the resource master for many
    lock trees and has become overloaded as a result

91
Common Problem: Unbalanced LOCKDIRWT settings
  • Problem resolution
  • Set LOCKDIRWT to the same value on all nodes to
    allow OpenVMS to move resource tree mastership
    based on relative activity levels; then, to
    prevent resource tree mastership thrashing:
  • If possible, add one new faster-CPU system to the
    cluster
  • Try biasing incoming workload so 1 node has a
    slightly-higher workload level than the other
    nodes
  • If there are multiple types of workload, try to
    direct a little more of each workload type to a
    different node in the cluster
  • If these measures are insufficient to prevent
    resource tree mastership thrashing, set the PE1
    parameter to a positive, non-zero value (as large
    a value as possible and still prevent thrashing).
    This non-zero setting may only be necessary
    during high-workload periods.
  • Alternatively, upgrade to OpenVMS version 8.3 or
    higher

92
Common Problem
  • Hard-coded PE1 settings
  • Cluster typically consists of at least 2
    (commonly more) nodes of same or similar model or
    horsepower
  • Typically all cluster nodes run same or similar
    workload, evenly balanced across the cluster
    nodes
  • History of resource tree mastership thrashing
    between nodes in the past
  • PE1 is set to a non-zero value on all nodes in
    the cluster
  • Resource master node assignments become fixed in
    place
  • Resource tree mastership thrashing problem is
    cured
  • PE1 may even be set to -1 so as to prevent ALL
    remastering (not just activity-based) in the
    interest of saving even the little
    interrupt-state time spent keeping lock-request
    counts

93
Common Problem: Hard-coded PE1 settings
  • Symptoms
  • Perception of slow response times under heavy
    load
  • One or more nodes may be experiencing CPU
    saturation in interrupt state
  • Processes in RWSCS state may often show up in
    SHOW SYSTEM displays on some nodes
  • MONITOR DLOCK shows imbalance of incoming and
    outgoing lock request rates between nodes, and
    fewer local lock requests than normal as a
    fraction of total lock requests
  • SDA> LCK SHOW ACTIVE or the V6 Freeware tool
    LOCK_ACTV.COM show that many of the busiest
    resource trees are not mastered on the node with
    the highest lock request rate to that resource
    tree (or else these counts are all zero or values
    remain static)

94
Common Problem: Hard-coded PE1 settings
  • Troubleshooting steps
  • Check for bottlenecks in CPU
  • Check for interrupt-state saturation using
    MONITOR MODES
  • Use PC Sampling to verify source of
    interrupt-state time (lock traffic)
  • Look for causes of locking imbalance among nodes
  • Double-check incoming workload balancing
    mechanisms
  • Use SDA> LCK SHOW ACTIVE on all nodes or the V6
    Freeware tool LOCK_ACTV.COM to identify resource
    master nodes for the busiest lock trees, and
    their relative activity levels, especially
    looking for cases where the resource master
    node's lock request rate to the resource tree is
    lower than another node's rate
  • (Note: these activity counts will be unavailable
    if PE1 is set to a negative value)
  • Examine system parameter values
  • See if PE1 parameter has been set to a non-zero
    value, especially -1 or a very-small positive
    value like 1

95
Common Problem: Hard-coded PE1 settings
  • Problem resolution
  • Set PE1 to zero; then, to prevent resource tree
    mastership thrashing:
  • If possible, add one new faster-CPU system to the
    cluster
  • Try biasing incoming workload so 1 node has a
    slightly-higher workload level than the other
    nodes
  • If there are multiple types of workload, try to
    direct a little more of each workload type to a
    different node in the cluster
  • If these measures are insufficient to prevent
    resource tree mastership thrashing, set the PE1
    parameter to a positive, non-zero value (as large
    a value as possible and still prevent thrashing).
    Setting PE1 to -1 prevents tracking lock-request
    rates and should be a last resort. This non-zero
    setting may only be necessary during
    high-workload periods.
  • Alternatively, upgrade to OpenVMS version 8.3

96
Common Problem: Hard-coded PE1 settings
  • Real-life example
  • Two-node cluster for high availability
  • Rdb database run on only 1 of 2 nodes at a time
    (for best performance), with opposite node only
    there for failover
  • Symptoms
  • After failover between nodes, very poor
    performance coupled with high interrupt-state
    time
  • High outgoing lock rate on active node, and
    high incoming lock rate on inactive node
  • PE1 hard-coded to -1 years ago with the intent of
    preventing resource tree mastership thrashing at
    a time when both nodes were active at once
  • In failover process, locks apparently stayed
    around on inactive node for some reason
  • Setting PE1 to 0 solved the problem instantly

97
Common Problem
  • CPU interrupt-state saturation
  • Symptoms
  • A variety of strange performance anomalies
  • May be sporadic or intermittent
  • Detection
  • MONITOR MODES, Availability Manager, other
    performance data collectors (e.g. DECps, ECP/TDC)
  • Be aware of potential data collector blindness
    (resulting in missing data) during
    interrupt-state saturation

98
Common Problem: CPU interrupt-state saturation
  • Contributing factors
  • Lack of Fast_Path support
  • Early OpenVMS versions (e.g. PEDRIVER prior to
    7.3-2)
  • Memory Channel adapters lack Fast_Path support
  • Large cluster node counts, powerful systems, all
    running same application workload (thus sharing
    same resource trees)
  • Nodes can gang up on resource master nodes

99
Common Problem: CPU interrupt-state saturation
  • Mitigation
  • Reduce locking demand if possible
  • Use No-Query locking for read-only access
  • Cache data if possible instead of re-reading it
  • Use Fast_Path to spread CPU interrupt workload
    across CPUs
  • PEDRIVER and LAN adapters together on a separate
    CPU from other I/O devices if using LAN as the
    cluster interconnect
  • Distribute workload more evenly across nodes in a
    cluster
  • Buy faster CPUs

100
Common Problem
  • Using the File System as a Database
  • Symptoms
  • Feeling of slow I/O performance but no detectable
    controller or disk bottlenecks
  • Application design tends to create large numbers
    of separate files, rather than creating records
    within files
  • Directory files may often grow large in size
    (thousands of blocks)
  • Detection
  • Availability Manager (or DECamds) or SDA> SHOW
    RESOURCE/CONTENTION or the V6 Freeware tool
    LCKQUE.COM may show lock contention on volume
    allocation locks (F11B$v) or on file
    serialization locks (F11B$s) for directory files

101
Common Problem: Using the File System as a Database
  • Problem resolution / mitigation
  • Redesign application to create records within
    files instead of separate individual files
  • Or spread files around among many
    disks/directories
  • To avoid contention on volume allocation locks,
    spread files and directories across more separate
    volumes
  • To avoid contention on file serialization locks
    on directory files, spread files among more
    separate directory files, and be sure to clear
    files out of directories to prevent directory
    files from growing large

102
Common Problem: Using the File System as a Database
  • Real-life example
  • Customer had application which tended to create
    large numbers of files, which it kept track of
    using records in RMS indexed files
  • Had recently upgraded disk subsystem from
    HS-series controllers to EVAs
  • Customer combined directories which had
    previously been spread across 9 HS-series disk
    volumes onto 1 EVA disk volume, based on the fact
    that the EVA internally spreads data across many
    disk spindles, so it could easily handle the
    combined I/O rate of all 9 disk units
  • Result was heavy contention for volume allocation
    lock
  • Mitigation
  • Spread files across multiple volumes again
  • Increased size of disk allocation units to reduce
    rate of requests for volume allocation lock
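
The second mitigation corresponds to choosing a larger volume cluster size
(set when the volume is initialized), and a related knob is the default file
extension quantity on the mounted volume; the values below are illustrative
assumptions:

    $ ! Cluster size is fixed at INITIALIZE time (re-INIT and restore the data)
    $ INITIALIZE/CLUSTER_SIZE=64 $1$DGA500: DATA1
    $ ! Default extension quantity can be raised on a mounted volume
    $ SET VOLUME/EXTENSION=512 DISK$DATA1:
    $ SHOW DEVICE/FULL $1$DGA500:          ! confirm cluster size and extend quantity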

103
Common Problem
  • Many-node cluster
  • Symptoms
  • Slow response times due to lock-request latencies
  • Cluster consists of many (e.g. 8) small (e.g.
    4-CPU) nodes
  • Often a defensive response to NUMA scaling issues
    on Wildfire (16-CPU GS-160 or 32-CPU GS-320)
    boxes; the technique was to partition them into
    4-CPU (QBB) nodes and re-combine the partitions
    into a cluster to get around NUMA performance
    problems across QBB boundaries

104
Common Problem: Many-node cluster
  • Problem definition
  • Local lock requests are fastest
  • Roughly 10-20 times higher CPU cost for remote
    requests
  • Roughly 50 times higher elapsed time for remote
    requests
  • Ideal case for lock-request latency is a single
    SMP box, but
  • High Availability implies you need at least 2
    nodes
  • The more nodes in a cluster, the higher the
    probability of remote lock requests

105
Common Problem: Many-node cluster
106
Unusual Problems
107
Unusual Problem 1
  • Symptoms
  • Intermittent pauses of 10 seconds to a minute in
    application activity across the entire cluster
  • MONITOR CLUSTER shows basically all disk I/O,
    locking, and CPU drops to next to nothing during
    these pauses
  • Problem occurs only during heavy workload
    periods. Problem recurs about every 2 to 5
    minutes for an hour or two.
  • Problem started about the time the 2nd Alpha 8400
    was added to a cluster of 3 VAX 7000-700s and 1
    VAX 7000-800. Problem only gets worse and worse
    as more 8400s are added to the cluster.
  • VMS version is 7.1-2

108
Unusual Problem 1
  • Troubleshooting steps
  • Check for traditional bottlenecks in CPU, memory,
    I/O.
  • DECps Analysis Report indicated nothing unusual
  • Memory utilization was reasonable at 70%, and
    stayed the same during the pauses
  • CPU, I/O, and locking rates all went to
    essentially zero during the pauses, so obviously
    CPU usage was not the bottleneck