Title: OpenBSI Troubleshooting
1 OpenBSI Troubleshooting
- Presented by Bob Findley
- Director of Marketing and Business Development
- Written by Steve Hill
- Director, Software Product Business Development
- Email: shill_at_bristolbabcock.com
- Phone: 860-945-2501
2 Basic BSAP primer
- Polled protocol created to maximize bandwidth and allow multilevel data access for difficult communication environments
- Basics of SCADA
- Host PC
3 As found in Manual 5080 Std BSAP
4 Immediate Response
5 OpenBSI Troubleshooting - Introduction
- Messages, Buffers and Wait packets
- Ten Most Common Problems examined
- Starting a System up
- Where to start when a system breaks
6 What is a message?
A BSAP message is like a package: you send it somewhere via the network.
7 What is a buffer?
A buffer is like a shelf that can only fit one BSAP message.
8 Where are the buffers?
- Every depot on the network must have storage buffers
- They are in the PCs
- And they are in the RTUs
- Messages are kept in buffers when they are not in transit
- In 33xx and ControlWave, you must specify how many you need.
9 What is a wait packet?
- A wait packet is like the airway bill receipt. You must hang on to it until you know the message has arrived OK.
- There is one in use for every message outstanding on the network
10 What happens when there's a problem?
- If the network is down, or slow, messages start to build up in buffers, waiting for pickup
- We will run out of buffers
11 ...or the destination is unavailable?
- If the destination is unavailable, our wait packets will build up at the PC
12 Wait packets and buffers in use
Wait packets and buffers in use are controlled by the message timeout. Both are discarded when it expires.
13 Buffers and Wait Packets - Summary
- A buffer is a piece of memory that holds a message waiting to go somewhere.
- If the network is busy, or large, messages have to wait a while before being sent, so they are queued in buffers.
- A wait packet is a piece of memory that holds the information necessary to process a reply.
- If the network is slow, then a reply takes a long time, so there are a lot of wait packets in use.
- When a message times out, the wait packet is discarded.
- If there are not enough buffers or wait packets, the PC can't communicate with healthy RTUs.
- Symptoms: timeouts, RTUs going dead in the HMI
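The interaction between wait packets and the message timeout can be pictured with a toy model. This is only an illustrative sketch, not OpenBSI's implementation: the class name WaitPacketPool, its methods and the numbers used are all hypothetical. It shows how unanswered messages to dead RTUs can hold every wait packet until the timeout expires, so that a message to a healthy RTU cannot be sent, and how a reply that arrives after the timeout finds no wait packet.

```python
from dataclasses import dataclass, field


@dataclass
class WaitPacketPool:
    """Toy model: one wait packet is held per outstanding message."""
    capacity: int      # number of wait packets configured (hypothetical)
    timeout: float     # message timeout in seconds
    pending: dict = field(default_factory=dict)  # message id -> send time

    def expire(self, now: float) -> None:
        # Discard wait packets whose message timeout has elapsed.
        self.pending = {m: t for m, t in self.pending.items()
                        if now - t < self.timeout}

    def send(self, msg_id: int, now: float) -> bool:
        # A message can only go out if a wait packet is free.
        self.expire(now)
        if len(self.pending) >= self.capacity:
            return False
        self.pending[msg_id] = now
        return True

    def reply(self, msg_id: int, now: float) -> bool:
        # A reply is only accepted while its wait packet still exists.
        self.expire(now)
        return self.pending.pop(msg_id, None) is not None


pool = WaitPacketPool(capacity=2, timeout=45.0)
sent_dead_1 = pool.send(1, now=0.0)    # to a dead RTU, never answered
sent_dead_2 = pool.send(2, now=1.0)    # to a dead RTU, never answered
sent_healthy = pool.send(3, now=2.0)   # healthy RTU, but no wait packet left
sent_later = pool.send(4, now=50.0)    # earlier wait packets have timed out
late_reply = pool.reply(1, now=51.0)   # too late: wait packet already discarded
print(sent_dead_1, sent_dead_2, sent_healthy, sent_later, late_reply)
```

In this model a longer timeout keeps wait packets tied up for longer when RTUs are dead, while a shorter one risks discarding replies that are merely slow.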
14 Top Ten Problems...
15 1. Not Enough Buffers and Wait Packets
- There should be at least one buffer for every message being transmitted at a time. No harm in having too many!
- Recommend (tags/10), i.e. 2000 tags = 200 buffers, but this will depend on the HMI software
- Consider the worst case: the number of messages sent at startup.
- Recommend wait packets at least buffers x 2
- Look at what's going on in Netview -> Monitor
- Configured in the NDF file with a text editor
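The two rules of thumb on this slide (buffers = tags/10, wait packets = at least 2 x buffers) amount to a small calculation. The sketch below is only an illustration; the function name recommend_resources is hypothetical, and the result is a starting point that still depends on the HMI software and the startup worst case.

```python
import math


def recommend_resources(tag_count: int) -> dict:
    """Apply the slide's rules of thumb to a tag count."""
    buffers = math.ceil(tag_count / 10)   # one buffer per 10 tags
    wait_packets = 2 * buffers            # at least twice the buffers
    return {"TOTAL_BUFFERS": buffers, "WAIT_PACKETS": wait_packets}


sizing = recommend_resources(2000)
print(sizing)
```

For 2000 tags this gives 200 buffers, matching the slide's example, and 400 wait packets.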
16 NDF File Contents
- CONSTANTS
- MESSAGE_EXCHANGES=15
- WAIT_PACKETS=200
- TOTAL_BUFFERS=100
- RTU_BLOCKS=100
- GOAL_FREE_BUFFERS=30
- RTU_RETRIES=4
- DEF_MESSAGE_TIMEOUT=45
- DELETE_JOURNAL=1
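As a quick sanity check, the constants listed on this slide can be compared against the "wait packets at least buffers x 2" recommendation. The sketch below assumes the constants are written as KEY=VALUE pairs, one per line; that layout is an assumption about the NDF file, and the helper names parse_constants and wait_packets_ok are hypothetical.

```python
def parse_constants(text: str) -> dict:
    """Read KEY=VALUE lines (assumed layout) into a dict of integers."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip headings such as CONSTANTS and blank lines
        key, _, value = line.partition("=")
        values[key.strip()] = int(value)
    return values


def wait_packets_ok(constants: dict) -> bool:
    """True if wait packets are at least twice the total buffers."""
    return constants["WAIT_PACKETS"] >= 2 * constants["TOTAL_BUFFERS"]


sample = """CONSTANTS
WAIT_PACKETS=200
TOTAL_BUFFERS=100
DEF_MESSAGE_TIMEOUT=45
"""
constants = parse_constants(sample)
print(constants, wait_packets_ok(constants))
```

With the values shown on the slide, 200 wait packets against 100 buffers just meets the recommended ratio.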
17 2. Not Enough Buffers in the RTU
- For 33xx, you specify the number of buffers in the load
- You get some, but often not enough by default.
- Increase (NEVER DECREASE) the number when you add more global signals, alarms etc.
- A sure sign of not enough buffers is NAKs being transmitted from the RTU.
- Don't overload a Pseudo Slave Port!! You can't change the number of buffers for this!
- Check for NAKs in OpenBSI and in the communications stats from the RTU.
18 3. Message Timeout Too Short
- There are two timeouts: Message Timeout and Link Level Timeout
- Message Timeout is the time for a message to go from the application (HMI) on the PC to the RTU and return
- I'd recommend it being at least 3x the combined poll period of all levels the message travels through (e.g. 1st level 5 secs, 2nd level 60 secs, then make it 65 x 3 = 195 seconds)
- Better is to look at the actual turnaround time. This requires looking at an analyzer, the DLM or HMI stats
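The 3x rule for the message timeout is simple arithmetic over the poll periods of each network level. A minimal sketch follows; the function name recommended_message_timeout is hypothetical, and the result is a lower bound to be checked against measured turnaround times.

```python
def recommended_message_timeout(poll_periods_s, factor=3):
    """Return factor x the combined poll period of all levels, in seconds."""
    return factor * sum(poll_periods_s)


# 1st level polled every 5 s, 2nd level every 60 s (the slide's example)
timeout_s = recommended_message_timeout([5, 60])
print(timeout_s)  # 195
```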
19 Message Timeout Too Short
- The longer the message timeout, the more wait packets you will use if RTUs die.
- If you have too short a timeout, you will see this in the OpenBSI Journal File if using Dataview or Harvester.
Wed Dec 01 14:59:42 2004 DATAVIE Wait packet for message id 0010 not found, msg discarded
Wed Dec 01 14:59:48 2004 DATAVIE Wait packet for message id 0011 not found, msg discarded
Wed Dec 01 15:00:02 2004 DATAVIE Wait packet for message id 0013 not found, msg discarded
Wed Dec 01 15:00:03 2004 HARVEST Wait packet for message id 0013 not found, msg discarded
Wed Dec 01 15:00:05 2004 DATAVIE Wait packet for message id 002A not found, msg discarded
Wed Dec 01 15:00:08 2004 DATAVIE Wait packet for message id 002B not found, msg discarded
- Use Dataview as a test
- Then use the OpenBSI Journal Tool to view
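If the journal can be exported as text, entries like the ones above can be counted per application to see how often replies arrive after their wait packet has been discarded. The sketch below is an assumption-laden illustration: the line layout is inferred only from the excerpt on this slide, the regular expression tolerates timestamps with or without colons, and the function name count_discards is hypothetical.

```python
import re
from collections import Counter

# Layout inferred from the journal excerpt: time, year, application name,
# then the "Wait packet ... not found" text.
PATTERN = re.compile(
    r"(?P<time>\d{2}:?\d{2}:?\d{2}) \d{4} (?P<app>\S+) "
    r"Wait packet for message id (?P<msg_id>[0-9A-Fa-f]+) not found"
)


def count_discards(journal_text: str) -> Counter:
    """Count discarded replies per application name."""
    # Collapse line wraps and repeated spaces before matching.
    normalized = " ".join(journal_text.split())
    return Counter(m.group("app") for m in PATTERN.finditer(normalized))


sample = (
    "Wed Dec 01 14:59:42 2004 DATAVIE Wait packet for message id 0010 not found, msg discarded\n"
    "Wed Dec 01 15:00:03 2004 HARVEST Wait packet for message id 0013 not found, msg discarded\n"
    "Wed Dec 01 15:00:05 2004 DATAVIE Wait packet for message id 002A not found, msg discarded\n"
)
counts = count_discards(sample)
print(counts)
```

A steadily growing count for one application suggests its message timeout is shorter than the real turnaround time.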
20 4. Message Timeout Too Short in IP RTU
- An Ethernet RTU contains similar code to OpenBSI
- There is a message timeout it uses when talking to its slaves
- The default is only 30 seconds
- Configure this (and various other parameters) using the Internet_Protocol Module (33xx) or System Variable Wizard (ControlWave)
21 5. Link Level Timeout Incorrect
- Link Level Timeout is the time expected for an ACK response from a top-level RTU.
- If it's too short, the message is timed out too early and then the response tramples on the next message
- If it's too long, a single dead RTU will use up most of your bandwidth.
- Use a network analyzer or the DLM to check. Here are some recommendations:
- Direct serial: 0.2 - 0.5 seconds
- Radio: 1 second (depends on configuration)
- Internet (via re-director): < 1 second
- Satellite/VSAT/multi-drop cellular: 5 seconds (seen 9 though)
- Configured in Netview, as part of the Line Properties
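The cost of an over-long link level timeout can be estimated with rough arithmetic. The sketch below is a simplified model, not a description of how BSAP schedules polls: it assumes every RTU on the line is polled once per cycle, a dead RTU costs exactly one link level timeout per cycle (retries are ignored), and a healthy exchange takes a fixed time. The function name dead_rtu_share and the 0.1 second exchange time are hypothetical.

```python
def dead_rtu_share(link_timeout_s, dead_rtus, healthy_rtus, exchange_s):
    """Fraction of a poll cycle spent waiting on dead RTUs (simplified)."""
    wasted = dead_rtus * link_timeout_s   # time spent waiting for no answer
    useful = healthy_rtus * exchange_s    # time spent on real exchanges
    return wasted / (wasted + useful)


# One dead RTU among nine healthy ones, each healthy exchange taking 0.1 s
share_short = dead_rtu_share(0.5, dead_rtus=1, healthy_rtus=9, exchange_s=0.1)
share_long = dead_rtu_share(5.0, dead_rtus=1, healthy_rtus=9, exchange_s=0.1)
print(round(share_short, 2), round(share_long, 2))
```

Under these assumptions, raising the timeout from 0.5 to 5 seconds moves the dead RTU from roughly a third of the cycle to most of it.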
22 6. No Alarms with Serial RTU
- For serial RTUs, make sure you didn't plug into the Pseudo Slave port!
- For all RTUs, use the Alarm Router to check that you don't have an HMI problem
24 7. No Alarms with IP RTU
- Check the NHP Address is correct in Netview
- Check the NHP Address is correct in the RTU
- Check that you reset (33xx) or power-cycled (CW) the RTU since the last NHP address change
25 8. Two NHPs Fighting for Control
- Avoid having two PCs configured as NHPs online at the same time
- Use SCADA software that ensures one is offline
- If you do have two online, make certain they have the same OpenBSI files (and hence NRT versions)
- Make sure only one server is time-synching the RTUs
26 9. Accidentally Configuring an RBE Signal
- If there's an RBE signal in the load, the host will try to communicate with the RBE Module in Task 0
- If it doesn't respond, a timeout occurs
- And the RTU is declared dead
- If using OpenEnterprise, take a look at the numrbe attributes on nw3000device
- If using another HMI, search the .ACC for RBE
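Searching the .ACC file for RBE can be done with any text search tool. The sketch below is one way to do it, assuming the .ACC file is readable as plain text; the function name find_rbe_lines is hypothetical, and the sample file content is made-up placeholder text rather than real load syntax. A plain substring match can also flag unrelated names that happen to contain the letters RBE, so each hit still needs a manual look.

```python
import tempfile
from pathlib import Path


def find_rbe_lines(acc_path):
    """Return (line number, text) for every line mentioning RBE."""
    hits = []
    text = Path(acc_path).read_text(errors="replace")
    for number, line in enumerate(text.splitlines(), start=1):
        if "RBE" in line.upper():
            hits.append((number, line.strip()))
    return hits


# Demonstration on a throwaway file with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.acc"
    sample.write_text("line one\nplaceholder RBE entry\nline three\n")
    hits = find_rbe_lines(sample)
print(hits)
```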
27 10. Broadcast Storms
- Various network protocols transmit broadcast packets
- Every one of these will need to be processed by the OS on the machines they hit.
- In the case of an RTU, this means an interrupt and no control for a few ms.
- The worst offenders are ARP and NetBIOS
- Use a network analyzer to look at the network
- Segment the network with switches and routers, and disable NetBIOS, or make sure it's configured correctly
- Keep RTUs off the corporate network. 8 MHz vs. 2 GHz is a battle you won't win!!
28 Problem with a New Network?
- Add RTUs one at a time
- Check communications after each one
- Keep data requests slow until you have them all running OK. Speed it up SLOWLY.
- Don't ever skip one problem thinking it will go away
- If one appears, go back a step
- Once you have it all working, then speed it up
- But check that it still works with 50 dead RTUs
- Check all log files even when you think it's working
29 Something has stopped working: where to start?
- If the system WAS working, what changed?
- Look at the Netview Journal Files
- Check OpenBSI resources (buffers, wait packets)
- Look outside for physical problems!
- Use a network analyzer (if IP network)
- Check your system logs! (You do have them, right?)
30 Configuring DLM logging for OpenBSI
Save this file as BSBSAP.INI in the Windows folder to monitor serial communications to Bristol RTUs on COM1:
[DLM]
ENABLED=1
COM1=C:\COM1-LOG.TXT
Save this file as BSIPDRV in the Windows folder to monitor Ethernet communications to a Bristol RTU:
[DLM]
Enabled=1
Filter=120.0.210.46
File=C:\BSAPIP-DLM.LOG
Data_Dump=1
Dump_Data=1
31 OpenBSI Troubleshooting
- For further information, please contact:
- Steve Hill
- Director, Software Product Business Development
- Email: shill_at_bristolbabcock.com
- Phone: 860-945-2501