Title: Advanced Network Management Introduction and Background
1Remote Network Monitoring (RMON)
Mani Subramanian, Network Management: Principles and Practice, Addison-Wesley, 2000.
2Outline
- Basic Concepts
- RMON Goals
- Control of Remote Monitors
- Multiple Managers
- Table Management
- Statistics group
- History group
- Host and hostTopN groups
- Matrix group
- Alarm group
- Filter and packet Capture group
3Basic Concepts
- Extends SNMP functionality without changing the protocol
  - Allows the monitoring of remote networks (inter-network management)
  - MAC-layer (layer 2 in OSI) monitoring
- Defines a Remote MONitoring (RMON) MIB that supplements MIB-II
  - With MIB-II, the manager can obtain information on individual devices only
  - With the RMON MIB, the manager can obtain information on the LAN as a whole
4Basic Concepts
- Devices that observe the traffic on a network are called network monitors, analyzers, or probes
- A monitor generally can produce summary information on
  - Error statistics, e.g., counts of collisions on a LAN
  - Performance statistics, e.g., packets delivered per second, packet size distribution, etc.
- A monitor can also store packets for later analysis
- A monitor may also filter data to limit the packets counted or captured
  - Filter based on packet type or characteristics (e.g., packets with a certain source address, erroneous packets)
5Basic Concepts
- A monitor is required per subnetwork
  - A monitor could be a standalone device whose only job is monitoring and traffic analysis
  - Or it could be a device with other functions (e.g., router, server)
- A monitor usually communicates with one (or more) central management stations (MSs)
- RMON is essentially the definition of a MIB
  - Standard monitoring functions and interfaces for communication between SNMP consoles and remote monitors
6RMON Goals
- Monitoring subnetwork-wide behavior while reducing the burden on agents and managers
  - The monitor collects and analyzes data locally and relays the results
- Continuous off-line monitoring in the presence of failures
  - RMON should collect fault, performance, and configuration information continuously, even when it is not being polled, which saves communication cost
  - This information may be retrieved later by a manager
- Proactive monitoring
  - Continuously runs diagnostics and stores network performance data, even in the absence of failures
  - Upon a failure, notifies the manager and provides useful information for diagnosing the fault
7RMON Goals
- Provide value-added data
  - Perform analysis on collected data, thus relieving the MS of this responsibility
- Support multiple managers
  - Multiple managers improve reliability, provide diversity in network management, etc.
  - A monitor should be configurable to deal with more than one manager simultaneously
8Network with RMONs
9Control of RMON- Configuration
- RMON is configured for data collection
  - The RMON MIB contains a number of functional groups
  - Each group may contain one or more control tables and one or more data tables
  - Control tables (read-write) contain parameters describing the data in data tables (read-only)
- An NMS sets appropriate control parameters to configure RMON to collect the desired data
  - The parameters are set by adding a new row to the control table or by modifying an existing row
  - As information is collected, data is stored in rows of the corresponding data table
10Control of RMON- Configuration
- Functions performed by a monitor are defined and implemented in terms of table rows
  - A control table may contain objects that specify the source of data to be collected, the type of data, the collection timing, etc.
  - Associated with a single control row are one or more rows in one or more data tables
- To modify a particular data collection function
  - It is first necessary to invalidate the control row
  - This causes the deletion of that row and of all associated rows in the data tables
  - The NMS can then create a new control row with the modified parameters
- NOTE: when a row of a control table is deleted, the associated rows in data tables are also deleted.
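The relationship between a control row and its data rows can be sketched in a few lines of Python. This is only an illustration: the class and field names (`Monitor`, `ControlRow`, `data_source`, `interval_s`) are hypothetical and do not come from the RMON MIB. It shows that deleting a control row removes its data rows, and that modification is done by invalidating the row and creating a new one.

```python
from dataclasses import dataclass, field, replace


@dataclass
class ControlRow:
    """One row of a hypothetical control table (read-write for managers)."""
    index: int
    data_source: str   # which interface to collect from
    interval_s: int    # how often to collect


@dataclass
class Monitor:
    control: dict = field(default_factory=dict)  # index -> ControlRow
    data: dict = field(default_factory=dict)     # index -> list of data rows

    def add_control_row(self, row: ControlRow) -> None:
        self.control[row.index] = row
        self.data[row.index] = []

    def record(self, index: int, sample: dict) -> None:
        # Data rows are written by the monitor; managers only read them.
        self.data[index].append(sample)

    def invalidate(self, index: int) -> None:
        # Deleting a control row also deletes all associated data rows.
        del self.control[index]
        del self.data[index]

    def modify(self, index: int, **changes) -> None:
        # Modification = invalidate the old row, then create a new one.
        new_row = replace(self.control[index], **changes)
        self.invalidate(index)
        self.add_control_row(new_row)


monitor = Monitor()
monitor.add_control_row(ControlRow(index=1, data_source="if1", interval_s=30))
monitor.record(1, {"pkts": 120})
monitor.modify(1, interval_s=60)  # previously collected data rows are lost
```

Note the cost of a modification in this model: the samples collected under the old parameters disappear together with the old control row.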
11Multiple Managers
- An RMON probe may be subject to management from multiple MSs
- Potential conflicts and unwanted results
  - Simultaneous requests for resources could exceed the capability of the monitor
  - Monitor resources could be captured by an MS for a long time, preventing other MSs from accessing desired information
  - Resources could be assigned to an MS that crashes without releasing them
- Avoidance and resolution features are required
  - An ownership label identifies the owner of a particular row of the control table and the associated function
12Multiple Managers
- RMON suggests that the ownership label contain one or more of
  - IP address, management station name, network manager's name, location, or phone number
- The ownership label can be used in the following ways
  - An MS may recognize resources it owns and no longer needs
  - A network operator can identify the MS that owns a particular resource and negotiate its release
  - A network operator may have the authority to free resources unilaterally
  - An MS, after a failure or re-initialization, can recognize resources it had reserved in the past and free those it no longer needs
- NOTE
  - A row in a control table should be altered only by its owner; other MSs should only read it.
13Multiple Managers
- Resource sharing to improve efficiency
  - If a certain management function has been defined by some MS, another MS can share its use by observing the associated read-only data rows (see the EntryStatus definition)
  - However, the MS that owns this control row may modify or delete the row (and hence the associated data rows) at any time
- Monitor's default functions
  - These are monitoring functions owned by the monitor itself
  - By convention, such ownership labels start with "monitor"
  - An MS can make use of such resources in a read-only fashion
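As a rough illustration of how ownership labels might be used, the sketch below keeps one owner string per control row. The row indices, the label values, and the helper names are all invented for the example. The ownership label is a convention rather than an access-control mechanism, so `may_modify` only models what a well-behaved MS would check before altering a row.

```python
# Hypothetical control rows: row index -> ownership label.
control_rows = {
    1: "monitor",         # default function owned by the monitor itself
    2: "nms-a 10.0.0.5",
    3: "nms-b 10.0.0.9",
    4: "nms-a 10.0.0.5",
}


def rows_owned_by(rows, owner_prefix):
    """Rows an MS can recognize as its own, e.g. after a restart."""
    return sorted(i for i, owner in rows.items() if owner.startswith(owner_prefix))


def may_modify(rows, index, requester):
    """Convention only: a row is altered by its owner, read by everyone else."""
    return rows[index] == requester


def is_monitor_default(rows, index):
    """By convention, labels of monitor-owned rows start with 'monitor'."""
    return rows[index].startswith("monitor")


# After a crash, nms-a finds the rows it had reserved and can free unused ones.
mine = rows_owned_by(control_rows, "nms-a")
```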
14Table Management
- The RMON specification includes a set of textual conventions and procedural rules for row addition and deletion
- Textual conventions: 2 new data types
  - OwnerString ::= DisplayString
  - EntryStatus ::= INTEGER
    - valid (1),
    - createRequest (2),
    - underCreation (3),
    - invalid (4)
15Control Table
16Data Table
17Control and Data Table- Example
18Row Addition and deletion
- Multiple managers may attempt row addition
  - Multiple requests to create a row with the same parameters, including index parameters, lead to a conflict
  - Conflict arbitration is required
  - Only the first request is granted
- Row deletion
  - Is achieved by the owner setting the status object for that row to invalid
- Row modification
  - Is achieved by first invalidating the row and then adding the row with new object instance values
- An MS uses SNMP messages to add a row to an RMON table
  - The SetRequest-PDU message contains a list of object identifiers for all columns in the table
- When a monitor receives a request
  - It must check whether there are any restrictions defined in the RMON MIB (e.g., the object is not currently supported by the MIB)
  - Or any implementation-specific restrictions (e.g., lack of resources)
- If row addition is not possible
  - A GetResponse-PDU with a badValue error is returned
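The row-addition rules above can be sketched as a small in-memory model. It is not an SNMP implementation: `ControlTable`, `BadValue`, and `max_rows` are hypothetical names, and the exception merely stands in for a response carrying a badValue error. The status values are the four EntryStatus codes. The creation sequence is simplified so that a new row starts as underCreation and the manager then sets it to valid.

```python
from enum import IntEnum


class EntryStatus(IntEnum):
    VALID = 1
    CREATE_REQUEST = 2
    UNDER_CREATION = 3
    INVALID = 4


class BadValue(Exception):
    """Stands in for a response PDU that carries a badValue error."""


class ControlTable:
    def __init__(self, max_rows):
        self.max_rows = max_rows  # implementation-specific resource limit
        self.rows = {}            # index -> {"owner": ..., "status": ...}

    def create(self, index, owner, **columns):
        # Conflict arbitration: only the first request for an index succeeds.
        if index in self.rows:
            raise BadValue("row already exists")
        if len(self.rows) >= self.max_rows:
            raise BadValue("lack of resources")
        self.rows[index] = {"owner": owner,
                            "status": EntryStatus.UNDER_CREATION,
                            **columns}

    def set_status(self, index, status):
        if status == EntryStatus.INVALID:
            del self.rows[index]  # setting invalid deletes the row
        else:
            self.rows[index]["status"] = status


table = ControlTable(max_rows=2)
table.create(1, owner="nms-a", interval_s=30)
table.set_status(1, EntryStatus.VALID)
try:
    table.create(1, owner="nms-b", interval_s=60)  # same index: rejected
except BadValue as err:
    print("second request refused:", err)
```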
19RMON MIB
10 groups
- Each group is used to store data and statistics derived from data collected by the monitor
- A monitor may have more than one physical interface and hence may be connected to more than one subnetwork
20Statistics Group
- Basic statistics for each monitored subnetwork
  - A single table with one entry for each interface
  - A variety of counts for each subnetwork, such as bytes, packets, errors, frame sizes, etc.
  - Provides useful information about the load on a subnetwork and its health (collision counts, etc.)
21History Group
- Sampling function for one or more of the interfaces of the monitor
  - historyControlTable specifies the interface and the details of the sampling function
  - etherHistoryTable records the data
- historyControlTable defines a set of samples at a particular sampling interval for a particular interface
  - historyControlIndex: identifies a row in the control table
  - historyControlDataSource: identifies the interface or subnetwork that is the source of data
  - historyControlBucketsRequested: requested number of sampling intervals over which data is saved in the data table (default value 50)
  - historyControlBucketsGranted: actual number of sampling intervals over which data will be saved
  - historyControlInterval: interval in seconds over which data is sampled (default value 1800 seconds, i.e., 30 minutes)
22History Group
etherHistoryTable

etherHistoryIndex   etherHistorySampleIndex
1                   x1
1                   x2
1                   x3
...                 ...
1                   xB1
2                   y1
2                   y2
...                 ...
2                   yB2
23History Group
- etherHistoryTable
  - etherHistoryIndex: the history of which this entry is part (index)
  - etherHistorySampleIndex: identifies the particular sample among all samples associated with the same row in the control table
  - The table also contains some useful counters
    - etherStatsOctets: number of received octets of data
    - etherStatsPkts: number of received packets, etc.
- Subnetwork utilization
  - $R$: medium data rate (bps)
  - $T$: sampling interval (seconds)
  - $\mathrm{Pkts} = \mathrm{etherStatsPkts}_2 - \mathrm{etherStatsPkts}_1$
  - $\mathrm{Octets} = \mathrm{etherStatsOctets}_2 - \mathrm{etherStatsOctets}_1$
  - $U = \dfrac{\mathrm{Pkts} \times (64 + 96) + \mathrm{Octets} \times 8}{T \times R}$: utilization
  - NOTE: each packet adds a 64-bit preamble and a 96-bit inter-frame gap (IFG)
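A small worked example may help with the utilization calculation. The sketch assumes the usual Ethernet form of the formula: the bits on the wire are the counted octets times 8, plus 64 preamble bits and 96 inter-frame-gap bits per packet, divided by the sampling interval times the medium data rate. The function and parameter names are chosen for this example, and counter wrap-around is ignored.

```python
PREAMBLE_BITS = 64  # per-packet preamble
IFG_BITS = 96       # per-packet inter-frame gap


def utilization(pkts_1, pkts_2, octets_1, octets_2, interval_s, rate_bps):
    """Fraction of the medium's capacity used between two counter readings."""
    pkts = pkts_2 - pkts_1
    octets = octets_2 - octets_1
    bits_on_wire = pkts * (PREAMBLE_BITS + IFG_BITS) + octets * 8
    return bits_on_wire / (interval_s * rate_bps)


# 1,000 packets carrying 500,000 octets in 30 s on a 10 Mbps LAN:
# (1000 * 160 + 500000 * 8) / (30 * 10_000_000) = 4,160,000 / 300,000,000
u = utilization(0, 1_000, 0, 500_000, interval_s=30, rate_bps=10_000_000)
print(f"{u:.2%}")  # prints 1.39%
```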
24History Group
- For a given subnetwork (historyControlDataSource), more than one sampling process is allowed, each with a different sampling period (historyControlInterval)
  - Sampling over a short period (e.g., 30 s) enables the monitor to detect sudden changes in traffic patterns
  - Sampling over a long period (e.g., 30 minutes) enables the monitor to observe the steady-state behavior of an interface
- After each sampling interval, the monitor adds a new row to the etherHistoryTable with the same etherHistoryIndex
- Once the number of rows in a history reaches historyControlBucketsGranted, the oldest row associated with this history is deleted each time a new row is added (circular buffer)
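The circular-buffer behavior can be illustrated with `collections.deque` and its `maxlen` argument, which drops the oldest element automatically when a new one is appended to a full deque. The `History` class and its field names are hypothetical; only the roles of the index, the granted bucket count, and the interval are taken from the description above.

```python
from collections import deque


class History:
    """One sampling process, i.e. one row of a history control table."""

    def __init__(self, index, buckets_granted, interval_s):
        self.index = index
        self.interval_s = interval_s
        # A full deque silently discards its oldest entry on append.
        self.samples = deque(maxlen=buckets_granted)
        self.next_sample_index = 1

    def add_sample(self, pkts, octets):
        self.samples.append({"sample_index": self.next_sample_index,
                             "pkts": pkts, "octets": octets})
        self.next_sample_index += 1


# Two sampling processes on the same interface with different periods.
short_term = History(index=1, buckets_granted=3, interval_s=30)
long_term = History(index=2, buckets_granted=3, interval_s=1800)

for n in range(5):
    short_term.add_sample(pkts=100 + n, octets=6400 + n)

kept = [s["sample_index"] for s in short_term.samples]
print(kept)  # prints [3, 4, 5]: samples 1 and 2 were overwritten
```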
25History Group
historyControlTable

historyControlIndex   historyControlDataSource   historyControlBucketsGranted   historyControlInterval
1                     D1                         B1                             I1

etherHistoryTable

etherHistoryIndex   etherHistorySampleIndex
1                   x1
1                   x2
26History Group
historyControlTable

historyControlIndex   historyControlDataSource   historyControlBucketsGranted   historyControlInterval
1                     D1                         B1                             I1

etherHistoryTable

etherHistoryIndex   etherHistorySampleIndex
1                   x1
1                   x2

Oldest entry (sample) is deleted
27host and hostTopN Groups
- host Group
  - Gathers statistics about specific hosts on the LAN
  - hostInPkts, hostOutPkts, etc.
  - By observing the source and destination MAC addresses in monitored packets, a monitor can discover newly attached hosts on the LAN
- hostTopN Group
  - Maintains statistics about the set of hosts on one subnetwork that top a list based on some parameter
  - E.g., a list of the 10 hosts that transmitted the most data during a particular day
  - E.g., a list of nodes ordered according to the errors they've sent in the last hour
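The idea behind the host and hostTopN groups can be sketched with a `Counter`. The frame tuples and host names below are made-up sample data, and `observe` and `top_n` are illustrative helper names rather than MIB objects.

```python
from collections import Counter


def observe(frames):
    """Discover hosts and count transmitted octets per source address."""
    known_hosts = set()
    out_octets = Counter()
    for src, dst, octets in frames:
        known_hosts.update((src, dst))  # both addresses reveal attached hosts
        out_octets[src] += octets
    return known_hosts, out_octets


def top_n(out_octets, n):
    """Hosts that transmitted the most data, highest first."""
    return out_octets.most_common(n)


# (source MAC, destination MAC, octets) for each monitored frame
frames = [
    ("h1", "h2", 500),
    ("h2", "h1", 200),
    ("h3", "h1", 900),
    ("h1", "h3", 300),
]
hosts, sent = observe(frames)
print(top_n(sent, 2))  # prints [('h3', 900), ('h1', 800)]
```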
28Matrix Group
- matrixControlTable
  - matrixControlIndex: integer that uniquely identifies a row
  - matrixControlDataSource: interface that is the source of traffic
  - matrixControlTableSize: number of rows in the data table (matrixSDTable) associated with this row
- matrixSDTable
  - Stores statistics on traffic from a source to multiple destinations
  - matrixSDSourceAddress: MAC address of the source
  - matrixSDDestAddress: MAC address of the destination
  - matrixSDPkts: packets transmitted from source to destination
  - matrixSDOctets: octets in packets transmitted from source to destination
- Records information about traffic between pairs of hosts on a subnetwork
  - Error and utilization information, e.g., traffic amount, number of errors
- Information is stored in the form of a matrix
  - So the operator can retrieve information for any pair of network addresses, e.g., to find which devices are making the most use of a server
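A traffic matrix of this kind can be sketched as a dictionary keyed by (source, destination) pairs. The sample frames and the function names `build_matrix` and `top_talkers_to` are assumptions made for the example.

```python
from collections import defaultdict


def build_matrix(frames):
    """Accumulate packet and octet counts per (source, destination) pair."""
    matrix = defaultdict(lambda: {"pkts": 0, "octets": 0})
    for src, dst, octets in frames:
        entry = matrix[(src, dst)]
        entry["pkts"] += 1
        entry["octets"] += octets
    return matrix


def top_talkers_to(matrix, server):
    """Sources ordered by the octets they sent to one destination."""
    to_server = {src: e["octets"]
                 for (src, dst), e in matrix.items() if dst == server}
    return sorted(to_server.items(), key=lambda kv: kv[1], reverse=True)


frames = [
    ("h1", "srv", 500),
    ("h2", "srv", 1500),
    ("h1", "srv", 700),
    ("h2", "h1", 64),
]
matrix = build_matrix(frames)
print(top_talkers_to(matrix, "srv"))  # prints [('h2', 1500), ('h1', 1200)]
```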
29alarm Group
- Measuring network performance consists of the monitor identifying abnormal conditions and issuing alarms accordingly
  - E.g., if there are more than 200 CRC errors (the threshold) in any 5-minute period (the sampling interval), an alarm is generated and sent to the central console.
- The alarm group contains a single table, alarmTable; each entry contains
  - A variable to be monitored (alarmVariable)
    - INTEGER, Counter, Gauge, or TimeTicks
  - A sampling interval (alarmInterval)
  - The most recent sampled value (alarmValue)
  - Threshold parameters
    - alarmRisingThreshold and alarmFallingThreshold
  - alarmStartupAlarm
    - An alarm is generated when a row becomes active and the first sampled value is >= the rising threshold, <= the falling threshold, or either of the two
30alarm Group
- Mode of operation
  - A rising threshold (RT) and a falling threshold (FT) are defined
  - RT is crossed when the current sampled value is greater than or equal to RT and the value at the last sampling interval was less than RT
  - FT is crossed when the current sampled value is less than or equal to FT and the value at the last sampling interval was greater than FT
  - absoluteValue and deltaValue (the difference between 2 successive samples); for a counter, use deltaValue
[Figure: sampled object value versus time, with the rising and falling thresholds marked]
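The threshold-crossing rules can be sketched as a function over a list of sampled values. This is a simplified model with invented names. It assumes the commonly described re-arming behavior, in which a second rising alarm is not generated until a falling alarm has occurred (and vice versa), and it leaves out the startup alarm. The `deltas` helper shows the deltaValue idea for counters.

```python
def alarm_events(samples, rising, falling):
    """Return (sample position, kind) for each threshold crossing."""
    events = []
    last_event = None  # assumption: an alarm re-arms only after the opposite one
    previous = None
    for i, value in enumerate(samples):
        if previous is not None:
            if value >= rising and previous < rising and last_event != "rising":
                events.append((i, "rising"))
                last_event = "rising"
            elif value <= falling and previous > falling and last_event != "falling":
                events.append((i, "falling"))
                last_event = "falling"
        previous = value
    return events


def deltas(counter_readings):
    """deltaValue sampling: differences between successive counter readings."""
    return [b - a for a, b in zip(counter_readings, counter_readings[1:])]


# The second excursion above 200 (position 3) raises no new alarm because
# the value never dropped to the falling threshold in between.
print(alarm_events([50, 250, 150, 260, 40, 300], rising=200, falling=100))
# prints [(1, 'rising'), (4, 'falling'), (5, 'rising')]
```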
31filter Group
- Observing only selected packets on a particular interface
- Data filter
  - Screens observed packets based on a bit pattern that a portion of the packet matches (or fails to match)
- Status filter
  - Screens observed packets based on their status (e.g., valid, CRC errors, etc.)
- Example: screen those packets on some interface that have a certain source MAC address
- The monitor may capture packets that pass the filter or simply record statistics based on such packets
- Both filters can be combined to form a complex test to be applied to incoming packets
  - Filter test example: accept all Ethernet packets with destination address 0xA5 that do not have a source address of 0xBB
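The filter test example can be sketched as a data filter with a mask and a "not" mask. To keep it short, the sketch assumes toy one-byte addresses and a two-byte packet layout of destination followed by source. The function name and this layout are assumptions for illustration, not the exact matching procedure of the RMON filter group.

```python
def passes_data_filter(packet, data, mask, not_mask):
    """Bits selected by mask must equal data, except that the fields
    flagged in not_mask must differ from data in at least one bit."""
    diff = [(p ^ d) & m for p, d, m in zip(packet, data, mask)]
    # Bits outside not_mask: every relevant bit has to match.
    must_match = all((x & ~n & 0xFF) == 0 for x, n in zip(diff, not_mask))
    # Bits inside not_mask: at least one relevant bit has to differ.
    has_not_part = any(m & n for m, n in zip(mask, not_mask))
    must_differ = (not has_not_part) or any(x & n for x, n in zip(diff, not_mask))
    return must_match and must_differ


# Toy layout: byte 0 = destination address, byte 1 = source address.
DATA = bytes([0xA5, 0xBB])
MASK = bytes([0xFF, 0xFF])      # both bytes are relevant
NOT_MASK = bytes([0x00, 0xFF])  # the source byte must NOT match

print(passes_data_filter(bytes([0xA5, 0x01]), DATA, MASK, NOT_MASK))  # True
print(passes_data_filter(bytes([0xA5, 0xBB]), DATA, MASK, NOT_MASK))  # False
print(passes_data_filter(bytes([0xA4, 0x01]), DATA, MASK, NOT_MASK))  # False
```

A status filter could be combined with this test in the same way, by comparing a per-packet status word against a wanted status under a mask.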
capture Group
32event Group
- eventTable
  - eventDescription: textual description of the event
  - eventType: none(1), log(2), snmp-trap(3), log-and-trap(4)
    - log: an entry is added to the logTable for this event
    - snmp-trap: an SNMP trap is sent to an MS
  - eventCommunity: identifies the community of MSs to receive the SNMP trap, etc.
- logTable
  - logTime: value of sysUpTime when this log entry was created
  - logDescription: description of the event that activated this entry (implementation-dependent)
  - logEventIndex: the event that generated this log entry
- Supports the definition of events (problems, symptoms of problems)
  - An event is triggered by a condition located elsewhere in the MIB
  - E.g., monitoring a variable that crosses a rising threshold would cause an event to be generated
- Controls the generation and notification of events
  - An event may cause an SNMP trap message to be issued by the monitor
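How an event's type decides between logging and trapping can be sketched as follows. The `fire` function, the dictionary layout, and the `sent_traps` list (which stands in for actually sending SNMP traps) are hypothetical. Only the four event types and the log fields come from the description above.

```python
from enum import IntEnum


class EventType(IntEnum):
    NONE = 1
    LOG = 2
    SNMP_TRAP = 3
    LOG_AND_TRAP = 4


def fire(event_index, event, log_table, sent_traps, uptime_ticks):
    """Handle one triggered event according to its type."""
    kind = event["type"]
    if kind in (EventType.LOG, EventType.LOG_AND_TRAP):
        log_table.append({
            "logEventIndex": event_index,
            "logTime": uptime_ticks,  # sysUpTime when the entry is created
            "logDescription": event["description"],
        })
    if kind in (EventType.SNMP_TRAP, EventType.LOG_AND_TRAP):
        # A real monitor would send an SNMP trap to the MSs in this community.
        sent_traps.append((event["community"], event["description"]))


event_table = {
    1: {"type": EventType.LOG_AND_TRAP, "community": "public",
        "description": "CRC errors crossed rising threshold"},
}
log_table, sent_traps = [], []
# E.g., an alarm crossing its rising threshold triggers event 1.
fire(1, event_table[1], log_table, sent_traps, uptime_ticks=123456)
```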
33RMON2
- Enables probes to look beyond LAN segments
  - Analyze traffic passing through the router to determine the ultimate source and destination
  - Monitor application-level traffic (e-mail, file transfer, WWW, etc.)