Implementing the monitoring system for the Grid application traffic

1
Implementing the monitoring system for the Grid
application traffic
  • Tai M. Chung
  • School of Information Communication
    Engineering,
  • Sungkyunkwan Univ.
  • 300 Cheoncheon-dong, Jangan-gu, Suwon-si,
    Gyeonggi-do, Korea.
  • Tel 82-31-290-7131, Fax 82-31-299-6673
  • tmchung_at_ece.skku.ac.kr

2
Contents
  • Objective of Research
  • Activities
  • Plan & Result
  • Analysis of Monitoring Methods
  • Grid Application Measurement Factors
  • Implementation of the monitoring system

3
Objective of Research
  • Research of grid network applications
    measurement methods
  • Kernel level monitoring
  • Application level monitoring
  • Implementation of grid application monitoring
    systems
  • Design of grid application monitoring systems
  • Define the metrics for grid network applications
  • Implementation of metrics for grid network
    applications
  • Implementation of UI for grid application
    monitoring systems
  • Research of standardization for grid application
    traffic monitoring
  • Suggestion of standard methods to measure the
    grid application traffic
  • Contribution to the global grid application
    monitoring activity for standardization

4
Activities
Analysis & Preparation for Implementation
  • A Study on the Methodology of the Grid
    Application Traffic Monitoring
  • Analysis of the metrics for the grid application
    traffic
  • Analysis of the Performance Measurement Mechanism
    on the Kernel Level and on the Application Level
  • A Selection of the Performance Measurement
    Mechanisms and Analysis of the Metrics for Grid
    application traffic
  • A Design of the Grid Application Traffic
    Monitoring System and Web-based Management System
Implementation
  • Implementation of the Grid Application
    Performance Measurement Module
  • Implementation of the Grid Service Interface
  • Implementation of the Web-based GUI for the
    Grid Application Traffic Monitoring
Test & Debugging
  • Test & Debugging
Standardization Activity
  • A Research of Standardization for Grid
    Application Traffic Monitoring and Basic Survey
    for the Application Performance Tuning
5
Plan & Result
6
Analysis of Monitoring Methods
Which is the better one?
  • User level network monitoring using Libpcap
  • Kernel level network monitoring

[Diagram labels: Monitoring Tool; Request (Monitoring Information); Usage: CPU, Memory; Provide; Kernel]

7
User Level Network Monitoring Using Libpcap
  • What is Libpcap?
  • The Packet Capture library provides a
    high-level interface to packet capture systems
  • It exposes TCP, IP, and UDP header
    information
  • Merits and demerits of Libpcap
  • Merits: easy to develop platform-independent
    network monitoring applications
  • Demerits: packet loss can occur under heavy
    network load
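
As a rough illustration of the header information a capture-based monitor works with, the sketch below decodes the fixed part of an IPv4 header and the TCP port pair from raw bytes. It does not call Libpcap itself; the function names and the hand-built sample packet (addresses and ports included) are hypothetical, chosen only for the example.

```python
import socket
import struct

def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header."""
    if len(packet) < 20:
        raise ValueError("need at least 20 bytes for an IPv4 header")
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    return {
        "version": version_ihl >> 4,
        "header_length": (version_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,  # 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

def parse_tcp_ports(packet: bytes, ip_header_length: int) -> tuple:
    """Return (source port, destination port) from the start of the TCP header."""
    return struct.unpack("!HH", packet[ip_header_length:ip_header_length + 4])

# Hand-built sample (hypothetical values): an IPv4 header carrying TCP,
# followed by the first four bytes of a TCP header.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                     socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
sample += struct.pack("!HH", 50000, 2811)

header = parse_ipv4_header(sample)
ports = parse_tcp_ports(sample, header["header_length"])
```

In a real monitor the bytes would come from the capture library rather than from a hand-built buffer, and any link-layer header would have to be skipped first.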

8
Kernel Level network monitoring (1/2)
  • Monitoring /proc filesystem (in Linux)
  • The proc filesystem exposes the kernel's data
    structures to system users in an easily
    readable form
  • Kernel tuning can be achieved by simply
    modifying files in the /proc filesystem
  • Network parameters of the kernel are available
    in the /proc filesystem
  • Information on the network sockets opened by
    each application can be obtained
  • Using LibKVM (Kernel Virtual Memory access
    library)
  • kvm_open, kvm_nlist, kvm_read, etc.
  • These calls access kernel data structures
    directly through the /dev/kmem device
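
A minimal sketch of reading kernel network counters through the /proc filesystem, assuming the paired "names line, values line" layout that Linux uses for /proc/net/snmp. The function name and the sample text are hypothetical; on a live system the text would be read from the file instead.

```python
def parse_proc_net_snmp(text: str) -> dict:
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}."""
    stats = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Lines come in pairs: one line of counter names, then one line of values.
    for names_line, values_line in zip(lines[0::2], lines[1::2]):
        proto, names = names_line.split(":", 1)
        _, values = values_line.split(":", 1)
        stats[proto] = {name: int(value)
                        for name, value in zip(names.split(), values.split())}
    return stats

# Shortened, made-up sample in the same layout as /proc/net/snmp.
SAMPLE = """\
Tcp: ActiveOpens PassiveOpens InSegs OutSegs RetransSegs
Tcp: 12 7 1500 1400 28
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 300 2 0 280
"""

stats = parse_proc_net_snmp(SAMPLE)
```

On Linux the same function could be fed `open("/proc/net/snmp").read()`.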

9
Kernel Level network monitoring (2/2)
  • Network related modules in kernel
  • Netfilter layer
  • framework inside the Linux 2.4.x kernel which
    enables packet filtering, network address
    translation (NAT) and other packet mangling
  • stateful packet filtering (connection tracking)
  • all kinds of network address translation
  • large number of additional features as patches
  • Using the ip_conntrack module, it is possible
    to monitor network information per connection
  • Merits and demerits of Kernel Level network
    monitoring
  • Merits: less system load and lower monitoring
    latency than using Libpcap
  • Demerits: patching the kernel is a somewhat
    difficult and risky job
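
To make per-connection monitoring concrete, here is a small sketch that parses one connection-tracking entry of the kind Linux 2.4 exposes in /proc/net/ip_conntrack. The line layout is assumed from typical 2.4-era output and varies between kernel versions; the function name and the sample entry are hypothetical.

```python
def parse_conntrack_line(line: str) -> dict:
    """Split one ip_conntrack entry into original and reply tuples."""
    fields = line.split()
    entry = {"protocol": fields[0], "timeout": int(fields[2]),
             "state": None, "original": {}, "reply": {}}
    direction = "original"
    for token in fields[3:]:
        if "=" in token:
            key, value = token.split("=", 1)
            # A second "src=" marks the start of the reply-direction tuple.
            if key == "src" and "src" in entry["original"]:
                direction = "reply"
            if key in ("src", "dst", "sport", "dport"):
                entry[direction][key] = value
        elif not token.startswith("["):
            entry["state"] = token  # e.g. ESTABLISHED (present for TCP)
    return entry

# Made-up sample entry in the assumed layout.
SAMPLE_LINE = ("tcp      6 431999 ESTABLISHED src=192.168.1.10 dst=192.168.1.20 "
               "sport=32775 dport=22 src=192.168.1.20 dst=192.168.1.10 "
               "sport=22 dport=32775 [ASSURED] use=2")

entry = parse_conntrack_line(SAMPLE_LINE)
```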

10
Grid Application Measurement Factors
  • Grid network measurement parameter
  • Bandwidth
  • Delay
  • Jitter
  • Loss
  • Grid Application Performance Characteristic
  • Reliability
  • Availability
  • Survivability
  • Closeness

GRID Application
11
Grid network measurement parameter
  • Bandwidth
  • Bandwidth is defined most generally as data per
    unit time
  • Available bandwidth: the maximum amount of data
    per time unit that a hop or path can provide
    given the current utilization
  • Delay
  • The time between when the first part of an
    object passes an observational position and the
    time the last part of that object or related
    object passes
  • Jitter
  • The variation in the one-way delay of packets
  • Important in sizing playout buffers for
    applications requiring regular delivery of
    packets
  • Loss
  • One-way Loss
  • Roundtrip Loss
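
The parameters above can be sketched numerically as follows. The slide does not give formulas, so these are assumptions: available bandwidth is taken as capacity times the unused fraction, jitter as the mean absolute difference between consecutive one-way delays, and one-way loss as the fraction of sent packets that never arrived. All function names are hypothetical.

```python
def available_bandwidth(capacity_bps: float, utilization: float) -> float:
    """Capacity left over at the current utilization (0.0 to 1.0); assumed form."""
    return capacity_bps * (1.0 - utilization)

def jitter(one_way_delays: list) -> float:
    """Mean absolute difference between consecutive one-way delays; assumed form."""
    if len(one_way_delays) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(one_way_delays, one_way_delays[1:])]
    return sum(diffs) / len(diffs)

def one_way_loss(sent: int, received: int) -> float:
    """Fraction of sent packets that did not reach the receiver."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return (sent - received) / sent

# Example inputs (made up): delays in milliseconds, packet counts.
example_jitter = jitter([10, 12, 11, 15])
example_loss = one_way_loss(100, 97)
example_available = available_bandwidth(100e6, 0.25)
```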

GRID Network
12
Grid Application Performance Characteristic
  • Grid Performance events required
  • System info
  • cpu, available memory, available free storage,
    network utilization, how many clients can
    connect, failure rate, available disk
  • Data information
  • data type, size, current location
  • Data access info
  • read/write frequency, duration, size, user info
  • Network info
  • Bandwidth, latency, RTT, Packet Loss
  • Grid Application Performance Characteristic
  • Reliability
  • Availability
  • Survivability
  • Closeness

GRID Application
13
Reliability (1/3)
  • Reliability is the probability that a system
    functions properly and continuously over a fixed
    time period
  • The reliability of a grid network is the
    probability that a grid application program which
    runs on multiple processing elements and needs to
    communicate with other processing elements
    executes successfully
  • The reliability varies according to the
    conditions of the network (retransmission rate,
    tcpListenDrop rate), the accessibility of the
    network, and the TCP packet loss rate
  • Conditions of network
  • Retransmission is caused by not receiving ACKs
    fast enough, typically because of bad network
    hardware or a congested route
  • Retransmission rate = tcpRetransBytes /
    tcpOutDataBytes
  • ListenDrop counts the number of times that a
    connection was dropped due to a full listen queue
  • ListenDrop rate = tcpListenDrop / t
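
A short sketch of the two network-condition rates on this slide. It assumes the counters tcpRetransBytes, tcpOutDataBytes, and tcpListenDrop have already been sampled, and it treats t as the length of the observation interval in seconds, which the slide does not state. Function names are hypothetical.

```python
def retransmission_rate(tcp_retrans_bytes: int, tcp_out_data_bytes: int) -> float:
    """Retransmitted bytes as a fraction of all data bytes sent."""
    if tcp_out_data_bytes == 0:
        return 0.0  # nothing sent, so nothing was retransmitted
    return tcp_retrans_bytes / tcp_out_data_bytes

def listen_drop_rate(tcp_listen_drop: int, t: float) -> float:
    """Connections dropped on a full listen queue per unit of t (assumed seconds)."""
    if t <= 0:
        raise ValueError("t must be positive")
    return tcp_listen_drop / t

# Example with made-up counter values.
example_retrans = retransmission_rate(2_000, 100_000)
example_drops = listen_drop_rate(6, 60)
```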

14
Reliability (2/3)
  • Accessibility of network
  • Network accessibility is defined as the measure
    of the capacity of a location to be reached by,
    or to reach different locations
  • Accessibility rate = (ICMP input destination
    unreachable + ICMP output destination
    unreachable) / (ICMP input packets + ICMP output
    packets)
  • TCP Loss
  • Packet loss describes an error condition in which
    data packets appear to be transmitted correctly
    at one end of a connection, but never arrive at
    the other
  • TCPLoss rate = TCPLoss / TCP packets

15
Reliability (3/3) - Reliability Measurement
Step 1
Identify, from GT3, the servers of the grid network
that are available to run the grid application
Step 2
Calculate the network accessibility for the
servers used in the grid network to run the grid
application
Step 3
Calculate the TCP loss for all the servers used
in the grid network
Step 4
Calculate the network condition in the grid network
Step 5
Calculate the system failure rate λ and the system
repair rate µ = 1 − λ (λ is derived from the
accessibility rate, the network condition rate, and
the TCP loss rate)
Step 6
Calculate the grid service reliability in each
server (Reliability = µ/(µ + λ))
Step 7
Let the grid service reliability of server i be R_i
(for 0 < i ≤ m, where m is the number of servers).
Then we can calculate the mean and the variance.
The mean of grid service reliability is
(1/m) Σ R_i. The variance of grid service
reliability is taken about that mean over the same
m servers.
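
The last steps can be sketched as below. The sketch reads the garbled symbol on the slide as the failure rate λ, takes µ = 1 − λ and Reliability = µ/(µ + λ), and uses the population variance; the slide's own mean and variance formulas did not survive, so those choices are assumptions. The per-server failure rates are made-up inputs and the function name is hypothetical.

```python
from statistics import mean, pvariance

def service_reliability(failure_rate: float) -> float:
    """Reliability = mu / (mu + lambda), with mu = 1 - lambda (assumed reading)."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure rate must lie in [0, 1]")
    repair_rate = 1.0 - failure_rate
    return repair_rate / (repair_rate + failure_rate)

# Hypothetical failure rates for m = 3 servers.
failure_rates = [0.1, 0.2, 0.3]
reliabilities = [service_reliability(rate) for rate in failure_rates]

mean_reliability = mean(reliabilities)
# Population variance (divide by m); a sample variance would divide by m - 1.
variance_reliability = pvariance(reliabilities)
```

Because µ + λ = 1 under this reading, each server's reliability reduces to 1 − λ.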
16
Availability (1/2)
  • Network availability means the ability of rapid
    recovery in case of network failure
  • On detection of three dupacks, packet loss is
    assumed and the sender halves congestion window
    size
  • If congestion occurs, let Tf be the time at
    which the congestion occurs and Tt the time at
    which the window size recovers to its value
    before the congestion
  • MTTR = Tt − Tf
  • MTTF = the execution time of the grid
    application − MTTR
  • λ = congestion occurrence rate = 1/MTTF
  • µ = congestion repair rate = 1/MTTR
  • Network availability = MTTF/(MTTF + MTTR)
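
A worked sketch of the availability calculation for a single congestion event, taking MTTR = Tt − Tf, MTTF = execution time − MTTR, and availability = MTTF/(MTTF + MTTR). The operators are an assumed reading of the slide, and the function name and input values are hypothetical.

```python
def network_availability(t_recovered: float, t_congestion: float,
                         execution_time: float) -> float:
    """Availability for one congestion event during a run of the application."""
    mttr = t_recovered - t_congestion   # MTTR = Tt - Tf
    mttf = execution_time - mttr        # MTTF = execution time - MTTR
    if mttr < 0 or mttf < 0:
        raise ValueError("times are inconsistent")
    return mttf / (mttf + mttr)

# Made-up example: congestion at t = 10 s, window recovered at t = 12 s,
# total execution time 100 s.
example_availability = network_availability(12.0, 10.0, 100.0)
```

With these definitions MTTF + MTTR equals the execution time, so the result is the fraction of the run spent outside recovery.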

17
Availability (2/2)
18
Survivability (1/5)
  • Capability to provide a prescribed level of QoS
    for existing services after a given number of
    failures occur within the network
  • Property of a network to be resilient to failure
  • Used to describe the available performance of a
    network after a failure
  • Measures the degree of functionality remaining in
    a system after a failure and consists of
    evaluating metrics which quantify network
    performance during failure as well as normal
    operation
  • Monitoring Data Needs for Survivability Guarantee
  • determine the optimal resources for an
    application job
  • applications could use monitoring data to adapt
    themselves to the current situation
  • Fault detection and analysis
  • monitoring data is used to determine faults in
    system components and applications
  • monitoring data could also be used to find the
    cause of the faults

19
Survivability (2/5)
  • Survivability is enhanced by
  • Security techniques where applicable
  • Redundancy, diversity, general trust validation
  • Automated recovery support
  • Strategies for Survivability Guarantee

Mitigation/Masking Strategies
  • Network Restoration
  • Network Protection
  • Hardware Duplication
  • Software Fault Tolerance
  • Link/Site Diversity
  • Provisioning
  • Configurable parameters

Prevention Strategies
  • Design Centering
  • Software Modularity
  • Physical/GUI Design
  • Traffic Robustness
  • Environmental Robustness
  • Site Location/Integrity

Failure Events
  • Technology Failures
  • Operational activities
  • Procedural errors
  • Traffic overloads
  • Environmental incidents

[Diagram labels: Network Service View; Network]
20
Survivability (3/5) - Survivability Measurement
Factors
  • RTV (Residual Traffic Volume)
  • NPR (Network Path Protection Ratio)

RTV = tv / tn
tn: traffic volume before failure
tv: traffic volume after failure
Path Protection Ratio
wi: capacity of path i
ki: capacity of the possible alternate path
capacity (bits) = bandwidth (bits/sec) × round-trip
time (sec)
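
A small sketch of the two quantities whose definitions can be read from this slide: RTV as the ratio of traffic volume after failure to traffic volume before failure, and path capacity as bandwidth multiplied by round-trip time (the units, bits = bits/sec × sec, suggest that reading). The NPR formula itself is not recoverable from the slide and is not reproduced. Function names and input values are hypothetical.

```python
def residual_traffic_volume(tv_after: float, tn_before: float) -> float:
    """RTV = tv / tn: share of the pre-failure traffic that still flows."""
    if tn_before <= 0:
        raise ValueError("traffic volume before failure must be positive")
    return tv_after / tn_before

def path_capacity_bits(bandwidth_bps: float, rtt_sec: float) -> float:
    """Path capacity in bits = bandwidth (bits/sec) x round-trip time (sec)."""
    return bandwidth_bps * rtt_sec

# Made-up example values.
example_rtv = residual_traffic_volume(750.0, 1000.0)
example_capacity = path_capacity_bits(100e6, 0.02)
```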
21
Survivability (4/5)
  • Resource Reallocation Mechanism After
    Survivability Assessment

1. Monitoring Resource Creation
2. Performance Measurement Data Collection for
Resource
3. Survivability Assessment
5. Resource Reallocation Accomplishment
6. Survivability Assessment Result Collection
7. Survivability Assessment Result Reporting
8. Node Path Change Request
[Diagram labels: Registry; Grid Application Execution Nodes Path Change; Grid Application Execution Environment (OS/HW/Storage etc.)]
22
Survivability (5/5)
  • Reallocation Algorithm for Survivability
  • TPU average ProcessorUsage ()
  • TPUEssential ProcessUsage()
  • GPUGeneral ProcessUsage()

[Flowchart: Survivability Assessment Resource Reallocation Algorithm. Steps: Grid Application Resource Utilization Measurement; Recovery Resource Reallocation; Service Resource Recovery Available Compromise; Resource Utilization Re-measurement; Forced Exit of Service Available Compromise. Yes/No decisions: Total Utilization Datum Excess; Essential Service Utilization Datum Excess; Essential Service Utilization Increase]
23
Network Closeness (1/2)
  • A measure of the degree to which a node is
    adjacent to or can reach others in a network.
  • Closeness is usually measured by the number of
    steps it takes to reach others.
  • Network closeness is based on path-capacity
    measurements and hop counts.
  • Closeness Measurement Factors
  • Round Trip Time
  • Packet loss frequency
  • Throughput

24
Network Closeness (2/2)
  • Validity Assessment for Closeness

r: round-trip time, Rmax: maximum RTT, ploss:
packet loss frequency, th: throughput, thmax:
maximum throughput
  • α lies in the interval [0, 1]
  • Closeness Measurement Data Dependence
  • RTT and throughput factors
  • If α is closer to 1, Closeness depends more
    on throughput
  • If α is closer to 0, Closeness depends more
    on RTT
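
The closeness formula itself did not survive on this slide, so the sketch below is only one possible form that matches the stated behaviour: a weight between 0 and 1 shifts the score from the RTT term towards the throughput term, and packet loss scales the result down. The exact combination, the function name, and the inputs are all assumptions, not the authors' formula.

```python
def closeness(r: float, r_max: float, p_loss: float,
              th: float, th_max: float, alpha: float) -> float:
    """Hypothetical closeness score in [0, 1]; higher means closer."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rtt_term = 1.0 - r / r_max        # short round trips score high
    throughput_term = th / th_max     # high throughput scores high
    weighted = alpha * throughput_term + (1.0 - alpha) * rtt_term
    return (1.0 - p_loss) * weighted  # assumed: loss scales the score down

# Made-up measurements: RTT 50 of max 100, throughput 80 of max 100, no loss.
throughput_only = closeness(50, 100, 0.0, 80, 100, alpha=1.0)
rtt_only = closeness(50, 100, 0.0, 80, 100, alpha=0.0)
```

With the weight at 1 the score equals the normalized throughput, and with the weight at 0 it equals the normalized RTT term.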

25
Implementation of the monitoring system
  • Development Environment
  • Design spec. for Linux kernel based Information
    Collector
  • Kernel based network information gathering
    mechanism
  • Information Gathering Mechanism
  • Components of Information Collector
  • Information Collector daemon
  • Design spec. for GA-NMS Web Service
  • Example of GA-NMS protocol
  • Service Architecture
  • Examples of Implementation

26
Development Environment
  • Hardware platform
  • CPU: Intel Pentium III 600 MHz
  • Memory: 192 MB
  • Disk: 6.1 GB, 3.1 GB
  • Operating System
  • Red Hat Linux 7.3
  • Kernel 2.4.19
  • Running Environment
  • UNIX C
  • Kernel Module
  • Information Collector Module
  • Java (Jakarta Tomcat, Java WSDP (Web Services
    Developer Pack))
  • Information Provider Web Service

27
Design spec. for Linux kernel based Information
Collector (1/4)
Kernel based network information gathering
mechanism
28
Design spec. for Linux kernel based Information
Collector (2/4) - Information Gathering Mechanism
  • 1. Hooking
  • It replaces the existing protocol stack logic,
    which gathers network related information only
    in summary form, with logic that gathers network
    related information in detail (e.g. end-to-end
    bandwidth)
  • 2-1. Information gathering using kernel module
  • It gathers information from the protocol stack
    hooking layer
  • The protocol stack hooking layer hooks each
    protocol stack and stores network related
    information after processing it into a
    user-readable format
  • 2-2. Information gathering using kernel memory
    interface
  • Not completely supported on Linux
  • Common Unix environments provide an interface
    through which users can access kernel data
    (e.g. the Kernel Virtual Memory interface
    library (KVM))
  • 3. Data accumulating
  • The kernel module stores data in a filesystem
    that can be read at user level
  • Using ProcFS (Process information
    pseudo-Filesystem) avoids the load that a real
    filesystem would incur
  • 4. Information processing
  • A user application reads network monitoring
    parameters from ProcFS and processes them into
    network parameters for Grid applications

29
Design spec. for Linux kernel based Information
Collector (3/4) - Components of Information
Collector
  • Protocol stack hooking layer
  • It uses the Netfilter layer that is supported in
    Linux kernels 2.4.x to 2.6.x.
  • The Netfilter layer supports hooking into the
    protocol stack through user-supplied functions.
    It does not modify the protocol stack code, so it
    can process the information the kernel uses
    without modifying the original kernel data
  • Kernel module
  • It is based on the ip_conntrack kernel module
    supplied by the Netfilter layer.
  • Some code is added and modified to gather and
    process user-specific network parameters in detail
  • Information Collector daemon
  • It is a daemon that processes the network
    related information in ProcFS
  • It encodes the gathered information using an
    XML schema and sends it to the Web Service
    application
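
A minimal sketch of the encoding step performed by the Information Collector daemon: gathered parameters are turned into an XML document before being handed to the Web Service. The element and attribute names (networkInfo, parameter, host, name) and the sample values are hypothetical; the actual GA-NMS schema is not shown in the slides.

```python
import xml.etree.ElementTree as ET

def encode_metrics(host: str, metrics: dict) -> str:
    """Serialize gathered network parameters as a small XML document."""
    root = ET.Element("networkInfo", {"host": host})  # hypothetical element name
    for name, value in metrics.items():
        param = ET.SubElement(root, "parameter", {"name": name})
        param.text = str(value)
    return ET.tostring(root, encoding="unicode")

# Made-up parameter values for one monitored node.
xml_text = encode_metrics("node01", {"bandwidth": 94.2, "rtt": 12.5, "loss": 0.01})

# A receiver can parse the document back to recover the values.
decoded = ET.fromstring(xml_text)
rtt_text = decoded.find("parameter[@name='rtt']").text
```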

30
Design spec. for Linux kernel based Information
Collector (4/4)
Information Collector daemon
31
Design spec. for GA-NMS Web Service (1/3)
  • Definition of Service
  • Grid Application Network Monitoring Service
    (GA-NMS) supplies network monitoring parameters
    that are useful for Grid Applications in the Grid
    network
  • Messaging Protocol
  • It uses XML (eXtensible Markup Language) and SOAP
    (Simple Object Access Protocol) to communicate
    between services
  • Service Platform Specification
  • Service Platform
  • Java WSDP (Web Services Developer Pack), JAXM
    (Java API for XML Messaging) / Java
  • Information Collector
  • Linux Kernel module / C Language
  • Site Platform
  • Tomcat, Globus Toolkit 3.0 / JAVA (JSP)

32
Design spec. for GA-NMS Web Service (2/3)
Example of GA-NMS protocol
33
Design spec. for GA-NMS Web Service (3/3)
Service Architecture
34
Examples of Implementation
Main View
Statistics View