Experiment: Step by Step

1
Experiment: Step by Step
  • Author: Anna Bekkerman
  • abekkerm_at_ecs.umass.edu

2
Setup
[Diagram. Labels: Client, Server, Node, LMM, Target system, Control signals, Data]

3
Configuration File
  • Describes an experiment
  • Nodes
  • IP addresses, types (SOCC node/radar node), etc.
  • Commands to start/stop involved processes
  • Collected metrics (CPU/memory utilization, etc.)
  • Monitored processes
  • Net control parameters
  • Delays, drop rates
  • Refresh rates
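
The items above can be pictured as a small data model. The sketch below is only an illustration: the actual RAPIDS configuration file format is not shown in these slides, so every class and field name here (NodeConfig, ExperimentConfig, refresh_interval_s, and so on) is a hypothetical stand-in, as are the "socc"/"radar" type strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeConfig:
    ip: str
    node_type: str                      # "socc" or "radar" (assumed spelling)
    start_command: str                  # command that starts the involved process
    stop_command: str                   # command that stops it
    metrics: List[str] = field(default_factory=list)
    monitored_processes: List[str] = field(default_factory=list)


@dataclass
class ExperimentConfig:
    nodes: List[NodeConfig]
    delay_ms: float = 0.0               # net control: delay added to packets
    drop_rate: float = 0.0              # net control: fraction of packets dropped
    refresh_interval_s: float = 1.0     # seconds between collection sessions


def validate(cfg: ExperimentConfig) -> List[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    if not cfg.nodes:
        problems.append("no nodes defined")
    for node in cfg.nodes:
        if node.node_type not in ("socc", "radar"):
            problems.append(f"{node.ip}: unknown node type {node.node_type!r}")
    if not 0.0 <= cfg.drop_rate <= 1.0:
        problems.append("drop_rate must be between 0 and 1")
    if cfg.delay_ms < 0:
        problems.append("delay_ms must not be negative")
    if cfg.refresh_interval_s <= 0:
        problems.append("refresh_interval_s must be positive")
    return problems
```

Validating the description before any LMM is started keeps a typo in the file from surfacing only in the middle of an experiment.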

4
Start LMMs
  • When started, the RAPIDS server:
  • Binds two ports:
  • 49162, to communicate with LMMs
  • 8888, to communicate with RAPIDS clients
  • Reads the configuration file
  • Starts LMMs on all nodes through SSH connections
  • Waits for ack signals from all LMMs
  • Starts setting the LMMs up according to the
    configuration file

FIXME: The server waits indefinitely for the acks
from all LMMs. A time-out mechanism should
be introduced.
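
One possible shape for the missing time-out is sketched below. It is not the RAPIDS implementation: wait_for_acks and its poll_ack callback are hypothetical names, and poll_ack is assumed to block briefly and return either the id of an LMM that has just acked or None.

```python
import time
from typing import Callable, Iterable, Optional, Set


def wait_for_acks(
    expected: Iterable[str],
    poll_ack: Callable[[], Optional[str]],
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Set[str]:
    """Wait until every expected LMM has acked or the deadline passes.

    Returns the set of LMMs that never acked (empty on success).
    """
    pending = set(expected)
    deadline = clock() + timeout_s
    while pending and clock() < deadline:
        lmm_id = poll_ack()          # hypothetical: an LMM id, or None if nothing arrived
        if lmm_id is not None:
            pending.discard(lmm_id)
    return pending
```

The caller can then report exactly which nodes failed to respond instead of blocking forever.
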
5
Set LMMs Up
  • A home-made protocol is used to set LMM
    parameters
  • Examples of commands sent from the server to
    LMMs:
  • STM: set metric
  • STP: set monitored process
  • STE: set start-up command
  • STT: start
  • SPP: stop
  • When a parameter is set, the LMM sends an ack signal
    back to the server
  • At the end of each step, the server waits for acks
    from all LMMs
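
To make the command list concrete, here is a minimal sketch of how such messages could be encoded and decoded. Only the three-letter codes come from the slides. The wire format (one line per command, code followed by a space and an argument) and the names Command, encode and decode are assumptions made for illustration.

```python
from enum import Enum
from typing import Tuple


class Command(Enum):
    SET_METRIC = "STM"
    SET_PROCESS = "STP"
    SET_STARTUP = "STE"
    START = "STT"
    STOP = "SPP"


def encode(cmd: Command, arg: str = "") -> str:
    """Build one protocol line: the code, an optional argument, a newline."""
    return f"{cmd.value} {arg}".strip() + "\n"


def decode(line: str) -> Tuple[Command, str]:
    """Split a protocol line back into its command and argument."""
    code, _, arg = line.strip().partition(" ")
    return Command(code), arg   # raises ValueError for an unknown code
```

For example, encode(Command.SET_METRIC, "cpu") yields the line "STM cpu", and decoding that line returns the same command and argument.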

6
Start Monitoring
  • When an LMM receives the start command:
  • If needed, the network control application is started
  • The network control application runs only if
    iptables is turned on.
  • iptables selects IP packets (as specified in the
    iptables rules) and queues them for processing by
    the application.
  • The application introduces delays and/or drops
    packets according to the settings in the
    configuration file.
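
The per-packet decision can be sketched as follows. This models only the choice between dropping and delaying. It does not touch iptables or the packet queue, and packet_verdict with its parameters is a hypothetical name, not part of the network control application.

```python
import random
from typing import Optional


def packet_verdict(drop_rate: float, delay_ms: float, rng: random.Random) -> Optional[float]:
    """Decide what to do with one queued packet.

    Returns None if the packet should be dropped, otherwise the number of
    milliseconds to hold it before releasing it.
    """
    if rng.random() < drop_rate:     # drop_rate is a fraction between 0 and 1
        return None
    return delay_ms
```

Passing the random generator in explicitly makes a run repeatable: the same seed produces the same sequence of drops.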

7
Start Monitoring
  • RAPIDS Message Queues (RMQ) are initialized
  • A mechanism used for communication between
    RAPIDS and monitored applications.
  • See more in the RMQ section.

8
Start Monitoring
  • Heartbeat applications are started
  • Send "I'm alive" signals from radar nodes to
    SOCC nodes.
  • If a signal has not been received, RAPIDS
    reports link failure.
  • FIXME: A timeout mechanism should be added to
    minimize false alarms.
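
A timeout of the kind the FIXME asks for could look like the sketch below: a link is reported as failed only after no heartbeat has arrived for a configurable period, so a single late signal does not raise an alarm. HeartbeatMonitor and its methods are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat per node and flags links that have gone quiet."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def beat(self, node: str, now: float) -> None:
        """Record an "I'm alive" signal received from node at time now."""
        self.last_seen[node] = now

    def failed_links(self, now: float) -> List[str]:
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

Setting the timeout to a few heartbeat periods trades slower failure detection for fewer false alarms.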

9
Start Monitoring
  • Processes are started
  • Commands are specified by the user in the
    configuration file

10
Start Monitoring
  • Collection sessions are started every t seconds
  • According to the refresh rates provided by the user
    in the configuration file

11
Collection Session
  • During each collection session, the LMM:
  • Collects metrics
  • Reads the events accumulated in the RMQ
  • Sends the metrics and events to the RAPIDS server
  • More details in the LMM section
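
The three steps of a session can be sketched as a single function. This is an illustration under assumptions: a standard queue.Queue stands in for the RMQ, and collect_metrics and send are hypothetical callbacks for the metric readers and the link to the RAPIDS server.

```python
import queue
from typing import Any, Callable, Dict


def collection_session(
    collect_metrics: Callable[[], Dict[str, float]],
    events: "queue.Queue[Any]",
    send: Callable[[Dict[str, Any]], None],
) -> None:
    """Run one collection session: gather metrics, drain events, send both."""
    metrics = collect_metrics()
    drained = []
    while True:
        try:
            drained.append(events.get_nowait())   # take everything queued so far
        except queue.Empty:
            break
    send({"metrics": metrics, "events": drained})
```

Draining without blocking means a session never waits for new events; whatever arrives later is picked up by the next session.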

12
Stop Monitoring
  • When the server is stopped, it sends stop
    commands to all LMMs
  • Upon receiving the stop signal, the LMM:
  • Stops launching collection sessions
  • Stops processes, using the commands specified by
    the user in the configuration file
  • Stops heartbeat applications
  • Deletes the RMQ
  • Stops network control applications

13
What Might Go Wrong?
If untrappable signals (SIGKILL and SIGSTOP)
are used to kill the server, the shut-down
procedures will not be executed!
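
The difference between trappable and untrappable signals can be demonstrated with Python's standard signal module on a POSIX system. The sketch below is not the RAPIDS server code: install_shutdown_handler and is_trappable are hypothetical helpers, and the cleanup callback stands in for the shut-down procedures.

```python
import signal
from typing import Callable, Iterable


def install_shutdown_handler(
    cleanup: Callable[[int], None],
    signals: Iterable[int] = (signal.SIGINT, signal.SIGTERM),
) -> None:
    """Run cleanup(signum) when one of the given trappable signals arrives."""
    def handler(signum, frame):
        cleanup(signum)
    for sig in signals:
        signal.signal(sig, handler)   # must be called from the main thread


def is_trappable(sig: int) -> bool:
    """Return True if a handler can be installed for sig."""
    try:
        previous = signal.signal(sig, signal.SIG_DFL)
    except (OSError, ValueError):
        return False                  # the OS refuses, as for SIGKILL and SIGSTOP
    if previous is not None:
        signal.signal(sig, previous)  # put the original handler back
    return True
```

Stopping the server with SIGTERM or SIGINT therefore gives it a chance to send stop commands to the LMMs, while SIGKILL does not.
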
14
What Might Go Wrong?
  • If the commands provided by the user do not stop all
    processes, the LMM will hang waiting for their
    termination.
  • While an LMM is hanging, the port used for
    communication with the server remains unreleased,
    which means that a new experiment cannot be
    started until the LMMs are stopped and all necessary
    clean-up procedures have been completed.

15
What Might Go Wrong?
  • FIXME: Network control applications do not always
    react to the termination signal properly.
  • Symptom: sometimes a number of zombie processes
    appear.