Lemon - PowerPoint PPT Presentation

Slides: 11
Provided by: admo6

Title: Lemon


1
Lemon
  • Computer Monitoring at CERN
  • Miroslav Siket
  • CERN-IT/FIO-FS

2
Outline
  • Lemon: what is it?
  • Structure
  • Functionality
  • Metrics
  • Alarms
  • Web visualization

3
Lemon: LHC Era Monitoring
  • Lemon is a software package containing tools for
    monitoring the status and performance of
    computers (currently limited to Linux and Solaris)
  • It contains the following components:
      • Sensors (they measure individual metric
        values)
      • MSA (Monitoring Sensor Agent)
      • Monitoring Repository (a daemon that receives the
        metrics)
      • Monitoring Repository Backend (storage)
      • LRF (Lemon RRD tool framework: caching and web
        presentation tools)
      • Correlation Engines
      • Lemon Client (tool for retrieving data)
      • LAG (Laser Alarm Gateway: tool for passing
        alarms to the Laser system)
  • See http://cern.ch/lemon for more info

4
Lemon - schema

5
Sensor (MS) and Sensor Agent (MSA)
  • A sensor measures data on request from the MSA
  • The MSA receives the data from the sensor through a
    pipe
  • The MSA sends the data to the Monitoring Repository
    (MR) over a UDP socket
  • Typical communication between the two:
      • MSA forks sensor system
      • MSA: INI 1 LoadAvg
      • MSA: GET 1
      • Sensor: PUT 1 0.42
      • MSA sends a UDP packet to MR
  • The MSA controls the frequency and status of the
    individual sensors (several of them)
  • You can write sensors yourself (bash, C, Perl, ...)
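
The exchange above can be sketched as a small sensor written in Python. This is only an illustration built from the three messages shown on the slide (INI, GET, PUT); the real MSA protocol may have more commands and a different framing. The names handle_request, serve and MEASUREMENTS are hypothetical and not part of Lemon.

```python
import os

def read_load_average():
    """Return the 1-minute load average (Unix only)."""
    return os.getloadavg()[0]

# Hypothetical registry: metric name -> function that measures it.
MEASUREMENTS = {"LoadAvg": read_load_average}

def handle_request(line, active):
    """Handle one request line from the MSA; return the reply or None.

    `active` maps the id given in an INI request to a metric name.
    """
    parts = line.split()
    if len(parts) == 3 and parts[0] == "INI" and parts[2] in MEASUREMENTS:
        # "INI 1 LoadAvg": remember that id 1 stands for LoadAvg.
        active[parts[1]] = parts[2]
        return None
    if len(parts) == 2 and parts[0] == "GET" and parts[1] in active:
        # "GET 1": measure the metric and answer "PUT 1 <value>".
        value = MEASUREMENTS[active[parts[1]]]()
        return f"PUT {parts[1]} {value:.2f}"
    return None  # unknown or malformed requests are ignored in this sketch

def serve(requests, replies):
    """Read request lines from `requests`, write replies to `replies`."""
    active = {}
    for line in requests:
        reply = handle_request(line, active)
        if reply is not None:
            replies.write(reply + "\n")
            replies.flush()
```

Run as a sensor process, this would be started with serve(sys.stdin, sys.stdout), so that the pipe opened by the MSA carries the requests and replies.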

6
Metrics
  • Measured metrics (about 255):
      • Status: OS, disk DMA, RPM ok?, ethlink, ...
      • Daemons: sshd, ntpd, syslogd, friod, ... alive
      • File size of files: /etc/nologin, /afs/cern.ch, ...
      • Security: sshd md5chksum, ...
      • Performance: CPU utilization, memory utilization,
        network bandwidth use, ...
      • Misc: virtual organization number of jobs, smart
        status, temperature, ...
      • (see the list at
        http://cern.ch/lemon-status/metric_descriptions.php)
  • The status of the MSA can be seen in the
    /var/log/edg-fmon-agent.log file on each machine
    (the log file of the edg-fmon-agent daemon)
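
As an illustration of what the "file size" and "md5 checksum" metrics could measure, here is a minimal Python sketch. It is not the Lemon sensor code; the function names and the convention of returning -1 for a missing file are assumptions made for this example.

```python
import hashlib
import os

def file_size(path):
    """Size of the file in bytes, or -1 if it cannot be read.

    The -1 convention is an assumption of this sketch.
    """
    try:
        return os.path.getsize(path)
    except OSError:
        return -1

def md5_checksum(path, chunk_size=65536):
    """Hex MD5 digest of the file contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A security metric of this kind would compare the digest of a binary such as sshd with a known reference value and report a mismatch.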

7
Lemon at CERN
  • Lemon monitors about 2100 computers in 100
    clusters
  • On average it collects about 70 metrics from each
    host
  • Part of ELFms
  • Integrated with the Sure alarm system
  • Collects about 1 GB/day
  • Integrated with CDB
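
A back-of-the-envelope calculation from the figures on this slide gives a feeling for the data rate per host and per metric. It assumes 1 GB = 10^9 bytes and that the volume is spread evenly over hosts and metrics, which the slide does not state.

```python
HOSTS = 2100            # "about 2100 computers"
METRICS_PER_HOST = 70   # "about 70 metrics from each host"
BYTES_PER_DAY = 10**9   # "about 1 GB/day", assuming 1 GB = 10^9 bytes

# Number of (host, metric) pairs being collected.
metric_streams = HOSTS * METRICS_PER_HOST
# Average daily volume per host and per (host, metric) pair.
bytes_per_host = BYTES_PER_DAY / HOSTS
bytes_per_stream = BYTES_PER_DAY / metric_streams

print(f"metric streams:     {metric_streams}")
print(f"per host per day:   {bytes_per_host / 1000:.0f} kB")
print(f"per metric per day: {bytes_per_stream / 1000:.1f} kB")
```

Under these assumptions that is roughly 147,000 metric streams, about 476 kB per host per day and about 6.8 kB per metric per day.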

[Figure: ELFms diagram; surviving labels: "Node Configuration Management", "Node Management"]

8
Sure system
  • The Sure sensor compares the values of individual
    metrics with reference values and raises an alarm
    when the conditions are met
  • Examples:
      • LoadAvg > 20 raises the Load_high alarm
      • Number of sshd daemons < 1 raises the sshd_dead alarm
      • Number of Smart failures in /var/log/messages > 0
        raises the smart_failure alarm
  • Alarms are sent to the Sure servers
  • Operators acknowledge alarms, log them and, if
    unable to resolve them, notify the responsible person
  • Sysadmins receive an ITCM ticket for each alarm;
    there are procedures for handling them
  • Special case: the NO_CONTACT alarm
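
The threshold checks in the examples above can be sketched as a small rule table in Python. This is only an illustration of the idea, not the Sure implementation; the Rule type, the function name and the metric keys sshd_count and smart_failures are hypothetical, while the thresholds and alarm names come from the slide.

```python
import operator
from typing import Callable, NamedTuple

class Rule(NamedTuple):
    metric: str                               # key in the sample dict
    compare: Callable[[float, float], bool]   # e.g. operator.gt
    reference: float                          # reference value
    alarm: str                                # alarm raised when compare() holds

# The three examples from the slide; metric key names are hypothetical.
RULES = [
    Rule("LoadAvg", operator.gt, 20, "Load_high"),
    Rule("sshd_count", operator.lt, 1, "sshd_dead"),
    Rule("smart_failures", operator.gt, 0, "smart_failure"),
]

def raised_alarms(sample, rules=RULES):
    """Return the alarms whose condition holds for this sample.

    Metrics missing from the sample are skipped rather than alarmed on.
    """
    return [
        rule.alarm
        for rule in rules
        if rule.metric in sample
        and rule.compare(sample[rule.metric], rule.reference)
    ]
```

For example, raised_alarms({"LoadAvg": 25.0, "sshd_count": 1, "smart_failures": 0}) returns ["Load_high"].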

9
Web visualization and framework
  • LRF pre-processes part of the data from the Monitoring
    Repository and stores it in RRD files for
    fast visualization
  • It groups the logical units (nodes) into clusters
    based on:
      • CDB configuration database definition
      • user-defined clusters
      • HW type
      • Racks
  • A PHP-based web interface displays the preprocessed
    data on demand and, together with CDB and
    status information, gives a general overview
  • Check it at http://cern.ch/lemon-status
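
The idea of grouping nodes into clusters and pre-computing per-cluster values can be sketched in Python as follows. This is not LRF code: the function name, the "unassigned" fallback cluster and the choice of the mean as the aggregate are assumptions made for illustration, and the real framework stores its results in RRD files rather than in a dict.

```python
from collections import defaultdict
from statistics import mean

def cluster_averages(node_cluster, node_value):
    """Average one metric per cluster.

    node_cluster: node name -> cluster name (e.g. taken from CDB)
    node_value:   node name -> latest value of the metric
    Nodes without a cluster go to a hypothetical "unassigned" group.
    """
    grouped = defaultdict(list)
    for node, value in node_value.items():
        grouped[node_cluster.get(node, "unassigned")].append(value)
    return {cluster: mean(values) for cluster, values in grouped.items()}
```

Computing such aggregates ahead of time is what lets a web page show a cluster overview without querying every node's raw data on each request.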

10
Summary
  • Lemon provides monitoring information
    about the computers in the Computer Center at
    CERN
  • Thanks to its integration with Sure (the alarm
    system), it allows fast and easy identification
    and repair of problems
  • In connection with CDB, it allows an easier overview of
    services and visualization of their performance
  • In connection with Remedy (ITCM), it allows an overview of
    the problems for a given service