Title: Control and monitoring of trigger algorithms using Gaucho


1
Control and monitoring of trigger algorithms
using Gaucho
  • Eric van Herwijnen
  • Wednesday 12th October 2005

2
Contents
  • The Problem
  • Gaucho architecture
  • Implementation
  • Experience
  • Conclusions

3
The problem
  • Control and monitor trigger (Gaudi) processes on
    event filter farm
  • Send monitoring data (counters, rates,
    histograms, status, error messages) to ECS
  • Configure jobs on the fly
  • Combine information from individual CPUs

4
GAUCHO architecture (1 process)
  • PVSS project with FSM; Gaudi jobs are device units
  • FSM command configure starts execution of the job
  • Gaudi job creates a DIM server
  • Gaudi job sends state ready to PVSS
  • FSM command start sends DIM command start to the
    Gaudi job
  • Gaudi job starts the event loop and sends state
    running to PVSS
  • Counters and histograms sent to PVSS
  • PVSS runs DIM clients to send commands and get
    data from Gaudi jobs
  • PVSS runs a DIM server to send accumulated data to
    ROOT

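
The command/state handshake on this slide (configure leads to ready, start leads to running) can be modelled as a small finite state machine. This is a hypothetical sketch: the class name, the `handle` method and the initial `NOT_READY` state are illustrative assumptions, and in the real system the commands travel from PVSS FSM device units to the job over DIM rather than as direct method calls.

```python
# Hypothetical model of the Gaucho job lifecycle on the architecture slide.
# Command and state names follow the slide; NOT_READY as the initial state
# is an assumption made for illustration.


class GaudiJobFSM:
    # (current state, FSM command) -> state the job reports back to PVSS
    TRANSITIONS = {
        ("NOT_READY", "configure"): "READY",  # job launched, DIM server created
        ("READY", "start"): "RUNNING",        # event loop started
    }

    def __init__(self) -> None:
        self.state = "NOT_READY"
        self.reported = []  # states sent to PVSS, in order

    def handle(self, command: str) -> str:
        key = (self.state, command)
        if key not in self.TRANSITIONS:
            raise ValueError(f"command {command!r} not allowed in state {self.state}")
        self.state = self.TRANSITIONS[key]
        self.reported.append(self.state)  # stands in for publishing via DIM
        return self.state


job = GaudiJobFSM()
job.handle("configure")  # -> "READY"
job.handle("start")      # -> "RUNNING"
```

Rejecting out-of-order commands (for example start before configure) keeps the job's reported state consistent with what the FSM in PVSS expects.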
5
Implementation
  • C++ Gaudi MonitorSvc allows the same code online and
    offline
  • PVSS panel structure:
      – Per job (counters, configuration, dynamic
        subscription to histograms on the transient
        store)
      – Per node (two jobs; counters and histograms
        summed/averaged)
      – Per subfarm (n nodes)
  • 30 datapoints/job, 10 DPEs each
  • 100 DIM services/job (some internal)
  • DIM services set up in a PVSSCTRL process
  • PVSS library to manipulate histograms (executed
    when panels are open)
  • Packaged as an LHCb JCOP Framework-compatible tool
  • ROOT viewer for 2D histograms and further
    analysis
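
The per-node and per-subfarm panels combine values from several jobs. A minimal sketch of that combination, assuming counters arrive as name/value dictionaries and histograms as lists of bin contents with identical binning. The slides do not say which counters are averaged rather than summed, so the `averaged` parameter, the counter names and both helper functions are hypothetical.

```python
from statistics import mean
from typing import Dict, FrozenSet, List, Sequence

Counters = Dict[str, float]
Histogram = List[float]  # bin contents; all inputs assumed to share one binning


def combine_counters(jobs: Sequence[Counters],
                     averaged: FrozenSet[str] = frozenset()) -> Counters:
    """Sum each counter over the inputs; names in `averaged` are averaged instead."""
    names = set().union(*jobs)
    combined: Counters = {}
    for name in names:
        values = [job[name] for job in jobs if name in job]
        combined[name] = mean(values) if name in averaged else sum(values)
    return combined


def combine_histograms(hists: Sequence[Histogram]) -> Histogram:
    """Add histograms bin by bin; refuse inputs with different bin counts."""
    if len({len(h) for h in hists}) > 1:
        raise ValueError("histograms have different binning")
    return [sum(bins) for bins in zip(*hists)]


# Per node: two jobs. Per subfarm: the same functions applied to node results.
job1 = {"events": 1000, "accepted": 40, "cpu_load_percent": 60}
job2 = {"events": 1200, "accepted": 50, "cpu_load_percent": 80}
node = combine_counters([job1, job2], averaged=frozenset({"cpu_load_percent"}))
# node["events"] == 2200, node["accepted"] == 90, node["cpu_load_percent"] == 70

print(combine_histograms([[1, 2, 3], [0, 1, 1]]))  # [1, 3, 4]
```

Averaging node averages at subfarm level is only exact when every node runs the same number of jobs, which holds for the two-jobs-per-node layout described here.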

6
Experience
  • First experience during RTTC was bad: too much CPU
    usage on the PVSS machine
  • Scripts rewritten; latest tests with 20 jobs on
    10 lxplus nodes are better
  • Tests with a dummy Gaudi job
  • Idle configuration for 1 node (2 jobs): 80 MB, 4%
    CPU (excluding PVSS itself)

7
(No Transcript)
8
(No Transcript)
9
(No Transcript)
10
(No Transcript)
11
(No Transcript)
12
(No Transcript)
13
Experience
  • 1 node, 2 jobs: 71% CPU usage on the PVSS machine
  • 38% PVSSCTRL, 20% PVSSEvent, 10% PVSSData, the rest
    other PVSS processes
  • Stopping jobs takes 2 secs; all processes reduce
    CPU consumption as expected
  • Now try 20 jobs over 10 lxplus nodes
  • Idle configuration: 225 MB, 8% CPU

14
(No Transcript)
15
Experience with 20 jobs
  • 2205 DIM services
  • CPU usage 100% on the PVSS machine
  • Viewing counters (10 secs) and histograms (20
    secs) OK
  • Proportions between PVSSCTRL, PVSSEvent and PVSSDim
    unchanged
  • Stopping jobs takes about 2 minutes
  • CPU usage correctly drops
  • Some unexplained crashes of PVSSDIM; memory usage
    stays high after stopping
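
Scaling the per-job figures from the Implementation slide linearly gives a rough sense of the load behind these numbers. This is a back-of-envelope sketch, assuming every job publishes the same number of datapoints and DIM services; `estimate_load` is a hypothetical helper, and the slides do not break down what the services beyond the per-job total are.

```python
# Back-of-envelope PVSS load estimate from the per-job figures on slide 5.
# The constants are the approximate values quoted there; estimate_load is a
# hypothetical helper, not part of Gaucho.

DATAPOINTS_PER_JOB = 30
DPES_PER_DATAPOINT = 10
DIM_SERVICES_PER_JOB = 100  # "some internal", per slide 5


def estimate_load(n_jobs: int) -> dict:
    """Scale the per-job figures linearly with the number of Gaudi jobs."""
    return {
        "datapoint_elements": n_jobs * DATAPOINTS_PER_JOB * DPES_PER_DATAPOINT,
        "dim_services": n_jobs * DIM_SERVICES_PER_JOB,
    }


est = estimate_load(20)
observed_services = 2205  # measured in the 20-job test (slide 15)
unaccounted = observed_services - est["dim_services"]
print(est, unaccounted)
# {'datapoint_elements': 6000, 'dim_services': 2000} 205
```

The linear model accounts for about 2000 of the 2205 services observed with 20 jobs; the remaining 205 are not explained in the slides.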

16
Conclusions
  • Performance is now reasonable
  • Next step: integration of Gaucho into the run control
    system of the LHCb event filter farm (November)
  • http://lhcb-comp.web.cern.ch/lhcb-comp/ECS/Gaucho/default.htm