Database Services for Physics @ CERN: Deployment and Monitoring

1
Database Services for Physics @ CERN: Deployment
and Monitoring
  • Radovan Chytracek
  • CERN IT Department

2
Outline
  • Database services for physics
  • Status today
  • How do we run the service tomorrow?
  • Performance tuning
  • Monitoring
  • Oracle monitoring sensors
  • Status of monitoring
  • Conclusions

3
Database Services for Physics
  • Mandate
  • DB application deployment, DB administration,
    consulting
  • Data challenges, data distribution between CERN
    and T1 centers (3D project)
  • Scope
  • LHC and non-LHC experiments
  • ATLAS, CMS, LHCb, ALICE, COMPASS, HARP
  • LHC Grid applications
  • FTS, LFC, VOMS, GridView
  • Preparation activities for LHC start-up increase
    the requirements on the DB service
  • Number of DB servers, data volume, availability,
    scalability, automated deployment procedures
  • Having a scalable and reliable service is the
    priority

4
Database Services for Physics Today
  • HW infrastructure
  • Oracle Sun cluster
  • Set of single DB instances
  • SUN cluster overloaded
  • Difficult to isolate existing applications
  • Not the fastest storage
  • Still on Oracle 9i
  • Maintenance issues for single DB instances
  • Many used to run as a stop-gap to off-load the SUN
    cluster, now phased out
  • No load balancing or fail-over
  • Complex maintenance and backup

5
Towards a Scalable Service for LHC
  • Deploying Oracle 10g RAC/Linux
  • Isolation (10g services, see the sketch below),
    Scalability (CPU and storage), Reliability
    (failover), Manageability (easier to administer)
  • Coordinating work-plan across several IT groups
  • Hardware now in place and acceptance tested
  • RAC configuration and functionality tests going
    on now
  • Working on automated DB Server install integrated
    with s/w installation tools used for OS (thanks
    to IT/FIO)
  • Setting-up several RAC systems
  • 4 x 2-node RAC for LHC experiments
  • 2-node integration and 4-node testing RACs
  • Migrating apps from SUN cluster to RAC by end of
    2005
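
The slides do not show how the 10g services used for isolation are
defined; purely as an illustration (the service name "atlas_app" is
hypothetical), a per-application service on a 10g database could be
created and started like this:

    -- Hypothetical sketch: one dedicated service per application, so
    -- its sessions can be managed and relocated between RAC instances
    -- independently of other applications.
    BEGIN
      DBMS_SERVICE.CREATE_SERVICE(
        service_name => 'atlas_app',   -- hypothetical name
        network_name => 'atlas_app');  -- name used in client connect strings
      DBMS_SERVICE.START_SERVICE('atlas_app');
    END;
    /
    -- Clients then connect with SERVICE_NAME=atlas_app instead of a
    -- shared SID, which is what gives per-application isolation.

In a RAC setup such services are typically administered with srvctl as
well; the PL/SQL form is shown here only to keep the sketch self-contained.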

6
Oracle RAC Architecture
7
Steps Towards a Reliable Service
  • Well defined pro-active deployment process
  • Proper planning of database capacity (volume and
    CPU)
  • Ensure the optimization of key applications
    before production starts
  • Classified database application types
  • Resource consuming applications
  • Guarantee of resources
  • Start low, increase as needed
  • Standard applications
  • Smaller database applications which can run in a
    shared service
  • Layered service implemented
  • Development Service (code development, low data
    volumes, no backup)
  • Integration and Validation Service (for key apps)
  • Enough resources for larger tests, consulting
    available, booking 2 months in advance
  • Production Service
  • Full production quality service (backup,
    monitoring, on call service)
  • Monitoring to detect new resource consuming
    applications or changes in access patterns

8
Performance tuning
  • Constant fight on three front-lines
  • HW (CPU, network, storage)
  • Server side (OS, DB, schema design)
  • Client side (bugs, wrong practices, queries)
  • HW can be improved by better iron
  • SW can only be kept safe by not making mistakes
  • New or upgraded apps have the same or new bugs
  • Good schema design is often difficult
  • Following good practices seems to be a tough job
    too
  • DBAs are inevitable
  • Spot and analyze the bad things, give advice,
    24/7

9
Tracing
  • Server side
  • Various levels; session tracing is the most used
    one (a sketch follows below)
  • Must ship the server trace file back to the user
  • Security issues, some development effort required
  • Supported by LCG SW (POOL Oracle plug-in)
  • Client side
  • Required to make the whole picture complete
  • Does not exist out-of-the-box
  • Application code instrumentation needed
  • Often connected to monitoring systems
  • Support being built into LCG SW
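
The slides do not say which tracing interface is used on the server
side; one standard Oracle 10g mechanism, shown here only as an
illustration (the session id and serial number are hypothetical), is
the DBMS_MONITOR package:

    -- Illustration: enable extended SQL trace for a single session.
    -- The SID and SERIAL# would normally be looked up in V$SESSION.
    BEGIN
      DBMS_MONITOR.SESSION_TRACE_ENABLE(
        session_id => 142,     -- hypothetical SID
        serial_num => 5071,    -- hypothetical SERIAL#
        waits      => TRUE,    -- include wait events in the trace
        binds      => FALSE);  -- bind values usually not needed
      END;
    /
    -- The trace file is written on the server (user_dump_dest), which
    -- is why it has to be shipped back to the user as noted above.
    -- Disable with: EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE(142, 5071);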

10
Monitoring
  • Allow DBAs and developers to inspect the current
    state of a database instance in an easy way,
    without the need for complex software
  • The goal is to enable database application level
    monitoring in a coherent way with the existing OS
    level monitoring provided by LEMON
  • Easy access via a web interface to quantities and
    trends describing current database instance
    behavior, keeping their history with the
    possibility to zoom into a given time period

11
Monitoring Metrics
  • Considered the OEM repository, but it requires the
    OEM infrastructure to be in place and not all
    instances are in OEM
  • What if OEM is down?
  • Data kept for only 1 month
  • Source: the instances' SYS.V$... performance views
  • The baseline DB metrics are extracted from the
    SYS.V$SYSSTAT dynamic performance view
  • Recalculated exactly the same way as done in OEM
  • Examples: SQL*Net in/out data rate, logical I/O,
    physical I/O, SQL per second
  • Application level monitored via the SYS.V$SESSION
    view (see the query sketch below)
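
The exact sensor queries are not reproduced in the slides; a minimal
sketch of how the baseline counters can be read from V$SYSSTAT, and of
an application-level view over V$SESSION, might look as follows (the
set of statistics is illustrative):

    -- Cumulative counters behind the baseline metrics; rates such as
    -- SQL per second are obtained by sampling periodically and
    -- differencing consecutive samples.
    SELECT name, value
      FROM v$sysstat
     WHERE name IN ('execute count',                           -- SQL executed
                    'session logical reads',                   -- logical I/O
                    'physical reads',                          -- physical I/O
                    'bytes sent via SQL*Net to client',        -- SQL*Net out
                    'bytes received via SQL*Net from client'); -- SQL*Net in

    -- Application-level view: sessions per user and client program,
    -- as seen in V$SESSION.
    SELECT username, program, COUNT(*) AS sessions
      FROM v$session
     WHERE type = 'USER'
     GROUP BY username, program;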

12
DB sensor for LEMON version I
  • SQL script executed via SQL*Plus (see the sketch
    below)
  • Connecting to the locally detected database
  • Shell driver script executed by a simple Perl
    sensor in LEMON framework
  • Detects local DB settings from the /etc/oratab file
    and the names of local Oracle daemons (pmon)

[Diagram: the LEMON framework activates the DB sensor every 5 minutes
and communicates with it via a pipe; the sensor captures stdout from
the driver script lemon_sensor.sh, which executes the query in
lemon_sensor.sql via SQL*Plus against the monitored DB instance and
writes the data to stdout; the results are stored in the LEMON DB.]
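
The actual lemon_sensor.sql is not included in the transcript; a
minimal sketch of such a SQL*Plus script, with the output formatted so
the shell driver can simply forward stdout to LEMON, could look like
this (the chosen metrics are illustrative):

    -- Hypothetical sketch of the sensor SQL script, run e.g. as:
    --   sqlplus -s "/ as sysdba" @lemon_sensor.sql
    SET HEADING OFF
    SET FEEDBACK OFF
    SET PAGESIZE 0
    SET LINESIZE 200

    -- One "name value" pair per line on stdout for the driver script.
    SELECT name || ' ' || value
      FROM v$sysstat
     WHERE name IN ('execute count',
                    'session logical reads',
                    'physical reads');

    EXIT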
13
DB sensor for LEMON version II
  • SQL queries still executed via SQL*Plus
  • Connecting to the locally detected or a remote
    database
  • The SQL*Plus tool is wrapped in a Perl class module
    (SQLPlus.pm)
  • Allows keeping a single permanent connection
  • Auto-detection of the DB instance and the SQL*Plus
    tool
  • API provided for DDL, DML and queries

[Diagram: the LEMON framework activates the Oracle sensor
oracle_sensor.pl every 5 minutes and communicates with it via a pipe;
the sensor uses the SQLPlus.pm module to execute queries against the
monitored DB instance through a single SQL*Plus instance and captures
the results; the data is stored in the LEMON DB.]
14
Monitoring status
  • DB LEMON sensors tested on various systems
  • Single DB instances, Oracle 9i/10g
  • LEMON databases
  • RAC systems, Oracle 10g
  • Web display of the metrics deployed in the LEMON
    development version
  • Little development needed
  • DB metadata read from OEM repository
  • Clicking a metric graph in the detailed view jumps
    to a zoomable time-period view similar to OEM
  • RAC cluster databases are shown as a computer
    cluster in LEMON

15
Next steps
  • Monitoring of the WAIT events is in progress (see
    the query sketch below)
  • Performance tuning is difficult without them
  • Deployment of the new LEMON DB sensor on all
    physics databases
  • Currently running on selected instances and a few
    RAC nodes
  • Oracle installation procedures need to be updated
    to include proper monitoring settings
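
The slides do not detail which views the WAIT event monitoring will
use; as an illustration, system-wide wait statistics can be read from
V$SYSTEM_EVENT (the list of excluded idle events is only indicative):

    -- Sketch: top wait events since instance start-up, skipping a few
    -- common idle events; TIME_WAITED is in centiseconds in 9i/10g.
    SELECT event, total_waits, time_waited
      FROM v$system_event
     WHERE event NOT IN ('SQL*Net message from client',
                         'rdbms ipc message',
                         'pmon timer',
                         'smon timer')
     ORDER BY time_waited DESC;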

16
Summary
  • Building DB services for LHC is a challenge
  • A well-defined, pro-active service is required
  • Performance tuning and testing are essential for
    the resource planning
  • Save resources through proper monitoring
  • For details about the LEMON system, see talk by
    Miroslav Siket later this afternoon

17
For the Curious
  • Check out the upcoming
  • LCG Database Deployment and Persistency Workshop
  • 17-19 October 2005
  • http://agenda.cern.ch/fullAgenda.php?ida=a055549