Title: Database Services for Physics @ CERN: Deployment and Monitoring
1 Database Services for Physics @ CERN: Deployment and Monitoring
- Radovan Chytracek
- CERN IT Department
2 Outline
- Database services for physics
- Status today
- How will we run the services tomorrow?
- Performance tuning
- Monitoring
- Oracle monitoring sensors
- Status of monitoring
- Conclusions
3 Database Services for Physics
- Mandate
- DB application deployment, DB administration, consulting
- Data challenges, data distribution between CERN and T1 centers (3D project)
- Scope
- LHC and non-LHC experiments
- ATLAS, CMS, LHCb, ALICE, COMPASS, HARP
- LHC Grid applications
- FTS, LFC, VOMS, GridView
- Preparation activities for LHC start-up increase the requirements for the DB service
- Number of DB servers, data volume, availability, scalability, automated deployment procedures
- Having a scalable and reliable service is the priority
4 Database Services for Physics Today
- HW infrastructure
- Oracle on a Sun cluster
- Set of single DB instances
- SUN cluster overloaded
- Difficult to isolate existing applications
- Not the fastest storage
- Still on Oracle 9i
- Maintenance issues for single DB instances
- Many used to run as a stop-gap to off-load the Sun cluster, now phased out
- No load balancing or fail-over
- Complex maintenance and backup
5 Towards a Scalable Service for LHC
- Deploying Oracle 10g RAC on Linux
- Isolation (10g services), scalability (CPU and storage), reliability (fail-over), manageability (easier to administer)
- Coordinating the work-plan across several IT groups
- Hardware now in place and acceptance tested
- RAC configuration and functionality tests going on now
- Working on an automated DB server install integrated with the s/w installation tools used for the OS (thanks to IT/FIO)
- Setting up several RAC systems
- 4 x 2-node RACs for the LHC experiments
- 2-node integration and 4-node testing RACs
- Migrating applications from the Sun cluster to RAC by end of 2005
6 Oracle RAC Architecture
7 Steps Towards a Reliable Service
- Well-defined, pro-active deployment process
- Proper planning of database capacity (volume and CPU)
- Ensure the optimization of key applications before production starts
- Classified database application types
- Resource-consuming applications
- Guarantee of resources
- Start low, increase as needed
- Standard applications
- Smaller database applications which can run in a shared service
- Layered service implemented
- Development Service (code development, low data volumes, no backup)
- Integration and Validation Service (for key apps)
- Enough resources for larger tests, consulting available, booking 2 months in advance
- Production Service
- Full production-quality service (backup, monitoring, on-call service)
- Monitoring to detect new resource-consuming applications or changes in access patterns
8 Performance tuning
- Constant fight on three front-lines
- HW (CPU, network, storage)
- Server side (OS, DB, schema design)
- Client side (bugs, wrong practices, queries)
- HW can be improved by better iron
- SW should be kept safe by not making mistakes
- New or upgraded apps have the same or new bugs
- Good schema design is often difficult
- Following good practices seems to be a tough job too
- DBAs are inevitable
- Spot and analyze the bad things, give advice, 24/7
9 Tracing
- Server side
- Various levels; session tracing is the most used one (a minimal sketch follows this slide's bullets)
- Must ship the server trace file back to the user
- Security issues, some development effort required
- Supported by LCG SW (POOL Oracle plug-in)
- Client side
- Required to make the whole picture complete
- Does not exist out-of-the-box
- Application code instrumentation needed
- Often connected to monitoring systems
- Support being built into LCG SW
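- To illustrate the server-side mechanism, a minimal sketch follows, assuming Oracle 10g and the standard DBMS_MONITOR package; the SID/SERIAL# values and the 'POOL_USER' schema are placeholders, not values from the actual service:

    -- Minimal sketch: enable SQL tracing for one user session (Oracle 10g).
    -- The SID/SERIAL# values below are placeholders read from V$SESSION.
    SELECT sid, serial#, username
      FROM v$session
     WHERE username = 'POOL_USER';   -- hypothetical schema name

    -- Switch tracing on for that session, including wait events and bind values
    EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => 42, serial_num => 1234, waits => TRUE, binds => TRUE);

    -- Reproduce the workload, then switch tracing off again; the trace file is
    -- written on the server side (user_dump_dest) and must be shipped back to the user
    EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => 42, serial_num => 1234);

- The raw trace can then be summarized with Oracle's tkprof utility before it is sent back to the developer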
10 Monitoring
- Allow DBAs and developers to inspect the current state of a database instance in an easy way, without the need for complex software
- Goal is to enable database application-level monitoring in a coherent way with the existing OS-level monitoring provided by LEMON
- Easy access via a web interface to quantities and trends describing the current database instance behavior, keeping their history with the possibility to zoom into a given time period
11 Monitoring Metrics
- Considered the OEM repository, but it requires the OEM infrastructure to be in place and not all instances are in OEM
- What if OEM is down?
- Data kept only 1 month
- Source: the instances' SYS.V$... dynamic performance views
- The baseline DB metrics are extracted from the SYS.V$SYSSTAT dynamic performance view
- Recalculated exactly the same way as done in OEM
- Examples: SQL*Net in/out data rate, logical I/O, physical I/O, SQL statements per second (a query sketch follows this slide's bullets)
- Application level monitored via the SYS.V$SESSION views
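- A minimal sketch of the kind of query behind such baseline metrics, assuming the standard V$SYSSTAT statistic names; the actual sensor query and the exact OEM-style rate calculation may differ:

    -- Sketch: cumulative counters behind the baseline metrics (V$SYSSTAT).
    -- Per-second rates are obtained by differencing two successive samples.
    SELECT name, value
      FROM v$sysstat
     WHERE name IN ('bytes sent via SQL*Net to client',
                    'bytes received via SQL*Net from client',
                    'session logical reads',   -- logical I/O
                    'physical reads',          -- physical I/O
                    'execute count');          -- SQL statements executed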
12 DB sensor for LEMON version I
- SQL script executed via SQLPlus (a sketch of such a script follows the diagram note below)
- Connecting to the locally detected database
- Shell driver script executed by a simple Perl sensor in the LEMON framework
- Detects local DB settings from the /etc/oratab file and the names of the local Oracle daemons (pmon)
[Diagram: the LEMON framework activates the DB sensor every 5 minutes and communicates with it via a pipe; the DB sensor captures the stdout of the driver script lemon_sensor.sh, which executes the query lemon_sensor.sql against the monitored DB instance via SQLPlus and writes the data to stdout; the results are stored in the LEMON DB.]
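- The contents of lemon_sensor.sql are not shown on the slide; the following is only a hypothetical sketch of such a SQLPlus script, assuming the sensor expects bare 'name value' pairs on stdout, and using an example application-level metric rather than the real one:

    -- Hypothetical lemon_sensor.sql-style script: SQLPlus directives so that
    -- only bare 'name value' pairs reach stdout and thus the LEMON pipe.
    SET HEADING OFF
    SET FEEDBACK OFF
    SET PAGESIZE 0

    -- Example metric only: number of sessions per status
    SELECT 'sessions_' || LOWER(status) || ' ' || COUNT(*)
      FROM v$session
     GROUP BY status;

    EXIT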
13 DB sensor for LEMON version II
- SQL queries still executed via SQLPlus
- Connecting to the locally detected or a remote database
- SQLPlus tool wrapped in a Perl class module
- Allows keeping a single permanent connection only
- Auto-detection of the DB instance and the SQLPlus tool
- API for DDL, DML and queries provided
[Diagram: the LEMON framework activates the DB sensor every 5 minutes and communicates with it via a pipe; the DB sensor captures the output of the Oracle sensor oracle_sensor.pl, which uses the SQLPlus.pm module to execute queries against the monitored DB instance through a persistent SQLPlus instance and to collect the results; the data are stored in the LEMON DB.]
14 Monitoring status
- DB LEMON sensors tested on various systems
- Single DB instances, Oracle 9i/10g
- LEMON databases
- RAC systems, Oracle 10g
- Web display of the metrics deployed in the LEMON development version
- Little development needed
- DB metadata read from the OEM repository
- Clicking a metrics graph in the detailed view jumps to a zoomable time-period view similar to OEM
- RAC cluster databases shown as a computer cluster in LEMON
15 Next steps
- Monitoring of the WAIT events is in progress (a query sketch follows this list)
- Performance tuning is difficult without these
- Deployment of the new LEMON DB sensor on all physics databases
- Currently running on selected instances and a few RAC nodes
- Oracle installation procedures need to be updated to include the proper monitoring settings
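- As a starting point for such wait-event monitoring, a minimal sketch assuming the standard V$SYSTEM_EVENT view; the actual sensor metrics may be defined differently, and the WAIT_CLASS column used here to skip idle events exists only from Oracle 10g onwards:

    -- Sketch: cumulative system-wide wait events, worst first.
    -- TIME_WAITED is in centiseconds; a sensor would diff successive samples.
    SELECT event, total_waits, time_waited
      FROM v$system_event
     WHERE wait_class <> 'Idle'    -- WAIT_CLASS requires Oracle 10g
     ORDER BY time_waited DESC;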
16 Summary
- Building DB services for LHC is a challenge
- A well-defined, pro-active service is required
- Performance tuning and testing are essential for resource planning
- Some resources can be saved by proper monitoring
- For details about the LEMON system, see the talk by Miroslav Siket later this afternoon
17 For the Curious
- Check out the upcoming LCG Database Deployment and Persistency Workshop
- 17-19 October 2005
- http://agenda.cern.ch/fullAgenda.php?ida=a055549