CDF Taking Stock - PowerPoint PPT Presentation

1
CDF Taking Stock
  • By Anil Kumar
  • CD/CSS/DSG
  • June 22, 2005

2
Current Infrastructure
3
Current Infrastructure
4
CDF Capacity All Applications
  • CDF Offline DB Growth: 43 (online) + 5G (offline)/year
  • Slow Control is not in Offline
  • CDF Online DB Growth: 50G/year

5
CDF Online Applications
6
CDF Offline Applications
7
Monitoring And Data Modeling Tools
  • Monitoring Tools
  • dbatool/toolman
  • To monitor space usage, users, SQL, tempspace,
    sniping of inactive sessions, auto start of
    Listener, IA, estimate table/index stats
  • OEM (Oracle Enterprise Manager)
  • DB monitoring tool; monthly charts posted on the
    web
  • DB Performance Charts:
    http://www-cdserver.fnal.gov/cd_public/css/dsg/db_stats/data/db_stats.html
  • The URL for the ganglia charts (monitoring tools)
    is http://fcdfmon2.fnal.gov/
  • Data Modeling Tool
  • Oracle Designer is used for data modeling and
    initial space estimates for applications.
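
The "sniping of inactive sessions" idea can be illustrated with a minimal sketch. This is not dbatool/toolman code: the Session record, its field names, and the 4-hour idle threshold are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Session:
    # Hypothetical session record; field names are illustrative only.
    sid: int
    user: str
    status: str          # e.g. "ACTIVE" or "INACTIVE"
    last_call: datetime  # time of the session's last activity


def sessions_to_snipe(sessions, now, max_idle=timedelta(hours=4)):
    """Return inactive sessions that have been idle longer than max_idle.

    The 4-hour default is an assumed threshold, not a documented setting.
    """
    return [
        s for s in sessions
        if s.status == "INACTIVE" and now - s.last_call > max_idle
    ]
```

A monitoring job could call sessions_to_snipe on each polling cycle and hand the result to whatever mechanism actually terminates the sessions.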

8
Uptimes
  • cdfonprd: 100%
  • cdfofprd: 99.4356%
  • 1776 minutes of unscheduled downtime since
    11/11/2004
  • CDF Replica: 100%
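
An uptime percentage of this kind can be derived from the downtime minutes and the length of the reporting window. The sketch below assumes whole-day windows. The function name is hypothetical, and the end date of the window is not stated on the slide, so the result need not reproduce the quoted figure exactly.

```python
from datetime import date

MINUTES_PER_DAY = 24 * 60


def uptime_percent(downtime_minutes, start, end):
    """Percentage of the window [start, end) during which the DB was up."""
    total_minutes = (end - start).days * MINUTES_PER_DAY
    if total_minutes <= 0:
        raise ValueError("end must be after start")
    return 100.0 * (1 - downtime_minutes / total_minutes)


# Assumed window: 11/11/2004 up to the presentation date (June 22, 2005).
cdfofprd_uptime = uptime_percent(1776, date(2004, 11, 11), date(2005, 6, 22))
```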

9
Accomplishments
  • Upgraded CDF databases to 9.2.0.6
  • Quarterly database security up to date
  • Tuned and regression-tested the streams
    replication as per current API usage.
  • Deployment of bzora1 for cdfonprd
  • Very smooth transition. No interruption to
    data taking!
  • Decommissioned b0dau35
  • Oracle backups for cdfonprd to DCache/Enstore
  • http://www-css.fnal.gov/dsg/external/cdfdbmtgs/all_other_documentation/bzora1.pdf
  • Deployed the long and eagerly awaited streams
    replication across CDF databases.
  • The hard work of css-dsg, spanning more than
    2 years, is finally in production. All issues
    encountered were addressed in a timely manner.
  • Smooth transition to fcdfora6 with streams
    replication.
  • Decommissioned fcdflnx1.
  • Implemented capture of long transactions in the DB.

10
Replication Tool
  • Streams Replication tool: strmrep
  • Production deployment of Streams Replication
    encountered two issues:
  • a) Replication of two packages, RUNDB and HDWDB,
    caused Streams to halt. Worked very hard to
    address the issue and deploy the workaround.
    A permanent bug fix was released by Oracle on
    Thu 06/16/05. This bug was not encountered in
    the integration test.
  • b) SAM can't be replicated using streams, since
    the SAM application has variable-length CLOBs
    and a functional index. There was not enough
    time to do a regression test, and no use case.
  • One more error after production deployment was
    causing one of the streams processes to halt.
    Deployed the workaround. Oracle found the root
    cause, and a bug fix will be available in 1-2
    weeks.
  • CDF Streams status on-line (courtesy Randy
    Herber):
    http://dbb.fnal.gov:8520/cdfr2/databases?typeora-strmsfsrccdfofpr2nsrccdfofpr2gsrccdfofpr2dcbkFILECATALOG
  • Documentation:
    http://www-css.fnal.gov/dsg/internal/ora_repl/

11
Freeware Db Support
  • MySQL/Postgres prototype
  • Proof of product with CDF data
  • Mechanism for population is on demand; it does
    not support updates
  • CDF successfully tested with CDF code
    (Karlsruhe)
  • DSG has begun to provide consulting for freeware
    databases
  • Actively maintaining new versions of mysql and
    postgres in KITS and working towards a more
    robust environment
  • Actively maintaining documentation for mysql and
    postgres in our freeware area.
  • Reference URL:
    http://www-css.fnal.gov/dsg/external/freeware
  • Actively assisting users with questions,
    upgrades, testing, etc. for freeware products.

12
Back-up
  • CDF ONLINE DATABASES
  • cdfonprd
  • - Daily, 7 days of archives, two backup copies
    always on disk
  • - Allocated 535GB, used 461GB (2 copies);
    backup time 1 Hr 23 Min vs. 2 Hrs 30 Min on old
    hardware
  • - CDF on-line backup to DCache/Enstore daily
  • cdfondev: Daily, 14 days of archives, one always
    on disk; cdfondev -> 2 Hrs 30 Min
  • cdfonint: Daily, 30 days of archives, one always
    on disk
  • - Allocated 356GB, used 219GB; cdfonint -> 2
    Hrs 15 Min
  • CDF OFFLINE DATABASES
  • cdfofprd: DFC + SAM, daily, 8 days of archives
    and export. One always on disk.
  • Allocated 270GB, used 25GB; backup time ->
    1 Hr 24 Min
  • cdfstrm1 is a replica of on-line and DFC. No
    backup -> RMAN/Tape.
  • cdfofdev: Daily, 7 days of archives;
    cdfofdev -> 3 Hrs 20 Mins
  • cdfofint: 2 times/week, 7 days of archives
  • Allocated 67GB, used 36GB (2 copies);
    cdfofint -> 230Mins
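
As a quick check on headroom, the allocated and used figures above can be turned into utilization percentages. This is only a sketch: each pair of figures is attributed to a database by its position in the list, which is an assumption, and the function name is hypothetical.

```python
# (allocated GB, used GB) as quoted on the slide; attribution of each pair
# to a database follows the list order and is an assumption.
backup_space_gb = {
    "cdfonprd": (535, 461),
    "cdfonint": (356, 219),
    "cdfofprd": (270, 25),
    "cdfofint": (67, 36),
}


def utilization_percent(allocated_gb, used_gb):
    """Share of the allocated backup space currently in use."""
    if allocated_gb <= 0:
        raise ValueError("allocated space must be positive")
    return 100.0 * used_gb / allocated_gb


for db, (allocated, used) in backup_space_gb.items():
    print(f"{db}: {utilization_percent(allocated, used):.1f}% of {allocated}GB used")
```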

13
Oracle Backup for cdfonprd to DCache/Enstore
  • RMAN to DCache/Enstore is working fine, but needs
    fine tuning to fit our (DSG) standard,
    firewall-independent backup mode.
  • Working reliably. Fully automated for dailies.
  • Data integrity tested twice during recovery.
  • Data integrity tested 4 times via md5sum
  • Not currently using weekly or monthly PNFS
    directory structure.
  • Intend to send weekly on Sunday and monthly on
    the 1st.
  • No archives being sent yet.
  • PNFS metadata maintenance being done manually.
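
Two of the points above lend themselves to a short sketch: an md5 integrity check, and the intended daily/weekly/monthly split. The function names are hypothetical, and letting the 1st of the month win over Sunday when they coincide is an assumption; the slide does not say how that case is handled.

```python
import hashlib
from datetime import date


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute an md5 hex digest in chunks, comparable to md5sum output."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_tier(day):
    """Classify a backup date as 'monthly', 'weekly' or 'daily'.

    Assumption: the 1st of the month takes precedence over Sunday.
    """
    if day.day == 1:
        return "monthly"
    if day.weekday() == 6:  # Monday is 0, Sunday is 6
        return "weekly"
    return "daily"
```

Comparing md5_of_file on the local backup piece with the digest of the copy read back from tape storage gives the same kind of check as running md5sum on both sides.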

14
RMAN Backup on SAN
  • Inexpensive, large disk array can accommodate
    growing RMAN backups
  • Fast reliable backup and recovery
  • 24 x 7 and 8 x 5 support tiers available
  • Can serve various O/S platforms
  • The briefing on database backup/recovery
    standardization on June 16 discussed the SAN
    testing in more detail.
  • http://www-css.fnal.gov/dsg/internal/briefings_and_projects/briefings/standardizing_database_backups.ppt
  • Multiplexing of archives to local disk and SAN

15
RMAN to SAN Experience
  • d0ofdev1 RMANs to SAN since Nov. 04
  • Two 1TB SAN mount points available
  • Keep 2 alternating days of RMANs on SAN,
    once/week to local backup disk
  • RMAN validation to determine backup file
    integrity
  • One validation failure since Nov. 04
  • Recoveries from SAN were all successful
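
The rotation described above (two alternating days on SAN, plus a weekly copy to local disk) could be scheduled along these lines. This is a sketch under stated assumptions: the mount point paths are hypothetical, and Sunday as the day of the weekly local copy is an assumption.

```python
from datetime import date

# Hypothetical paths standing in for the two 1TB SAN mount points.
SAN_MOUNTS = ("/san/rman_a", "/san/rman_b")


def san_target(day, mounts=SAN_MOUNTS):
    """Alternate between mount points so consecutive days never overwrite
    each other and the two most recent backups are always kept."""
    return mounts[day.toordinal() % len(mounts)]


def copy_to_local_disk(day, weekly_weekday=6):
    """True on the day of the once-a-week copy to local backup disk.

    Sunday (weekday 6) is an assumed choice.
    """
    return day.weekday() == weekly_weekday
```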

16
SAN issues
  • Current SAN does not have 24 x 7 support
  • IDE disks are not as reliable as other, more
    expensive disks
  • Purchasing 24 x 7 SAN requires licensing and
    changes to O/S to be able to use it
  • Firewall issues (CDF and D0 online)
  • We will be extremely careful in implementing SAN
    for bzora1.
  • On bzora1:
  • a) PCI card has been installed.
  • b) Fiber between CDF and FCC has been
    identified for use; we are waiting for
    additional SAN hardware for bzora1.

17
SAM Schema
  • Production Deployments
  • - Autodestination Sub-System of SAM schema
  • - Indexes on Param Values Deployed in
    production.
  • - Data Types correction cut.
  • - Indexes for Volumes
  • Work-in-progress
  • - Request Sub System of SAM Schema. Cut in
    Mini-sam.
  • Upgrade to Mini SAM as SAM schema evolved. ->
    This lets individual developers have a copy of
    SAM metadata and seed data available for a
    server software rewrite if needed.
  • Mini-SAM in Postgres: initiative to move towards
    freeware databases for SAM. Proof of product
    not complete; requires testing with a dbserver
    from the SAM development team.

18
What's Next?
  • Deploy SAN/Enstore backup recovery plan.
  • (Testing of SAN on D0 offline is
    work-in-progress)
  • Backup to DCache/Enstore already in place for CDF
    on-line
  • Re-allocate Winchester Disk Array from fcdflnx1
    to fcdfora1, so that there is enough space to
    reconfigure the streams integration setup.
  • Reconfigure Streams test env: cdfonint -> cdfofint
    -> cdfrep23
  • SAM Request Sub System schema deployment
  • Patch CDF database for replication of RUNDB and
    HDWDB packages.
  • (Patch was released by Oracle on Thu
    06/16/05)
  • Converting cdfonline to 64 bit. Testing will be
    a challenge.
  • O/S upgrade (reinstall) to 2.9 on b0dau36.
    Decommissioned Veritas.
  • Performance tuning on fcdfora4 to SGA > 2GB to
    allocate more memory to streams
  • Migrating Slow Control to Linux.
  • Rewrite of dbatools/toolman for enhanced
    monitoring features and 10g support.
  • Upgrade OEM to 10g. Work in progress.
  • Possible upgrade to 10g for its incremental
    database backups and enhanced streams
    replication features.
  • Testing of Postgres mini-SAM for proof of
    product.

19
Concerns
  • Replication of SAM depends upon the stress test
    results on fcdfora4.
  • Simulation of applications, as we have for CALIB.
    Robust test suite needed.
  • Single point of failure for SAM and DFC
  • Migration of DFC to SAM. Plan and schedule?
  • Close-out for Data Guard/Standby is still
    pending.
  • Move Slow Control off of bzora1
  • - Requires 3 instances
  • - OS: Linux? If Linux, then not
    a 24 x 7 machine.
  • Some CDF applications' data models are not in
    Designer.
  • What is CDF's direction, if any, with respect to
    freeware?
  • Any more Streams replicas?
  • Deputy CDF database liaison?
  • TNSNAMES deployment for CDF was a nightmare.
    The experience should be documented.
  • Special clean-up jobs should be coordinated with
    css-dsg
  • In case of hardware failure on offline, we have
    to reinstantiate replication vs. recovery, since
    we have partial backups on offline prd db.
  • Move off VxWorks from b0dau36.