By Anil Kumar - PowerPoint PPT Presentation

1
D0 Taking Stock
  • By Anil Kumar
  • CD/CSS/DSG
  • July 10, 2006

2
D0 offline Production/Integration Infrastructure
  • 8 x 900MHz CPUs, 16G RAM
  • The machine has a Clariion 4500 hardware raid
    array.
  • Oracle Server 10gR2 (64 bit) on Solaris 2.9 (64
    bit)
  • Load Avg 4-6, CPU usage 77%, Memory Free 50%
  • Uptime excluding scheduled downtimes 99.97716%
    (based on 120 min of total db unavailability)
    since June, 2005 vs 99.85269%
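
The uptime figure on this slide can be checked with a short calculation. This is a minimal sketch, assuming a 365-day reporting window (the slide says only "since June, 2005"); the function name is illustrative.

```python
# Availability from total unplanned downtime over a reporting window.
MINUTES_PER_DAY = 24 * 60


def uptime_percent(downtime_min: float, window_days: float) -> float:
    """Percentage of the window during which the database was available."""
    window_min = window_days * MINUTES_PER_DAY
    return 100.0 * (1.0 - downtime_min / window_min)


# Assumption: a 365-day window; 120 min is the downtime quoted on the slide.
pct = uptime_percent(downtime_min=120, window_days=365)
print(f"{pct:.4f}")
```

With these inputs the result is about 99.9772, which agrees with the slide's 99.97716 to within rounding of the last digit.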

3
D0 offline production Luminosity
  • Sun v40z with 2 AMD 844MHz CPU
  • RHEL3 x86_64
  • 2 Ultra 160 RAID controllers
  • 16G RAM
  • Oracle server 10gR1

4
D0 offline development Infrastructure
  • D0ora1
  • Sun E4500
  • 8 x 400 MHz CPUs, 4GB of RAM
  • Oracle 10gR2 64 bit on Solaris 2.9
  • Load Avg 1-2, CPU usage 33%, Mem Free 19%
  • D0lum1
  • Sun v40z with 2 844MHz AMD CPU
  • RHEL3 x86_64
  • 16G RAM
  • Oracle server 10gR2

5
Space Usage

6
Space Usage Summary
  • d0ofprd1 1285 GB used.
  • d0ofint1 103 GB used.
  • 2.25TB is available for use for int and
    production.
  • d0ofdev1 120 GB used.
  • 11GB is available for use.
  • d0oflumd 285GB used.
  • d0oflumi 482MB used.
  • 150GB is available for d0oflumd and
    d0oflumi.
  • d0oflump 363GB used.
  • 411 GB is available.

7
Capacity Planning
  • Next three years expected growth 1.1 TB
  • SAM growth 375GB/year and other apps
    15GB/year. This excludes the Luminosity DB.
  • We have around 2.2TB available.
  • Luminosity growth is 125GB/year.
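
The growth figures above reduce to simple arithmetic. A minimal sketch, assuming decimal units (1 TB = 1000 GB) and a constant yearly rate; the constant and function names are illustrative and do not come from any DSG tool.

```python
# Yearly growth rates quoted on the slide, in GB.
SAM_GB_PER_YEAR = 375
OTHER_APPS_GB_PER_YEAR = 15
# "around 2.2TB available", taken as decimal TB (assumption).
AVAILABLE_GB = 2200


def projected_growth_gb(years: int) -> int:
    """Growth of SAM plus other apps, excluding the Luminosity DB."""
    return (SAM_GB_PER_YEAR + OTHER_APPS_GB_PER_YEAR) * years


three_year_growth = projected_growth_gb(3)
headroom_after_three_years = AVAILABLE_GB - three_year_growth
years_until_full = AVAILABLE_GB / (SAM_GB_PER_YEAR + OTHER_APPS_GB_PER_YEAR)

print(three_year_growth, headroom_after_three_years, round(years_until_full, 1))
```

The three-year total comes to 1170 GB, roughly in line with the slide's 1.1 TB, which would leave about 1 TB of headroom under these assumptions.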

8
Accomplishments
  • Upgraded D0 offline databases to 10gR2. Also OS
    upgrade for D0dbsrv nodes. Replacement of
    d0dbsrv5 node with new hardware and upgraded
    memory to 4GB vs 2GB.
  • Export of Trigger Database. Retention policy: 30
    days on disk, with a daily copy taken to Dcache.
  • Mini-trigger Simulator Set-up
  • Deployment of Lum Db in production 10gR2.
  • Quarterly Database Security/OS patches
    Up-to-date.
  • Upgrade OEM to 10g
  • Rewrite of dbatools/toolman for enhanced features
    of monitoring and 10g support.
  • Disk Capacity Upgrade on d0 offline production
    database.
  • Db Security Enhancements. Restricting access to
    Dictionary. Restricted Usage of Database Links.
    Password complexity rules and locking of
    obsolete accounts.

9
Back-up/Recovery
  • D0ofprd1
  • - Daily, 7 days of archives, one backup always
    on DISK
  • - Bi-monthly backup of READ ONLY tablespaces
  • - Allocated 2TB, used 1.2TB, to tape daily. RMAN
    backup time -> 6 hrs (3 hrs 45 min excl. READ
    ONLY, 2 hrs 20 min READ ONLY)
  • No Export
  • - Tape rotation 1 week for daily backups and 2
    months for Read Only backups.
  • - Backups taken to dcache 2x/week,
  • Read-Only taken 2x/month. Archives taken
    every 30 min.
  • D0lump Daily backups to SAN. To dcache daily.
    Archives taken every 30 min.
  • D0ofint1 Once a week on Local disk
  • D0ofdev1 Sat. backup on local disk otherwise on
    SAN
  • - Allocated 100GB, used 58GB, daily tape backup
  • RMAN backup time -> 2 hrs. Tape rotation 2
    months.
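
The two RMAN components quoted for d0ofprd1 can be added up to check the total backup window. A minimal sketch, assuming "3 Hrs 45" and "2 Hrs 20" mean hours and minutes; the variable names are illustrative.

```python
from datetime import timedelta

# Durations quoted on the slide for d0ofprd1 (minutes assumed for "45" and "20").
excluding_read_only = timedelta(hours=3, minutes=45)
read_only = timedelta(hours=2, minutes=20)

# Full backup window when the READ ONLY tablespaces are included.
total = excluding_read_only + read_only
hours, remainder = divmod(int(total.total_seconds()), 3600)
minutes = remainder // 60

print(f"full backup incl. READ ONLY: {hours} h {minutes:02d} min")
```

The sum is 6 h 05 min, consistent with the roughly six-hour figure on the slide.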

10
Production backups to SAN
  • Two 1TB SAN mount points in use on d0ora2
  • One in use on d0lum2
  • daily backup to SAN
  • Always 1 backup on disk, plus X200 tape library
    backup of RMAN from local disk, and dcache copy
  • Read-only portion of database backed up
    twice/month to SAN

11
SAN issues
  • Current SAN does not have 24 x 7 support.
  • IDE disks are not as reliable as other, more
    expensive disks.
  • However, these seem to be reliable. We run
    rman backup validate on the backup files on SAN.
    Also, a recovery was recently done after a
    restore from SAN.
  • Current SAN has been trouble free except for a
    path failure a couple of months ago. Because the
    SAN is not dual path, the failure prevented
    backups over the weekend, and without 24/7
    support we had to wait until Monday for help.
  • Purchasing a 24 x 7 SAN requires licensing and
    changes to the O/S to be able to use it.
  • Details on the future of SAN for the RunII
    databases will be covered in Ray P./Steve K.'s
    presentation.

12
SAM Schema
  • Production Deployments
  • Storage Location v6_1.
  • SAM Request Sub System v6_0
  • Work-in-progress
  • - v6_3 Retiring Files.
  • Upgrade to Mini SAM as SAM Schema Evolved. ->
    This lets individual developers keep a copy of
    SAM metadata and seed data available for a
    server software rewrite if needed.
  • Mini-SAM in Postgres. Initiative to move towards
    freeware databases for SAM.
  • Proof of product not complete; requires
    testing with a dbserver from the SAM development
    team.
  • 2.38B events in 47 partitions. Now averaging 1
    partition per 3 running weeks.
  • Partitions Rollover dates URL:
  • http://www-css.fnal.gov/dsg/internal/databs_appl/sam_event_partitions.html
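
The partition figures give a rough sense of scale. A minimal sketch, assuming events are spread evenly across partitions and that the current rate of one partition per three running weeks holds; `partitions_needed` is an illustrative helper and is not part of SAM.

```python
import math

TOTAL_EVENTS = 2.38e9          # "2.38B events" from the slide
PARTITIONS = 47
WEEKS_PER_PARTITION = 3        # current average, in running weeks

# Even spread across partitions is an assumption; real partitions vary in size.
avg_events_per_partition = TOTAL_EVENTS / PARTITIONS


def partitions_needed(running_weeks: int) -> int:
    """Partitions to cut ahead of time for a given number of running weeks."""
    return math.ceil(running_weeks / WEEKS_PER_PARTITION)


print(f"{avg_events_per_partition / 1e6:.1f}M events per partition on average")
print(partitions_needed(52), "partitions for 52 running weeks")
```

Under these assumptions the average partition holds about 50.6M events, and a full year of running would call for 18 new partitions.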

13
What's Next?
  • Deploy san/enstore backup recovery plan.
  • Replacement of Aging Clariion Array.
  • Maybe new d0dbsrv nodes, at least for the primary
    nodes. The Luminosity DB server is 2 times as
    performant with the C caching server, but that
    causes intermittent crashes of other Calib
    servers. Maybe dedicated nodes for the Luminosity
    servers.
  • Cut New event Partitions for SAM
  • ASO (Advanced Security Option) deployment.
  • Upgrade Designer and its repository to 10g
  • Bundling of Red Hat renewal licenses into one P.O.
  • Testing of Postgres Mini-SAM for proof of
    product.

14
Concerns
  • Python Dcoracle to be built with Oracle 10g
    Client. Oracle recommends that the client version
    be the same as the database version.
  • Any Oracle patch may break Python Dcoracle
    built with the 8i client.
  • Backups will get bigger, so backup of a VLDB
    becomes a concern.
  • SAM servers on Linux? Security audits may
    mandate dedicated nodes for SAM servers and web
    servers.
  • Not Enough Space for Integration db to do full
    refresh of SAM.
  • Single points of failure in the D0 offline
    database.
  • Future of the aging Clariion array must be
    addressed in the next budget.
  • Hardware for D0 DB server machines is very old.
    Should consider upgrading the hardware for d0 db
    servers.
  • Performance graphs are no longer posted by the
    10gR2 monitoring tool.