LCG 3D Project Status
1
LCG 3D Project Status & Plans
  • Dirk Düllmann, IT-DB
  • More details at http://lcg3d.cern.ch

2
Starting Point for a Service Architecture?
  • (Slide shows a diagram of the proposed tier architecture)
  • T0 - autonomous, reliable service
  • T1 - db backbone - all data replicated - reliable
    service
  • T2 - local db cache - subset of data - only local
    service
  • T3/4
3
Requirement Gathering
  • All participating experiments have signed up
    their representatives
  • Rather 3 than 2 names per experiment
  • A tough job, as the survey inside an experiment
    crosses many boundaries!
  • Started with a simple spreadsheet capturing the
    various applications
  • Grouped by application
  • One line per replication and Tier (distribution
    step) - a minimal sketch of such an entry follows
    this list
  • Two main patterns
  • Experiment data - data fan-out - T0 -> Tn
  • Grid services - data consolidation - Tn -> T0
  • But some exceptions, which need to be documented
  • Aim to collect the complete set of requirements
    for database services
  • Including data which is stored locally and never
    leaves a site
  • Needed to properly size the h/w at each site
  • Most data does not require multi-master
    replication
  • Good news, as most service people are scared of
    the deployment impact of multi-master replication
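
To make the spreadsheet structure concrete, here is a minimal sketch (in Python) of what one requirement-sheet entry per replication step might capture. The field names and the example values are illustrative assumptions, not the actual columns used by the 3D project.

```python
# Minimal sketch of a requirement-sheet entry, one per replication step and
# tier.  Field names and example values are illustrative assumptions, not the
# actual columns of the 3D spreadsheet.
from dataclasses import dataclass
from enum import Enum

class Pattern(Enum):
    FAN_OUT = "T0 -> Tn"        # experiment data distributed downwards
    CONSOLIDATION = "Tn -> T0"  # grid service data gathered upwards

@dataclass
class ReplicationStep:
    experiment: str     # e.g. "ATLAS"
    application: str    # e.g. "ConditionsDB"
    source_tier: str    # e.g. "T0"
    target_tier: str    # e.g. "T1"
    volume_gb: float    # estimated data volume for 2005
    pattern: Pattern
    multi_master: bool = False  # most applications do not need this

steps = [
    ReplicationStep("ATLAS", "ConditionsDB", "T0", "T1", 100.0, Pattern.FAN_OUT),
    ReplicationStep("LHCb", "FileCatalog", "T1", "T0", 20.0, Pattern.CONSOLIDATION),
]
for s in steps:
    print(f"{s.experiment:6s} {s.application:15s} "
          f"{s.source_tier}->{s.target_tier} {s.volume_gb:6.1f} GB")
```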

4
(No Transcript)
5
Experiment Requirements
  • Requirement discussion is progressing!
  • Experiment representatives had to find contacts
    inside their experiment
  • Those responsible for the applications are not
    yet prepared to give good estimates for data
    volume and its split onto tiers
  • Computing models are not yet very concrete in
    this area - and may only become more concrete
    after a first deployment of the database service
  • All participating experiments produced a first
    draft requirement sheet
  • Having a central contact on the experiment side
    is useful for consistency and prioritization
    among the requests
  • Need to complete requirement sheets from all
    experiments
  • To agree between experiments and service
    providers on what needs to be provided next year
  • Risks of wrong estimates should not go only to
    the service side!
  • If the numbers are wrong we'll iterate, but no
    request is interpreted as no service required
  • Propose to use the 3D requests also as experiment
    input for CERN service planning
  • E.g. as an estimate of the database backup
    required for 2005
  • The sooner we start iterating the quicker there
    may be convergence on a concrete service

6
Preliminary Summary for 2005
  • Main applications mentioned so far
  • FileCatalog, Conditions, Geometry, Bookkeeping,
    Physics Meta Data, Collections, Grid Monitoring,
    TransferDB
  • Suspect that several smaller(?) applications are
    still missing
  • No online request from CMS yet
  • No concrete input from EGEE/ARDA yet
  • Expect that File Catalog and File Meta Data
    volume needs to be determined by experiments -
    but need a deployment model!
  • Total volume per experiment 50-500 GB
  • Error bars likely as big as the current spread
  • Significant flexibility required from service
    side!
  • Number of applications to be supported: O(5)
  • Some still not existing or bound to MySQL at the
    moment
  • May need policy for distributed applications (eg
    require RAL or ODBC based implementation) to
    allow for a deployable service
  • Obviously common packages like ConditionsDB using
    RAL will help
  • Distributed data becomes read-only downstream of
    T0
  • Conservative approach for the first service
    deployment
  • may be reviewed later
  • Good news, as this means directed replication
    (rather than multi-master)

7
Application Software
  • The DB service will need to be coupled to real
    applications to validate the requirements
  • Many data items listed as candidates for the
    service use POOL-RAL or ODBC
  • Started working on a database replica catalog
    interface (a minimal sketch follows this list)
  • Avoid too much experiment/grid service code which
    embeds physical connection strings
  • Current prototype based on the POOL file catalog
  • Will support network-disconnected lookup based on
    simple XML files
  • Final back-end implementation may still change
    once a suitable grid-service can be found
  • Relevant work on database connection and error
    handling in ATLAS (and maybe other experiments)
  • Can/should we consolidate this software, e.g. as
    a 3D reference implementation within the
    Relational Abstraction Layer (currently in POOL)?
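
A minimal sketch of the disconnected lookup idea, assuming a simple XML catalog format: logical database names are mapped to physical connection strings, with a preferred local replica. The XML layout, element names and API below are assumptions for illustration; the actual prototype is built on the POOL file catalog.

```python
# Hypothetical sketch of a disconnected replica-catalog lookup from a local
# XML file.  The XML layout, element names and API are assumptions; the real
# prototype is based on the POOL file catalog.
import xml.etree.ElementTree as ET

CATALOG_XML = """
<dbcatalog>
  <logical name="conditions">
    <replica site="CERN" connection="oracle://lcg3d-t0/conditions"/>
    <replica site="FNAL" connection="oracle://lcg3d-fnal/conditions"/>
  </logical>
</dbcatalog>
"""

def lookup(logical_name: str, preferred_site: str, xml_text: str = CATALOG_XML) -> str:
    """Return a physical connection string for a logical database name,
    preferring a replica at the given site."""
    root = ET.fromstring(xml_text)
    replicas = root.findall(f"./logical[@name='{logical_name}']/replica")
    if not replicas:
        raise KeyError(f"no replicas known for {logical_name!r}")
    for r in replicas:
        if r.get("site") == preferred_site:
            return r.get("connection")
    return replicas[0].get("connection")  # fall back to the first replica

print(lookup("conditions", preferred_site="FNAL"))
```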

8
FroNtier
  • Frontier developers at FNAL have produced a
    working prototype of a POOL storage service based
    on Frontier
  • Allows Frontier objects, as defined in the LCG
    Dictionary, to be retrieved from POOL
    applications like any other objects
  • Read-only access
  • Some remaining integration issues
  • Plan to set up Squid caches as part of the 3D
    test bed to allow DB and non-DB approaches to be
    compared (see the access-pattern sketch below)
  • and to understand any issues with Squid web cache
    deployment
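
A hypothetical illustration of the access pattern this enables: read-only clients fetch query results over plain HTTP through a Squid proxy, so repeated identical requests are served from the cache rather than the database. The server URL, proxy address and payload format below are assumptions, not the real Frontier protocol.

```python
# Hypothetical illustration of read-only, cache-friendly access via a Squid
# web proxy.  The server URL, proxy address and payload format are
# assumptions, not the actual Frontier protocol.
import urllib.request

FRONTIER_URL = "http://frontier.example.org/query?table=conditions&run=1234"
SQUID_PROXY = "http://squid.example.org:3128"

proxy_handler = urllib.request.ProxyHandler({"http": SQUID_PROXY})
opener = urllib.request.build_opener(proxy_handler)

# Identical GET requests from many clients are answered by the Squid cache
# instead of the database server, which is what makes this attractive for
# read-only conditions-type data.
with opener.open(FRONTIER_URL, timeout=10) as response:
    payload = response.read()
print(f"received {len(payload)} bytes (possibly from cache)")
```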

9
Site Contacts
  • Established contact to several Tier 1 and Tier 2
    sites
  • Tier 1: ASCC, BNL, CERN, CNAF, FNAL, GridKa,
    IN2P3, RAL
  • Tier 2: ANL, U Chicago
  • Visited FNAL and BNL around HEPiX
  • Very useful discussions with experiment
    developers and database service teams there
  • Regular meetings have started
  • (Almost) weekly phone meeting back-to-back for
  • Requirement WG
  • Service definition WG
  • Current time (Thursday 16-18) is inconvenient for
    Taiwan
  • All above sites have expressed interest in
    project participation
  • BNL has started to set up its Oracle installation
  • Most sites have allocated and installed h/w for
    participation in the 3D test bed
  • U Chicago agreed to act as Tier 2 in the testbed

10
Replication Test-bed
  • Oracle Streams as the main technology for the
    Tier 0/1 database backbone
  • Significant experience at FNAL and CERN
  • Interest from all sites to participate in a
    replication test-bed
  • At most sites, hardware and an Oracle installation
    are already available
  • Agreed to start with Oracle 10g
  • Several problems discovered by FNAL have been
    fixed and tested
  • Likely production platform for next year
  • Oracle patches can still pick up any new problems
    observed in the testbed
  • Will exercise the POOL File Catalog as a test
    workload (see the latency-probe sketch after this
    list)
  • Main reason: we have an established workload
  • Expand to experiment applications/work-load from
    requirement table
  • Will run on different back-end storage/db setups
  • Different back end storage
  • Clustered and non-clustered instances
  • Assumption: need to be able to factor out
    back-end storage decisions, to leave sites
    unconstrained in their purchases in that area
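
One thing the test-bed workload could measure is Streams propagation latency between the T0 source and a T1 replica. Below is a hedged sketch of such a probe; the table name, columns and the use of injected Python DB-API connections (e.g. cx_Oracle) are assumptions, not the actual 3D test harness.

```python
# Hedged sketch of a replication-latency probe: insert a marker row at the
# T0 source and poll the T1 replica until it appears.  Table name, columns
# and the DB-API connections passed in are assumptions.
import time
import uuid

def measure_replication_latency(source_conn, replica_conn, timeout=300.0):
    marker = str(uuid.uuid4())
    t0 = time.time()

    cur = source_conn.cursor()
    cur.execute("INSERT INTO replication_probe (id, created) VALUES (:1, SYSDATE)",
                [marker])
    source_conn.commit()

    # Poll the replica until Streams has propagated the row (or we give up).
    rcur = replica_conn.cursor()
    while time.time() - t0 < timeout:
        rcur.execute("SELECT COUNT(*) FROM replication_probe WHERE id = :1",
                     [marker])
        if rcur.fetchone()[0] > 0:
            return time.time() - t0
        time.sleep(1.0)
    raise TimeoutError(f"row {marker} not replicated within {timeout} s")
```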

11
Service Policies
  • Several sites have deployment policies in place
  • E.g. FNAL
  • Staged service levels
  • Development -> Integration -> Production systems
  • Well defined move of new code / DB schema during
    development process
  • Apps developers and DB experts review and
    optimize schema before production deployment
  • Similar policy proposal prepared for CERN physics
    database services
  • To avoid recent interference between key
    production applications of different experiments
    on shared resources
  • Caused by missing indices, inefficient queries,
    inadequate hardware resources
  • Storage volume alone is not a sufficient metric
    to define a database service
  • Need a dummy workload for each key application to
    define and optimize the service (a sketch of such
    a description follows this list)
  • How many requests from how many clients on how
    much data are required?
  • Especially for distributed DB service some
    agreement like this will be essential to avoid
    surprises on either side
  • This may need resources from the database
    services which need to be planned at all/some
    sites
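
A sketch of the kind of per-application workload description this implies, capturing clients, request rates and data volumes rather than storage alone. All field names and numbers are illustrative assumptions, not agreed 3D figures.

```python
# Illustrative sketch of a per-application "dummy workload" description.
# All field names and numbers are assumptions, not agreed 3D figures.
workloads = {
    "ConditionsDB": {
        "concurrent_clients": 200,     # reconstruction jobs reading conditions
        "queries_per_client_s": 0.5,   # average query rate per client
        "working_set_gb": 50,          # hot data actually touched by queries
        "total_volume_gb": 300,        # stored volume, per the requirement sheet
        "write_share": 0.01,           # mostly read-only at T1 and below
    },
}

for app, w in workloads.items():
    total_qps = w["concurrent_clients"] * w["queries_per_client_s"]
    print(f"{app}: ~{total_qps:.0f} queries/s over {w['working_set_gb']} GB hot data")
```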

12
Service Split
  • Discussion only just starting
  • Early draft proposal received from FNAL
  • Local Services
  • Server installation, bug & security fixes
  • OS patches/upgrades
  • Backup/recovery support
  • Data migration (between db servers)
  • Shared Services
  • DB and OS accounts & privileges
  • Storage support (adding more space)
  • Monitoring (a space-threshold sketch follows this
    list)
  • DB alerts, killer queries & cron job output
  • Host system load, space thresholds
  • Performance problems & optimization
  • Site Communication
  • Proposal to set up a shared (web-based) log-book
    and mailing lists
  • Need to establish a regular DBA meeting
  • e.g. as part of the weekly/bi-weekly 3D meetings
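
As a concrete illustration of the shared monitoring items, here is a hedged sketch of a space-threshold check of the sort a monitoring cron job might run on a database host. The monitored paths and the 90% threshold are assumptions, not an agreed policy.

```python
# Hedged sketch of a space-threshold check such as a shared monitoring cron
# job might run.  Monitored paths and the 90% threshold are assumptions.
import shutil

THRESHOLD = 0.90                      # alert when a filesystem is >90% full
MONITORED = ["/", "/oradata"]         # hypothetical database volumes

def check_space(paths=MONITORED, threshold=THRESHOLD):
    alerts = []
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except FileNotFoundError:
            continue                  # path not present on this host
        fraction = usage.used / usage.total
        if fraction > threshold:
            alerts.append(f"{path}: {fraction:.0%} full")
    return alerts

for line in check_space():
    print("ALERT", line)              # in practice: mail the DBA list / log-book
```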

13
Other areas of ongoing activity
  • Network connectivity
  • First discussions about firewall configuration
    with LCG security
  • Well defined T0-T1 connectivity not considered a
    big issue
  • Need to expose proposal for T2 to a larger
    community of site admins
  • Database Administration Tools
  • Oracle Enterprise Manager (now called Grid
    Control) setup planned for testbed
  • Match with service split and stability in the WAN
    need to be confirmed
  • Application and Database Monitoring
  • Planning client-side diagnostics in POOL RAL
  • Evaluating OEM for joint server-side diagnostics
  • Database Authentication and Authorisation
  • First ideas from the LCG GD side to be presented
    at the 3D w/s
  • Oracle expert will provide information about
    integration with the Oracle back-end
  • Need to come up with a simple set of database
    roles to which individual grid users can be
    mapped (based on VOMS?) - see the mapping sketch
    after this list
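
A hedged sketch of what such a mapping could look like: VOMS groups/roles are mapped onto a small, fixed set of database roles, with the most specific matching group winning. The group names and the role set are illustrative assumptions, not an agreed LCG policy.

```python
# Hedged sketch of mapping VOMS groups onto a small, fixed set of database
# roles.  Group names and the role set are illustrative assumptions.
VOMS_TO_DB_ROLE = {
    "/atlas/production": "writer",    # production managers may update data
    "/atlas":            "reader",    # ordinary VO members get read-only access
    "/dteam":            "reader",
}

def db_role_for(voms_groups, mapping=VOMS_TO_DB_ROLE, default="none"):
    """Return the database role granted by the most specific matching
    VOMS group (longest group name wins)."""
    for group in sorted(voms_groups, key=len, reverse=True):
        if group in mapping:
            return mapping[group]
    return default

print(db_role_for(["/atlas", "/atlas/production"]))   # -> "writer"
```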

14
Open Questions
  • Application Server Services
  • Still needed by any key application after
    EDG-RLS?
  • If yes, who organises deployment discussion?
  • Need for FroNtier test-bed in 3D?
  • Will collect input from experiments about their
    plans
  • Tier1 to Experiment association?
  • Not all T1 sites will provide DB services
  • Is the current set, including their experiment
    associations, sufficient?

15
Summary
  • LCG 3D project has started with experiment and
    site meetings
  • First set of requirement sheets received
  • split onto tiers and volumes are still unknown or
    have very large error bars
  • Propose to use the same numbers also for CERN
    database service planning
  • 3D to LCG AA s/w integration has started and is
    hosted in POOL RAL
  • available manpower is limited
  • DB Service Definition
  • Very relevant deployment experience from RUN2 at
    FNAL
  • Need closer contact between DB groups also at
    other sites
  • Service task split and application acceptance
    policy proposals are firming up
  • Oracle 10g based replication test-bed in place;
    will expand to T1 now
  • Assume that joint test-bed operation can be done
    by site contacts
  • Biggest worry: most DB applications are still
    being developed
  • Expect high support load on database services
    (especially at CERN) for 2005
  • Data volume is not the main bottleneck (yet?)
  • CPU/IO contention, isolation and adequate
    resources for development / optimization
    consultancy are more visible constraints right
    now
  • 3D kick-off workshop planned for 13-15 December
    at CERN
  • 3D project plan based on experiment input in
    January
  • Database developer workshop end of January