Response to the PPRP - PowerPoint PPT Presentation

1
Response to the PPRP
  • David Britton 8/Nov/06

2
Introduction
GridPP has addressed the 10 questions received
from the PPRP in the document
http://www.gridpp.ac.uk/docs/gridpp3/pprp/GridPPResponseToPPRP_FINAL.doc
This presentation will go through those
responses. In addition, GridPP has addressed 120
comments and questions from 7 referees in the
documents
http://www.gridpp.ac.uk/docs/gridpp3/pprp/referee_response_1_2.doc
http://www.gridpp.ac.uk/docs/gridpp3/pprp/referee_response_3_4_5_6_7.doc
The PPRP has also been presented with a number
of related documents detailing the input to the
PPRP questions from the three large LHC
experiments and various background documents
addressing individual areas. All this information
is collected on the web page
http://www.gridpp.ac.uk/docs/gridpp3/pprp/
3
PPRP Question-1
1. The Panel would like to further understand
the advantages of the proposed overarching GridPP
model for operations (as opposed to development)
as against each experiment making its own
arrangements.
A: The GridPP identity enables a unified and
coordinated voice for the UK community that
raises our profile, strengthens our negotiating
power, increases our influence and enables better
communication.
B: Cross-experiment support. The middleware stack
is presently divided into lower-level middleware
that is part of the gLite release and
higher-level middleware that is provided by the
experiments. The common goal is to continue to
move middleware from the experiment-specific to
the generic level. Thus, future support for the
middleware stack must follow this transition and
is part of the overarching GridPP model.
C: The Tier centre structure has been set up by
and through the GridPP project. The Tier-2 MOUs
between GridPP and the Institutes establish a
uniform responsibility, and the critical
relationship between the Tier-1 and the Tier-2s
is carefully supervised through the deployment
team. An overarching project is more likely to
succeed in nurturing these structures to optimise
the UK Grid for Particle Physics.
4
PPRP Question-1
1. The Panel would like to further understand
the advantages of the proposed overarching GridPP
model for operations (as opposed to development)
as against each experiment making its own
arrangements.
D: The GridPP Deployment Team. The deployment of
LCG releases will be better implemented by a
coordinated deployment team managed by a common
project.
E: Without an overarching project, there is a
risk that the UK Particle Physics Grid would
fragment into a set of experiment-specific
resource clusters, which would completely
undermine the advantages that predicated the
decision to take the Grid approach that has been
the basis for investment over the last 5 years.
In addition, statements have been received (and
presented in full to the PPRP) from the three
large LHC experiments, all of which:
- Strongly support the concept.
- Propose no alternative.
5
PPRP Question-2
  • 2. The Panel would like to explore the priorities
    and potential options for descope.
  • If funding were only available to support 30%,
    50% or 70% of the total request, what would be
    the priority areas for investment in terms of
    obtaining the best UK science return?
  • What would be the political and experimental
    impacts of funding at a much lower level?
  • How would you prioritise the work packages?

6
PPRP Question-2 PREAMBLE
  • 0) Computing is an integral part of the LHC
    project.
  • GridPP3 is the continuation of a project with a
    previously defined scope. This is not a new
    initiative where the scope and scale are more
    elastic. WE HAVE A WELL DEFINED (FIXED) TASK TO
    PERFORM.
  • The scope of the project was described in the
    PPARC call. The original proposal required a
    careful evaluation of the minimum requirements
    consistent with meeting the PPARC call. In
    particular, it did not provide the UK with
    additional capacity that might give a competitive
    edge.
  • Hardware is the biggest item. As you de-scope and
    reduce hardware you keep all the data and service
    tasks but throw away the ability to do any
    physics.
  • We are embedded in an international context and
    have been for 5 years. We cannot sensibly move
    away from the LCG model of middleware, operations
    and support. The levels of service expected are
    agreed in the MOU signed by PPARC.
  • As in any international collaboration (e.g. the
    detectors), there are elements of service work
    that need to be contributed in a broadly pro-rata
    manner.
  • BOTTOM LINE: It is enormously difficult to
    de-scope a project that is well underway with
    well-defined responsibilities. We basically start
    to fail as we de-scope.

7
Input to Scenario Planning - Resources
Changes in the LHC schedule have prompted another
round of resource planning. New global resource
requirements were presented to the CRRB (Oct
24th), from which new UK resource requirements
have been derived and incorporated in the
scenario planning. Hardware prices have been
re-examined following the recent Tier-1 purchase
(CPU was cheaper than expected). We have lowered
our best empirical estimate of future prices but
have also declared a contingency on hardware
spend of 25% (up from 15%) over the lifetime of
the project. The combination of the above results
in a 9% saving on the project cost.
8
Input to Scenario Planning - ATLAS
The priority of the ATLAS-UK collaboration, to
ensure the best science return, is the hardware
and its operation. Within this, ATLAS notes
that UK Tier-2 resources contribute directly to
the UK output, whereas shortages in Tier-1
resources affect all ATLAS physicists globally.
For Tier-1 resources, ATLAS regards the 15%
hardware reduction proposed in the 70% scenario
as barely manageable; the 50% scenario would do
serious damage to the analysis capacity for the
large UK physics community and would also
threaten the calibration and commissioning of the
SCT. To reduce the Tier-2 hardware, cuts would
have to be made in simulation, calibration, and
then analysis capability, but even the first of
these will degrade physics output. Tier-2 cannot
be cut below the 70% scenario. ATLAS has derived
the UK fraction of the global requirements by
noting that UK authorship is 12.5% (now 13.9%) of
the global ATLAS Tier-1 authorship and that 4 out
of 30 (13.3%) of the ATLAS Tier-2s are in the UK.
9
Input to Scenario Planning - CMS
The priority of the CMS-UK collaboration is
access to Tier-2 resources in the UK and access
to Tier-1 resources, preferably in the UK. CMS
argues that the 70% scenario, achieved with a 15%
reduction in the requested hardware, would be at
the threshold for CMS to host a UK Tier-1. In the
50% scenario, the priority for CMS would be to
protect their Tier-2 resources, which would have
to be hosted by a Tier-1 external to the UK. The
revised CMS UK hardware request is based on a
more detailed algorithm than a simple fraction of
the global requirements. The scale is set by the
dual requirements of (a) a minimum size for a CMS
Tier-1 of 50% of the average CMS Tier-1 (7% of
global requirements) and (b) the UK fraction of
Tier-1 authors (same basis as ATLAS) of 8%. The
details are calculated from the dual requirements
to accept 4 out of CMS's 50 data-streams (8%) and
the need for the Tier-1 to serve an entire AOD
dataset.
10
Input to Scenario Planning - LHCb
The LHCb collaboration has a somewhat different
computing model from ATLAS and CMS with most
analysis performed at the Tier-1 and the Tier-2
used predominantly for Monte Carlo simulation.
LHCb prioritizes Tier-1 hardware and its
operation, followed by Tier-2 hardware and its
operation and finally support etc. The revised
hardware requests from UK LHCb are based on the
new global requirements, calculated from the UK
authorship fraction of 18.6% (revised upwards
from 16.6% at the time of the GridPP3
submission). The Tier-2 resource request also
includes 18.6% of the global LHCb Tier-2 resource
shortfall of 30%, to give a total of about 24% of
the global Tier-2 requirements. It is noted that
any fall below the global authorship fraction of
18.6% at either the Tier-1 or Tier-2 would have
to be negotiated in a global context.
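
The LHCb Tier-2 figure quoted above can be checked with a short calculation. This is only a sketch of one reading of the slide: the figures are taken as percentages, and the UK is assumed to request its authorship share of the global requirement plus the same share of the global shortfall. The function and variable names are illustrative and do not come from the GridPP documents.

```python
# Figures quoted on the slide, read as percentages (assumption).
UK_AUTHORSHIP_FRACTION = 0.186  # UK share of LHCb authorship
GLOBAL_TIER2_SHORTFALL = 0.30   # global LHCb Tier-2 shortfall, relative to requirements


def uk_tier2_request_fraction(authorship: float, shortfall: float) -> float:
    """Pro-rata share of the requirement plus the same share of the shortfall."""
    return authorship + authorship * shortfall


fraction = uk_tier2_request_fraction(UK_AUTHORSHIP_FRACTION, GLOBAL_TIER2_SHORTFALL)
print(f"UK LHCb Tier-2 request: {fraction:.1%} of the global Tier-2 requirement")
```

Under these assumptions the result is consistent with the "about 24" quoted on the slide.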
11
70% Scenario
An example 70% scenario based on Experiment
Inputs and a bottom-up examination of all posts.
73.5%
12
What has been lost in the 70% scenario? - 15%
of Hardware
  • - Hardware at the Tier-1 and Tier-2 is reduced
    by 15%.
  • - Contributes to a global shortfall of Tier-1
    resources for all three LHC experiments.
  • If cuts are applied uniformly:
  • - Takes CMS to the threshold level for a UK
    Tier-1.
  • - Takes ATLAS to the threshold for holding the
    entire AOD in the UK.
  • - Reduces the LHCb UK Tier-1 resources below the
    UK authorship fraction.
  • (Un-quantified cost/consequences).
  • The reduction of hardware directly (and
    disproportionately) impacts the ability of UK
    groups to produce physics output and will be a
    competitive disadvantage.

13
What has been lost in the 70% scenario? - 7% of
Tier-1 Staff Effort
  • Staffing effort at Tier-1 in the proposal is
    barely adequate to meet MOU quality of service
    and was identified as a significant risk.
  • Staffing effort does not scale linearly with
    hardware.
  • Cuts achieved by removing 3-FTE ramp-up of
    Tier-1 staff in the GridPP2 period (designed to
    match the ramp-up of hardware) and 1-FTE during
    the GridPP3 period (probably from incident
    response team).
  • The working allowance, previously included to
    address risk of failing to meet MOU service
    levels, has also been removed.
  • Net result is a significant increase in the risk
    that the Tier-1 service levels will not be met in
    full.

14
What has been lost in the 70% scenario? - 11% of
Tier-2 Staff Effort
  • The Tier-2 staff would be reduced by 1.75 FTE out
    of 14.75.
  • This is likely to contribute to either or both
    of (a) a reduction of Tier-2 resources levered
    from the institutes and (b) a reduction in the
    service level achieved at the Tier-2s.
  • The working allowance, previously included
    to address risk of failing to meet MOU service
    levels, has also been removed.
  • Net result is an increase in the risk that the
    Tier-2 resource and service levels will not be
    met.

15
What has been lost in the 70% scenario? - 31% of
Support Staff Effort
  • The Data Management post (1 FTE) for Replica
    Optimisation is not funded. This work was judged
    a good investment to optimise the use of
    limited storage resources. Removing funding for
    this post removes the likelihood of much greater
    savings on the purchase of storage resource in
    the future.
  • A reduction in data storage support (0.5 SY)
    reduces the flexibility to support multiple
    storage technologies in the UK. (GridPP does not
    wish to support multiple storage technologies
    but recognises the likely need.)
  • Continuing support (0.5 FTE) for the GridPP Real
    Time Monitor would not be funded. The RTM is the
    face of the LCG/EGEE grid and a highly visible
    and acclaimed demonstration showpiece that has
    repeatedly illustrated the UK's position as a
    major international player in this field.
  • There would be a 1-FTE reduction in the support
    for the R-GMA information and monitoring system.
    This major UK contribution is deeply embedded in
    the EGEE/LCG stack. Any reduction in effort must
    be carefully planned in conjunction with our
    partners to try to minimise disruption globally.
16
What has been lost in the 70% scenario? - 31% of
Support Staff Effort
  • The Security Vulnerability work (0.5 FTE) would
    be dropped. During GridPP2 the UK has
    pro-actively taken a leading international role
    developing security vulnerability policies and
    procedures.
  • Support for GridSite would be reduced by 0.5 FTE.
    The GridSite security toolkit, developed by
    GridPP, is embedded in the EGEE/LCG middleware
    and used as the basis for the GridPP and other
    websites, together with the GridSiteWiki.
  • A Networking post in the GridPP3 proposal,
    designed to help network provision and network
    monitoring, would be reduced to 50%. This
    reduces the network support at a time when the
    network will be coming under intense stress and
    production standards are required.
  • The loss of over 30% of the support staff means
    that the UK Grid will operate less effectively
    (Data Management, Storage, Networking) and
    international roles and responsibilities will be
    reduced or lost (RTM, R-GMA, Vulnerabilities,
    GridSite).

17
What has been lost in the 70% scenario? 12% of
Operations, 10% of Management, 25% of Outreach.
Support for the UK Grid Operations Centre in
GridPP3 would be reduced from 3 to 2 FTE. The
current manpower is 5.5 FTE funded by EGEE. This
increases the risk that the Grid Operations
Centre, on which GridPP relies to provide Grid
monitoring, ticketing and accounting, would not
function effectively.
In the reduced scenarios the task of managing the
project is likely to be at least as difficult as
for the full proposal. Nevertheless, management
effort would be reduced, primarily by not buying
out 25% FTE for the User Board Chair as currently
proposed. There is a risk that the User Board
would not be as pro-active as desired at
collecting or presenting the Users' requirements
and concerns. The 0.5 FTE requested for Industrial
Liaison would be dropped. This means that we are
unlikely to establish much industrial outreach.
18
50% Scenario
An example 50% scenario based on Experiment
Inputs and a bottom-up examination of all posts.
19
What has been lost in the 50% scenario? - 40% of
Tier-1 Hardware
40% of the Tier-1 HW will be lost. All three
LHC Experiments will need to negotiate the
consequences of providing significantly less
Tier-1 resource than their UK author
fraction (un-quantified cost). The UK could no
longer host a CMS Tier-1 centre, and special
arrangements would need to be made to provide UK
CMS Tier-2s with access to resources and support
at a non-UK Tier-1 (un-quantified cost). For ATLAS
and LHCb, this level of Tier-1 resource would do
serious damage to the analysis capacity for the
large UK physics communities, and for ATLAS it
would also threaten the calibration and
commissioning of the SCT.
20
What has been lost in the 50% scenario? - 30% of
Tier-2 Hardware
30% of the Tier-2 HW will be lost. The physics
output for all three experiments would be
reduced. Competitive advantage would be
completely lost. ATLAS would apply reductions
to simulation, calibration, and then analysis
capability, but even the first of these will
degrade physics output. LHCb would reduce
Monte Carlo simulation, similarly compromising
physics output. As the Tier-2 is CMS's sole UK
resource, the reduction would directly scale the
CMS physics output.
21
What has been lost in the 50% scenario? 22% of
Tier-1 Staff, 23% of Tier-2 Staff
Tier-1 staff would be further reduced from 17 to
14 FTE. Comparing this with the current level of
13.5 FTE, it is quite apparent that the Tier-1
(which would have much more hardware by that
point) could not reach the level of
service defined in the MOU signed by PPARC.
There would need to be international
negotiations as to whether the Tier-1 could
function as such for either of the remaining
two experiments. Tier-2 staff would be further
reduced from 13 to 11 FTE. This is likely to
contribute to either or both of (a) a reduction
of Tier-2 resources levered from the institutes
and (b) a reduction in the service level achieved
at the Tier-2s.
22
What has been lost in the 50% scenario? - 66%
of the Support Staff lost.
The support post for generic metadata issues
would be lost and all support would have to be
via the experiments. Support for grid storage
technologies would be reduced from 7 SY to 2 SY
over the project. This would (probably) be
limited to Castor support at CCLRC. Institutes
would need to look elsewhere for support on the
technologies likely to be deployed therein.
The portal work would be stopped, leaving the
smaller or future experiments with a higher
hurdle to getting on the Grid. The testing and
performance monitoring work associated with the
Work Load Management system would stop. This is
an area where there is strong European pressure
to continue, and it is of potentially direct
benefit to UK physics by providing knowledge
about the current condition of the Grid on a
site-by-site basis.
23
What has been lost in the 50% scenario? - 66%
of the Support Staff lost.
Support for information and monitoring systems
would be reduced to 1 FTE. (R-GMA could not be
supported, and negotiations with our international
partners would have to determine how best to use
this post to help the transition to whatever new
system evolved internationally.) Security
support would be reduced to 1 FTE. This would be
split as deemed appropriate at the time between
VOMS support and Operational Security. The
support for GridSite (an international
obligation) would be dropped. Again, this would
have to involve discussion with international
partners since the LCG/EGEE middleware stack
would be at risk. The networking support post
for monitoring and provision would be lost. This
would be in a regime where the need for Network
support has become more critical, with at least
one of the major experiments attempting to use a
non-UK Tier-1.
24
What has been lost in the 50% scenario? - 30%
of the Operations Staff, 25% of Management, all
dissemination (except GridPP2 period).
Support for the Grid Operations Centre would be
further reduced from 2 to 1.5 FTE, further
increasing the risk that the GOC, on which GridPP
relies to provide Grid monitoring, ticketing and
accounting, would not function effectively. One
of the four Tier-2 coordinators would be lost.
This increases the risk of failure of part of
the Tier-2 organisation, reduces the deployment
team and increases the likelihood that delays to
upgrades at some sites will reduce the available
resources, with a direct impact on physics
output. Management would be further reduced
(this would have to be optimised). There is a
risk that the management becomes less engaged and
therefore less effective. All dissemination and
outreach activities would be stopped after the
GridPP2 phase is complete.
25
30% Scenario
GridPP has examined the original PPARC call and
has determined that it is unable to form a
proposal that meets any of the criteria listed
with funding at the 30% level.
2. a) Underpin the particle physics programme by
delivering the functional Tier 1 centre for the
LHC experiments and for the other experiments
where UK groups will require computing GRID
access and facilities.
The 50% scenario presented above already fails
to meet this criterion because the Tier-1 would
be sub-threshold for at least one of the LHC
experiments. At the 30% funding level there could
only be a Tier-1 for (probably) one LHC
experiment. Most likely, in a 30% scenario there
would be no Tier-1 and the resources would be
used as a Tier-2 (though it is not clear what to
do about LHCb). Etc. (see document)
26
Question-2 Summary
GridPP has taken input from the 3 large LHC
experiments as guidance in an attempt to design a
GridPP3 project in 70% and 50% funding scenarios.
The outcome is a 74% funding scenario that
preserves 85% of the hardware (the threshold for
a UK CMS Tier-1) but is likely to result in a
failure to meet service levels, inadequate
support across the UK in many areas, and the
elimination of much of the UK obligation to the
international effort that directly and indirectly
benefits UK physicists. A 55% scenario is
provided that doesn't work: it does not respect
the criteria of the call, and there are large
political and financial unknowns associated with
delivering less than a pro-rata share of LHC
hardware, so the real cost cannot be provided. The
UK Grid would not function at the required level
of service and support for UK users would be
completely inadequate. We do not regard the
fine details of these scenarios as fixed, but they
are offered as examples of our approach to, and
the consequences of, funding below 90% of the
original proposal.
27
Risks
  • GridPP believes that the risks introduced by the
    74% scenario are very large and urges the PPRP
    to consider an outcome closer to 90%.
  • In the 90% scenario, all the Risks defined in the
    GridPP3 proposal still apply, except that there
    is an increased risk that hardware is more costly
    than planned.
  • In the 74% scenario, there is an additional risk
    that the level of hardware provision for all 3
    experiments will compromise the physics output.
    For LHCb there are unknown consequences of
    providing hardware below the authorship fraction
    level.
  • In the 74% scenario, service levels signed up to
    by PPARC at the Tier-1 and Tier-2 are severely at
    risk.
  • In the 74% scenario, support for middleware in
    the UK will be inadequate. There is a risk that
    this will seriously undermine physics output.
  • In the 74% scenario, most middleware
    contributions by the UK to the international
    effort will be dropped. This puts the whole Grid
    at risk and damages the UK reputation and
    influence.

28
PPRP Question-3
3. The UK would like to play a key role in this
important project but the current financial
constraints necessitate focusing on the crucial
areas and what needs to be done. The Panel would
like to identify these areas, giving
consideration to the current LHC timescale, and
to understand the implications of delaying parts
of the project, especially with regard to
hardware (e.g. same CPU performance with fewer,
faster processors).
Identifying crucial areas is covered by the
Scenario Planning presented in response to
Question-2 and by each of the responses to GridPP
from the three large LHC experiments. The new LHC
timescale has been included in the new resource
requirements prepared by the LHC experiments and
presented to the CRRB on October 24th 2006. These
new global requirements have been used to derive
new UK requirements, as described in the response
to Question-2 and in the experiment documents. The
resource requirements are effectively shifted,
which, combined with the reduced hardware cost
estimates used by GridPP, has resulted in about
a 9% saving on the project cost. This is embedded
in the 70% and 50% scenario plans.
29
PPRP Question-4
  • PART-1
  • The Panel wishes to understand better the
    apparent disparity between the estimated Tier-1
    needs of CMS and ATLAS. It seems that ATLAS
    requires roughly twice the CPU and disk resource,
    but less tape than CMS. Given the similar
    computing models between the two experiments,
    relatively small differences in the parameters
    chosen seem to have significant implications on
    the assessment of need and hence cost.
  • PART-2
  • How has GridPP interacted with the experiments to
    ensure that the most cost effective solution has
    been arrived at?
  • PART-3
  • The Panel wishes to understand the levels of
    requests for Tier-1 facilities by the different
    experiments relative to the UK contribution to
    each experiment.

PART-2
GridPP relies on the careful scrutiny and
rigorous peer review of the computing models and
global resource levels by the LHCC and the CRRB
to ensure that the most cost effective solution
has been achieved.
30
PPRP Question-4
PART-3
ATLAS has derived the UK fraction of the global
Tier-1 requirements by noting that UK authorship
is 12.5% (now 13.9%) of the global ATLAS Tier-1
authorship. CMS has derived their UK Tier-1
hardware request based on a more detailed
algorithm than a simple fraction of the global
requirements. The scale is set by the dual
requirements of (a) a minimum size for a CMS
Tier-1 of 50% of the average CMS Tier-1 (7% of
global requirements) and (b) the UK fraction of
Tier-1 authors (same basis as ATLAS) of 8%. The
details are calculated from the dual requirements
to accept 4 out of CMS's 50 data-streams (8%) and
the need for the Tier-1 to serve an entire AOD
dataset. This latter requirement results in a
slightly larger fractional requirement at the
Tier-1 in early years, which then reduces to 8%
in the steady state. LHCb has derived the UK
fraction of Tier-1 resources from the UK
authorship fraction of 18.6% (revised from 16.6%
at the time of the GridPP3 submission).
31
PPRP Question-4
PART-1
The latest round of resource review has led to
convergence of the models. In particular, CMS has
increased its trigger rate (now similar to ATLAS)
during early years to acquire more calibration
and standard-model physics data. Event sizes,
data rates, processing times, and replication
strategy have evolved to become significantly
closer. The remaining difference is the strategy
for data storage and replication:
ATLAS: 2 copies of the ESD data distributed over
all Tier-1 centres, plus a cumulative AOD sample
spanning multiple years, all on disk.
CMS: 1 copy of RECO (ESD) is stored over all
Tier-1 centres, in addition to CERN, and only a
single year's AOD is stored on disk at Tier-1s
(previous years are accessible from tape).
This leads to a smaller Tier-1 disk requirement
from CMS, but higher requirements on tape
infrastructure, bandwidth and storage. These are
different optimisations that will probably
converge as experience is gained.
32
PPRP Question-5
5. The Panel would like the applicants to justify
the rationale behind the proposed regional Tier-2
structure in GridPP3 and to set out the pros and
cons of other possible structures, for example,
experiment based or rationalised structure with
fewer Tier-2 sites, or fewer institutes. The
Panel would like the applicants to consider
possible cost savings and improvements in
efficiency and service delivery that different
structures might produce.
Need to discuss The Past, The Present, and The
Future. The underlying message is that the
proposed system is the logical development of
the current structure, which works well and, in
turn, was developed for good reasons. We see much,
much bigger risks to performance in breaking the
current structure than in keeping it.
33
PPRP Question-5
History of the Tier-2 Structure: The current
Tier-2s were formed naturally in response to
local and regional funding opportunities and
other geo-political considerations. Many assumed
(and used as leverage) a continuing relationship
with the Particle Physics community. It is
natural that all Particle Physics groups wished
to be associated with a Tier-2, but this was not a
GridPP requirement. However, it was clearly
uniformly perceived as beneficial for the local
physicists and the institute. In GridPP1 there
was no PPARC funding for Tier-2s, and in GridPP2
there was PPARC funding for some manpower at
Tier-2s (plus some specialised servers) but not
for the bulk of the computing resources.
Nevertheless, large amounts of resources were
made available. GridPP has interacted with
four Tier-2 centres through their management
boards. The overhead of having more than one
site within the Tier-2 is, to first order, an
internal choice (the JeS submission requirement
for the GridPP3 proposal broke this model).
34
PPRP Question-5
  • Current Status of Tier-2 Structure
  • There are currently 17 Institutes organised into
    4 Distributed Tier-2s. Of the 17 Institutes, 4
    have no GridPP manpower, 8 have less than one
    FTE and 5 have one or more FTEs of GridPP
    manpower. The total of 9 FTE funded by GridPP
    for hardware support (plus 5.5 FTE specialist
    posts) is clearly a very cost-effective
    situation given the 3703 KSI2K of CPU and 263 TB
    of disk available (06Q1 numbers). For
    comparison, the Tier-1 had 13.5 GridPP-funded
    FTE and made available 830 KSI2K and 180 TB in
    the same period.
  • Performance measures are being developed (within
    GridPP and wLCG). The UK is probably ahead of
    the game here. There are more details in the
    written response, but the UK Tier-2 performance
    is:
  • - good relative to other countries
  • - improving even though the hurdles are getting
    higher
  • - on track to meet the MOU requirements.
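
The cost-effectiveness comparison on this slide can be made concrete by dividing the delivered resources by the GridPP-funded effort. The sketch below uses only the 06Q1 figures quoted above. Counting the 5.5 FTE of specialist posts in the Tier-2 effort is an assumption, and the class and field names are illustrative. The Tier-1 and Tier-2s also carry different duties, so the ratio is a rough indicator and not a like-for-like measure.

```python
from dataclasses import dataclass


@dataclass
class TierSnapshot:
    name: str
    fte: float        # GridPP-funded staff effort (FTE)
    cpu_ksi2k: float  # CPU made available (KSI2K)
    disk_tb: float    # disk made available (TB)

    def cpu_per_fte(self) -> float:
        return self.cpu_ksi2k / self.fte

    def disk_per_fte(self) -> float:
        return self.disk_tb / self.fte


# 06Q1 numbers from the slide. Tier-2 effort here is the 9 FTE of
# hardware support plus the 5.5 FTE of specialist posts (assumption).
tier2 = TierSnapshot("Tier-2s", fte=9 + 5.5, cpu_ksi2k=3703, disk_tb=263)
tier1 = TierSnapshot("Tier-1", fte=13.5, cpu_ksi2k=830, disk_tb=180)

for tier in (tier2, tier1):
    print(
        f"{tier.name}: {tier.cpu_per_fte():.0f} KSI2K per FTE, "
        f"{tier.disk_per_fte():.1f} TB per FTE"
    )
```

On these numbers the Tier-2s deliver roughly four times the CPU per funded FTE of the Tier-1, which reflects the resources levered from the institutes.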

35
PPRP Question-5
  • Future of Tier-2 Structure
  • GridPP proposes to continue to develop 4 Regional
    Tier-2 centres.
  • GridPP would like to remain neutral on the number
    of sites and institutions within each Tier-2,
    and simply offer a package of hardware money and
    effort to each Tier-2 in return for the delivery
    of a specified quantity of resource and a
    specified service level. We believe this
    approach:
  • Allows a market-driven optimisation of resources
    according to constraints which are outside the
    control and knowledge of GridPP (e.g. other
    sources of funding; institutional priorities and
    strategies; prior commitments and aspirations).
  • Builds upon a system that is both viewed and
    measured as successful.
  • Is in the best interests of Physicists at all
    Institutes, allowing some small measure of local
    control whilst enabling Grid access to vast
    resources and providing on-site expertise in as
    many places as possible.

36
PPRP Question-5
  • Future of Tier-2 Structure
  • Alternative structures have been considered:
  • Fewer Tier-2s: we foresee no advantage in having
    the same number of institutes associated with
    fewer Tier-2s. Clear disadvantages.
  • Fewer Institutes: Hardware and manpower costs
    remain the same; running and infrastructure
    costs are likely to become more visible. Some
    gains in the efficiency of staff effort by
    concentration of resources (though this means
    less levered effort, not less GridPP effort;
    service level may be easier to achieve). May
    alienate some institutes; will result in less
    leverage of resources; will leave some
    institutes without local expertise. Conclusion:
    it will cost more and deliver fewer resources;
    service level might be better but physicists
    less supported. Not the optimisation we chose.
  • Experiment-based Tier-2s: runs against the grain
    and would leave the UK at odds with the rest of
    the wLCG; not a sensible Grid structure, and
    would limit peak resources available to
    individual Experiments. Would most likely lead
    to a divergence from standards and a fragmented
    UK Grid.

37
PPRP Question-6
6. The Panel would like to explore the impact to
the UK of leadership roles within LCG. What are
the benefits and costs to the UK of this,
particularly with regard to middleware?
The Big Picture: Roles (e.g. Leadership) and
duties (e.g. Middleware support) for the LCG
project must be shared between the members. This
allows the common project to benefit from all the
available skills and expertise; it provides a
contribution in kind that should broadly reflect
the size of the contributing group; it
demonstrates the engagement of all partners; and,
in return, it enables strategic influence and
other tangible benefits. Performing duties gives
us the credibility to take on leadership roles.
Appendix-D of the proposal listed 86 external
roles of members of GridPP within related
projects, 17 of which are specifically LCG
related, 22 are within EGEE, and a further 8 are
associated with computing within the LHC
Experiment collaborations.
38
PPRP Question-6
Specific Examples:
a) David Kelsey: Coordinator of LCG Grid
Security, Chair of Joint (LCG/EGEE/OSG) Security
Policy Group and Deputy Director of EGEE
Security.
b) Jeremy Coles: Secretary of LCG Grid Deployment
Board.
c) John Gordon: UK Representative on LCG
Management Board and a Deputy Chair.
d) Neil Geddes: UK member of the LCG Oversight
Board (OB) and LCG Collaboration Board Chair.
e) EGEE: Project Executive Board: Frank Harris,
Dave Kelsey, and previously Pete Clarke. Project
Management Board Chair: Robin Middleton (to
summer 06). Project Collaboration Board: Dave
Colling, John Gordon, Jeff Tseng, Tony Doyle and
Roger Barlow. EGEE JRA1 (Middleware
re-engineering) Cluster Leader (UK): Steve
Fisher.
39
PPRP Question-6
Related Examples:
i) Nick Brook (formerly GridPP UB Chair and PMB
member) is the LHCb computing coordinator.
ii) Roger Jones (currently GridPP Applications
Coordinator and PMB member) is the chair of the
ATLAS International Computing Board.
iii) Dave Newbold (formerly GridPP UB chair and
PMB member) is the chair of the CMS Computing
Committee.
Conclude: as a consequence of investment and hard
work over the last five years, the current
overall influence of the UK in the LHC
Experiments is very high. This ultimately
benefits UK physicists and has been a good
investment.
40
PPRP Question-7
7. Before making a recommendation to the office
about the extension to GridPP2 the Panel would
like more information about each of the posts and
to know whether they are core activities. What
are the implications of not funding these posts
and what evidence is there that a delay in
resolving this will lead to a loss of staff who
might be expected to continue into GridPP3?
Detailed information on the areas covered by the
GridPP extension was provided in the GridPP3
proposal. Specific information on each individual
post was provided on the Institutional JeS forms
submitted to PPARC. All these posts are
considered core to the current programme during
the 7-month period of GridPP2, when they will be
necessary in the build-up of the Production Grid
prior to LHC data-taking. It should be noted that
funding for the applications posts was not
requested but that many of these have not been
funded on the RG, leaving a serious shortage of
effort. If not funded: we will lose our entire
pool of highly skilled staff; the UK will not be
ready for LHC data; much of the current work will
be abandoned; and large amounts of resources
will have been wasted. Evidence: 25% turnover of
staff since proposal submission, cf. 10% p.a.
previously.
41
PPRP Question-8
8. The Panel would like to see a full
justification for each of the posts requested in
GridPP3 and to see the cost to PPARC (including
estates and indirect costs) of each post.
A separate document has been provided for PPARC
staff including full details extracted from the
Institutional JeS submissions. This incorporates
a compilation of the Institute submissions
organised by work package, giving the
justification and costs for each post that should
be read in conjunction with the proposal and
relevant appendices.
42
PPRP Question-9
9. The Panel would like to explore the issues of
quality assurance in both Tier-1 and Tier-2
activities. How will the applicants ensure that
GridPP3 provides an adequate and cost-effective
service to its users?
The service levels at the Tier-1 and Tier-2 are
defined by the International Memorandum of
Understanding. The Tier-1/A Management Board,
including PPARC representation, advises all
stakeholders on whether the Tier-1/A Service at
RAL is delivering its objectives on time and
making appropriate use of its available
resources. The main instrument for assuring
quality and levels of service at the Tier-2s will
be a new Memorandum of Understanding between
GridPP and the institutes as described in the
Tier-2 Appendix to the GridPP3 Proposal. This
would set out the required levels of services in
order for the UK to meet its WLCG MoU commitments
and provide the necessary service to UK
physicists. (continued)
43
PPRP Question-9
Quality Assurance is performed by monitoring the
performance of the Tier-1 and Tier-2 compared to
MoU commitments, and the performance compared to
international partners. As previously described,
monitoring is already advanced and being
developed further. We currently monitor:
- CPU and storage usage
- Site functional tests
- Configuration tests
- Ticket response times
- Upgrade timescales
- Scheduled downtime
- VO support
- Transfer tests
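The comparison of monitored performance against MoU commitments can be illustrated with a minimal sketch. The metric names, targets and measured values below are hypothetical placeholders chosen for illustration; they are not figures from the MoU or from GridPP monitoring.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    """One monitored quantity and the commitment it is compared against."""
    name: str
    measured: float
    target: float
    higher_is_better: bool = True  # e.g. availability; False for response times

    def meets_target(self) -> bool:
        if self.higher_is_better:
            return self.measured >= self.target
        return self.measured <= self.target


def failing_metrics(metrics):
    """Return the names of metrics that miss their target."""
    return [m.name for m in metrics if not m.meets_target()]


# Hypothetical values for illustration only.
site_report = [
    Metric("availability_percent", measured=93.0, target=95.0),
    Metric("ticket_response_hours", measured=6.0, target=12.0,
           higher_is_better=False),
    Metric("scheduled_downtime_hours", measured=10.0, target=8.0,
           higher_is_better=False),
]

print(failing_metrics(site_report))
```

A per-site report of this kind makes it simple to flag where a commitment is being missed; the same check could be applied to each of the monitored items listed above.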
44
PPRP Question-10
  • 10. The Panel would like information on where the
    Tier-1 centre will be housed at RAL.
  • - Is any construction or refurbishment of an
    appropriate building on the critical path
    for the GridPP project?
  • - Will the centre have sufficient space available
    to meet GridPP's requirements?
  • - What are the risks associated with this?
  • - How will this be funded?

The Atlas Centre at RAL has sufficient capacity to
house the full GridPP3 requirements for 2008 LHC
running as given in the proposal. CCLRC has
approved construction of a new computer building
at RAL, budgeted at approx. 17M, which will be
funded by the CCLRC Capital Investment Plan.
Completion is due in summer 2008, in time for the
autumn delivery which will meet the 2009
data-taking requirements. The new building has
sufficient space for capacity to grow to 2012,
when the number of racks is expected to have
reached a steady state.
45
PPRP Question-10
  • The main risks are:
  • a) Late completion. There is some slack in the
    schedules to meet the data-taking requirements
    for April 2009, which mitigates this risk.
  • b) Power and cooling required to deliver the
    required resources may exceed the estimates. This
    is mitigated by inclusion of chilled water mains
    in the new building to allow direct water cooling
    of the hottest racks if power densities exceed
    current estimates.
  • c) Electricity charges for power and cooling,
    which are currently met by CCLRC overhead
    charges. It is possible that at some future time
    these may be attributed directly to GridPP. This
    is explicitly listed as a potential call on
    contingency in the GridPP3 proposal.

46
SUMMARY
  1. GridPP and the experiments have described the
    advantages of the proposed overarching GridPP
    model for operations.
  2. The potential options for descoping the GridPP
    project are extremely limited, but we have
    provided input on the 3 scenarios.
  3. The crucial GridPP areas have been described,
    taking the LHC experiment requirements fully into
    account.
  4. The ATLAS and CMS planned trigger rates have
    converged and the computing models are similar.
    Residual differences have been identified.
  5. The proposed Tier-2 structure and the pros and
    cons of other possible structures have been
    explored.

47
SUMMARY
  • The impact to the UK of leadership roles, and the
    benefit of middleware roles, within LCG and EGEE
    has been described.
  • GridPP has collated the required financial
    information about each of the posts in the
    GridPP2 and GridPP3 period, according to WP.
  • GridPP has collated the required post
    descriptions for each of the posts in the
    GridPP2 and GridPP3 period, according to work
    package.
  • The mechanisms that have ensured quality
    assurance at the Tier-1 and Tier-2 have been
    described and the associated costs recognised in
    order to deliver a performant Grid to end-users.
  • GridPP has indicated that the computer building
    at RAL will be funded via CCLRC, is no longer on
    the critical path, and will provide sufficient
    space. Residual risks have been identified.