Title: Grid-BGC Interim Review Year 2
1. Grid-BGC Interim Review Year 2
Implementing an efficient supercomputer-based grid-compute engine for end-to-end operation of a high-resolution, high data-volume terrestrial carbon cycle model.

Project Team
- Project PI: Peter Thornton (NCAR)
- Co-PI: Henry Tufo (NCAR/CU)
- Staff: Nathan Wilhelmi (NCAR), Craig Hartsough (NCAR), Matthew Woitaszek (CU), Jason Cope (CU)
- Collaborators: Don Middleton (NCAR), Luca Cinquini (NCAR), Rich Loft (NCAR)
- Beta testers: Niklaus Zimmermann (WSL, Switzerland), Douglas Ahl (U. Wisc.), Michael White (Utah State Univ.)
2. Science objective: large, gridded simulations of carbon cycle dynamics
[Figure: maps of Daymet inputs and Grid-BGC outputs]
3. Grid-BGC Project Goals
- Use emerging Grid-Compute technologies to provide a research-quality platform for terrestrial carbon cycle modeling.
- Provide a Web Portal user interface to organize the complicated data dependencies that are typical of very large gridded ecosystem model implementations.
- Eliminate user interaction with remote computational resources by implementing automated job execution.
- Provide automated data streaming for model input and output datasets between the Portal, remote computational resources, and a remote mass storage facility.
- Provide robust analysis and visualization tools through the Portal.
- Demonstrate end-to-end functionality with a research-quality application (U.S. 1 km gridded simulations, targeting NACP).
- Focus on the needs of real researchers, through multiple iterations of platform development and beta-testing.
4. Grid-BGC Design and Data Flow
5. Year 2 Progress Review (by Quarter)
- First Quarter
- Prototype 1 of the User Interface completed (Nov 04)
- UCAR Gatekeeper authentication
- User input of site information, conversion to netCDF
- User input of projection information and Daymet model parameters
- Database management of user profiles and project interdependencies
- Design of new security certification method, using the Globus Toolkit's MyProxy proxy delegation service.
- Benchmarking of Biome-BGC code on multiple architectures.
- Continued migration of core science code to netCDF I/O.
6. Year 2 Progress Review (by Quarter)
- Second Quarter
- Continued migration of core science code to netCDF I/O
- Prototype 1 of the User Interface underwent internal beta testing
- Improved support for user data upload
- Added documentation
- Passed off to external beta-testers (at U. Wisc., Utah State Univ., and WSL, Switzerland)
- Portal Prototype 2 development underway
- Implemented new database schema to support Daymet functionality
- Implemented development and release configurations to support ongoing development and beta-tester usage
- Began configuration of Globus MyProxy on the Dataportal
- Initial configuration of the Globus CoG Toolkit / Bouncy Castle certificate authority
- Continued discussion and design for integration of the Portal with the Job Services component
7. Year 2 Progress Review (by Quarter)
- Third Quarter
- Completed migration of Daymet I/O to netCDF; began implementation of Biome-BGC ingest of Daymet netCDF output
- Continued development of Portal Prototype 2
- Continued implementation of the new database schema to support Daymet functionality (the database component of Prototype 1 is functional, but not as robust as required)
- Continued configuration of Globus MyProxy on the Dataportal, following recommendations from the annual review
- Continued configuration of the Globus CoG Toolkit and Bouncy Castle certificate authority
- Continued design for integration of the Portal with the Job Services component
- Began design for automated tiling support, scheduled for completion in July 05
- Began scoping inclusion of Columbia as a new resource
8. Year 2 Progress Review (by Quarter)
- Fourth Quarter
- Completed Biome-BGC ingest of Daymet output in netCDF
- Completed development of Portal Prototype 2
- Completed implementation of the new database schema to support Daymet functionality
- Completed configuration of Globus MyProxy on the Dataportal, following recommendations from the annual review
- Completed configuration of the Globus CoG Toolkit and Bouncy Castle certificate authority
- Completed design for integration of the Portal with the Job Services component
- Completed design and implementation of automated tiling support
- Successful end-to-end testing of Daymet functionality, with minimal requirements met for chaining to Biome-BGC
- Modified Portal based on beta-tester experience with an early release of Prototype 2
- Began design of user output specification and visualization components
9. Updated Project Schedule and Milestones
10. Review of Current Project Status
- Topics detailed in the following slides
- Grid-BGC Portal (User Interface)
- Grid Services and Job Management
- Schedule Status
- TRL update
- Year 3 Planning and Milestone Schedule
- Budget Status
11. Portal Prototype 1
- Completed shortly after the 1st annual review.
- Implemented the overall user interface structure.
- Implemented the user interface functionality for managing Daymet simulations.
- Primarily intended to test the usability and flow of the system.
12. Portal Prototype 2
- Major internal revision of Prototype 1
- Implemented User Interface support for the new Grid-BGC job processing.
- Implemented with several open source application frameworks.
- These frameworks greatly reduced the amount of application infrastructure code required (see the sketch below).
- Focus on application development, not infrastructure development.
- Spring J2EE Application framework.
- Hibernate ORM toolkit.
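As an illustration of how much plumbing the ORM layer absorbs, here is a minimal sketch of a Hibernate-mapped entity for a user project record. The class, table, and field names are invented for this example and do not reflect the actual Grid-BGC schema; annotation-based mapping is shown, though XML mapping files work equally well.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

// Hypothetical example: a user project record mapped by Hibernate.
// The ORM generates the SQL for loading and saving instances, so the
// portal code contains no hand-written persistence plumbing.
@Entity
@Table(name = "user_project")
public class UserProject {

    @Id
    @GeneratedValue
    private Long id;                 // surrogate primary key

    @Column(nullable = false)
    private String ownerLogin;       // portal user who owns the project

    @Column(nullable = false)
    private String projectName;      // display name chosen by the user

    @Column
    private String status;           // e.g. "configured", "submitted", "complete"

    public UserProject() { }         // Hibernate requires a no-argument constructor

    public Long getId() { return id; }
    public String getOwnerLogin() { return ownerLogin; }
    public void setOwnerLogin(String ownerLogin) { this.ownerLogin = ownerLogin; }
    public String getProjectName() { return projectName; }
    public void setProjectName(String projectName) { this.projectName = projectName; }
    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
}
```

With Spring, the corresponding data-access object can be wired in as a bean, keeping transaction handling out of the application logic.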
13. Portal Prototype 2 (cont.)
- Integration with the latest Globus Toolkit components
- GT 4
- Java CoG Kit
- Implemented full user workflow, from user login to retrieving simulation results (sketched below).
- Automated management of
- User certificates
- User proxies
- Job submission and management
- Data Storage
- New SAN storage was brought online for the Dataportal.
- Grid-BGC has a 4 TB partition allocated for use.
- Partition is online and currently in use.
- All file storage is accessible through standard GridFTP.
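The bullets above can be read as a single automated sequence. The hypothetical interface below sketches that sequence; none of the type or method names come from the actual portal code, and it is meant only to make the division of responsibilities concrete.

```java
import java.net.URI;
import java.util.List;

// Hypothetical sketch of the automated portal workflow, from user login
// to retrieval of simulation results. All names are illustrative only.
public interface PortalSession {

    // Authenticate the user and obtain a short-lived proxy credential
    // (certificate and proxy management is handled by the portal).
    void login(String username, char[] password) throws Exception;

    // Upload user-supplied inputs; the portal converts them to netCDF
    // and stages them on the project's storage partition.
    URI stageInputs(List<URI> localFiles) throws Exception;

    // Submit a configured simulation; job submission and monitoring are
    // automated, so the user never logs in to the compute resource.
    String submitSimulation(String projectName) throws Exception;

    // Retrieve output files once the job completes; results live on
    // storage that is also reachable through standard GridFTP
    // (e.g. gsiftp:// URLs).
    List<URI> fetchResults(String jobId) throws Exception;
}
```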
14. Portal Overall Workflow
15. Portal Data Management Workflow
- User provides data inputs in simple formats:
- ASCII text data files
- Simple GIS-formatted input grids
- Some simple data are entered manually into the portal.
- Portal transforms data inputs into the required file formats for model input (netCDF files).
- Portal handles all preprocessing of data prior to job submission: geographic tiling, spatial data flags, etc. (a tiling sketch follows below).
- Portal manages the creation and submission of all tile-specific data packages for computation on remote resources.
- The portal manages the resulting data output files for user download or for use as further input to the Biome-BGC model.
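To make the geographic tiling step concrete, here is a minimal sketch that splits a gridded domain into fixed-size tiles, each of which could then be packaged for a remote compute resource. The class name, tile size, and domain dimensions are illustrative assumptions, not values from the portal.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative geographic tiling: split an nRows x nCols grid into
// fixed-size tiles so each tile can be packaged and computed
// independently on a remote resource. All values are hypothetical.
public class GridTiler {

    /** Inclusive-exclusive row/column extent of one tile. */
    public record Tile(int rowStart, int rowEnd, int colStart, int colEnd) { }

    public static List<Tile> tile(int nRows, int nCols, int tileRows, int tileCols) {
        List<Tile> tiles = new ArrayList<>();
        for (int r = 0; r < nRows; r += tileRows) {
            for (int c = 0; c < nCols; c += tileCols) {
                // Edge tiles are clipped to the grid boundary.
                tiles.add(new Tile(r, Math.min(r + tileRows, nRows),
                                   c, Math.min(c + tileCols, nCols)));
            }
        }
        return tiles;
    }

    public static void main(String[] args) {
        // Example: a hypothetical 8000 x 7000 cell domain cut into 1000 x 1000 tiles.
        List<Tile> tiles = tile(8000, 7000, 1000, 1000);
        System.out.println("Number of tiles: " + tiles.size()); // 8 * 7 = 56
    }
}
```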
16. Portal Job Submission Workflow
17. Daymet data and program flow
[Flow diagram: gridded inputs (DEM, mask, slope, aspect, whoriz, ehoriz, avghoriz) and station data (tmax, tmin, prcp) feed the fill, prediction, and interpolation programs (fill_tair, fill_prcp, predict_tair, predict_prcp, predict_srad_vp, interpolate), producing intermediate binary files (tmax, tmin, prcp, srad, vp, id, ct, wt) that binary_to_netcdf converts to the netCDF output files (tmax, tmin, prcp, srad, vp).]
18. Grid Services and Job Management
- Towards a Service Oriented Architecture for Grid-BGC
19. Grid Service and Job Management Prototype
- Grid-BGC is the first production-quality computational grid developed by NCAR and CU
- Prototype development began January 2004 and ended September 2004
- The prototype was more difficult than we anticipated
- We found that the middleware and development tools for GT 3.2 were not production grade for our environment, but were rapidly improving
- Distributed, heterogeneous, and asynchronous system development and debugging are difficult tasks
- Feedback from reviewers was helpful in redesigning the prototype
20. Grid-BGC Prototype System Architecture
21. Analysis of the Prototype
- What we did well
- Proof-of-concept grid environment was a success
- Portal, grid services, and automation tools created a simplified user environment
- Fault-tolerant job execution
- What we needed to improve
- Modularize the monolithic architecture
- Break out functionality
- Move towards a service oriented architecture
- Re-evaluate data management policies and tools
- GridFTP and Reliable File Transfer (RFT) vs. DataMover
- Misuse of NCAR MSS as a temporary storage platform
- Use the appropriate Globus-compliant tools
- Globus components can be used in place of third-party or in-house tools
- More recent releases of GT have improved the quality and usability of the components and documentation
22. Moving to Globus Toolkit 4.0
- Improvements in Grid middleware were immediately useful in our environment
- MyProxy officially supported
- WS GRAM meets our expectations
- RFT and GridFTP perform reliable file transfers
- Rewriting the Grid-BGC service was easier than expected
- Modular design limited the number of changes needed
- Improved documentation of GT components
23. Grid-BGC Production Architecture Overview
- Production architecture maintains the prototype goals
- Ease of use
- Efficient and productive
- Additionally, the production architecture addresses
- Modular design
- GT4 compliance
- Restructuring of execution and data management
24. Grid-BGC Production Architecture Overview
25. Service Oriented Architecture
- Grid middleware uses a resource-oriented architecture
- RFT file transfers require a full source and full destination path
- GRAM jobs require paths to a working directory and executables
- This model requires all clients to have full knowledge of the configuration of every cluster
- What do you really need to run the model?
- Location of input data
- Model execution parameters
- Desired location of output data
- Service Oriented Architecture
- Provide a service that, given the mandatory model input parameters, knows how to run the model on the local system (see the request sketch below)
- Do not expose system-specific details to the client users
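The three items above map naturally onto a small request object. The sketch below is a hypothetical illustration of such a request, not the actual Grid-BGC message format; all names are invented.

```java
import java.net.URI;
import java.util.Map;

// Hypothetical model run request: everything the client must supply,
// and nothing about the remote system's local layout.
public class ModelRunRequest {

    private final String modelName;               // e.g. "daymet" or "biome-bgc" (illustrative)
    private final URI inputLocation;               // where the staged input data live on the Grid
    private final URI outputLocation;              // where the results should be delivered
    private final Map<String, String> parameters;  // model execution parameters

    public ModelRunRequest(String modelName, URI inputLocation,
                           URI outputLocation, Map<String, String> parameters) {
        this.modelName = modelName;
        this.inputLocation = inputLocation;
        this.outputLocation = outputLocation;
        this.parameters = parameters;
    }

    public String getModelName() { return modelName; }
    public URI getInputLocation() { return inputLocation; }
    public URI getOutputLocation() { return outputLocation; }
    public Map<String, String> getParameters() { return parameters; }
}
```

Note that no working directory, executable path, or other system-specific detail appears in the request; those stay on the server side.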
26. Service Oriented Architecture
- The client transmits a model run request
- Model parameters
- Location of input and output files on the Grid
- The Grid service knows how to run the software
- Client-provided data are combined with the system environment to produce all details
- Service creates RFT/GridFTP transfer requests and a GRAM job request (see the sketch below)
- Grid middleware is used to perform the tasks
- RFT invoked on behalf of the original client to transfer files
- GRAM invoked on behalf of the original client to run the computational job
[Diagram: the Client sends a model simulation request to the Model Grid Service and Workflow Manager, which issue the complete GRAM and RFT requests to the Grid Services.]
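To make the "combine client data with the system environment" step concrete, the following sketch (building on the hypothetical ModelRunRequest above) shows how a service might merge the request with locally held configuration to produce the full paths that RFT and GRAM require. Property names, paths, and class names are assumptions for illustration only.

```java
import java.net.URI;
import java.util.Properties;

// Hypothetical sketch: the Grid service merges the client's logical
// request with system-specific configuration (known only on the server)
// to build the concrete details needed for RFT transfers and a GRAM job.
public class ModelGridService {

    private final Properties localConfig; // e.g. loaded from a per-cluster config file

    public ModelGridService(Properties localConfig) {
        this.localConfig = localConfig;
    }

    /** Illustrative value object holding the fully resolved execution details. */
    public record ResolvedJob(String executable, String workingDirectory,
                              URI stageInSource, String stageInDestination,
                              String stageOutSource, URI stageOutDestination) { }

    public ResolvedJob resolve(ModelRunRequest request) {
        // System-specific values that the client never sees.
        String scratch = localConfig.getProperty("scratch.dir");   // e.g. /scratch/gridbgc
        String executable = localConfig.getProperty(request.getModelName() + ".executable");

        // A per-job working directory on the local cluster.
        String workDir = scratch + "/" + System.currentTimeMillis();

        // The service, not the client, fills in the full source and destination
        // paths for the transfers and the GRAM job description.
        return new ResolvedJob(executable, workDir,
                request.getInputLocation(), workDir + "/input",
                workDir + "/output", request.getOutputLocation());
    }
}
```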
27. Grid-BGC Service Oriented Architecture
- Clusters running the Grid-BGC service can provide the Daymet and Biome-BGC models
- All system-specific configuration is contained in the Grid-BGC Service
- Custom workflow management software provides additional fault tolerance and execution control
- Default Globus tools are used for execution and file transfer
28. Multipurpose Service Oriented Architecture
- This service oriented architecture works for models with similar characteristics
- Multiple servers can advertise the availability of different models
- All model details are contained in the Grid Service
- The actual model is transparent to the custom workflow manager; the workflow manager has no concept of BGC or Daymet (see the sketch below)
- We have run BGC and Daymet, as well as another NCAR model
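One way to read "the workflow manager has no concept of BGC or Daymet" is that the service hands the workflow manager only generic tasks. The sketch below, with entirely invented names, illustrates that idea together with a simple retry loop of the kind used for fault tolerance.

```java
import java.util.List;

// Hypothetical illustration of model transparency: the workflow manager
// only sees generic stage-in / execute / stage-out tasks. Which model is
// being run (Daymet, Biome-BGC, or anything else) is decided entirely
// inside the Grid service that builds the tasks.
public interface WorkflowTask {
    enum Kind { STAGE_IN, EXECUTE, STAGE_OUT }

    Kind kind();

    /** Run the task (e.g. by invoking RFT or GRAM) and report success. */
    boolean run() throws Exception;
}

class WorkflowManager {
    // Execute tasks in order, retrying each a fixed number of times for
    // simple fault tolerance; the manager has no concept of any model.
    public void run(List<WorkflowTask> tasks, int maxAttempts) throws Exception {
        for (WorkflowTask task : tasks) {
            int attempt = 0;
            while (!task.run()) {
                if (++attempt >= maxAttempts) {
                    throw new Exception("Task failed: " + task.kind());
                }
            }
        }
    }
}
```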
29. Tested Functionality
[Diagram: the NCAR Portal (dataportal) runs the Portal and self-test clients; the CU Cluster (hemisphere) and the CU Test System (toaster) each run the Grid-BGC Service, Workflow manager, GRAM, RFT, and the model.]
30. Grid Service Conclusions
- Grid Service design objectives
- Service oriented architecture
- Modular design
- Use only default Globus Toolkit 4.0 components
- Future Work
- Capacity testing
- Job throttling
- Failover and additional fault tolerance
- Workflow parallelism
- Resource advertising through MDS
31. Project Schedule Status
- Still following the project schedule presented at the Year 1 Annual Review, with the following exception:
- The Prototype 2 release has end-to-end functionality for Daymet only, not Daymet and Biome-BGC.
- The same level of functionality for Biome-BGC is to be completed in Year 3, First Quarter.
- All the Portal and Grid Service technology components for Biome-BGC are inherited from the Daymet implementation.
32. Updated Project Schedule and Milestones
33. TRL Update (end of Year 2)
- End of Year 1: TRL 4. All system components had been subjected to stand-alone prototype implementation and testing, with a focus on integration of technology components.
- End of Year 2: TRL 5. We have completed thorough prototype testing in a representative environment (end-to-end with Daymet). All basic technology elements are integrated, with reasonably realistic supporting elements. The implementation conforms to the target environment/interfaces, with the exception of visualization technology, which is only at the design stage.
34. Year 3 Planning and Milestone Schedule
- First Quarter
- Complete Biome-BGC implementation for Portal/Grid Service Prototype 3
- Complete design for input/output visualization
- Begin end-to-end testing on the full-scale problem domain (1 km North America)
- Second Quarter
- Continue end-to-end testing on the North America domain
- Begin implementing the input/output visualization design
- Begin producing final system documentation
- Third Quarter
- Complete end-to-end testing on the North America domain; begin preparing a manuscript describing this application
- Complete implementation of the input/output visualization design
- Continue producing system documentation
- Pass the final prototype, including visualization components, to beta-testers
35. Year 3 Planning (cont.)
- Fourth Quarter
- Receive final guidance from beta-testers
- Finalize system development
- Complete final system documentation
- Submit North America application manuscript
- Prepare and submit a manuscript describing the Grid-BGC system (Eos).
- Final reporting
- If all milestones are accomplished, the project will exit at TRL 6.
36. Budget Status (Years 1 and 2)
37. End
38. A Niche in the Workflow Field (Extra Slide)
- Your workflow only does stage-in, execution, and stage-out. Why aren't you just using GRAM as-is?
- Running GRAM directly requires complete a priori knowledge of the target computational environment, including
- Full path to the working directory
- Full path to the executable
- Our Grid Service converts a Grid-based model simulation request into a Grid-based GRAM execution and RFT transfer request
- Why aren't you using a third-party XML-based comprehensive workflow manager?
- Similarly, these systems simply sequence primitive GRAM and RFT invocations, again requiring full knowledge of the target environment
- These systems add extra configuration complexity; our solution works using only default GT components with little custom code