Title: Mike Folk, Elena Pourmal, Bob McGrath


1
HDF Software Process: Lessons Learned and Success Factors
  • Mike Folk, Elena Pourmal, Bob McGrath
  • National Center for Supercomputing Applications
  • University of Illinois at Urbana-Champaign
  • NOBUGS 2004, HDF-EOS Workshop VIII

2
Outline
  • What is HDF? Who is HDF?
  • HDF Architecture
  • Some statistics
  • How do we measure success?
  • How can we achieve success?
  • Group practices
  • Summing up strengths, weaknesses, needs

3
What is HDF? Who is HDF?
4
HDF in a nutshell - what it is
  • File format and I/O Libraries for storing,
    managing and archiving large complex scientific
    and other data
  • Tools and utilities
  • Open source, free for any use (U of I license)
  • Well maintained and supported
  • From the HDF Group, NCSA, University of Illinois
  • http://hdf.ncsa.uiuc.edu

5
HDF in a nutshell - features
  • General: simple and flexible data model
  • Flexible: stores data of diverse origins, sizes, and types; supports
    complex data structures and types
  • Portable: available for many operating systems and machines
  • Scalable: works in high-end computing environments; accommodates data
    of any size or multiplicity
  • Efficient: fast access, including parallel I/O; stores big data
    efficiently

6
HDF in a nutshell - users
  • Apps in industry, academia, government: more than 200 distinct
    applications
  • Large user base: e.g. NASA estimates 1.6 million users
  • Underlying format for community standards: e.g. HDF-EOS, SAF, CGNS,
    NPOESS, NeXus

7
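Example of HDF file: mixing and grouping objects
[Figure: an HDF file whose groups and objects (labeled foo, a, b, c, x, z
and _foo_y) hold a text annotation ("This file was created as a part of
...", see http://hdf.ncsa.uiuc.edu), a table with columns lat, lon, temp
and rows (12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6), two raster images,
a 2-D array, and an object marked 1GB.]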
8
HDF Architecture
9
HDF Architecture
  • Tools & Applications: utilities and applications for managing,
    manipulating, viewing, and analyzing data
  • HDF I/O library
  • HDF5 Applications Programming Interface: high-level, object-specific
    APIs
  • Low-level interface: low-level API for I/O to files, etc.
  • File: file or other data source
10
User-controlled I/O and storage
  • Data pipeline: data transformation, compression, encryption
  • Storage layout
  • Virtual file options: stdio (normal file), split file, MPI-IO and
    other parallel, network, memory, custom

HDF I/O Library
HDF File
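The data pipeline idea on this slide can be sketched in a few lines of Python. This is a conceptual illustration only, not the HDF5 API: the names `FILTERS`, `xor_mask`, `write_through`, and `read_through` are hypothetical, the XOR mask is a toy stand-in for encryption, and an in-memory buffer plays the role of one "virtual file" option.

```python
import io
import zlib


def xor_mask(data: bytes, key: int = 0x5A) -> bytes:
    """Toy reversible transformation standing in for encryption."""
    return bytes(b ^ key for b in data)


# Hypothetical filter table: name -> (forward, inverse).
FILTERS = {
    "compress": (zlib.compress, zlib.decompress),
    "mask": (xor_mask, xor_mask),
}


def write_through(pipeline, payload: bytes, sink) -> None:
    """Apply each filter in order, then hand the bytes to the storage driver."""
    for name in pipeline:
        forward, _ = FILTERS[name]
        payload = forward(payload)
    sink.write(payload)


def read_through(pipeline, source) -> bytes:
    """Read raw bytes from the driver and undo the filters in reverse order."""
    payload = source.read()
    for name in reversed(pipeline):
        _, inverse = FILTERS[name]
        payload = inverse(payload)
    return payload


# An in-memory "driver"; a normal file opened in binary mode would work the same way.
memory_file = io.BytesIO()
original = b"temperature " * 1000
write_through(["compress", "mask"], original, memory_file)
stored_size = len(memory_file.getvalue())
memory_file.seek(0)
restored = read_through(["compress", "mask"], memory_file)
```

The point of the design is that the application chooses the filters and the storage target, while the read and write calls stay the same.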
11
Supported languages and compilers
  • C
  • Wrappers: C++, Fortran90, Java
  • Vendors' compilers (SUN, IBM, HP, etc.)
  • PGI and Absoft (Fortran)
  • GNU C (e.g. gcc 3.3.2)

12
Supported Machines and OS
  • Solaris 2.7, 2.8 (32/64-bit)
  • IRIX 6.5, IRIX64-6.5
  • HP-UX 11.00
  • AIX 5.1 (32/64-bit modes)
  • OSF1
  • FreeBSD
  • Linux (SuSE, RH8, RH9) including 64-bit
  • Altix (SGI Linux)
  • IA-32 and IA-64
  • Windows 2000, XP
  • Mac OS X
  • Crays (T3E, SV1, T90IEEE)
  • DOE National Labs machines
  • Linux Clusters

13
Architecture in context
Tools & Applications
HDF5 Applications Programming Interface
Low level Interface
File
14
Architecture in context
Tools & Applications
HDF5 Applications Programming Interface
Low level Interface
File
15
Architecture in context
Tools & Applications
HDF-EOS
SAF
CGNS
HDF5 Applications Programming Interface
Low level Interface
File
16
The testing challenge
Machines × operating systems × compilers × languages × serial and
parallel × compression options × configuration options × virtual file
options × backward compatibility = a large number
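To get a feel for how quickly that product grows, here is a minimal Python sketch. The entries in each dimension are illustrative picks loosely drawn from the platforms and options named in these slides; the real lists are longer, and not every combination is a valid build.

```python
from itertools import product
from math import prod

# Illustrative values only; the actual test matrix has more entries per dimension.
dimensions = {
    "platform": ["Solaris", "IRIX64", "AIX", "Linux", "Windows"],
    "compiler": ["vendor", "gcc", "PGI"],
    "language": ["C", "C++", "Fortran90"],
    "mode": ["serial", "parallel"],
    "compression": ["none", "zlib", "szip"],
    "virtual_file": ["stdio", "split", "MPI-IO"],
}

# The size of the full matrix is the product of the sizes of each dimension.
total = prod(len(values) for values in dimensions.values())

# Enumerate the combinations lazily; each one is a candidate test configuration.
configurations = (
    dict(zip(dimensions, combo)) for combo in product(*dimensions.values())
)
first = next(configurations)
print(total, first)
```

Even this reduced matrix has 810 cells, which is why a project in this position has to sample configurations rather than test all of them.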
17
"Diversity makes our code better"
  • Todd Smith, Geospiza

18
Some statistics
19
HDF Statistics
  • HDF Group
  • 15 FTE + 3-5 students
  • $2.1 million annual budget
  • HDF5 source code distribution
  • 2073 files
  • 917,186 Lines of code
  • HDF Project
  • HDF5, HDF4, H4toH5, H5Lite, Java
  • 3,000,000 lines of code (estimate)

20
HDF5 source distribution by categories (lines of code)
21
HDF5 staff investment
22
How do we measure success?
23
How do we measure success?
  • Mission
  • Goals and objectives
  • Strong and continuing relationships with users
  • High quality software
  • Strong committed development team
  • Great working environment
  • Adequate funding

24
Mission, goals and objectives
  • Mission
  • To develop, promote, deploy, and support open and
    free technologies that facilitate scientific data
    exchange, access, analysis, archiving and
    discovery
  • Goals (examples)
  • Innovate and evolve the technologies in concert
    with a changing world of technologies
  • Maintain a high level of quality and reliability
  • Collaborate and build communities
  • Build a team

25
Mission, goals and objectives
  • Objectives - how we reach the goal
  • Example
  • Goal
  • Maintain a high level of quality and reliability
  • Objectives
  • Improve testing
  • Implement a program to ensure excellent software
    engineering practices
  • Develop and execute a plan to meet
    quality/reliability standards

26
Users
  • Number of users
  • Happy users?
  • Unhappy users?
  • Users achieve their goals by using HDF
    technologies
  • Users coming back with new needs
  • Financial support from users

27
Software
  • Technology that addresses users' needs and
    demands (current and future)
  • E.g. big files, parallel access, multiple objects
  • Usability
  • Number and types of applications
  • Appropriate APIs and data models
  • Available tools
  • Interoperability with other software
  • E.g. IDL, MATLAB, Mathematica

28
Software
  • Stability
  • Can data be shared?
  • Can software run on needed platforms?
  • Sustainability
  • Can we read data written 15 years ago on an obsolete
    platform?
  • Will the software be available in 15 years?
  • Acceptability
  • De facto standard
  • Open standard for exchange of remote-sensed data
  • Over 3,000,000,000,000,000 bytes stored in HDF
    and HDF-EOS

29
How can we achieve success?
30
How can we achieve success?
  • Maintain strong, responsible, and continuing
    relationships with users
  • An approach to needs identification, software
    design, and software implementation based on
    sound principles of software engineering
  • Effective technical processes for developing,
    testing, integrating and maintaining software
  • Business and social processes based on sound
    group management principles

31
Stages of software development at HDF
  • Getting started
  • Creating an implementation approach
  • Implementation and maintenance
  • Relations with users and sponsors
  • Group practices

32
Getting started
  • Discover a need
  • Identify a sponsor
  • Clarify the need, its role, and its importance
  • Enter task into the project plan
  • Make initial estimate of time and resources for
    the task
  • Give it a priority
  • Identify the task's lead
  • Identify a person who will work on the task

33
Creating an implementation approach
  • Write up a needs/approach RFC (Request For
    Comment)
  • Actively solicit feedback from developers/sponsors
  • Revise until satisfied
  • Write up a design/approach RFC
  • Get feedback from developers/sponsors
  • Revise until satisfied
  • Revise project plan according to RFC results
  • Archive RFC

34
Implementation and maintenance
  • Identify validation plan (needs improvement)
  • Implement
  • Library or tool
  • Tests
  • Documentation
  • Ask sponsor and friendly users for feedback
  • Review results and repeat appropriate steps above
    as needed
  • Clean up (documentation, Web, etc.) and announce
  • Support (debug, fix, add more tests, advertise)

35
Relations with users and sponsors
  • Who are our sponsors?
  • Organizations and communities with institutional
    and financial commitment to HDF
  • NCSA, NASA, DOE ASCI, Boeing, ...
  • Agencies supporting R&D
  • NCSA, NASA, DOE, NSF, ...
  • Collaborators who make in-kind contributions
  • Cactus, PyTables, NeXUS, CGNS
  • HDF group members

36
Relations with users and sponsors
  • Each task is associated with a sponsor
  • Each task has a priority, which should be
    confirmed with sponsor
  • Each task falls into one of these categories
  • Research
  • R&D (research, possibly integrate into product)
  • Development
  • Technology infusion
  • Library or tools enhancement
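The bookkeeping described on this slide (a sponsor, a priority to confirm with that sponsor, and one category per task) could be modeled roughly as below. This is a hypothetical sketch: the field names, the numeric priority scale, and the sample tasks are invented for illustration, and the five categories are taken at face value from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    RESEARCH = "Research"
    R_AND_D = "R&D"
    DEVELOPMENT = "Development"
    TECHNOLOGY_INFUSION = "Technology infusion"
    ENHANCEMENT = "Library or tools enhancement"


@dataclass
class Task:
    title: str
    sponsor: str
    priority: int  # hypothetical scale: 1 is most urgent
    category: Category
    priority_confirmed: bool = False  # set once the sponsor agrees


def needs_sponsor_confirmation(tasks):
    """Return tasks whose priority is not yet confirmed, most urgent first."""
    pending = [t for t in tasks if not t.priority_confirmed]
    return sorted(pending, key=lambda t: t.priority)


# Made-up tasks; the task-to-sponsor pairings are not from the talk.
plan = [
    Task("Parallel I/O tuning", "DOE ASCI", 2, Category.DEVELOPMENT),
    Task("New compression filter", "NASA", 1, Category.ENHANCEMENT,
         priority_confirmed=True),
    Task("Grid access prototype", "NCSA", 3, Category.R_AND_D),
]
pending_titles = [t.title for t in needs_sponsor_confirmation(plan)]
```

A query like `needs_sponsor_confirmation` makes the "priority should be confirmed with sponsor" rule something the project plan can check rather than something people have to remember.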

37
Group practices
38
Group practices - technical
  • Source code management: CVS
  • Bug tracking: Bugzilla
  • Bugs entered by support staff and developers
  • Prioritized by staff
  • Easy bugs fixed on the fly

39
Group practices - technical
  • The testing challenge
  • Code testing
  • Testing before code check-in
  • Regression testing
  • Remote testing
  • Different configurations testing
  • Backward compatibility testing

41
Daily test report
From: HDF group system admin <hdfadmin_at_ncsa.uiuc.edu>
To: hdf5lib_at_ncsa.uiuc.edu
Subject: HDF5_Daily_Tests_FAILED!!!

HDF5 Tests on 041022

Watchers List
HDF5 Daily test features/platforms watchers and procedure
---------------------------------------------------------
Procedure: The watcher will investigate and report the cause of failure
by 11am. The developer who checked in the error code may report so by
then too. The watcher or the developer should get the failure fixed and
report it by 3pm.

Platforms watchers
AIX 5.1 (copper)              Albert
FreeBSD                       Quincey
HP-UX                         Elena
IA32 (tungsten)               Raymond
IA64 (tg-login)               Albert
IRIX64-6.5 32,64-bit          Raymond
IRIX 6.5                      Raymond
Linux 2.4                     Peter
Solaris 2.78 32,64-bit        Elena
Windows                       Kent

Features watchers
General Library               Quincey
General parallel              Albert
configuration                 Quincey, James
mpich                         Raymond
Fortran                       Elena
Intel compilers               Elena, Kent (for windows)
PGI compilers                 Elena
C++                           Binh-Minh
Thread-safety                 Quincey
Tools                         Padro
--- updated 2004/10/01

Tests Summary
FAILED eirene setenv CC icc setenv F9X ifc setenv CXX icc
    --enable-fortran --enable-cxx
PASSED arabica setenv CC
    /afs/ncsa/projects/hdf/packages/mpich_1.2.4/SunOS64_5.7/bin/mpicc
    setenv F9X
    /afs/ncsa/projects/hdf/packages/mpich_1.2.4/SunOS64_5.7/bin/mpif90
    setenv ALL_LOCAL 1 --enable-fortran standard
PASSED arabica setenv CC mpicc setenv ALL_LOCAL 1 standard
PASSED arabica setenvN 2 CC cc -xarch=v9 setenvN 2 F9X f90 -xarch=v9
    setenvN 2 CXX CC -xarch=v9 standard
    --with-szlib=/afs/ncsa/projects/hdf/packages/szip_new/SunOS_5.7-64bit
PASSED arabica standard --enable-cxx --enable-fortran
    --with-szlib=/afs/ncsa/projects/hdf/packages/szip_new/SunOS_5.7
PASSED Cu12 --enable-parallel
PASSED Cu12 --enable-parallel setenv CFLAGS -q64 setenv FFLAGS -q64
    setenvN 3 AR ar -X 64 --enable-fortran
    --with-zlib=/afs/ncsa/projects/hdf/packages/zlib/AIX5.1-64bit
    --with-szlib=/afs/ncsa/projects/hdf/packages/szip_new/AIX5.1-64bit
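A watcher reading a report like this mainly needs to know which hosts failed. The following Python sketch tallies results from a summary in this style. The one-result-per-line layout is an assumption, the sample lines are shortened versions of the entries above, and `parse_summary` is a hypothetical helper, not part of the HDF test scripts.

```python
from collections import Counter

# Sample lines modeled on the "Tests Summary" above, with the configuration
# details shortened; the one-result-per-line layout is an assumption.
summary = """\
FAILED eirene --enable-fortran --enable-cxx
PASSED arabica --enable-fortran
PASSED arabica standard
PASSED Cu12 --enable-parallel
"""


def parse_summary(text):
    """Split each line into (status, host, configuration)."""
    results = []
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) < 2 or parts[0] not in ("PASSED", "FAILED"):
            continue  # skip blank or unrecognized lines
        status, host = parts[0], parts[1]
        config = parts[2] if len(parts) > 2 else ""
        results.append((status, host, config))
    return results


results = parse_summary(summary)
counts = Counter(status for status, _, _ in results)
failed_hosts = sorted({host for status, host, _ in results if status == "FAILED"})
print(counts["PASSED"], "passed;", counts["FAILED"], "failed on", failed_hosts)
```

Mapping a failed host to the responsible watcher would need a host-to-platform table, which the report does not spell out, so the sketch stops at listing the hosts.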
42
Group practices - technical
  • Release levels
  • Development release
  • Official release
  • Past releases

43
Group practices - technical
  • Coding standards
  • Maintaining platform-independence
  • Maintaining time-independence
  • Rules for changing APIs
  • Documentation
  • Rapid prototyping

44
Group practices - business and social
HDF Project
  • Staff breakdown: user support, documentation, QA, software
    development, testing, team leadership, system administration
  • Teams: basic library development; support, doc, QA, maintenance;
    tools and Java; parallel I/O, Grid, big machines
  • Team lead for each team
  • Most staff in two or more teams
  • Staff relationships: complement each other, overlap each other, keep
    each other honest

45
Group practices - business and social
  • Accountability of everyone to the whole process
  • Help desk
  • Approaches to carrying out tasks
  • Paying attention to technical proposals
  • Weekly HDF5 developers' meetings
  • HDF seminars
  • Management and administration
  • Performance reviews with emphasis on goals,
    development
  • Critical to success
  • That's another talk

46
Summing up: Strengths, weaknesses, needs
47
Strengths
  • User support
  • Staff
  • High quality, diverse staff with good morale
  • Staff commitment and enthusiasm
  • Ability to address all aspects of product
    development
  • Emphasis on quality control
  • Fast bug fixing and frequent releases
  • Ability to focus on a single product over a long
    term
  • High level of support from sponsors
  • Project's visibility through NCSA, NASA, DOE,
    users

48
Weaknesses
  • Software development team
  • Library expertise still concentrated among too
    few developers
  • Team communication is challenging
  • Processes
  • Release/maintenance take too much time and
    resources
  • Configuration and porting are a huge time sink
  • We don't do enough prototyping
  • Hard to keep up with new technologies
  • Parallel I/O hard to support

49
More weaknesses and challenges
  • Usability
  • Software too hard to use for casual users
  • Insufficient documentation
  • Insufficient tools for high level users
  • Insufficient interoperability with common tools
    and formats
  • Marketing
  • Marketing effort is inadequate
  • Need to connect better with users and potential
    users
  • Viable long-term support

50
Most immediate needs
  • Configuration and build
  • Testing and prototyping
  • Marketing
  • Reporting
  • Performance reports
  • General reports to users
  • HDF book
  • Sustainable business model

51
Thank you