Cyberinfrastructure As A Critical Component in Advancing Research and Education

1
Cyberinfrastructure As A Critical Component in
Advancing Research and Education
Bob Wilhelmson
National Center for Supercomputing Applications
Director of Cyber Applications and Communities
Chief Science Officer
Professor, Department of Atmospheric Sciences
bw@ncsa.uiuc.edu
2
It's All About Enabling Science, Engineering,
Humanities, and the Arts
3
In a Multi World!
  • Multidisciplinary (e.g., earth science, virtual lung, ...)
  • Multiscale
  • Multipurpose (research, prediction, education)
  • Multiservices (simulation, data collection, mining, archiving, visualization)
  • Multiresources (shared and distributed memory, innovative hardware, high-speed networking, rotating disk and archival store, desktop-to-petascale functionality)
  • Multiagency (NSF, DOE, NIH, NASA, NOAA, USGS, DOD, ...)
  • Multicommunity (LEAD, GEON, SEEK, LTER, CLEANER, CUAHSI, NEON, ...)

Collaboration Needed
4
In a Multi World!

Innovation Needed
5
In a Multi World!

Leadership Needed
6
Cyberinfrastructure Beyond Just Computing
  • Today's computer is a coordinated set of hardware, software, and services providing an end-to-end resource.
  • Cyberinfrastructure captures how the science and engineering community has redefined the computer.

The computer as an integrated set of resources
Source: Fran Berman
7
Cyberinfrastructure
Cyberinfrastructure is the coordinated aggregate of software, hardware, and other technologies, as well as human expertise, required to support current and future discoveries in science and engineering.

The NSF Blue Ribbon Panel (Atkins) Report provided a compelling and comprehensive vision of an integrated cyberinfrastructure.

"Thanks to cyberinfrastructure and information systems, today's scientific toolkit includes distributed systems of hardware, software, databases and expertise that can be accessed in person or remotely." Arden Bement, NSF Director, February 2005

Source: Adapted from Fran Berman
8
When Did CI Emerge?
[Timeline, 1985–2010: Supercomputing Centers → NSF Networking → PACI (NPACI and Alliance) → Terascale (TCS, DTF, ETF) → Cyberinfrastructure, building on prior computing investments]
Source: NSF CISE
9
Background Reports
10
CI Needs Driven By
  • From simple to complex
  • From prototype CI to reliable, robust software that
    • dramatically increases productivity
    • enables new science
    • requires very stable base middleware
  • From the desktop (high-resolution stereo) to the petaflop (10,000 to 100,000 processors just for you)
  • From focused systems to balanced systems
  • From observations, theory, and modeling to new knowledge
  • From interactive to on-demand to real-time to batch
  • From hundreds/thousands of capability (most) simulations to large (both space and time) leadership computations
  • From 10s of gigabytes to petabytes of data
  • From disciplinary to interdisciplinary collaboration
  • From small to large communities

11
General Principles for CI in the Geosciences
  • Cyberinfrastructure must serve geoscience. Therefore, it must be developed in response to the community, not imposed on the community.
  • Much of the development must be done in partnership with computer scientists because it involves substantial computer science innovation.
  • Cyberinfrastructure development is expensive; therefore we should encourage as much re-use of developments as possible.
  • Work specific to individual fields should be reviewed by those fields, as they are the ones who will ultimately use the infrastructure.

Source: Leinen
12
CI-Enhanced Knowledge Communities: CI Two Years after the Blue Ribbon Panel Report (Dan Atkins)
Elevator speech: Advanced CI is critical to innovation. Innovation is critical to leadership in global, knowledge-based economies.
  • We must now invest in IT as institutionalized, sustained, evolving but robust infrastructure that researchers will bet their careers on
  • My history-of-technology and civil engineering friends are quick to remind me that infrastructure is among the most complex and costly undertakings of modern society
  • There exists a stew bubbling with activities called cyberinfrastructure, e-science, grids, collaboratories: complementary visions and activities
  • We need to cooperate in new ways in order to compete (co-opetition)

13
Virtual Organization: Conceptual View of Information Infrastructure
[Diagram: scientists reach science tasks through a portal, supported by identity management and registration services, and drawing on local resources, information resources, hardware resources, and observing systems]
Credit: Leinen, Meacham
14
NSF'S CYBERINFRASTRUCTURE VISION FOR 21ST CENTURY
DISCOVERY
  • Call to Action
  • Strategic Plan for High Performance Computing
    (2006-2010)
  • Strategic Plan for Data, Data Analysis and
    Visualization (2006-2010)
  • Strategic Plan for Collaboratories, Observatories
    and Virtual Organizations (2006-2010)
  • Strategic Plan for Education and Workforce
    (2006-2010)

15
It's a Cyber World!
16
Definitions
  • A cybercommunity is a distributed group of people with common goals; it ranges from a few individuals to an interdisciplinary or international group. These groups can include researchers, policy makers, responders, educators, and citizens, and often have a long-term identity and purpose.
  • A cyberenvironment is a subset of general CI capabilities and functionality that is designed and built to meet the needs of a particular community. It includes the use of broadly used middleware and networks as well as community-specific facilities, software frameworks, networks, and people. Further, it is persistent, robust, and supported.
  • A cyberservice is a web or grid service, a software tool or toolkit, a model or collection of models, etc.

17
Definitions
Cyberenvironments are composed of cyberservices that enable cyberscience within cybercommunities.
18
NCSA's Strategic Directions
  • Cyber-resources: enabling discovery at the leading edge
    • Leading-edge computing and data storage resources as well as network connectivity
    • User services to help make effective use of high-end computing resources
  • Cyberenvironments: harnessing the power of the national cyberinfrastructure
    • Integrated, end-to-end software environments providing access and the ability to coordinate, automate, and apply high-end resources and capabilities
    • Cyberservices and cybertechnologies needed to build cyberenvironments
  • Innovative Computing Systems: defining the path to petascale computing
    • Innovative computing systems that promise to significantly decrease the cost and/or extend the range of computational science and engineering

19
Why Cyberenvironments?
  • Mosaic
    • By the early 1990s, the internet had a wealth of resources, but they were inaccessible to most scientists
    • Mosaic facilitated the use of the internet by all scientists (and, eventually, by laymen!)
  • Cyberenvironments
    • Cyberenvironments will facilitate the use of cyberinfrastructure by all scientists

20
Cyberenvironments: Beyond Web Portals
  • Web Portals
  • Reduce barrier to accessing cyberinfrastructure
    by providing convenient point-and-click
    interface
  • Broaden access to and use of cyberinfrastructure
  • Cyberenvironments
  • Help manage large-scale, complex and
    interdependent projects and processes
  • Help manage diverse experimental, computational
    and data resources
  • Bridge local, institutional, national and
    international cyberinfrastructure to create a
    seamless environment
  • Assist in the bi-directional connection between
    raw research artifacts and published artifacts

21
Environmental Cyberenvironment Prototype
  • CLEANER
  • Collaborative Large-scale Engineering Analysis
    Network for Environmental Research
  • Human-dominated, complex environmental systems,
    e.g.,
  • River basins
  • Coastal margins
  • What researchers requested:
    • Access to live and archived sensor data
    • Analyze, visualize, and compare data
    • Link to computational models
    • Collaborate with colleagues
    • Organize, automate, and share cyber-research processes

Users can simultaneously view and discuss data
and analyses
22
Cyberenvironments: MAEViz Earthquake Engineering
[Workflow diagram: Hazard Definition → Inventory Aggregation → Fragility Models → Damage Prediction → Decision Support]
  • Elements of MAEViz
    • State-of-the-art engineering
    • Distributed data/metadata sources
    • GIS with visual overlays
    • Collaboration environment
    • Builds on some NEESgrid technologies
23
Building Cyberenvironments
[Diagram: scientific pathfinders and scientific communities feed scientific and technology roadmaps into integrated project teams with partners (SDSC, PSC, TeraGrid working groups, advisory committees, industrial and international partners); requirements analysis and specification lead through development and system integration (portals, GUIs, workflow management, applications, data mining, analysis, visualization, web services, collaboratories, middleware, security) to prototype or production cyberenvironments, research applications, and scientific discoveries, coordinated by a cyberarchitecture working group]
24
Engaging and Enabling Communities
Scientists, Engineers, Decision Makers, Policy
Makers, Media and Citizens Engaging in
discovery, analysis, discussion, deliberation,
decisions, policy formulation and communication
Collaboration Framework facilitates Idea and
Knowledge Sharing, eLearning and Multi-Objective
Decision Support Processes
Analysis Framework facilitates Data and Model
Discovery, Exploration, and Analysis via the
Collaboration Framework
Data Management Framework builds logical maps of
distributed, heterogeneous information resources
(data, models, tools, etc.) and facilitates
their use via the Analysis and Collaboration
Frameworks
Physical Infrastructure
25
Cyberenvironments: Structure of Cyberenvironments
26
An Example: Linked Environments for Atmospheric Discovery (NSF Large ITR Project)
27
My View of The World
Meteorology CS
CS Meteorology
28
BIRN Core Software Infrastructure
Distributed Resources
Courtesy: Mark Ellisman
29
The Basic Idea of Web Services
A stock quote service:
getLastTradePrice(tickerSymbol) returns price
getLastTradePrice("SGI") returns 1.73
  • A Web Service is
    • A network service that provides a functional interface for remote clients
    • An interface is a set of operations the service performs
  • Big collaboration between big players
    • IBM, Microsoft, Oracle, HP, Sun ... everybody
    • Central to MS .NET and IBM services plans
  • Provides a better way to factor complex distributed applications into basic, reusable, reliable services

Credit: Dennis Gannon
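The stock-quote example above can be sketched with Python's built-in XML-RPC support, which carries requests and replies as XML over HTTP in the same spirit as SOAP. The quote table and prices are invented for illustration; this is a minimal sketch, not any particular production service.

```python
# Minimal sketch of the slide's getLastTradePrice service using XML-RPC
# (XML over HTTP). QUOTES is a made-up, in-memory price table.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

QUOTES = {"SGI": 1.73}  # hypothetical last-trade prices

def get_last_trade_price(ticker_symbol):
    """The single operation this service's interface exposes."""
    return QUOTES[ticker_symbol]

# Bind to an ephemeral port and expose the operation under its public name.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(get_last_trade_price, "getLastTradePrice")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# A remote client sees only the interface: one named operation.
client = ServerProxy(f"http://127.0.0.1:{port}/")
price = client.getLastTradePrice("SGI")  # request and reply travel as XML
server.shutdown()
```

The client never imports the server's code; it speaks only to the declared operation name, which is the point of the slide.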
30
So why web services?
  • The web works by a simple set of HTTP commands: GET and POST/PUT
  • Complex requests like "Get me a non-smoking double room at the special rate at the Hilton and bill it to the company" must be encoded in awkward URL strings. This is very limited.
  • A web service declares in a WSDL doc: "Here is what services I provide. Use this XML language and interface definition to send me requests, and here is how I will respond in XML."

Credit: Dennis Gannon
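The contrast above can be shown in a few lines: the same hotel-booking request crammed into an untyped URL query string versus expressed as a structured XML document of the kind a WSDL-described service would accept. All field names here are invented for illustration.

```python
# The slide's point in code: URL-encoded request vs. structured XML payload.
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

request = {          # hypothetical booking fields
    "room": "double",
    "smoking": "no",
    "rate": "special",
    "hotel": "Hilton",
    "billTo": "company",
}

# Awkward: everything flattened into an untyped query string.
as_url = "https://example.com/book?" + urlencode(request)

# Structured: a self-describing XML document a service can declare
# an interface for and validate.
root = ET.Element("bookRoom")
for field, value in request.items():
    ET.SubElement(root, field).text = value
as_xml = ET.tostring(root, encoding="unicode")
```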
31
Web Services Description Language (WSDL)
  • A standard
    • A description of types and ports
    • A port is a set of operations the port can perform
    • WSDL has XML descriptions of these operations and their argument and response types
  • WSDL is an XML document that
    • Describes the interface types for each port
    • Describes the contents of messages it receives and sends
    • Describes the bindings of interfaces to protocols
      • SOAP (XML over HTTP) is the default; others are possible
    • Describes the access points (host/port) for the protocol bindings

Credit: Dennis Gannon
32
NCSA's ALG Research, Development, and Technology Transfer Model (limited web services)
33
Knowledge Discovery Process
34
Advantages of a Framework for Analytics Such as
D2K
  • Scalable: desktop, web services, grid services
  • Visual programming system employing a data/workflow paradigm
  • Integrated environment for models and visualization
  • Capability to access data management tools transparently from multiple sources
  • Capability to build custom applications rapidly
  • Data mining algorithms and complex data paradigms

35
D2K Infrastructure, Modules, Itineraries, and
Applications
  • D2K Infrastructure
  • D2K API, data flow environment, distributed
    computing framework and runtime system
  • D2K Modules
  • Computational units written in Java that follow
    the D2K API
  • D2K Itineraries
  • Modules that are connected to form an application
  • D2K Toolkit
  • User interface for specification of itineraries
    and execution that provides the rapid application
    development environment
  • D2K-Driven Applications
  • Applications that use D2K modules, but do not
    need to run in the D2K Toolkit
  • D2KSL Web/Grid Services
  • Applications that provide a user-specific GUI or application service

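The module/itinerary idea above can be illustrated with a toy data-flow sketch (in Python rather than the actual Java D2K API, and with made-up module names): modules are small computational units, and an itinerary wires them together into an application.

```python
# Toy illustration of D2K-style modules connected into an itinerary.
# The modules and data are invented; only the wiring pattern matters.
def load_data():            # module: produce raw records
    return [3, 1, 4, 1, 5, 9, 2, 6]

def filter_outliers(xs):    # module: drop values above a threshold
    return [x for x in xs if x <= 6]

def summarize(xs):          # module: reduce the stream to one statistic
    return sum(xs) / len(xs)

# The "itinerary": a declared chain of modules forming the application.
itinerary = [load_data, filter_outliers, summarize]

def run(itinerary):
    """Execute the itinerary by flowing each module's output onward."""
    result = itinerary[0]()
    for module in itinerary[1:]:
        result = module(result)
    return result

mean = run(itinerary)
```

The design point is that each module knows nothing about its neighbors; the itinerary, not the modules, defines the application.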
36
A Value of Cyberenvironments (CEs): The 80/20 Flip
  • What if a graduate student or a researcher could spend 80% of their time on science and only 20% on grunt work, through technology and software innovation?
  • Answer: great things!
  • Examples
    • Gaskins: finding subsequence series in a protein through analytics
    • Lewin: new advances through information visualization

37
Evolution Highway in D2K
  • Uses the D2K web service to deliver the
    visualization application
  • Provides a visual means for simultaneously
    comparing mammalian genomes of humans, horses,
    cats, dogs, pigs, cattle, rats, and mice
  • Removes the burden of manually aligning these
    maps
  • Allows cognitive skills to be used on something
    more valuable than preparation and transformation
    of data
  • evolutionhighway.ncsa.uiuc.edu went live on July 22, 2005
  • Science, Vol. 309, Issue 5734, pp. 613–617, 22 July 2005

38
First Simulation of an Entire Life Form: Satellite Tobacco Mosaic Virus
  • Klaus Schulten and collaborators at UIUC
  • Up to 1 million atoms for 50 nanoseconds
  • NAMD, a modern, scalable molecular dynamics code, was used

39
Unprecedented Planning/Vision
40
Survey Findings / Recommendations
  • Expected requirement: 10 to 1000 times the current ITI hardware capacity over the next five to ten years, with the most critical bottlenecks occurring in the availability of CPU cycles, memory and mass-storage capacity, and network bandwidth
  • Software systems
  • need to re-engineer models, and data analysis and
    assimilation packages, for efficient use on
    massively parallel computers
  • advances in visualization techniques to deal
    effectively with increasing volumes of
    observations and model output
  • well-designed, documented and tested community
    models of all types.
  • Extreme shortage of skilled ITI technical
    personnel accessible to the ocean sciences
    community
  • Improve access to high-performance computational
    resources across the ocean sciences.
  • Provide technical support for maintenance and
    upgrade of local ITI resources.
  • Provide model, data and software curatorship.
  • Facilitate advanced applications programming

41
Cyberinfrastructure for the Atmospheric Sciences
in the 21st Century (2004) General
Recommendations
  • Give human resource issues top priority
  • Academic reward structure
  • Investments in CI personnel
  • Encourage CI education of AS students, educators,
    support staff and scientists
  • Support mechanisms of communication and
    dissemination of ideas, technologies and
    processes to promote interdisciplinary
    understanding

42
Cyberinfrastructure for the Atmospheric Sciences
in the 21st Century (2004) General
Recommendations
  • Fund entire software life cycle including
    development, testing, hardening, deployment,
    training, support and maintenance
  • Invest in computing infrastructure and capacity
    building at all levels, including centers,
    campuses, and departments
  • Support development of geosciences
    cyber-environments that allow the seamless
    transport of work from the desktop to
    supercomputers to the Grid

43
Cyberinfrastructure for the Atmospheric Sciences
in the 21st Century (2004) General
Recommendations
  • Help organize and coordinate CI across GEO
  • Geoinformatics Steering Committee
  • Geosciences Technology Forum
  • Coordinate with NASA, NOAA, and other agencies to
    enable finding, using, publishing, and
    distributing geoscience data
  • Appropriate standards for metadata

44
Petascale Collaboratory
Overarching Recommendation Establish a
Petascale Collaboratory for the Geosciences with
the mission to provide leadership-class
computational resources that will make it
possible to address, and minimize the time to
solution of, the most challenging problems facing
the geosciences.
DRAFT
45
LEAD A CI Research and Development Effort
Funded by NSF (http://lead.ou.edu)
46
(No Transcript)
47
LEAD Project Motivation
  • Each year, mesoscale weather (floods, tornadoes, hail, strong winds, lightning, and winter storms) causes hundreds of deaths, routinely disrupts transportation and commerce, and results in annual economic losses > $13B.

Source: Kelvin Droegemeier
48
The LEAD Goal
  • To create a grid-based, integrated, scalable framework that allows analysis tools, forecast models, and data repositories to be used as dynamically adaptive, on-demand systems that can
    • operate independently of data formats and the physical location of data or computing resources
    • change configuration rapidly and automatically in response to weather
    • continually be steered by new data (i.e., the weather)
    • respond to decision-driven inputs from users
    • initiate other processes automatically, and
    • steer remote observing technologies to optimize data collection for the problem at hand

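The "steered by the weather" behavior described above can be sketched as a toy decision rule. Everything here is invented for illustration: the thresholds, action strings, and function are not LEAD software, only a sketch of how streaming observations might reconfigure a workflow.

```python
# Toy sketch of a weather-steered workflow: each incoming observation
# can change what the system does next. Thresholds are hypothetical.
def next_action(reflectivity_dbz, running_forecast):
    """Decide the adaptive system's response to one radar observation."""
    if reflectivity_dbz >= 50:          # severe-storm signature
        return "launch high-resolution on-demand forecast"
    if reflectivity_dbz >= 35 and not running_forecast:
        return "start standard forecast and retask radars"
    return "continue data mining and monitoring"

# Three successive observations drive three different configurations.
actions = [next_action(r, False) for r in (20, 40, 55)]
```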
49
Sample Problem Scenario
Streaming Observations
50
Why Is LEAD a Collaboration?
  • LEAD is working to develop a comprehensive
    national cyberinfrastructure for mesoscale
    meteorology research, education, and prediction.
    It is addressing the fundamental information
    technology (IT) research challenges needed to
    create an integrated, scalable environment for
  • identifying,
  • accessing,
  • preparing,
  • assimilating,
  • predicting,
  • managing,
  • analyzing,
  • mining, and
  • visualizing
  • a broad array of meteorological data and model
    output, independent of format and physical
    location and having dynamically adaptive,
    on-demand response.

51
Other Geoscience ITRs and Projects
  • GEON
  • SCEC
  • LTER
  • CUAHSI
  • CLEANER
  • NEON
  • LOOKING
  • MAEViz
  • SEEK
  • EARTHSCOPE
  • ORION
  • BIRN
  • CHRONOS
  • NEESGRID

52
First Remote Interactive High Definition Video
Exploration of Deep Sea Vents
Canadian-U.S. Collaboration
Source: John Delaney and Deborah Kelley, UWash
53
Chemistry
  • Science Drivers
  • Multiscale modeling including high-dimension,
    chemical-accuracy potential energy surfaces
  • Real-time feedback to control of reacting systems
    monitored by sensor technology
  • Prediction of optimal experiments (lower cost of
    discovery and process design)
  • Validation of computational models vs.
    experimental data, and vice versa
  • Cyber-Enabled Chemistry
  • New paradigm for information flow (transparent resource sharing such as data grids rather than centrally stored databases; workflow management tools)
  • New paradigm for shared instrumentation (remote
    chemistry) including broadening participation
  • Interfacing data and software across disciplines
    (interoperability), and development of cyber
    collaboration tools

Source: Rohlfing
54
IT Challenges from a Scientific Perspective
[Diagram: integration across scales of data and algorithms, from atomic-level structure through macromolecular structure and dynamics, molecular assemblies, cellular components, and cell/cell interactions up to the organ(ism), linking mechanism, property, and response]
55
Cosmic Simulator with a Billion Zone and
Gigaparticle Resolution
Source: Mike Norman, UCSD
Compare with Sloan Survey
SDSC Blue Horizon
56
Why Does the Cosmic Simulator Need Cyberinfrastructure?
  • One gigazone run
    • Generates 10 TB of output
    • A snapshot is 100 GB
  • Need to visually analyze as we create spacetimes
  • Visual analysis is daunting
    • A single frame is about 8 GB
    • A smooth animation of 1000 frames is 1000 × 8 GB = 8 TB
    • Stage on rotating storage to high-resolution displays
  • Can run evolutions faster than we can archive them
    • File transport over the shared Internet: ~50 Mbit/s
    • 4 hours to move ONE snapshot!
  • Many scientists will need access for analysis
Source: Mike Norman, UCSD
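The slide's figures hold up to back-of-the-envelope arithmetic; this sketch just re-derives them from the stated sizes and link speed.

```python
# Check the slide's numbers: a 100 GB snapshot over a ~50 Mbit/s shared
# link takes about 4 hours, and 1000 frames at 8 GB each total 8 TB.
snapshot_bytes = 100e9          # 100 GB snapshot
link_bits_per_s = 50e6          # 50 Mbit/s shared Internet
hours = snapshot_bytes * 8 / link_bits_per_s / 3600  # ~4.4 hours

frames = 1000
frame_bytes = 8e9               # ~8 GB per frame
animation_tb = frames * frame_bytes / 1e12           # 8 TB animation
```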
57
Limitations of Uniform Grids for Complex
Scientific and Engineering Problems
512x512x512 run on 512-node CM-5
Source: Greg Bryan, Mike Norman, NCSA
58
Develop Adaptive Mesh Refinement (AMR) to Resolve Mass Concentrations
64x64x64 run with seven levels of adaptation on SGI Power Challenge, locally equivalent to 8192x8192x8192 resolution
Source: Greg Bryan, Mike Norman, John Shalf, NCSA
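The "locally equivalent" resolution follows from simple refinement arithmetic, assuming each AMR level doubles the resolution per dimension (an assumption consistent with the 8192^3 figure on the slide).

```python
# AMR effective-resolution arithmetic: a 64^3 base grid refined through
# seven levels, each level doubling resolution per dimension.
base = 64
levels = 7
effective = base * 2 ** levels   # 64 * 128 = 8192 cells per dimension
```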
59
Cosmic Simulator: Thresholds of Capability and Discovery
  • 2000: Formation of galaxy cluster cores (1 TFLOP/s)
  • 2006: Properties of first galaxies (40 TFLOP/s)
  • 2010: Emergence of Hubble types (150 TFLOP/s)
  • 2014: Large-scale distribution of galaxies by luminosity and morphology (500 TFLOP/s)
Source: Mike Norman, UCSD
60
Biomedical Information Research Network
  • Enable new understanding of neurological disease
    by integrating data across multiple scales from
    macroscopic brain function to its molecular and
    cellular underpinnings
  • Federate distributed multiscale brain data
  • Accommodate associated large scale computational
    requirements
  • Provide Infrastructure for Next Generation
    Collaboratory

Scales of NS from Maryann Martone
61
What BIRN is doing
  • Integrating the activities of the most advanced
    biomedical imaging and clinical research centers
    in the U.S. - Serving as a model for programs
    everywhere
  • Establishing distributed and linked data collections with partnering groups to create a Data GRID for the BIRN
  • Facilitating the use of "grid-based"
    computational infrastructure and integrate BIRN
    with other GRID middleware projects
  • Enabling data mining from multiple distributed
    data collections or databases on neuroimaging and
    bioinformatics
  • Building a stable software and hardware
    infrastructure that will allow centers to
    coordinate efforts to accumulate larger studies
    than can be carried out at one site.

62
What BIRN is doing
  • Changing the use pattern for research data from
    the individual laboratory/project to shared use.
  • Defining processes, procedures and establishing
    best practices so that the BIRN is reliable,
    scalable and extensible to biomedical research
    programs outside of the pioneering Neuroimaging
    Test-beds - able to support the work of thousands
    of researchers.
  • Pushing the envelope of biomedical informatics
    and computer science by causing the development
    of new techniques in databases, information
    retrieval, visualization and computational
    processing.

63
BIRN Network
[Map of BIRN sites, including Yale (New Haven), MIT, UCSF (San Francisco), and Memphis, Tenn.]
64
CARMA
  • Long history of important contributions at NCSA
    in radio astronomy
  • MIRIAD software package originated at NCSA and contributed to other community codes
  • BIMA archive (1.25 TB) developed and based at NCSA
  • BIMA pipeline was developed and is deployed at NCSA
  • Major NCSA CARMA involvement is in the data reduction pipeline, archiving, and databases

Combined Array for Research in Millimeter
Astronomy (CARMA)
Effort led by Dick Crutcher and Athol Kimball
65
Building prototypes for LSST
  • LSST: a new telescope for exploring the variable universe and dark energy
    • 3.2 GPixel camera will image the entire available sky every 3 days
    • At first light in 2013, the telescope will produce 15 TB/night of raw data and 130 TB/night of processed products
  • Goal for 2006: build and test a working automated processing system as input to the LSST construction proposal
  • Deploy the NOAO Science Archive at NCSA as the foundation for the LSST archive
    • Use it as a testbed for advanced data access mechanisms while serving an existing user base
    • We will deploy an NVO-interoperable security framework based on grid security tools
    • In 2005, we deployed an automated data mirroring system based on SRB and the NCSA BIMA Archive system; currently mirroring data from NOAO telescopes
  • Deploy an intelligent data access system that uses grid tools to efficiently distribute data across a cluster
  • Integrate grid-based data workflow systems to automatically create processed data products using TeraGrid
    • In 2005 we deployed and demonstrated an early version for processing simulated data on TeraGrid
  • Apply the automated system to the first LSST Data Challenge using precursor data
    • In 2005 we created the LSST Precursor Data Archive for LSST developers nationwide

66
Large Environmental Observatories Needing Cyberenvironments
  • CUAHSI (Consortium of Universities for the
    Advancement of Hydrologic Sciences Inc.) for
    hydrology
  • NEON (National Ecological Observatory Network)
    for ecology
  • LOOKING (Laboratory for the Ocean Observatory
    Knowledge Integration Grid)
  • CLEANER (Collaborative Large Scale Engineering
    Analysis Network for Environmental Research) for
    environmental engineering
  • LTER (U.S. Long-Term Ecological Research Network) investigating ecological processes over long temporal and broad spatial scales

67
CLEANER-Hydrologic Observatories MISSION STATEMENT
To transform our understanding of the earth's water and related biogeochemical cycles across spatial and temporal scales, to enable forecasting of critical water-related processes that affect and are affected by human activities, and to develop scientific and engineering tools to enable more effective adaptive management approaches for large-scale human-dominated environments.
68
The Need, and Why Now?
"Nothing is more fundamental to life than water. Not only is water a basic need, but adequate safe water underpins the nation's health, economy, security, and ecology." NRC (2004), Confronting the Nation's Water Problems: The Role of Research
Three critical deficiencies in current abilities:
(1) We lack basic data and the infrastructure to collect them at the needed resolution.
(2) Even if we could collect them, we lack the means to integrate data across scales from different media and sources (observations, experiments, simulations).
(3) We lack sufficiently accurate modeling and decision-support tools to predict underlying processes, let alone forecast effects of different management strategies.
69
Critical Environmental Grand Challenges
  • Understanding and forecasting hydrologic cycle
    processes
  • Designing ecologically sustainable cities
  • Assessing effects of climate change on water
    resources (droughts/floods)
  • Understanding human impacts on major
    biogeochemical cycles and the incidence of
    water-borne communicable diseases
  • Quantifying relationship of land-use/cover to
    aquatic ecosystem quality
  • Reinventing the use of materials (that become
    pollutants)

References: NRC (2001), Grand Challenges in the Environmental Sciences; NAE (2002), Engineering and Environmental Challenges
70
NCSA CLEANER Efforts
  • NCSA has developed a prototype of the CLEANER CyberCollaboratory (http://cleaner.ncsa.uiuc.edu)
  • NCSA is leading the CLEANER Project Office activities
    • Major requirements-gathering initiative
    • Create prototypes
    • Community surveys and interviews to assess needs
    • Report of recommendations on CLEANER needs
  • NCSA is creating prototypes that build on a
    common CI architecture across communities
  • Two environmental testbeds
  • Illinois River Basin testbed
  • Corpus Christi Bay testbed

Effort led by Barbara Minsker
71
Environmental CI Architecture: Research Services
[Diagram: an integrated CI layers data services, workflows and model services, knowledge services, meta-workflows, collaboration services, and a digital library over physical infrastructure and supporting technology; the research process cycles through creating hypotheses, obtaining data, analyzing data and/or assimilating it into models, linking and/or running analyses and models, discussing results, and publishing]
72
National Institute of General Medical Sciences
Mission Statement
  • In ten years, we want every person involved in
    the biomedical enterprise---basic researcher,
    clinical researcher, practitioner, student,
    teacher, policy maker---to have at their
    fingertips through their keyboard instant access
    to all the data sources, analysis tools, modeling
    tools, visualization tools, and interpretative
    materials necessary to do their jobs with no
    inefficiencies in computation or information
    technology being a rate-limiting step.
  • In twenty years, we want intelligent
    computational agents to do complex query and
    modeling tasks in the biomedical computing
    environment, freeing humans for creative
    hypothesis construction and high level analysis
    and interpretation.

Source: Jakobsson
73
Some important problems with biomedical computing
tools
  • They are difficult to use.
  • They are fragile.
  • They lack interoperability between different
    components.
  • They suffer limitations on dissemination.
  • They often work in one program/one function mode
    as opposed to being part of an integrated
    computational environment.
  • There are not sufficient personnel to meet the
    needs for creating better biological computing
    tools and user environments.

Jakobsson
74
Computation holds great promise for future
progress in biomedical science
  • Cataloguing and analyzing individual genome-based
    variations to permit customized diagnosis and
    therapy.
  • Building comprehensive pathway models for human
    and pathogen cells to provide a framework for
    understanding normal function and disease at the
    subcellular level.
  • Building and deploying dynamic models of disease
    epidemics as a tool for responding to natural
    pandemics and bioterrorist attacks
  • Use of biomimetic principles to construct
    Computer Aided Design systems for molecular
    devices
  • Predicting protein structures and functional
    properties from sequences

Jakobsson
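The "dynamic models of disease epidemics" bullet above can be illustrated with a minimal SIR (susceptible-infected-recovered) sketch; the parameter values below are arbitrary illustrative choices, not drawn from any real epidemic model.

```python
# Minimal SIR epidemic sketch: population fractions move from
# susceptible to infected to recovered each time step.

def sir_step(s, i, r, beta=0.3, gamma=0.1):
    new_inf = beta * s * i   # new infections this step
    new_rec = gamma * i      # recoveries this step
    return s - new_inf, i + new_inf - new_rec, r + new_rec

def simulate(days=160):
    s, i, r = 0.99, 0.01, 0.0      # illustrative initial conditions
    history = [(s, i, r)]
    for _ in range(days):
        s, i, r = sir_step(s, i, r)
        history.append((s, i, r))
    return history

history = simulate()
peak_infected = max(i for _, i, _ in history)
```

Deployed response models add spatial structure, stochasticity, and data assimilation on top of this basic compartmental core.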
75
The Paradox of Computational Biology: Its
successes are the flip side of its deficiencies
  • The success of computational biology is shown by
    the fact that computation has become integral and
    critical to modern biomedical research.
  • Because computation is integral to biomedical
    research, its deficiencies have become
    significant rate-limiting factors in the
    progress of biomedical research.

Jakobsson
76
Biological Computing Challenges
  • Simulation and prediction
  • structures and dynamics
  • Multilevel networks
  • signaling, metabolic, protein interaction
  • gene regulatory

Figure: these challenges span wide ranges of
temporal (seconds) and spatial (nm³) scales
Reed
77
High-Performance Computing is a Major GTL Partner
Figure: biological applications plotted by
complexity against the teraflops required, from
comparative genomics and genome-scale protein
threading near current U.S. open-science computing
(about 1 TF), through constrained rigid docking,
constraint-based flexible docking, protein machine
interactions, and cell, pathway, and network
simulation, up to molecular machine classical
simulation, molecule-based cell simulation, and
community metabolic, regulatory, and signaling
simulations at 100-1000 TF (OBER/OASCR
partnership)
Patrinos
78
Companies are not Using HPC as Aggressively as
Possible
  • Education and Training Barriers
  • Lack of computational scientists (internal or
    external)
  • Not enough people in the pipeline
  • Poor match between skills taught and skills
    needed

Tichenor - Council on Competitiveness
79
Grand Challenge Case Studies
  • Five currently intractable problems that could
    profoundly advance industrial productivity and
    national competitiveness if petaflop or greater
    compute capability can be made available to
    solve them.

Tichenor
80
Grand Challenge Case Studies
  • Auto Crash Safety: It's Not Just for Dummies
  • Full Vehicle Design Optimization for Global
    Market Dominance
  • Keeping the Lifeblood Flowing: Boosting Oil and
    Gas Recovery from the Earth
  • Customized Catalysts to Improve Crude Oil Yields:
    Getting More Bang From Each Barrel
  • Spin Fiber Faster to Gain a Competitive Edge for
    U.S. Textile Manufacturing

Tichenor
81
Impact of 10X Easier to Use Computers
  • 10X easier-to-use machines deliver strategic
    benefits
  • Develop more powerful applications, or
    fundamentally rewrite current applications
  • Shorten design cycles, faster time-to-market
  • Make HPC available to researchers who don't
    understand programming
  • Increase R&D efficiency and reduce costs

"We would look to rewrite the entire science
underlying the current technology and methodology
we are using."
Tichenor
82
TeraGrid is One of the First Broad Instantiations
of CI
83
The Grid Today
  • Common Middleware
  • abstracts independent hardware, software, and
    user IDs into a service layer with defined APIs
  • adds comprehensive security
  • allows for site autonomy
  • provides a common infrastructure based on
    middleware

User
Application
The underlying infrastructure is abstracted into
defined APIs, simplifying developer and user
access to resources; however, this layer is not
intelligent
Grid Middleware
Infrastructure
Network
Site A
Site B
Source NASA
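The abstraction this slide describes can be sketched as a single service-layer API over heterogeneous sites. The class and method names below are invented for illustration; real grid middleware such as Globus exposes a different, much richer interface.

```python
# Sketch of a middleware service layer: every site, whatever its
# hardware or software, is reached through one defined API.

from abc import ABC, abstractmethod

class Site(ABC):
    """The defined API the middleware exposes for every site."""
    @abstractmethod
    def submit(self, job: str) -> str: ...

class ClusterSite(Site):
    def submit(self, job):
        return f"cluster queued: {job}"

class ArchiveSite(Site):
    def submit(self, job):
        return f"archive staged: {job}"

def run_everywhere(sites, job):
    # The user makes one kind of call; site differences stay
    # below the API, preserving each site's autonomy.
    return [s.submit(job) for s in sites]

results = run_everywhere([ClusterSite(), ArchiveSite()], "wrf-forecast")
```

As the slide notes, this layer only hides heterogeneity; it does not decide anything on the user's behalf.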
84
Hope for the Future
  • Customizable Grid Services built on defined
    Infrastructure APIs
  • automatic selection of resources
  • information products tailored to users
  • Account-less processing
  • flexible interfaces: web based, command line, APIs

User
Application
Resources are accessed via various intelligent
services that access infrastructure APIs. The
result: the scientist and application developer
can focus on science, not on systems management
Intelligent, Customized Middleware
Grid Middleware - Infrastructure APIs (service
oriented)
Infrastructure
Network
Site A
Site B
Source NASA
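The "automatic selection of resources" bullet can be sketched as an intelligent service that picks a site for the user; the site names and the simple load metric below are hypothetical stand-ins for whatever a real broker would measure.

```python
# Hypothetical resource broker: given current load fractions for each
# site, pick the least loaded one so the user never names a machine.

def pick_site(sites):
    # sites: mapping of site name -> current load fraction (0..1)
    return min(sites, key=sites.get)

sites = {"SiteA": 0.85, "SiteB": 0.40}
best = pick_site(sites)   # the middleware would then submit there
```

A production broker would also weigh queue depth, data locality, and user allocation, but the principle is the same: the decision moves from the scientist into the service layer.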
85
More Than Grid Computing
  • Cyberinfrastructure is
  • A shared responsibility among scientists,
    communities, agencies, and even nations
  • Research as well as deployed systems for
    production research
  • The trick is to do both well and keep everyone
    happy!
  • Because CI is inherently distributed, all of us
    will have a greater role in bringing it about,
    moving it forward and sustaining it

Credit Droegemeier
86
A New Era in the US
  • More interagency involvement in the provision of
    services
  • Stronger linkages with industry to develop next
    generation capabilities
  • A broader, balanced portfolio among multiple
    elements (HPC, data, viz, networking, software,
    people, tools)
  • Emphasis on sustained services and diversity of
    provision

Credit Droegemeier
87
A New Era in the World
Courtesy I. Foster