CSE - PowerPoint PPT Presentation

About This Presentation
Title:

CSE

Description:

computational physics, computational chemistry, big simulations, Teraflops, Petabytes, ... B Ludaescher, K Lin, S Bowers, E Jaeger-Frank, B Brodaric, C Baru. ... – PowerPoint PPT presentation

Slides: 17
Provided by: bertr68
Category:
Tags: cse | jaeger


Transcript and Presenter's Notes

Title: CSE


1
CSE & e-Science
  • Bertram Ludäscher
  • Dept. of Computer Science
  • Genome Center
  • University of California, Davis
  • ludaesch_at_ucdavis.edu

2
Computational Science & Engineering
  • Traditional view:
  • computational physics, computational chemistry,
    big simulations, Teraflops, Petabytes, ...
  • Yes, but wait, there is more:
  • Emergence of e-Science (UK, Europe),
    cyberinfrastructure (NSF), and NIH programs
  • To illustrate this a bit more ...

3
Science has been changing lately
  • THEN: "All science is either physics or stamp
    collecting."
  • Ernest Rutherford, British chemist and
    physicist (1871-1937)
  • J. B. Birks, "Rutherford at Manchester" (1962)
  • i.e., from few data, lots of thinking, to ...
  • NOW: Lots of Data Analysis
  • → Data-driven scientific discovery!

4
The Diversity & Unity of Science
  • Natural Sciences: Earth Sciences, Life Sciences,
    Physical Sciences
  • Observations, Measurements, Models, Simulations,
    Analyses, Hypotheses, Understanding, Prediction, ...
  • in vivo, in vitro, in situ, in silico, ...
  • Data-, Knowledge-, and Workflow-Management is
    central to most of them!
  • Figure labels: compute-intensive, data-intensive,
    metadata-intensive, structurally and
    semantics-intensive

5
e-Science (UK) and Cyberinfrastructure (US)
  • "e-Science is about global collaboration in key
    areas of science and the next generation of
    computing infrastructure that will enable it."
  • Sir John Taylor, Director, Office of Science and
    Technology, UK
  • "Cyberinfrastructure is the coordinated aggregate
    of software, hardware and other technologies, as
    well as human expertise, required to support
    current and future discoveries in science and
    engineering. The challenge of Cyberinfrastructure
    is to integrate relevant and often disparate
    resources to provide a useful, usable, and
    enabling framework for research and discovery
    characterized by broad access and 'end-to-end'
    coordination."
  • Fran Berman, San Diego Supercomputer Center, UCSD

6
Towards 2020 Science Report (MSR)
http://research.microsoft.com/towards2020science
  • new development at the intersection of computer
    science and the sciences: a leap from the
    application of computing to support scientists to
    do science (i.e. computational science) to
    the integration of computer science concepts,
    tools and theorems into the very fabric of
    science. We believe this development
    represents the foundations of a new revolution in
    science
  • we believe computer science is poised to become
    as fundamental to biology as mathematics has
    become to physics
  • to understand cells and cellular systems
    requires viewing them as information processing
    systems, as evidenced by the fundamental
    similarity between molecular machines of the
    living cell and computational automata, and by
    the natural fit between computer process algebras
    and biological signalling and between
    computational logical circuits and regulatory
    systems in the cell
  • We highlight that an immediate and important
    challenge is that of end-to-end scientific data
    management, from data acquisition and data
    integration, to data treatment, provenance and
    persistence.
  • dramatic in its impact, will be the integration
    of new conceptual and technological tools from
    computer science into the sciences.

7
Example: Assembling the Tree of Life (AToL)
All organisms (alive or extinct) are part of one
large, genetically connected group: Life on
Earth. Major subgroups: Eubacteria, Archaea, and
Eukaryotes, further divided into hierarchically
nested subgroups; e.g., eukaryotes contains
plants, animals, fungi, ...; animals contains
sponges, cnidarians, Bilateria, ...; Bilateria
contains arthropods, molluscs, nematodes, etc.
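
The nesting described on this slide maps naturally onto a tree data structure. A minimal Python sketch, assuming a plain nested dictionary as the representation and listing only the groups the slide names; `TAXA` and `lineage` are illustrative names, not part of AToL or of any tool mentioned in the talk:

```python
from typing import Dict, List, Optional, Tuple

# Only the groups named on the slide; each real group has many more members.
TAXA: Dict[str, dict] = {
    "Life on Earth": {
        "Eubacteria": {},
        "Archaea": {},
        "Eukaryotes": {
            "plants": {},
            "fungi": {},
            "animals": {
                "sponges": {},
                "cnidarians": {},
                "Bilateria": {
                    "arthropods": {},
                    "molluscs": {},
                    "nematodes": {},
                },
            },
        },
    }
}


def lineage(tree: Dict[str, dict], target: str,
            path: Tuple[str, ...] = ()) -> Optional[List[str]]:
    """Return the chain of nested groups from the root down to `target`,
    or None if `target` does not occur in the tree (depth-first search)."""
    for name, children in tree.items():
        here = path + (name,)
        if name == target:
            return list(here)
        found = lineage(children, target, here)
        if found is not None:
            return found
    return None


molluscs_lineage = lineage(TAXA, "molluscs")
```

Here `molluscs_lineage` lists every enclosing group, from "Life on Earth" down to "molluscs", which is the "hierarchically nested subgroups" idea in data-structure form.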
8
Inferring a phylogenetic tree from disparate data
  • Aligned DNA sequences → Maximum likelihood tree (DNA)
  • Discrete morphological data → Maximum parsimony tree
  • Continuous characters → Maximum likelihood tree
    (continuous characters)
  • Integrate → Consensus Tree(s)
  • Figure labels: Actors, Datasets

NSF Collaborative Research (w/ UPenn): Core
Database Technologies to Enable the Integration
of AToL Information, $462,000 (2006-2009)
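
The slide does not say how the "Integrate" step combines the per-dataset trees. One standard option is a strict consensus, which keeps only the groupings (clades) that appear in every input tree. A minimal Python sketch under that assumption, with trees written as nested tuples of leaf names; the type alias, both functions, and the toy taxa A to D are illustrative, not the project's code:

```python
from typing import FrozenSet, Iterable, Set, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees (illustrative encoding).
Tree = Union[str, Tuple["Tree", ...]]


def collect_clades(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Add the leaf set below every internal node of `tree` to `out`
    and return the leaf set of `tree` itself."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def strict_consensus_clades(trees: Iterable[Tree]) -> Set[FrozenSet[str]]:
    """Return the clades that occur in every input tree."""
    per_tree = []
    for tree in trees:
        found: Set[FrozenSet[str]] = set()
        collect_clades(tree, found)
        per_tree.append(found)
    return set.intersection(*per_tree)


# Two toy trees over the same four taxa, e.g. one from DNA, one from morphology.
dna_tree: Tree = ((("A", "B"), "C"), "D")
morph_tree: Tree = (("A", "B"), ("C", "D"))
shared = strict_consensus_clades([dna_tree, morph_tree])
```

Both toy trees agree that A and B belong together, so `shared` holds that clade plus the trivial clade of all four taxa; the groupings they disagree on are dropped.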
9
A Real-World Example (ChIP-chip workflow)
collaboration with UCD Genome Center
NSF/SEI(BIO)II: A Collaborative Scientific
Workflow Environment for Accelerating
Genome-Scale Biological Research, $600,139
(2006-2009)
10
DOE/SciDAC-2 SDM: CPES Fusion Simulation Monitor
(Norbert Podhorszki, UC Davis; Scott Klasky, ORNL)
  • Plasma physics simulation on 2048 processors on
    Seaborg@NERSC (LBL)
  • Gyrokinetic Toroidal Code (GTC) to study energy
    transport in fusion devices (plasma
    microturbulence)
  • Generating 800GB of data (3000 files, 6000
    timesteps, 267MB/timestep), 30 hour simulation
    run
  • Under workflow control:
  • Monitor (watch) simulation progress (via remote
    scripts)
  • Transfer from NERSC to ORNL concurrently with the
    simulation run
  • Convert each file to an HDF5 file
  • Archive files in 4GB chunks into HPSS

DOE/SciDAC-2: Scientific Data Management Center
(w/ LBL), $965,000 (2006-2011)
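
The archiving step groups converted files into chunks of bounded size before they go to HPSS. The slide does not describe the packing policy, so the following is only a sketch of one simple choice: greedy packing in arrival order. It assumes that "4GB" means 4096 MB and that each of the 3000 files is about 267 MB (which matches the stated 800GB total); the file names and `pack_into_chunks` are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

CHUNK_LIMIT_MB = 4 * 1024  # assumption: "4GB" read as 4096 MB


def pack_into_chunks(files: Iterable[Tuple[str, int]],
                     limit_mb: int = CHUNK_LIMIT_MB) -> Iterator[List[str]]:
    """Group (name, size_mb) pairs, in arrival order, into chunks whose
    total size never exceeds `limit_mb`."""
    chunk: List[str] = []
    used = 0
    for name, size_mb in files:
        if size_mb > limit_mb:
            raise ValueError(f"{name} ({size_mb} MB) exceeds the chunk limit")
        if used + size_mb > limit_mb:
            # Current chunk is full: emit it and start a new one.
            yield chunk
            chunk, used = [], 0
        chunk.append(name)
        used += size_mb
    if chunk:
        yield chunk


# Hypothetical file names; sizes follow the figures quoted on the slide.
files = [(f"gtc_{i:04d}.h5", 267) for i in range(3000)]
chunks = list(pack_into_chunks(files))
```

Because the function is a generator, a chunk can be handed to the archiver as soon as it fills, which fits a workflow that runs concurrently with the simulation rather than after it. Under the stated assumptions each chunk holds 15 files (4005 MB), giving 200 chunks.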
11
Kepler and Sensor Networks
  • These just in (new NSF CEOP projects):
  • Management and Analysis of Environmental
    Observatory Data using the Kepler Scientific
    Workflow System, NCEAS, SDSC, UC Davis, OSU,
    CENS (UCLA), OPeNDAP
  • standardize services for sensor networks, support
    multiple views, protocols
  • COMET: Coast-to-Mountain Environmental Transect,
    UC Davis, Bodega Marine Lab, Lake Tahoe Research
    Center
  • study how environmental factors affect ecosystems
    along an elevation gradient from coastal
    California to the summit of the Sierra Nevada

CEOP--COMET: Coast-to-Mountain Environmental
Transect, $2,158,580 (2006-2009)
CEOP--Management and Analysis of Environmental
Observatory Data Using the Kepler Scientific
Workflow System, $290,000 (collaborative $2.9M)
(2006-2010)
CEOP/REAP
12
Scientific Workflows Cyberinfrastructure
UPPER-WARE
Layers shown in the figure: Upperware, Upper
Middleware, Middleware, Underware
NSF ITR (w/ SDSC): Science Environment for
Ecological Knowledge (SEEK), $2,485,683
(2002-2007)
13
Consilience: The Unity of Knowledge (E. O. Wilson)
  • "Literally a jumping together of knowledge by the
    linking of facts and fact-based theory across
    disciplines to create a common groundwork for
    explanation." E. O. Wilson
  • eScience, Cyberinfrastructure: mechanisms to make
    progress
  • Scientific Workflows: crucial elements to get the
    most mileage out of CI to fuel eScience,
    accelerating knowledge discovery
  • CSE needs computer scientists, domain scientists,
    hybrids (e.g. bioinformaticians,
    computational/simulation scientists)

14
Some Related Publications
  • Semantic Type Annotation
  • S Bowers, B Ludaescher. A Calculus for
    Propagating Semantic Annotations through
    Scientific Workflow Queries. ICDE Workshop on
    Query Languages and Query Processing (QLQP),
    LNCS, 2006.
  • S Bowers, B Ludaescher. Towards Automatic
    Generation of Semantic Types in Scientific
    Workflows. International Workshop on Scalable
    Semantic Web Knowledge Base Systems (SSWS), WISE
    2005 Workshop Proceedings, LNCS, 2005.
  • C Berkley, S Bowers, M Jones, B Ludaescher, M
    Schildhauer, J Tao. Incorporating Semantics in
    Scientific Workflow Authoring. SSDBM, 2005.
  • B Ludaescher, K Lin, S Bowers, E Jaeger-Frank, B
    Brodaric, C Baru. Managing Scientific Data: From
    Data Integration to Scientific Workflows. GSA
    Today, Special Issue on Geoinformatics, 2006.
  • S Bowers, D Thau, R Williams, B Ludaescher. Data
    Procurement for Enabling Scientific Workflows: On
    Exploring Inter-Ant Parasitism. VLDB Workshop on
    Semantic Web and Databases (SWDB), 2004.
  • S Bowers, K Lin, B Ludaescher. On Integrating
    Scientific Resources through Semantic
    Registration. SSDBM, 2004.
  • S Bowers, B Ludaescher. An Ontology-Driven
    Framework for Data Transformation in Scientific
    Workflows. International Workshop on Data
    Integration in the Life Sciences (DILS), LNCS,
    2004.
  • S Bowers, B Ludaescher. Towards a Generic
    Framework for Semantic Registration of Scientific
    Data. International Semantic Web Conference
    Workshop on Semantic Web Technologies for
    Searching and Retrieving Scientific Data, 2003.
  • Workflow Design and Modeling
  • T McPhillips, S Bowers, B Ludaescher.
    Collection-Oriented Scientific Workflows for
    Integrating and Analyzing Biological Data.
    Workshop on Data Integration in the Life Sciences
    (DILS), LNCS, 2006.
  • S Bowers, T McPhillips, B Ludaescher, S Cohen, SB
    Davidson. A Model for User-Oriented Data
    Provenance in Pipelined Scientific Workflows.
    International Provenance and Annotation Workshop
    (IPAW), LNCS, 2006.
  • S Bowers, B Ludaescher, AHH Ngu, T Critchlow.
    Enabling Scientific Workflow Reuse through
    Structured Composition of Dataflow and
    Control-Flow. IEEE Workshop on Workflow and Data
    Flow for Scientific Applications (SciFlow), 2006.
  • S Bowers, B Ludaescher. Actor-Oriented Design of
    Scientific Workflows. International Conference on
    Conceptual Modeling (ER), LNCS, 2005.
  • T McPhillips, S Bowers. Pipelining Nested Data
    Collections in Scientific Workflows. SIGMOD
    Record, 2005.
  • Kepler
  • D Pennington, D Higgins, AT Peterson, M Jones, B
    Ludaescher, S Bowers. Ecological Niche Modeling
    using the Kepler Workflow System. Workflows for
    e-Science, Springer-Verlag, to appear.
  • W Michener, J Beach, S Bowers, L Downey, M Jones,
    B Ludaescher, D Pennington, A Rajasekar, S
    Romanello, M Schildhauer, D Vieglais, J Zhang.
    SEEK: Data Integration and Workflow Solutions for
    Ecology. Workshop on Data Integration in the Life
    Sciences (DILS), LNCS, 2005.
  • S Romanello, W Michener, J Beach, M Jones, B
    Ludaescher, A Rajasekar, M Schildhauer, S Bowers,
    D Pennington. Creating and Providing Data
    Management Services for the Biological and
    Ecological Sciences: Science Environment for
    Ecological Knowledge. SSDBM, 2005.

15
Kepler Collaboration
  • Open-source
  • Builds on Ptolemy II from UC Berkeley
  • Contributors from
  • SEEK
  • SciDAC SDM
  • Ptolemy
  • GEON
  • ROADNet
  • Resurgence
  • AToL CIPRES, POD
  • Goals
  • Create powerful analytical tools that are useful
    across disciplines
  • Ecology, Biology, Engineering, Geology, Physics,
    Chemistry, Astronomy, ...

Figure labels: Ptolemy II, Natural Diversity Discovery Project
16
Databases & Information Systems (DBIS)
DBIS.ucdavis.edu
DAKS.ucdavis.edu
  • Profs. Michael Gertz, Bertram Ludaescher
  • Drs. Shawn Bowers, Timothy McPhillips, Norbert
    Podhorszki
  • 12 graduate students