Grid computing and e-Science

Lecturer: Dr. Phạm Trần Vũ. Presenters: Phan Quang Thiện, Trần Phước Hiệp, Nguyễn Minh Nhật.

1
Grid computing and e-Science
  • Lecturer: Dr. Phạm Trần Vũ
  • Presenters: Phan Quang Thiện,
    Trần Phước Hiệp,
    Nguyễn Minh Nhật

2
Outline
  • What is e-Science?
  • New modes of scientific inquiry
  • Fault diagnosis and prognostic systems
  • Grid services for the diagnostic problem
  • Distributed Aircraft Maintenance Environment
    (DAME) project
  • Conclusion

3
What is e-Science?
  • "e-Science is about global collaboration in key
    areas of science, and the next generation of
    infrastructure that will enable it."
    (John Taylor, Director General of Research
    Councils, Office of Science and Technology)
  • The purpose of the UK e-Science initiative is to
    allow scientists to do faster, better, or
    different research.

4
Cyberinfrastructure/e-Infrastructure and the
Grid (from NSF)
  • "At the heart of the cyberinfrastructure vision
    is the development of a cultural community that
    supports peer-to-peer collaboration and new modes
    of education based upon broad and open access to
    leadership computing; data and information
    resources; online instruments and observatories;
    and visualization and collaboration services."
    (Dr. Arden L. Bement, Jr., Director of the
    National Science Foundation)
  • Includes not only computers but also data storage
    resources and specialized facilities
  • The long-term goal is to develop the middleware
    services that allow scientists to routinely build
    the infrastructure for their Virtual
    Organisations

5
New Modes of Scientific Inquiry
  • Data-intensive science
  • Simulation-Based Science
  • Remote Access to Experimental Apparatus

6
Data-intensive science
  • Worldwide, scientists and engineers are
    producing, accessing, analyzing, integrating and
    storing terabytes of digital data daily through
    experimentation, observation and simulation
  • This vast amount of data needs to be
    preprocessed and distributed for further
    analysis.

7
Data-intensive science (Cont)
  • Annual data storage: 12-14 petabytes/year
  • Each of the four LHC experiments will generate
    several petabytes of experimental data per year
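
To put the annual storage figure in perspective, the short calculation below converts a yearly volume into an average sustained data rate. It is only a back-of-the-envelope sketch: it assumes decimal petabytes (10^15 bytes) and a 365-day year, and the function name is made up for this illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year
BYTES_PER_PETABYTE = 10**15            # assumption: decimal (SI) petabytes


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average sustained rate, in MB/s, needed to store the given yearly volume."""
    bytes_per_second = petabytes_per_year * BYTES_PER_PETABYTE / SECONDS_PER_YEAR
    return bytes_per_second / 10**6


for volume in (12, 14):
    rate = average_rate_mb_per_s(volume)
    print(f"{volume} PB/year is about {rate:.0f} MB/s on average")
```

Under these assumptions, 12-14 PB/year corresponds to an average of roughly 380 to 445 MB/s, sustained around the clock, before any copies are distributed for analysis.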
8
Simulation-Based Science
  • In 2003, the Japanese Earth Simulator was running
    numerical simulations of Earth's climate at a
    sustained rate of 40 teraflop/s.
  • The U.S. Encyclopedia of Life (EOL) project.
  • http://www.eol.org/
  • The UK Comb-e-Chem project
  • The goal of this project is to synthesize large
    numbers of new compounds by high-throughput
    combinatorial methods and then map their
    structure and properties.

[Diagram labels: Structure, Properties, Knowledge, Prediction]
9
Remote Access to Experimental Apparatus
  • The advance of technology is also producing
    revolutionary new experimental apparatus.
  • Remote access allows remote participants to
    design, execute, and monitor experiments.

10
Remote Access to Experimental Apparatus (Cont)
  • Sharing engineering research equipment, data
    resources, and leading-edge computing resources.
  • Remote access to perform teleobservation and
    teleoperation of experiments.

11
Virtual organizations for distributed communities
  • The convergence of information, grid, and
    networking technologies with contemporary
    communications now enables science and
    engineering communities to pursue their research
    and learning goals in real-time and without
    regard to geography.
  • The size and/or complexity of the problem
    requires that people in several organizations
    collaborate and share computing resources, data,
    and instruments.
  • Virtual organization (VO): a set of individuals
    and/or institutions defined by such sharing
    rules.
  • In other words, VOs are dynamic federations of
    heterogeneous organizational entities sharing
    data, metadata, processing, and security
    infrastructure.
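
One way to picture a VO as "members plus sharing rules" is the small data structure below. This is a minimal sketch, not taken from any Grid middleware: the class names, the rule shape (institution, resource, action), and the example names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SharingRule:
    """Hypothetical rule: members from `institution` may do `action` on `resource`."""
    institution: str
    resource: str
    action: str


@dataclass
class VirtualOrganization:
    name: str
    members: dict[str, str] = field(default_factory=dict)  # person -> home institution
    rules: set[SharingRule] = field(default_factory=set)

    def join(self, person: str, institution: str) -> None:
        self.members[person] = institution

    def leave(self, person: str) -> None:
        # VOs are dynamic: membership can change at any time.
        self.members.pop(person, None)

    def grant(self, institution: str, resource: str, action: str) -> None:
        self.rules.add(SharingRule(institution, resource, action))

    def is_allowed(self, person: str, resource: str, action: str) -> bool:
        institution = self.members.get(person)
        if institution is None:
            return False  # not (or no longer) a member of this VO
        return SharingRule(institution, resource, action) in self.rules


vo = VirtualOrganization("engine-health")
vo.join("alice", "university-a")
vo.grant("university-a", "vibration-archive", "read")
print(vo.is_allowed("alice", "vibration-archive", "read"))   # allowed by a rule
print(vo.is_allowed("alice", "vibration-archive", "write"))  # no matching rule
```

Keeping the rules separate from the membership list reflects the idea that the VO is defined by what is shared and under which conditions, not by a fixed set of machines or people.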

12
Framing New Infrastructures
  • If you need huge computing power and/or data
    storage
  • If you do not have a supercomputer in your
    institution
  • If you have access to a reasonable network
    connection
  • Then the Grid (distributed computing) could be a
    good solution

13
Client-server ad hoc model
[Diagram label: Scientist]
14
The Grid Model - Information Utilities
[Diagram labels: Middleware, Scientist]
15
[Diagram labels: Scientists; "Need something here"; Infrastructure]

16
[Diagram labels: "Use Web 2.0 here"; Grid]
17
The social process of science
[Diagram labels: Undergraduate students; Digital libraries; Scientists; Graduate students; Experimentation; Data, metadata, provenance, workflows, ontologies]
18
An e-Science Grid Framework
19
Scientific Workflows
  • Capture individual data transformation and
    analysis steps
  • Large monolithic applications are broken down
    into smaller jobs
  • Smaller jobs can be independent or connected by
    control-flow or data-flow dependencies
  • Usually expressed as a Directed Acyclic Graph
    (DAG) of tasks
  • Allows scientists to modularize their
    applications
  • Scales up execution across several computational
    resources
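
The DAG view of a workflow can be sketched with Python's standard `graphlib` module. The task names below are hypothetical; the point is only that declaring dependencies is enough to derive a valid execution order, and that a cyclic "workflow" is rejected.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "fetch_raw_data": set(),
    "calibrate": {"fetch_raw_data"},
    "simulate": {"fetch_raw_data"},
    "compare": {"calibrate", "simulate"},
    "publish": {"compare"},
}


def execution_order(dag):
    """Return one valid task order; raises graphlib.CycleError if `dag` has a cycle."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

Here `calibrate` and `simulate` depend only on `fetch_raw_data`, so they are independent of each other and could run on different resources at the same time.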

20
Workflow
  • Workflows orchestrate processes on the Grid
  • A workflow is a processing model that
    incorporates tasks, data, and rules.
  • Workflow management systems execute tasks on the
    Grid using data once each task's dependencies
    are satisfied, based on rules.
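
The idea of running a task as soon as its dependencies are satisfied can be sketched as follows. This is an illustrative toy, not how any particular workflow management system is implemented: a local thread pool stands in for Grid resources, the only rule is "all upstream tasks have finished", every dependency is assumed to be a key of `tasks`, and `run_workflow` is a name invented here.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies, max_workers=4):
    """Run each task once all of its upstream tasks have finished.

    tasks:        name -> callable taking a dict {upstream name: upstream result}
    dependencies: name -> set of upstream task names (missing key means none)
    """
    sorter = TopologicalSorter({name: dependencies.get(name, set()) for name in tasks})
    sorter.prepare()
    results = {}
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies are now satisfied.
            for name in sorter.get_ready():
                inputs = {dep: results[dep] for dep in dependencies.get(name, set())}
                running[pool.submit(tasks[name], inputs)] = name
            # Block until at least one running task finishes.
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in finished:
                name = running.pop(future)
                results[name] = future.result()  # re-raises if the task failed
                sorter.done(name)                # may unlock downstream tasks
    return results


tasks = {
    "load": lambda inputs: [3, 1, 2],
    "sort": lambda inputs: sorted(inputs["load"]),
    "total": lambda inputs: sum(inputs["load"]),
    "report": lambda inputs: (inputs["sort"], inputs["total"]),
}
dependencies = {"sort": {"load"}, "total": {"load"}, "report": {"sort", "total"}}
print(run_workflow(tasks, dependencies)["report"])
```

The data produced by each task is passed to its dependents as input, which covers the "tasks, data, and rules" triple in a very reduced form.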

21
Workflow (cont)
  • A decision system that develops strategies for
    reliable and efficient execution in a variety of
    environments.
  • Reliable and scalable execution of dependent
    tasks
  • Reliable, scalable execution of independent tasks
    (locally, across the network), priorities,
    scheduling
  • Cyberinfrastructure: local machine, cluster, PBS
    (Condor) pool, Grid

22
Execution Environment
  • Globus and Condor services for job scheduling
  • Globus services for data transfer and cataloging
  • Information services that provide:
    - information about data location
    - information about the execution sites
23
The Grid Problem
  • Everyday researchers doing everyday research
  • BUT heroic Grid infrastructure is not being
    adopted
  • A data-centric perspective, like researchers have
  • BUT the Grid gives APIs to computation, not data
  • Collaborative and participatory
  • BUT the Grid has a deeply rooted service-provider
    mindset
  • Better, not perfect
  • BUT the Grid aims to provide a well-engineered,
    perfect solution
  • Giving autonomy to researchers
  • BUT the Grid imposes institutional control (at
    this time)
  • About pervasive computing
  • BUT the Grid is about portals, not the next
    generation of users

24
Summary
  • e-Science is about doing new science
  • Grid is just one part of the solution
  • Users are not just consumers of infrastructure.
    Empower them.
  • Think Web 2.0 on top of Grid and other services
  • Workflows make e-Science easier, and Web 2.0
    makes workflows easier.

25
Diagnostic and prognostic systems
  • Computer-based fault diagnosis and prognostics
    (DP)
  • Arises in many domains: medicine, engineering,
    transport, and aerospace

26
Operational Scenario
[Diagram labels: Engine flight data; London Airport; Airline office; New York Airport; Grid; Diagnostics Centre; Maintenance Centre; American data center; European data center]
27
Diagnostic and prognostic (DP) systems
  • Data-centric
  • Require complex interactions among agents
  • Distributed
  • Need to provide supporting and qualifying
    evidence for the DP offered
  • Safety- and business-critical, with high
    dependability requirements

28
Data Centricity
  • Integrating data from several different systems
    for root cause determination
  • Require vast data repositories
  • The types of data can also be highly diverse
  • Not only sensor data but also non-declarative
    knowledge
  • The interpretation of the knowledge can vary
    among the entities

29
Data Centricity
  • Grid computing
  • Knowledge and semantics (chapter 23)
  • Solutions for the management and archiving of
    large data repositories
  • Remote collection and distribution of data
  • Coherent integration of information from diverse
    databases (chapter 22)

30
Multiple stakeholders
  • Involve a number of stakeholders
  • The system owner
  • Experts
  • The commercial service provider
  • ...
  • Grid computing
  • Interaction of diverse parts is inherent within
    the Grid computing model

31
Distribution
  • Data storage, data mining, and fault diagnosis
    may take place at different locations
  • Across diverse IT systems
  • The system can also be highly dynamic, involving
    a number of disparate entities (virtual, changing
    often)

32
Distribution
  • Grid computing
  • The standardization of communication and
    application protocols in the Grid paradigm
  • Grid portals support effective interaction with
    users

33
Data Provenance
  • Transparency and trust in results
  • Steps to arrive at a decision
  • Grid computing
  • Develop open data communication protocols
  • Meta-labeling schemes

34
Dependability
  • Guaranteed service availability
  • Data security
  • System security

35
Dependability
  • Grid computing
  • Offer a security model to secure distributed
    computing (chapter 21)
  • Address data access and data confidentiality
  • The concept of guaranteed service and
    quality-of-service (chapter 18)

36
The aero-engine DP problem
  • Modern aero-engines must operate with extremely
    high reliability
  • Combine advanced mechanical engineering systems
    with electronic control systems
  • Use engine sensors
  • Prognostic applications

37
DAME project
[Diagram labels: Engine flight data; London Airport; Airline office; New York Airport; Grid; Diagnostics Centre; Maintenance Centre; American data center; European data center]
38
DAME project
  • Principal challenges
  • Vast data repositories
  • Advanced pattern-matching and data-mining
    methods with suitable response times
  • Collaboration among a number of diverse actors

39
DAME service
[Diagram labels: DAME Diagnostics Portal; Case-Based Reasoning; Modelling/Decision Support; QUOTE; Grid Services Management; Simulation; Novel; Data; Data-Mining; The Grid; sensor inputs: Vibration, Shaft Speed, Fuel Flow]
40
Core services and tools
  • Engine data service
  • Data storage and mining service
  • Engine modeling service
  • Case-based reasoning support
  • Maintenance interface service

41
Engine data service
  • Controls the interaction with the QUOTE system
    and its communication with the ground station
  • Establishes the link to the Grid data
    repositories
  • Many replicas of this service exist, and they are
    highly transient

42
Data storage and mining service
  • Consists of the AURA pattern-matching engine
    system
  • Uses specialized methods to rapidly search both
    raw and archived engine data
  • Resembles a data-mining service
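
To make "searching engine data for a pattern" concrete, here is a deliberately simple stand-in: a sliding-window search scored by normalized correlation. This is not the AURA method, whose internals the slides do not describe; it is a brute-force sketch, and the function names and the sample series are assumptions for illustration.

```python
import math


def normalize(window):
    """Center a window on its mean and scale it to unit length (if possible)."""
    mean = sum(window) / len(window)
    centered = [x - mean for x in window]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered] if norm else centered


def best_match(series, pattern):
    """Return (start index, score) of the window of `series` most similar to `pattern`.

    The score is a normalized correlation in [-1, 1]; it compares shape, not
    amplitude, so a scaled copy of the pattern still scores close to 1.
    """
    if not pattern or len(pattern) > len(series):
        raise ValueError("pattern must be non-empty and no longer than series")
    target = normalize(pattern)
    best_index, best_score = -1, -math.inf
    for start in range(len(series) - len(pattern) + 1):
        window = normalize(series[start:start + len(pattern)])
        score = sum(a * b for a, b in zip(window, target))
        if score > best_score:
            best_index, best_score = start, score
    return best_index, best_score


# Hypothetical vibration trace containing one spike, and a spike-shaped query.
trace = [0.1, 0.1, 0.1, 0.3, 0.9, 1.8, 0.9, 0.2, 0.1, 0.1]
index, score = best_match(trace, [1.0, 2.0, 1.0])
print(index, round(score, 3))
```

A brute-force scan like this costs time proportional to the series length times the pattern length for every query, which is why a service searching vast archives needs specialized methods rather than this approach.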

43
Engine modeling service
  • Infer the current state of the engine
  • Perform model-based data fusion

44
Case-based reasoning support
  • Use case-based reasoning to improve the knowledge
    base
  • Capture fault DP methods in a procedural way
  • Manage workflows associated with DP operations
  • Build and maintain the DAME knowledge base
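
The retrieve-and-retain cycle at the core of case-based reasoning can be sketched in a few lines. This is a generic illustration, not the DAME implementation: the feature names echo the sensor labels on the DAME service slide, while the diagnoses, the numeric values, and the distance measure are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Case:
    features: dict[str, float]  # assumed to be normalized sensor readings
    diagnosis: str


class CaseBase:
    def __init__(self):
        self.cases = []

    def retain(self, case):
        """Store a confirmed case so that later queries can reuse it."""
        self.cases.append(case)

    def retrieve(self, features, k=1):
        """Return the k stored cases closest to the query (Euclidean distance)."""
        def distance(case):
            shared = features.keys() & case.features.keys()
            if not shared:
                return math.inf
            return math.sqrt(
                sum((features[key] - case.features[key]) ** 2 for key in shared)
            )
        return sorted(self.cases, key=distance)[:k]


base = CaseBase()
# Hypothetical past cases with made-up diagnoses.
base.retain(Case({"vibration": 0.9, "shaft_speed": 0.5, "fuel_flow": 0.5}, "bearing wear"))
base.retain(Case({"vibration": 0.2, "shaft_speed": 0.5, "fuel_flow": 0.9}, "fuel nozzle fault"))

query = {"vibration": 0.8, "shaft_speed": 0.6, "fuel_flow": 0.4}
suggestion = base.retrieve(query)[0].diagnosis
print(suggestion)

# Once an engineer confirms the diagnosis, the new case joins the knowledge base.
base.retain(Case(query, suggestion))
```

Retaining confirmed cases is what lets the knowledge base improve over time, which is the role the slide assigns to this service.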

45
Maintenance interface service
  • Organizes all interactions with stakeholders
    involved in taking remedial actions
  • Capture information that helps validate or refine
    the output from the preceding DP processes

48
Conclusion
  • Ambitious vision for the future of science and
    engineering
  • The realization of this vision will require
    long-term investments of financial resources
  • We should not underestimate the difficulty of the
    technical challenges that must be overcome to
    realize the vision
  • The realization of these goals is extremely
    important for the future of science and
    engineering

49
Q & A
  • Thank you!

50
Reference
  • I. Foster and C. Kesselman, The Grid 2:
    Blueprint for a New Computing Infrastructure.
    Morgan Kaufmann Publishers, 1999.
  • Cyberinfrastructure Vision for 21st Century
    Discovery (NSF)
  • National e-Science Centre:
    http://www.nesc.ac.uk/action/esi/
  • DAME homepage: http://www.cs.york.ac.uk/dame/