IBM UK - PowerPoint PPT Presentation

Slides: 28
Provided by: MalcolmA1
Category:
Tags: ibm | ds | games

Transcript and Presenter's Notes

Title: IBM UK


1
IBM UK & Ireland Technical Consultancy Group
Prof. Malcolm Atkinson, Director
www.nesc.ac.uk
22nd May 2003
2
Outline
  • What is e-Science?
  • UK e-Science
  • UK e-Science Roles and Resources
  • Scientific Data Curation
  • Data Access & Integration
  • Data Analysis & Interpretation
  • e-Science driving Disruptive Technology
  • Economic impact, Mobile Code, Decomposition
  • Global infrastructure, optimisation & management
  • "Don't care where" computing

3
What is e-Science?
4
Foundation for e-Science
  • e-Science methodologies will rapidly transform
    science, engineering, medicine and business
  • driven by exponential growth (×1000/decade)
  • enabling a whole-system approach

sensor nets
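
As a rough illustration of the growth figure on this slide, read here as a thousand-fold increase per decade (an assumption about the slide's shorthand), the implied yearly factor and doubling time can be worked out as below. The function names are illustrative only.

```python
import math


def annual_factor(growth_per_period: float, years: int = 10) -> float:
    """Yearly multiplier implied by a total growth factor over `years`."""
    return growth_per_period ** (1.0 / years)


def doubling_time_years(growth_per_period: float, years: int = 10) -> float:
    """Years needed to double at the same compound rate."""
    return years * math.log(2) / math.log(growth_per_period)


factor = annual_factor(1000)          # assumed reading: x1000 per decade
doubling = doubling_time_years(1000)
print(f"{factor:.2f}x per year, doubling every {doubling * 12:.1f} months")
```

Under that reading, a thousand-fold increase per decade is close to a doubling every year.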
5
Convergence & Ubiquity
Multi-national, Multi-discipline,
Computer-enabled Consortia, Cultures & Societies
New Opportunities, New Results, New Rewards
6
UCSF
UIUC
From Klaus Schulten, Center for Biomolecular
Modeling and Bioinformatics, Urbana-Champaign
7
global in-flight engine diagnostics
100,000 engines × 2-5 Gbytes/flight × 5 flights/day
= up to 2.5 petabytes/day
Distributed Aircraft Maintenance Environment
Universities of Leeds, Oxford, Sheffield & York
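
The engine-diagnostics figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 petabyte = 1,000,000 Gbytes) and that every engine flies the stated five flights a day; the constant and function names are illustrative.

```python
ENGINES = 100_000
FLIGHTS_PER_DAY = 5
GB_PER_FLIGHT_RANGE = (2, 5)   # Gbytes recorded per flight, low and high
GB_PER_PB = 1_000_000          # decimal units assumed


def daily_volume_pb(gb_per_flight: float) -> float:
    """Fleet-wide data volume per day, in petabytes."""
    return ENGINES * FLIGHTS_PER_DAY * gb_per_flight / GB_PER_PB


low_pb, high_pb = (daily_volume_pb(gb) for gb in GB_PER_FLIGHT_RANGE)
print(f"{low_pb} to {high_pb} petabytes/day")
```

The slide's 2.5 petabytes/day corresponds to the upper end of the 2-5 Gbytes/flight range.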
8
Tera → Peta Bytes
Terabyte
  • RAM time to move: 15 minutes
  • 1Gb WAN move time: 10 hours ($1000)
  • Disk Cost: 7 disks = $5000 (SCSI)
  • Disk Power: 100 Watts
  • Disk Weight: 5.6 Kg
  • Disk Footprint: Inside machine
Petabyte
  • RAM time to move: 2 months
  • 1Gb WAN move time: 14 months ($1 million)
  • Disk Cost: 6800 Disks + 490 units + 32 racks = $7 million
  • Disk Power: 100 Kilowatts
  • Disk Weight: 33 Tonnes
  • Disk Footprint: 60 m²

Now make it secure & reliable!
May 2003: approximately correct. See also
Distributed Computing Economics, Jim Gray,
Microsoft Research, MSR-TR-2003-24
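
To put the WAN move times in context, the sketch below computes the ideal transfer time over a 1 Gbit/s link. It assumes decimal units and an `efficiency` parameter (not from the slide) standing in for protocol overhead and contention; the names are illustrative.

```python
TB = 10**12      # bytes, decimal units assumed
PB = 10**15
GBIT = 10**9     # bits per second


def transfer_hours(num_bytes: int, link_bits_per_s: int, efficiency: float = 1.0) -> float:
    """Hours needed to move `num_bytes` over a link at the given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = num_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 3600


tb_hours = transfer_hours(TB, GBIT)       # ideal case, full line rate
pb_days = transfer_hours(PB, GBIT) / 24
print(f"1 TB: {tb_hours:.1f} hours, 1 PB: {pb_days:.0f} days at full line rate")
```

At full line rate a terabyte takes a little over two hours, so the slide's 10 hours corresponds to an effective throughput of roughly a fifth of the nominal rate. The petabyte figure scales the same way: 1000 × 10 hours is about 14 months.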
9
e-Science in the UK
10
Additional UK e-Science Funding
  • First Phase 2001-2004
    • Application Projects
      • £74M
      • All areas of science and engineering
      • >60 Projects
      • 340 at first All Hands Meeting
    • Core Programme
      • £35M
      • Collaborative industrial projects
      • 80 Companies
      • > £30 Million
  • Second Phase 2003-2006
    • Application Projects
      • £96M
      • All areas of science and engineering
    • Core Programme
      • £16M £25M (?)
      • Core Grid Middleware

EU money ! 40M Janet upgrade HPC(x) 55M
11
e-Science and SR2002
  • Research Council: 2004-6 (2001-4)
  • Medical: £13.1M (£8M)
  • Biological: £10.0M (£8M)
  • Environmental: £8.0M (£7M)
  • Eng & Phys: £18.0M (£17M)
  • HPC: £2.5M (£9M)
  • Core Prog.: £16.2M ? (£15M) £20M
  • Particle Phys & Astro: £31.6M (£26M)
  • Economic & Social: £10.6M (£3M)
  • Central Labs: £5.0M (£5M)
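
The per-council figures can be totalled as a quick check. This sketch assumes the values are millions of pounds, that the bracketed figures are the 2001-4 allocations, and it leaves out the separate 20M on the Core Programme row because its role is not clear from the slide. The dictionary and function names are illustrative.

```python
# (2004-6, 2001-4) allocations per research council, in millions (assumed GBP).
ALLOCATIONS = {
    "Medical": (13.1, 8.0),
    "Biological": (10.0, 8.0),
    "Environmental": (8.0, 7.0),
    "Eng & Phys": (18.0, 17.0),
    "HPC": (2.5, 9.0),
    "Core Prog.": (16.2, 15.0),  # slide marks this row "?"; trailing 20M omitted
    "Particle Phys & Astro": (31.6, 26.0),
    "Economic & Social": (10.6, 3.0),
    "Central Labs": (5.0, 5.0),
}


def column_total(index: int) -> float:
    """Sum one column, rounded to one decimal to hide float noise."""
    return round(sum(row[index] for row in ALLOCATIONS.values()), 1)


total_2004_6 = column_total(0)
total_2001_4 = column_total(1)
print(f"2004-6: {total_2004_6}M, 2001-4: {total_2001_4}M")
```

Under these assumptions the columns come to 115.0 and 98.0; counting the separate 20M on the Core Programme row would bring the earlier period to 118.0.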

12
NeSC in the UK
You are here
Edinburgh
Glasgow
Newcastle
Belfast
Manchester
Daresbury Lab
Cambridge
Oxford
Hinxton
RAL
Cardiff
London
Southampton
13
UK Grid: Operational & Heterogeneous
  • Currently a Level-2 Grid based on Globus Toolkit 2
  • Transition to OGSI/OGSA will prove worthwhile
  • There are still issues to be resolved
    • OGSA definition / delivery
    • Hosting environments & Platforms
    • Combinations of Services supported
    • Material and grids to support adopters
  • A schedule of transitions should be
    (approximately & provisionally) published
  • Expected time line
    • Now: GT2 L2 service, GT3 M/W development & evaluation
    • Q3 & Q4 2003: GT2 L3, GT3 L1
    • Q1 & Q2 2004: significant project transitions to GT3 L2/L3
    • Late Q4 2004: most projects have transitioned; end GT2 L3

14
Data Access & Integration
15
Biology & Medicine
  • Extensive Research Community
  • >1000 per research university
  • Extensive Applications
  • Many people care about them
  • Health, Food, Environment
  • Interacts with virtually every discipline
  • Physics, Chemistry, Nanoengineering, ...
  • 450 Databases relevant to bioinformatics
  • Heterogeneity, Interdependence, Complexity,
    Change, ...
  • Wonderful Scientific Questions
  • How does a cell work?
  • How does a brain work?
  • How does an organism develop?
  • Why is the biosphere so stable?
  • What happens to the biosphere when the earth
    warms up?

1 petabyte digital data / hospital / year
> Lothian Region Hospitals produce more data
than CERN
16
Database Growth
PDB Content Growth
39,856,567,747
17
Infrastructure Architecture
Data Intensive X Scientists
Data Intensive Applications for Science X
Simulation, Analysis & Integration Technology for Science X
Generic Virtual Data Access and Integration Layer
OGSA
OGSI: Interface to Grid Infrastructure
Compute, Data & Storage Resources
Distributed
Virtual Integration Architecture
18
Data Access & Integration Services
19
Data Access and Integration Services

1a. Request to Registry for sources of data about x and y
1b. Registry responds with Factory handle
2a. Request to Factory for access and integration from resources Sx and Sy
2b. Factory creates GridDataServices network
2c. Factory returns handle of GDS to client
3a. Client submits sequence of scripts, each has a set of queries to GDS with XPath, SQL, etc
3b. Client tells analyst
3c. Sequences of result sets returned to analyst as formatted binary described in a standard XML notation

[Diagram: Analyst, Client, Data Registry, Factory, GDS and GDTS instances, XML database, Relational database, resources Sx and Sy]
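
The numbered steps on this slide (1a to 3c) describe a registry, factory and service interaction. The sketch below mirrors that flow in plain Python. It is not the OGSA-DAI API: every class, method and data value here is a hypothetical stand-in chosen for illustration, and queries are modelled as simple predicates rather than XPath or SQL.

```python
from dataclasses import dataclass, field


@dataclass
class GridDataService:
    """Hypothetical stand-in for a GDS wrapping one data resource."""
    resource: str
    rows: list

    def perform(self, predicate):
        # Steps 3a/3c: evaluate the query next to the data, return the result set.
        return [row for row in self.rows if predicate(row)]


@dataclass
class Factory:
    """Hypothetical factory that creates a GDS per requested resource."""
    resources: dict  # resource name -> list of rows

    def create_gds(self, name: str) -> GridDataService:
        # Steps 2b/2c: create the service and hand it back to the client.
        return GridDataService(name, self.resources[name])


@dataclass
class Registry:
    """Hypothetical registry mapping a topic to a factory."""
    factories: dict = field(default_factory=dict)

    def register(self, topic: str, factory: Factory) -> None:
        self.factories[topic] = factory

    def find(self, topic: str) -> Factory:
        # Steps 1a/1b: look up a source of data, respond with a factory.
        return self.factories[topic]


factory = Factory({
    "Sx": [{"gene": "g1", "score": 0.9}, {"gene": "g2", "score": 0.2}],
    "Sy": [{"gene": "g3", "score": 0.7}],
})
registry = Registry()
registry.register("x", factory)

found = registry.find("x")                                  # 1a, 1b
services = [found.create_gds(name) for name in ("Sx", "Sy")]  # 2a, 2b, 2c
results = [gds.perform(lambda row: row["score"] > 0.5) for gds in services]  # 3a, 3c
print(results)
```

The point of the indirection is that the client only ever holds handles: it never needs to know where Sx and Sy live or how they are stored.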

20
ODD-Genes
PSE
21
Scientific Data
  • Challenges
  • Data Huggers
  • Meagre metadata
  • Ease of Use
  • Optimised integration
  • Dependability
  • Opportunities
  • Global Production of Published Data
  • Volume? Diversity?
  • Combination → Analysis → Discovery
  • Opportunities
  • Specialised Indexing
  • New Data Organisation
  • New Algorithms
  • Varied Replication
  • Shared Annotation
  • Intensive Data Computation
  • Challenges
  • Fundamental Principles
  • Approximate Matching
  • Multi-scale optimisation
  • Autonomous Change
  • Legacy structures
  • Scale and Longevity
  • Privacy and Mobility

22
Disruptive e-Science Drivers?
23
Mohammed & Mountains
  • Petabytes of Data cannot be moved
  • It stays where it is produced or curated
  • Hospitals, observatories, European Bioinformatics
    Institute, ...
  • Distributed collaborating communities
  • Expertise in curation, simulation & analysis
  • Distributed diverse data collections
  • Discovery depends on insights
  • Tested by combining data from many sources
  • Using sophisticated models & algorithms
  • What can you do?

24
Move computation to the data
  • Assumption: code size << data size
  • Develop the database philosophy for this?
  • Queries are dynamically re-organised & bound
  • Develop the storage architecture for this?
  • Compute closer to disk?
  • System on a Chip using free space in the on-disk
    controller
  • Safe hosting of arbitrary computation
  • Proof-carrying code for data and compute
    intensive tasks & robust hosting environments
  • Provision combined storage & compute resources
  • Decomposition of applications
  • To ship behaviour-bounded sub-computations to
    data
  • Co-scheduling & co-optimisation
  • Data & Code (movement), Code execution
  • Recovery and compensation

Dave Patterson, SIGMOD 98
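
The idea on this slide can be sketched as a data node that accepts a small task and returns only its result, instead of exporting its records. This is a toy model under stated assumptions: `DataNode`, `count_matches` and `cheaper_to_ship_code` are hypothetical names, the cost test compares byte counts only, and the task runs unchecked, whereas the slide calls for safe hosting of arbitrary computation.

```python
class DataNode:
    """Hypothetical holder of a data collection that stays where it is curated."""

    def __init__(self, records):
        self._records = list(records)

    def size_bytes(self) -> int:
        return sum(len(record) for record in self._records)

    def fetch_all(self):
        # Ship data to the code: every byte crosses the network.
        return list(self._records)

    def run(self, task):
        # Ship code to the data: only the task's result crosses the network.
        return task(self._records)


def count_matches(records, needle: bytes = b"ACGT") -> int:
    """Example sub-computation: count records containing a pattern."""
    return sum(1 for record in records if needle in record)


def cheaper_to_ship_code(code_bytes: int, result_bytes: int, data_bytes: int) -> bool:
    """True when moving code plus results costs fewer bytes than moving the data."""
    return code_bytes + result_bytes < data_bytes


node = DataNode([b"TTACGTAA", b"GGGGCCCC", b"ACGTACGT"])
remote_result = node.run(count_matches)            # computation moved to the data
local_result = count_matches(node.fetch_all())     # data moved to the computation
print(remote_result, local_result, node.size_bytes())
```

Both routes give the same answer; the difference is what travels. With petabyte collections and kilobyte tasks, the byte-count test favours shipping code by many orders of magnitude.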
25
Software Changes
  • Integrated Problem Solving Environments
  • Users & application developers see
  • Abstract computer and storage system
  • Where and how things are executed can be ignored
  • Diversity, detail, ownership, dependability, cost
  • Explicit and visible
  • Increasing sophistication of description
  • Metadata for discovery
  • Metadata for management and optimisation
  • Raising the semantic level of discourse
  • Applications developed dynamically by composition
  • Mobile, Safe & Re-organisable Code
  • Predictable & Guaranteed behaviour
  • Decomposition & re-composition
  • New programming languages & understanding needed

26
Organisational & Cultural Changes
  • Access to Computation & Data must be simple
  • All use a computational, semantic, data-rich web
  • i.e. it's invisible: the portal / browser lets
    you do more
  • Responsibility of data publishers
  • Cost, dependability, trustworthy, capable,
    flexibility,
  • Shared contributions compose indefinitely
  • Knowledge accumulation and interdependence
  • Contributor recognition and IPR
  • Complexity and management of infrastructure
  • Always on
  • Must be sustained
  • Paid for
  • Hidden

Health, Energy, Finance, Government, Education,
Games @ Home
27
Comments & Questions Please
www.ogsadai.org.uk
www.nesc.ac.uk