Towards eScience: Scientific Data Grid in CAS

Provided by: apgridGrid
1
Towards e-Science: Scientific Data Grid in CAS
  • Yongzheng Ma
  • CNIC, CAS (myz_at_cnic.cn)
  • APEC TEL GRID WORKSHOP
  • Sep. 20, 2004, Singapore

2
Outline
  • e-Science activities in CAS
  • CAS and e-Science
  • Current efforts for e-Science in CAS
  • Scientific Data Grid
  • Resource
  • Middleware
  • Applications
  • Conclusion

3
What's e-Science?
  • e-Science
  • Informatization of research activities.

4
Why e-Science?
  • Challenges in modern research
  • problems are more complex than ever
  • research objects are not isolated, but
    cross-disciplinary and large-scale
  • data processing, simulation and computing become
    indispensable methods
  • more and more communication and collaboration
    among scientists

5
Background of e-Science for CAS
  • CAS launched the Knowledge Innovation Program in
    1998; it's time NOW to push it forward in all
    aspects.
  • Scientists demand a higher level of Informatization
    to meet their requirements in research
    activities.
  • CAS started the Informatization Program in the
    10th Five-year Plan (2001-2005)
  • Informatization will greatly promote technology
    innovation and knowledge innovation.

6
Informatization of Research Activities
  • Bridge the gaps of time, space and environment,
    enable global, cross-discipline, large-scale
    collaboration among scientists
  • Change the way scientists do research,
    greatly improve communication and collaboration,
    advance the development of science and technology
  • Informatization of Research Activities is the
    pioneer of Informatization of the whole society

7
Features of e-Science
  • Open
  • Resource sharing
  • Supercomputers, Data, Instruments, ...
  • Coordinated research
  • working with a colleague across an ocean as if
    in the same lab
  • cross-discipline, complex, coordinated
    problem-solving

8
Infrastructure for e-Science
  • Computing resources
  • Data resources
  • Software resources
  • Communication resources
  • Human resources
  • Scientific Instruments
  • particle accelerators, telescopes, sensors, ...

9
e-Science and Application
  • e-Science provides an informatized environment
    and platform for research
  • Individual applications for fields and areas
    should be developed case by case
  • Application is key

10
Milestones of e-Science in CAS
  • In 2000, proposed Informatized Research
    Environment in the SDB project
  • In March 2001, proposed Scientific Data Grid
  • In August 2001, the project was funded by the CAS
    Informatization Program
  • In December 2001, proposed China Science Grid
  • In October 2002, Scientific Data Grid joined
    the China National Grid and became a key component

11
e-Science Activities in CAS (2001-2005)
  • Upgrading IT Infrastructure
  • Constructing Scientific Research Environment
  • Developing Key IT Technologies
  • Demonstrating Science Applications

12
Upgrading IT Infrastructure
  • Networks
  • CSTNET
  • Domestic links 155M-2.5G
  • International links 310M
  • CNGI (China Next Generation Internet)
  • Supported by National Development and Reform
    Commission
  • 12 GigaPoPs and 2.5-10G links will be built by CAS
  • Scientific Database
  • 10TB
  • Supercomputing Environment
  • 5 TFLOPS
  • Mass Storage System
  • 100TB
  • Visualization Environment
  • SGI Onyx 3000

Lenovo 6800, Installed at CNIC
13
DeepComp 6800
  • Developed by the Lenovo Group Corp, China
  • Completed in Nov. 2003
  • Installed at CNIC, CAS in December, 2003
  • 2.6TB memory
  • 81TB disks
  • 4.183 TFLOPS Linpack performance (78.5%
    efficiency)
  • Ranked 14th in the Top500 list (Nov. 2003)
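
As a worked check of the figures on this slide: Linpack efficiency is measured performance divided by theoretical peak. The short sketch below back-computes the implied peak from the two quoted numbers. The peak value is not stated in the presentation; it is derived here only for illustration, and the variable names are ours.

```python
# Figures quoted on the slide.
rmax_tflops = 4.183   # measured Linpack performance
efficiency = 0.785    # 78.5% efficiency

# Efficiency = Rmax / Rpeak, so the implied theoretical peak is Rmax / efficiency.
# This peak is derived, not quoted from the presentation.
implied_rpeak_tflops = rmax_tflops / efficiency

print(f"Implied theoretical peak: {implied_rpeak_tflops:.2f} TFLOPS")
```

The result is in line with the "5 TFLOPS" quoted for the supercomputing environment on the infrastructure slide.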

14
Lenovo DeepComp 6800
15
Constructing Scientific Research Environments based on the Internet
  • Network of Field Observatories
  • Ecology network
  • Astronomical Observatories
  • Weather stations
  • Mountain disaster stations
  • Network of Digital Libraries of Specimen
  • 24 (zoology, botany, fossil, mineral, ...), 80% of
    the whole country
  • Digital Library of Specimen is starting
  • Network of Digital Libraries
  • National Science Technology Digital Library
  • Network of Scientific Instruments
  • LAMOST, BEPC-II, Electron Microscopes, ...

16
Key IT Technologies
  • NGI Technology
  • IPv6/IPv4 Transition
  • Network Measurement
  • IPv6 Root DNS
  • Multicast
  • Hierarchy Network
  • Security
  • Resource Location Addressing
  • Grid Computing
  • Data Grid Middleware
  • Data Integration
  • Grid Information Service
  • Grid Security
  • Metadata
  • Grid-enabled application

17
Grid-enabled Applications
  • Virtual Observatory
  • Digital Earth
  • HEP Data Grid
  • Bio Grid
  • Chemical Integrated Information System

18
China Science Grid
  • By 2005, the Scientific Data Grid will have been
    built, achieving sharing of scientific data
    resources and collaboration based on them.
  • Then, computing resources and scientific
    instruments will be integrated into it, and the
    China Science Grid will be built on the SDG.
  • Also, develop grid-enabled applications and
    establish application grids: bio grid, astro
    grid, etc.
  • China Science Grid: an instance of e-Science

19
International Collaboration
  • PRAGMA, 2002
  • GLORIAD, Jan. 2004
  • NCSA(US), Kurchatov Institute(RU)
  • KISTI (Korea)
  • Internet2
  • APAN

20
Introduction to GLORIAD
  • Proposed network/program to be operational in
    2004
  • Co-developed (and to-be-co-funded) by U.S.,
    Russia, China
  • Expanded capacity for science and education
    collaboration (10 Gbps)
  • New Global Ring topology for reliability and
    new applications
  • Essential for supporting advanced S&E
    applications (particularly HEP, Astronomy,
    Atmospheric Sciences, Bioinformatics, optical
    network research, network security research)

21
GLOBAL RING NETWORK FOR ADVANCED APPLICATIONS DEVELOPMENT
Russia-China-USA Science & Education Network
22
e-Science Planning for the Future
  • Starting to plan the 11th Five-year
    Informatization Program (2006-2010)
  • Focus on e-Science in CAS
  • Work with CNGI (China Next Generation Internet)
  • International Collaboration
  • GLORIAD
  • PRAGMA
  • APAN
  • Potential Killer Science Applications
  • Virtual Observatory
  • High Energy Physics
  • Bioinformatics

23
Scientific Data Grid (SDG)
  • An exploration towards e-Science
  • Undertaken by CAS
  • Background
  • Current Status
  • Resource construction
  • Middleware development
  • Experimental applications

24
Background
  • Scientific Data Grid (SDG) is built upon the massive
    scientific data resources of the Scientific
    Database (SDB).
  • SDB is a long-term project, running since 1983,
    that has accumulated multi-disciplinary
    scientific data through the course of research
    activities in CAS.
  • The vision of SDG is to bring valuable data
    resources into full play by benefiting from
    advanced information technologies, in particular
    Grid technology.

25
Data Resources
  • Scientific Database (SDB)
  • 45 institutions across 16 cities
  • 313 databases
  • 10TB total volume
  • Covers many disciplines
  • Chemistry, Biology, Geosciences, Environment,
    Astronomy, High energy physics, ...

26
SDG Platform
  • Data Center
  • A subset of the DeepComp 6800 nodes
  • 20TB SAN Storage
  • TFLOPS-scale computing

27
SDG Software Modules
28
SDG Middleware and ToolKits
  • Grid Middleware
  • Grid Information System
  • SDG Uniform Access Interface
  • SDG Security System
  • SDG Toolkits

SDG GIS V1.0, Universal Metadata Tool V2.0, Statistics Tool V1.1
29
SDG GIS V1.0
SDG Applications
  • Backend MDS/LDAP
  • Two types of Information
  • System info
  • Metadata
  • Management and Service
  • Centralized
  • Distributed
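
To make the centralized/distributed split concrete, here is a minimal sketch of a hierarchical information index, assuming a top-level index that searches its registered sub-indexes. It is pure Python rather than MDS/LDAP; the class `InfoNode`, its methods, and all host and database names are hypothetical and are not the SDG GIS API.

```python
from dataclasses import dataclass, field

@dataclass
class InfoNode:
    """Hypothetical index node standing in for a top-level or sub-level index."""
    name: str
    entries: list = field(default_factory=list)    # records published locally
    children: list = field(default_factory=list)   # registered lower-level nodes

    def register(self, child):
        """A lower-level node announces itself to this index."""
        self.children.append(child)

    def publish(self, kind, attrs):
        """Store a local record; kind is 'system' or 'metadata'."""
        self.entries.append({"kind": kind, "source": self.name, **attrs})

    def query(self, kind):
        """Search this node and, recursively, every registered child."""
        hits = [e for e in self.entries if e["kind"] == kind]
        for child in self.children:
            hits.extend(child.query(kind))
        return hits

# Distributed management: each institute publishes to its own sub-index.
top = InfoNode("top-level index")
sub = InfoNode("sub-index (institute A)")
top.register(sub)
sub.publish("system", {"host": "node1.example.org", "cpus": 4})
sub.publish("metadata", {"database": "example_db", "discipline": "chemistry"})

# Centralized service: clients ask only the top-level index.
results = top.query("metadata")
print(results)
```

In this sketch the records stay with the node that published them, and the top-level index offers a single query entry point.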



[Diagram: SDG GIS architecture. Labels: Query, GRIP, GRRP, MDR P; SDG GIIS; MDW; SDG Sub-GIIS; MDIS, DCIS, I-MDIS, C-MDIS (shown twice)]
30
SDG Universal Metadata Tool
  • Metadata is tree-like and more flexible than
    fixed-column tables, which makes it difficult to
    handle in a web UI
  • Use XML files to store interim results
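
The point about tree-like metadata and XML interim files can be sketched with Python's standard `xml.etree.ElementTree`. This only illustrates the idea: the field names and the helper `dict_to_xml` are invented here, and the tool's real schema is not given in the slides.

```python
import xml.etree.ElementTree as ET

def dict_to_xml(tag, value):
    """Recursively convert a nested dict/list/scalar into an XML element."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(dict_to_xml(key, child))
    elif isinstance(value, list):
        for item in value:
            elem.append(dict_to_xml("item", item))
    else:
        elem.text = str(value)
    return elem

# Hypothetical record: nested and repeating fields do not fit a fixed-column table.
record = {
    "title": "Example dataset",
    "contact": {"name": "A. Researcher", "institute": "Example Institute"},
    "keywords": ["chemistry", "spectra"],
}

# Save the partially edited record as an interim XML document.
root = dict_to_xml("metadata", record)
xml_text = ET.tostring(root, encoding="unicode")

# Reload it later, e.g. when the user returns to a half-finished form.
reloaded = ET.fromstring(xml_text)
institute = reloaded.find("contact/institute").text
keywords = [k.text for k in reloaded.findall("keywords/item")]
print(institute, keywords)
```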

31
Universal Metadata Management Tool
32
Statistics Analysis Tool (SAT) for Data Volume
  • Features
  • Win2000/XP, Linux
  • Java 1.4
  • Globus Toolkit 3 Core
  • Oracle, SQL Server, File System
  • Deploy
  • Data nodes: 45 institutes at CAS, across 16
    cities in China
  • Mediator: CNIC
  • Service Monitor
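
A rough sketch of the mediator's role, assuming each data node reports the volume of its local databases and the mediator rolls the reports up. The report layout, the institute names, and the function `aggregate` are hypothetical; the slides do not describe the actual SAT service interface.

```python
from collections import defaultdict

# Hypothetical per-node reports: (institute, storage type, volume in GB).
reports = [
    ("Institute A", "Oracle", 120.0),
    ("Institute A", "File System", 800.0),
    ("Institute B", "SQL Server", 45.5),
]

def aggregate(reports):
    """Mediator-side roll-up: total volume per institute and overall."""
    per_institute = defaultdict(float)
    for institute, _storage_type, volume_gb in reports:
        per_institute[institute] += volume_gb
    return dict(per_institute), sum(per_institute.values())

per_institute, total_gb = aggregate(reports)
print(per_institute, total_gb)
```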

33
Windows 2k/XP, Java 1.4, GT3 Core
Statistics Services
35
SDG Middleware and ToolKits
  • SDG Middleware
  • Grid Information System
  • SDG Uniform Access Interface
  • SDG Security System
  • SDG Toolkits

Data Access Subsystem 1.0
36
SDG Data Access Service Framework
[Diagram: Application Clients; Grid Level Services; Information Service; Internet; Member Institutes; Node Level Services; Data Resources: Oracle, MySQL, DB2, SQL Server, FoxPro, File System]

37
Data Access
38
SDG Middleware and ToolKits
  • SDG Middleware
  • Grid Information System
  • SDG Uniform Access Interface
  • SDG Security System
  • SDG Toolkits

SDG CA V1.0, Access Control Toolkit V1.1
39
SDG Security System
  • GSI based
  • Use certificates to identify users
  • Role-based local access control
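
A minimal sketch of how certificate identity and role-based local access control can fit together, assuming authentication has already succeeded and yielded the certificate's subject name. The subject names, roles, and the function `is_allowed` are hypothetical; the slides do not show the toolkit's real policy format.

```python
# Hypothetical mapping from certificate subject name to a local role.
ROLE_BY_SUBJECT = {
    "/O=Example Grid/OU=Institute A/CN=Alice": "curator",
    "/O=Example Grid/OU=Institute B/CN=Bob": "reader",
}

# Hypothetical node-local policy: the actions each role may perform.
PERMISSIONS = {
    "curator": {"read", "write"},
    "reader": {"read"},
}

def is_allowed(subject_dn, action):
    """Authorize an already-authenticated subject; unknown subjects get no access."""
    role = ROLE_BY_SUBJECT.get(subject_dn)
    return role is not None and action in PERMISSIONS.get(role, set())

print(is_allowed("/O=Example Grid/OU=Institute B/CN=Bob", "write"))
```

Because the policy lives on the node, each member institute can decide locally what a given role may do with its data.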

Full Process of security-related operations under
SDG Security System
40
Security Subsystem
41
SDG Middleware and ToolKits
  • SDG Middleware
  • Grid Information System
  • SDG Uniform Access Interface
  • SDG Security System
  • SDG Toolkits

SDG Portal (prototype), Image Process Tool 1.0, Storage Sharing Service
43
Pilot Applications
  • Virtual Observatory
  • High Energy Physics
  • Global Climate Data Integration
  • Bioinformatics Integration
  • Resources and Environment Monitoring

44
China Virtual Observatory Demo
45
Conclusion
  • Today's and tomorrow's research demands global
    collaboration: e-Science.
  • The progress of Information Technology makes it
    possible.
  • CAS is making great efforts towards e-Science with
    its Informatization Program in the 10th Five-year
    Plan.
  • The e-Science Program in the 11th Five-year Plan
    (2006-2010) is being worked out. e-Science will
    become the groundwork of research in the next
    five years.
  • Scientific Data Grid is the first experimental
    project for CAS e-Science.
  • A few science applications on SDG will be our
    exploration towards e-Science.

46
Thank you!