Title: Towards e-Science: Scientific Data Grid in CAS
1 Towards e-Science: Scientific Data Grid in CAS
- Yongzheng Ma
- CNIC, CAS (myz_at_cnic.cn)
- APEC TEL GRID WORKSHOP
- Sep. 20, 2004, Singapore
2 Outline
- e-Science activities in CAS
- CAS and e-Science
- Current efforts for e-Science in CAS
- Scientific Data Grid
- Resource
- Middleware
- Applications
- Conclusion
3 What's e-Science?
- e-Science
- Informatization of research activities.
4 Why e-Science?
- Challenges in modern research
- problems are more complex than ever
- research objects are not isolated, but cross-disciplinary and large-scale
- data processing, simulation and computing have become indispensable methods
- more and more communication and collaboration among scientists
5 Background of e-Science in CAS
- CAS launched the Knowledge Innovation Program in 1998; it's time NOW to push it forward in all aspects.
- Scientists demand a higher level of informatization to meet the requirements of their research activities.
- CAS started the Informatization Program in the 10th Five-year Plan (2001-2005).
- Informatization will greatly promote technology innovation and knowledge innovation.
6 Informatization of Research Activities
- Bridges the gaps of time, space and environment, enabling global, cross-discipline, large-scale collaboration among scientists
- Changes the way scientists do research, greatly improves communication and collaboration, and advances the development of science and technology
- Informatization of research activities is the pioneer of the informatization of the whole society
7 Features of e-Science
- Open
- Resource sharing
- Supercomputers, data, instruments, …
- Coordinated research
- working with a colleague across an ocean as if you were in the same lab
- cross-discipline, complex, coordinated problem-solving
8 Infrastructure for e-Science
- Computing resources
- Data resources
- Software resources
- Communication resources
- Human resources
- Scientific Instruments
- particle accelerators, telescopes, sensors, …
9 e-Science and Applications
- e-Science provides an informatized environment and platform for research
- Individual applications for specific fields and areas should be developed case by case
- Applications are the key
10 Milestones of e-Science in CAS
- In 2000, proposed an Informatized Research Environment in the SDB project
- In March 2001, proposed the Scientific Data Grid
- In August 2001, the project was funded by the CAS Informatization Program
- In December 2001, proposed the China Science Grid
- In October 2002, the Scientific Data Grid joined the China National Grid and became a key component
11 e-Science Activities in CAS (2001-2005)
- Upgrading IT Infrastructure
- Constructing Scientific Research Environment
- Developing Key IT Technologies
- Demonstrating Science Applications
12 Upgrading IT Infrastructure
- Networks
- CSTNET
- Domestic links: 155 Mbps-2.5 Gbps
- International links: 310 Mbps
- CNGI (China Next Generation Internet)
- Supported by the National Development and Reform Commission
- 12 GigaPoPs with 2.5-10 Gbps links, to be built by CAS
- Scientific Database
- 10 TB
- Supercomputing Environment
- 5 TFLOPS
- Mass Storage System
- 100 TB
- Visualization Environment
- SGI Onyx3000
(Image: Lenovo 6800, installed at CNIC)
13 DeepComp 6800
- Developed by Lenovo Group Corp., China
- Completed in Nov. 2003
- Installed at CNIC, CAS in Dec. 2003
- 2.6 TB memory
- 81 TB disks
- 4.183 TFLOPS Linpack performance (78.5% efficiency)
- Ranked 14th on the Top500 list (Nov. 2003)
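Linpack efficiency is the ratio of the measured Linpack result, R_max, to the machine's theoretical peak, R_peak. Working backwards from the two figures on this slide gives the implied peak. This is derived arithmetic only; the talk does not quote that number.

```latex
\text{efficiency} = \frac{R_{\max}}{R_{\text{peak}}}
\quad\Longrightarrow\quad
R_{\text{peak}} = \frac{R_{\max}}{\text{efficiency}}
= \frac{4.183\ \text{TFLOPS}}{0.785} \approx 5.33\ \text{TFLOPS}
```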
14 Lenovo DeepComp 6800
15 Constructing Scientific Research Environments Based on the Internet
- Network of Field Observatories
- Ecology network
- Astronomical observatories
- Weather stations
- Mountain disaster stations
- …
- Network of Digital Libraries of Specimens
- 24 (zoology, botany, fossil, mineral, …), 80% of the whole country
- Digital Library of Specimens is starting
- Network of Digital Libraries
- National Science and Technology Digital Library
- Network of Scientific Instruments
- LAMOST, BEP-II, electron microscopes, …
16 Key IT Technologies
- NGI Technology
- IPv6/IPv4 Transition
- Network Measurement
- IPv6 Root DNS
- Multicast
- Hierarchical Network
- Security
- …
- Resource Location Addressing
- Grid Computing
- Data Grid Middleware
- Data Integration
- Grid Information Service
- Grid Security
- Metadata
- Grid-enabled Applications
- …
17 Grid-enabled Applications
- Virtual Observatory
- Digital Earth
- HEP Data Grid
- Bio Grid
- Chemical Integrated Information System
18 China Science Grid
- By 2005, the Scientific Data Grid will have been built, enabling the sharing of scientific data resources and collaboration based on it.
- Then, computing resources and scientific instruments will be integrated, and the China Science Grid will be built on the SDG.
- Grid-enabled applications will also be developed, and application grids established: Bio Grid, Astro Grid, etc.
- China Science Grid: an instance of e-Science
19 International Collaboration
- PRAGMA, 2002
- GLORIAD, Jan. 2004
- NCSA (US), Kurchatov Institute (RU)
- KISTI (Korea)
- Internet2
- APAN
20 Introduction to GLORIAD
- Proposed network/program, to be operational in 2004
- Co-developed (and to be co-funded) by the U.S., Russia and China
- Expanded capacity for science and education collaboration (10 Gbps)
- New global ring topology for reliability and new applications
- Essential for supporting advanced S&E applications (particularly HEP, astronomy, atmospheric sciences, bioinformatics, optical network research, network security research)
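The reliability argument for a ring can be shown with a small connectivity check. A ring gives every pair of sites two independent paths, so it stays connected after any single link fails; a simple chain does not.

This is a generic sketch, not GLORIAD's actual design. The three node names are just the partner countries listed above, and the helper functions are hypothetical.

```python
from collections import deque


def ring_edges(nodes):
    """Links of a ring: each node connects to the next, the last wraps to the first."""
    return {frozenset((nodes[i], nodes[(i + 1) % len(nodes)])) for i in range(len(nodes))}


def is_connected(nodes, edges):
    """Breadth-first search from the first node; True if every node is reached."""
    adjacency = {n: set() for n in nodes}
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)


def survives_any_single_link_failure(nodes, edges):
    """Remove each link in turn and check that the network stays connected."""
    return all(is_connected(nodes, edges - {edge}) for edge in edges)


# Hypothetical three-site example using the partner countries as node names.
ring_nodes = ["USA", "Russia", "China"]
ring = ring_edges(ring_nodes)
chain = {frozenset(("USA", "Russia")), frozenset(("Russia", "China"))}
print(survives_any_single_link_failure(ring_nodes, ring))   # True
print(survives_any_single_link_failure(ring_nodes, chain))  # False
```

The same check scales to any number of points of presence. A ring tolerates one link failure, but not two.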
21 GLOBAL RING NETWORK FOR ADVANCED APPLICATIONS DEVELOPMENT: Russia-China-USA Science & Education Network
22 Future e-Science Planning
- Starting to plan the 11th Five-year Informatization Program (2006-2010)
- Focus on e-Science in CAS
- Work with CNGI (China Next Generation Internet)
- International Collaboration
- GLORIAD
- PRAGMA
- APAN
- Potential Killer Science Applications
- Virtual Observatory
- High Energy Physics
- Bioinformatics
23 Scientific Data Grid (SDG)
- An exploration towards e-Science
- Undertaken by CAS
- Background
- Current Status
- Resource construction
- Middleware development
- Experimental applications
24 Background
- The Scientific Data Grid (SDG) is built upon the massive scientific data resources of the Scientific Database (SDB).
- SDB is a long-term project, running since 1983, that has accumulated multi-disciplinary scientific data through the course of scientific activities in CAS.
- The vision of SDG is to bring these valuable data resources into full play by exploiting advanced information technologies, in particular Grid technology.
25 Data Resources
- Scientific Database (SDB)
- 45 institutions across 16 cities
- 313 databases
- 10 TB total volume
- Covers many disciplines
- Chemistry, Biology, Geosciences, Environment, Astronomy, High Energy Physics, …
26 SDG Platform
- Data Center
- Some nodes of the DeepComp 6800
- 20 TB SAN storage
- TFLOPS-scale computing
27 SDG Software Modules
28 SDG Middleware and Toolkits
- Grid Middleware
- Grid Information System
- SDG Uniform Access Interface
- SDG Security System
- SDG Toolkits
SDG GIS V1.0, Universal Metadata Tool V2.0, Statistics Tool V1.1
29 SDG GIS V1.0
- Backend: MDS/LDAP
- Two types of information
- System info
- Metadata
- Management and service
- Centralized
- Distributed
(Diagram labels: SDG Applications; Query, GRIP, GRRP, MDRP; SDG GIIS; MDW; SDG Sub-GIIS; MDIS, DCIS, I-MDIS, C-MDIS)
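The GIS manages two kinds of information (system info and metadata) in a hierarchy of index services, and it can serve them either centrally or in a distributed way. The sketch below illustrates that idea in plain Python, under stated assumptions:

- It does not use MDS/LDAP.
- The `InfoService` class and its methods are hypothetical.
- "Distributed" is modeled as forwarding a query down the tree.
- "Centralized" is modeled as pulling the whole subtree into one cache.

```python
from dataclasses import dataclass, field


@dataclass
class InfoService:
    """A node in a hypothetical index hierarchy (top index, sub-index or leaf service)."""
    name: str
    records: list = field(default_factory=list)   # records held locally by this node
    children: list = field(default_factory=list)  # lower-level services registered here

    def register(self, child):
        """A lower-level service registers with this index (the role GRRP plays in MDS)."""
        self.children.append(child)

    def query_distributed(self, predicate):
        """Distributed mode: answer from local records, then forward the query downwards."""
        hits = [r for r in self.records if predicate(r)]
        for child in self.children:
            hits.extend(child.query_distributed(predicate))
        return hits

    def build_central_cache(self):
        """Centralized mode: collect every record in the subtree into one flat list."""
        return self.query_distributed(lambda r: True)


# Hypothetical hierarchy: one top index, one sub-index, two leaf services.
top = InfoService("SDG GIIS")
sub = InfoService("SDG Sub-GIIS")
leaf_meta = InfoService("metadata service",
                        records=[{"type": "metadata", "dataset": "astronomy catalog"}])
leaf_sys = InfoService("data center service",
                       records=[{"type": "system", "host": "node01", "free_tb": 4}])
top.register(sub)
sub.register(leaf_meta)
sub.register(leaf_sys)

metadata_hits = top.query_distributed(lambda r: r["type"] == "metadata")
print(metadata_hits)  # [{'type': 'metadata', 'dataset': 'astronomy catalog'}]
```

The distributed mode always returns fresh data but touches every node on each query. The centralized cache answers quickly but must be refreshed.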
30 SDG Universal Metadata Tool
- Metadata is tree-like and more flexible than fixed-column tables, but difficult to handle in a web UI
- Uses XML files to store interim results
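XML maps naturally onto tree-shaped metadata, which is why it works for storing interim results that fixed-column tables handle poorly. The slide does not describe the tool's actual schema, so this is only a sketch. The element names (`identification`, `distribution`, …) are hypothetical, and the round-trip helpers assume each tag appears at most once per parent.

```python
import xml.etree.ElementTree as ET


def dict_to_element(tag, value):
    """Turn a nested dict (a metadata tree) into an XML element; leaves become text."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        for child_tag, child_value in value.items():
            element.append(dict_to_element(child_tag, child_value))
    else:
        element.text = str(value)
    return element


def element_to_dict(element):
    """Inverse of dict_to_element; repeated sibling tags would collapse into one key."""
    if len(element) == 0:
        return element.text
    return {child.tag: element_to_dict(child) for child in element}


# Hypothetical metadata record with a nested (tree-like) structure.
record = {
    "identification": {"title": "Sample dataset", "discipline": "Chemistry"},
    "distribution": {"format": "Oracle table", "contact": {"institute": "CNIC"}},
}
xml_text = ET.tostring(dict_to_element("metadata", record), encoding="unicode")
restored = element_to_dict(ET.fromstring(xml_text))
print(restored == record)  # True
```

Leaves are stored as text, so non-string values come back as strings. A real tool would need a schema to restore types and to allow repeated elements.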
31 Universal Metadata Management Tool
32 Statistics Analysis Tool (SAT) for Data Volume
- Features
- Win2000/XP, Linux
- Java 1.4
- Globus Toolkit 3 Core
- Oracle, SQL Server, File System
- Deployment
- Data nodes: 45 institutes of CAS across 16 cities in China
- Mediator: CNIC
- Service Monitor
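The deployment pairs many data nodes with one mediator that aggregates their volume statistics, while a monitor tracks which nodes respond. The real tool is built on Java and GT3 Core services. The Python sketch below only shows the mediator's fan-out-and-sum logic, under these assumptions:

- `collect_volumes` and `fake_fetch` are hypothetical helpers.
- The node names are placeholders.
- Each remote statistics service is replaced by a local function.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_volumes(nodes, fetch_volume_gb, max_workers=8):
    """Query every data node in parallel and aggregate the reported volumes.

    fetch_volume_gb(node) stands in for a call to that node's statistics service.
    Returns (total_gb, per_node_gb, unreachable_nodes).
    """
    per_node, unreachable = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {node: pool.submit(fetch_volume_gb, node) for node in nodes}
        for node, future in futures.items():
            try:
                per_node[node] = future.result()
            except Exception:
                # A service monitor would flag this node as unavailable.
                unreachable.append(node)
    return sum(per_node.values()), per_node, unreachable


# Hypothetical nodes; None simulates a node that does not respond.
fake_volumes = {"institute-a": 120.0, "institute-b": 35.5, "institute-c": None}


def fake_fetch(node):
    volume = fake_volumes[node]
    if volume is None:
        raise ConnectionError(f"{node} did not respond")
    return volume


total, per_node, down = collect_volumes(fake_volumes, fake_fetch)
print(total, down)  # 155.5 ['institute-c']
```

Querying nodes in parallel keeps one slow or dead institute from blocking the whole report. Unreachable nodes are listed rather than silently counted as zero.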
33 (Diagram labels: Statistics Services; Windows 2000/XP, Java 1.4, GT3 Core)
34 (No Transcript)
35 SDG Middleware and Toolkits
- SDG Middleware
- Grid Information System
- SDG Uniform Access Interface
- SDG Security System
- SDG Toolkits
Data Access Subsystem 1.0
36 SDG Data Access Service Framework
(Diagram labels: Application Clients; Grid Level Services and Information Service, connected over the Internet; Node Level Services at Member Institutes; Data Resources: Oracle, MySQL, DB2, SQL Server, FoxPro, file system)
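The framework hides heterogeneous backends (relational databases and plain files at member institutes) behind one uniform access interface. Grid-level services then route each request to the node that holds the data. This is a minimal adapter-pattern sketch of that idea, not the SDG interface:

- `DataSource`, `SQLiteSource`, `CsvDirectorySource` and `GridDataAccess` are hypothetical names.
- SQLite stands in for Oracle, MySQL, DB2 and SQL Server, which would need their own drivers.
- A CSV directory stands in for the file system.

```python
import csv
import os
import sqlite3
import tempfile
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical uniform interface exposed by every node-level service."""

    @abstractmethod
    def fetch(self, dataset: str) -> list[dict]:
        """Return all records of a dataset as a list of dicts."""


class SQLiteSource(DataSource):
    """Relational backend; SQLite stands in for Oracle, MySQL, DB2 or SQL Server."""

    def __init__(self, connection: sqlite3.Connection):
        self.connection = connection
        self.connection.row_factory = sqlite3.Row  # rows behave like mappings

    def fetch(self, dataset):
        # Table names cannot be bound as parameters, so accept only plain identifiers.
        if not dataset.isidentifier():
            raise ValueError(f"invalid table name: {dataset!r}")
        rows = self.connection.execute(f'SELECT * FROM "{dataset}"').fetchall()
        return [dict(row) for row in rows]


class CsvDirectorySource(DataSource):
    """File-system backend: each dataset is a CSV file in one directory."""

    def __init__(self, directory):
        self.directory = directory

    def fetch(self, dataset):
        path = os.path.join(self.directory, dataset + ".csv")
        with open(path, newline="") as handle:
            return list(csv.DictReader(handle))  # CSV values come back as strings


class GridDataAccess:
    """Grid-level entry point that routes a request to the node holding the dataset."""

    def __init__(self):
        self.catalog = {}  # dataset name -> DataSource (the information service's role)

    def publish(self, dataset, source):
        self.catalog[dataset] = source

    def fetch(self, dataset):
        return self.catalog[dataset].fetch(dataset)


# Hypothetical example: one relational node and one file-system node.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stars (name TEXT, magnitude REAL)")
db.execute("INSERT INTO stars VALUES ('Vega', 0.03)")
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "stations.csv"), "w", newline="") as handle:
    handle.write("station,elevation_m\nStation-A,1200\n")

grid = GridDataAccess()
grid.publish("stars", SQLiteSource(db))
grid.publish("stations", CsvDirectorySource(workdir))
print(grid.fetch("stars"))     # [{'name': 'Vega', 'magnitude': 0.03}]
print(grid.fetch("stations"))  # [{'station': 'Station-A', 'elevation_m': '1200'}]
```

Clients see one call shape regardless of backend. Adding another database type means writing one more `DataSource` subclass without touching the clients.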
37 Data Access
38 SDG Middleware and Toolkits
- SDG Middleware
- Grid Information System
- SDG Uniform Access Interface
- SDG Security System
- SDG Toolkits
SDG CA V1.0, Access Control Toolkit V1.1
39 SDG Security System
- GSI-based
- Uses certificates to identify users
- Role-based local access control
(Diagram: full process of security-related operations under the SDG Security System)
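In this design, certificates establish who a user is, and each site then decides locally what that user may do, based on roles. The sketch below models only the local authorization step. It assumes the GSI layer has already verified the certificate and handed over the subject DN. `LocalAccessPolicy`, the role names, the dataset name and the DN are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LocalAccessPolicy:
    """Role-based access control kept locally at one data node (hypothetical structure).

    Authentication (verifying the X.509 certificate) is assumed to be done by GSI;
    this class only receives the already-authenticated certificate subject DN.
    """
    user_roles: dict = field(default_factory=dict)        # subject DN -> set of roles
    role_permissions: dict = field(default_factory=dict)  # role -> set of (action, dataset)

    def grant_role(self, subject_dn, role):
        self.user_roles.setdefault(subject_dn, set()).add(role)

    def allow(self, role, action, dataset):
        self.role_permissions.setdefault(role, set()).add((action, dataset))

    def is_allowed(self, subject_dn, action, dataset):
        """Allowed if any of the user's roles carries the permission; unknown DNs get nothing."""
        return any(
            (action, dataset) in self.role_permissions.get(role, set())
            for role in self.user_roles.get(subject_dn, set())
        )


policy = LocalAccessPolicy()
policy.allow("reader", "read", "climate-data")
policy.allow("curator", "read", "climate-data")
policy.allow("curator", "write", "climate-data")

alice = "/C=CN/O=CAS/OU=CNIC/CN=Alice"  # hypothetical certificate subject
policy.grant_role(alice, "reader")
print(policy.is_allowed(alice, "read", "climate-data"))   # True
print(policy.is_allowed(alice, "write", "climate-data"))  # False
```

Keeping the role tables at each node lets an institute control access to its own databases, while sharing one grid-wide identity scheme.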
40 Security Subsystem
41 SDG Middleware and Toolkits
- SDG Middleware
- Grid Information System
- SDG Uniform Access Interface
- SDG Security System
- SDG Toolkits
SDG Portal (prototype), Image Process Tool 1.0, Storage Sharing Service
42 (No Transcript)
43 Pilot Applications
- Virtual Observatory
- High Energy Physics
- Global Climate Data Integration
- Bioinformatics Integration
- Resources and Environment Monitoring
44 China Virtual Observatory Demo
45 Conclusion
- Today's and tomorrow's research demands global collaboration: e-Science.
- Progress in information technology makes it possible.
- CAS is making great efforts on e-Science through its Informatization Program in the 10th Five-year Plan.
- The e-Science Program for the 11th Five-year Plan (2006-2010) is being worked out; e-Science will become the groundwork of research in the next five years.
- The Scientific Data Grid is the first experimental project of CAS e-Science.
- Several science applications on SDG will be our exploration towards e-Science.
46 Thank you!