Title: Scientific Data Infrastructure in CAS
1 Scientific Data Infrastructure in CAS
- Dr. Jianhui Li (lijh_at_cnic.cn)
- Scientific Data Center
- Computer Network Information Center
- Chinese Academy of Sciences
2 Scientific Data Infrastructure
- Application-enabled environments and typical applications
- Middleware (scientific data grid middleware, internet-based storage service middleware)
- Software and toolkits (scientific data collection, curation, and publishing; data analysis and visualization)
- Massive storage system, data-intensive computing facilities, high-speed network
3 DRC (Data Resource Center)
- A new organization responsible for data preservation, curation, and access services in CAS
[Diagram labels: Data Resource Center; collaborator; staff; management system; system environment; technology service; application service; network storage space; mass data; online data service; mass data analysis and processing; mass data backup; long-term preservation of important data]
4 Infrastructure for DRC
- High-speed network
  - 2 Gbps link to CSTNET
  - 2 Gbps link to CSTNET-CNGI
  - GLORIAD
- Data-intensive computing facilities
  - 1000-CPU-core clusters, Scientific Computing Grid (200 Tflops)
- Massive storage system
  - 1 PB online disk, 5 PB tape
  - Construction of a storage network will start this year
    - 1 center, 1 archive center, 10 storage nodes around China
    - Over 20 PB
5 Scientific Databases (SDB)
- A long-term mission, started in 1986 and funded by CAS
  - many institutes involved
  - long-term, large-scale collaboration
  - data from research, for research
- Collecting multi-discipline research data and promoting data sharing
  - More than 350 research databases and 400 datasets from 61 institutes
  - Over 60 TB of data available for open access and download
http://www.csdb.cn
6 Scientific Databases (cont.)
- SDB contents
  - Physics and Chemistry, Geosciences, Biosciences, Atmospheric and Ocean Science, Energy Science, Material Science, Astronomy and Space Science
7 Scientific Databases (cont.)
- Database integration
  - Resource database
  - Reference database
  - Application-oriented database
[Diagram labels: application-oriented database; reference database; resource database; research databases]
8 Scientific Databases (cont.)
- 2 reference databases
  - China Species
  - Compound
- 4 application-oriented databases
  - High Energy (ITER)
  - Western Environment Research
  - Ecology Research
  - Qinghai Lake Research
- 8 resource databases
  - Geoscience
  - Biodiversity
  - Chemistry
  - Astronomy
  - Space Science
  - Microbiology and virus
  - Material science
  - Environment
9 CAS Scientific Data Grid
- Based on Scientific Data Grid Middleware (SDG)
- SDG is built upon the Scientific Databases and supports finding and accessing large-scale, distributed, and heterogeneous scientific data uniformly and conveniently, in a secure and proper way
- Building scientific data application grids according to domain requirements
- Integrating distributed data, analysis tools, and storage and computing facilities, providing a uniform data service interface
- 4 pilot grids
  - Bioscience grid
  - Geoscience grid
  - Chemistry grid
  - Astronomy and space science grid
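The idea of a uniform data service interface over heterogeneous, distributed sources can be illustrated with a minimal Python sketch. This is not the SDG middleware API, which the slides do not describe. The names `DataNode`, `SqliteNode`, `CsvNode`, and `federated_search`, and the sample records, are hypothetical and exist only for illustration.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod


class DataNode(ABC):
    """Hypothetical adapter: every data source answers the same search call."""

    @abstractmethod
    def search(self, keyword: str) -> list:
        """Return a list of {"node": ..., "title": ...} dicts."""


class SqliteNode(DataNode):
    """A node whose records live in a relational database (in-memory here)."""

    def __init__(self, name, titles):
        self.name = name
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE records (title TEXT)")
        self.conn.executemany(
            "INSERT INTO records VALUES (?)", [(t,) for t in titles]
        )

    def search(self, keyword):
        cur = self.conn.execute(
            "SELECT title FROM records WHERE title LIKE ? ORDER BY rowid",
            (f"%{keyword}%",),
        )
        return [{"node": self.name, "title": title} for (title,) in cur]


class CsvNode(DataNode):
    """A node whose records are stored as CSV text with a 'title' column."""

    def __init__(self, name, csv_text):
        self.name = name
        self.rows = list(csv.DictReader(io.StringIO(csv_text)))

    def search(self, keyword):
        return [
            {"node": self.name, "title": row["title"]}
            for row in self.rows
            if keyword.lower() in row["title"].lower()
        ]


def federated_search(nodes, keyword):
    """Query every node through the shared interface and merge the results."""
    results = []
    for node in nodes:
        results.extend(node.search(keyword))
    return results


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    nodes = [
        SqliteNode("bioscience", ["Plant species survey", "Virus catalogue"]),
        CsvNode("geoscience", "title\nSoil species index\nLake depth\n"),
    ]
    for hit in federated_search(nodes, "species"):
        print(hit)
```

The caller only sees `search`, so adding another storage backend means adding one adapter class rather than changing the client code.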
10 Function Framework of SDG
- A scalable and integrated data sharing environment
- Providing services for grid users, grid managers, and resource providers
- Operated by the operation center, science gateways, and data nodes
[Diagram labels: User; Grid Manager; Resource Provider; Operation Center; Science Gateway; Data Node]
11 Access Scientific Data Grid
12 VisualDB - Power your database
- A toolkit to manage, publish, and share scientific databases through a visual configuration interface, without writing code
- A database integration access broker
- A data quality assessment tool
- A database access and usage statistics tool
13 Function Framework of VisualDB
14 Catalog Builder
15 Security Center
16 Data Forge
17 vReport
18 Application-enabled environments and typical applications
- Domain-specific data-intensive application environment
  - Supports one specific research area
  - Integrates scientific data, storage, computing, analysis models, and tools
  - An easy-to-use, friendly interactive interface
  - Scalable user-defined data processing workflow
- Typical pilot systems
  - Remote sensing data on-demand access and processing service environment
  - CFCI - China FLUX Cyber-Infrastructure
  - DarwinTree - Molecular data analysis and application environment
  - Atmospheric science data integration analysis platform
19 Atmospheric science data integration analysis platform
20 Atmospheric science data integration analysis platform
- Problems
  - Atmospheric data volumes have reached the TB level, and the data are distributed.
  - The disk and memory capacity of personal computers limits research work.
  - Many algorithms developed by researchers can't be shared easily.
21 Architecture
Scientific Data Analysis Online Platform
22 Workflow
- Five steps
- Iterative
23 Select data
24 Choose algorithm
25 Configure parameters
26 Plot and result
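The iterative workflow can be sketched in a few lines of Python. The slides mention five steps but name only four screens (select data, choose algorithm, configure parameters, plot and result), so this sketch covers those four. It is an illustration under stated assumptions, not the platform's actual implementation: the dataset, the algorithm registry, and the functions `run_step` and `text_plot` are all hypothetical.

```python
from statistics import mean

# Hypothetical dataset registry; the values are made up for illustration.
DATASETS = {
    "station_a_temperature": [12.1, 13.4, 15.0, 14.2, 11.8, 10.9],
}


def moving_average(values, window):
    """Mean of each consecutive run of `window` values."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def anomaly(values, baseline):
    """Difference of each value from a fixed baseline."""
    return [v - baseline for v in values]


# Hypothetical registry of shared algorithms, looked up by name.
ALGORITHMS = {"moving_average": moving_average, "anomaly": anomaly}


def run_step(dataset_name, algorithm_name, params):
    data = DATASETS[dataset_name]            # 1. select data
    algorithm = ALGORITHMS[algorithm_name]   # 2. choose algorithm
    return algorithm(data, **params)         # 3. configure parameters and run


def text_plot(values, width=20):
    """4. Plot the result as horizontal bars of '#' characters."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant series
    return "\n".join(
        "#" * (1 + round((v - lo) / span * (width - 1))) for v in values
    )


if __name__ == "__main__":
    # Iterate: rerun with a different parameter and compare the plots.
    for window in (2, 3):
        result = run_step("station_a_temperature", "moving_average",
                          {"window": window})
        print(f"window={window}")
        print(text_plot(result))
```

Keeping algorithms in a named registry reflects the sharing problem raised earlier in the deck: a researcher's algorithm becomes reusable once it can be selected by name and configured through parameters.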
27 Thank you!