Title: Grid Technology and Applications Developed in Academia Sinica
1Grid Technology and Applications Developed in Academia Sinica
Eric Yen, Computing Centre, Academia Sinica
Jan. 23rd, 2003
2Requirements
- Basic research on larger-scale problems, e.g., High Energy Physics, Bioscience (Genomic and Protein Research), Remote Instrumentation, etc.
- Real applications and generic types of problems, demanding:
- Computing power from Teraflops to PetaFlops
- Storage Capacity from Terabytes to PetaBytes
- More and more network bandwidth
- Reliability
- Security
3What Can the Grid Do?
- Coordinating the sharing of distributed resources and flexible collaboration through virtual organizations
- Effective management of distributed heterogeneous resources
- Solving larger-scale problems that are beyond the capacity of any single institute or supercomputer in the world
- Construction of a secure, reliable, efficient, and scalable mass storage system environment
- Optimizing the usage of resources
- Facilitating better sharing and integration of information resources
- Demands of IT for scientific research in the new millennium
- Management of PetaByte-scale storage systems
- Collaborative processing
- Sharing and collaborating distributed resources
- Grid is the mainstream for IT infrastructure
4Computer and Network System Resources in AS
5CPU Utilization @ Gauss
6CPU Utilization @ Euler
7Job Types in HPC @ IBM SP
9Biomedical Scientific Network
10TW IPv6 Network Logical Map
11Grid Applications in AS
- High Energy Physics (LCG): Computational Grid, Data Grid, Access Grid
- BioGrid: Computational Grid, Data Grid, Access Grid
- In charge of coordination of the National Genomic and Protein Project
- Bio-Computing
- Bio-Informatics
- Bio-Diversity
- Bio-Portal
- Computational Chemistry and Computational Physics: Computational Grid, Access Grid
- National Digital Archives: Data Grid, Access Grid
- In charge of the National Digital Archive Project
- Earth Science and Astronomy Research: Computational Grid, Data Grid
- Earthquake Data Center
- Broadband Array in Taiwan for Seismology (BATS)
- Strong Motion Networks
- Taiwan Telemetered Seismographic Network (TTSN, 1973-1992)
- Geospatial Information Science Applications: Data Grid, Access Grid
- NSDI
- Web-based Space, Time and Language Content Architecture
- eLearning: Access Grid, Data Grid and, to a lesser extent, Computational Grid
12The Infrastructure for Integrating Web Services and Grid Technology
Web Services Grid Protocols
Courtesy of IBM Taiwan
13Open Grid Services Architecture
- Objectives
- Manage resources across distributed heterogeneous platforms
- Deliver seamless QoS
- Provide a common base for autonomic management solutions
- Define open, published interfaces
- Exploit industry-standard integration technologies
- Web Services, SOAP, XML, ...
- Integrate with existing IT resources
14Open Grid Services Infrastructure (OGSI)
Grid Service Implementation - Examples
Courtesy of IBM Taiwan
15Architecture Framework
OGSA Software Evolution
Courtesy of IBM Taiwan
16Grid Technology Development in AS
- Technology Developed for PC Cluster
- Load Balance
- Remote Execution Environment (LERR)
- Meta-Queuing System (pQS)
- Resource Metadata and Management System for Grid
- Design and Operation of high performance network
- Construction of Storage Area Network
- We are now porting all of these to the Grid platform; they will be Globus-enabled.
17GRID Deployment in AS
- LCG test-bed for both the Computing Centre and the Institute of Physics, started in 2002
- Globus Toolkit 2.2.x test-bed for a parallel computing environment has been established
- Globus Toolkit 2.2.x test-bed for BIO-Cluster (ready from 2003)
- Globus Toolkit 3.0 testing began in Jan. 2003
- Other work before July 2003
- Building a mixed pQS and Globus Toolkit environment
- Porting LERR-G
- Working on data management issues
- Promoting GRID technology to our partners
18LHC Computing Grid (LCG): HEP
19HEP Groups in Taiwan and the Collaborations They Joined
- Academia Sinica: joined the Fermilab CDF Collaboration in 1993
- National Central University: L3 at CERN (1990)
- National Taiwan University: Belle at KEK (1995)
- All three groups join the LHC at CERN.
- The LHC is the next-generation high-energy particle collider.
- AS: ATLAS (A Toroidal LHC ApparatuS)
- NCU and NTU: CMS
- CDF
- Top quark discovery in 1995; Taiwan is one of the 5 countries in CDF.
- Evidence of CP violation in the B sector (1997) (sin(2β) measurement)
- Belle
- CP violation in B physics (2002)
- L3
- Higgs search: found one Higgs candidate (2000)
20CDF
- CDF: The Collider Detector at Fermilab
- More than 500 physicists work in the collaboration
- The discovery of the top quark was one of the major results of the CDF collaboration.
- Installation of the Silicon Detector into the CDF Detector
21Requirements
- Sufficient bandwidth for multiple researchers to download data sets of 2-20 GBytes at the same time.
- Sufficient bandwidth and efficient management to support stable multipoint video conferencing
- Within 2-3 years, the network should be able to support bidirectional transfer of terabyte-scale data files, not including GRID requirements.
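To get a rough feel for these requirements, the sketch below estimates ideal transfer times for 2 GB, 20 GB and 1 TB data sets at link rates that appear elsewhere in this deck (155 Mbps, 622 Mbps, 2.5 Gbps, 10 Gbps). It assumes decimal units and a fully utilised link with no protocol overhead, so real transfers would take longer. The function name is illustrative only.

```python
def transfer_seconds(size_bytes: float, link_bits_per_s: float) -> float:
    """Ideal transfer time: payload bits divided by link rate (no overhead)."""
    return size_bytes * 8 / link_bits_per_s

GB = 1e9   # decimal gigabyte (assumption)
TB = 1e12  # decimal terabyte (assumption)

links = {"155 Mbps": 155e6, "622 Mbps": 622e6, "2.5 Gbps": 2.5e9, "10 Gbps": 10e9}
sizes = {"2 GB": 2 * GB, "20 GB": 20 * GB, "1 TB": 1 * TB}

# Print a small table of ideal transfer times in minutes.
for size_name, size in sizes.items():
    for link_name, rate in links.items():
        minutes = transfer_seconds(size, rate) / 60
        print(f"{size_name:>5} over {link_name:>8}: {minutes:8.1f} min")
```

Under these assumptions a single terabyte occupies a 2.5 Gbps link for a little under an hour, which suggests why several researchers downloading at once call for the higher link rates.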
22Taiwan LCG Structures
Taiwan domestic network: minimum bandwidth is 2.5 Gbps. Taipei GigaPoP is a metropolitan fiber ring, with the capability to upgrade from 10 Gbps to a multi-lambda network.
Taiwan international connectivity: broadband connections to the US, Europe, Japan and Hong Kong are in place and will be upgraded when necessary.
[Network diagram. Nodes: Academia Sinica (AS, Tier 1/2), NCU (Tier 2/3), NTU (Tier 2/3), Taipei GigaPoP (10G / 2.5G), NCTU, YMU, NTOU, CGU, MOECC, TANet Schools, Taipei City School Net, GSN, ISPs. International links: US (1.2G or 2.5G via StarLight in Ph1), EU, JP, HK, AU, SG, TH, CN CERnet, CN CSTnet. Link labels: 10G, 622M, 155M → 622M.]
23Resources
Year                             2002   2003   2004   2005
Processors                         30     90    200    400
SI2000                            15K    75K   220K   680K
Disk (TB)                           2     10     30     80
Tape (TB)                          30     60    120    240
Local Network Bandwidth (MB/s)   1200   1200   1200   1200
Manpower (SysAdmOpSupport)        112    112    112    112
Funding Status                      F      E      E      E
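The planned growth in the table can be summarised with a little arithmetic. The sketch below copies the processor and SI2000 rows and derives the year-over-year growth factors and the SI2000 capacity per processor. The variable names are illustrative, and "K" is read as thousands (an assumption).

```python
years = [2002, 2003, 2004, 2005]
processors = [30, 90, 200, 400]
si2000 = [15_000, 75_000, 220_000, 680_000]  # "15K" read as 15,000 (assumption)

def growth_factors(values):
    """Ratio of each year's value to the previous year's value."""
    return [later / earlier for earlier, later in zip(values, values[1:])]

# Average SI2000 rating per processor in each year.
per_processor = [s / p for s, p in zip(si2000, processors)]

for year, p, s, pp in zip(years, processors, si2000, per_processor):
    print(f"{year}: {p:4d} processors, {s:7d} SI2000, {pp:6.0f} SI2000 per processor")
print("processor growth:", [round(g, 2) for g in growth_factors(processors)])
print("SI2000 growth:   ", [round(g, 2) for g in growth_factors(si2000)])
```

Read this way, aggregate SI2000 grows faster than the processor count, so the plan seems to rely on faster processors as well as more of them.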
24Bio-Computing and CRASA
25Conceptual Bio-Grid Application Infrastructure
26Bio-Grid Application Platform
- Bio-Cluster
- PC Farm Project (1996)
- More than 330 computing nodes (2002)
- Dedicated 64-CPU BioCluster (Mar. 2002)
- BioGrid in AS (IBMS, ASCC)
- Business Process
- Oracle (Compaq, IBM, SUN, Linux)
- 64 CPUs BioCluster
- Parallel CRASA (IBMS)
- IBM DiscoveryLink pilot project (dbEST, dbSNP, Swiss-Prot)
- NGC LIMS
- ENU mouse database design
- Microarray Database (SMD on Linux)
- http://bits.sinica.edu.tw
27A New Tool for Sequence Analysis - CRASA
- What is CRASA?
- The advantages of CRASA
- Complexity Reduction Algorithm for Sequence Analysis
- A homology-based tool for annotating long genomic sequences (e.g., human chromosomes)
- Global sequence alignment for genome annotation
- Dynamic (progressive) data structure (multi-level pyramid data structure)
- Parallel processing
- Low memory requirement
- Long genomic sequence annotation
- High accuracy of gene prediction
28System Overview (CRASA)
[Flowchart. Components: Genomic DNA Sequences, Masking, RepBase Database, cDNA Database (HGI), CR Pyramid constructing (256 patterns), CR Pyramid Database, CR Processing, Query, Pattern alignment, Yes/No decision on match length (60 bp) and match fragments (3), Filtering, Exons.]
29Introduction
- Using Globus Toolkit v2.2
- Compiling CRASA program with MPICH-G/PGI compiler
- Using globusrun to run the CRASA program on GRID
30Start the Grid proxy
31Compose the RSL script
32Globusrun
33CRASA is running on another machine
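The steps named on slides 30 to 32 (start the proxy, compose the RSL script, call globusrun) can be sketched as follows. This is a minimal illustration assuming the Globus Toolkit 2 RSL syntax of `&(attribute=value)...`. The helper functions are hypothetical, and every path, file name and host name is a placeholder, not the setting used in the talk.

```python
def rsl_relation(name: str, value) -> str:
    """Format one RSL relation such as (count=4); values with spaces are quoted."""
    text = str(value)
    if " " in text:
        text = '"' + text + '"'
    return f"({name}={text})"

def compose_rsl(**attributes) -> str:
    """Join relations into a single-job RSL request that starts with '&'."""
    return "&" + "".join(rsl_relation(k, v) for k, v in attributes.items())

# Hypothetical values for illustration only.
rsl = compose_rsl(
    executable="/home/user/crasa/crasa_mpi",
    directory="/home/user/crasa",
    count=4,
    jobtype="mpi",
    stdout="crasa.out",
    stderr="crasa.err",
)
print(rsl)

# After saving the printed string to a file (here called crasa.rsl), a typical
# GT2 shell session would look like this (host name is a placeholder):
#   grid-proxy-init
#   globusrun -r gatekeeper.example.org -f crasa.rsl
```

Keeping the RSL in a file rather than on the command line makes it easier to rerun the same job against a different resource by changing only the `-r` argument.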
34Benchmark of CRASA Part I
- Gene Prediction Performance
- For Human Chromosome 21 (33.9 Mbp, gene poor)
- 48 minutes (11.5 kbp/sec)
- For Human Chromosome 22 (34.0 Mbp, gene rich)
- 100 minutes (5.7 kbp/sec)
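The quoted rates follow from dividing sequence length by elapsed time. The sketch below redoes that arithmetic, reading the lengths as megabase pairs; the function name is illustrative. From the rounded figures, chromosome 22 reproduces the quoted 5.7, while chromosome 21 comes out slightly above the quoted 11.5, presumably because the elapsed time on the slide is rounded.

```python
def throughput_kbp_per_s(length_mbp: float, minutes: float) -> float:
    """Annotated sequence length divided by elapsed time, in kbp per second."""
    return length_mbp * 1_000 / (minutes * 60)

# (length in Mbp, elapsed minutes) as quoted on the slide
runs = {
    "Chromosome 21": (33.9, 48),
    "Chromosome 22": (34.0, 100),
}

for name, (length_mbp, minutes) in runs.items():
    print(f"{name}: {throughput_kbp_per_s(length_mbp, minutes):.1f} kbp/sec")
```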
35Benchmark of CRASA Part II
36Benchmark of CRASA Part IV
- Length of Query Sequence vs. Elapsed Time
37Digital Archive Data Grid
38Scope of Digital Archives
Domain Expertise
e-Research
Culture and Knowledge Background
Being Digitised
e-Learning
Digital Archives
Enterprise Intelligence
Born Digital
General Knowledge Base
Business Process and Lifecycle
39Why a Knowledge-based Approach for Digital Archives
- Passive: requirements for long-term, scalable and persistent archives while the technology evolves
- Active: requirements for generation of new knowledge (to easily discover new and unexpected patterns, trends and relationships that can be hidden deep within very large and diverse datasets)
40Content Management Challenges (1)
- Separating content from presentation
- Versioning, Roll-back
- Data/Information re-use
- Re-purposing of Information, flexible Output
- Workflow, submit, review, approve, store
41Content Management Challenges (2)
- Integrating diversified contents and external sources
- System and role-based security
- Metadata Management
- Compute and Storage resources on demand
- Reliability and Scalability
42Basic Functions of a CMS
- A CMS manages the path from authoring through to publishing using a workflow scheme and by providing a system for content storage and integration.
- Authoring/Capturing
- Workflow
- Integration and Storage
- Publishing/Dissemination
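The path from authoring to publishing can be pictured as a small state machine. The sketch below is one possible reading, combining the four functions listed here with the "submit, review, approve, store" workflow from the earlier challenges slide. The stage names, the transition table and the `ContentItem` class are all assumptions for illustration, not a description of any particular CMS.

```python
from enum import Enum

class Stage(Enum):
    AUTHORED = "authored"
    SUBMITTED = "submitted"
    REVIEWED = "reviewed"
    APPROVED = "approved"
    STORED = "stored"
    PUBLISHED = "published"

# Allowed transitions (assumed); a reviewed item may also go back to its author.
TRANSITIONS = {
    Stage.AUTHORED: {Stage.SUBMITTED},
    Stage.SUBMITTED: {Stage.REVIEWED},
    Stage.REVIEWED: {Stage.APPROVED, Stage.AUTHORED},
    Stage.APPROVED: {Stage.STORED},
    Stage.STORED: {Stage.PUBLISHED},
    Stage.PUBLISHED: set(),
}

class ContentItem:
    def __init__(self, title: str):
        self.title = title
        self.stage = Stage.AUTHORED
        self.history = [Stage.AUTHORED]

    def advance(self, target: Stage) -> None:
        """Move to target if the workflow allows it, otherwise raise ValueError."""
        if target not in TRANSITIONS[self.stage]:
            raise ValueError(f"cannot go from {self.stage.value} to {target.value}")
        self.stage = target
        self.history.append(target)

item = ContentItem("Sample archive record")
for step in (Stage.SUBMITTED, Stage.REVIEWED, Stage.APPROVED,
             Stage.STORED, Stage.PUBLISHED):
    item.advance(step)
print([s.value for s in item.history])
```

Recording the history alongside the current stage is one simple way to support the versioning and roll-back needs mentioned among the content management challenges.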
43 The CMS Feature List
44Access Grid
45Access Grid for Collaborative Env.
- Multi-point Video Conference Facilities
- MCU-based 24 concurrent sessions
- VRVS
- H.320/H.323
- WhiteBoard
- Video Server
- Web-based Content Retrieval and Dissemination
46eLearning Data Grid
47Challenge and Goals of eLearning
- Challenge
- Building Knowledge Society
- Ubiquitous Learning
- Emergence of New Learning Models --> Workflow Analysis
- The most efficient implementation
- Adaptation to technology changes
- Goals
- Learning how to learn
- Helping people with disabilities learn more easily
- Lifelong Learning and Lifelong Teaching
- Training at All Levels
- Formation of Learning Society
48Basic Requirements of eLearning
- Combination of Learner-Centric and Teacher-Centric approaches, for the best outcome
- Diversified, large-volume, distributed and more accessible Learning Resources
- Well-organized and complete Content Description
- Integration of heterogeneous Information Resources
- On-demand and ubiquitous learning for anyone
- Toward effective Knowledge Discovery and sound Knowledge Organization and Management
49How to Get There?
- Open Source eLearning Platform
- Web-based virtual learning, teaching and informing
- Robust, distributed, collaborative and ubiquitous computing environment as the infrastructure --> demand for Grid Infrastructure!
- Standardization
- Well-defined specification
- Interoperability mechanisms for conversion, transformation, exchange, etc.
- Integration
- Building Community for
- Developing Common tools
- Technical Study Support
- Requirements Collection
- Planning
- Suggestions to National Strategy
- Grid Infrastructure
- Learning Resources
- eLearning Services
50Progress of eLearning in Taiwan
- Master Plan of Information Technology in Education for Primary and Secondary Schools
- Ministry of Education, 2001-2005
- 20% of curriculum time using IT
- 600 seed schools
- Training teacher teams
- Equipping teachers with notebook computers
- Program of Science and Technology for e-Learning (2003)
- Cross-ministry initiative
- US$130 million over 5 years
- Led by President Liu of NCU
51Pilot Projects for eLearning in AS
- Social University for Adult Learning
- Community University for Minorities, e.g., Indigenous People
- Parallel Programming and Computing Applications
- Survey of the standardization of metadata for eLearning
52Grid Architecture for eLearning
53GBIF --> Biodiversity Data Grid
54International Biodiversity Collaboration
55Earth Science Data Center