Grid Technology and Applications Developed in Academia Sinica - PowerPoint PPT Presentation




1
Grid Technology and Applications Developed in
Academia Sinica
Eric Yen, Computing Centre, Academia Sinica
Jan. 23, 2003
2
Requirements
  • Basic research on larger-scale problems, e.g.,
    High Energy Physics, Bioscience (Genomic and
    Protein Research), Remote Instrumentation, etc.
  • Real Applications
  • Generic Types of Problems, demanding:
  • Computing power from Teraflops to PetaFlops
  • Storage Capacity from Terabytes to PetaBytes
  • More and more network bandwidth
  • Reliability
  • Security

3
What Can the Grid Do?
  • Coordinating the sharing of distributed resources
    and flexible collaboration through virtual
    organizations
  • Effective management of distributed heterogeneous
    resources
  • Solving larger-scale problems beyond the capacity
    of any single institute or supercomputer in the
    world
  • Construction of a secure, reliable, efficient,
    and scalable mass storage system environment
  • Optimize the Usage of Resources
  • Facilitate better Sharing and Integration of
    Information Resources
  • IT demands of scientific research in the new
    millennium
  • Management of petabyte-scale storage systems
  • Collaborative processing
  • Sharing of and collaboration on distributed resources
  • Grid is the mainstream direction for IT infrastructure

4
Computer and Network System Resources in AS
5
CPU Utilization at Gauss
6
CPU Utilization at Euler
7
Job Types in HPC at IBM SP
8
(No Transcript)
9
Biomedical Scientific Network
10
TW IPv6 Network Logical Map
11
Grid Applications in AS
  • High Energy Physics (LCG): Computational Grid,
    Data Grid, Access Grid
  • BioGrid: Computational Grid, Data Grid, Access
    Grid
  • In charge of coordinating the National Genomic
    and Protein Project
  • Bio-Computing
  • Bio-Informatics
  • Bio-Diversity
  • Bio-Portal
  • Computational Chemistry and Computational
    Physics: Computational Grid, Access Grid
  • National Digital Archives: Data Grid, Access Grid
  • In charge of the National Digital Archive Project
  • Earth Science and Astronomy Research:
    Computational Grid, Data Grid
  • Earthquake Data Center
  • Broadband Array in Taiwan for Seismology (BATS)
  • Strong Motion Networks
  • Taiwan Telemetered Seismographic Network (TTSN,
    1973-1992)
  • Geospatial Information Science Applications:
    Data Grid, Access Grid
  • NSDI
  • Web-based Space, Time and Language Content
    Architecture
  • eLearning: Access Grid, Data Grid and, to a lesser
    extent, Computational Grid

12
The Infrastructure for Integrating Web Services and
Grid Technology
Web Services and Grid Protocols
Courtesy of IBM Taiwan
13
Open Grid Services Architecture
  • Objectives
  • Manage resources across distributed heterogeneous
    platforms
  • Deliver seamless QoS
  • Provide a common base for autonomic management
    solutions
  • Define open, published interfaces
  • Exploit industry-standard integration
    technologies
  • Web Services, SOAP, XML,...
  • Integrate with existing IT resources

14
Open Grid Services Infrastructure (OGSI)
Grid Service Implementation: Examples
Courtesy of IBM Taiwan
15
Architecture Framework
OGSA Software Evolution
Courtesy of IBM Taiwan
16
Grid Technology Development in AS
  • Technology Developed for PC Cluster
  • Load Balance
  • Remote Execution Environment (LERR)
  • Meta-Queuing System (pQS)
  • Resource Metadata and Management System for Grid
  • Design and operation of high-performance networks
  • Construction of a Storage Area Network
  • We are now porting all of these to the Grid
    platform; they will be Globus-enabled.
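The slides list load balancing and a meta-queuing system (pQS) without describing how they work. As a hypothetical sketch only, and not pQS's actual policy, a meta-queue can sit in front of several clusters and start each job on the cluster with the most free CPUs that can fit it, holding the job in a global queue otherwise. The cluster names reuse Gauss and Euler from the CPU-utilization slides, but their CPU counts, the class names and the "most free CPUs" rule are all assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    total_cpus: int
    busy_cpus: int = 0

    @property
    def free_cpus(self) -> int:
        return self.total_cpus - self.busy_cpus


@dataclass
class MetaQueue:
    """Hypothetical meta-queue: one global job queue in front of several clusters."""
    clusters: list
    pending: deque = field(default_factory=deque)

    def _place(self, cpus: int):
        """Start the job on the cluster with the most free CPUs that can fit it."""
        fits = [c for c in self.clusters if c.free_cpus >= cpus]
        if not fits:
            return None
        target = max(fits, key=lambda c: c.free_cpus)
        target.busy_cpus += cpus
        return target.name

    def submit(self, job_id: str, cpus: int):
        """Return the chosen cluster name, or None if the job must wait in the queue."""
        placed = self._place(cpus)
        if placed is None:
            self.pending.append((job_id, cpus))
        return placed

    def release(self, cluster_name: str, cpus: int):
        """Free CPUs on one cluster, then retry queued jobs in arrival order.

        Jobs that still do not fit stay queued; a smaller job behind them may
        start first (simple backfilling, an assumption of this sketch).
        """
        for c in self.clusters:
            if c.name == cluster_name:
                c.busy_cpus -= cpus
        started, waiting = [], deque()
        while self.pending:
            job_id, need = self.pending.popleft()
            placed = self._place(need)
            if placed is None:
                waiting.append((job_id, need))
            else:
                started.append((job_id, placed))
        self.pending = waiting
        return started


# Hypothetical CPU counts for the two clusters.
mq = MetaQueue([Cluster("gauss", 16), Cluster("euler", 8)])
print(mq.submit("j1", 10))  # gauss: the only cluster with 10 free CPUs
```

A real meta-queuing system would also weigh queue wait times, job priorities and data locality; the single "most free CPUs" rule is only meant to show where such a policy plugs in.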

17
GRID Deployment in AS
  • LCG test-bed for both the Computing Centre and the
    Institute of Physics, started in 2002
  • Globus Toolkit 2.2.x test-bed for the parallel
    computing environment has been established
  • Globus Toolkit 2.2.x test-bed for the BIO-Cluster
    (ready in 2003)
  • Globus Toolkit 3.0 testing began in Jan. 2003
  • Other work before July 2003
  • Building a mixed pQS and Globus Toolkit environment
  • Porting LERR-G
  • Working on data management issues
  • Promoting Grid technology to our partners

18
LHC Computing Grid (LCG): HEP
19
HEP Groups in Taiwan and Collaborations Joined
  • Academia Sinica joined the Fermilab CDF
    collaboration in 1993.
  • National Central University: L3 at CERN (1990)
  • National Taiwan University: Belle at KEK (1995)
  • All three groups have joined the LHC at CERN.
  • The LHC is the next-generation high-energy
    particle collider.
  • AS: ATLAS (A Toroidal LHC ApparatuS)
  • NCU and NTU: CMS
  • CDF
  • Top quark discovery in 1995; Taiwan is one of the
    5 countries in CDF.
  • Evidence of CP violation in the B sector (1997)
    (sin(2β) measurement)
  • Belle
  • CP violation in B physics (2002)
  • L3
  • Higgs search: one Higgs candidate found (2000)

20
CDF
  • CDF: The Collider Detector at Fermilab
  • More than 500 physicists work in the
    collaboration
  • The discovery of the top quark was one of the
    major results of the CDF collaboration.
  • Installation of the silicon detector into the CDF
    detector

21
Requirements
  • Sufficient bandwidth for multiple researchers to
    download data sets of 2-20 GB each at the same
    time.
  • Sufficient bandwidth and efficient management to
    support stable multipoint video conferencing
  • Within 2-3 years, the network should be able to
    support terabyte-scale data file transfers in
    both directions, not including Grid requirements.
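To put these requirements in numbers, the idealized transfer time is the number of bits to move divided by the usable link rate. The sketch below assumes decimal units (1 GB = 10^9 bytes) and a fully utilized link; the 155 Mbps and 2.5 Gbps rates are taken from the Taiwan LCG network slide, and the `efficiency` factor for protocol overhead or link sharing is an assumption, not a measured value.

```python
GB = 10**9   # decimal units assumed
TB = 10**12


def transfer_time_seconds(size_bytes: float, link_bps: float, efficiency: float = 1.0) -> float:
    """Idealized transfer time: bits to move divided by usable bits per second.

    efficiency < 1.0 models protocol overhead or a link shared by several
    researchers (an assumption of this sketch).
    """
    if link_bps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_bps must be positive and 0 < efficiency <= 1")
    return size_bytes * 8 / (link_bps * efficiency)


# A 20 GB data set on a 155 Mbps link: 1.6e11 bits / 1.55e8 bps, about 17 minutes.
print(round(transfer_time_seconds(20 * GB, 155e6)))
# A 1 TB file on a 2.5 Gbps link: 8e12 bits / 2.5e9 bps = 3200 s, about 53 minutes.
print(transfer_time_seconds(1 * TB, 2.5e9))
```

If several researchers download at once, each gets roughly a fraction of the link, which the `efficiency` argument can stand in for (for example 0.25 for four equal concurrent transfers).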

22
Taiwan LCG Structure
Taiwan domestic network: the minimum bandwidth is 2.5 Gbps. The
Taipei GigaPoP is a metropolitan fiber ring that can be upgraded
from 10 Gbps to a multi-lambda network.
Taiwan international connectivity: broadband connections to the
US, Europe, Japan and Hong Kong are in place and will be upgraded
when necessary.
[Network diagram: Academia Sinica (AS) as Tier 1/2; NCU and NTU
as Tier 2/3; Taipei GigaPoP (10G / 2.5G); a 1.2G or 2.5G link to
the US via StarLight in Phase 1; other links labelled 10G, 622M,
and 155M to 622M; other sites: NCTU, YMU, NTOU, CGU, MOECC, TANet
schools, Taipei City School Net, GSN and ISPs; international
peers: CN CERnet, CN CSTnet, AU, EU, JP, US, HK, SG, TH.]
23
Resources
Year                             2002   2003   2004   2005
Processors                         30     90    200    400
SI2000                            15K    75K   220K   680K
Disk (TB)                           2     10     30     80
Tape (TB)                          30     60    120    240
Local Network Bandwidth (MB/s)   1200   1200   1200   1200
Manpower (SysAdm/Op/Support)    1/1/2  1/1/2  1/1/2  1/1/2
Funding Status                      F      E      E      E
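Dividing the SI2000 row by the processor row makes one assumption of the plan visible: the average rating per processor rises from about 500 SI2000 in 2002 to about 1,700 in 2005, on top of the growth in processor count. The short calculation below only re-derives numbers from the Resources table; the variable and function names are illustrative.

```python
# Figures copied from the Resources table (2002-2005 plan).
years = [2002, 2003, 2004, 2005]
processors = [30, 90, 200, 400]
si2000 = [15_000, 75_000, 220_000, 680_000]


def per_processor(totals, cpus):
    """Average SI2000 rating per processor for each year."""
    return [t / c for t, c in zip(totals, cpus)]


def growth(values):
    """Year-over-year growth factors, e.g. 90 / 30 = 3.0."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


for year, rating in zip(years, per_processor(si2000, processors)):
    print(year, round(rating))  # 500, 833, 1100, 1700
print([round(g, 2) for g in growth(processors)])  # [3.0, 2.22, 2.0]
```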
24
Bio-Computing and CRASA
25
Conceptual Bio-Grid Application Infrastructure
26
Bio-Grid Application Platform
  • Bio-Cluster
  • PC Farm Project (1996)
  • More than 330 computing nodes (2002)
  • Dedicated 64 CPUs BioCluster (Mar. 2002)
  • BioGrid in AS (IBMS, ASCC)
  • Business Process
  • Oracle (Compaq, IBM, SUN, Linux)
  • 64 CPUs BioCluster
  • Parallel CRASA (IBMS)
  • IBM DiscoveryLink pilot project (dbEST, dbSNP,
    Swiss-Prot)
  • NGC LIMS
  • ENU mouse database design
  • Microarray Database (SMD on Linux)
  • http://bits.sinica.edu.tw

27
A New Tool for Sequence Analysis - CRASA
  • What is CRASA?
  • The advantages of CRASA
  • Complexity Reduction Algorithm for Sequence
    Analysis
  • A homology based tool for annotating long
    genomic sequence (e.g. Human Chromosome)
  • Global sequence alignment for genome annotation
  • Dynamic (progressive) data structure (multi-level
    pyramid data structure)
  • Parallel processing
  • Low memory requirement
  • Long genomic sequence annotation
  • High accuracy of gene prediction
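The slides do not describe CRASA's multi-level pyramid in detail. The "256 patterns" in the system overview equals the number of distinct DNA 4-mers (4^4 = 256), so, purely as an illustration under that assumption, the sketch below encodes each 4-mer as an integer pattern code, indexes a reference sequence by code, and extends shared seeds into maximal exact matches. This is a generic seed-and-extend technique, not CRASA's algorithm; the function names are hypothetical and only A/C/G/T are handled.

```python
from collections import defaultdict

K = 4  # 4 bases -> 4**4 = 256 distinct patterns (assumed link to "256 patterns")
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_codes(seq, k=K):
    """Map every k-mer of seq to an integer in [0, 4**k)."""
    codes = []
    for i in range(len(seq) - k + 1):
        value = 0
        for base in seq[i:i + k]:
            value = value * 4 + CODE[base]
        codes.append(value)
    return codes


def build_index(reference, k=K):
    """Pattern code -> list of start positions in the reference."""
    index = defaultdict(list)
    for pos, code in enumerate(kmer_codes(reference, k)):
        index[code].append(pos)
    return index


def exact_matches(query, reference, min_len, k=K):
    """Seed on shared k-mers, extend to the right, keep maximal matches >= min_len.

    Returns sorted (query_pos, reference_pos, length) tuples.
    """
    index = build_index(reference, k)
    hits = set()
    for qpos, code in enumerate(kmer_codes(query, k)):
        for rpos in index.get(code, []):
            # Skip seeds that could be extended to the left: they lie inside
            # a longer match that starts at an earlier seed.
            if qpos > 0 and rpos > 0 and query[qpos - 1] == reference[rpos - 1]:
                continue
            length = k
            while (qpos + length < len(query) and rpos + length < len(reference)
                   and query[qpos + length] == reference[rpos + length]):
                length += 1
            if length >= min_len:
                hits.add((qpos, rpos, length))
    return sorted(hits)


print(exact_matches("GGACGTACGTGG", "TTTTACGTACGTCC", min_len=6))
```

Indexing by a small fixed pattern alphabet keeps memory proportional to the reference length, which is in the spirit of the "low memory requirement" claim, though the slides do not say that CRASA works this way.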

28
System Overview (CRASA)
[Flowchart, labels recoverable from the slide: Genomic DNA
Sequences; Masking (RepBase Database); cDNA Database (HGI); CR
Pyramid constructing (256 patterns); CR Pyramid Database; Query;
CR Processing; Pattern alignment; Filtering on match length
(60 bp) and number of match fragments (3), with Yes/No branches;
Exons]
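The flowchart's last step keeps a match only if it passes a match-length test (60 bp) and a fragment-count test (3). The comparison symbols were lost in extraction, so the sketch below assumes both thresholds are minimums (>=). The `Match` record, its fields and `filter_exons` are hypothetical names for illustration, not part of CRASA.

```python
from dataclasses import dataclass


@dataclass
class Match:
    cdna_id: str        # hypothetical: which cDNA entry the hit came from
    genomic_start: int  # start position on the masked genomic sequence
    length: int         # aligned length in bp


def filter_exons(matches, min_length=60, min_fragments=3):
    """Keep hits of at least min_length bp, then keep only cDNAs that
    contribute at least min_fragments such hits (both thresholds assumed >=)."""
    long_hits = {}
    for m in matches:
        if m.length >= min_length:
            long_hits.setdefault(m.cdna_id, []).append(m)
    exons = []
    for hits in long_hits.values():
        if len(hits) >= min_fragments:
            # Report the surviving fragments in genomic order.
            exons.extend(sorted(hits, key=lambda m: m.genomic_start))
    return exons
```

Requiring several long fragments from the same cDNA is one plausible reading of the "match fragments" test: a single short hit is more likely to be noise than a multi-exon gene structure.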
29
Introduction
  • Using Globus Toolkit v2.2
  • Compiling the CRASA program with MPICH-G and the PGI
    compiler
  • Using globusrun to run the CRASA program on the Grid

30
Start the Grid proxy
31
Compose the RSL script
32
Globusrun
33
CRASA is running on another machine
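The demo steps (start a Grid proxy, compose an RSL script, submit it with globusrun) follow the usual Globus Toolkit 2 workflow, in which the proxy is normally created with `grid-proxy-init`. The sketch below only builds a single-request RSL string and the globusrun command line (`-r` for the resource manager contact, `-f` for the RSL file); it executes nothing. The executable path, arguments, CPU count and resource contact are hypothetical, and `jobtype=mpi` is an assumed default; the RSL that MPICH-G actually generates may be a multi-request.

```python
def rsl_quote(value) -> str:
    """Wrap a value in double quotes; embedded quotes are rejected to keep the sketch simple."""
    text = str(value)
    if '"' in text:
        raise ValueError("embedded double quotes are not handled in this sketch")
    return f'"{text}"'


def compose_rsl(executable, count, arguments=(), jobtype="mpi", stdout=None, stderr=None):
    """Build a single-request GT2-style RSL string: &(attr=value)(attr=value)..."""
    parts = [
        f"(executable={rsl_quote(executable)})",
        f"(count={int(count)})",
        f"(jobtype={jobtype})",
    ]
    if arguments:
        parts.append("(arguments=" + " ".join(rsl_quote(a) for a in arguments) + ")")
    if stdout:
        parts.append(f"(stdout={rsl_quote(stdout)})")
    if stderr:
        parts.append(f"(stderr={rsl_quote(stderr)})")
    return "&" + "".join(parts)


def globusrun_command(resource_contact, rsl_file):
    """argv for submitting an RSL file to one resource manager."""
    return ["globusrun", "-r", resource_contact, "-f", rsl_file]


# Hypothetical paths and host; write rsl to crasa.rsl before submitting.
rsl = compose_rsl("/home/crasa/bin/crasa", 8, ["chr21.fa"], stdout="/home/crasa/chr21.out")
print(rsl)
print(" ".join(globusrun_command("gridnode.example.org/jobmanager-pbs", "crasa.rsl")))
```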
34
Benchmark of CRASA Part I
  • Gene Prediction Performance
  • For Human Chromosome 21 (33.9 Mbp, gene-poor)
  • 48 minutes (11.5 kbp/s)
  • For Human Chromosome 22 (34.0 Mbp, gene-rich)
  • 100 minutes (5.7 kbp/s)
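The throughput figures are sequence length divided by elapsed time. Recomputing them from the lengths and times on the slide gives about 5.7 kbp/s for chromosome 22, matching the slide, and about 11.8 kbp/s for chromosome 21, slightly above the 11.5 kbp/s quoted; the slide does not say which sequence length was used for the quoted rate. The helper name is illustrative.

```python
def throughput_kbp_per_s(length_mbp: float, minutes: float) -> float:
    """Bases processed per second, in kbp/s: (Mbp * 1e6) / (minutes * 60) / 1e3."""
    return length_mbp * 1e6 / (minutes * 60) / 1e3


# Lengths and elapsed times copied from the benchmark slide.
for name, mbp, minutes in [("chr21", 33.9, 48), ("chr22", 34.0, 100)]:
    print(name, round(throughput_kbp_per_s(mbp, minutes), 1))
```

The roughly twofold gap between the two chromosomes of nearly equal length is consistent with the slide's gene-poor versus gene-rich labels: more candidate matches mean more alignment work per base.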

35
Benchmark of CRASA Part II
  • Parallelization Speed Up
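The speedup chart itself did not survive extraction. For reading such a chart, parallel speedup S(p) and efficiency E(p) are conventionally defined from the elapsed time on one processor (T_1) and on p processors (T_p). Amdahl's law, a standard bound that the slides do not themselves state, caps the speedup when only a fraction f of the single-processor run time is parallelizable:

```latex
S(p) = \frac{T_1}{T_p}, \qquad
E(p) = \frac{S(p)}{p}, \qquad
S(p) \le \frac{1}{(1 - f) + f/p}
```

For example, with f = 0.95 the bound on 16 processors is 1 / (0.05 + 0.059375), or about 9.1.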

36
Benchmark of CRASA Part IV
  • Length of Query Sequence vs. Elapsed Time

37
Digital Archive Data Grid
38
Scope of Digital Archives
[Diagram labels: Digital Archives; Being Digitised; Born Digital;
Domain Expertise; Culture and Knowledge Background; General
Knowledge Base; Enterprise Intelligence; Business Process and
Lifecycle; e-Research; e-Learning]
39
Why Knowledge-based Approach for Digital Archives
  • Passive requirements: long-term, scalable and
    persistent archives while the technology evolves
  • Active requirements: generating new knowledge
    (easily discovering new and unexpected patterns,
    trends and relationships that may be hidden deep
    within very large and diverse datasets)

40
Content Management Challenges (1)
  • Separating content from presentation
  • Versioning, roll-back
  • Data/information re-use
  • Re-purposing of information, flexible output
  • Workflow: submit, review, approve, store
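Versioning and roll-back can be illustrated with an append-only version history. This is a hypothetical minimal sketch, not the CMS the slides have in mind; the class and method names are invented, and roll-back is modelled as re-saving an old version as the newest one so that no history is lost.

```python
class VersionedDocument:
    """Hypothetical minimal content store keeping every saved version."""

    def __init__(self, initial: str = ""):
        self._versions = [initial]  # version 0 is the initial content

    @property
    def current(self) -> str:
        return self._versions[-1]

    @property
    def version(self) -> int:
        return len(self._versions) - 1

    def save(self, content: str) -> int:
        """Append a new version and return its number."""
        self._versions.append(content)
        return self.version

    def get(self, version: int) -> str:
        return self._versions[version]

    def rollback(self, to_version: int) -> int:
        """Restore an old version by re-saving it, so the history stays intact."""
        if not 0 <= to_version <= self.version:
            raise ValueError(f"no such version: {to_version}")
        return self.save(self._versions[to_version])


doc = VersionedDocument("draft")
doc.save("edited")
doc.rollback(0)
print(doc.version, doc.current)
```

Re-saving rather than truncating means a roll-back can itself be undone, which suits long-term archives where the edit history is part of the record.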

41
Content Management Challenges (2)
  • Integrating diverse content and external
    sources
  • System and role-based security
  • Metadata Management
  • Compute and Storage resources on demand
  • Reliability and Scalability

42
Basic Functions of a CMS
  • A CMS manages the path from authoring through to
    publishing using a workflow scheme and by
    providing a system for content storage and
    integration.
  • Authoring/Capturing
  • Workflow
  • Integration and Storage
  • Publishing/Dissemination
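The authoring-to-publishing path, with the submit, review, approve and store steps named on the Content Management Challenges slide, can be sketched as a small state machine. The state and action names, and the "reject" transition back to draft, are assumptions for illustration; a document under review sits in the SUBMITTED state until a reviewer approves or rejects it.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"          # authoring/capturing
    SUBMITTED = "submitted"  # waiting for review
    APPROVED = "approved"
    STORED = "stored"        # integration and storage
    PUBLISHED = "published"  # publishing/dissemination


# Allowed transitions: (current state, action) -> next state.
TRANSITIONS = {
    (State.DRAFT, "submit"): State.SUBMITTED,
    (State.SUBMITTED, "reject"): State.DRAFT,
    (State.SUBMITTED, "approve"): State.APPROVED,
    (State.APPROVED, "store"): State.STORED,
    (State.STORED, "publish"): State.PUBLISHED,
}


def advance(state: State, action: str) -> State:
    """Apply one workflow action, refusing transitions the table does not allow."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} from {state.value}") from None


state = State.DRAFT
for action in ["submit", "approve", "store", "publish"]:
    state = advance(state, action)
print(state.value)
```

Keeping the transitions in a table rather than in scattered if-statements makes it easy to add steps (for example a second review) without touching the code that enforces them.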

43
The CMS Feature List
44
Access Grid
45
Access Grid for Collaborative Environments
  • Multi-point Video Conference Facilities
  • MCU-based, 24 concurrent sessions
  • VRVS
  • H.320/H.323
  • Whiteboard
  • Video Server
  • Web-based Content Retrieval and Dissemination

46
eLearning Data Grid
47
Challenge and Goals of eLearning
  • Challenge
  • Building Knowledge Society
  • Ubiquitous Learning
  • Emergence of New Learning Models → Workflow
    Analysis
  • The most efficient implementation
  • Adaptation to technology changes
  • Goals
  • Learning how to learn
  • Helping people with disabilities learn more
    easily
  • Lifelong Learning and Lifelong Teaching
  • Training at All Levels
  • Formation of Learning Society

48
Basic Requirements of eLearning
  1. Combining Learner-Centric and Teacher-Centric
    Approaches to Achieve the Best Outcomes
  2. Diverse, Large, Distributed and More Accessible
    Learning Resources
  3. Well-Organized and Complete Content Description
  4. Integration of Heterogeneous Information
    Resources
  5. On-Demand and Ubiquitous Learning for Anyone
  6. Toward Effective Knowledge Discovery and
    Well-Organized Knowledge Management

49
How to Get There?
  • Open Source eLearning Platform
  • Web-based virtual learning, teaching and
    informing
  • Robust, distributed, collaborative and ubiquitous
    computing environment as the infrastructure →
    demands a Grid infrastructure!
  • Standardization
  • Well-defined specification
  • Interoperability Mechanism for conversion,
    transformation, and exchange, etc.
  • Integration
  • Building Community for
  • Developing Common tools
  • Technical Study Support
  • Requirements Collection
  • Planning
  • Suggestions to National Strategy
  • Grid Infrastructure
  • Learning Resources
  • eLearning Services

50
Progress of eLearning in Taiwan
  • Master Plan of Information Technology in
    Education for Primary and Secondary Schools
  • Ministry of Education, 2001-2005
  • 20% of curriculum time using IT
  • 600 seed schools
  • Training teacher teams
  • Equipping teachers with notebook computers
  • Program of Science and Technology for e-Learning
    (2003)
  • Cross-ministry initiative
  • US$130 million over 5 years
  • Led by President Liu of NCU

51
Pilot Projects for eLearning in AS
  • Social University for Adult Learning
  • Community University for Minority, e.g.,
    Indigenous People
  • Parallel Programming and Computing Applications
  • Survey of the standardization of metadata for
    eLearning

52
Grid Architecture for eLearning
53
GBIF → Biodiversity Data Grid
54
International Biodiversity Collaboration
55
Earth Science Data Center