Title: Cloud Computing Application in High Energy Physics
1Cloud Computing Application in High Energy Physics
- Yaodong Cheng
- IHEP, CAS
- 2012-4-23
2Outline
- From Grid to Cloud
- Some cloud projects in HEP
- Cloud activities at IHEP, CAS
3Outline
- From Grid to Cloud
- Some cloud projects in HEP
- Activities at IHEP, CAS
4Terminology
- What is a grid?
- A platform for scientific collaboration
- A scientific tool to help with data manipulation and processing
- A computational platform
- And how does it compare with a cloud?
- A source for computing and storage capacity
- Flexible, easy to access resource
- And a cluster?
- A building block for both grids and clouds
5Grid as a collaborative platform
- State before the Grid
- Scientists/teams have resources or access to them
- Teams work independently and do not share their resources (no technology support)
- Data sharing is very simple, e.g. secure copy (scp) between teams
- State with a Grid
- Teams' resources are connected
- Sharing is easy
- Scientists can focus on their science and not on the technology behind it
6Grid View
7Clouds
- A rather recent commercial platform
- (Large) Pool of virtualized servers
- Users submit not jobs, but full virtual machines with jobs inside them
- Targets real-time requirements
- Fast deployment of new virtual servers
- Can quickly react to users' changing requirements
- Standard clouds are rather simple
- Easy to use web interface
- No collaboration support (standard Consumer-Provider model, ideal for commercial use)
8What is cloud
- The NIST definition lists five essential characteristics of cloud computing
- on-demand self-service
- broad network access
- resource pooling
- rapid elasticity or expansion
- measured service
- Three "service models"
- software, platform and infrastructure
- Four "deployment models"
- private, community, public and hybrid
9Cloud and Grid: A Comparison
[Figure: comparison diagram with the labels Grid Middleware, Cloud Middleware, and four Computing/Data Center boxes]
10From Grid to Cloud
- Grid has been the necessary infrastructure for many fields of scientific research, e.g. HEP
- But there are still some disadvantages
- How to schedule jobs efficiently to improve resource utilization (vs. static policy)
- Diversified service model on demand (vs. job submission)
- Compatible with legacy programs (vs. unified system environment)
- Virtualization/Cloud is a feasible solution
11Outline
- From Grid to Cloud
- Some cloud projects in HEP
- Activities at IHEP, CAS
12CernVM
- CernVM is a baseline Virtual Software Appliance for the participants of CERN LHC experiments
- Motivation
- Software @ LHC: large, complicated to install/update/configure
- Multi-core with hardware support for virtualization
- Using virtualization and extra cores to get extra comfort
- Zero configuration, reduce compiler-platform combinations
- CernVM: build a thin Virtual Software Appliance for use by the LHC experiments
- Provide a complete, portable and easy to configure user environment
- Independent of physical software and hardware platforms
- http://cernvm.cern.ch/
13Thin Software Appliance
[Figure labels: HTTPD; LAN/WAN (HTTP); Software Repository; Cache; sizes 10 GB, 1 GB, 0.1 GB]
14CVMFS: CernVM File System
[Figure: an App calls open(/opt/lcg) through CernVM Fuse; labels Kernel, FUSE, NFS, LFS, Cache, !Cache]
- On same host: /opt/lcg -> /chirp/localhost/opt/lcg
- On File Server: /opt/lcg -> /grow/host/opt/lcg
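The lookup path on the slide above (an application opens a path, the cache is checked, and a miss triggers a fetch) can be illustrated with a toy read-through cache. This is only a sketch and not CernVM File System code: the class `ReadThroughCache`, its `fetch` callback, and the in-memory dictionary standing in for the remote software repository are all hypothetical.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy read-through cache: serve from the local cache, fetch on a miss."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch              # hypothetical stand-in for a remote download
        self._cache: Dict[str, bytes] = {}
        self.misses = 0

    def open(self, path: str) -> bytes:
        if path not in self._cache:      # the "!Cache" branch on the slide
            self.misses += 1
            self._cache[path] = self._fetch(path)
        return self._cache[path]


# Hypothetical repository contents, for illustration only.
REPOSITORY = {"/opt/lcg/setup.sh": b"export LCG=1\n"}

cache = ReadThroughCache(REPOSITORY.__getitem__)
first = cache.open("/opt/lcg/setup.sh")    # miss: fetched from the repository
second = cache.open("/opt/lcg/setup.sh")   # hit: served from the local cache
```

Only files that are actually opened end up in the local cache, which is the idea behind keeping the appliance image thin.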
15Bridging Grids and Clouds
- Volunteer Computing
- uses computers belonging to ordinary people
- BOINC
- Open-source software for Volunteer Computing and Grid computing
- CernVM is extended to support BOINC client
- CernVM CoPilot development
- Based on BOINC, LHC@home experience and CernVM image
- Image size is of utmost importance to motivate volunteers
- Can be easily adapted to Pilot Job frameworks (AliEn, DIRAC, PanDA)
16CernVM CoPilot Architecture
17lxcloud
- CERN Internal Cloud
- Highly scalable, Linux (KVM) based cloud-like infrastructure
- Optimized for efficiency/speed
18Resource Pool details
- Quattor managed pool of resources
- Hardware (cheap) CPU server type, local disks
- LANDB integration
- Pre-allocation of VM slots in LANDB
- Hypervisor knows the name of guests
- Disk management
- Use of LVM snapshots
- All free disk space in one big LV
- Pre-stage raw images on LV on the hypervisors
- Fast installation of VMs using LV snapshots
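As an illustration of the snapshot-based installation described above, the following sketch builds (but does not run) an `lvcreate` command that creates a copy-on-write snapshot of a pre-staged image volume. The volume group `vg0`, the image volume `slc5-image`, the VM name, and the snapshot size are hypothetical; the slides do not show the actual lxcloud tooling.

```python
from typing import List


def snapshot_command(volume_group: str, image_lv: str, vm_name: str, size_gb: int) -> List[str]:
    """Build an lvcreate call that snapshots a pre-staged image LV for one VM."""
    return [
        "lvcreate",
        "--snapshot",                       # copy-on-write snapshot of the origin LV
        "--size", f"{size_gb}G",            # space reserved for blocks the VM changes
        "--name", f"{vm_name}-disk",        # hypothetical naming scheme
        f"/dev/{volume_group}/{image_lv}",  # origin: the pre-staged raw image
    ]


# All names below are assumptions made for the example.
cmd = snapshot_command("vg0", "slc5-image", "vm001", 10)
print(" ".join(cmd))
```

Because the snapshot shares unchanged blocks with the origin volume, no full image copy is needed when a VM is created, which is where the speed-up comes from.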
19Image management
- Central image catalogue (VMIC)
- Close collaboration with HEPiX
- No direct user access/user images
- Images require endorsement by IT
- Image distribution system
- Image distribution repository of trusted images
- Fast distribution using BitTorrent (rtorrent)
- Pull model: hypervisors ask if there are updates
- Transparent update of images using LV tools
- Hypervisors advertise existing images
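The pull model above can be sketched as a comparison between the catalogue of endorsed images and what a hypervisor already holds. This is a minimal illustration under assumptions: the function `images_to_pull`, the integer version numbers, and the image names are hypothetical and do not reflect the real VMIC interface.

```python
from typing import Dict, List


def images_to_pull(catalogue: Dict[str, int], local: Dict[str, int]) -> List[str]:
    """Return the names of endorsed images that are missing or outdated locally."""
    return sorted(
        name
        for name, version in catalogue.items()
        if local.get(name, -1) < version   # -1 marks an image not present at all
    )


# Hypothetical data: image name -> version number.
catalogue = {"slc5-batch": 3, "slc6-batch": 1}   # endorsed images in the repository
local = {"slc5-batch": 2}                        # images this hypervisor advertises
stale = images_to_pull(catalogue, local)
```

Each hypervisor would run such a check periodically and fetch only the listed images, so the central repository never has to push to the whole pool.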
20Virtual Machine Management
- OpenNebula
- An open source Cloud Data Center Management Solution
- Provides a powerful, scalable and secure multi-tenant cloud platform for fast delivery and elasticity of virtual resources
- OpenStack
- The Open Source Cloud Operating System
- The main components
- Compute, Object Storage, Image service
- An interesting product worth checking
21Lxcloud ecosystem
[Figure labels: Quattor; Image creation and endorsement; End-user VO; Golden Nodes; OpenNebula; ONE EC2 Interface; CernVM; ONE 3.0 Master; Image repository (VMIC); Application manager; VM Provisioning; Image creation; lxcloud; Physical Resource]
22Clever: A New VIM
- CLEVER: A CLoud-Enabled Virtual EnviRonment
- To simplify the access management of private/hybrid clouds
- To provide simple and easily accessible interfaces to interact with different interconnected clouds, deploy Virtual Machines and perform load balancing through migration
23Clever on Grid
[Figure labels: Administration tool; User Interface; CLEVER.jar and X.509 Certificate; job Submission; XMPP; Ejabberd XMPP Server; Resource Broker; Matchmaking and Jobs Scheduling; Cluster Manager (CM); Host Manager (HM) on Host1, Host2, ..., HostN-1, HostN; XQuery/XPath; Sedna Distributed Databases; Computing Element (CE); Worker Nodes; jobs Running; Storage Element; tiny.vdi]
24Outline
- From Grid to Cloud
- Some cloud projects in HEP
- Activities at IHEP, CAS
25Virtual Cluster
- Motivation
- Build a virtual machine pool on physical machines, elastic to expand or shrink on demand
- Flexible to support more kinds of applications
- Compatible with legacy programs
- R&D cloud for users
- Key technologies
- Evaluation of hypervisors (KVM, Xen, ...) suitable for HEP
- VIM management (OpenNebula, OpenStack, ...)
- Monitoring and accounting
- Interface to PBS, WLCG, and other services
- Dynamic scheduling
- Live migration
- VM resource adjustment (CPU, Memory, Network, ...)
26Architecture of Virtual Server
[Figure labels: WLCG; Grid Job; Scheduler; Scheduling policy; PBS Client; Submit Job; Query and Modify Queue; PBS Server; VIM (VM create, start, pause, destroy, migration); Power Management; VMs on Physical Machines]
27PBS/Torque integration
- Each batch queue has basic resources (physical nodes or virtual machines)
- If there are too many jobs in one queue, the scheduler creates extra virtual machines according to the scheduling policy and requirements, then adds the new resources to the queue
- If a queue with higher priority needs more resources, the VM resources in queues with lower priority are paused or even destroyed
- Fair scheduling is very important here!
- The WLCG interface is simply via PBS/Torque
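The queue policy above can be made concrete with a small sketch. It is not the IHEP scheduler: the `Queue` fields, the `rebalance` function, and the rule "one VM per waiting job" are assumptions chosen to keep the example short. Free VM slots are used first; after that, VMs are reclaimed from strictly lower-priority queues, lowest priority first.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Queue:
    name: str
    priority: int       # larger value = higher priority
    waiting_jobs: int   # jobs that currently have no resource
    vms: int            # virtual machines currently attached to the queue


def rebalance(queues: List[Queue], free_slots: int) -> Dict[str, int]:
    """Return the VM count change per queue (positive = create, negative = pause/destroy)."""
    delta = {q.name: 0 for q in queues}
    by_priority = sorted(queues, key=lambda q: q.priority, reverse=True)
    for q in by_priority:
        needed = q.waiting_jobs              # assumption: one VM per waiting job
        grant = min(needed, free_slots)      # first use free slots on physical machines
        free_slots -= grant
        delta[q.name] += grant
        needed -= grant
        for donor in reversed(by_priority):  # lowest priority first
            if needed == 0 or donor.priority >= q.priority:
                break
            take = min(needed, donor.vms + delta[donor.name])
            delta[donor.name] -= take        # donor VMs are paused or destroyed
            delta[q.name] += take
            needed -= take
    return delta


# Hypothetical queues, for illustration only.
queues = [Queue("besiii", 2, 5, 2), Queue("test", 1, 0, 4)]
delta = rebalance(queues, free_slots=2)
```

This sketch always favours the higher-priority queue and makes no attempt at fairness; a production policy would need limits on how much any queue can lose, as the slide stresses.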
28GUI
29BESIII Cloud
- Integrated with Grid, volunteer computing, and virtualization
- User submits jobs to the BESIII portal, then these jobs are dispatched to different computing resources
- Volunteer computing (small sites and personal computers)
- Local cluster (managed by LRMS)
- WLCG
- CNGrid
- Plugin framework
- gLite, PBS, GOS plugins already completed!
- Recently, the BESIII Offline Software System (BOSS) has successfully run on CernVM-based CAS@home
- BOINC plugin is ready!
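A plugin framework of the kind described above can be sketched as a registry that maps a backend name to a submit function. Everything here is hypothetical: the `plugin` decorator, the `dispatch` function, and the placeholder return values are illustrative and do not show the BESIII portal's real interfaces.

```python
from typing import Callable, Dict

# Registry mapping a backend name to its submit function (hypothetical design).
PLUGINS: Dict[str, Callable[[str], str]] = {}


def plugin(name: str) -> Callable[[Callable[[str], str]], Callable[[str], str]]:
    """Decorator that registers a submit function under a backend name."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        PLUGINS[name] = func
        return func
    return register


@plugin("pbs")
def submit_pbs(job: str) -> str:
    return f"pbs:{job}"      # placeholder; a real plugin would call the batch system


@plugin("boinc")
def submit_boinc(job: str) -> str:
    return f"boinc:{job}"    # placeholder for a volunteer-computing submission


def dispatch(job: str, backend: str) -> str:
    """Send a job to the chosen backend, failing clearly for unknown names."""
    if backend not in PLUGINS:
        raise ValueError(f"no plugin registered for {backend!r}")
    return PLUGINS[backend](job)


result = dispatch("boss-sim-001", "boinc")   # hypothetical job name
```

With this layout, supporting another resource type means adding one decorated function, while the portal-side `dispatch` logic stays unchanged.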
30CAS@HOME
- CAS@home is the first volunteer computing platform in China
- Uses BOINC as its middleware
- Launched by IHEP in January 2010
- To help scientists from CAS or other research organizations in China to run their scientific research on volunteer computing resources
- More than 9,000 users and 16,000 computers have joined CAS@home
31Architecture of BESIII Cloud
[Figure labels: BESIII portal; Plugins (gLite, GOS, PBS, BOINC, ...); BOINC Server; PBS Server; gLite WMS; GOS; Small sites and Personal Computers; CNGrid; Local Cluster; WLCG]
32Future Cloud-Grid Integration
[Figure labels: Web Application Service; Collaboration Services; Datacenter Infrastructure; Compute Service; Database service; Cloud Grid Computing; Service Catalog; Job Scheduling Service; Storage service; Computing center Infrastructure; Storage backup, archive service; Virtual Client service; Content Classification]