Title: Lecture 5: Build-A-Cloud
1. Lecture 5: Build-A-Cloud
- http://www.cs.columbia.edu/sambits/
2. Life Cycle in a Cloud
- Build an image (or images) for the software/application that we want to host on the cloud (lecture 4)
- Request a VM, passing appropriate parameters such as resource needs and image details (lecture 3)
- When the VM is started up, parameters are passed to it at appropriate run levels to auto-configure the software image (lecture 4)
- Now in this lecture:
  - Let's monitor the provisioned VM
  - Manage it at run time
  - As the workload changes, adjust the amount of requested resources
3. What we shall learn
- We shall put together a cloud piece by piece:
  - OpenNebula as the cluster manager
  - KVM as the hypervisor on the host machines
  - Creating and managing guest VMs
  - Creating clustered application(s) using VMs
  - Application-level management
- Interesting sub-topics which we will touch:
  - Monitoring the cluster and applications in such an environment
  - Example of application-level management: how to add on-demand resource scaling using OpenNebula and Ganglia
4. Cloud Setup
- Basic management:
  - Image management
  - VM monitoring management
  - Host monitoring management
(Figure: a private-cloud client talks to the management layer, which comprises image management, VM management, host management and virtual-network (VN) management, built on top of infrastructure information.)
5. Our stack for the cloud
- OpenNebula for managing a set of host machines that have a hypervisor on them
- KVM as the hypervisor on the host machines
- Ganglia for monitoring the guest VMs
- Glue code for implementing application management, e.g. resource scaling
6. OpenNebula Setup
- Install the OpenNebula management node:
  - Download and compile the source on the mgmt-node (easy installation; install root as oneadmin)
  - Set up sshd on all hosts which have to be added (also install Ruby on them)
  - Allow root of the mgmt-node to have password-less access to all the managed hosts
  - Set up the image repository (a shared-FS based setup is required for live migration)
  - If you do not have a Linux server, download VirtualBox and create a Linux VM on your laptop
- OpenNebula architecture:
  - Tools written on top of OpenNebula interact with the core via XML-RPC
  - The core exposes VM, host and network management APIs
  - The core stores all installation and monitoring information in an SQLite3 (or MySQL) DB
  - Most of the DB information can be accessed using XML-RPC calls
  - All the drivers are written in Ruby and run as daemons, which in turn call small shell scripts to get the work done
7. Create a Cloud
- Start the one daemon:
  - Edit ONEHOME/etc/oned.conf for necessary changes (quite intuitive)
  - Put login:passwd in ONEHOME/etc/one_auth
  - "one start" does that
  - Keeps all the DB and logs in ONEHOME/var/
  - NOTE: if you want to do a fresh setup, simply stop oned, delete ONEHOME/var/ and start the OpenNebula daemon again
- Set up ssh on the host machines (allow oneadmin password-less entry):
  - Concatenate the .ssh/id_rsa.pub (public key) of the admin-node onto the host servers' .ssh/authorized_keys
  - chmod 600 .ssh/authorized_keys
- Add hosts to OpenNebula:
  - Use the onehost command
  - The command is written in Ruby
  - The command basically makes an XML-RPC call to the OpenNebula server's HostAllocate call (e.g. as sketched below)
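As an illustration (not part of the original slides), the following sketch shows what that call looks like from a plain XML-RPC client, here using Python's standard xmlrpc.client. The endpoint, credentials and driver names are placeholders, and the exact method name and argument list of the host-allocate call differ between OpenNebula releases, so check the XML-RPC reference for your version.

    # Sketch: register a host with OpenNebula over XML-RPC (version-dependent API).
    import xmlrpc.client

    ONE_ENDPOINT = "http://mgmt-node:2633/RPC2"   # oned's default XML-RPC endpoint
    SESSION = "oneadmin:password"                 # placeholder; same format as one_auth

    proxy = xmlrpc.client.ServerProxy(ONE_ENDPOINT)

    # Roughly what `onehost create host01 im_kvm vmm_kvm tm_ssh` does under the hood:
    # register host01 with KVM information, virtualization and transfer drivers.
    result = proxy.one.host.allocate(SESSION, "host01", "im_kvm", "vmm_kvm", "tm_ssh")
    print(result)   # typically a list: [success_flag, host_id_or_error_message, ...]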
8. Configure network
- Fixed: defines a fixed set of IP-MAC pairs
- Ranged: defines a class network (a whole range of addresses)
- E.g. a fixed-set network setting (assuming you have a set of static IP addresses allotted to you, this is how you would set it up; see the template sketch below)
Note: a good site for help is http://www.opennebula.org/documentation:rel1.4:vgg
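To make the fixed case concrete, a virtual network template in the OpenNebula 1.4 style would look roughly like the sketch below (my illustration, not from the slides); the network name, bridge and leases are placeholders, and the attribute details should be checked against the virtual network guide for your release. The template is then registered with the onevnet create command and referenced by name from VM templates.

    # Sketch of a FIXED virtual network template (placeholder name, bridge and leases)
    NAME   = "tpcw-net"
    TYPE   = FIXED
    BRIDGE = br0
    LEASES = [ IP = 192.168.1.10 ]
    LEASES = [ IP = 192.168.1.11, MAC = 02:00:c0:a8:01:0b ]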
9. How to access OpenNebula
- All APIs can be called using XML-RPC client libraries
- The OpenNebula command-line client (Ruby)
- The Java client
10. Setup Monitoring
- Requirements of monitoring:
  - Need something which stores resource-monitoring data as a time series
  - Exposes interfaces for querying it and for simple aggregation of the data
  - Automatically archives the older data
- How to achieve it?
  - Install Ganglia!
  - Tune the VM images to automatically report their monitoring via Ganglia
  - Install gmond on the host servers
- What is Ganglia?
  - It is open-source software (BSD license)
  - Distributed monitoring of clusters and grids
  - Stores time-series data and historical data as archives (RRDs)
- How to get Ganglia:
  - Download the source code from http://ganglia.info/downloads.php
  - For some Linux distributions, RPMs are available
11. Components of Ganglia
- It has two prime daemons:
  - gmond: a multi-threaded daemon which runs on the monitored nodes
    - Collects data on the monitored nodes and broadcasts it as XML (can be accessed at port 8649)
    - Configuration script: /etc/gmond.conf
  - gmetad:
    - Periodically polls a collection of child data sources
    - Parses the collected XML and saves all numeric metrics to round-robin databases
    - Exports the aggregated XML over a TCP socket to clients (port 8651)
    - Configuration file: /etc/gmetad.conf
    - One per cluster
- Round-Robin Database:
  - RRDtool is a well-known tool for creating, storing and retrieving/plotting RRD data
  - Maintains data at various granularities; e.g. the defaults are:
    - 1 hour of data averaged over 15-sec intervals (rra0)
    - 1 day of data averaged over 6-min intervals (rra1)
    - 1 week of data averaged over 42-min intervals (rra2)
- The web GUI tools:
  - A collection of PHP scripts run by the web server to extract the Ganglia data and generate the graphs for the website
- Additional tools
Note: a good site for help is http://www.ibm.com/developerworks/wikis/display/WikiPtype/ganglia
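To show how the gmond XML feed can be consumed directly, here is a small sketch (my addition, not from the slides) that connects to a gmond daemon on its default port 8649 and prints a few metrics; the host name is a placeholder, and the HOST/METRIC elements with NAME/VAL attributes follow gmond's XML output format.

    # Sketch: read the XML dump that gmond serves to any client connecting on TCP 8649.
    import socket
    import xml.etree.ElementTree as ET

    def read_gmond_xml(host, port=8649):
        """Fetch the full monitoring XML that gmond streams to a connecting client."""
        chunks = []
        with socket.create_connection((host, port), timeout=5) as sock:
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks)

    root = ET.fromstring(read_gmond_xml("host01"))      # "host01" is a placeholder
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") in ("load_one", "cpu_idle", "mem_free"):
                print(host.get("NAME"), metric.get("NAME"), metric.get("VAL"))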
12. How to get monitoring-data?
- How to get the time-series data?
  - Ganglia stores all RRDs in /var/lib/ganglia/rrds/cluster_name/machine_ip
  - There is an rrd file for each metric
  - Data is collected at a fixed time interval (the default is 15 sec)
  - One can retrieve the complete time series of monitored data from each rrd file using rrdtool, e.g.:
  - Get the average load_one for every 15 sec of the last hour:
    rrdtool fetch load_one.rrd AVERAGE --end now --start e-1h -r 15
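For use from the management glue code, the same query can be wrapped in a small helper, sketched below (my addition, not from the slides); it simply shells out to the rrdtool binary on the Ganglia node, and the RRD path in the usage comment is a placeholder.

    # Sketch: run `rrdtool fetch` and parse its output into (timestamp, value) pairs.
    import subprocess

    def fetch_series(rrd_path, cf="AVERAGE", start="e-1h", end="now", resolution="15"):
        """Return a list of (unix_timestamp, value) tuples from an RRD file."""
        out = subprocess.run(
            ["rrdtool", "fetch", rrd_path, cf,
             "--start", start, "--end", end, "-r", resolution],
            check=True, capture_output=True, text=True,
        ).stdout
        series = []
        for line in out.splitlines():
            if ":" not in line:          # skip the header line listing data-source names
                continue
            ts, value = line.split(":", 1)
            value = value.split()[0]     # first data source of the RRD
            series.append((int(ts), None if "nan" in value else float(value)))
        return series

    # Usage (placeholder path):
    # for ts, val in fetch_series("/var/lib/ganglia/rrds/my-cluster/10.0.0.10/load_one.rrd")[-5:]:
    #     print(ts, val)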
13. How to get monitoring-data?
- How to access this data from inside a program?
  - Either use an ssh library (for Perl, Python or Java) and remotely execute the rrdtool command with the correct parameters
  - Or write a small XML-RPC server which exposes a function to run rrdtool fetch queries
- E.g. a Perl XML-RPC server:

    use Frontier::Daemon;

    # $server_ip, $server_port and $log are assumed to be configured elsewhere.
    # The exposed method here just sums two numbers; in practice it would run
    # the rrdtool fetch query.
    my $d = Frontier::Daemon->new(
        methods   => { sum => \&sum },
        LocalAddr => $server_ip,
        LocalPort => $server_port,
        debug     => 1,
    );

    sub sum {
        my ($auth, $arg1, $arg2) = @_;
        my $bool = 1;
        my ($package, $filename, $line, $subroutine, $hasargs, $wantarray,
            $evaltext, $is_require, $hints, $bitmask) = caller(0);
        $log->info("subroutine - " . $_[1]);
        $log->debug("subroutine @_");
        $log->debug("subroutine - " . join(" ", @_));
        return { SUCCESS => $bool, MESSAGE => $arg1 + $arg2 };
    }
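A matching client call, e.g. from Python's standard xmlrpc.client, could look like the sketch below (my addition, not from the slides); the host, port and auth token are placeholders standing in for $server_ip, $server_port and whatever authentication the server expects.

    # Sketch: call the Perl XML-RPC server above from the management code.
    import xmlrpc.client

    proxy = xmlrpc.client.ServerProxy("http://monitor-host:9000/RPC2")  # placeholders
    reply = proxy.sum("auth-token", 2, 3)   # maps to sub sum($auth, $arg1, $arg2)
    print(reply)                            # e.g. {'SUCCESS': 1, 'MESSAGE': 5}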
14. Create a Multi-tiered Clustered Application
- Let us consider a two-tiered TPC-W (a web-server and database performance benchmark)
- How to create an application on custom images:
  - Create a 6-GB file using dd (the utility for converting and copying files)
  - Attach a loop-back device to it
  - Partition it into 3 (swap, boot and root)
  - Format the partitions with a file system (say ext3)
  - Install the complete OS and application stack on the relevant partitions
  - Install gmond and configure it
  - Save it as a custom image
- For TPC-W one will need:
  - the Apache Tomcat server
  - a Java implementation of TPC-W
  - a MySQL server
- We will need a load balancer which can route HTTP packets to the various backend servers (and is also HTTP-session aware)
  - I am using HAProxy (easy to install and configure)
  - Nginx and lighttpd are other popular HTTP proxy servers
15. Installing a multi-tier application
- Install a two-tiered application:
  - Create a template for the load balancer
  - Create a template for TPC-W
  - Deploy the LB-VM (using OpenNebula)
  - Deploy the TPCW-VMs (using OpenNebula)
  - Attach the TPC-W application VMs to the LB-VM (see the HAProxy sketch below)
  - Test with a web browser whether the setup is working
  - Create a client template
  - Deploy the client VM
  - Test the client
(Figure: the client sends requests to the LoadBalancer VM, which forwards them to the backend VMs TPCW-0 and TPCW-1.)
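To illustrate the "attach the TPC-W VMs to the LB-VM" step, the load-balancing part of an HAProxy configuration could look roughly like the sketch below (my illustration, not from the slides); the backend addresses, ports and cookie values are placeholders, and directive details depend on the HAProxy version.

    # Sketch of the relevant part of haproxy.cfg (placeholder addresses and ports)
    frontend tpcw-in
        bind *:80
        default_backend tpcw-servers

    backend tpcw-servers
        balance roundrobin
        cookie SERVERID insert indirect          # keep HTTP sessions sticky per backend
        server tpcw-0 10.0.0.10:8080 check cookie tpcw0
        server tpcw-1 10.0.0.11:8080 check cookie tpcw1

Adding a newly provisioned TPC-W VM to the application then amounts to appending another server line and restarting HAProxy, which is what the post-install script on slide 17 does.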
16. Application Level Operation
- One needs to maintain application-level information, e.g. which VM is the load balancer and which VMs are the backend servers
- Keep this application-level knowledge in some local database
- Application-level operation, e.g. dynamic provisioning:
  - Case 1: increase capacity using replication (a sketch follows this list)
    - Monitor the average utilization of the VMs over, say, 1 min (using Ganglia)
    - If the average utilization of all the VMs under the load balancer is above, say, 70%:
      - provision a new VM using OpenNebula (reactive provisioning is also supported by EC2)
      - run the post-install script to add the new VM to the application
  - Case 2: increase capacity using migration/resizing
    - Monitor the average utilization of the VMs over, say, 1 min (using Ganglia)
    - If only one VM is over-utilized and its host does not have more resources, migrate it to another host and resize it to a higher capacity (note: OpenNebula does not support this)
  - Migrate-and-resize a VM:
    - Migrate the image to another host
    - Change the VM configuration file to the new configuration
    - Start the VM with the new configuration file (with more RAM and CPU)
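A rough sketch of the case-1 loop is below (my illustration, not the lecture's code); fetch_series is the hypothetical rrdtool helper sketched on slide 12, while provision_tpcw_vm and attach_to_load_balancer are placeholders for the OpenNebula template instantiation and the post-install script. The 70% threshold and 1-minute window follow the slide.

    # Sketch: reactive scale-out loop for the TPC-W tier (case 1 above).
    # fetch_series(), provision_tpcw_vm() and attach_to_load_balancer() are
    # hypothetical helpers wrapping rrdtool, OpenNebula and the post-install script.
    import time

    CPU_THRESHOLD = 70.0     # percent, as suggested on the slide
    CHECK_INTERVAL = 60      # monitor over ~1-minute windows

    def avg_cpu_util(vm_ip):
        """Average CPU utilization of one VM over the last minute, from its Ganglia RRD."""
        rrd = "/var/lib/ganglia/rrds/tpcw-cluster/%s/cpu_user.rrd" % vm_ip  # placeholder path
        samples = [v for _, v in fetch_series(rrd, start="e-1min") if v is not None]
        return sum(samples) / len(samples) if samples else 0.0

    def scaling_loop(backend_vms):
        while True:
            utils = [avg_cpu_util(ip) for ip in backend_vms]
            if utils and sum(utils) / len(utils) > CPU_THRESHOLD:   # average across all backends
                new_ip = provision_tpcw_vm()        # e.g. instantiate the TPC-W template
                attach_to_load_balancer(new_ip)     # post-install: add server line, restart LB
                backend_vms.append(new_ip)
            time.sleep(CHECK_INTERVAL)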
17. Application Level Operations (e.g. Dynamic Provisioning)
- Where and how to implement the application-scaling logic?
  - Application-scaling logic needs knowledge of the application topology
  - It obviously resides above the infrastructure-management layer (i.e. OpenNebula)
  - Choose an easy-to-build-in language (Perl, Python, Ruby, Java, etc.)
  - An XML-RPC client is required to access OpenNebula
- Write a management program, in the language of your choice, which:
  - Installs a multi-tier application and stores the application topology in a local DB
  - Periodically monitors:
    - the average load on each server
    - proxy errors
  - Implements case 1 and case 2
  - The post-install script adds the new VM to the load balancer and restarts it
- Problem: live-resize and migrate-and-resize are not present in OpenNebula
  - Hack: create a script which does the following (very dirty, but it works; see the sketch below):
    - Migrate the current VM to the destination host
    - Alter the configuration file of this migrated VM
    - Destroy and recreate the VM
  - Neater solution: add a class in include/RequestManager.h (say VirtualMachineResize, similar to the class VirtualMachineMigrate)
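A sketch of that hack is below (my illustration, not the lecture's script); it shells out to the onevm CLI, whose sub-commands and arguments vary between OpenNebula releases, and the VM/host IDs, template path and resize values are placeholders.

    # Sketch of the "very dirty" migrate-and-resize hack described above.
    import subprocess

    def one(*args):
        """Run an OpenNebula CLI command and return its stdout."""
        return subprocess.run(args, check=True, capture_output=True, text=True).stdout

    def migrate_and_resize(vm_id, dest_host_id, template_path, new_mem_mb, new_cpu):
        # 1. Move the VM to a host that has spare capacity.
        one("onevm", "migrate", str(vm_id), str(dest_host_id))

        # 2. Rewrite the VM template with larger MEMORY/CPU values.
        with open(template_path) as f:
            lines = f.readlines()
        with open(template_path, "w") as f:
            for line in lines:
                if line.strip().startswith("MEMORY"):
                    line = "MEMORY = %d\n" % new_mem_mb
                elif line.strip().startswith("CPU"):
                    line = "CPU = %s\n" % new_cpu
                f.write(line)

        # 3. Destroy the old VM and recreate it from the edited template.
        one("onevm", "delete", str(vm_id))
        return one("onevm", "create", template_path)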
18. Solution Architecture
- Application Manager (written above OpenNebula); the high-level control flow is:
  - Periodically monitor the workload change and the application performance
  - Manage the current configuration and actuate configuration changes
  - Calculate the changed capacity (using some model and feedback from the monitoring block)
  - Find the new configuration of the application
  - Go ahead and start the process of actuating the new change
19. How to use (demo!)
- Command-line scripts
- VM lifecycle steps:
  - Creation: show the template and image naming
  - Suspension: just the command
  - Migration: migration (suspend and migrate)
  - Deletion: removing the image
- Show Ganglia monitoring:
  - Host monitoring through the VM lifecycle
  - VM monitoring
20. Cloud Management using this Setup
- Integrate OpenNebula monitoring with Ganglia and make it more efficient
- Use monitoring for VM placement on hosts
- Use monitoring to do reactive provisioning