Title: Lecture 5: Build-A-Cloud
1. Lecture 5: Build-A-Cloud
- http://www.cs.columbia.edu/sambits/
2. Life Cycle in a Cloud
- Build an image (or images) for the software/application that we want to host on the cloud (lecture 4)
- Request a VM, passing appropriate parameters such as resource needs and image details (lecture 3)
- When the VM is started up, parameters are passed to it at appropriate run levels to auto-configure the software image (lecture 4)
- Now in this lecture:
  - Let's monitor the provisioned VM
  - Manage it at run time
  - As the workload changes, adjust the amount of requested resources
3. What we shall learn
- We shall put together a cloud piece by piece:
  - OpenNebula as the cluster manager
  - KVM as the hypervisor on the host machines
  - Creating and managing guest VMs
  - Creating clustered application(s) using VMs
  - Application-level management
- Interesting sub-topics which we will touch:
  - Monitoring the cluster and applications in such an environment
  - Example of application-level management: how to add on-demand resource scaling using OpenNebula and Ganglia
4. Cloud Setup
- Basic management:
  - Image management
  - VM monitoring management
  - Host monitoring management
(Figure: a private-cloud client talks to the management layer, which comprises image management, VM management, host management and virtual-network (VN) management, built on top of infrastructure information.)
5. Our stack for the cloud
- OpenNebula for managing a set of host machines that have a hypervisor on them
- KVM as the hypervisor on the host machines
- Ganglia for monitoring the guest VMs
- Glue code for implementing application management, e.g. resource scaling
6. OpenNebula Setup
- Install the OpenNebula management node:
  - Download and compile the source on the mgmt-node (easy installation; install root as oneadmin)
  - Set up sshd on all hosts which have to be added (also install Ruby on them)
  - Allow root of the mgmt-node to have password-less access to all the managed hosts
  - Set up the image repository (a shared-FS based setup is required for live migration)
  - If you do not have a Linux server, download VirtualBox and create a Linux VM on your laptop
- OpenNebula architecture:
  - Tools written on top of OpenNebula interact with the core via XML-RPC
  - The core exposes VM, host and network management APIs
  - The core stores all installation and monitoring information in an SQLite3 (or MySQL) DB
  - Most of the DB information can be accessed using XML-RPC calls
  - All the drivers are written in Ruby and run as daemons, which in turn call small shell scripts to get the work done
7. Create a Cloud
- Start the one daemon:
  - Edit ONEHOME/etc/oned.conf for necessary changes (quite intuitive)
  - Put login:passwd in ONEHOME/etc/one_auth
  - "one start" does that
  - Keeps all the DB and logs in ONEHOME/var/
  - NOTE: if you want to do a fresh setup, simply stop oned, delete ONEHOME/var/ and start the OpenNebula daemon again
- Set up ssh on the host machines (allow oneadmin password-less entry):
  - Concatenate the .ssh/id_rsa.pub (public key) of the admin-node onto the host servers' .ssh/authorized_keys
  - chmod 600 .ssh/authorized_keys
- Add hosts to OpenNebula:
  - Use the onehost command
  - The command is written in Ruby
  - The command basically makes an XML-RPC call to the OpenNebula server's HostAllocate call (e.g. as sketched below)
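As an illustration (not part of the original slides), the following sketch shows what that call looks like from a plain XML-RPC client, here using Python's standard xmlrpc.client. The endpoint, credentials and driver names are placeholders, and the exact method name and argument list of the host-allocate call differ between OpenNebula releases, so check the XML-RPC reference for your version.

    # Sketch: register a host with OpenNebula over XML-RPC (version-dependent API).
    import xmlrpc.client

    ONE_ENDPOINT = "http://mgmt-node:2633/RPC2"   # oned's default XML-RPC endpoint
    SESSION = "oneadmin:password"                 # placeholder; same format as one_auth

    proxy = xmlrpc.client.ServerProxy(ONE_ENDPOINT)

    # Roughly what `onehost create host01 im_kvm vmm_kvm tm_ssh` does under the hood:
    # register host01 with KVM information, virtualization and transfer drivers.
    result = proxy.one.host.allocate(SESSION, "host01", "im_kvm", "vmm_kvm", "tm_ssh")
    print(result)   # typically a list: [success_flag, host_id_or_error_message, ...]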
8. Configure network
- Fixed: defines a fixed set of IP-MAC pairs
- Ranged: defines a class network (a whole range of addresses)
- E.g. a fixed-set network setting (assuming you have a set of static IP addresses allotted to you, this is how you would set it up; see the template sketch below)
Note: a good site for help is http://www.opennebula.org/documentation:rel1.4:vgg
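To make the fixed case concrete, a virtual network template in the OpenNebula 1.4 style would look roughly like the sketch below (my illustration, not from the slides); the network name, bridge and leases are placeholders, and the attribute details should be checked against the virtual network guide for your release. The template is then registered with the onevnet create command and referenced by name from VM templates.

    # Sketch of a FIXED virtual network template (placeholder name, bridge and leases)
    NAME   = "tpcw-net"
    TYPE   = FIXED
    BRIDGE = br0
    LEASES = [ IP = 192.168.1.10 ]
    LEASES = [ IP = 192.168.1.11, MAC = 02:00:c0:a8:01:0b ]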
9. How to access OpenNebula
- All APIs can be called using XML-RPC client libraries
- The OpenNebula command-line client (Ruby)
- The Java client
10. Setup Monitoring
- Requirements of monitoring:
  - Need something which stores resource-monitoring data as a time series
  - Exposes interfaces for querying it and for simple aggregation of the data
  - Automatically archives the older data
- How to achieve it?
  - Install Ganglia!
  - Tune the VM images to automatically report their monitoring via Ganglia
  - Install gmond on the host servers
- What is Ganglia?
  - It is open-source software (BSD license)
  - Distributed monitoring of clusters and grids
  - Stores time-series data and historical data as archives (RRDs)
- How to get Ganglia:
  - Download the source code from http://ganglia.info/downloads.php
  - For some Linux distributions, RPMs are available
11. Components of Ganglia
- It has two prime daemons:
  - gmond: a multi-threaded daemon which runs on the monitored nodes
    - Collects data on the monitored nodes and broadcasts it as XML (can be accessed at port 8649)
    - Configuration script: /etc/gmond.conf
  - gmetad:
    - Periodically polls a collection of child data sources
    - Parses the collected XML and saves all numeric metrics to round-robin databases
    - Exports the aggregated XML over a TCP socket to clients (port 8651)
    - Configuration file: /etc/gmetad.conf
    - One per cluster
- Round-Robin Database:
  - RRDtool is a well-known tool for creating, storing and retrieving/plotting RRD data
  - Maintains data at various granularities; e.g. the defaults are:
    - 1 hour of data averaged over 15-sec intervals (rra0)
    - 1 day of data averaged over 6-min intervals (rra1)
    - 1 week of data averaged over 42-min intervals (rra2)
- The web GUI tools:
  - A collection of PHP scripts run by the web server to extract the Ganglia data and generate the graphs for the website
- Additional tools
Note: a good site for help is http://www.ibm.com/developerworks/wikis/display/WikiPtype/ganglia
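To show how the gmond XML feed can be consumed directly, here is a small sketch (my addition, not from the slides) that connects to a gmond daemon on its default port 8649 and prints a few metrics; the host name is a placeholder, and the HOST/METRIC elements with NAME/VAL attributes follow gmond's XML output format.

    # Sketch: read the XML dump that gmond serves to any client connecting on TCP 8649.
    import socket
    import xml.etree.ElementTree as ET

    def read_gmond_xml(host, port=8649):
        """Fetch the full monitoring XML that gmond streams to a connecting client."""
        chunks = []
        with socket.create_connection((host, port), timeout=5) as sock:
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks)

    root = ET.fromstring(read_gmond_xml("host01"))      # "host01" is a placeholder
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") in ("load_one", "cpu_idle", "mem_free"):
                print(host.get("NAME"), metric.get("NAME"), metric.get("VAL"))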
12. How to get monitoring-data?
- How to get the time-series data?
  - Ganglia stores all RRDs in /var/lib/ganglia/rrds/cluster_name/machine_ip
  - There is an rrd file for each metric
  - Data is collected at a fixed time interval (the default is 15 sec)
  - One can retrieve the complete time series of monitored data from each rrd file using rrdtool, e.g.:
  - Get the average load_one for every 15 sec of the last hour:
    rrdtool fetch load_one.rrd AVERAGE --end now --start e-1h -r 15
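For use from the management glue code, the same query can be wrapped in a small helper, sketched below (my addition, not from the slides); it simply shells out to the rrdtool binary on the Ganglia node, and the RRD path in the usage comment is a placeholder.

    # Sketch: run `rrdtool fetch` and parse its output into (timestamp, value) pairs.
    import subprocess

    def fetch_series(rrd_path, cf="AVERAGE", start="e-1h", end="now", resolution="15"):
        """Return a list of (unix_timestamp, value) tuples from an RRD file."""
        out = subprocess.run(
            ["rrdtool", "fetch", rrd_path, cf,
             "--start", start, "--end", end, "-r", resolution],
            check=True, capture_output=True, text=True,
        ).stdout
        series = []
        for line in out.splitlines():
            if ":" not in line:          # skip the header line listing data-source names
                continue
            ts, value = line.split(":", 1)
            value = value.split()[0]     # first data source of the RRD
            series.append((int(ts), None if "nan" in value else float(value)))
        return series

    # Usage (placeholder path):
    # for ts, val in fetch_series("/var/lib/ganglia/rrds/my-cluster/10.0.0.10/load_one.rrd")[-5:]:
    #     print(ts, val)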
13. How to get monitoring-data?
- How to access this data from inside a program?
  - Either use an ssh library (for Perl, Python or Java) and remotely execute the rrdtool command with the correct parameters
  - Or write a small XML-RPC server which exposes a function to run rrdtool fetch queries
- E.g. a Perl XML-RPC server:

    use Frontier::Daemon;

    # $server_ip, $server_port and $log are assumed to be configured elsewhere.
    # The exposed method here just sums two numbers; in practice it would run
    # the rrdtool fetch query.
    my $d = Frontier::Daemon->new(
        methods   => { sum => \&sum },
        LocalAddr => $server_ip,
        LocalPort => $server_port,
        debug     => 1,
    );

    sub sum {
        my ($auth, $arg1, $arg2) = @_;
        my $bool = 1;
        my ($package, $filename, $line, $subroutine, $hasargs, $wantarray,
            $evaltext, $is_require, $hints, $bitmask) = caller(0);
        $log->info("subroutine - " . $_[1]);
        $log->debug("subroutine @_");
        $log->debug("subroutine - " . join(" ", @_));
        return { SUCCESS => $bool, MESSAGE => $arg1 + $arg2 };
    }
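A matching client call, e.g. from Python's standard xmlrpc.client, could look like the sketch below (my addition, not from the slides); the host, port and auth token are placeholders standing in for $server_ip, $server_port and whatever authentication the server expects.

    # Sketch: call the Perl XML-RPC server above from the management code.
    import xmlrpc.client

    proxy = xmlrpc.client.ServerProxy("http://monitor-host:9000/RPC2")  # placeholders
    reply = proxy.sum("auth-token", 2, 3)   # maps to sub sum($auth, $arg1, $arg2)
    print(reply)                            # e.g. {'SUCCESS': 1, 'MESSAGE': 5}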
14. Create a Multi-tiered Clustered Application
- Let us consider a two-tiered TPC-W (a web-server and database performance benchmark)
- How to create an application on custom images:
  - Create a 6-GB file using dd (the utility for converting and copying files)
  - Attach a loop-back device to it
  - Partition it into 3 (swap, boot and root)
  - Format the partitions with a file system (say ext3)
  - Install the complete OS and application stack on the relevant partitions
  - Install gmond and configure it
  - Save it as a custom image
- For TPC-W one will need:
  - the Apache Tomcat server
  - a Java implementation of TPC-W
  - a MySQL server
- We will need a load balancer which can route HTTP packets to the various backend servers (and is also HTTP-session aware)
  - I am using HAProxy (easy to install and configure)
  - Nginx and lighttpd are other popular HTTP proxy servers
15. Installing a multi-tier application
- Install a two-tiered application:
  - Create a template for the load balancer
  - Create a template for TPC-W
  - Deploy the LB-VM (using OpenNebula)
  - Deploy the TPCW-VMs (using OpenNebula)
  - Attach the TPC-W application VMs to the LB-VM (see the HAProxy sketch below)
  - Test with a web browser whether the setup is working
  - Create a client template
  - Deploy the client VM
  - Test the client
(Figure: the client sends requests to the LoadBalancer VM, which forwards them to the backend VMs TPCW-0 and TPCW-1.)
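To illustrate the "attach the TPC-W VMs to the LB-VM" step, the load-balancing part of an HAProxy configuration could look roughly like the sketch below (my illustration, not from the slides); the backend addresses, ports and cookie values are placeholders, and directive details depend on the HAProxy version.

    # Sketch of the relevant part of haproxy.cfg (placeholder addresses and ports)
    frontend tpcw-in
        bind *:80
        default_backend tpcw-servers

    backend tpcw-servers
        balance roundrobin
        cookie SERVERID insert indirect          # keep HTTP sessions sticky per backend
        server tpcw-0 10.0.0.10:8080 check cookie tpcw0
        server tpcw-1 10.0.0.11:8080 check cookie tpcw1

Adding a newly provisioned TPC-W VM to the application then amounts to appending another server line and restarting HAProxy, which is what the post-install script on slide 17 does.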
16. Application Level Operation
- One needs to maintain application-level information, e.g. which VM is the load balancer and which VMs are the backend servers
- Keep this application-level knowledge in some local database
- Application-level operation, e.g. dynamic provisioning:
  - Case 1: increase capacity using replication (a sketch follows this list)
    - Monitor the average utilization of the VMs over, say, 1 min (using Ganglia)
    - If the average utilization of all the VMs under the load balancer is above, say, 70%:
      - provision a new VM using OpenNebula (reactive provisioning is also supported by EC2)
      - run the post-install script to add the new VM to the application
  - Case 2: increase capacity using migration/resizing
    - Monitor the average utilization of the VMs over, say, 1 min (using Ganglia)
    - If only one VM is over-utilized and its host does not have more resources, migrate it to another host and resize it to a higher capacity (note: OpenNebula does not support this)
  - Migrate-and-resize a VM:
    - Migrate the image to another host
    - Change the VM configuration file to the new configuration
    - Start the VM with the new configuration file (with more RAM and CPU)
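A rough sketch of the case-1 loop is below (my illustration, not the lecture's code); fetch_series is the hypothetical rrdtool helper sketched on slide 12, while provision_tpcw_vm and attach_to_load_balancer are placeholders for the OpenNebula template instantiation and the post-install script. The 70% threshold and 1-minute window follow the slide.

    # Sketch: reactive scale-out loop for the TPC-W tier (case 1 above).
    # fetch_series(), provision_tpcw_vm() and attach_to_load_balancer() are
    # hypothetical helpers wrapping rrdtool, OpenNebula and the post-install script.
    import time

    CPU_THRESHOLD = 70.0     # percent, as suggested on the slide
    CHECK_INTERVAL = 60      # monitor over ~1-minute windows

    def avg_cpu_util(vm_ip):
        """Average CPU utilization of one VM over the last minute, from its Ganglia RRD."""
        rrd = "/var/lib/ganglia/rrds/tpcw-cluster/%s/cpu_user.rrd" % vm_ip  # placeholder path
        samples = [v for _, v in fetch_series(rrd, start="e-1min") if v is not None]
        return sum(samples) / len(samples) if samples else 0.0

    def scaling_loop(backend_vms):
        while True:
            utils = [avg_cpu_util(ip) for ip in backend_vms]
            if utils and sum(utils) / len(utils) > CPU_THRESHOLD:   # average across all backends
                new_ip = provision_tpcw_vm()        # e.g. instantiate the TPC-W template
                attach_to_load_balancer(new_ip)     # post-install: add server line, restart LB
                backend_vms.append(new_ip)
            time.sleep(CHECK_INTERVAL)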
17. Application Level Operations (e.g. Dynamic Provisioning)
- Where and how to implement the application-scaling logic?
  - Application-scaling logic needs knowledge of the application topology
  - It obviously resides above the infrastructure-management layer (i.e. OpenNebula)
  - Choose an easy-to-build-in language (Perl, Python, Ruby, Java, etc.)
  - An XML-RPC client is required to access OpenNebula
- Write a management program, in the language of your choice, which:
  - Installs a multi-tier application and stores the application topology in a local DB
  - Periodically monitors:
    - the average load on each server
    - proxy errors
  - Implements case 1 and case 2
  - The post-install script adds the new VM to the load balancer and restarts it
- Problem: live-resize and migrate-and-resize are not present in OpenNebula
  - Hack: create a script which does the following (very dirty, but it works; see the sketch below):
    - Migrate the current VM to the destination host
    - Alter the configuration file of this migrated VM
    - Destroy and recreate the VM
  - Neater solution: add a class in include/RequestManager.h (say VirtualMachineResize, similar to the class VirtualMachineMigrate)
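A sketch of that hack is below (my illustration, not the lecture's script); it shells out to the onevm CLI, whose sub-commands and arguments vary between OpenNebula releases, and the VM/host IDs, template path and resize values are placeholders.

    # Sketch of the "very dirty" migrate-and-resize hack described above.
    import subprocess

    def one(*args):
        """Run an OpenNebula CLI command and return its stdout."""
        return subprocess.run(args, check=True, capture_output=True, text=True).stdout

    def migrate_and_resize(vm_id, dest_host_id, template_path, new_mem_mb, new_cpu):
        # 1. Move the VM to a host that has spare capacity.
        one("onevm", "migrate", str(vm_id), str(dest_host_id))

        # 2. Rewrite the VM template with larger MEMORY/CPU values.
        with open(template_path) as f:
            lines = f.readlines()
        with open(template_path, "w") as f:
            for line in lines:
                if line.strip().startswith("MEMORY"):
                    line = "MEMORY = %d\n" % new_mem_mb
                elif line.strip().startswith("CPU"):
                    line = "CPU = %s\n" % new_cpu
                f.write(line)

        # 3. Destroy the old VM and recreate it from the edited template.
        one("onevm", "delete", str(vm_id))
        return one("onevm", "create", template_path)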
18. Solution Architecture
- Application Manager (written above OpenNebula); the high-level control flow is:
  - Periodically monitor the workload change and the application performance
  - Manage the current configuration and actuate configuration changes
  - Calculate the changed capacity (using some model and feedback from the monitoring block)
  - Find the new configuration of the application
  - Go ahead and start the process of actuating the new change
19. How to use (demo!)
- Command-line scripts
- VM lifecycle steps:
  - Creation: show the template and image naming
  - Suspension: just the command
  - Migration: migration (suspend and migrate)
  - Deletion: removing the image
- Show Ganglia monitoring:
  - Host monitoring through the VM lifecycle
  - VM monitoring
20. Cloud Management using this Setup
- Integrate OpenNebula monitoring with Ganglia and make it more efficient
- Use monitoring for VM placement on hosts
- Use monitoring to do reactive provisioning