Title: Fault Tolerance Design Techniques
1. An Introduction to Cloud Computing
Seattle University
Course: Computing System
Professor: Dr. Yingwu Zhu
By Navsimrat Kaur, Pooja Singhal, Sangeetha Codla Diwakar
2. Outline
- Introduction
- Key Characteristics and Benefits of Cloud
- Different Cloud Service Models
- Case Study A: Amazon EC2 (Pooja)
- Case Study B: Google App Engine (Navi)
- Case Study C: Microsoft Azure (Sangeetha)
- Current Issues and Limitations of Cloud
- Summary
- References
3. What is Cloud Computing
- Cloud computing is a technology that uses the Internet and central remote servers to maintain data and applications.
- It provides on-demand resources and services over the Internet, with scalability and reliability.
4. Key Characteristics and Benefits
- Economy
- The most frequently cited reason is that the cloud wins on cost.
- Zero upfront infrastructure cost
- Pay per use
- Time: just-in-time infrastructure
- Elasticity: scale up, scale down, on demand
- Improved testability and experimentation
- Better resource utilization
- Potential for shrinking the processing time
- Overflow traffic to the cloud
5. Source: http://www.dotcominfoway.com/technology/cloud-computing
6. Different Cloud Service Models
Source: http://hrushikeshzadgaonkar.wordpress.com/
7. 3 Building Blocks
- SaaS: An on-demand software delivery model in which software and its associated data are hosted centrally in the cloud and are typically accessed by users through a client, normally a web browser, over the Internet.
- Allows users to run existing online applications.
- PaaS: Includes hardware (servers, networks, load balancers, etc.) and software (operating systems, databases, application servers, etc.). PaaS providers include Google App Engine, Microsoft Azure, and Salesforce.com's Force.com.
- Allows users to create their own cloud applications using supplier-specific tools and languages.
- IaaS: Provides the hardware (servers, networks, load balancers, etc.) as a service; users install and manage their own software (operating systems, databases, application servers, etc.) on top of it.
- Allows users to run any application they please on cloud hardware of their choice.
8. Case Study 1: Amazon EC2
9. Amazon EC2
- EC2: Amazon Elastic Compute Cloud is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers.
- Gives a virtual instance of a machine in the cloud on which to host and run applications.
- Uses the XEN Para-Virtualization Architecture.
10. XEN Para-Virtualization Architecture
Source: http://tr.opensuse.org/An_Introduction_to_Virtualization
11. Amazon EC2 Core Features
- Amazon Machine Image (AMI): Contains all the information necessary to boot instances of the user's software. Pre-built template images are also available and allow instant use of EC2.
- Amazon EC2 Instance: A running system based on an AMI is referred to as an instance.
- Amazon Elastic Block Store (EBS): Offers persistent storage for EC2 instances. Designed to protect data by automatically creating replicas. EC2 instances can be stopped and restarted.
- Elastic Load Balancing: Automatically distributes incoming application traffic across multiple Amazon EC2 instances.
- Auto Scaling: Automatically scales EC2 instances up or down, enabled by Amazon CloudWatch.
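As an illustration of the last two features, here is a minimal sketch of round-robin traffic distribution and a threshold-based scaling decision. It does not use any AWS API; the function names, instance IDs, CPU thresholds, and instance limits are all assumptions chosen for the example.

```python
from itertools import cycle


def round_robin(instances):
    """Return an endless iterator that hands out instances in turn."""
    return cycle(instances)


def desired_instance_count(current, avg_cpu_percent,
                           scale_up_at=70.0, scale_down_at=30.0,
                           minimum=1, maximum=10):
    """Return the instance count a simple threshold policy would ask for.

    All thresholds and limits here are illustrative assumptions.
    """
    if avg_cpu_percent > scale_up_at:
        target = current + 1      # busy: add one instance
    elif avg_cpu_percent < scale_down_at:
        target = current - 1      # idle: remove one instance
    else:
        target = current          # inside the band: no change
    return max(minimum, min(maximum, target))


# Hypothetical instance IDs, used only to show the rotation.
balancer = round_robin(["i-a", "i-b", "i-c"])
first_four = [next(balancer) for _ in range(4)]
```

A real policy would typically act on metrics averaged over a time window, so that a short spike does not trigger scaling.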
12. Amazon EC2 Functionality
- Select a pre-configured template image to get up and running immediately (or configure a new AMI).
- Configure security and network access on the Amazon EC2 instance.
- Choose instance type(s) and operating system.
- Start, terminate, and monitor instances of the AMI as needed.
- Determine whether you want to run in multiple locations, use static IP endpoints, or attach persistent block storage to your instances.
- Pay only for the resources that are actually consumed.
13. Amazon EC2 Demo
Let's launch an EC2 instance
14. Amazon EC2 Failure Analysis
- On April 21st, 2011, a failure in Amazon's Northern Virginia data center caused dozens of popular websites to be out of service for a considerable amount of time.
- Affected: Foursquare, Reddit, Hootsuite, Quora, and many other companies
- Unaffected: Netflix, SimpleGeo, SmugMug
- What really happened?
- While attempting to upgrade the primary EBS network, Amazon engineers accidentally routed some traffic to a backup network with insufficient capacity.
- A large number of EBS nodes lost their connection to the replicas they had created, causing them to immediately look for somewhere to make new replicas.
- Instances that were trying to read from or write to these volumes also got stuck.
- To stabilize and restore the EBS cluster, all control APIs were disabled in the affected Availability Zone, which made the service unavailable.
- The Amazon team took 12 hours to bring the replication storm under control.
- It took much longer than that to recover customers' data; 0.07% of EBS volumes were unrecoverable.
15. Amazon EC2 Failure
- Lessons Learned
- Better communication with clients in a crisis
- Harshest criticism of Amazon: lack of any response for more than 40 minutes
- The incident showed weaknesses of the cloud and also highlighted the exposure of those who have become totally dependent on it.
- The cloud is still maturing and evolving.
- Don't store data on the instance, or if you do, back it up frequently. Also make an AMI of your instance for easy recreation or cloning.
- Design your systems with the cloud in mind: each component (EC2 instance) should be able to die without affecting the whole system.
- Netflix uses Chaos Monkey (a set of scripts) that runs through AWS processes and occasionally shuts them down to ensure that the rest of the system is able to keep running. Netflix also uses Amazon's cloud redundant backup infrastructure.
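The Chaos Monkey idea can be sketched in a few lines. This is a toy simulation, not Netflix's tool: the `Cluster` class, the `chaos_round` function, and the instance IDs are hypothetical, and the "cluster" is just an in-memory set.

```python
import random


class Cluster:
    """Toy stand-in for a group of redundant instances (hypothetical)."""

    def __init__(self, instance_ids):
        self.alive = set(instance_ids)

    def terminate(self, instance_id):
        self.alive.discard(instance_id)

    def handle_request(self):
        # Any surviving instance can serve the request.
        if not self.alive:
            raise RuntimeError("no healthy instances left")
        return sorted(self.alive)[0]


def chaos_round(cluster, rng):
    """Terminate one randomly chosen live instance and return its ID."""
    victim = rng.choice(sorted(cluster.alive))
    cluster.terminate(victim)
    return victim


cluster = Cluster(["i-1", "i-2", "i-3"])
victim = chaos_round(cluster, random.Random(0))
served_by = cluster.handle_request()  # still succeeds with two survivors
```

The point of the exercise is the check after the kill: if `handle_request` fails once a single instance is gone, that component is a single point of failure.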
16. Pay Model: Free Tier
17. Pay Model: On-Demand Instances
18. Pay Model: Reserved Instances
19. So What's So Amazing in Amazon EC2?
- Elastic
- Completely Controlled
- Flexible
- Reliable
- Secure
- Designed for use with other web services
- Inexpensive
20. Case Study 2: Google App Engine
21. Overview
- Run your application on Google infrastructure.
- Build your app using
- A JVM-based interpreter or compiler
- Python
- Go
- Applications built on Google infrastructure are easy to build, maintain, and scale.
- Users can have the app served from the free appspot.com domain or from their own domain name.
- The starting package is free:
- 10 applications
- 500 MB storage
- 5 million page views per month
22. High Level Architecture
Source: http://www.byteonic.com/2009/why-java-is-a-better-choice-than-using-python-on-google-app-engine/
23. How does it work?
- Dynamic web serving
- Persistent storage
- Automatic scaling and load balancing
- APIs for user authentication and sending email
- Fully featured local development environment.
- Task queues
- Scheduled tasks
- Secure Environment
- Sandbox
- The sandbox isolates the application from the operating system, hardware, and physical location of the server in a secure and reliable way.
- This makes load balancing easy.
24. DataStore
- A powerful distributed data storage service.
- Grows with the amount of traffic.
- Stores data objects as entities. An entity can have one or more properties of different types.
- Creates, updates, and deletes happen in a transaction.
- An entity can also belong to an entity group; entity groups are defined by a hierarchy of relationships between entities.
- Uses optimistic concurrency.
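Optimistic concurrency can be illustrated with a version check on write. The sketch below is a generic in-memory example, not the App Engine Datastore API; `VersionedStore`, `ConflictError`, and `increment` are hypothetical names.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale version."""


class VersionedStore:
    """In-memory store that keeps a version number per key (hypothetical)."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            # Someone else committed first; the caller must re-read and retry.
            raise ConflictError(key)
        self._data[key] = (current_version + 1, value)


def increment(store, key, retries=3):
    """Read-modify-write without locks, retrying on conflict."""
    for _ in range(retries):
        version, value = store.read(key)
        try:
            store.write(key, version, (value or 0) + 1)
            return
        except ConflictError:
            continue
    raise ConflictError(key)


store = VersionedStore()
increment(store, "counter")
increment(store, "counter")
```

No lock is held between the read and the write; a conflicting writer is detected only at commit time, which is why the caller needs a retry loop.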
25. Types of Datastore
- High Replication datastore
- Synchronous
- Highly available and reliable
- Remains available for reads and writes during planned downtime
- Data is replicated using the Paxos algorithm.
- Three times as expensive as Master/Slave
- Master/Slave datastore
- Asynchronous
- One datacenter is the master for write queries at any given time, and therefore offers strong consistency.
26. Services
- Memcache
- When to use?
- Speed up common datastore queries
- Session data, user preferences, and frequently performed queries
- When not to use?
- Values can expire unexpectedly from the cache. Make sure that your application runs normally if a value is suddenly not available.
- Quota
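The advice about unexpected expiry amounts to treating the cache as optional. A minimal sketch follows, assuming a hypothetical `SimpleCache` class and a plain dictionary standing in for the datastore; it is not the App Engine Memcache API.

```python
class SimpleCache:
    """Tiny in-memory cache whose entries may vanish at any time (hypothetical)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)  # None means "not cached"

    def set(self, key, value):
        self._items[key] = value

    def evict(self, key):
        self._items.pop(key, None)


def get_user_prefs(cache, datastore, user_id):
    """Serve from the cache when possible, fall back to the datastore."""
    key = "prefs:%s" % user_id
    prefs = cache.get(key)
    if prefs is None:
        prefs = datastore[user_id]   # slower, authoritative read
        cache.set(key, prefs)        # repopulate for later requests
    return prefs


datastore = {"u1": {"theme": "dark"}}
cache = SimpleCache()
first = get_user_prefs(cache, datastore, "u1")
cache.evict("prefs:u1")              # simulate an unexpected expiry
second = get_user_prefs(cache, datastore, "u1")
```

Because every read has a fallback path, losing a cached value costs only latency, never correctness.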
27. URL Fetch
- Communicate with other hosts using HTTP or HTTPS requests.
- The URL to be fetched can use any port in the ranges 80-90, 440-450, and 1024-65535.
- Fetch can use any of GET, POST, PUT, HEAD, and DELETE.
- A request handler cannot call its own URL.
- The default deadline for a URL Fetch response is 5 seconds; the maximum is 10 seconds for online requests and 10 minutes for offline requests.
- Supports both synchronous and asynchronous requests.
- Quota
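The port and method rules listed above can be written as a small validation function. This is a sketch of the stated constraints only; `is_fetch_allowed` is a hypothetical helper, not part of the URL Fetch service.

```python
from urllib.parse import urlsplit

# Limits taken from the slide; names are assumptions for this example.
ALLOWED_PORT_RANGES = ((80, 90), (440, 450), (1024, 65535))
ALLOWED_METHODS = frozenset({"GET", "POST", "PUT", "HEAD", "DELETE"})
DEFAULT_PORTS = {"http": 80, "https": 443}


def is_fetch_allowed(method, url):
    """Return True if the method, scheme, and port satisfy the listed rules."""
    if method.upper() not in ALLOWED_METHODS:
        return False
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        return False
    # Fall back to the scheme's default port when none is given.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return any(low <= port <= high for low, high in ALLOWED_PORT_RANGES)
```

For example, `is_fetch_allowed("GET", "https://example.com/")` passes because the default HTTPS port 443 falls in the 440-450 range, while a URL on port 22 is rejected.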
28. Mail
- Sending emails
- The message to be sent is queued and the call returns immediately.
- The Mail service contacts each recipient's mail server, delivers the message, and retries if the server is unavailable.
- If the Mail service fails to send the message, an error message is sent to the sender's address.
- Receiving emails
- Receives emails addressed to string@appid.appspotmail.com
- Received as HTTP requests
- Quota
29. BlobStore
- Allows the app to serve data objects that can be up to 2 gigabytes in size.
- Useful for serving large files, e.g., video or image files, or for allowing users to upload large files.
- Blobs cannot be modified once they are created.
- Quota
30. Capabilities and Images
- Capabilities
- Detect outages and scheduled downtime.
- Reduce downtime by detecting whether a capability is available.
- Images
- Manipulate images (rotate, crop, resize) using the Images service.
- Supports JPEG, PNG, GIF, BMP, TIFF, and ICO formats.
31. Channel
- Creates a persistent connection between the application and Google servers.
Source: http://code.google.com/appengine/docs/java/channel/overview.html
32. Channel Quota
33. OAuth
- A protocol that allows a user to grant a third party limited permission without giving his or her username or password to the third party.
- Steps between the user and the service provider:
- The consumer calls a web service to get a request token for the app.
- The user's browser is redirected to the authentication URL; the user signs in and tells Google Accounts that the consumer is authorized to access the service on the user's behalf.
- The consumer calls a web service to get an access token.
- The consumer is now authorized to call the service.
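The token steps above can be traced with a small in-memory simulation. It is a simplification for illustration only: the `ServiceProvider` class and its method names are hypothetical, and real OAuth additionally involves consumer keys, request signing, and browser redirects, all omitted here.

```python
import secrets


class ServiceProvider:
    """Toy provider that tracks request and access tokens (hypothetical)."""

    def __init__(self):
        self._request_tokens = {}   # request token -> approved by the user?
        self._access_tokens = set()

    def issue_request_token(self):
        # Step 1: the consumer asks for a request token.
        token = secrets.token_hex(8)
        self._request_tokens[token] = False
        return token

    def authorize(self, request_token):
        # Step 2: the user signs in at the provider and approves the consumer.
        if request_token not in self._request_tokens:
            raise KeyError("unknown request token")
        self._request_tokens[request_token] = True

    def exchange_for_access_token(self, request_token):
        # Step 3: an approved request token is swapped for an access token.
        if not self._request_tokens.get(request_token, False):
            raise PermissionError("request token not authorized by the user")
        del self._request_tokens[request_token]
        access_token = secrets.token_hex(8)
        self._access_tokens.add(access_token)
        return access_token

    def call_service(self, access_token):
        # Step 4: the access token grants access to the protected service.
        if access_token not in self._access_tokens:
            raise PermissionError("invalid access token")
        return "protected resource"
```

Note that the consumer never sees the user's password: the only credentials it handles are the two tokens.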
34. Task Queues
- Apps can perform work outside of user requests, e.g., background work. Task queues are an efficient and powerful tool for background processing.
- Push Queues
- Configure a queue and add tasks to it. App Engine takes care of the rest.
- Easy to implement but restricted to use within App Engine.
- Pull Queues
- The best choice if a different system consumes the tasks.
- The task consumer leases a specific number of tasks from the queue and is responsible for deleting them afterwards.
- Gives more flexibility and control over when and where tasks are processed.
- Quota
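The lease-then-delete contract of pull queues can be sketched with an in-memory queue. `PullQueue` and its methods are hypothetical and only mimic the behavior described above; they are not the App Engine Task Queue API. The `now` parameter is an assumption added so the example is deterministic.

```python
import time


class PullQueue:
    """In-memory pull queue with time-limited leases (hypothetical)."""

    def __init__(self):
        self._tasks = {}    # task_id -> [payload, lease_expiry_time]
        self._next_id = 0

    def add(self, payload):
        self._next_id += 1
        self._tasks[self._next_id] = [payload, 0.0]
        return self._next_id

    def lease(self, max_tasks, lease_seconds, now=None):
        """Hand out up to max_tasks tasks that are not currently leased."""
        now = time.time() if now is None else now
        leased = []
        for task_id, entry in self._tasks.items():
            if len(leased) == max_tasks:
                break
            if entry[1] <= now:              # unleased, or lease has expired
                entry[1] = now + lease_seconds
                leased.append((task_id, entry[0]))
        return leased

    def delete(self, task_id):
        """The consumer deletes a task once it has finished processing it."""
        self._tasks.pop(task_id, None)


queue = PullQueue()
for name in ("a", "b", "c"):
    queue.add(name)
first = queue.lease(2, 60, now=100.0)    # leases "a" and "b"
queue.delete(first[0][0])                # "a" is done; "b" is never deleted
second = queue.lease(5, 60, now=110.0)   # only "c" is free
third = queue.lease(5, 60, now=200.0)    # leases on "b" and "c" have expired
```

A task that is leased but never deleted becomes available again after its lease expires, so a crashed consumer does not lose work.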
35. Users, Multitenancy and XMPP
- Authenticate users with
- Google Accounts
- Google Apps domain
- OpenID
- Multitenancy: one instance of an application serves many clients.
- XMPP: Send and receive messages to and from any XMPP-compatible chat service, e.g., Google Talk.
- XMPP quota
36. Billing Model
Source: http://code.google.com/appengine/docs/billing.html
37. Case Study 3: Microsoft Azure
38. Microsoft Azure Platform
- The Windows Azure platform is a simple, reliable, and powerful Microsoft platform for creating cloud applications, online services, and websites.
- Core products
- Windows Azure
- SQL Azure
- Windows Azure Platform AppFabric
39. Windows Azure
- Windows Azure is the operating system that helps developers build, host, and scale applications through Microsoft datacenters.
- Applications run in Internet-accessible data centers.
- Data is stored on machines in an Internet-accessible data center.
40. Windows Azure apps run in data centers accessed via the Internet
Source: White paper, "Introducing Windows Azure"
41. Main Components of Windows Azure
Source: White paper, "Introducing Windows Azure"
42. Windows Azure Components
- Compute: Runs apps in the cloud.
- Storage: Stores data in the cloud (blobs, tables, queues).
- Fabric Controller: Deploys, monitors, and manages the apps in the cloud.
- Content Delivery Network: Provides faster access to stored data by maintaining cached copies of it.
- Connect: Allows connections between on-premises computers and Windows Azure applications.
43. Application Roles
- An application can have one or more instances of each of these roles.
- Web Roles: Make it easy to host web-based applications. A web role has IIS configured within it and acts as the front end.
- So creating WCF and ASP.NET apps is easy.
- Worker Roles: For Windows-based code. A worker role does not have IIS configured. It handles processing such as user interactions, video processing, etc.
- When a user submits a request for some task, that task can involve front-end and back-end work. The web role takes care of the front-end work and hands the processing over to the worker role.
- VM Roles: Help move Windows Server apps to Windows Azure.
44. Submitting App to Azure
- The developer submits
- App
- Config: tells the platform how many instances of each role (web, worker) to run.
- Fabric Controller: based on the config file, creates a VM for each instance.
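The relationship between the config and the VMs that get created can be shown with a short sketch. The dictionary format, the `plan_vms` function, and the VM naming scheme are assumptions made for illustration; they do not reflect the actual Azure configuration file or Fabric Controller behavior.

```python
def plan_vms(config):
    """Return one VM name per requested role instance.

    config maps a role name to its instance count, e.g. {"web": 2}.
    """
    vms = []
    for role, count in sorted(config.items()):
        if count < 0:
            raise ValueError("instance count must not be negative: %s" % role)
        for index in range(count):
            vms.append("%s-%d" % (role, index))
    return vms


# Two web role instances and one worker role instance give three VMs.
planned = plan_vms({"web": 2, "worker": 1})
```

Changing an instance count in the config is therefore the way to scale a role up or down.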
45. Setting up a simple Azure Application
- http://www.microsoft.com/windowsazure/getstarted/
46. Browser Output
47. Sample Application to store file data to Azure
- Create a simple text file, myfile.txt.
- Create a console application in VS2010.
- Reference the Microsoft Azure storage DLL.
- Create a blob (blobs are used for storage; applications can interact with them as if they were local files).
- Set the file reference to the blob.
- Upload the file from local disk to the blob.
- Get the URI of the blob.
- Now we can access the data through this URI.
48. Current Problems/Limitations of Cloud
- EC2 Limitations
- Not easy to recover if something goes wrong after creating an instance.
- Random loss of instances
- Server configuration woes: configuring, running, and monitoring EC2 instances
- Azure Limitations
- Azure provides application-level cloud computing, not infrastructure-level cloud computing like Amazon. Users can only select applications, with no choice of OS.
- Security concerns: we cannot be sure who has access to the data.
- Learning curve: working with storage such as blobs, tables, and queues needs some experience.
- Poor debugging and logging support for deployed applications
- Untested compared to Google's and Amazon's offerings.
- App Engine Limitations
- Non-ancestor queries in the High Replication datastore can return stale results.
- With the Master/Slave datastore, data may be unavailable during planned downtime or failures.
49. Summary
- Cloud Computing: Introduction, Benefits, Cloud Service Models
- Case study 1: Amazon EC2
- EC2 Architecture
- EC2 Core Features and Functionality
- Demo of Launching an EC2 Instance
- April 2011 Failure Analysis and Lessons Learned
- Different Pay Models
- Highlights
- Case study 2: Google App Engine
- Overview
- High Level Architecture
- Data Stores
- High Replication
- Master/Slave
- Services and Quotas
- Billing Model
50. Summary
- Case study 3: Microsoft Azure
- Windows Azure Definition
- Main Components
- Roles
- Web Roles
- Worker Roles
- VM Roles
- Sample Applications
- Hello World!
- Accessing file data.
- Limitations
51. References
- Google
- http://www.microsoft.com/windowsazure
- http://www.jackofallclouds.com/
- http://aws.amazon.com/ec2/
- http://cloud-computing.learningtree.com/tag/amazon-ec2/
- White Paper on AWS Cloud Best Practices, 2010, by Jinesh Varia
- White Paper on Amazon EC2 on Red Hat Enterprise Linux
- http://www.microsoft.com/en-us/cloud/developer/
- http://www.microsoft.com/en-us/cloud/developer/resource.aspx?resourceId=introducing-windows-azure&fbid=kRj7B2TdjLB
- http://www.microsoft.com/windowsazure/getstarted/
- http://www.microsoft.com/windowsazure/sdk/
- http://kasunpanorama.blogspot.com/2010/07/understanding-cloud-computing-feel-easy.html
- http://code.google.com/appengine/docs/
- Google App Engine paper by Alexander Zahariev, Helsinki University of Technology
52. Team members' contributions
- Pooja: Amazon EC2
- Navi: Google App Engine
- Sangeetha: Microsoft Azure
53. Thanks