Cloud Computing Skepticism - PowerPoint PPT Presentation



Slides: 70

Provided by: cyb42


1
Cloud Computing Skepticism

Abhishek Verma, Saurabh Nangia

2
Outline

Cloud computing hype
Cynicism
MapReduce vs. Parallel DBMS
Cost of a cloud
Discussion

3
Recent Trends
Amazon S3 (March 2006)
Amazon EC2 (August 2006)
Salesforce AppExchange (March 2006)
Google App Engine (April 2008)
Microsoft Azure (Oct 2008)
Facebook Platform (May 2007)
4
Tremendous Buzz
5
Gartner Hype Cycle
From http://en.wikipedia.org/wiki/Hype_cycle
6
Blind men and an Elephant
7

Cloud computing is simply a buzzword used to
repackage grid computing and utility computing,
both of which have existed for decades.

whatis.com Definition of Cloud Computing
8

The interesting thing about cloud computing is
that we've redefined cloud computing to include
everything that we already do.
The computer industry is the only industry that
is more fashion-driven than women's fashion.
Maybe I'm an idiot, but I have no idea what
anyone is talking about. What is it? It's
complete gibberish. It's insane. When is this
idiocy going to stop?

Larry Ellison, during Oracle's Analyst Day
From http://blogs.wsj.com/biztech/2008/09/25/larry-ellisons-brilliant-anti-cloud-computing-rant/
9
From http://geekandpoke.typepad.com
10
Reliability

Many enterprises (necessarily or unnecessarily)
set their SLA uptimes at 99.99% or higher, which
cloud providers have not yet been prepared to
match

Amazon's cloud outages receive a lot of exposure
July 20, 2008: failure due to stranded zombies, lasts 5 hours
Feb 15, 2008: authentication overload leads to two-hour service outage
October 2007: service failure lasts two days
October 2006: security breach where users could see other users' data
and their current SLAs don't match those of enterprises
Amazon EC2: 99.95%; Amazon S3: 99.9%

Not clear that all applications require such
high service levels
IT shops do not always deliver on their SLAs,
but their failures are less public and customers
can't switch easily

SLAs expressed in monthly uptime percentages
Source: McKinsey & Company
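To make the SLA percentages on this slide concrete, the sketch below converts a monthly uptime percentage into the downtime it permits. It assumes a 30-day month (43,200 minutes); real SLAs define their own measurement windows and exclusions, so the figures are approximate.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month: 43,200 minutes


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Downtime per 30-day month permitted by a monthly uptime percentage."""
    return MINUTES_PER_MONTH * (100.0 - uptime_percent) / 100.0


slas = [("Enterprise target", 99.99), ("Amazon EC2", 99.95), ("Amazon S3", 99.9)]
for label, uptime in slas:
    print(f"{label:<18} {uptime:6.2f}% -> {allowed_downtime_minutes(uptime):5.1f} min/month")
```

Under these assumptions, 99.99% leaves about 4.3 minutes per month, EC2's 99.95% about 21.6 minutes and S3's 99.9% about 43.2 minutes; a single multi-hour outage like those listed exceeds all three budgets.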
11
A Comparison of Approaches to Large-Scale Data
Analysis

Andrew Pavlo, Erik Paulson, Alexander Rasin,
Daniel J. Abadi, David J. DeWitt, Samuel Madden,
Michael Stonebraker
To appear in SIGMOD '09

Basic ideas from "MapReduce: A major step
backwards", D. DeWitt and M. Stonebraker
12
MapReduce: A major step backwards

A giant step backward
No schemas, Codasyl instead of Relational
A sub-optimal implementation
Uses brute force sequential search, instead of
indexing
Materializes O(M × R) intermediate files (M map
tasks, R reduce tasks)
Does not handle data skew
Not novel at all
Represents a specific implementation of well-known
techniques developed nearly 25 years ago
Missing most of the common current DBMS features
Bulk loader, indexing, updates, transactions,
integrity constraints, referential integrity,
views
Incompatible with DBMS tools
Report writers, business intelligence tools, data
mining tools, replication tools, database design
tools
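The "O(M × R) intermediate files" criticism refers to the shuffle: each of the M map tasks splits its output into R buckets, one per reduce task, and writes each bucket to local disk; every reducer then pulls its bucket from all M map outputs. The sketch below simulates this in memory for a word count with hash partitioning (the usual default, e.g. Hadoop's HashPartitioner). The function names and in-memory "files" are illustrative assumptions, not Hadoop code.

```python
from zlib import crc32


def partition(key: str, num_reducers: int) -> int:
    """Hash partitioning: a given key always maps to the same reduce task."""
    return crc32(key.encode("utf-8")) % num_reducers


def run_map_phase(splits, num_reducers):
    """Simulate the map side of a word count.

    Returns {(map_id, reduce_id): [(word, 1), ...]}: one intermediate "file"
    per (map task, reduce task) pair, so M * R files in total.
    """
    files = {(m, r): [] for m in range(len(splits)) for r in range(num_reducers)}
    for map_id, split in enumerate(splits):
        for word in split.split():
            files[(map_id, partition(word, num_reducers))].append((word, 1))
    return files


def reducer_inputs(files, reduce_id):
    # Each reduce task must pull its bucket from every one of the M map outputs.
    return [pair for (m, r), pairs in sorted(files.items()) if r == reduce_id for pair in pairs]


splits = ["a b c", "b c d", "c d e", "d e f"]  # M = 4 input splits
files = run_map_phase(splits, num_reducers=3)  # R = 3 reduce tasks
print(len(files))  # 12 = 4 * 3
```

Doubling both M and R quadruples the number of intermediate files, which is the scaling concern the slide points at.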

13
Architectural Element | Parallel Databases | MapReduce
Schema Support | Structured | Unstructured
Indexing | B-Trees or hash-based | None
Programming Model | Relational | Codasyl
Data Distribution | Projections before aggregation | Logic moved to data, but no optimizations
Execution Strategy | Push | Pull
Flexibility | No, but Ruby on Rails, LINQ | Yes
Fault Tolerance | Transactions have to be restarted in the event of a failure | Yes: replication, speculative execution
14
MapReduce II

MapReduce didn't kill our dog, steal our car, or
try and date our daughters.
MapReduce is not a database system, so don't
judge it as one
Both analyze and perform computations on huge
datasets
MapReduce has excellent scalability; the proof is
Google's use
Does it scale linearly?
No scientific evidence
MapReduce is cheap and databases are expensive
We are the old guard trying to defend our
turf/legacy from the young turks
Propagation of ideas between sub-disciplines is
very slow and sketchy
Very little information is passed from generation
to generation

http://www.databasecolumn.com/2008/01/mapreduce-continued.html
15
Tested Systems

Hadoop
0.19 on Java 1.6, 256MB block size, JVM reuse
Rack-awareness enabled
DBMS-X (unnamed)
Parallel DBMS from a major relational db vendor
Row based, compression enabled
Vertica (co-founded by Stonebraker)
Column oriented
Hardware configuration: 100 nodes
2.4 GHz Intel Core 2 Duo
4GB RAM, 2 × 250GB SATA hard disks
GigE ports, 128Gbps switching fabric

16
Data Loading

Hadoop
Command line utility
DBMS-X
LOAD SQL command
Administrative command to re-organize data

Grep Dataset
Record: 10-byte key + 90-byte random value
5.6 million records = 535MB/node
Another set: 1TB/cluster
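The grep records are fixed-width, 100 bytes each (10-byte key + 90-byte value), so the per-node size follows directly from the record count. The sketch below generates records in that layout and checks the arithmetic; the alphanumeric character set and the generator itself are assumptions, not the benchmark's actual data generator.

```python
import random
import string

KEY_BYTES, VALUE_BYTES = 10, 90
RECORD_BYTES = KEY_BYTES + VALUE_BYTES  # 100 bytes per record
ALPHABET = string.ascii_letters + string.digits  # assumed character set


def make_record(rng: random.Random) -> bytes:
    """One fixed-width record: 10-byte random key followed by a 90-byte random value."""
    key = "".join(rng.choices(ALPHABET, k=KEY_BYTES))
    value = "".join(rng.choices(ALPHABET, k=VALUE_BYTES))
    return (key + value).encode("ascii")


rng = random.Random(42)  # fixed seed so the sample is reproducible
sample = [make_record(rng) for _ in range(3)]
print([len(r) for r in sample])  # [100, 100, 100]

records_per_node = 5_600_000
print(f"{records_per_node * RECORD_BYTES / 2**20:.0f} MiB per node")  # 534 MiB per node
```

5.6 million × 100 bytes is 560,000,000 bytes, about 534 MiB, in line with the slide's ~535MB per node.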

17
Grep Task Results
SELECT * FROM Data WHERE field LIKE '%XYZ%';
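On the MapReduce side, the same selection is a map-only job: each map call inspects one record and emits it if the value contains the pattern, with no shuffle or reduce phase. A minimal Python sketch of that map function, assuming the 10-byte key / 90-byte value layout from the Data Loading slide (Hadoop's version would be Java; the function names here are hypothetical):

```python
from typing import Iterable, Iterator, List, Tuple


def grep_map(key: bytes, value: bytes, pattern: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Map function: emit the record unchanged if the pattern occurs in its value."""
    if pattern in value:
        yield key, value


def run_map_only_job(records: Iterable[bytes], pattern: bytes) -> List[Tuple[bytes, bytes]]:
    # Split each fixed-width record into key and value, then apply the map function.
    out: List[Tuple[bytes, bytes]] = []
    for rec in records:
        out.extend(grep_map(rec[:10], rec[10:], pattern))
    return out


records = [
    b"key0000001" + b"x" * 40 + b"XYZ" + b"y" * 47,  # contains the pattern
    b"key0000002" + b"z" * 90,                       # does not
]
print(run_map_only_job(records, b"XYZ"))  # one match, from key0000001
```

Like the SQL version with a leading-wildcard LIKE, this scans every record; neither side can use an index for a substring match.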
18
Select Task Results
SELECT pageURL, pageRank FROM Rankings WHERE
pageRank > X;
19
Join Task
SELECT INTO Temp sourceIP, AVG(pageRank) as avgPageRank, SUM(adRevenue) as totalRevenue
FROM Rankings AS R, UserVisits AS UV
WHERE R.pageURL = UV.destURL
AND UV.visitDate BETWEEN Date('2000-01-15') AND Date('2000-01-22')
GROUP BY UV.sourceIP;
SELECT sourceIP, totalRevenue, avgPageRank
FROM Temp
ORDER BY totalRevenue DESC LIMIT 1;
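To spell out what the join query computes, the sketch below evaluates the same logic over small in-memory tables: filter UserVisits to the date window, join with Rankings on pageURL = destURL, group by sourceIP, then return the group with the highest total revenue. The sample rows are made up for illustration; only the column names come from the query, and pageURL is assumed to be unique in Rankings.

```python
from collections import defaultdict
from datetime import date

rankings = {"a.com": 10, "b.com": 30}  # pageURL -> pageRank (hypothetical rows)
user_visits = [  # (sourceIP, destURL, visitDate, adRevenue), hypothetical rows
    ("1.1.1.1", "a.com", date(2000, 1, 16), 5.0),
    ("1.1.1.1", "b.com", date(2000, 1, 20), 2.0),
    ("2.2.2.2", "b.com", date(2000, 1, 18), 4.0),
    ("2.2.2.2", "a.com", date(2000, 2, 1), 99.0),  # outside the window, filtered out
]


def join_task(rankings, user_visits, start, end):
    groups = defaultdict(lambda: {"ranks": [], "revenue": 0.0})
    for source_ip, dest_url, visit_date, ad_revenue in user_visits:
        # WHERE visitDate BETWEEN start AND end (inclusive) AND R.pageURL = UV.destURL
        if start <= visit_date <= end and dest_url in rankings:
            g = groups[source_ip]  # GROUP BY UV.sourceIP
            g["ranks"].append(rankings[dest_url])
            g["revenue"] += ad_revenue
    # The Temp table: (sourceIP, totalRevenue, avgPageRank)
    temp = [(ip, g["revenue"], sum(g["ranks"]) / len(g["ranks"])) for ip, g in groups.items()]
    # ORDER BY totalRevenue DESC LIMIT 1
    return max(temp, key=lambda row: row[1]) if temp else None


print(join_task(rankings, user_visits, date(2000, 1, 15), date(2000, 1, 22)))
# ('1.1.1.1', 7.0, 20.0)
```

In Hadoop the same pipeline typically needs several MapReduce phases (filter and join, aggregate, then find the maximum), whereas the DBMS plans it from the two declarative statements.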
20
Concluding Remarks

DBMS-X was 3.2 times faster than Hadoop, and Vertica
was 2.3 times faster than DBMS-X (averaged across all
five tasks at 100 nodes)
Parallel DBMSs win because of
B-tree indices to speed the execution of
selection operations,
novel storage mechanisms (e.g.,
column-orientation)
aggressive compression techniques with ability to
operate directly on compressed data
sophisticated parallel algorithms for querying
large amounts of relational data.
Ease of installation and use
Fault tolerance?
Loading data?

21
The Cost of a Cloud: Research Problems in Data
Center Networks

Albert Greenberg, James Hamilton, David A. Maltz,
Parveen Patel
MSR Redmond

Presented by Saurabh Nangia
22
Overview

Cost of cloud service
Improving low utilization
Network agility
Incentive for resource consumption
Geo-distributed network of DC

23
Cost of a Cloud?

Where does the cost go in today's cloud service
data centers?

24
Cost of a Cloud
Amortized Costs (one-time purchases amortized
over reasonable lifetimes, assuming 5% cost of
money)
~45% Servers
~25% Infrastructure
~15% Power draw
~15% Network
25
Are Clouds any different?

Can existing solutions for the enterprise data
center work for cloud service data centers?

26
Enterprise DC vs Cloud DC (1)

In enterprise
Leading cost: operational staff
Automation is partial
IT staff : servers = 1:100
In cloud
Staff costs under 5%
Automation is mandatory
IT staff : servers = 1:1000

27
Enterprise DC vs Cloud DC (2)

Large economies of scale
Cloud DCs leverage economies of scale
But up-front costs are high
Scale Out
Enterprise DCs scale up
Cloud DCs scale out

28
Types of Cloud Service DC (1)

Mega data centers
Tens of thousands (or more) servers
Drawing tens of Mega-Watts of power (at peak)
Massive data analysis applications
Huge RAM, Massive CPU cycles, Disk I/O operations
Advantages
Cloud services applications build on one another
Eases system design
Lowers cost of communication needs

29
Types of Cloud Service DC (2)

Micro data centers
Thousands of servers
Drawing power peaking in 100s of Kilo-Watts
Highly interactive applications
Query/response, office productivity
Advantages
Used as nodes in content distribution network
Minimize speed-of-light latency
Minimize network transit costs to user

30
Cost Breakdown
31
Server Cost (1)

Example
50,000 servers
$3,000 per server
5% cost of money
3-year amortization
Amortized cost = 50,000 × $3,000 × 1.05 / 3
= $52.5 million per year!!
Utilization remarkably low: ~10%
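The slide's server arithmetic can be reproduced directly. The helper below follows the slide's simplified formula (principal × (1 + cost of money) / years) rather than a full loan-amortization schedule; the function name and the "cost per utilized server" framing are illustrative assumptions.

```python
def slide_amortized_cost(principal: float, cost_of_money: float, years: int) -> float:
    """Yearly cost as computed on the slide: principal * (1 + cost of money) / years."""
    return principal * (1 + cost_of_money) / years


SERVERS, PRICE = 50_000, 3_000
yearly = slide_amortized_cost(SERVERS * PRICE, 0.05, 3)
print(f"${yearly / 1e6:.1f}M per year")  # $52.5M per year

# At ~10% utilization, each server's worth of useful work effectively costs ~10x more.
UTILIZATION = 0.10
print(f"${yearly / (SERVERS * UTILIZATION):,.0f} per fully utilized server-equivalent per year")
```

At full utilization each server costs about $1,050 per year; at ~10% the same useful work costs about $10,500, which is why low utilization dominates the server cost discussion.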

32
Server Cost (2)

Uneven Application Fit
Uncertainty in demand forecasts
Long provisioning time scales
Risk Management
Hoarding
Virtualization shortfalls

33
Reducing Server Cost

Solution: agility
to dynamically grow and shrink resources to meet
demand, and
to draw those resources from the optimal
location.
Barrier: the network
Increases fragmentation of resources
Therefore, low server utilization

34
Infrastructure Cost

Infrastructure is overhead of Cloud DC
Facilities dedicated to
Consistent power delivery
Evacuating heat
Large scale generators, transformers, UPS
Amortized cost: $18.4 million per year!!
Infrastructure cost: $200M
5% cost of money
15-year amortization

35
Reducing Infrastructure Cost

Reason for high cost: the requirement to deliver
consistent power
Relaxing the requirement implies scaling out
Deploy larger numbers of smaller data centers
Resilience at the data center level
Layers of redundancy within a data center can be
stripped out (no UPS or generators)
Geo-diverse deployment of micro data centers

36
Power

Power Usage Effectiveness (PUE)
PUE = (Total Facility Power)/(IT Equipment Power)
Typical PUE ≈ 1.7
Inefficient facilities: PUE of 2.0 to 3.0
Leading facilities: PUE of 1.2
Amortized cost: $9.3 million per year!!
PUE = 1.7
$0.07 per kWh
50,000 servers, each drawing 180W on average
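The power figure can be roughly reproduced from the slide's own inputs. This sketch assumes the 180W average draw holds for all 8,760 hours of a year and that PUE scales IT power up to total facility power; the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760


def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """PUE = total facility power / IT equipment power."""
    return total_facility_kw / it_equipment_kw


def annual_power_cost(servers: int, watts_per_server: float, pue_value: float,
                      dollars_per_kwh: float) -> float:
    it_kw = servers * watts_per_server / 1000  # IT load in kW
    facility_kw = it_kw * pue_value            # adds cooling, power distribution losses, etc.
    return facility_kw * HOURS_PER_YEAR * dollars_per_kwh


cost = annual_power_cost(50_000, 180, 1.7, 0.07)
print(f"${cost / 1e6:.2f}M per year")  # $9.38M per year
```

The result, about $9.4 million, is close to the slide's $9.3 million; the source's exact assumptions behind its figure are not shown. Note that lowering PUE from 1.7 to a leading facility's 1.2 would cut the bill proportionally, by roughly 30%.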

37
Reducing Power Costs

Decreasing power cost -> decreases infrastructure
cost
Goal: energy proportionality
A server running at N% load consumes N% power
Hardware innovation
High-efficiency power supplies
Voltage regulation modules
Reduce amount of cooling for data center
Equipment failure rates increase with temperature
Make network more mesh-like & resilient
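Energy proportionality can be contrasted with a common simplified server power model in which power rises linearly from a fixed idle floor to a peak. The idle and peak wattages below are hypothetical, chosen only to show why low utilization is costly when servers are not proportional.

```python
def linear_power(util: float, idle_w: float, peak_w: float) -> float:
    """Non-proportional server: fixed idle draw plus a load-dependent part (util in [0, 1])."""
    return idle_w + (peak_w - idle_w) * util


def proportional_power(util: float, peak_w: float) -> float:
    """Ideal energy-proportional server: N% load draws N% of peak power."""
    return peak_w * util


IDLE_W, PEAK_W = 150.0, 250.0  # hypothetical figures for illustration only
for util in (0.1, 0.5, 1.0):
    print(f"{util:4.0%} load: linear {linear_power(util, IDLE_W, PEAK_W):5.0f} W, "
          f"proportional {proportional_power(util, PEAK_W):5.0f} W")
```

With these numbers a server at 10% load still draws 160 W instead of 25 W; the two models only agree at full load.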

38
Network

Capital cost of networking gear
Switches, routers and load balancers
Wide area networking
Peering: traffic handed off to ISPs for end users
Inter-data center links between geo-distributed DCs
Regional facilities (backhaul, metro-area
connectivity, co-location space) to reach
interconnection sites
Back-of-the-envelope calculations difficult

39
Reducing Network Costs

Sensitive to site selection & industry dynamics
Solution
Clever design of peering & transit strategies
Optimal placement of micro & mega DCs
Better design of services (partitioning state)
Better data partitioning & replication

40
Perspective

On is better than off
Servers should be engaged in revenue production
Challenge: agility
Build in resilience at systems level
Stripping out layers of redundancy inside each
DC, and instead using other DCs to mask DC failure
Challenge: systems software & network research

41
Cost of Large Scale DC
http://perspectives.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx
42
Solutions!
43
Improving DC efficiency

Increasing Network Agility
Appropriate incentives to shape resource
consumption
Joint optimization of Network & DC resources
New mechanisms for geo-distributing states

44
Agility

Any server can be dynamically assigned to any
service anywhere in DC
Conventional DC
Fragment network & server capacity
Limit dynamic growing and shrinking of server pools

45
Networking in Current DC

DC network: two types of traffic
Between external end systems & internal servers
Between internal servers
Load Balancer
Virtual IP address (VIP)
Direct IP address (DIP)

46
Conventional Network Architecture
47
Problems (1)

Static Network Assignment
Individual applications mapped to specific
physical switches & routers
Adv: performance & security isolation
Disadv: works against agility
Policy-overloaded (traffic, security,
performance)
VLAN spanning concentrates traffic on links high
in tree

48
Problems (2)

Load Balancing Techniques
Destination NAT
All DIPs in a VIP's pool must be in the same layer 2
domain
Under-utilization & fragmentation
Source NAT
Servers spread across layer 2 domains
But server never sees the client IP
Client IP required for data mining & response
customization
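The difference between the two load-balancing modes is which packet header fields the load balancer rewrites. In this simplified sketch (the addresses, the hash-based DIP choice, and the packet representation are all hypothetical), destination NAT rewrites only the destination VIP to a DIP, so the server still sees the client's address; source NAT also rewrites the source to the load balancer's own address, which is why the server never sees the client IP.

```python
from dataclasses import dataclass, replace
from zlib import crc32


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str


VIP = "203.0.113.10"    # public virtual IP (example address)
LB_IP = "10.0.0.1"      # load balancer's own internal address (example)
DIP_POOL = ["10.0.1.11", "10.0.1.12", "10.0.1.13"]  # direct IPs behind the VIP


def pick_dip(pkt: Packet) -> str:
    # Hash on the client address so a given client consistently reaches one server.
    return DIP_POOL[crc32(pkt.src_ip.encode()) % len(DIP_POOL)]


def destination_nat(pkt: Packet) -> Packet:
    """Rewrite VIP -> DIP only; the server sees the real client source address."""
    return replace(pkt, dst_ip=pick_dip(pkt))


def source_nat(pkt: Packet) -> Packet:
    """Rewrite both fields; the server only ever sees the load balancer's address."""
    return replace(pkt, src_ip=LB_IP, dst_ip=pick_dip(pkt))


client_pkt = Packet(src_ip="198.51.100.7", dst_ip=VIP)
print(destination_nat(client_pkt))
print(source_nat(client_pkt))
```

With destination NAT, the server's replies are addressed to the client directly, so they must still be routed back through the load balancer to have their source rewritten to the VIP; keeping all DIPs in the load balancer's layer 2 domain is one common way to guarantee that, which matches the constraint listed above.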

49
Problems (3)

Poor server to server connectivity
Connections between servers in different layer 2 domains must go
through layer 3
Links oversubscribed
Capacity of links between access routers & border
routers < output capacity of servers connected to
access routers
Ensure no saturation in any of the network links!
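Oversubscription is usually expressed as the ratio between the bandwidth the servers under a router could send and the uplink capacity available to carry it. The server count, NIC speed, and uplink capacity below are hypothetical, chosen only to illustrate the calculation.

```python
def oversubscription(servers: int, nic_gbps: float, uplink_gbps: float) -> float:
    """Aggregate server output capacity / uplink capacity (1.0 means no oversubscription)."""
    return servers * nic_gbps / uplink_gbps


def per_server_share_gbps(servers: int, nic_gbps: float, uplink_gbps: float) -> float:
    # Worst case: every server sends off-subtree at once and the uplink is shared evenly.
    return min(nic_gbps, uplink_gbps / servers)


# Hypothetical access-router subtree: 2,000 servers with 1 Gbps NICs, 4 x 10 Gbps uplinks.
ratio = oversubscription(2_000, 1.0, 40.0)
share = per_server_share_gbps(2_000, 1.0, 40.0)
print(f"{ratio:.0f}:1 oversubscribed, {share * 1000:.0f} Mbps per server when all send")
```

Under these assumptions a server that can send at 1 Gbps gets about 20 Mbps once traffic has to cross the layer 3 boundary, which is why placing a service's servers in different layer 2 domains hurts.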

50
Problems (4)

Proprietary hardware scales up, not out
Load balancers used in pairs
Replaced when load becomes too much

51
DC Networking Design Objectives

Location-independent Addressing
Decouple a server's location in the DC from its address
Uniform Bandwidth & Latency
Servers can be distributed arbitrarily in DC
without fear of running into bandwidth choke
points
Security & Performance Isolation
One service should not affect others' performance
DoS attack

52
Incenting Desirable Behavior (1)

Yield management
to sell the right resources to the right customer
at the right time for the right price
Trough filling
Cost determined by height of peaks, not area
Bin packing opportunities
Leasing committed capacity with fixed minimum
cost
Prices varying with resource availability
Differentiate demands by urgency of execution
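The "height of peaks, not area" point can be seen with a toy load profile: capacity must be provisioned for the peak, so moving deferrable work into troughs raises utilization without raising the provisioned capacity. The load numbers and the greedy placement rule are hypothetical illustrations, not a scheduling algorithm from the paper.

```python
def fill_troughs(interactive, deferrable):
    """Greedily place deferrable work in the emptiest periods without exceeding the peak.

    Returns the new load profile and any work that could not fit under the peak.
    """
    peak = max(interactive)
    load = list(interactive)
    remaining = deferrable
    for period in sorted(range(len(load)), key=lambda p: interactive[p]):  # emptiest first
        used = min(peak - load[period], remaining)
        load[period] += used
        remaining -= used
    return load, remaining


interactive = [30, 20, 15, 40, 80, 100, 90, 60]  # hypothetical busy servers per period
load, leftover = fill_troughs(interactive, deferrable=150)

capacity = max(interactive) * len(interactive)  # capacity is provisioned for the peak
print(max(load), leftover)                      # 100 0: peak (and so cost) unchanged
print(sum(interactive) / capacity, sum(load) / capacity)  # utilization before vs after
```

Here utilization rises from about 54% to about 73% while the peak, and therefore the provisioned capacity, stays at 100. Pricing that rewards deferrable demand is one way to encourage this shift.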

53
Incenting Desirable Behavior (2)

Server allocation
Large unfragmented servers Agility
Fewer requests for servers
Eliminating hoarding of servers
Cost for having a server
Seasonal peaks
Internal auctions may be fairest
But how to design them?

54
Geo-Distribution

Speed & latency matter
Google: 20% revenue loss for 500ms delay!!
Amazon: 1% sales decrease for 100ms delay!!
Challenges
Where to place data centers
How big to make them
Using it as a source of redundancy to improve
availability

55
Optimal Placement & Sizing (1)

Importance of Geographical Diversity
Decreasing latency between user and DC
Redundancy (earthquakes, riots, outages, etc)
Size of data center
Mega DC
Extracting maximum benefit from economies of
scale
Local factors like tax, power concessions, etc.
Micro DC
Enough servers to provide statistical
multiplexing gains
Given a fixed budget, place closest to each
desired population

56
Optimal Placement & Sizing (2)

Network cost
Performance vs cost
Latency vs Internet peering & dedicated lines
between data centers
Optimization should also consider
Dependencies of services offered
Email -> buddy list maintenance, authentication,
etc.
Front end micro data centers (low latency)
Back end mega data centers (greater resources)

57
Geo-Distributing State (1)

Turning geo-diversity into geo-redundancy
Distribute critical state across sites
Facebook
Single master data center replicating data
Yahoo! Mail
Partitions data across DCs based on user
Different solutions for different data
Buddy status: replicated, weak consistency
assurance
Email: mailbox partitioned by user IDs, strong consistency
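The two strategies on this slide can be contrasted in a few lines: mailbox data lives in exactly one "home" data center chosen from the user ID, so a single site owns it and can serve it consistently, while buddy status is written to every site and read locally, accepting that replicas may briefly disagree. The data center names, the hash-based home assignment, and the in-memory stores are hypothetical; the slide does not say how Yahoo! actually assigns users to data centers.

```python
from zlib import crc32

DATA_CENTERS = ["dc-west", "dc-east", "dc-europe"]  # hypothetical sites


def home_dc(user_id: str) -> str:
    """Partitioning: each user's mailbox has exactly one home data center."""
    return DATA_CENTERS[crc32(user_id.encode()) % len(DATA_CENTERS)]


mailboxes = {dc: {} for dc in DATA_CENTERS}     # partitioned state
buddy_status = {dc: {} for dc in DATA_CENTERS}  # replicated state


def deliver_mail(user_id: str, message: str) -> None:
    # Single owner: all reads and writes for one user go to one site (strong consistency).
    mailboxes[home_dc(user_id)].setdefault(user_id, []).append(message)


def set_status(user_id: str, status: str) -> None:
    # Written to every site; this loop is synchronous for simplicity, but a real system
    # would replicate asynchronously, so readers could briefly see a stale value.
    for dc in DATA_CENTERS:
        buddy_status[dc][user_id] = status


def read_status(user_id: str, local_dc: str) -> str:
    return buddy_status[local_dc].get(user_id, "offline")  # served from the nearest replica


deliver_mail("alice", "hello")
set_status("alice", "online")
print(home_dc("alice"), read_status("alice", "dc-europe"))
```

The trade-off mirrors the next slide: partitioning keeps inter-DC traffic low but ties each user to one site, while replication gives fast local reads at the cost of more inter-DC messages.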

58
Geo-Distributing State (2)

Tradeoffs
Load distribution vs service performance
e.g., Facebook's single master coordinates
replication
Speeds up lookups but puts load on the master
Communication cost vs service performance
Data replication -> more inter-data-center
communication
Longer latency
Higher cost of messages over inter-DC links

59
Summary

Data center costs
Server, Infrastructure, Power, Networking
Improving efficiency
Network Agility
Resource Consumption Shaping
Geo-diversifying DC

60
Opinions
61

Richard Stallman, GNU founder
Cloud Computing is a trap
.. cloud computing was simply a trap aimed at
forcing more people to buy into locked,
proprietary systems that would cost them more and
more over time.
"It's stupidity. It's worse than stupidity: it's
a marketing hype campaign"

Open Cloud Manifesto
a document put together by IBM, Cisco, AT&T, Sun
Microsystems and over 50 others to promote
interoperability
"Cloud providers must not use their market
position to lock customers into their particular
platforms and limit their choice of providers."
Failed? Google, Amazon, Salesforce and Microsoft,
four very big players in the area, are notably
absent from the list of supporters

Larry Ellison, Oracle founder
"fashion-driven" and "complete gibberish"
What is it? What is it? ... Is it - 'Oh, I am
going to access data on a server on the
Internet.' That is cloud computing?
Then there is a definition What is cloud
computing? It is using a computer that is out
there. That is one of the definitions 'That is
out there.' These people who are writing this
crap are out there. They are insane. I mean it is
the stupidest.

Sam Johnston, Strategic Consultant Specializing
in Cloud Computing,
Oracle would be out badmouthing cloud computing
as it has the potential to disrupt their entire
business.
"Who needs a database server when you can buy
cloud storage like electricity and let someone
else worry about the details? Not me, that's for
sure - unless I happen to be one of a dozen or so
big providers who are probably using open source
tech anyway."

Marc Benioff, head of salesforce.com
"Cloud computing isn't just candyfloss thinking;
it's the future. If it isn't, I don't know what
is. We're in it. You're going to see this model
dominate our industry."
Is data really safe in the cloud? "All complex
systems have planned and unplanned downtime. The
reality is we are able to provide higher levels
of reliability and availability than most
companies could provide on their own," says
Benioff

John Chambers, Cisco Systems CEO
"a security nightmare."
cloud computing was inevitable, but that it
would shake up the way that networks are
secured

James Hamilton, VP Amazon Web Services
any company not fully understanding cloud
computing economics and not having cloud
computing as a tool to deploy where it makes
sense is giving up a very valuable competitive
edge
No matter how large the IT group, if I led the
team, I would be experimenting with cloud
computing and deploying where it makes sense

68
To Cloud or Not to Cloud?
69
References