Title: What is cloud computing? Is it hype? Or should I learn it?
1What is cloud computing? Is it hype? Or should I learn it?
2Datacenter as server
- Program: Web search, email, map/GIS
- Computer: 1000s of computers, storage, network
- Warehouse-sized facilities and workloads
- How to enable innovation in new services without first building and capitalizing a large company?
3Solutions!
- Software as a service (for us DB people: SQL Azure)
- Computing/storage as a service
- Infrastructure as a service
- Pay-as-you-grow (think web hosting taken to the extreme)
- (Think electricity)
4Advantages
- Cost management
  - Economies of scale, out-sourced resource management
- Reduced time to deployment
  - Ease of assembly, works out of the box
- Scaling
  - On-demand provisioning, co-locate data and compute
- Reliability
  - Massive, redundant, shared resources
- Sustainability
  - Hardware not owned
5Some say
- It's simply hype, repackaging the old as if it's new
6Is it hype?
7But can we ignore?
8Yes, it bears similarity, but why is it taking off now (not then)?
- The Web space race: build-out of extremely large datacenters (10,000s of commodity PCs)
- More pervasive broadband Internet
- Standardized software stacks
9Software Stacks?
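- Many choices there already
- Languages and the engines they run on: Pig and Hive (on Hadoop), Sawzall (on MapReduce), DryadLINQ (on Dryad)
- Where to run them: build-your-own, Amazon EC2, Microsoft Azure
- Cost labels from the diagram: little; temporarily free; open-source, build or emulate!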
10So, we will experience the Azure stack today
11Illustrative video
12I cannot understand a technology until I build "Hello, world"
- Let's try with SQL Azure
- At the current stage, it is almost transparent: you won't notice you're on a cloud
13SQL Azure Demo
14I said, almost
- Your DB application will be more reliable, secure, and scalable
- But you have less control over where your data resides (some data need to be stored together for efficiency), e.g., affinity group
- Some advanced features are missing (e.g., geometric data)
15You think this is cool?
- Let's talk lower level: outsourcing computation and storage
16Windows Azure Storage Overview
- Windows Azure Storage provides three data abstractions
- Blobs: provide a simple interface for storing named files along with metadata for the file.
- Tables: provide structured storage. A table is a set of entities, which contain a set of properties.
- Queues: provide reliable storage and delivery of messages for an application.
- (BigTable?)
17Windows Azure Storage Goals
- To let users and applications
  - Access their data efficiently from anywhere at any time using a simple and familiar programming API
  - Scale to store any amount of data for any length of time, knowing that the data will not be lost.
  - Pay for what they use.
18Windows Azure Storage Account
- To store data securely in the cloud
  - Use the developer portal to create a globally unique account name and receive a 256-bit secret key.
  - Use the secret key to create an HMAC SHA256 signature to authenticate each request to the storage service.
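A minimal Python sketch of the signing step described above. Only the use of HMAC SHA256 with a 256-bit secret key comes from the slide; the string-to-sign layout, the resource path, and the helper names `sign_request` and `verify_request` are hypothetical and do not reproduce the actual Windows Azure Storage wire format.

```python
import base64
import hashlib
import hmac
import os


def sign_request(secret_key: bytes, method: str, resource: str, date: str) -> str:
    """Return a base64-encoded HMAC-SHA256 signature for one request.

    The string-to-sign layout is a simplified assumption for illustration;
    the real service defines its own canonical format.
    """
    string_to_sign = "\n".join([method, date, resource]).encode("utf-8")
    digest = hmac.new(secret_key, string_to_sign, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_request(secret_key: bytes, method: str, resource: str,
                   date: str, signature: str) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    expected = sign_request(secret_key, method, resource, date)
    return hmac.compare_digest(expected, signature)


# Hypothetical 256-bit key; in practice the developer portal issues it.
secret_key = os.urandom(32)
date = "2010-01-01T00:00:00Z"
resource = "/myaccount/photos/cat.jpg"  # made-up account and blob path
signature = sign_request(secret_key, "GET", resource, date)
```

Because the signature covers the method, date, and resource, changing any of them (for example replaying a GET signature on a PUT) makes verification fail.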
19Windows Azure Blob Feature Summary
- Account can have many containers
- A container
  - Is a set of blobs
  - Can have metadata (8K limit)
  - Is the boundary for access control
- A blob
  - Stores large objects (50GB limit)
  - Can have metadata (8K limit)
  - Consists of a list of blocks, providing robust blob upload
- Standard REST API
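To illustrate why a blob made of a list of blocks gives a robust upload, here is a toy in-memory sketch in Python. `ToyBlobStore`, `put_block`, `commit`, and the tiny `BLOCK_SIZE` are hypothetical names and values chosen for illustration; this is not the real REST API.

```python
from typing import Dict, List

BLOCK_SIZE = 4  # tiny block size, for illustration only


class ToyBlobStore:
    """In-memory stand-in for a blob container; not the real service."""

    def __init__(self) -> None:
        self.uncommitted: Dict[str, bytes] = {}  # block id -> block data
        self.blobs: Dict[str, bytes] = {}        # blob name -> committed content

    def put_block(self, block_id: str, data: bytes) -> None:
        # Re-sending a block id simply overwrites it, so a failed
        # transfer can be retried one block at a time.
        self.uncommitted[block_id] = data

    def commit(self, blob_name: str, block_ids: List[str]) -> None:
        # The blob becomes visible only when the ordered block list is committed.
        self.blobs[blob_name] = b"".join(
            self.uncommitted.pop(block_id) for block_id in block_ids
        )


def upload(store: ToyBlobStore, blob_name: str, payload: bytes) -> List[str]:
    """Split the payload into blocks, upload each, then commit the list."""
    block_ids = []
    for offset in range(0, len(payload), BLOCK_SIZE):
        block_id = f"block-{offset // BLOCK_SIZE:06d}"
        store.put_block(block_id, payload[offset:offset + BLOCK_SIZE])
        block_ids.append(block_id)
    store.commit(blob_name, block_ids)
    return block_ids


store = ToyBlobStore()
ids = upload(store, "greeting.txt", b"hello, cloud")
```

If one block fails in transit, only that block needs to be re-sent before the commit, rather than the whole object.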
20Windows Azure Table Storage
- The primary key is a composite of the partition key and the row key
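A small Python sketch of what a composite primary key means in practice. `ToyTable` and the sample entities are hypothetical; the only fact taken from the slide is that (partition key, row key) together identify an entity.

```python
from typing import Dict, List, Optional, Tuple

Entity = Dict[str, object]


class ToyTable:
    """In-memory table keyed by (partition_key, row_key); illustration only."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], Entity] = {}

    def insert(self, partition_key: str, row_key: str, properties: Entity) -> None:
        key = (partition_key, row_key)
        if key in self._rows:
            raise KeyError(f"duplicate primary key {key}")
        self._rows[key] = dict(properties)

    def get(self, partition_key: str, row_key: str) -> Optional[Entity]:
        return self._rows.get((partition_key, row_key))

    def partition(self, partition_key: str) -> List[Entity]:
        # Entities sharing a partition key are returned together, ordered by row key.
        matching = sorted(k for k in self._rows if k[0] == partition_key)
        return [self._rows[k] for k in matching]


customers = ToyTable()
customers.insert("Seoul", "kim", {"plan": "basic"})
customers.insert("Seoul", "lee", {"plan": "premium"})
customers.insert("Busan", "kim", {"plan": "basic"})  # same row key, other partition
```

The same row key may appear in different partitions; only the pair must be unique.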
21Computation..
- While many parallelization details are encapsulated (scheduling, load balancing, fault tolerance), you still need to think in a map-reduce framework (MapReduce, Hadoop, Dryad)
- Complete encapsulation? Compiler?
22Code Example
- http://research.microsoft.com/pubs/66811/tr-2008-74.pdf
- Let us assume that the input is a large text file distributed over many machines. We want to compute a histogram of the words in the web pages, and extract the top k words and their counts.
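A minimal single-process Python sketch of the task just stated: each string in `partitions` stands in for the part of the file held by one machine, a local histogram is built per partition, and the histograms are merged before the top k words are taken. The sample text and the function names are made up for illustration; this is not the code from the cited report.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def local_histogram(text: str) -> Counter:
    """Word counts for the text held by one (simulated) machine."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def top_k_words(partitions: Iterable[str], k: int) -> List[Tuple[str, int]]:
    """Merge the per-partition histograms and return the k most frequent words."""
    total: Counter = Counter()
    for part in partitions:
        total.update(local_histogram(part))
    # Descending count, then alphabetical order so ties are deterministic.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


partitions = [
    "the cloud is a datacenter",
    "the datacenter is the computer",
    "a cloud of computers",
]
top3 = top_k_words(partitions, 3)
```

Each local histogram depends only on its own partition, so that step could run on separate machines; only the merge and the final top-k need the combined counts.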
23Parallel Plan
24Offline Simulation?
- Betty Botter bought a bit of butter
  The butter Betty Botter bought was a bit bitter
  And made her batter bitter.
  But a bit of better butter makes better batter.
  So Betty Botter bought a bit of better butter
  Making Betty Botter's bitter batter better.
25Similar in Hadoop/MapReduce
- http://code.google.com/intl/ko-KR/edu/parallel/index.html

  map(String input_key, String input_value):
    // input_key: document name
    // input_value: document contents
    for each word w in input_value:
      EmitIntermediate(w, "1");

  reduce(String output_key, Iterator intermediate_values):
    // output_key: a word
    // intermediate_values: a list of counts
    int result = 0;
    for each v in intermediate_values:
      result += ParseInt(v);
    Emit(AsString(result));
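The pseudocode above can be simulated offline in a few lines of Python. This is a sketch under stated assumptions: `map_fn`, `reduce_fn`, and `run` are hypothetical names, the "shuffle" is an in-memory dictionary rather than a distributed step, and each line of the Betty Botter rhyme is treated as one input document.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

RHYME = {
    "line1": "Betty Botter bought a bit of butter",
    "line2": "The butter Betty Botter bought was a bit bitter",
    "line3": "And made her batter bitter.",
    "line4": "But a bit of better butter makes better batter.",
    "line5": "So Betty Botter bought a bit of better butter",
    "line6": "Making Betty Botter's bitter batter better.",
}


def map_fn(input_key: str, input_value: str) -> Iterator[Tuple[str, str]]:
    # input_key: document name, input_value: document contents
    for word in input_value.lower().split():
        yield word.strip(".,"), "1"


def reduce_fn(output_key: str, intermediate_values: Iterable[str]) -> str:
    # output_key: a word, intermediate_values: its list of "1" counts
    result = 0
    for v in intermediate_values:
        result += int(v)
    return str(result)


def run(documents: Dict[str, str]) -> Dict[str, str]:
    # Shuffle: group every intermediate value under its key.
    shuffled: Dict[str, List[str]] = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            shuffled[word].append(count)
    return {word: reduce_fn(word, counts) for word, counts in shuffled.items()}


counts = run(RHYME)
```

As in the pseudocode, counts travel as strings; a real framework would run the map calls and the per-key reduce calls on different machines.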
26When done, what if you want to deploy it to histogram.com and make LOTS and LOTS of money? (DEMO)
27If you are into bioinformatics...
- A set of specialized tools is publicly available on CodePlex and as a web application