Roger Barga, Architect, Cloud Computing Futures Group, Microsoft Research (MSR)

1
Roger Barga, Architect, Cloud Computing Futures Group, Microsoft Research (MSR)
Cloud Computing: A Microsoft Research Perspective
Contributors to this presentation include Dan Reed, Dennis Gannon, Navendu Jain, and Tony Hey (MSR)
2
eXtreme Computing, MSR
  • eXtreme Computing Division (Dan Reed, CVP, Microsoft Research):
    rethink the nature of computing at extreme scale, from
    alternative quantum computing models, through the transformative
    effects of manycore parallelism on programming systems and
    architectures, to massive cloud computing infrastructure designs
  • Cloud Computing Futures Group: ab initio research and
    development on cloud hardware and software infrastructure;
    investigate cloud computing for research empowerment with
    worldwide government and academic partnerships
3
Talk Outline
  • Data Center landscape
  • Cloud computing spectrum: the rise of a new platform
  • Data-intensive research and the role of cloud computing
  • Key takeaways
    • Data centers and HPC, "like twins separated at birth" (Dan Reed)
    • Data centers evolving at a blistering pace, driven by economics
    • The application model for cloud computing is evolving
    • The economic landscape increasingly favors pay-as-you-go
    • There are many obstacles, but economic forces will dominate them
    • Emergence of the Fourth Paradigm, synergistic with cloud computing

4
HPC and Clouds: Select Comparisons
  • Node and system architectures
  • Communication fabric
  • Storage systems and analytics
  • Physical plant and operations
  • Reliability and resilience
  • Programming models

5
HPC Node Architecture
  • Moore's Law favored commodity systems
  • Specialized processors and systems faltered
  • Killer micros and industry standard blades led
  • Inexpensive clusters now dominate

www.top500.org
6
HPC Interconnects
  • Ethernet for the low end (cost-sensitive)
  • High-end expectations
    • Nearly flat networks and very large switches
    • Operating-system bypass for low latency (microseconds)

www.top500.org
7
Modern Data Center Network
Internet
Monsoon network with Valiant routing
  • Key
  • CR (L3 Border Router)
  • AR (L3 Access Router)
  • S (L2 Switch)
  • LB (Load Balancer)
  • A (20 Server Rack/TOR)

Source: Albert Greenberg and Cisco
9
HPC Storage Systems
  • Local disk: scratch or non-existent
  • Secondary storage: SAN and parallel file systems, hundreds of
    TBs (at most)
  • Tertiary storage: tape robot(s), 3-5 GB/s bandwidth

60 PB capacity
www.nersc.gov
10
I/O Implications and Scale
  • Typical HPC scenario
    • MPI computation with domain decomposition
    • SAN-based parallel file system
    • Periodic checkpoints
  • Scaling challenges
    • System MTBF approaching zero
    • Checkpoint frequency increasing
    • I/O demand becoming intolerable
  • Implications
    • Unlikely to extend to exascale
    • Loosely consistent models required

Slide by Dan Reed
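The scaling challenge above (shrinking MTBF forcing ever more frequent checkpoints) can be quantified with Young's classic approximation for the optimal checkpoint interval; this sketch is standard reliability arithmetic, not from the slide:

```python
import math

def young_interval(checkpoint_secs, mtbf_secs):
    """Young's approximation: optimal seconds between checkpoints."""
    return math.sqrt(2 * checkpoint_secs * mtbf_secs)

def overhead_fraction(checkpoint_secs, mtbf_secs):
    """Rough fraction of wall time spent writing checkpoints."""
    tau = young_interval(checkpoint_secs, mtbf_secs)
    return checkpoint_secs / (tau + checkpoint_secs)

# A 10-minute checkpoint against a one-day system MTBF:
tau = young_interval(600, 86_400)        # ~10,182 s between checkpoints
loss = overhead_fraction(600, 86_400)    # ~5.6% of time lost to I/O
```

As MTBF approaches zero, the interval shrinks with its square root, so checkpoint I/O demand grows without bound, which is the slide's point.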
11
Cloud/HPC Hardware Comparison

  Attribute          HPC                      Cloud
  Processor          High-end x86             x86
  Memory             1-8 GB                   8 GB
  Local Disk         Scratch only             Permanent storage
  SAN Storage        Common                   Rare
  Tertiary Storage   Common                   Rare
  Interconnect       Infiniband or 10 GigE    1 GigE/10 GigE
  Network            Flat                     Hierarchical
  Physical Plant     Traditional              Optimized

  • Predominant differences: network architecture, SAN storage,
    and efficient virtualization

12
Virtualization as Enabler
  • EMULATION OF EXISTING APPS
    • Hardware via existing ISA, memory-mapped ports, etc.
    • Storage via SCSI LUN or other disk interface
    • Application via underlying API
  • ENABLEMENT OF NEW SERVICES
    • Resource utilization → pool concrete resources
    • Decoupling from concrete resources → enables migration
    • Extending existing abstractions → e.g. LUN expansion

13
HPC Physical Plant
  • Facilities
    • Co-located with operating institution
    • Standard raised floor and CRAC units
    • Limited UPS support
    • Typically constrained to 3-5 MW
    • Designed as lab showpieces

  LBL 38,640 cores
  ORNL 150,152 cores
  LANL 130,000 cores
  ANL 163,840 cores
14
The Data Center Landscape
  • Range in size from edge facilities to megascale
  • Unprecedented economies of scale
  • Approximate costs for a medium-sized center (1,000 servers)
    vs. a large, 50K-server center:

  Technology       Medium-sized Data Center    Very Large Data Center        Ratio
  Network          $95 per Mbps/month          $13 per Mbps/month            7.1
  Storage          $2.20 per GB/month          $0.40 per GB/month            5.7
  Administration   140 servers/administrator   >1,000 servers/administrator  7.1

Source: James Hamilton, LADIS '08
15
Modern Data Center Containers: Separating Concerns
16
Data Center Design Issues
  • Where are the costs? Mid-sized facility (20 containers):
    • Cost of power: $0.07/kWh
    • Cost of facility: $200,000,000 (amortized over 15 years)
    • Number of servers: 50,000 (3-year life) at $2K each
    • Power critical load: 15 MW
    • Power Usage Effectiveness (PUE): 1.7
  • Observe
    • Fully burdened cost of power = power consumed + cost of
      cooling and power distribution infrastructure
    • As the cost of servers drops and power costs rise, power
      will dominate all other costs
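The figures above support a back-of-envelope monthly cost split; this sketch just restates the slide's numbers in code (the 730 hours/month constant is the only addition):

```python
SERVERS = 50_000             # 3-year life
SERVER_COST = 2_000          # $ each
FACILITY_COST = 200_000_000  # $, amortized over 15 years
CRITICAL_LOAD_KW = 15_000    # 15 MW of critical (IT) load
PUE = 1.7                    # total facility power / IT power
POWER_PRICE = 0.07           # $ per kWh
HOURS_PER_MONTH = 730        # ~8,760 h/year divided by 12

servers_monthly = SERVERS * SERVER_COST / (3 * 12)   # ~$2.78M
facility_monthly = FACILITY_COST / (15 * 12)         # ~$1.11M
power_monthly = CRITICAL_LOAD_KW * PUE * HOURS_PER_MONTH * POWER_PRICE
# power_monthly ~ $1.30M: already rivaling the facility share, and
# it grows with energy prices while server prices keep falling.
```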

17
Power
  • An EPA report found that in 2006 data centers used 61
    terawatt-hours of power
    • Total power bill: $4.5 billion
    • 7 GW peak load (15 power plants)
    • 1.5% of all US electrical energy use, expected to double
      by 2011
  • Power accounts for 30% of data center costs
  • Only 20-30% CPU utilization
    • Causes: uneven app fit, varying demand, over-provisioning,
      etc.
  • A deeper look, and a few ideas...

18
Power and Cooling Is Expensive!
  • Infrastructure for power and cooling costs a lot
    • Infrastructure PLUS energy > server costs since 2001
    • Infrastructure alone > server costs since 2004
    • Energy alone > server costs since 2008
  • Cost-effective to discard energy-inefficient servers
  • Power savings → infrastructure savings!
  • Like airlines retiring fuel-guzzling airplanes

19
What can we do about power costs?
  • Data centers use 1.5% of US electricity
    • $4.5 billion annually
    • 7 GW peak load (15 power plants)
    • 44.4 million mt CO2 (0.8% of emissions)
  • Rethink environmentals
    • Run them in a wider range of conditions
    • Christian Belady's "data center in a tent" experiment
  • Rethink UPS
    • Google's battery per server
  • Rethink architecture
    • Intel Atom and power states
    • Marlowe Project

20
Marlowe: The Big Sleep
  • Adaptive resource management
    • Monitor the data center and its apps
    • Use a rules engine with fuzzy logic to control resources
      for most current workloads
    • Spare capacity available
  • Sleep/hibernate: 3-4 watts (vs. 28-36 watts for Atom servers)
  • 5-45 sec. to reactivate a server
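Marlowe's actual rules engine is not public; the following is a minimal, hypothetical sketch of the sleep/wake decision it automates, using the slide's wattage figures (the 20% headroom and per-server capacity are invented for illustration):

```python
import math

SLEEP_WATTS = 3.5    # midpoint of the slide's 3-4 W sleep draw
AWAKE_WATTS = 32.0   # midpoint of the slide's 28-36 W Atom draw

def servers_needed(demand, capacity_per_server, headroom=0.2):
    """Servers to keep awake: current demand plus spare capacity."""
    return math.ceil(demand * (1 + headroom) / capacity_per_server)

def plan(pool_size, demand, capacity_per_server):
    """Decide how many servers sleep and estimate the power saved."""
    awake = min(pool_size, servers_needed(demand, capacity_per_server))
    asleep = pool_size - awake
    return awake, asleep, asleep * (AWAKE_WATTS - SLEEP_WATTS)

# 100-server pool, 400 req/s of demand, 10 req/s per server:
awake, asleep, watts_saved = plan(100, 400, 10)
```

With half the pool asleep, most of each idle server's draw is recovered, at the cost of the 5-45 s reactivation latency the slide notes.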

Created by Navendu Jain, CJ Williams, Dan Reed
and Jim Larus
21
Microsoft's Data Center Evolution
22
What is "cloud computing"?
23
So What is Cloud Computing?
  • Using a remote data center to manage scalable, reliable,
    on-demand access to application services and data
  • Scalable means
    • Possibly millions of simultaneous users of the app
    • Exploiting thousand-fold parallelism in the app
  • Reliable and on-demand mean five nines, available right now
  • Three new aspects to cloud computing
    • Illusion of infinite computing resources available on demand
    • Elimination of an upfront commitment
    • Ability to pay for use of computing resources on a
      short-term basis as needed
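The "five nines" claim is easy to make concrete; this is standard availability arithmetic, not from the slide:

```python
def allowed_downtime_minutes(nines, period_hours=8760):
    """Downtime budget per period (default one year) at N nines."""
    unavailability = 10 ** (-nines)
    return period_hours * 60 * unavailability

five_nines = allowed_downtime_minutes(5)   # ~5.3 minutes per year
three_nines = allowed_downtime_minutes(3)  # ~8.8 hours per year
```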

24
Platform Extension to the Cloud is a Continuum
  • At the cloud end
    • New capabilities
    • New cost structure
    • Requires embracing a specific app model
  • What you've been using so far
    • A hosted version of what you have been using
    • Requires few changes, if any, to what you know and do
25
Spectrum of Application Models
26
Azure Programming Model
  • Abstract programming model
  • In-band communication and software control
  • Load balancers and switches
  • Highly available Fabric Controller
27
The Azure Fabric
  • Consists of a (large) group of machines, all of
    which are managed by software called the fabric
    controller
  • The fabric controller is replicated across a
    group of five to seven machines, and it owns all
    of the resources in the fabric
  • Because it can communicate with a fabric agent on
    every computer, it's also aware of every Windows
    Azure application in this fabric

28
Roles: Scalable, Fault-Tolerant, Stateless
  • A scalable architecture is critical to take advantage of
    scalable infrastructure
  • Queues decouple different parts of the app, making it easier
    to scale app parts independently
    • Flexible resource allocation: different priority queues and
      separation of backend servers to process different queues
    • Queues mask faults in worker roles
  • Roles are mostly stateless processes running in a Windows
    Server 2008 VM on one or more cores
    • Web roles provide web service access to the app; web roles
      generate tasks for worker roles
    • Worker roles do the heavy lifting and manage data in
      tables/blobs
    • Communication is through queues
    • The number of instances can scale with load
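The web-role/worker-role pattern above can be sketched with an in-process queue standing in for the Azure queue service (a toy analogue, not the Azure SDK; Azure's real queues also redeliver messages if a worker dies before deleting them):

```python
import queue

tasks = queue.Queue()   # stand-in for an Azure queue
results = {}            # stand-in for table/blob storage

def web_role(job_id, payload):
    """Front end: accept a request, enqueue work, return at once."""
    tasks.put((job_id, payload))

def worker_role():
    """Stateless back end: drain the queue, write results out."""
    while not tasks.empty():
        job_id, payload = tasks.get()
        results[job_id] = payload.upper()   # placeholder heavy lifting
        tasks.task_done()

web_role("job-1", "align sequences")
web_role("job-2", "render frame")
worker_role()   # any number of these could run in parallel
```

Because neither role holds state outside the queue and the storage, either side can be scaled out (or restarted after a fault) independently, which is exactly the property the slide claims for queues.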

29
Storage: Blobs, Tables and Queues, and a Full Relational Database
  • The simplest way to store data in Azure storage is to use blobs
    • A blob contains binary data and can be big: up to 50 GB each
    • Blobs can also have associated metadata
  • Each table holds some number of entities; an entity contains
    zero or more properties
  • SQL Data Services provide the SQL data platform in the cloud
30
Back to the Future (again...)
  • Mid-1980s: the invention of client/server databases
    • Data locked up in mainframe DBs
    • Closed, monolithic trust boundary
    • PCs? Spreadsheets and terminal emulation
    • Networks, lots of them: DECnet, IPX, SNA, Banyan Vines, TCP/IP
  • Client/server database challenges
    • Had to invent network abstraction layers, formats, protocols
    • Had to consider latency and concurrency control
    • Had to move the trust boundary
    • Wound up with only 60% of the incumbent's capability; could
      have been easily dismissed as a failure
  • End result
    • Data was made accessible where it could be used in a new way
    • Client/server databases are now viewed as being tremendously
      successful

31
Data in a Cloud Services World
  • Cloud database service challenges
    • Same as the client/server DBMS shift: formats, protocols,
      authentication, authorization, latency, trust boundary
    • Will not do 100% of what client/server databases can do
  • Cloud database service capabilities
    • Data boundary moves from the corporate LAN to the internet
    • Utility DBMS for cloud applications
    • Expect new capabilities, a new value proposition

32
Cloud Platform: Strategic Differentiator and Economics
  • Competitive advantage AND economics
  • [Chart: competitive advantage over time, from innovation
    introduced by the first firm]
33
The Economics of Elasticity, by the Numbers
  • Elasticity may be more cost-effective even with a higher
    per-hour charge!
  • It takes weeks to acquire and install your own equipment
  • [Chart: example of elasticity]
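A hypothetical worked example of the claim above (all numbers invented for illustration): a spiky workload that peaks at 100 servers but averages 30 can be cheaper in the cloud even at a 40% higher per-hour price, because owned capacity must be sized for the peak:

```python
PEAK_SERVERS = 100    # capacity you must own for the worst hour
AVG_SERVERS = 30      # servers actually busy, averaged over the year
HOURS = 8760          # one year
OWN_RATE = 0.10       # $/server-hour, fully burdened, owned
CLOUD_RATE = 0.14     # $/server-hour, 40% higher than owning

owned_cost = PEAK_SERVERS * HOURS * OWN_RATE   # pay for peak, always on
cloud_cost = AVG_SERVERS * HOURS * CLOUD_RATE  # pay only for used hours
# owned_cost = $87,600 vs. cloud_cost = $36,792
```

The lower the average-to-peak ratio, the bigger the elasticity win; steady 100%-utilized workloads see none of it.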
34
The Cloud Empowers the Long Tail of Research
  • Research funding
    • Have good idea
    • Write proposal
    • Wait 6 months
    • If successful, wait 3 months to get $
    • Install computers
    • Start work
  • Science start-ups
    • Have good idea
    • Write business plan
    • Ask VCs to fund
    • If successful...
    • Install computers
    • Start work
  • Cloud computing model
    • Have good idea
    • Grab nodes from a cloud provider
    • Start work
    • Pay for what you actually used

Poised to reach a broad class of new users
Slide compliments of Paul Watson, University of Newcastle (UK)
35
Emergence of a Fourth Research Paradigm
  • Thousand years ago, Experimental Science: description of
    natural phenomena
  • Last few hundred years, Theoretical Science: Newton's Laws,
    Maxwell's Equations
  • Last few decades, Computational Science: simulation of
    complex phenomena
  • Today, Data-Intensive Science: scientists overwhelmed with
    data sets from a variety of different sources
    • Data captured by instruments and sensor networks
    • Data generated by simulations
    • Data generated by computational models

Astronomy was one of the first disciplines to embrace
data-intensive science, with the Virtual Observatory (VO)
enabling highly efficient access to data and analysis tools at
a centralized site. The image shows the Pleiades star cluster
from the Digitized Sky Survey combined with an image of the
moon, synthesized within the WorldWide Telescope.
With thanks to Jim Gray
36
Science Example: PhyloD as an Azure Service
  • Statistical tool used to analyze DNA of HIV from large
    studies of infected patients
  • PhyloD was developed by Microsoft Research and has been
    highly impactful
  • Small but important group of researchers
    • 100s of HIV and HepC researchers actively use it
    • 1000s of research communities rely on the results

Cover of PLoS Biology, November 2008
  • Typical job: 10-20 CPU hours; extreme jobs require 1K-2K
    CPU hours
  • Requires a large number of test runs for a given job
    (1-10M tests)
  • Highly compressed data per job (~100 KB per job)

37
Metagenomics Atop Azure
  • Metagenomics: ecosystem characterization
  • Map-Reduce-style parallel BLAST
    • Basic map-reduce: 2 GB database per worker, 500 MB input file
    • 50 roles: speedup 45
    • 100 roles: speedup 94
  • Pipeline: the user selects databases and an input sequence
    (BLAST web role); an input-splitter worker role partitions
    the work; BLAST execution worker roles 1..n run against
    genome DBs 1..K held in Azure blob storage, per the BLAST DB
    configuration; a combiner worker role merges the results
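The splitter/worker/combiner layout can be sketched in miniature, with substring matching as a toy stand-in for a real BLAST invocation and threads standing in for Azure worker roles:

```python
from concurrent.futures import ThreadPoolExecutor

GENOME_DB = ["GATTACA", "CCGG", "TTAA"]   # toy database fragments

def split_input(sequences, n_workers):
    """Input-splitter role: deal queries round-robin to workers."""
    return [sequences[i::n_workers] for i in range(n_workers)]

def blast_worker(chunk):
    """Execution role: toy 'alignment' = substring hits vs. the DB."""
    return [(seq, [frag for frag in GENOME_DB if frag in seq])
            for seq in chunk]

def combiner(partials):
    """Combiner role: merge per-worker hit lists into one result."""
    return dict(pair for part in partials for pair in part)

queries = ["AAGATTACAA", "GGCCGGTT", "ACGT"]
with ThreadPoolExecutor(max_workers=2) as pool:
    hits = combiner(pool.map(blast_worker, split_input(queries, 2)))
```

In the real deployment each execution role would instead pull its 2 GB genome DB partition from blob storage and shell out to BLAST, but the data flow is the same.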
38
Reference Data on Azure
  • Ocean science data on Azure SDS (relational)
    • Two terabytes of coastal and model data
  • Computational finance data on SDS (relational)
    • BATS, daily tick data for stocks (10 years)
    • XBRL call reports for banks (10,000 banks)
  • Storing select seismic data on Azure for an NSF-funded
    consortium that collects and distributes global
    seismological data
    • Data sets requested by researchers worldwide
    • Includes HD videos, seismograms, images, and data from
      major seismic events

39
Takeaways
  • Data centers and HPC: like twins separated at birth
    • Interconnect, storage, and efficient virtualization
  • Data centers evolving at a blistering pace, driven by economics
    • The economics are changing towards cloud computing
    • Big data centers offer big economies of scale
    • Cloud computing transfers risks away from the application
      providers
  • The application model for cloud computing is evolving
    • Advantages to being close to the metal versus advantages
      to a higher level
    • Just because the infrastructure is scalable doesn't mean
      the app is!
  • There are many obstacles to ubiquitous cloud computing
    • The economic forces will dominate the obstacles
    • There's too much to gain; it will grow!

40
Roger Barga, Architect, Cloud Computing Futures Group, Microsoft Research (MSR)
Cloud Computing for Research
Q & A