Berkeley RAD Lab: Robust, Adaptive, Distributed Systems

1
Berkeley RAD Lab: Robust, Adaptive, Distributed Systems
  • Armando Fox, Randy Katz, Michael Jordan, Dave
    Patterson, Scott Shenker, Ion Stoica
  • November 2005

2
RAD Lab
  • The 5-year Vision
  • Single person can go from vision to a
    next-generation IT service (the Fortune 1
    million)
  • E.g., over a long holiday weekend in 1995, Pierre
    Omidyar created eBay v1.0
  • The Vehicle
  • Interdisciplinary Center creates core technical
    competency to demo 10X to 100X
  • Researchers are leaders in machine learning,
    networking, and systems
  • Industrial Participants: leading companies in HW,
    systems SW, and online services
  • Called RAD Lab for Reliable, Adaptable,
    Distributed systems

3
RAD Lab
Figure: pedestal with Cap, Dado, and Base (dado: the section of a
pedestal between cap and base)
  • The Science
  • Both shorter-term and longer-term solutions
  • Develop using primitives → functions (MapReduce),
    services (Craigslist)
  • Assess/debug using deterministic replay and
    finding new metrics
  • Deploy using Internet-in-a-Box via FPGAs under
    failure/slowdown workloads
  • Operate using Statistical Learning
    Theory-friendly, Control Theory-friendly software
    architectures and visualization tools
  • Added Value to Industrial Participants
  • Working with leading people and companies from
    different industries on long-range,
    pre-competitive technology
  • Training of dozens of future leaders of IT in
    multiple disciplines, and their recruitment by
    industrial participants
  • Working with researchers with successful track
    record of rapid transfer of new technology

4
Steps vs. Process
  • Steps: Traditional, static handoff model, N groups
  • Process: Support DADO evolution, 1 group

5
DADO - Develop
  • Create abstractions, primitives, toolkit for
    large scale systems that make it easy to
    invent/deploy functions (e.g., MapReduce)
  • For example, Distributed Hash Tables (OpenDHT)
  • Already setting the trend for IETF standards
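
To make the idea of a reusable "function" primitive concrete, here is a minimal single-process sketch of a MapReduce-style word count in Python. The names `map_reduce`, `word_mapper`, and `sum_reducer` are hypothetical and chosen only for illustration; this is not the RAD Lab toolkit or any production MapReduce API, and it leaves out distribution, partitioning, and fault tolerance entirely.

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    """Apply mapper to every input, group emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every whitespace-separated word, lowercased."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    """Combine all counts emitted for one word."""
    return sum(values)


# Illustrative input only; any iterable of strings works.
counts = map_reduce(["RAD Lab", "rad systems lab"], word_mapper, sum_reducer)
```

The point of the sketch is the division of labor: the application author writes only the mapper and the reducer, while grouping (and, in a real system, scaling and recovery) belongs to the primitive.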

6
DADO - Assess
  • We improve what we can measure
  • Inspect box: visibility into networks, usually
    data poor
  • Servers: data rich, but data often discarded
  • Statistical and Machine Learning (SML) to the
    rescue. It works well when:
  • You have lots of raw data
  • You have reason to believe the raw data is
    related to some high-level effect you're
    interested in
  • You don't have a model of what that relationship
    is
  • Note: SML advances → fast analysis
7
DADO - Deploy
  • Re-engineer RAMP to act like 1000 node
    distributed system under realistic failure and
    slowdown workloads
  • RAMP emulates data center, wide area systems as
    well as MPP
  • Collect and apply failure data from real world
  • RAMP vs. clusters: larger scale, easier to
    develop/debug, flexible HW/SW configuration,
    inexpensive so no need to share
  • Explore via repeatable experiments as we vary
    parameters and configurations, vs. observations on
    a single (aging) cluster that is often idiosyncratic
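
The value of repeatable experiments under failure and slowdown workloads can be sketched in software. The toy simulation below is an assumption-laden illustration, not RAMP: `run_experiment`, its parameters, and the fixed 10 ms base latency are all hypothetical. A seeded random generator decides which simulated nodes are failed or slow, so the same seed always reproduces the same run while parameters are varied between runs.

```python
import random
from dataclasses import dataclass


@dataclass
class Result:
    completed: int
    failed: int
    mean_latency_ms: float


def run_experiment(nodes, requests, fail_prob, slow_prob, slow_factor, seed,
                   base_latency_ms=10.0):
    """Simulate requests against nodes with injected failures and slowdowns."""
    rng = random.Random(seed)  # private, seeded generator makes the run repeatable

    # Assign each node a state once, before any requests are issued.
    states = []
    for _ in range(nodes):
        draw = rng.random()
        if draw < fail_prob:
            states.append("failed")
        elif draw < fail_prob + slow_prob:
            states.append("slow")
        else:
            states.append("ok")

    completed = failed = 0
    total_latency = 0.0
    for _ in range(requests):
        node = rng.randrange(nodes)  # each request goes to a uniformly random node
        if states[node] == "failed":
            failed += 1
            continue
        factor = slow_factor if states[node] == "slow" else 1.0
        total_latency += base_latency_ms * factor
        completed += 1

    mean = total_latency / completed if completed else 0.0
    return Result(completed, failed, mean)


# Same seed and parameters: identical results, so differences between runs
# can be attributed to the parameter that was changed.
baseline = run_experiment(1000, 10_000, 0.0, 0.0, 5.0, seed=42)
degraded_a = run_experiment(1000, 10_000, 0.02, 0.05, 5.0, seed=42)
degraded_b = run_experiment(1000, 10_000, 0.02, 0.05, 5.0, seed=42)
```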

8
DADO - Operate
  • Idea: when a site misbehaves, users notice and
    change their behavior; use this as a failure detector
  • Approach: combine visualization with Statistical
    and Machine Learning analysis so the operator sees
    anomalies too
  • Experiment: does the distribution of hits to various
    pages match the historical distribution?
  • Each minute, compare hit counts of top N pages to
    hit counts over last 6 hours using Bayesian
    networks and χ² test, on real Ebates data
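
The χ² part of this comparison can be sketched as follows. This is a minimal illustration, assuming a plain Pearson goodness-of-fit statistic with the historical hit counts supplying the expected distribution; it is not the authors' implementation, and it omits the Bayesian network analysis. The function name, page names, and counts are hypothetical, and the threshold 5.991 is the standard χ² critical value for 2 degrees of freedom at the 0.05 level (three pages give two degrees of freedom).

```python
def chi_square_statistic(current, history):
    """Pearson chi-square statistic of this minute's hits vs. the historical mix.

    current: page -> hit count in the current minute
    history: page -> hit count over the reference window (e.g., last 6 hours)
    """
    pages = sorted(history)
    total_now = sum(current.get(page, 0) for page in pages)
    total_hist = sum(history[page] for page in pages)
    if total_now == 0 or total_hist == 0:
        raise ValueError("need at least one hit in both windows")

    stat = 0.0
    for page in pages:
        # Expected hits this minute if traffic followed the historical mix.
        expected = total_now * history[page] / total_hist
        if expected > 0:
            observed = current.get(page, 0)
            stat += (observed - expected) ** 2 / expected
    return stat


CRITICAL_DF2_ALPHA_05 = 5.991  # chi-square critical value, 2 degrees of freedom

# Made-up counts for three hypothetical pages.
history = {"/home": 6000, "/account": 3000, "/search": 1000}
normal_minute = {"/home": 60, "/account": 30, "/search": 10}
odd_minute = {"/home": 80, "/account": 5, "/search": 15}

normal_stat = chi_square_statistic(normal_minute, history)
odd_stat = chi_square_statistic(odd_minute, history)
is_anomalous = odd_stat > CRITICAL_DF2_ALPHA_05
```

A minute whose page mix matches history yields a statistic near zero, while a minute in which hits to one page collapse yields a large one. In practice the threshold would depend on N and on how many false alarms the operator can tolerate.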

To learn more, see "Combining Visualization and
Statistical Analysis to Improve Operator
Confidence and Efficiency for Failure Detection
and Localization," in Proc. 2nd IEEE Int'l Conf.
on Autonomic Computing, June 2005, by Peter
Bodik, Greg Friedman, Lukas Biewald, Helen Levine
(Ebates.com), George Candea, Kayur Patel, Gilman
Tolle, Jon Hui, Armando Fox, Michael I. Jordan,
David Patterson.

9
Novel Visualization
  • Winning operator trust: "I see and understand"
  • Figure labels: anomaly score; account page problem

10
Founding the RAD Lab: Start 12/1
  • Looking for 3 to 5 founding companies to fund 5
    years @ cost of $0.5M / year
  • 25 grad students, 15 undergrads, 6 faculty, 2
    staff
  • Founding companies: Google, Microsoft, Sun
    Microsystems
  • RADS Consortium model
  • Preference to founding partner technology in
    prototypes
  • Designate employees to act as consultants
  • Head start for participants on research results
  • Putting IP in public domain so partners can use
    it without being sued
  • Press release of founding RAD Lab partners
    December 1?
  • Mid project review after 3 years by founding
    partners

11
RAD Lab Opportunity: New Research Model
  • Chance to Partner with the Top University in
    Computer Systems on the Next Great Thing
  • National Academy of Engineering mentions Berkeley
    in 7 of 19 $1B industries that came from IT
    research
  • NAE mentions Berkeley 7 times, Stanford 5 times,
    MIT 5, CMU 3
  • Industries: Timesharing (SDS 940), Client-Server
    Computing (BSD Unix), Graphics, Entertainment,
    Internet, LANs, Workstations, GUI, VLSI Design
    (Spice; ECAD $5B?/yr), RISC ($10B?/yr),
    Relational DB (Ingres/Postgres; RDB $15B?/yr),
    Parallel DB, Data Mining, Parallel Computing,
    RAID ($15B?/yr), Portable Communication (BWRC),
    WWW, Speech Recognition, Broadband
  • Berkeley is one of the top suppliers of systems
    students to industry and academia
  • US News & World Report ranking of CS Systems
    universities: #1 Berkeley, #2 CMU, #2 MIT, #4 Stanford

12
RAD Lab: Interdisciplinary Center for Reliable,
Adaptive, Distributed Systems
  • Working with different industries on long-range,
    pre-competitive technology
  • Training of dozens of future leaders of IT, plus
    their recruitment
  • Working with researchers with track records of
    successful technology transfer

13
Backup Slides
14
References
To learn more, see
  • "Combining Visualization and Statistical Analysis
    to Improve Operator Confidence and Efficiency for
    Failure Detection and Localization," Peter Bodik,
    Greg Friedman, Lukas Biewald, Helen Levine
    (Ebates.com), George Candea, Kayur Patel, Gilman
    Tolle, Jon Hui, Armando Fox, Michael I. Jordan,
    and David Patterson. In Proc. 2nd IEEE Int'l Conf.
    on Autonomic Computing, June 2005.
  • "Microreboot -- A Technique for Cheap Recovery,"
    George Candea, Shinichi Kawamoto, Yuichi Fujiki,
    Greg Friedman, and Armando Fox. In Proc. 6th Symp.
    on Operating Systems Design and Implementation
    (OSDI), San Francisco, CA, Dec. 2004.
  • "Path-Based Failure and Evolution Management,"
    Mike Y. Chen, Anthony Accardi, Emre Kiciman, Jim
    Lloyd, Dave Patterson, Armando Fox, and Eric
    Brewer. In Proc. 1st USENIX/ACM Symp. on Networked
    Systems Design and Implementation (NSDI '04), San
    Francisco, CA, March 2004.
  • "Scalable Statistical Bug Isolation," Ben Liblit,
    M. Naik, Alice X. Zheng, Alex Aiken, and Michael
    I. Jordan. PLDI, 2005.

15
Sustaining Innovation/Training Engine in 21st
Century
  • Replicate research centers based primarily on
    industrial funding to expand IT market and to
    train next generation of IT leaders
  • Berkeley Wireless Research Center (BWRC): 50
    grad students, 30 undergrads @ $5M per year
  • Stanford Network Research Center (SNRC): 50 grad
    students @ $5M per year
  • MIT T-Party: $4M per year (100% from Quanta)
  • Industry largely funds
  • N companies, where N is 5?
  • Exciting, long term technical vision
  • Demonstrated by prototype(s)

16
State of Research Funding Today
  • Most industry research shorter term
  • DARPA exiting long-term (exp.) IT research
  • '03-'05 IPTO BAAs: 9 AI, 2 classified, 1 SW
    radio, 1 sensor net, 1 reliability; all have 12
    to 18 month go/no-go milestones
  • Academic-led funding reduced 50% (so far), 2001 to
    2004
  • Faculty consultants in consortia led by defense
    contractor, get grants support 1-2 students (
    NSF funding level)
  • NSF swamped with proposals, conservative
  • 2000 to 6500 proposals in 5 years
  • IT has lowest acceptance rate at NSF (between 8%
    and 16%)
  • "Ambitious proposal" is a negative review
  • Even if you get NSF funding, proposal is reduced to
    stretch NSF funds; e.g., got 3 x 1/3 faculty, 6 grad
    students, 0 staff, 3 years
  • (To learn more, see www.cra.org/research)

17
RAD Lab Timeline
  • 2005: Launch RAD Lab 12/1
  • 2006: Collect workloads, Internet in a Box
  • 2007: SLT/CT distributed architectures, Iboxes,
    annotative layer, class testing
  • 2008: Development toolkit 1.0, tuple space, class
    testing; Mid Project Review
  • 2009: RAD Lab software suite 1.0, class testing
  • 2010: End of Project Party