Towards a Smart Workload Generator on RAMP

Slides: 33
Provided by: arch1

1
Towards a Smart Workload Generator on RAMP
  • Archana Ganapathi, David Patterson, Anthony
    Joseph
  • archanag, pattrsn, adj _at_ cs.berkeley.edu

2
RADLAB goals
Top 50 Web Domains
  • Help single-operator/developer sites become
    Google-scale
  • Eliminate SW/HW obstacles for scaling
  • Tools to identify/fix problems

Source: Washington Post, 3/31/2006
Use RAMP to move websites from right to left
3
RADLAB Goals (2)
YouTube.com 2006 Daily Traffic Ranking
  • Challenges
  • Scalability
  • Configurability
  • Single person operation
  • Cost-effectiveness
  • Reproducibility and Observability

Source: Alexa.com
4
RAMP for Time Travel
  • UCB goal for RAMP: data center in a box
  • Google/Amazon.com: O(10,000) processors in data
    center
  • Anticipate load 3-6 months in the future for a
    fast-moving company
  • Smart Workload Generation: smart gateware
    design informed by SML analysis of workload
  • Time Dilation on Emulated Machines: try
    software/config changes and observe behavior
    prior to deployment
  • Targeted Component-specific Load Generation:
    stress-test components in the critical path to
    determine performance limitations

RAMP as emulation environment and workload generator
5
Building Blocks
User scripts
Real System
Workload Generator
ML to determine interactions
  • Workload generation engine
  • Parser to extract data from server response
  • Workload description language to specify
    primitives to compile onto an FPGA
  • Machine Learning techniques to discover web
    interactions

6
Naïve Workload Generator (CS252 class project by
Lorenzo Orecchia and Madhur Tulsiani)
  • Generate the data-set using analytical models
  • server file size distribution
  • request size distribution
  • relative file popularity
  • Derive URL connectivity graph, load in memory
  • Circuit logic to perform random walk on graph.
  • We achieve our goal of scalability
  • Graph size: 1048576
  • Memory usage: 21 MB
  • Total data size: 21083 MB
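
The random walk over a URL connectivity graph described on this slide can be sketched in software. This is a minimal illustration, not the class project's circuit: the graph is synthetic, link targets are drawn with Zipf-like weights as a stand-in for "relative file popularity", and the names `zipf_weights`, `build_graph`, and `random_walk` are hypothetical.

```python
import random
from itertools import accumulate


def zipf_weights(n, s=1.0):
    """Unnormalized Zipf weights: the URL of rank r gets weight 1 / r**s."""
    return [1.0 / (rank ** s) for rank in range(1, n + 1)]


def build_graph(num_urls, out_degree, rng, s=1.0):
    """Build a synthetic URL graph as adjacency lists (assumed structure).

    Each URL links to `out_degree` targets drawn by Zipf-like popularity,
    so low-numbered (popular) URLs are linked to more often.
    """
    cumulative = list(accumulate(zipf_weights(num_urls, s)))
    urls = range(num_urls)
    return [rng.choices(urls, cum_weights=cumulative, k=out_degree) for _ in urls]


def random_walk(graph, start, steps, rng):
    """Follow one uniformly chosen outgoing link per step; return visited URLs."""
    node = start
    trace = [node]
    for _ in range(steps):
        node = rng.choice(graph[node])
        trace.append(node)
    return trace


rng = random.Random(0)  # fixed seed so runs are repeatable
graph = build_graph(num_urls=1024, out_degree=4, rng=rng)
trace = random_walk(graph, start=0, steps=20, rng=rng)
```

Each entry of `trace` stands for one generated request. Many independent walks over the same in-memory graph would correspond to many simulated users.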

7
Scalability: RAMP DRAM limits
  • Given 2 GB DRAMs, 4 DRAM banks per FPGA
  • 100 MHz clock: cycle = 10 ns
  • Per cycle: 21 bits per walk (21 Mbits for 1M
    walks)
  • Assume 10 clock cycles per access => 100 ns
  • 10 million accesses per second per bank per walk
  • Given four DRAM banks: 40M accesses per sec
  • Compare: Google receives 2000 requests per second
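
The DRAM estimate above is a short chain of multiplications and divisions. The sketch below only restates the slide's numbers as arithmetic; the constant names are illustrative.

```python
CLOCK_HZ = 100_000_000        # 100 MHz clock
CYCLES_PER_ACCESS = 10        # assumed DRAM access cost, from the slide
BANKS_PER_FPGA = 4
BITS_PER_WALK = 21
WALKS = 1_000_000             # "1M walks"
GOOGLE_REQS_PER_SEC = 2000    # comparison figure quoted on the slide

cycle_ns = 1_000_000_000 // CLOCK_HZ                      # 10 ns per cycle
access_ns = CYCLES_PER_ACCESS * cycle_ns                  # 100 ns per access
accesses_per_bank = 1_000_000_000 // access_ns            # 10 million per second
accesses_per_fpga = BANKS_PER_FPGA * accesses_per_bank    # 40 million per second
state_bits = BITS_PER_WALK * WALKS                        # 21 Mbits of walk state
headroom = accesses_per_fpga // GOOGLE_REQS_PER_SEC       # ratio to quoted Google rate

print(cycle_ns, access_ns, accesses_per_bank, accesses_per_fpga, state_bits, headroom)
```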

8
Scalability: RAMP Ethernet limits
  • Given 20 10 Gbit/sec Ethernet ports per board
  • Assume can generate 100 million accesses/sec
  • Naïve: response-ignorant workload generation
  • 4 bytes for URL + checksum + header = 32 bytes
  • => 3 GB for 100 million accesses (per second)
  • Smart: response-driven workload generation
  • Google: 23 KB, Flickr: 45 KB, CNN: 100 KB
  • Assume up to 200 KB response
  • Can receive 50K responses per second per port
  • 1M responses handled by 20 10G ports
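
The bandwidth figures on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes 32 bytes per naive request (one reading of the per-request size above), decimal units (1 KB = 1000 bytes), and no protocol overhead. It evaluates the per-port response rate for both readings of the 200K response size, since 50K responses per second per port corresponds to 200 Kbit responses, while 200 KB responses give 6,250 per port.

```python
PORTS = 20
PORT_BITS_PER_SEC = 10 * 10**9      # one 10 Gbit/sec Ethernet port
REQUESTS_PER_SEC = 100 * 10**6      # 100 million accesses/sec
REQUEST_BYTES = 32                  # assumption: total bytes per naive request
RESPONSE_SIZE = 200_000             # "200K" response, unit examined below

board_bits_per_sec = PORTS * PORT_BITS_PER_SEC            # aggregate link capacity

# Naive (response-ignorant) generation: outbound request traffic only.
naive_bytes_per_sec = REQUESTS_PER_SEC * REQUEST_BYTES    # 3.2 GB per second
naive_bits_per_sec = 8 * naive_bytes_per_sec              # 25.6 Gbit per second
naive_fits = naive_bits_per_sec <= board_bits_per_sec

# Smart (response-driven) generation: inbound responses bound the rate.
per_port_if_bits = PORT_BITS_PER_SEC // RESPONSE_SIZE           # 200 Kbit responses
per_port_if_bytes = PORT_BITS_PER_SEC // (8 * RESPONSE_SIZE)    # 200 KB responses
board_if_bits = PORTS * per_port_if_bits
board_if_bytes = PORTS * per_port_if_bytes

print(naive_bytes_per_sec, naive_fits)
print(per_port_if_bits, board_if_bits)
print(per_port_if_bytes, board_if_bytes)
```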

9
Ethernet limits (cont.)
  • About 1000X Google average with Smart
    (response-driven) generation
  • Mixed RAMP emulation/workload generation: even
    higher BW inside box
  • Have plenty of headroom to trade off speed for
    greater accuracy of workload

10
Some Open Questions
  • Limits on types of workloads?
  • Workload trace sources?
  • Web services, existing traffic generators, ...?
  • Role of Response-Ignorant trace generation?
  • UDP, error/congestion-free TCP, ...?
  • Required level of fidelity for Response-Driven
    trace generation?
  • How much of TCP FSM to model?

11
Questions/Feedback?
12
Backup Slides
13
State of the art
  • Hardware vs. software based
  • Hammer, Optixia vs SURGE, SLAMD
  • Tunability vs Automation
  • SPECweb, TPC-W, Harpoon vs Optixia, SLAMD
  • Realistic vs Synthetic
  • SURGE, SLAMD, Harpoon vs TPC-W
  • Generic vs App-Specific
  • SLAMD, Harpoon vs TPC-W, Hammer
  • Open-loop vs Closed-loop
  • Partly-open loop is most realistic for web
    services

14
Workload Generator Next Steps
  • Handle server responses
  • Include server response states in logic
  • Parse server response to identify current state
  • Include think-time distribution
  • User think-time server response time
  • What happens when things go wrong?
  • Improve temporal/spatial locality
  • Prefetch other URLs a page is linked to
  • Take advantage of Zipfian popularity distribution
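
Two of these next steps, think-time distributions and Zipfian popularity, can be illustrated with a small closed-loop session generator. This is a sketch under stated assumptions: the Pareto think-time distribution, the constant 50 ms server response time, and the name `make_session` are all hypothetical choices, not taken from the slides.

```python
import random
from itertools import accumulate


def make_session(num_urls, num_requests, rng, zipf_s=1.0, pareto_alpha=1.5,
                 server_time=lambda url: 0.05):
    """Generate (timestamp, url) pairs for one simulated user.

    URLs are drawn by Zipf-like popularity. The next request is issued only
    after the (assumed) server response time plus a Pareto-distributed think
    time, which makes the loop response-driven rather than open-loop.
    """
    cumulative = list(accumulate(1.0 / (r ** zipf_s) for r in range(1, num_urls + 1)))
    clock = 0.0
    events = []
    for _ in range(num_requests):
        url = rng.choices(range(num_urls), cum_weights=cumulative, k=1)[0]
        events.append((clock, url))
        think_time = rng.paretovariate(pareto_alpha)  # seconds, always >= 1
        clock += server_time(url) + think_time
    return events


session = make_session(num_urls=100, num_requests=10, rng=random.Random(1))
for timestamp, url in session:
    print(f"{timestamp:8.2f}s  GET /url/{url}")
```

Replacing `server_time` with a function that inspects a real response would be the place to handle error states, which is the "what happens when things go wrong" question above.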

15
Sketch of Random Walk Module
MEMORY
16
Data-set parameters
17
Circuit properties
  • Device: Virtex-E
  • Maximum delay
  • Walk Module: 3.99 ns
  • Memory Module: 5.82 ns
  • Estimated frequency (1000/9.81)
  • 101.93 MHz
  • Number of LUTs per walk: 593
  • Number of slices per walk: 307
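
The frequency estimate follows from summing the two module delays and inverting. A short check, with illustrative variable names:

```python
WALK_DELAY_NS = 3.99
MEMORY_DELAY_NS = 5.82

# Total path delay through the walk and memory modules.
total_delay_ns = WALK_DELAY_NS + MEMORY_DELAY_NS

# A period of T ns corresponds to a frequency of 1000 / T MHz.
estimated_mhz = 1000.0 / total_delay_ns

print(f"{total_delay_ns:.2f} ns, {estimated_mhz:.2f} MHz")
```

The result is just above 100 MHz, consistent with the 100 MHz clock assumed in the DRAM estimate.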

18
(No Transcript)
19
(No Transcript)
20
A Framework for Workload Generation
  • Archana Ganapathi
  • Armando Fox, Dave Patterson

21
A Case for Workload Generation
  • No uniform methodology for workload generation
  • Need tools to predict scaling issues during
    develop/deploy phase
  • Obstacles for industry to share data

22
State of the art Workload Generators
  • SURGE: Scalable URL Reference Generator; captures
    file size/request size distributions, relative
    popularity, think times
  • SPECweb: caters to web server HTTP requests
    only, hard to configure, only captures 200 OK
    responses, distribution differs from traces,
    especially at high numbers, doesn't handle
    dynamic content
  • TPC-W: online bookstore; web serving/browsing/
    shopping cart, etc.; high setup overhead
  • SLAMD: Java-based, tests network-based apps
    (specifically LDAP directory servers); also used
    for web servers and web-based apps, relational
    databases, and mail servers
  • Harpoon: a flow-level traffic generator; mimics
    Internet traffic, generates representative
    background traffic for app/protocol testing
  • Optixia: hardware-based IP performance test
    platform; creates and transmits any type of Layer
    2-3 traffic patterns at up to line rate over a
    network
  • Hammer: hardware-based VoIP and PSTN telephone
    call generation

23
State of the art Workload Generators - Comparison
  • Hardware vs. software based
  • Hammer, Optixia vs SURGE, SLAMD
  • Tunability vs Automation
  • SPECweb, TPC-W, Harpoon vs Optixia, SLAMD
  • Realistic vs Synthetic
  • SURGE, SLAMD, Harpoon vs TPC-W
  • Generic vs App-Specific
  • SLAMD, Harpoon vs TPC-W, Hammer
  • Open-loop vs Closed-loop
  • Partly-open loop is most realistic for web
    services

24
Goals for our Framework
  • Generic to accommodate existing workload
    generators
  • Re-configurable to allow black-box testing and
    targeted testing
  • Address privacy concerns

25
Block Diagram
Quantity
Quality
Request Type
Response Awareness
  • granularities
  • num users/req
  • distribution
  • burstiness
  • other metrics
  • per user/request
  • math models
  • traces
  • std protocols
  • http/ftp..
  • examples
  • traces
  • hard-coded
  • msg header
  • sender
  • type
  • msg body
  • objects
  • pattern match

App-level
Code-gen
RAMP
Target System
Source Code
Workload Generator
  • modules
  • branches
  • computation units
  • time to generate request
  • time to parse response

Coverage Statistics
Performance Metrics
26
Understanding Workload
  • Workload has static and dynamic features
  • Static features - Properties inherent in system
  • File size
  • Response type
  • Dynamic features - Properties based on user
    behavior/system runtime effects
  • Response time/inter-arrival rate
  • Request type distribution

27
Formally speaking
  • Workload = set of equivalence classes
  • $W_{\text{static}}$ and $W_{\text{dynamic}}$
  • Equivalence class: transactions,
    distributions, etc.
  • $W_{\text{static}} = \{(\text{cluster centroid}_i, \text{cluster radius}_i)\}$
    where $1 \le i \le N$
  • $N$ = number of equivalence classes
  • Metrics = set of feature vectors
  • Cluster = set of related metrics given
    pair-wise distance and clustering algorithm
  • $W_{\text{dynamic}}$ = $N \times N$ transition probability matrix
  • Dependent on real traces and $W_{\text{static}}$
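
The model on this slide has a static part (one centroid and radius per equivalence class) and a dynamic part (a transition probability matrix between classes). The sketch below shows one way these pieces could fit together. Everything concrete in it is an assumption made for illustration: the two-dimensional feature vectors, the Euclidean distance, the example classes, and the helper names.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class EquivalenceClass:
    centroid: tuple   # feature vector at the cluster center
    radius: float     # maximum distance still counted as a member


def assign_class(features, classes):
    """Return the index of the nearest class whose radius covers `features`, else None."""
    best, best_dist = None, math.inf
    for i, cls in enumerate(classes):
        dist = math.dist(features, cls.centroid)
        if dist <= cls.radius and dist < best_dist:
            best, best_dist = i, dist
    return best


def estimate_transitions(class_sequence, n):
    """Estimate the N x N transition probability matrix from a trace of class indices."""
    counts = [[0] * n for _ in range(n)]
    for current, following in zip(class_sequence, class_sequence[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Classes never seen as a source fall back to a uniform row.
        matrix.append([c / total for c in row] if total else [1.0 / n] * n)
    return matrix


def generate(matrix, start, length, rng):
    """Sample a synthetic sequence of class indices from the transition matrix."""
    state, out = start, [start]
    for _ in range(length - 1):
        state = rng.choices(range(len(matrix)), weights=matrix[state], k=1)[0]
        out.append(state)
    return out


# Hypothetical features: (file size in KB, response time in seconds).
classes = [EquivalenceClass((1.0, 0.1), 0.5), EquivalenceClass((50.0, 2.0), 10.0)]
observed = [0, 0, 1, 0, 1, 1, 0]          # class index of each traced request
w_dynamic = estimate_transitions(observed, n=len(classes))
synthetic = generate(w_dynamic, start=0, length=1000, rng=random.Random(0))
```

Generating a longer `synthetic` sequence than the original trace is one way to read "scale" in this model: the class structure stays fixed while the volume grows.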

28
Putting it all together
Traces
Wstatic
Wdynamic
Metrics
clustering
parse traces and scale Wstatic
Firewall
System Under Test
Workload Generator
Workload Model (open, closed, ajax etc.)
29
Validation
  • Create models using real traces
  • Scale up workload by generating synthetic model
  • Compare behavior of system under trace-based and
    synthetic workloads
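
The slide does not say how the two behaviors are compared. One possible check, offered here purely as an assumption, is the two-sample Kolmogorov-Smirnov statistic over a measured quantity such as response time: it is the largest gap between the two empirical distribution functions, 0 for identical samples and 1 for fully separated ones.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max distance between the empirical CDFs."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a_sorted) | set(b_sorted))
    return max(
        abs(bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted))
        for x in points
    )


# Hypothetical response times (ms) under trace-based and synthetic workloads.
trace_based = [12, 15, 11, 40, 13, 14, 90, 12]
synthetic = [13, 14, 12, 38, 15, 11, 85, 13]
print(ks_statistic(trace_based, synthetic))
```

A small statistic suggests the synthetic workload drives the system similarly to the trace. What threshold counts as "small" would need to be chosen per metric.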

30
Addressing Privacy
  • Industry can generate workload model and provide
    us with a digest of info
  • Anonymized clusters
  • Number and distribution of equivalence classes
  • Don't need to know what each equivalence class is
    (and types of transactions)
  • No user traces are revealed
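
A digest of this kind could be as small as the number of equivalence classes and their relative frequencies. The sketch below is a hypothetical illustration: the input format, the field names, and `make_digest` are assumptions, and a real digest would likely also carry cluster geometry and transition probabilities.

```python
from collections import Counter


def make_digest(labelled_requests):
    """Summarize (user_id, class_name) records without exposing either field.

    Only the number of classes and their frequency distribution are kept.
    Classes are ordered by frequency, so names are replaced by anonymous ranks.
    """
    counts = Counter(class_name for _, class_name in labelled_requests)
    total = sum(counts.values())
    ordered = sorted(counts.values(), reverse=True)
    return {
        "num_classes": len(ordered),
        "distribution": [count / total for count in ordered],
    }


# Hypothetical in-house records that never leave the company.
records = [("u1", "search"), ("u2", "search"), ("u1", "checkout"), ("u3", "browse")]
digest = make_digest(records)
print(digest)
```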

31
Other things to consider
  • How to generically characterize resource demands
    on workload
  • Manifestation of workload on system
  • Normal operation
  • Saturation point
  • Temporal variation (time of day/week)

32
Comments/Feedback
33
Comparison of Generators