Title: Towards a Smart Workload Generator on RAMP
1. Towards a Smart Workload Generator on RAMP
- Archana Ganapathi, David Patterson, Anthony Joseph
- archanag, pattrsn, adj _at_ cs.berkeley.edu
2. RADLAB goals
Top 50 Web Domains
- Help single operator/developer sites become Google-scale
- Eliminate SW/HW obstacles for scaling
- Tools to identify/fix problems
Source: Washington Post, 3/31/2006
Use RAMP to move websites from right to left
3. RADLAB Goals (2)
YouTube.com 2006 Daily Traffic Ranking
- Challenges
  - Scalability
  - Configurability
  - Single person operation
  - Cost-effectiveness
  - Reproducibility and Observability
Source: Alexa.com
4. RAMP for Time Travel
- UCB goal for RAMP: data center in a box
- Google/Amazon.com: O(10,000) processors in data center
- Anticipate load of 3-6 months in future for fast-moving company
- Smart Workload Generation: smart gateware design informed by SML analysis of workload
- Time Dilation on Emulated Machines: try software/config changes and observe behavior prior to deployment
- Targeted Component-specific Load Generation: stress-test components in the critical path to determine performance limitations
RAMP as emulation environment and workload generator
5. Building Blocks
[Diagram labels: User scripts; Real System; Workload Generator; ML to determine interactions]
- Workload generation engine
- Parser to extract data from server response
- Workload description language to specify primitives to compile onto an FPGA
- Machine Learning techniques to discover web interactions
6. Naïve Workload Generator (CS252 class project by Lorenzo Orecchia and Madhur Tulsiani)
- Generate the data-set using analytical models
  - server file size distribution
  - request size distribution
  - relative file popularity
- Derive URL connectivity graph, load in memory
- Circuit logic to perform random walk on graph
- We achieve our goal of scalability
  - Graph size: 1048576
  - Memory usage: 21 MB
  - Total data size: 21083 MB
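The random walk over a URL connectivity graph can be modelled in software before committing it to circuit logic. The sketch below is only an illustration under stated assumptions: the Zipf-weighted link targets, the fixed out-degree, the small graph size, and all function names are hypothetical and not taken from the class project.

```python
import random
from itertools import accumulate


def zipf_cum_weights(num_pages, exponent=1.0):
    """Cumulative Zipf weights: the page of rank r has weight 1 / r**exponent."""
    return list(accumulate(1.0 / (rank ** exponent) for rank in range(1, num_pages + 1)))


def build_graph(num_pages, out_degree, rng):
    """Adjacency list: every page links to `out_degree` pages drawn by popularity."""
    cum = zipf_cum_weights(num_pages)
    pages = range(num_pages)
    return [rng.choices(pages, cum_weights=cum, k=out_degree) for _ in pages]


def random_walk(graph, start, steps, rng):
    """Follow one uniformly chosen outgoing link per step; return visited pages."""
    page = start
    visited = [page]
    for _ in range(steps):
        page = rng.choice(graph[page])
        visited.append(page)
    return visited


rng = random.Random(0)  # fixed seed so the walk is reproducible
graph = build_graph(num_pages=1024, out_degree=4, rng=rng)  # assumed toy size
walk = random_walk(graph, start=0, steps=20, rng=rng)
print(walk)
```

Each entry of `walk` stands for one generated request; in the hardware version the adjacency list would live in DRAM and each step would cost one memory access.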
7. Scalability: RAMP DRAM limits
- Given 2GB DRAMs, 4 DRAM banks per FPGA
- 100 MHz clock => cycle = 10 ns
- Per cycle: 21 bits per walk (21 Mbits for 1M walks)
- Assume 10 clock cycles per access => 100 ns
- 10 million accesses per second per bank per walk
- Given four DRAM banks: 40M accesses per sec
- Compare: Google receives 2000 requests per second
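The DRAM estimate above is a short chain of multiplications, reproduced here as a worked calculation. All inputs come from the slide; the variable names are illustrative only.

```python
CLOCK_HZ = 100_000_000        # 100 MHz clock
CYCLES_PER_ACCESS = 10        # assumed DRAM access cost in cycles
BANKS_PER_FPGA = 4
GOOGLE_REQ_PER_SEC = 2000     # comparison figure quoted on the slide

cycle_ns = 1_000_000_000 / CLOCK_HZ              # 10 ns per cycle
access_ns = cycle_ns * CYCLES_PER_ACCESS         # 100 ns per access
accesses_per_bank = 1_000_000_000 / access_ns    # 10 million per second
accesses_per_fpga = accesses_per_bank * BANKS_PER_FPGA  # 40 million per second
ratio_to_google = accesses_per_fpga / GOOGLE_REQ_PER_SEC

print(cycle_ns, access_ns, accesses_per_bank, accesses_per_fpga, ratio_to_google)
```

Under these assumptions one FPGA's DRAM sustains about 20,000 times the quoted Google request rate, so memory bandwidth is unlikely to be the first bottleneck.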
8. Scalability: RAMP Ethernet limits
- Given 20 10Gbit/sec Ethernet ports per board
- Assume can generate 100 million accesses/sec
- Naïve: response-ignorant workload generation
  - 4 bytes for URL, checksum, header: 32 bytes
  - => 3 GB for 100 million accesses (per second)
- Smart: response-driven workload generation
  - Google 23KB, Flickr 45KB, CNN 100KB
  - Assume up to 200KB response
  - can receive 50K responses per second per port
  - 1M responses handled by 20 10G ports
9. Ethernet limits cont.
- About 1000X Google average with Smart (response-driven) generation
- Mixed RAMP emulation/workload generation: even higher BW inside box
- Have plenty of headroom to trade off speed for greater accuracy of workload
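The Ethernet figures can be checked with the same kind of back-of-the-envelope arithmetic. The sketch below uses only numbers from the slides, with decimal units (1 KB = 1000 bytes) as an assumption. It computes the per-port response rate twice because the result depends on how "200KB" is read: the slide's 50K per port matches 10 Gbit/s divided by 200 kilobits, whereas 200 kilobytes would give 6,250 per port.

```python
PORTS = 20
PORT_BITS_PER_SEC = 10 * 10**9      # 10 Gbit/s per port
ACCESSES_PER_SEC = 100 * 10**6      # assumed generation rate
REQUEST_BYTES = 32                  # URL, checksum, header
RESPONSE_SIZE = 200 * 10**3         # the slide's "200KB" upper bound
GOOGLE_REQ_PER_SEC = 2000           # comparison figure from the DRAM slide

# Response-ignorant: outgoing request bytes versus total board bandwidth.
board_bytes_per_sec = PORTS * PORT_BITS_PER_SEC // 8       # 25 GB/s
naive_bytes_per_sec = ACCESSES_PER_SEC * REQUEST_BYTES     # 3.2 GB/s


def responses_per_port(response_bits):
    """Whole responses one port can receive per second."""
    return PORT_BITS_PER_SEC // response_bits


# Response-driven: two readings of the response size.
per_port_kilobits = responses_per_port(RESPONSE_SIZE)        # 200 kilobits
per_port_kilobytes = responses_per_port(RESPONSE_SIZE * 8)   # 200 kilobytes
board_kilobits = per_port_kilobits * PORTS
board_kilobytes = per_port_kilobytes * PORTS

print(naive_bytes_per_sec, board_bytes_per_sec)
print(per_port_kilobits, board_kilobits, board_kilobits / GOOGLE_REQ_PER_SEC)
print(per_port_kilobytes, board_kilobytes, board_kilobytes / GOOGLE_REQ_PER_SEC)
```

Under the kilobit reading the board handles 1M responses per second, about 500 times the quoted Google rate. Under the kilobyte reading it handles 125K per second, about 62 times. Either way the response-ignorant case (3.2 GB/s of a 25 GB/s budget) leaves substantial headroom.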
10. Some Open Questions
- Limits on types of workloads?
- Workload trace sources?
  - Web services, existing traffic generators, ...?
- Role of Response-Ignorant trace generation?
  - UDP, error/congestion-free TCP, ...?
- Required level of fidelity for Response-Driven trace generation?
  - How much of TCP FSM to model?
11. Question/Feedback?
12. Backup Slides
13. State of the art
- Hardware vs. software based
  - Hammer, Optixia vs. SURGE, SLAMD
- Tunability vs. automation
  - SPECweb, TPC-W, Harpoon vs. Optixia, SLAMD
- Realistic vs. synthetic
  - SURGE, SLAMD, Harpoon vs. TPC-W
- Generic vs. app-specific
  - SLAMD, Harpoon vs. TPC-W, Hammer
- Open-loop vs. closed-loop
  - Partly-open loop is most realistic for web services
14. Workload Generator Next Steps
- Handle server responses
- Include server response states in logic
- Parse server response to identify current state
- Include think-time distribution
- User think-time server response time
- What happens when things go wrong?
- Improve temporal/spatial locality
- Prefetch other URLs a page is linked to
- Take advantage of Zipfian popularity distribution
15. Sketch of Random Walk Module
[Diagram of the random walk module; labeled block: MEMORY]
16. Data-set parameters
17. Circuit properties
- Device: Virtex-E
- Maximum delay
  - Walk Module: 3.99 ns
  - Memory Module: 5.82 ns
- Estimated frequency (1000/9.81): 101.93 MHz
- Number of LUTs per walk: 593
- Number of slices per walk: 307
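The estimated frequency follows from the two module delays, as the slide's own "1000/9.81" suggests. A short worked calculation, with illustrative variable names and assuming the two delays add in series:

```python
WALK_MODULE_DELAY_NS = 3.99
MEMORY_MODULE_DELAY_NS = 5.82

# Assumption: the critical path is the two module delays in series.
critical_path_ns = WALK_MODULE_DELAY_NS + MEMORY_MODULE_DELAY_NS   # about 9.81 ns
estimated_mhz = 1000.0 / critical_path_ns                          # period in ns -> MHz

print(f"{critical_path_ns:.2f} ns -> {estimated_mhz:.4f} MHz")
```

The result is just under 101.94 MHz; the 101.93 MHz on the slide is the same value truncated to two decimals. It sits close to the 100 MHz clock assumed in the DRAM estimate.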
20. A Framework for Workload Generation
- Archana Ganapathi
- Armando Fox, Dave Patterson
21. A Case for Workload Generation
- No uniform methodology for workload generation
- Need tools to predict scaling issues during develop/deploy phase
- Obstacles for industry to share data
22. State of the art Workload Generators
- SURGE: Scalable URL Reference Generator; captures file size/request size distributions, relative popularity, think times
- SPECweb: caters to web servers, HTTP requests only; hard to configure; only captures 200 OK response; distribution is different from traces, especially at high numbers; doesn't handle dynamic content
- TPC-W: online bookstore (web serving, browsing, shopping cart, etc.); high setup overhead
- SLAMD: Java-based; tests network-based apps (specifically LDAP directory servers); also used for web servers and web-based apps, relational databases, and mail servers
- Harpoon: a flow-level traffic generator; mimics Internet traffic; generates representative background traffic for app/protocol testing
- Optixia: hardware-based IP performance test platform; creates and transmits any type of Layer 2-3 traffic patterns at up to line rate over a network
- Hammer: hardware-based VoIP and PSTN telephone call generation
23. State of the art Workload Generators - Comparison
- Hardware vs. software based
  - Hammer, Optixia vs. SURGE, SLAMD
- Tunability vs. automation
  - SPECweb, TPC-W, Harpoon vs. Optixia, SLAMD
- Realistic vs. synthetic
  - SURGE, SLAMD, Harpoon vs. TPC-W
- Generic vs. app-specific
  - SLAMD, Harpoon vs. TPC-W, Hammer
- Open-loop vs. closed-loop
  - Partly-open loop is most realistic for web services
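To make the open/closed distinction concrete, here is a minimal sketch of one common reading of a partly-open loop: sessions arrive from outside at a fixed rate (the open part), and after each request a user stays with some probability and sends a follow-up after the response plus a think time (the closed part). The Poisson arrivals, exponential think times, constant response time, and every parameter name are assumptions made for illustration, not details from the slides.

```python
import random


def partly_open_requests(arrival_rate, p_stay, mean_think, response_time, horizon, rng):
    """Return sorted request send times in [0, horizon) for a partly-open workload.

    arrival_rate  : new sessions per second (Poisson arrivals)
    p_stay        : probability a user issues a follow-up request
    mean_think    : mean think time in seconds (exponential)
    response_time : assumed constant server response time in seconds
    """
    times = []
    session_start = rng.expovariate(arrival_rate)
    while session_start < horizon:
        send = session_start
        while send < horizon:
            times.append(send)
            if rng.random() >= p_stay:   # user leaves the system
                break
            # Follow-up waits for the response, then the user thinks.
            send += response_time + rng.expovariate(1.0 / mean_think)
        session_start += rng.expovariate(arrival_rate)
    return sorted(times)


rng = random.Random(1)
# p_stay = 0 degenerates to a pure open loop: one request per arrival.
open_loop = partly_open_requests(5.0, 0.0, 2.0, 0.1, 100.0, rng)
partly_open = partly_open_requests(5.0, 0.8, 2.0, 0.1, 100.0, rng)
print(len(open_loop), len(partly_open))
```

With `p_stay = 0.8` each session averages five requests, so the same arrival rate produces a much heavier and burstier request stream than the pure open-loop case.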
24. Goals for our Framework
- Generic to accommodate existing workload generators
- Re-configurable to allow black-box testing and targeted testing
- Address privacy concerns
25. Block Diagram
[Diagram labels: Quantity; Quality; Request Type; Response Awareness; App-level; Code-gen; RAMP; Target System; Source Code; Workload Generator; Coverage Statistics; Performance Metrics]
[Diagram annotations, in original order: granularities; num users/req; distribution; burstiness; other metrics; per user/request; math models; traces; std protocols; http/ftp..; examples; traces; hard-coded; msg header; sender; type; msg body; objects; pattern match; modules; branches; computation units; time to generate request; time to parse response]
26. Understanding Workload
- Workload has static and dynamic features
- Static features: properties inherent in system
  - File size
  - Response type
- Dynamic features: properties based on user behavior/system runtime effect
  - Response time/inter-arrival rate
  - Request type distribution
27. Formally speaking
- Workload = set of equivalence classes
  - $W_{static}$ and $W_{dynamic}$
- Equivalence class = transactions, distributions etc.
- $W_{static} = \{(\text{cluster centroid}_i, \text{cluster radius}_i)\}$ where $1 \le i \le N$
  - $N$ = number of equivalence classes
  - Metrics = set of feature vectors
  - Cluster = set of related metrics given pair-wise distance and clustering algorithm
- $W_{dynamic}$ = $N \times N$ transition probability matrix
  - Dependent on real traces and $W_{static}$
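The two parts of the model can be sketched in a few lines. The slides do not name a clustering algorithm or a distance, so the k-means iteration, Euclidean distance, the two-feature vectors, and all function names below are assumptions chosen only to make the definitions concrete.

```python
import math
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def fit_static(points, centroids, iterations=10):
    """Static part: k-means, then one (centroid, radius) pair per equivalence class."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    radii = [0.0] * len(centroids)
    for p in points:
        i = nearest(p, centroids)
        radii[i] = max(radii[i], math.dist(p, centroids[i]))
    return list(zip(centroids, radii))


def fit_dynamic(trace, w_static):
    """Dynamic part: N x N matrix of class-to-class transition probabilities."""
    centroids = [c for c, _ in w_static]
    labels = [nearest(p, centroids) for p in trace]
    counts = Counter(zip(labels, labels[1:]))
    n = len(centroids)
    matrix = []
    for i in range(n):
        total = sum(counts[(i, j)] for j in range(n))
        matrix.append([counts[(i, j)] / total if total else 0.0 for j in range(n)])
    return matrix


# Hypothetical trace of feature vectors: (file size in KB, response time in ms).
trace = [(1.0, 10.0), (1.2, 11.0), (50.0, 200.0), (52.0, 210.0), (1.1, 9.0), (49.0, 190.0)]
w_static = fit_static(trace, centroids=[(0.0, 0.0), (60.0, 220.0)])
w_dynamic = fit_dynamic(trace, w_static)
print(w_static)
print(w_dynamic)
```

Here the trace falls into two classes (small/fast and large/slow requests), and each row of `w_dynamic` gives the probability that a request of one class is followed by a request of each class.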
28. Putting it all together
[Diagram labels: Traces; Wstatic; Wdynamic; Metrics; clustering; parse traces and scale Wstatic; Firewall; System Under Test; Workload Generator; Workload Model (open, closed, ajax etc.)]
29. Validation
- Create models using real traces
- Scale up workload by generating synthetic model
- Compare behavior of system under trace-based and synthetic workloads
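A first sanity check, short of measuring the system itself, is to confirm that a scaled-up synthetic workload keeps roughly the same mix of request classes as the trace it was modelled on. The sketch below assumes a transition probability matrix is already available; the matrix values, the trace, and the use of total variation distance are all hypothetical choices for illustration.

```python
import random
from collections import Counter


def synthesize(matrix, start, length, rng):
    """Generate a class sequence by walking the transition probability matrix."""
    state = start
    sequence = [state]
    for _ in range(length - 1):
        state = rng.choices(range(len(matrix)), weights=matrix[state])[0]
        sequence.append(state)
    return sequence


def class_shares(sequence, num_classes):
    """Fraction of requests that fall in each class."""
    counts = Counter(sequence)
    return [counts[i] / len(sequence) for i in range(num_classes)]


def total_variation(p, q):
    """Distance in [0, 1] between two class distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


W_DYNAMIC = [[1 / 3, 2 / 3], [0.5, 0.5]]   # hypothetical two-class model
trace_classes = [0, 0, 1, 1, 0, 1]         # hypothetical labelled trace
synthetic = synthesize(W_DYNAMIC, start=0, length=6000, rng=random.Random(2))

gap = total_variation(class_shares(trace_classes, 2), class_shares(synthetic, 2))
print(f"class-mix gap between trace and synthetic workload: {gap:.3f}")
```

A small gap only shows that the generator preserves the request mix; the comparison the slide calls for still requires running both workloads against the system and comparing its measured behavior.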
30. Addressing Privacy
- Industry can generate workload model and provide us with a digest of info
  - Anonymized clusters
  - Number and distribution of equivalence classes
- Don't need to know what each equivalence class is (and types of transactions)
- No user traces are revealed
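As an illustration of what such a digest might contain, the sketch below reduces a labelled trace to the number of equivalence classes and their relative shares, dropping the class names and the per-request records. The function name, the digest fields, and the sample labels are hypothetical.

```python
from collections import Counter


def workload_digest(class_labels):
    """Summarize a trace as class count and class shares, with labels removed."""
    counts = Counter(class_labels)
    shares = sorted((c / len(class_labels) for c in counts.values()), reverse=True)
    return {"num_classes": len(counts), "class_shares": shares}


# Hypothetical private trace; only the digest would leave the company.
private_trace = ["checkout", "browse", "browse", "search", "browse", "search"]
digest = workload_digest(private_trace)
print(digest)
```

The digest says there are three classes with shares of one half, one third, and one sixth, but not which transaction type each class represents.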
31. Other things to consider
- How to generically characterize resource demands on workload
- Manifestation of workload on system
  - Normal operation
  - Saturation point
  - Temporal variation (time of day/week)
32. Comments/Feedback
33. Comparison of Generators