Towards a Smart Workload Generator on RAMP

Slides: 33

Provided by: arch1

Transcript and Presenter's Notes

Title: Towards a Smart Workload Generator on RAMP

1
Towards a Smart Workload Generator on RAMP

Archana Ganapathi, David Patterson, Anthony Joseph
{archanag, pattrsn, adj}@cs.berkeley.edu

2
RADLAB goals
Top 50 Web Domains

Help single operator/developer sites become Google-scale
Eliminate SW/HW obstacles for scaling
Tools to identify/fix problems

Source: Washington Post, 3/31/2006
Use RAMP to move websites from right to left
3
RADLAB Goals (2)
YouTube.com 2006 Daily Traffic Ranking

Challenges
Scalability
Configurability
Single person operation
Cost-effectiveness
Reproducibility and Observability

Source: Alexa.com
4
RAMP for Time Travel

UCB goal for RAMP: data center in a box
Google/Amazon.com: O(10,000) processors in data center
Anticipate load of 3-6 months in future for fast moving company
Smart Workload Generation: smart gateware design informed by SML analysis of workload
Time Dilation on Emulated Machines: try software/config changes and observe behavior prior to deployment
Targeted Component-specific Load Generation: stress-test components in the critical path to determine performance limitations

RAMP as emulation environment workload generator
5
Building Blocks
User scripts
Real System
Workload Generator
ML to determine interactions

Workload generation engine
Parser to extract data from server response
Workload description language to specify primitives to compile onto an FPGA
Machine Learning techniques to discover web interactions

6
Naïve Workload Generator (CS252 class project by Lorenzo Orecchia and Madhur Tulsiani)

Generate the data-set using analytical models:
server file size distribution
request size distribution
relative file popularity
Derive URL connectivity graph, load in memory
Circuit logic to perform random walk on graph
We achieve our goal of scalability:
Graph size: 1048576
Memory usage: 21 MB
Total data size: 21083 MB
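
The random walk over the URL connectivity graph can be sketched in software. This is only an illustration of the idea, not the FPGA circuit from the class project: the function names, the uniform choice of link targets, and the small graph size are all assumptions made for the example.

```python
import random

def build_graph(num_pages, out_degree, rng):
    """Build a synthetic URL connectivity graph as an adjacency list.

    Page ids stand in for URLs. Each page links to `out_degree` distinct
    pages chosen uniformly at random (an assumption for this sketch).
    """
    return [rng.sample(range(num_pages), out_degree) for _ in range(num_pages)]

def random_walk(graph, start, steps, rng):
    """Return the sequence of page ids requested by one walker."""
    page = start
    visited = [page]
    for _ in range(steps):
        page = rng.choice(graph[page])  # follow one outgoing link at random
        visited.append(page)
    return visited

rng = random.Random(0)  # fixed seed so a run can be reproduced
graph = build_graph(num_pages=1024, out_degree=4, rng=rng)
requests = random_walk(graph, start=0, steps=10, rng=rng)
print(requests)
```

Each walker only needs to remember its current page id, which is why many walks can share one in-memory copy of the graph.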

7
Scalability: RAMP DRAM limits

Given 2 GB DRAMs, 4 DRAM banks per FPGA, 100 MHz clock (cycle = 10 ns)
Per cycle: 21 bits per walk (21 Mbits for 1M walks)
Assume 10 clock cycles per access => 100 ns
10 million accesses per second per bank per walk
Given four DRAM banks: 40M accesses per sec
Compare: Google receives 2000 requests per second
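
The DRAM estimate above is simple arithmetic, reproduced here as a short script so the inputs can be varied. All constants are the figures quoted on the slide; the variable names are chosen for this example.

```python
CLOCK_HZ = 100_000_000       # 100 MHz clock, i.e. 10 ns per cycle
CYCLES_PER_ACCESS = 10       # assumption stated on the slide
BANKS_PER_FPGA = 4
GOOGLE_REQ_PER_SEC = 2000    # comparison figure quoted on the slide

access_time_ns = CYCLES_PER_ACCESS * 10**9 // CLOCK_HZ
accesses_per_bank = CLOCK_HZ // CYCLES_PER_ACCESS
accesses_per_fpga = accesses_per_bank * BANKS_PER_FPGA
headroom = accesses_per_fpga // GOOGLE_REQ_PER_SEC

print(f"access time:        {access_time_ns} ns")
print(f"accesses/sec/bank:  {accesses_per_bank:,}")
print(f"accesses/sec/FPGA:  {accesses_per_fpga:,}")
print(f"ratio to Google:    {headroom:,}x")
```

With these inputs the script gives 100 ns per access, 10 million accesses per second per bank, 40 million per FPGA, and a ratio of 20,000 to the quoted Google request rate.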

8
Scalability: RAMP Ethernet limits

Given 20 10 Gbit/sec Ethernet ports per board
Assume can generate 100 million accesses/sec
Naïve (response-ignorant) workload generation:
4 bytes for URL + checksum + header = 32 bytes
=> 3 GB for 100 million accesses (per second)
Smart (response-driven) workload generation:
Google 23 KB, Flickr 45 KB, CNN 100 KB
Assume up to 200 KB response
Can receive 50K responses per second per port
1M responses handled by 20 10G ports
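
The Ethernet figures can be checked the same way. The sketch below uses the slide's numbers with decimal units (1 K = 1000), which is an assumption, and the helper name `responses_per_port` is made up for this example. One point worth noting: the 50K responses per second per port matches a 200 kilobit response; if the 200K response size is read as kilobytes, a 10 Gbit/sec port carries 6,250 responses per second.

```python
PORTS = 20
PORT_BITS_PER_SEC = 10 * 10**9     # 10 Gbit/sec per port
REQUESTS_PER_SEC = 100 * 10**6     # assumed generation rate from the slide
REQUEST_BYTES = 32                 # per-request size quoted on the slide

# Response-ignorant generation: only outgoing requests use the links.
request_bytes_per_sec = REQUESTS_PER_SEC * REQUEST_BYTES
request_bits_per_sec = request_bytes_per_sec * 8
aggregate_bits_per_sec = PORTS * PORT_BITS_PER_SEC
utilization = request_bits_per_sec / aggregate_bits_per_sec

def responses_per_port(response_bits):
    """Whole responses of the given size that fit through one port per second."""
    return PORT_BITS_PER_SEC // response_bits

as_kilobits = responses_per_port(200 * 1000)        # 200K read as kilobits
as_kilobytes = responses_per_port(200 * 1000 * 8)   # 200K read as kilobytes

print(f"request traffic: {request_bytes_per_sec / 1e9:.1f} GB/s "
      f"({utilization:.1%} of {PORTS} ports)")
print(f"responses/sec/port at 200 kilobits:  {as_kilobits:,}")
print(f"responses/sec/port at 200 kilobytes: {as_kilobytes:,}")
```

Either reading leaves the response-ignorant request stream (3.2 GB/s, about 13% of aggregate port bandwidth) well within the board's capacity.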

9
Ethernet limits cont.

About 1000X Google average with Smart (response-driven) generation
Mixed RAMP emulation/workload generation: even higher BW inside box
Have plenty of headroom to trade off speed for greater accuracy of workload

10
Some Open Questions

Limits on types of workloads?
Workload trace sources?
Web services, existing traffic generators, ?
Role of Response-Ignorant trace generation?
UDP, error/congestion-free TCP, ?
Required level of fidelity for Response-Driven trace generation?
How much of TCP FSM to model?

11
Question/Feedback?
12
Backup Slides
13
State of the art

Hardware vs. software based: Hammer, Optixia vs. SURGE, SLAMD
Tunability vs. automation: SPECweb, TPC-W, Harpoon vs. Optixia, SLAMD
Realistic vs. synthetic: SURGE, SLAMD, Harpoon vs. TPC-W
Generic vs. app-specific: SLAMD, Harpoon vs. TPC-W, Hammer
Open-loop vs. closed-loop: partly-open loop is most realistic for web services

14
Workload Generator Next Steps

Handle server responses
Include server response states in logic
Parse server response to identify current state
Include think-time distribution
User think-time server response time
What happens when things go wrong?
Improve temporal/spatial locality
Prefetch other URLs a page is linked to
Take advantage of Zipfian popularity distribution
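
Two of these next steps, a think-time distribution and a Zipfian popularity distribution, can be illustrated with a small session generator. This is a sketch under stated assumptions: think times are drawn from an exponential distribution, the Zipf exponent is 1.0, server response time is not modeled, and all names are hypothetical.

```python
import random
from itertools import accumulate

def zipf_cum_weights(num_urls, exponent=1.0):
    """Cumulative weights where the URL at rank r has weight 1 / r**exponent."""
    return list(accumulate(1.0 / rank**exponent for rank in range(1, num_urls + 1)))

def generate_session(num_requests, num_urls, mean_think_s, rng):
    """Return (timestamp_seconds, url_rank) pairs for one simulated user."""
    cum_weights = zipf_cum_weights(num_urls)
    ranks = range(1, num_urls + 1)
    now = 0.0
    session = []
    for _ in range(num_requests):
        url_rank = rng.choices(ranks, cum_weights=cum_weights, k=1)[0]
        session.append((now, url_rank))
        # Assumed exponential think time before the user's next request.
        now += rng.expovariate(1.0 / mean_think_s)
    return session

rng = random.Random(1)  # fixed seed so a run can be reproduced
session = generate_session(num_requests=5, num_urls=1000, mean_think_s=2.0, rng=rng)
for timestamp, url_rank in session:
    print(f"t={timestamp:7.3f}s  url rank {url_rank}")
```

Because low ranks dominate the draws, a cache or prefetcher holding the most popular URLs would serve a large share of requests, which is the locality the last bullet refers to.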

15
Sketch of Random Walk Module
MEMORY
16
Data-set parameters
17
Circuit properties

Device: Virtex-E
Maximum delay:
Walk Module: 3.99 ns
Memory Module: 5.82 ns
Estimated frequency (1000/9.81): 101.93 MHz
Number of LUTs per walk: 593
Number of slices per walk: 307

20
A Framework for Workload Generation

Archana Ganapathi
Armando Fox, Dave Patterson

21
A Case for Workload Generation

No uniform methodology for workload generation
Need tools to predict scaling issues during develop/deploy phase
Obstacles for industry to share data

22
State of the art Workload Generators

SURGE: Scalable URL Reference Generator; captures file size/request size distributions, relative popularity, think times
SPECweb: caters to web servers' HTTP requests only; hard to configure; only captures 200 OK response; distribution is different from traces, especially at high numbers; doesn't handle dynamic content
TPC-W: online bookstore; web serving/browsing/shopping cart etc.; high setup overhead
SLAMD: Java-based; tests network-based apps (specifically LDAP directory servers); also used for Web servers and Web-based apps, relational databases, and mail servers
Harpoon: a flow-level traffic generator; mimics Internet traffic; generates representative background traffic for app/protocol testing
Optixia: hardware-based IP performance test platform; creates and transmits any type of Layer 2-3 traffic patterns at up to line rate over a network
Hammer: hardware-based VoIP and PSTN telephone call generation

23
State of the art Workload Generators - Comparison

Hardware vs. software based: Hammer, Optixia vs. SURGE, SLAMD
Tunability vs. automation: SPECweb, TPC-W, Harpoon vs. Optixia, SLAMD
Realistic vs. synthetic: SURGE, SLAMD, Harpoon vs. TPC-W
Generic vs. app-specific: SLAMD, Harpoon vs. TPC-W, Hammer
Open-loop vs. closed-loop: partly-open loop is most realistic for web services

24
Goals for our Framework

Generic to accommodate existing workload generators
Re-configurable to allow black-box testing and targeted testing
Address privacy concerns

25
Block Diagram
Quantity: granularities, num users/req, distribution, burstiness, other metrics
Quality: per user/request, math models, traces
Request Type: std protocols (http/ftp..), examples, traces, hard-coded
Response Awareness: msg header (sender, type), msg body (objects, pattern match)
Other diagram labels: App-level, Code-gen, RAMP, Target System, Source Code, Workload Generator
Coverage Statistics: modules, branches, computation units
Performance Metrics: time to generate request, time to parse response
26
Understanding Workload

Workload has static and dynamic features
Static features - Properties inherent in system
File size
Response type
Dynamic features - Properties based on user behavior/system runtime effect
Response time/inter-arrival rate
Request type distribution

27
Formally speaking

Workload = set of equivalence classes, given by Wstatic and Wdynamic
Equivalence class = transactions, distributions etc.
Wstatic = {(cluster centroid_i, cluster radius_i)} where 1 ≤ i ≤ N
N = num equivalence classes
Metrics = set of feature vectors
Cluster = set of related metrics given pair-wise distance and clustering algorithm
Wdynamic = N × N transition probability matrix
Dependent on real traces and Wstatic
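
To make the two parts of the definition concrete, the sketch below computes both from a toy trace. Several things here are assumptions rather than part of the slides: the clustering step is taken as already done (each request carries a class label), every class has at least one member, the radius is the largest Euclidean distance from the centroid, and the feature vectors and function names are invented for the example.

```python
import math

def static_model(points, labels, num_classes):
    """Return [(centroid_i, radius_i)] for each equivalence class i."""
    model = []
    for i in range(num_classes):
        members = [p for p, label in zip(points, labels) if label == i]
        dim = len(members[0])
        centroid = tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        radius = max(math.dist(p, centroid) for p in members)
        model.append((centroid, radius))
    return model

def dynamic_model(labels, num_classes):
    """Return the N x N matrix of class-to-class transition probabilities."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for src, dst in zip(labels, labels[1:]):
        counts[src][dst] += 1  # one observed transition in the trace
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Toy trace: (file size in KB, response time in ms) per request, in arrival order.
points = [(1.0, 10.0), (3.0, 10.0), (100.0, 50.0), (1.0, 12.0), (100.0, 54.0), (3.0, 12.0)]
labels = [0, 0, 1, 0, 1, 0]  # equivalence class of each request (assumed given)

w_static = static_model(points, labels, num_classes=2)
w_dynamic = dynamic_model(labels, num_classes=2)
print(w_static)
print(w_dynamic)
```

A generator built on this model could pick the next request class from the current row of the transition matrix, then draw request features from within that class's centroid and radius.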

28
Putting it all together
Diagram labels: Traces; Wstatic; Wdynamic; Metrics; clustering; parse traces and scale Wstatic; Firewall; System Under Test; Workload Generator; Workload Model (open, closed, ajax etc.)
29
Validation