Title: Scalability Tools: Automated Testing (30 minutes)
1. Scalability Tools: Automated Testing (30 minutes)
- Overview
- Hooking up your game
  - external tools
  - internal game changes
- Applications & Gotchas
  - engineering, QA, operations
  - production management
- Summary & Questions
2. Review: controlled tests → actionable results, useful for many purposes
[Diagram: repeatable tests, using N synchronized game clients, drive the test game; results feed the Programmer, Development Director, and Executive]
3. Handout notes: automated testing is a strong tool for large-scale games!
- Pushbutton, large-scale, repeatable tests
- Benefits
  - Accurate, repeatable, measurable tests during development and operations → stable software and faster, measurable progress
  - Base key decisions on fact, not opinion
- Augment your team's ability to do their jobs and find problems faster
  - Measure / change / measure → repeat
  - Increased developer efficiency is key
- Get the game out the door faster, with higher stability & less pain
4. Handout notes: more benefits of automated testing
- Comfort and confidence level
  - Managers/producers can easily judge how development is progressing
  - Just like bug-count reports, test reports indicate the overall quality of the current state of the game
  - Frequent, repeatable tests show progress vs. backsliding
- Investing developers in the test process helps prevent QA-vs.-development shouting matches
  - Smart developers like numbers and metrics just as much as producers do
- Making your goals: you will ship cheaper, better, sooner
  - Cheaper: even though initial costs may be higher, issues get exposed when it's cheaper to fix them (and developer efficiency increases)
  - Better: robust code
  - Sooner: "it's OK to ship now" is based on real data, not supposition
5. Automated testing accelerates large-scale game development & helps predictability
[Chart: completeness vs. time, from project start to target launch; with autoTest the curve reaches "complete" before the ship date (a better game, earlier), while without it the "oops" surfaces past the target launch]
6. Measurable targets & projected trends give you actionable progress metrics, early enough to react
[Chart: any test metric (e.g., # clients) plotted over time against a target; the projected trend exposes the "oops" at any chosen time (e.g., alpha)]
7. Success stories
- Many game teams work with automated testing
  - EA, Microsoft, any MMO, ...
- Automated testing has many highly successful applications outside of game development
- Caveat: there are many ways to fail
8. How to succeed
- Plan for testing early
  - It is a non-trivial system
  - There are architectural implications
- Fast, cheap test coverage is a major change in production; be willing to adapt your processes
- Make sure the entire team is on board
- Deeper integration leads to greater value
- Kearneyism: "make it easier to use than not to use"
9. Automated testing components
[Diagram: a Test Manager provides test selection/setup and startup control for any game, drives N clients, and reads real-time probes]
10. Input systems for automated testing
- Scripted
- Algorithmic
- Recorders
- Game code
Multiple test applications are required, but each input type differs in value per application. Scripting gives the best coverage.
11. Hierarchical automated testing
- Unit
- Subsystem
- System
Multiple levels of testing give you:
- Faster ways to work with each level of code
- Incremental testing that avoids noise & speeds defect isolation
(A sketch of the three tiers follows below.)
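To make the tiers concrete, here is a minimal pytest-style sketch; the damage function, the subsystem example, and the client fixture are hypothetical illustrations, not from the talk.

    def apply_damage(hp, weapon_power, armor):
        # Pure game-logic function: the ideal target for a unit test.
        return max(0, hp - max(1, weapon_power - armor))

    # Unit level: one function, no game running, sub-second feedback.
    def test_damage_never_negative():
        assert apply_damage(hp=5, weapon_power=100, armor=0) == 0

    # Subsystem level: a few components wired together (e.g., combat +
    # inventory), still no servers or graphics.

    # System level: a scripted client drives a real (null-view) build;
    # createAvatar/attack/checkDamage are the script verbs from slide 15.
    def test_attack_end_to_end(client):
        client.run(["createAvatar sam", "attack opponent",
                    "checkDamage opponent"])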
12. Handout notes: input systems for automated testing
- Multiple forms of input sources
- Multiple sets of test types & requirements
- Make sure the input technology you pick matches the test types you need
  - Cost of systems, types of testing required, support, cross-team needs, ...
- A single, data-driven autoTest system is usually the best option
13. Handout notes: input sources (algorithmic)
- Powerful & low development cost
- Exploits game semantics for test coverage
- Highly useful for some test types, but limited verification
- E.g.:
  - for each <avatarType>: CreateNewAvatar
  - for each <objectCategory>:
    - BuyAndPlaceAllObjects <currentCategory>
    - for each <object>:
      - UseAllActions <currentAvatar>, <currentObject>
- A broad, shallow test of all object-based content (sketched below)
- Combine with automated errorManagers to increase verification, and/or currentObject.selfTest()
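A rough Python rendering of that algorithmic sweep. The game API used here (avatar_types, create_avatar, buy_and_place_all, actions_for, error_manager) is an invented stand-in, not TSO's:

    def sweep_all_object_content(game):
        # Broad, shallow pass over every avatar/object/action combination.
        for avatar_type in game.avatar_types():
            avatar = game.create_avatar(avatar_type)
            for category in game.object_categories():
                for obj in game.buy_and_place_all(category):
                    for action in game.actions_for(obj):
                        avatar.use(obj, action)
                        # Verification stays shallow: rely on error managers
                        # and per-object self-tests, not scripted checks.
                        game.error_manager.assert_no_new_errors()
                        obj.self_test()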
14. Handout notes: input (recorders)
- Internal event pump / external UI actions
  - Both are brittle to maintain
  - Neither can effectively support load or multi-client synchronization, and both are limited for regression testing
- Best use: capturing defects that are hard to reproduce; effective in overnight random testing of builds & some play testing
- Semantic recorders are much less brittle and more useful
15. Input (scripted test clients)
A pseudo-code script of how users play the game, and what the game should do in response (a runnable sketch follows below):
Command steps:
    createAvatar sam
    enterLevel 99
    buyObject knife
    attack opponent
Validation steps:
    checkAvatar sam exists
    checkLevel 99 loaded
    checkInventory knife
    checkDamage opponent
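A minimal sketch of how a data-driven client could execute such a script; the dispatch scheme and client methods are assumptions, only the verb names come from the slide.

    SCRIPT = [
        ("createAvatar",   "sam"),       # command steps: what the player does
        ("enterLevel",     "99"),
        ("buyObject",      "knife"),
        ("attack",         "opponent"),
        ("checkAvatar",    "sam"),       # validation steps: what the game
        ("checkLevel",     "99"),        # should do in response
        ("checkInventory", "knife"),
        ("checkDamage",    "opponent"),
    ]

    def run_script(client, script):
        for command, arg in script:
            handler = getattr(client, command)   # data-driven dispatch
            ok = handler(arg)
            if command.startswith("check") and not ok:
                raise AssertionError(f"validation failed: {command} {arg}")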
16. Handout notes: scripted test clients
- Scripts are emulated play sessions, just like somebody playing the game
  - Command steps: what the player does to the game
  - Validation steps: what the game should do in response
- Scripted clients are flexible & powerful
  - Usable for many different test types
  - Tests are quick & easy to write
  - Easy for non-engineers to understand & create
17. Handout notes: scripted test clients
- Scriptable test clients
  - A lightweight subset of the shipping client
  - Instrumented: spits out lots of useful information
  - Repeatable
- Embedded automated debugging support helps you understand the test results
  - Log both server and client output (common format), with timestamps! (see the sketch below)
  - Automated metrics collection & aggregation
  - High-level at-a-glance reports with detail drill-down
  - Build in support for hung clients & triaging failures
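One possible shape for that common, timestamped log format, shared by client and server so one tool can merge and drill down; the exact field layout here is an assumption:

    import time

    def log_event(stream, source, test_id, severity, message):
        # e.g. 1697041800.123 | server | load_0042 | ERROR | db timeout
        stream.write(f"{time.time():.3f} | {source} | {test_id} "
                     f"| {severity} | {message}\n")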
18. Handout notes: scripted test clients
- Support costs: one (data-driven) client beats N test systems
- Tailorable validation output is a very powerful construct
  - Each test script contains its required validation steps (flexible, tunable, ...)
  - Minimize the state to regress against → fewer false positives
- Presentation-layer tip: build a spreadsheet of the key words/actions used by your manual testers, then automate the most common/expensive ones
19. Scripted players: implementation
[Diagram: a script engine feeds commands through a presentation layer into the game logic, beneath the game GUI]
20. Test-specific input & output via a data-driven test client gives maximum flexibility
[Diagram: reusable scripts & data (regression, load) enter the test client through an input API; an output API emits key game states, pass/fail, responsiveness, and script-specific logs & metrics]
21. A presentation layer is often unique to a game
- Some automation scripts should read just like QA test scripts for your game
- TSO examples (sketched below):
  - routeAvatar, useObject
  - buyLot, enterLot
  - socialInteraction (makeFriends, chat, ...)
- NullView client
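A small sketch of what a presentation layer could look like; the method bodies and the game_logic API are hypothetical, only the verb names are TSO's:

    class PresentationLayer:
        def __init__(self, game_logic):
            self.game = game_logic

        def routeAvatar(self, avatar, destination):
            # One semantic, QA-readable step expands into many low-level
            # game calls, so scripts survive UI changes that would break
            # pixel/event recorders.
            path = self.game.find_path(avatar.position, destination)
            for waypoint in path:
                self.game.move(avatar, waypoint)

        def buyLot(self, avatar, lot_id):
            self.game.purchase(avatar, "lot", lot_id)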
22. Handout notes: scriptable & tailorable for many applications (engineering, QA, and management)
- Unit testing: 1 feature = 1 script
- Load testing: a representative play session, times 1,000s
  - Make sure your servers work before the players do
- Integration: test code changes for catastrophic failures
- Build stability: quickly find problems and verify the fix
- Content testing: exhaustive analysis of game play to help tuning, ensure all assets are correctly hooked up, and explore edge cases
- Multi-player testing: engineers and QA can test multi-player game code without requiring multiple manual testers
- Performance & compatibility testing: repeatable tests across a broad range of hardware give you a precise view of where you really are
- Project completeness: how many features pass their core functionality tests? What are our current FPS, network lag, and bandwidth numbers?
23. Input (data sets)
- Mock data: repeatable tests in development, faster load, edge conditions
- Real data: the unpredictable user element finds different bugs
24. Input (client synchronization)
- RemoteCommand(x): ordered actions to clients
- waitFor(time): brittle, less reproducible
- waitUntil(localStateChange): most realistic & flexible
(Both wait styles are sketched below.)
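The two wait styles, sketched in Python; the timeout and poll interval are illustrative defaults:

    import time

    def wait_for(seconds):
        time.sleep(seconds)            # brittle: too short = flaky, too long = slow

    def wait_until(predicate, timeout=30.0, poll=0.1):
        # Poll local game state, e.g. wait_until(lambda: client.in_lot()).
        deadline = time.time() + timeout
        while time.time() < deadline:
            if predicate():
                return True
            time.sleep(poll)
        raise TimeoutError("state change never observed")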
25. Common gotchas
- Not designing for testability
  - Retrofitting is expensive
- Blowing the implementation
  - Brittle code
  - Addressing perceived needs, not real needs
- Using automated testing incorrectly
  - Testing the wrong thing at the wrong time
  - Not integrating with your processes
  - Poor testing methodology
26. Testing the wrong thing at the wrong time
Applying detailed testing while the game design is still shifting and the code is still incomplete introduces noise and the need to keep re-writing tests.
27. More gotchas: poor testing methodology & tools
- Case 1: recorders
  - Load & regression testing were what was needed; the maintenance cost was not understood
- Case 2: completely invalid test procedures
  - A distorted view of what really worked (GIGO)
- Case 3: poor implementation planning
  - Limited usage (the nature of the tests led to high test cost & required programming skill)
- Case 4: not adapting development processes
- Common theme: no senior engineering analysis committed to the testing problem
28. Handout notes: more gotchas
- Automating too late, or in too much detail too early
- No ability to change the development process of the game
- No way to measure the effects compared to no automation
- People and processes are funny things
  - Sometimes the process is changed, and sometimes your testing goals have to shift
- Games differ a lot
  - autoTest approaches will vary across games
29. Handout notes: BAT vs. FAT
- Feature drift → expensive test maintenance
- Code is built incrementally; reporting failures nobody is prepared to deal with yet wastes everybody's time
- Automated testing is a new tool and a new concept: focus on a few areas first, then measure, improve, iterate
30. Automated Testing for Online Games
- Overview
- Hooking up your game
  - external tools
  - internal game changes
- Applications
  - engineering, QA, operations
  - production management
- Summary & Questions
31. Handout notes: applying automated testing
- Know what automation is and is not good at; play to its strengths
- Change your processes around it
- Establish clear measures; iteratively improve
- Make sure everybody can use it & has bought into it
- Tests become a form of communication
32. Automated testing strengths
- Repeat massive numbers of simple, easily measurable tasks
- Mine the results
- Do all the above, in parallel, for rapid iteration
"The difference between us and a computer is that the computer is blindingly stupid, but it is capable of being stupid many, many millions of times a second." (Douglas Adams, 1997 SCO Forum)
33. Handout notes: autoTest complexity
- Automation breaks down as individual test complexity increases
- Repeating simple tests hundreds of times and combining the results is far easier to maintain and analyze than using long, complex tests, and parallelism allows a dramatically accelerated test cycle
34. Semi-automated testing is best for game development
Automation covers the testing requirements it is good at:
- Rote work (does door108 still open?)
- Scale
- Repeatability
- Accuracy
- Parallelism
Integrate automated and manual testing for the best impact.
35. Handout notes: semi-automated testing
- Automation: simple tasks (repetitive or large-scale)
  - Load at scale
  - Workflow & information management
  - Regression
  - All weapon damage / broad, shallow feature coverage / ...
- Integrated automated & manual testing
  - Tier 1 / Tier 2: automation flags potential errors, manual testers investigate
  - Within a single test: automation snapshots key game states, manual testers evaluate results
  - Augmented / accelerated: complex build steps, full level play-throughs, ...
36. Plan your attack with stakeholders (retire risk early: QA, production, management)
- Tough shipping requirements (e.g.)
  - Scale, reliability
  - Regression costs
- Development risk
  - Cost / risk of engineering & debugging
  - Impact on content creation
- Management risk
  - Schedule predictability & visibility
37. Handout notes: plan your attack
- What are the big costs & risks on your project?
  - Technology development (e.g., scalable servers)
  - Breadth of content to be regressed, frequency of regressions
- Your development team is significantly handicapped without automated tests & multi-client support: focus on production support to start
  - Often, sufficient machines & QA testers are not available
  - Run-time debugging of networked games often becomes post-mortem debugging: slower & harder
38. Factors to consider
- Test applications: unit, subsystem, full system, game logic, graphics
- Test characteristics: repeatable vs. random, frequency of use, overlap with other tests, creation & maintenance cost, execution cost (automated vs. manual)
39. Handout notes: design factors
- Test overlap & code coverage
- Cost of running the test (graphics high, logic/content low) vs. frequency of test need
- Cost of building the test vs. the manual cost (over time)
- Maintenance cost of the test suites and the test system, and the churn rate of the game code
40. Automation focus areas (Larry's top 5)
- Performance
  - Scale is hard to get right
41. Handout notes: automation focus areas (recommendations)
- Full-system scale/stability testing
  - Multi-client & server code must always function, or the team slows down
  - The hardest part to get right (and to debug) when running live players
  - Scale will screw you, over and over again
- Non-determinism
  - Difficulty in debugging slows development and hurts system reliability
- Content regression
- Build stability
  - Complex systems & large development teams require extra care to keep running smoothly, or you'll pay the price in slower development and more antacids
- And for some systems, compatibility testing or installer testing
- A data-driven system is very important: you can cover all the above with one test system
42. Yikes, that all sounds very expensive!
- Yes, but remember: the alternative costs are higher and do not always work
- QA costs for a 6-player game: you need at least 6 testers at the same time
  - Testers
  - Consoles, TVs, disks & network
  - Non-determinism
- MMO regression costs: yikes²
  - 10s to 100s of testers
  - A 10-year code life cycle
  - Constant release iterations
43. Unstable builds are expensive & slow down your entire team!
[Diagram: a bad checkin flows from development into the build, smoke test, regression, and dev servers; each stage repeats the cost of detection & validation and impacts others (firefighting, not going forward)]
44. Stability: keep the team working! (TSO use case: critical path analysis)
Test case: can an avatar sit in a chair?
login() → create_avatar() → buy_house() → enter_house() → buy_object() → use_object()
Failures on the critical path block access to much of the game.
45. Prevent critical path code breaks that take down your team
[Diagram: candidate code from development runs through a sniff test (pass/fail, diagnostics); only code that passes is checked in as safe code]
(A sketch of such a gate follows below.)
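A minimal sketch of such a sniff gate; the build.run_unit_test API is assumed, and the test names are the TSO critical path from the previous slide:

    CRITICAL_PATH = ["login", "create_avatar", "buy_house",
                     "enter_house", "buy_object", "use_object"]

    def sniff_test(build):
        # Candidate code must pass every critical-path unit test
        # before it is allowed into the shared tree.
        failures = [t for t in CRITICAL_PATH if not build.run_unit_test(t)]
        if failures:
            print("CHECKIN BLOCKED:", ", ".join(failures))
            return False
        return True   # safe code: allowed into the build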
46. Stability & non-determinism (monkey tests)
Continual repetition of critical path unit tests.
47. Handout notes: build stability
- Poor build stability slows forward progress (especially on the critical path)
  - People are blocked from getting work done
  - Uncertainty: did I bust it, or did it just happen?
  - A lot of developers just didn't "get" non-determinism
  - Backsliding: things kept breaking
- Monkey tests: an always-current baseline for developers
  - A common measuring stick across builds & deployments is extremely valuable
- Monkey tests rock!
  - An instant trip wire for problems & a focusing device
  - Server aging: fill the pipes, get some buffers dirty
  - Keeps the wheels in motion while developers use those servers
  - An accurate measure of race-condition bugs
48. Build stability: full testing & comb filtering
- Sniff test & monkey tests: fast to run, catch major errors in new code, keep coders working
- Cheap tests catch gross errors early in the pipeline
- More expensive tests run only on known-functional builds
49. Handout notes: build stability
- Much faster progress after the stability checkers were added
  - Sniff test
  - Hourly reference tests (sniff monkey, unit monkey)
- Comb filters kept the manpower overhead low (on both sides) and gave quick feedback
  - Fewer redos for engineers, fewer bugs for QA to find & process
  - The size of the team makes broken builds very costly
  - Fewer side-effect bugs
50. Handout notes: dealing with stability
- Hourly stability checkers (monkey tests)
  - Aging (dirty processes, growing datasets, leaking memory)
  - Moving parts (race conditions)
  - Stability measure: what works, right now?
  - Flares go off, etc.
- Unit tests (against features)
  - Minimal noise / side effects
  - Reference point: what should work?
  - Clarity in reporting / triaging
51. Handout notes: non-determinism is a big risk factor in online development
- Race conditions, dirty buffers, shared state, ...
- Developers test with a single client against a single server: no chance to expose race conditions
- Fuzzy data views over networked connections further complicate implementation & debugging
- Real-time debugging is replaced with post-mortem analysis
52. Handout notes: the effects of non-determinism
- Multiple CPUs / players greatly complicate development & testing, while also increasing system complexity
- You can't reliably reproduce bugs
- Near-infinite code-path coverage: variable-latency transactions over time introduce massive code complexity and are very hard to get right
- It is also hard to test edge cases or broad coverage
  - Each test can execute differently on any run
53. AutoTest addresses non-determinism
- Detection & reproduction of race-condition defects
  - Even low-probability errors are exposed with sufficient testing (random, structured, load, aging)
- Measurability of race-condition defects
  - "Occurs x% of the time, over 400 test runs" (measurement sketched below)
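One way to measure such a defect rate, assuming a hypothetical run_once callable that returns an outcome label per run (cf. the enterLot example below: 4 behaviors in 30 runs):

    from collections import Counter

    def measure_flakiness(run_once, runs=400):
        # Repeat the same test many times; count each distinct behavior.
        outcomes = Counter(run_once() for _ in range(runs))
        # e.g. Counter({'ok': 383, 'hang': 9, 'crash': 5, 'evicted': 3})
        failure_rate = 1 - outcomes.get("ok", 0) / runs
        print(f"fails {failure_rate:.1%} of the time over {runs} runs:",
              dict(outcomes))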
54. Monkey test: enterLot()
55. Monkey test 3: enterLot()
56. Four different behaviors in thirty runs!
57. Handout notes: non-deterministic failures
- 30 test runs, 4 behaviours:
  - Successful entry
  - Hang or crash
  - Owner evicted, all possessions stolen
- Random results were observed in all major features
  - Critical path random failures outside of unit tests are very difficult to track
58. Content testing (areas)
- Regression
- Error detection
- Balancing / tuning
- This topic is a tutorial in and of itself
  - Content regression is a huge cost & problem
  - There are many ways to automate it (algorithmic, scripted, combined, ...)
  - It differs wildly across game genres
59. Content testing (more examples)
- Light mapping, shadow detection
- Asset correctness / sameness
- Compatibility testing
- Armor / damage
- Class balances
- Validating against old userData
- ... (unique to each game)
60. Load testing, before paying customers show up
- Expose issues that only occur at scale
- Establish hardware requirements
- Establish that play is acceptable at scale
61. Handout notes: some examples of things caught with load testing
- Non-scalable algorithms
- Server-side dirty buffers
- Race conditions
- Data bloat & clogged pipes
- Poor end-user performance at scale
- You never really know what, but something will always go "spang!" at scale
62. Load testing catches non-scalable designs
- Global data (single-player): all data is always available & up to date
- Scalability is hard: shared data grows with players, AI, objects, terrain, ... more bugs!
63. Handout notes: why you need load testing
- Single-player: all information is always available
- Multi-player: shared information must be packaged, transmitted, and unpackaged
  - Each step costs CPU & bandwidth, and can happen 10s to 100s of times per minute
  - It may also cause additional overhead (e.g., DB calls)
- Scalability is key: many shared data structures grow with the number of players, AI, objects, terrain, ...
- Caution: early prototypes may be cheap enough, but as the game progresses, costs may explode
64. Handout notes: why you need load testing
- Case 1, initial design: transmit the entire lotList to all connected clients, every 30 seconds
  - Initial fielding: no problem
    - Development testing: < 1,000 lots, < 10 clients
  - A complete disaster as clients & DB scaled
    - Shipping requirements: 100,000 lots, 4,000 clients
- DO THE MATH BEFORE CODING (worked below)
  - LotElementSize × LotListSize × NumClients
  - 20 bytes × 100,000 × 4,000
  - = 8,000,000,000 bytes, TWICE per minute!!
65. Load testing finds poor resource utilization
[Chart: 22,000,000 DS queries, where the next highest was 7,000!]
66. Load test both client & server behaviors
67. Handout notes: automated data mining / triage
- Test results: patterns of failures
  - Bug rate to source file comparison
  - Easy historical mining & results comparison
- Triage & debugging aids that extract real-time data from the game (a log-parser sketch follows below)
  - Timeout & crash handlers
  - errorManagers
  - Log parsers
  - Scriptable verification conditions
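A small sketch of such a log parser, bucketing error lines by message so the most frequent failure patterns float to the top; the line format matches the logging sketch earlier and is likewise an assumption:

    from collections import Counter

    def triage(log_lines):
        buckets = Counter()
        for line in log_lines:
            fields = [f.strip() for f in line.split("|")]
            if len(fields) == 5 and fields[3] in ("ERROR", "FATAL"):
                buckets[fields[4]] += 1    # key failures by message
        for message, count in buckets.most_common(10):
            print(f"{count:6d}  {message}")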
68. Automated Testing for Online Games (one hour)
- Overview
- Hooking up your game
  - external tools
  - internal game changes
- Applications
  - engineering, QA, operations
  - production management
- Summary & Questions
69. Summary: automated testing
- Start early & make it easy to use
- It strongly impacts your success
  - The bigger & more complex your game, the more automated testing you need
- You need commitment across the team
  - Engineering, QA, management, content creation
70. QA & other resources
- My email: larry.mellon@emergent.net
- More material on automated testing for games:
  - http://www.maggotranch.com/mmp.html
    - Last year's online engineering slides
    - This year's slides
    - Talks on automated testing & scaling the development process
  - www.amazon.com: Massively Multiplayer Game Development II
    - Chapters on automated testing and automated metrics systems
  - www.gamasutra.com: Dag Frommhold, Fabian Röken
    - A lengthy article on applying automated testing in games
  - Microsoft: various groups' writings
- From outside the gaming world:
  - Kent Beck: anything on test-driven development
  - http://www.martinfowler.com/articles/continuousIntegration.html (continuous integration testing)
  - Amazon & Google: inside & outside our industry