Title: GDC 2005
2. GDC 2006 tutorial abstract: Engineering issues for online games
- As the size and scope of multiplayer games continue to grow, the engineering requirements of multiplayer development expand drastically. Moreover, the lifecycle demands of successful massively multiplayer games can involve more than a decade of sustained development after launch.
- This tutorial focuses on the software engineering challenges of producing multiplayer games, including single-player versus multi-player code, testing and regressions, security in peer-to-peer and server-oriented games, and protecting consumers in an increasingly dangerous online environment. Common fail points of single-player engineering tactics in a multi-player world are addressed, as are the longer-term issues of building and maintaining server-based games when downtime means a direct and public loss of revenue and market share.
- This slide deck contains several background slides, hidden in slide-show mode. Print for the complete data.
3. Tutorial Takeaway Messages
- Building online games with single-player game techniques is painful
- Early changes to your development process & system architecture to accommodate online play will greatly ease the pain
- Lessons learned from our background will provide your project with ways to avoid the especially painful places we've found ourselves in
4. Today's Schedule
- 10:00am: Automated Testing for Online Games
  - Larry Mellon, Emergent Game Technologies
- 11:00am: Coffee break
- 11:15am: Crash Course in Security: What it is, and why you need it
  - Dave Weinstein, Microsoft
- 12:30pm: Lunch break
- 2:00pm: Integrating reliability and performance into the production process: How to survive when "five nines" is a must
  - Neil Kirby, Bell Labs
- 3:00pm: Single-player woes for MP design
  - Gordon Walton, Bioware (Austin)
- 4:00pm: Snack break
- 4:15pm: Building continually updated technology: the MMO lifecycle
  - Bill Dalton, Bioware (Austin)
- 5:30pm: Questions & War Stories (all panelists)
  - Question to ponder for this session: Inherited problems. What do you do once the bad decisions have already been made, and you're the one who has to deal with them?
- 6:00pm: End of tutorial
5. Introduction
Why online? Our focus:
- Non-determinism & multi-process → difficult to debug
- Scale, reliability, long lifecycle, … → difficult to get right
6. Automated testing supports your ability to deal with all these problems
Development & Operations:
- Multi-player testing: scale & repeatability → find and reproduce hard bugs, at scale
- Speed of automation, prediction & stability focus → accurate, repeatable tests
7. Handout notes: automated testing benefits
- Multi-player inputs
- Expensive test cycles: difficult for both QA & engineers
- Scale & non-determinism
- Difficult bug reproduction: long, risky development schedules
- Constantly evolving game play & long-term persistence
- Large & frequent regression tests
- High Quality of Service bar
- Alternatives are costly and less effective
8. Automated Testing for Online Games (One Hour)
- Overview
- Hooking up your game
  - external tools
  - internal game changes
- Applications & Gotchas
  - engineering, QA, operations
  - production management
- Summary & Questions
9. A big green "autoTest" button gives controlled tests & actionable results that help across your team
- Repeatable tests, using N synchronized game clients
- [Diagram: test results flow from the Test Game to Programmer, Development Director and Executive]
10. Handout notes: automated testing is a strong tool for online games!
- Pushbutton, large-scale, repeatable tests
- Benefits:
  - Accurate, repeatable & measurable tests during development and operations
  - Stable software; faster, measurable progress
  - Base key decisions on fact, not opinion
- Augment your team's ability to do their jobs, find problems faster
  - Measure / change / measure; repeat
  - Increased developer efficiency is key
- Get the game out the door faster, with higher stability & less pain
11. Handout notes: more benefits of automated testing
- Comfort and confidence level
  - Managers/producers can easily judge how development is progressing
  - Just like bug-count reports, test reports indicate the overall quality of the current state of the game
  - Frequent, repeatable tests show progress & backsliding
- Investing developers in the test process helps prevent QA vs. Development shouting matches
  - Smart developers like numbers and metrics just as much as producers do
- Making your goals: you will ship cheaper, better, sooner
  - Cheaper: even though initial costs may be higher, issues get exposed when it's cheaper to fix them (and developer efficiency increases)
  - Better: robust code
  - Sooner: "it's OK to ship now" is based on real data, not supposition
12. Automated testing accelerates online game development & helps predictability
[Chart: percent complete over time, from project start to target launch; with autoTest the curve reaches "complete" by the ship date, without it an "oops" gap appears]
13. Measurable targets & projected trends give you actionable progress metrics, early enough to react
[Chart: any test metric (e.g. # clients) plotted over time against a target; the projected shortfall ("oops") is visible at any time, e.g. Alpha]
14. Success stories
- Many game teams work with automated testing
  - EA, Microsoft, any MMO, …
- Automated testing has many highly successful applications outside of game development
- Caveat: there are many ways to fail
15. How to succeed
- Plan for testing early
  - Non-trivial system
  - Architectural implications
- Fast, cheap test coverage is a major change in production; be willing to adapt your processes
- Make sure the entire team is on board
- Deeper integration leads to greater value
- Kearneyism: make it easier to use than not to use
16. Automated testing components
[Diagram: a Test Manager handles test selection/setup, startup & control of N test clients, with real-time probes into any online game]
17. Input systems for automated testing
- Scripted
- Algorithmic
- Recorders
(each driving the game code)
Multiple test applications are required, but each input type differs in value per application. Scripting gives the best coverage.
18. Handout notes: input systems for automated testing
- Multiple forms of input sources
- Multiple sets of test types & requirements
- Make sure the input technology you pick matches the test types you need
  - Cost of systems, types of testing required, support, cross-team needs, …
- A single, data-driven autoTest system is usually the best option
19. Handout notes: input sources (algorithmic)
- Powerful & low development cost
- Exploits game semantics for test coverage
- Highly useful for some test types, but limited verification
- E.g.:
    for each <avatarType>: CreateNewAvatar
      for each <objectCategory>: BuyAndPlaceAllObjects <currentCategory>
        for each <object>: UseAllActions <currentAvatar>, <currentObject>
- Broad, shallow test of all object-based content
- Combine with automated errorManagers to increase verification, and/or currentObject.selfTest()
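A minimal Python sketch of that algorithmic loop, for illustration only; the client bindings (create_avatar, buy_and_place_all, use_action) and content tables are hypothetical stand-ins for your game's test API:

    # Algorithmic input: exploit game semantics to sweep all object-based content.
    def broad_shallow_content_sweep(client, avatar_types, object_catalog):
        failures = []
        for avatar_type in avatar_types:
            avatar = client.create_avatar(avatar_type)      # hypothetical binding
            for category, objects in object_catalog.items():
                client.buy_and_place_all(category)          # hypothetical binding
                for obj in objects:
                    for action in obj.actions:
                        try:
                            client.use_action(avatar, obj, action)
                        except Exception as err:            # errorManager hook point
                            failures.append((avatar_type, obj.name, action, err))
        return failures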
20. Handout notes: input (recorders)
- Internal event pump / external UI actions
- Both are brittle to maintain
- Neither can effectively support load or multi-client synchronization, and both are limited for regression testing
- Best use: capturing defects that are hard to reproduce; effective in overnight random testing of builds & some play testing
- Semantic recorders are much less brittle and more useful
21. Input (scripted test clients)
A pseudo-code script of how users play the game, and what the game should do in response.
Command steps:
    createAvatar sam
    enterLevel 99
    buyObject knife
    attack opponent
Validation steps:
    checkAvatar sam exists
    checkLevel 99 loaded
    checkInventory knife
    checkDamage opponent
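For illustration (the verbs above are pseudo-code), such scripts can be dispatched to handler functions in a few lines; this handler-table sketch is hypothetical, not a specific shipping engine:

    # Tiny script engine: command steps drive the game, validation steps
    # assert on what the game did in response.
    def run_script(script_text, handlers):
        """handlers maps verbs like 'createAvatar' or 'checkInventory' to
        callables; validation handlers return True/False, commands return None."""
        results = []
        for line in script_text.strip().splitlines():
            verb, *args = line.split()
            outcome = handlers[verb](*args)
            results.append((line, outcome is not False))
        return results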
22. Handout notes: scripted test clients
- Scripts are emulated play sessions, just like somebody playing the game
- Command steps: what the player does to the game
- Validation steps: what the game should do in response
- Scripted clients are flexible & powerful
  - Usable for many different test types
  - Quick & easy to write tests
  - Easy for non-engineers to understand & create
23. Handout notes: scripted test clients
- Scriptable test clients
  - Lightweight subset of the shipping client
  - Instrumented: spits out lots of useful information
  - Repeatable
- Embedded automated debugging support helps you understand the test results
  - Log both server and client output (common format), w/timestamps!
  - Automated metrics collection & aggregation
  - High-level at-a-glance reports, with detail drill-down
  - Build in support for hung clients & triaging failures
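One way (not from the deck) to sketch the "common format, w/timestamps" logging and hung-client support in Python:

    import logging, threading

    # Common timestamped log format shared by client & server processes,
    # so both sides' logs can be merged and compared during triage.
    logging.basicConfig(
        format="%(asctime)s.%(msecs)03d %(name)s %(levelname)s %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO)

    def watchdog(step_name, timeout_sec, on_hang):
        """Hung-client support: if a test step doesn't finish in time,
        on_hang fires (dump state, kill the client, mark the run failed)."""
        timer = threading.Timer(timeout_sec, on_hang, args=[step_name])
        timer.start()
        return timer  # caller cancels it when the step completes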
24. Handout notes: scripted test client
- Support costs: one (data-driven) client is better than N test systems
- Tailorable validation & output is a very powerful construct
  - Each test script contains the required validation steps (flexible, tunable, …)
  - Minimize the state to regress against: fewer false positives
- Presentation-layer tip: build a spreadsheet of the keywords/actions used by your manual testers, and automate the most common/expensive ones
25. Scripted players: implementation
[Diagram: the Script Engine issues commands through a Presentation Layer that sits alongside the Game GUI; both drive the Client-Side Game Logic]
26. Test-specific input & output via a data-driven test client gives maximum flexibility
[Diagram: reusable scripts & data (regression, load, …) feed an Input API into the Test Client; an Output API emits key game states, pass/fail & responsiveness, and script-specific logs & metrics]
27. A presentation layer is often unique to a game
- Some automation scripts should read just like QA test scripts for your game
- TSO examples:
  - routeAvatar, useObject
  - buyLot, enterLot
  - socialInteraction (makeFriends, chat, …)
- NullView client
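A sketch of how such a presentation layer can map QA-readable verbs onto client-side game logic; the verb names come from the slide, while the underlying calls are hypothetical stand-ins:

    # Presentation layer: QA-readable verbs over client-side game logic,
    # so automation scripts read like manual test scripts.
    class PresentationLayer:
        def __init__(self, game_logic):
            self.game = game_logic

        def routeAvatar(self, avatar, x, y):
            self.game.pathfind_and_move(avatar, (x, y))   # hypothetical call

        def useObject(self, avatar, obj, interaction):
            self.game.queue_interaction(avatar, obj, interaction)

        def enterLot(self, avatar, lot_id):
            self.game.enter_lot(avatar, lot_id)

        def socialInteraction(self, avatar, target, kind="chat"):
            self.game.social(avatar, target, kind)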
28. Handout notes: scriptable & tailorable for many applications (engineering, QA and management)
- Unit testing: 1 feature = 1 script
- Load testing: a representative play session, times 1,000s
  - Make sure your servers work before the players do
- Integration: test code changes for catastrophic failures
- Build stability: quickly find problems and verify the fix
- Content testing: exhaustive analysis of game play to help tuning, ensure all assets are correctly hooked up, and explore edge cases
- Multi-player testing: engineers and QA can test multi-player game code without requiring multiple manual testers
- Performance & compatibility testing: repeatable tests across a broad range of hardware give you a precise view of where you really are
- Project completeness: how many features pass their core functionality tests? What are our current FPS, network lag and bandwidth numbers, …?
29. Input (data sets)
- Mock data → repeatable tests in development, faster load, edge conditions
- Real data → the unpredictable user element finds different bugs
30. Input (client synchronization)
- RemoteCommand(x) → ordered actions to clients
- waitFor(time) → brittle, less reproducible
- waitUntil(localStateChange) → most realistic & flexible
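A sketch contrasting the two wait primitives for synchronizing scripted clients; the polling loop and predicate are illustrative, not a specific engine's API:

    import time

    def wait_for(seconds):
        # Brittle: hopes the game reached the right state after a fixed delay.
        time.sleep(seconds)

    def wait_until(predicate, timeout=60.0, poll=0.25):
        """Block until a local state change is observed, e.g.
        wait_until(lambda: client.avatar_in_lot(99)). The script advances
        exactly when the game does, not on a guessed timer."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            if predicate():
                return True
            time.sleep(poll)
        raise TimeoutError("state change never observed")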
31. Common gotchas
- Not designing for testability
  - Retrofitting is expensive
- Blowing the implementation
  - Code blowout
  - Addressing perceived needs, not real needs
- Using automated testing incorrectly
  - Testing the wrong thing at the wrong time
  - Not integrating with your processes
  - Poor testing methodology
32. Testing the wrong thing at the wrong time
Applying detailed testing while the game design is still shifting and the code is still incomplete introduces noise and the need to keep re-writing tests.
33. More gotchas: poor testing methodology & tools
- Case 1: recorders
  - Load & regression were what was needed; the maintenance cost was not understood
- Case 2: completely invalid test procedures
  - Distorted view of what really worked (GIGO)
- Case 3: poor implementation planning
  - Limited usage (the nature of the tests led to high test cost & required programming skill)
- Common theme: limited or no senior engineering committed to the testing problem
34. Handout notes: more gotchas
- Automating too late, or in too much detail too early
- No ability to change the development process of the game
- Not having ways to measure the effects compared to no automation
- People and processes are funny things
  - Sometimes the process is changed, and sometimes your testing goals have to shift
- Games differ a lot
  - autoTest approaches will vary across games
35. Handout notes: BAT vs FAT
- Feature drift → expensive test maintenance
- Code is built incrementally; reporting failures nobody is prepared to deal with yet wastes everybody's time
- New tools, new concept: focus on a few areas first, then measure, improve, iterate
36. Automated Testing for Online Games (One Hour)
- Overview
- Hooking up your game
  - external tools
  - internal game changes
- Applications
  - engineering, QA, operations
  - production management
- Summary & Questions
37. Handout notes: applying automated testing
- Know what automation is good / not good at; play to its strengths
- Change your processes around it
- Establish clear measures, iteratively improve
- Make sure everybody can use it & has bought into it
- Tests become a form of communication
38. The strength of automated testing is the ability to repeat massive numbers of simple, easily measurable tasks and mine the results
"The difference between us and a computer is that the computer is blindingly stupid, but it is capable of being stupid many, many millions of times a second." (Douglas Adams, 1997 SCO Forum)
39. Handout notes: autoTest complexity
- Automation breaks down as individual test complexity increases
- Repeating simple tests hundreds of times and combining the results is far easier to maintain and analyze than using long, complex tests, and parallelism allows a dramatically accelerated test cycle
40. Semi-automated testing is best for game development
Testing requirements suited to automation:
- Rote work (does door108 still open?)
- Scale
- Repeatability
- Accuracy
- Parallelism
Integrate automated & manual testing for best impact.
41. Handout notes: semi-automated testing
- Automation: simple tasks (repetitive or large-scale)
  - Load at scale
  - Workflow & information management
  - Regression
  - All weapon damage / broad, shallow feature coverage / …
- Integrated automated & manual testing
  - Tier 1 / Tier 2: automation flags potential errors, manual testing investigates
  - Within a single test: automation snapshots key game states, manual evaluation judges the results
  - Augmented / accelerated: complex build steps, full-level play-through, …
42. Plan your attack (retire risk early)
- Tough shipping requirements (e.g.)
  - Scale, reliability
  - Regression costs
- Development risk
  - Cost / risk of engineering & debugging
  - Impact on content creation
- Management risk
  - Schedule predictability & visibility
43. Handout notes: plan your attack
- What are the big costs & risks on your project?
  - Technology development (e.g., scalable servers)
  - Breadth of content to be regressed, frequency of regressions
- Your development team is significantly handicapped without automated tests & multi-client support: focus on production support to start
  - Often, sufficient machines & QA testers are not available
  - Run-time debugging of networked games often becomes post-mortem debugging: slower & harder
44. Factors to consider
Test applications: unit, full system, sub-system, game logic, graphics, manual
Test characteristics: repeatable / random, frequency of use, overlap w/other tests, creation & maintenance, execution
45. Handout notes: design factors
- Test overlap & code coverage
- Cost of running the test (graphics high, logic/content low) vs. frequency of test need
- Cost of building the test vs. manual cost (over time)
- Maintenance cost of the test suites, the test system, & the churn rate of the game code
46. Automation focus areas (Larry's top 5)
- Load testing → scale is hard to get right
47. Handout notes: automation focus areas (recommendations)
- Full-system scale/stability testing
  - Multi-client & server code must always function, or the team slows down
  - Hardest part to get right (and to debug) when running live players
  - Scale will screw you, over and over again
- Non-determinism
  - Difficulty in debugging slows development and hurts system reliability
- Content regression
- Build stability
  - Complex systems & large development teams require extra care to keep running smoothly, or you'll pay the price in slower development and more antacids
- And for some systems, compatibility testing or installer testing
- A data-driven system is very important: you can cover all the above with one test system
48. Yikes, that all sounds very expensive!
- Yes, but remember: the alternative costs are higher, and do not always work
- Costs of QA for a 6-player game: you need at least 6 testers at the same time
  - Testers
  - Consoles, TVs and disks
  - Network connections
- MMO regression costs: yikes²
  - 10s to 100s of testers
  - 10-year code life cycle
  - Constant release iterations
49. Stability analysis (code & servers): What brings down the team?
Test case: Can an avatar sit in a chair?
    login() → create_avatar() → buy_house() → enter_house() → buy_object() → use_object()
Failures on the critical path block access to much of the game.
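A sketch of that critical-path chain as a harness: steps run in dependency order, and the first failure reports everything downstream as blocked (the client methods are hypothetical):

    # Critical-path chain from the slide: a failure early in the chain
    # blocks access to (and masks the status of) everything after it.
    CRITICAL_PATH = ["login", "create_avatar", "buy_house",
                     "enter_house", "buy_object", "use_object"]

    def run_critical_path(client):
        for i, step in enumerate(CRITICAL_PATH):
            if not getattr(client, step)():        # hypothetical client methods
                return {"failed": step, "blocked": CRITICAL_PATH[i + 1:]}
        return {"failed": None, "blocked": []}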
50. Unstable builds are expensive & slow down your entire team!
[Diagram: a checkin flows from Development through Build, Smoke, Regression, and onto the Dev Servers; each stage repeats the cost of detection & validation, impacts others, and forces firefighting instead of going forward]
51. Prevent critical-path code breaks that take down your team
[Diagram: candidate code from Development goes through a Sniff Test (pass/fail, diagnostics) before checkin; only safe code reaches the shared build]
52. Stability & non-determinism (monkey tests)
Continual repetition of critical-path unit tests.
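A sketch of a monkey-test runner, for illustration only: the critical-path unit tests repeat continually against long-lived dev servers, acting as a trip-wire while keeping those servers aging:

    import logging, time

    def monkey_test(client_factory, unit_tests, period_sec=60):
        """Continually repeat critical-path unit tests against dev servers.
        Each pass uses a fresh client against the same long-lived servers,
        so server aging (dirty buffers, leaks, growing datasets) is exercised."""
        passes = 0
        while True:
            client = client_factory()
            for test in unit_tests:
                if not test(client):
                    logging.error("monkey trip-wire: %s failed on pass %d",
                                  test.__name__, passes)
            passes += 1
            time.sleep(period_sec)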
53. Handout notes: build stability
- Poor build stability slows forward progress (especially on the critical path)
  - People are blocked from getting work done
  - Uncertainty: did I bust it, or did it just happen?
  - A lot of developers just didn't "get" non-determinism
  - Backsliding: things kept breaking
- Monkey tests: an always-current baseline for developers
  - Common measuring stick across builds & deployments: extremely valuable
- Monkey tests rock!
  - Instant trip-wire for problems & a focusing device
  - Server aging: fill the pipes, get some buffers dirty
  - Keeps the wheels in motion while developers use those servers
  - Accurate measure of race-condition bugs
54. Build stability & full testing: comb filtering
- Sniff tests & monkey tests: fast to run, catch major errors, keep coders working on new code
- Cheap tests catch gross errors early in the pipeline
- More expensive tests run only on known-functional builds
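The comb-filter idea, sketched as a gate chain (stage names and ordering are illustrative): cheap suites reject broken builds before any expensive suite spends time on them:

    def comb_filter(build, stages):
        """stages: (name, suite) pairs ordered cheap -> expensive.
        The first failing stage stops the comb, so overnight-class tests
        never run against builds that can't pass a minutes-class sniff."""
        for name, suite in stages:
            if not suite(build):
                return f"build rejected at stage: {name}"
        return "build accepted"

    # e.g. comb_filter(build, [("sniff", sniff_tests),
    #                          ("monkey", monkey_pass),
    #                          ("full regression", regression_suite)])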
55. Handout notes: build stability
- Much faster progress after the stability checkers were added
  - Sniff tests & hourly reference tests (sniff & monkey, unit & monkey)
- Comb filters kept the manpower overhead low (on both sides) and gave quick feedback
  - Fewer re-dos for engineers, fewer bugs for QA to find & process
- The size of the team gives a high broken-build cost
- Fewer side-effect bugs
56. Handout notes: dealing with stability
- Hourly stability checkers (monkey tests)
  - Aging (dirty processes, growing datasets, leaking memory)
  - Moving parts (race conditions)
  - Stability measure: what works, right now?
  - Flares go off, etc.
- Unit tests (against features)
  - Minimal noise / side effects
  - Reference point: what should work?
  - Clarity in reporting / triaging
57. Handout notes: event ordering and poor transactional atomicity increase both coding errors and the difficulty of reproducing them
Near-endless (and illogical) orderings are possible when clients A and B each send events 1 and 2 to a server thread, e.g.:
    A1,A2,B1,B2   A1,B1,B2,A2   B1,B2,A1,A2   A2,A1,B1,B2   B2,A2,B1,A1   …
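To make the explosion concrete, a short Python sketch enumerating server-side arrival orders for two clients sending two events each: even when each client's own order is preserved there are six interleavings, and fully relaxed ordering gives 4! = 24:

    from itertools import permutations

    events = ["A1", "A2", "B1", "B2"]

    def per_client_order_kept(order):
        # In-order delivery per client: A1 before A2, B1 before B2.
        return (order.index("A1") < order.index("A2")
                and order.index("B1") < order.index("B2"))

    all_orders = set(permutations(events))     # 4! = 24 arrival orders
    sane = [o for o in all_orders if per_client_order_kept(o)]
    print(len(all_orders), len(sane))          # 24 6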
58. Handout notes: non-determinism is a big risk factor in online development
- Race conditions, dirty buffers, shared state, …
- Developers test with a single client against a single server: no chance to expose race conditions
- Fuzzy data views over networked connections further complicate implementation & debugging
- Real-time debugging is replaced with post-mortem analysis
59. Handout notes: the effects of non-determinism
- Multiple CPUs / players greatly complicate development & testing, while also increasing system complexity
- You can't reliably reproduce bugs
- Near-infinite code-path coverage: variable-latency transactions over time introduce massive code complexity, and are very hard to get right
  - Also hard to test edge cases or broad coverage
  - Each test can execute differently on any given run
60. AutoTest addresses non-determinism
- Detection & reproduction of race-condition defects
  - Even low-probability errors are exposed with sufficient testing (random, structured, load, aging)
- Measurability of race-condition defects
  - Occurs x% of the time, over 400 test runs
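A sketch of how that measurability works in practice: run the same scripted test many times and report the observed failure rate (the runner is illustrative):

    def measure_flakiness(test, runs=400):
        """Repeat one scripted test many times; a race condition becomes
        'fails x% of the time over N runs' instead of 'unreproducible'."""
        failures = sum(1 for _ in range(runs) if not test())
        return failures / runs   # e.g. 0.03 -> occurs 3% of the time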
61. Monkey test: enterLot()
62. Monkey test 3: enterLot()
63. Four different failures in thirty runs!
64. Handout notes: non-deterministic failures
- 30 test runs, 4 behaviours
  - Successful entry
  - Hang or crash
  - Owner evicted, all possessions stolen
- Random results observed in all major features
- Critical-path random failures outside of unit tests are very difficult to track
65. Content testing (areas)
- Regression
- Error detection
- Balancing / tuning
- This topic is a tutorial in and of itself
  - Content regression is a huge cost & problem
  - There are many ways to automate it (algorithmic, scripted, combined, …)
  - It differs wildly across game genres
66. Content testing (more examples)
- Light mapping, shadow detection
- Asset correctness / sameness
- Compatibility testing
- Armor / damage
- Class balances
- Validating against old userData
- (unique to each game)
67. Load testing, before paying customers show up
- Expose issues that only occur at scale
- Establish hardware requirements
- Establish that play is acceptable at scale
68. Handout notes: some examples of things caught with load testing
- Non-scalable algorithms
- Server-side dirty buffers
- Race conditions
- Data bloat & clogged pipes
- Poor end-user performance at scale
- You never really know what, but something will always go spang! at scale
69. Load testing catches non-scalable designs
- Global data (single-player): all data is always available & up to date
- Scalability is hard: shared data grows with players, AI, objects, terrain, … more bugs!
70. Handout notes: why you need load testing
- SP: all information is always available
- MP: shared information must be packaged, transmitted and unpackaged
  - Each step costs CPU & bandwidth, and can happen 10s to 100s of times per minute
  - May also cause additional overhead (e.g. DB calls)
- Scalability is key: many shared data structures grow with the number of players, AI, objects, terrain, …
- Caution: early prototypes may be cheap enough, but as the game progresses, costs may explode
71. Handout notes: why you need load testing
- Case 1, initial design: transmit the entire lotList to all connected clients, every 30 seconds
  - Initial fielding: no problem
  - Development testing: < 1,000 lots, < 10 clients
  - Complete disaster as clients & DB scaled
  - Shipping requirements: 100,000 lots, 4,000 clients
- DO THE MATH BEFORE CODING
  - LotElementSize × LotListSize × NumClients
  - 20 bytes × 100,000 × 4,000
  - = 8,000,000,000 bytes, TWICE per minute!!
72. Load testing: find poor resource utilization
[Chart: 22,000,000 DS queries from one subsystem vs. 7,000 for the next highest]
73. Load test both client & server behaviors
74. Handout notes: automated data mining / triage
- Test results → patterns of failures
  - Bug rate to source-file comparison
  - Easy historical mining & results comparison
- Triage & debugging aids that extract run-time data from the game
  - Timeout & crash handlers
  - errorManagers
  - Log parsers
  - Scriptable verification conditions
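A sketch of one such triage aid: a log parser that buckets failures by signature so patterns across hundreds of runs stand out (the log line shape and regex are assumptions, matching the common timestamped format suggested earlier):

    import re
    from collections import Counter

    # Assumed line shape: "... ERROR enterLot timeout ..."
    ERROR_LINE = re.compile(r"ERROR\s+(?P<signature>\w+\s+\w+)")

    def bucket_failures(log_lines):
        """Collapse per-run noise into failure-pattern counts for triage."""
        buckets = Counter()
        for line in log_lines:
            match = ERROR_LINE.search(line)
            if match:
                buckets[match.group("signature")] += 1
        return buckets.most_common()   # e.g. [("enterLot timeout", 212), ...]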
75. Automated Testing for Online Games (One Hour)
- Overview
- Hooking up your game
  - external tools
  - internal game changes
- Applications
  - engineering, QA, operations
  - production management
- Summary & Questions
76. Summary: automated testing
- Start early & make it easy to use
- It strongly impacts your success
- The bigger & more complex your game, the more automated testing you need
- You need commitment across the team
  - Engineering, QA, management, content creation
77. Resources
- Slides are on the web at www.emergent.net
- My email: larry.mellon@emergent.net
- More material on automated testing for games:
  - http://www.maggotranch.com/mmp.html
    - Last year's online engineering slides
    - Talks on automated testing & scaling the development process
  - www.amazon.com: Massively Multiplayer Game Development II
    - Chapters on automated testing and automated metrics systems
  - www.gamasutra.com: Dag Frommhold, Fabian Röken
    - Lengthy article on applying automated testing in games
  - Microsoft: various groups' writings
- From outside the gaming world:
  - Kent Beck: anything on test-driven development
  - http://www.martinfowler.com/articles/continuousIntegration.html#id108619: continuous integration testing
  - Amazon & Google, inside & outside our industry